We’re told data science is the key to unlocking the value in big data, but nobody seems to agree on just what it is. Is it engineering, statistics... both? David Donoho’s “50 Years of Data Science”, which itself builds on John Tukey’s “The Future of Data Analysis”, offers one of the best critiques of the hype around data science from a statistics perspective. Donoho argues that data science is not new (if it’s anything at all) and calls on statistics (again) to take back the field with a more practical, modern view of what it means to teach statistics and data science.
Drawing on his blog post, Sean Owen responds with counterpoints from an engineer’s perspective, in search of a better understanding of how to teach and practice data science in 2017. Sean explores key moments from the past 50 years of data science history to build a more complete picture of how data science sprang out of statistics and merged with computer engineering. He concludes by comparing Donoho’s view of what it means to build data science capability with a view drawn from the experience of organizations doing so with Apache Hadoop, Spark, and other big data tools.