Data Science, Past & Future

9:55 - 10:40 | Friday, May 24

This talk explores big themes — the challenges in industry, the advances in technology — which brought us to this point. Origins of our field followed a simple formula at nearly every step along the way. We can use that formula as a lens to understand changes that are emerging.

Looking through six decades since Tukey first described “data analysis”, challenges and advances have often upset the status quo. Hardware capabilities evolved in dramatic leaps. Software layers provided new kinds of control systems and conceptual abstractions in response. These manifested as surges in data rates and compute resources. Then industry teams applied increasingly advanced mathematics to solve for novel business cases. That’s the formula.

For example, as spinny disks gave way to SSDs and commodity CPUs became multicore, Hadoop use cases gave way to Spark, which fit the hardware better. Cluster computing workloads that had been ETL or clickstream gave way to more complex math used for recommender systems, anti-fraud, anti-churn, and other advanced predictive analytics.

We’re at a point now where more of the predictive analytics are moving to Python or R in-memory processing (Arrow), while more advanced workloads such as deep learning take over the clusters. On the horizon, even more complex use cases such as knowledge graph work will be consuming what new hardware provides.

Let’s examine this “lens” into how our field evolves. Also keep in mind that growing security threats and increasingly complex regulatory requirements drive from the top, placing an even greater premium on novel business cases. We’ll look through the trends in history leading up to now, consider examples from this conference, then look at what’s on the horizon.

Paco Nathan
Managing Partner, Derwen, Inc.

Known as a “player/coach”, with core expertise in data science, natural language processing, machine learning, cloud computing; 35+ years tech industry experience, ranging from Bell Labs to early-stage start-ups. Co-chair Rev. Advisor for Amplify Partners, Deep Learning Analytics, Recognai, Data Spartan. Recent roles: Director, Learning Group @ O’Reilly Media; Director, Community Evangelism @ Databricks and Apache Spark. Cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise.