Building & Scaling Data Science Infrastructure as a Service

11:50 - 12:20 | Thursday, May 23

At LinkedIn, we have been working on the next generation of our data science infrastructure as a service. This talk will describe our journey to build a centralized infrastructure platform that scales to hundreds of users and thousands of metrics, and that supports applications ranging from simple dashboarding to experimentation and anomaly detection. We will discuss how a combination of technology and processes has allowed us to scale our user base while preserving trust in our data and metrics. We will also cover in detail how our platform solves challenges such as:

(a) Efficiently managing operations of thousands of data pipelines, data dependencies, and SLAs

(b) Building and managing data constructs across streaming and batch

(c) Enforcing and maintaining data craftsmanship

(d) Preserving trust, governance, and consistency across the data science ecosystem

Ameya Kanitkar
Engineering Manager, Data Science Infrastructure, LinkedIn

Ameya Kanitkar is an engineering leader at LinkedIn, where he manages multiple teams that work on infrastructure and platforms to support LinkedIn’s data science needs. These platforms have helped scale data science to hundreds of engineers while preserving trust in data. Earlier, he oversaw various Hadoop open source initiatives such as Azkaban and Dr. Elephant, both of which are in the process of Apache incubation. Prior to LinkedIn, he worked on large-scale relevance infrastructure at Groupon. Ameya holds advanced degrees from Carnegie Mellon University and Pune University.