How to build consistent, scalable workspaces for data science teams

Post on 22-Jan-2018



Elaine Lee

Data science is hard. Doing data science is even harder.

Ensuring enough resources

Managing dependencies

Nail it down

- Identify system requirements for base Docker image
- Stabilize dependencies for data science work environment
- Increase test coverage
- Get continuous integration (CI) platform on the same page
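One way to make "stabilize dependencies" concrete is to check that every requirement is pinned to an exact version before the image is built. The sketch below assumes a pip-style requirements list; the function name `unpinned_requirements` and the example packages are illustrative and not taken from the talk.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.2.3".
# Assumption: pip-style requirement lines, one per entry.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[A-Za-z0-9.+!_-]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    example = ["numpy==1.13.3", "pandas>=0.20", "scikit-learn", "# a comment"]
    print(unpinned_requirements(example))
```

A CI job could fail the build whenever this list is non-empty, so the Docker image is always built from the same versions.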

Scale it up

- Create a pool of worker machines ready to accept jobs
- Set up an asynchronous task queue
- Provide a simple command line interface for data scientists
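The worker pool and task queue can be illustrated in miniature with Python's standard library. This is an in-process stand-in using threads, not the distributed setup the talk describes; the function name `run_jobs` and the job format are assumptions made for the example.

```python
import queue
import threading


def run_jobs(jobs, num_workers=3):
    """Run (func, args) jobs on a small pool of worker threads fed by a queue."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                tasks.task_done()
                break
            job_id, func, args = item
            try:
                outcome = ("ok", func(*args))
            except Exception as exc:  # report the failure, keep the worker alive
                outcome = ("error", repr(exc))
            with lock:
                results[job_id] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job_id, (func, args) in enumerate(jobs):
        tasks.put((job_id, func, args))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_jobs([(pow, (2, 10)), (int, ("not a number",))]))
```

Submitting work returns immediately from the caller's point of view in a real queue; here the function waits for all jobs so the results can be inspected.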

Putting it all together

Commit pushed to GitHub (CI)
- Pull changes
- Start Docker container
- Run test suite
- Report pass/fail
- Export image for commit

Request arrives in queue (workers)
- Get image for commit
- Start container from image
- Run task
- Report result

Images are stored in S3, labeled by commit (123abc…)
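The two flows above could be sketched as lists of shell commands, one for the CI side and one for the worker side. Everything specific here is an assumption for illustration: the repository name `ds-workspace`, the bucket path, the use of `pytest` as the test runner, and the choice of `docker build` / `docker save` / `docker load` plus `aws s3 cp` to move images, which may differ from the setup used in the talk. The functions only build the commands; they do not run them.

```python
def image_tag(commit, repo="ds-workspace"):
    """Tag an image with the commit it was built from (repo name is hypothetical)."""
    return f"{repo}:{commit}"


def ci_commands(commit, bucket="s3://example-bucket/images"):
    """Commands a CI job might run after a commit is pushed (a sketch)."""
    tag = image_tag(commit)
    tarball = f"{commit}.tar"
    return [
        ["git", "pull"],                                   # pull changes
        ["docker", "build", "-t", tag, "."],               # build image for this commit
        ["docker", "run", "--rm", tag, "pytest"],          # run test suite in a container
        ["docker", "save", "-o", tarball, tag],            # export image for commit
        ["aws", "s3", "cp", tarball, f"{bucket}/{tarball}"],
    ]


def worker_commands(commit, task, bucket="s3://example-bucket/images"):
    """Commands a worker might run when a request arrives in the queue (a sketch)."""
    tag = image_tag(commit)
    tarball = f"{commit}.tar"
    return [
        ["aws", "s3", "cp", f"{bucket}/{tarball}", tarball],  # get image for commit
        ["docker", "load", "-i", tarball],
        ["docker", "run", "--rm", tag] + list(task),          # start container, run task
    ]


if __name__ == "__main__":
    for cmd in worker_commands("abc1234", ["python", "task.py"]):
        print(" ".join(cmd))
```

Keying the image by commit is what lets a worker reproduce the environment the code was developed and tested in.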

Benefits

Flexible to any composition of EC2 instances
- Extensible to EMR

Task environment guaranteed
- Isolated from other tasks
- Identical to conditions at time of development

One-time configuration
- EC2 AMI

Extensible command line interface
- R interface
- Cluster management
- Job monitoring
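A command line interface of this kind could be laid out with subcommands, so that features such as job monitoring can be added later without changing existing ones. The sketch below uses `argparse`; the program name `dsjob` and the `submit` and `status` subcommands are hypothetical and not taken from the talk.

```python
import argparse


def build_parser():
    """Build a small CLI with one subcommand per feature (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="dsjob", description="Submit and monitor data science tasks."
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # dsjob submit --commit <hash> <task words...>
    submit = sub.add_parser("submit", help="queue a task against a commit")
    submit.add_argument("--commit", required=True, help="commit whose image to use")
    submit.add_argument("task", nargs="+", help="command to run inside the container")

    # dsjob status <job_id>
    status = sub.add_parser("status", help="show the state of a submitted job")
    status.add_argument("job_id")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["submit", "--commit", "abc1234", "python", "train.py"]
    )
    print(args.command, args.commit, args.task)
```

New capabilities, such as cluster management, would each become another `add_parser` call.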

Use case: Quality assurance

CI testing

Other tests
- Data validation
- Model consistency
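As an illustration of a data validation test that could run as a queued task, the sketch below reports required values that are missing from a batch of records. The record format (a list of dicts) and the function name `validate_rows` are assumptions for the example.

```python
def validate_rows(rows, required):
    """Return (row_index, column) pairs where a required value is missing or None."""
    missing = []
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                missing.append((i, col))
    return missing


if __name__ == "__main__":
    rows = [
        {"id": 1, "amount": 10.0},
        {"id": 2, "amount": None},
        {"amount": 5.0},
    ]
    print(validate_rows(rows, ["id", "amount"]))
```

An empty result means the batch passed; anything else can be reported back as a failed check.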


Use case: Parallelizable tasks

Data manipulation
- Feature engineering

Model builds
- Advanced machine learning algorithms
- Hyperparameter search
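Hyperparameter search parallelizes naturally because each parameter combination is an independent task. A minimal sketch, assuming a grid given as a dict of candidate values (the parameter names are illustrative), expands the grid into one task per combination:

```python
import itertools


def grid_tasks(grid):
    """Expand a dict of parameter lists into one parameter dict per task."""
    names = sorted(grid)  # fixed order so the task list is reproducible
    combos = itertools.product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


if __name__ == "__main__":
    grid = {"max_depth": [3, 5], "learning_rate": [0.1, 0.01]}
    for params in grid_tasks(grid):
        print(params)
```

Each resulting dict could then be submitted to the queue as its own job and run on whichever worker is free.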


Elaine Lee
Data Engineer

elaine@elaineklee.com
@elaineklee

avant.com
