![Page 1: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/1.jpg)
Watching Pigs Fly with the Netflix Hadoop Toolkit
Hadoop Summit 2013, San Jose, CA
![Page 2: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/2.jpg)
Data should be accessible, easy to discover, and easy to process for everyone.
Our Motivation
![Page 3: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/3.jpg)
Our Users
Analysts Engineers
![Page 4: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/4.jpg)
Hadoop Platform as a Service
![Page 5: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/5.jpg)
Hadoop Platform as a Service
S3
![Page 6: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/6.jpg)
Hadoop Platform as a Service
Data Platform
![Page 7: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/7.jpg)
Data Platform as a Service
Franklin (Metadata API)
Sting (Ad Hoc Visualization)
Forklift (Data Movement)
Looper (Backloading)
Ignite (A/B Test Analytics)
Spock (Data Auditing)
Genie (Hadoop PaaS)
Lipstick (Pig Workflow Visualization)
Event Service (Orchestration)
Hadoop
S3
Other Processing
![Page 8: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/8.jpg)
Let’s solve a problem using the data!
![Page 9: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/9.jpg)
Build a recommender.
![Page 10: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/10.jpg)
But what makes good recommendations?
Similarity
Personalization
![Page 11: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/11.jpg)
COLORS!
![Page 12: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/12.jpg)
COLORS!
Box art is colorful…
![Page 13: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/13.jpg)
We’re Sorry
COLORS!
Box art is colorful…
![Page 14: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/14.jpg)
Where can I find the data?
![Page 15: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/15.jpg)
Hadoop Platform as a Service
S3
![Page 16: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/16.jpg)
Hadoop Platform as a Service
S3 Cassandra Teradata Redshift RDS
![Page 17: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/17.jpg)
Data Platform as a Service
Franklin (Metadata API)
S3 Cassandra Teradata Redshift RDS
![Page 18: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/18.jpg)
Data Platform as a Service
Franklin (Metadata API)
![Page 19: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/19.jpg)
Create a dataset for box art and color.
![Page 20: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/20.jpg)
Whether your dataset is large or small, being able to visualize it makes it easier to explain.
![Page 21: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/21.jpg)
Data Platform as a Service
Franklin (Metadata API)
Sting (Ad Hoc Visualization)
![Page 22: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/22.jpg)
Sting
• Allows users to cache the results of a Genie job in memory
• Sub-second response to OLAP-style operations (slicing, dicing, aggregations)
• Ad hoc or recurring schedule
• Easy to use!
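The slides do not show how Sting is implemented. As a rough illustration of the idea only, the sketch below keeps a small result set in memory and runs a slice and an aggregation over it. The rows, field names, and helper functions are all hypothetical and the numbers are made up; none of them come from Sting or Genie.

```python
from collections import defaultdict

# Hypothetical rows standing in for the cached result of a Genie job.
# The values are invented for illustration.
CACHED_ROWS = [
    {"title": "House of Cards", "hour": 20, "views": 120},
    {"title": "House of Cards", "hour": 21, "views": 180},
    {"title": "Hemlock Grove", "hour": 20, "views": 60},
    {"title": "Hemlock Grove", "hour": 21, "views": 40},
]


def slice_rows(rows, **filters):
    """Keep only rows whose fields equal every given filter value (a 'slice')."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def aggregate(rows, group_by, measure):
    """Sum `measure` for each distinct value of `group_by`."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)


# Aggregate over the whole cached result.
views_by_title = aggregate(CACHED_ROWS, "title", "views")

# Slice to one title, then aggregate by hour.
hoc_rows = slice_rows(CACHED_ROWS, title="House of Cards")
hoc_views_by_hour = aggregate(hoc_rows, "hour", "views")
```

Because the rows already sit in memory, each new slice or aggregation is a pass over a small list rather than a new Hadoop job, which is the property the bullets above describe.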
![Page 23: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/23.jpg)
Hive Query
Schema
![Page 24: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/24.jpg)
% Content Consumed / Hour
![Page 25: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/25.jpg)
Hemlock Grove
House of Cards
Arrested Development
![Page 26: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/26.jpg)
Similarity
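The talk does not spell out how similarity between box art was computed. One simple way to compare two titles by color, assumed here purely for illustration, is cosine similarity between color histograms. The title names, the four color buckets, and the histogram values below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up fraction of pixels falling in (red, green, blue, dark) buckets per box art.
BOX_ART = {
    "title_a": [0.6, 0.1, 0.1, 0.2],
    "title_b": [0.5, 0.1, 0.2, 0.2],
    "title_c": [0.1, 0.2, 0.6, 0.1],
}


def most_similar(name, catalog):
    """Return the other title whose color histogram is closest to `name`'s."""
    others = (t for t in catalog if t != name)
    return max(others, key=lambda t: cosine_similarity(catalog[name], catalog[t]))


closest_to_a = most_similar("title_a", BOX_ART)
```

With these invented histograms, the mostly red `title_a` is matched with the mostly red `title_b` rather than the mostly blue `title_c`.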
![Page 27: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/27.jpg)
![Page 28: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/28.jpg)
![Page 29: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/29.jpg)
House of Cards
Macbeth
![Page 30: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/30.jpg)
![Page 31: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/31.jpg)
![Page 32: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/32.jpg)
Toddlers & Tiaras
Star Trek: Voyager
![Page 33: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/33.jpg)
Personalization
![Page 34: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/34.jpg)
# of subscribers X # of titles = ???,000,…,000 (big data)
Big Data
![Page 35: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/35.jpg)
Netflix Apache Pig
![Page 36: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/36.jpg)
![Page 37: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/37.jpg)
Lipstick
Data Platform as a Service
Franklin (Metadata API)
Sting (Ad Hoc Visualization)
![Page 38: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/38.jpg)
Lipstick
• Allows users to visualize their data flow
• Allows users to see common errors
• Allows users to easily monitor their jobs
• Empowers users to support themselves
• Facilitates communication between infrastructure team and users
![Page 39: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/39.jpg)
Lipstick
![Page 40: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/40.jpg)
Overall Job Progress
![Page 41: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/41.jpg)
Logical Plan
Overall Job Progress
![Page 42: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/42.jpg)
Logical Operator (reduce side)
Logical Operator (map side)
Map/Reduce Job
Intermediate Row Count
Records Loaded
![Page 43: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/43.jpg)
Hadoop Counters
![Page 44: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/44.jpg)
My Job has stalled.
Common Problem #1
![Page 45: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/45.jpg)
![Page 46: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/46.jpg)
Unoptimized/Optimized Logical Plan Toggle
Dangling Operator
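The slide labels a "dangling operator" without defining it. One plausible reading, assumed here, is an operator in the logical plan whose output never reaches a store. The sketch below finds such operators in a small hand-written plan; the plan, the operator names, and `find_dangling` are hypothetical and are not part of Lipstick or Pig.

```python
def find_dangling(plan, sinks):
    """Return operators from which no sink (for example a store) is reachable.

    `plan` maps each operator name to the list of operators that consume its output.
    """
    # Reverse the edges so each operator lists the operators that feed it.
    producers = {op: [] for op in plan}
    for op, consumers in plan.items():
        for consumer in consumers:
            producers.setdefault(consumer, []).append(op)

    # Walk backwards from the sinks, marking everything that contributes to them.
    useful = set()
    stack = list(sinks)
    while stack:
        op = stack.pop()
        if op in useful:
            continue
        useful.add(op)
        stack.extend(producers.get(op, []))

    return sorted(op for op in producers if op not in useful)


# Hypothetical logical plan: `group_debug` is computed but never stored.
PLAN = {
    "load_titles": ["filter_titles"],
    "load_colors": ["join"],
    "filter_titles": ["join", "group_debug"],
    "join": ["store_out"],
    "group_debug": [],
    "store_out": [],
}

dangling = find_dangling(PLAN, sinks=["store_out"])
```

Under this reading, such an operator would appear in the unoptimized plan and be absent from the optimized one, which is the kind of difference a plan toggle makes visible.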
![Page 47: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/47.jpg)
I didn’t get the data I was expecting
Common Problem #2
![Page 48: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/48.jpg)
![Page 49: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/49.jpg)
![Page 50: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/50.jpg)
I don’t understand why my job failed.
Common Problem #3
![Page 51: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/51.jpg)
Failed Job (light red background)
Successful Job (light blue background)
![Page 52: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/52.jpg)
![Page 53: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/53.jpg)
Wrapping up
• Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie).
• Lipstick is part of Netflix OSS.
• Clone it on GitHub at http://github.com/Netflix/Lipstick
• We welcome feedback and contributions!
![Page 54: Watching Pigs Fly with the Netflix Hadoop Toolkit](https://reader036.vdocuments.site/reader036/viewer/2022062406/5589266ad8b42ade2f8b468c/html5/thumbnails/54.jpg)
Charles Smith: [email protected]
Jeff Magnusson: [email protected]
Thank you!
Jobs: http://jobs.netflix.com
Netflix OSS: http://netflix.github.io
Tech Blog: http://techblog.netflix.com/