Big Data, We Have a Communication Problem
DESCRIPTION
GINORMOUS SYSTEMS, April 30–May 1, 2013, Washington, D.C. Daniel Tunkelang, Head of Query Understanding, LinkedIn.
TRANSCRIPT

BIG DATA, We have a communication problem.
GINORMOUS SYSTEMS
April 30–May 1, 2013
Washington, D.C.
Daniel Tunkelang
Head of Query Understanding, LinkedIn

BIG DATA IS EVERYWHERE

BIG DATA POWERS EVERYTHING

DATA SCIENTISTS WORRY ABOUT VOLUME, VELOCITY, VARIETY, …

BUT THE BOTTLENECK ISN’T COMPUTATIONAL
IT’S COGNITIVE

TOOLS AUGMENT HUMAN INTELLECT
BIG DATA IS A TOOL
Doug Engelbart, inventor of the mouse, hypertext, etc.

NOT EVERYONE SUBSCRIBES TO THIS POINT OF VIEW…
Claudia Perlich, Chief Scientist of media6degrees, speaking at TTI/Vanguard 2012 Conference on Understanding Understanding:

SHE HAS A POINT

BUT PREDICTIVE MODELING IS NOT ENOUGH

TRAINING DATA?
OBJECTIVE FUNCTION?

WE NEED A PEOPLE-CENTRIC APPROACH TO BIG DATA
INTERPRETABILITY
INTERACTION
INSIGHT

LET’S START WITH INTERPRETABILITY

EXAMPLE: SVM vs. DECISION TREE

DECISION TREES HAVE FLAWS…
DISCRETE

BUT THEY COMMUNICATE
(if they’re shallow)
early splits provide big picture…
fat leaves guide feature engineering
…or reveal training data problems
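
The point about shallow trees can be made concrete with a small sketch. The pure-Python code below is an illustration, not the model from the talk: the feature names (`links`, `account_age_days`), the toy rows, and the helpers (`gini`, `best_split`, `build`, `describe`) are all hypothetical. It grows a depth-1 tree by Gini impurity and prints it as rules, so the first split and the size and mix of each leaf can be read at a glance.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth, depth=0):
    """Grow a tree of nested dicts, stopping at max_depth or when no split helps."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"leaf": True, "counts": dict(Counter(labels))}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "leaf": False, "feature": f, "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], max_depth, depth + 1),
        "right": build([r for r, _ in right], [y for _, y in right], max_depth, depth + 1),
    }

def describe(node, names, indent=""):
    """Render the tree as indented if/else rules with per-leaf label counts."""
    if node["leaf"]:
        n = sum(node["counts"].values())
        return [f"{indent}leaf: {node['counts']} (n={n})"]
    lines = [f"{indent}if {names[node['feature']]} <= {node['threshold']}:"]
    lines += describe(node["left"], names, indent + "  ")
    lines.append(f"{indent}else:")
    lines += describe(node["right"], names, indent + "  ")
    return lines

# Hypothetical toy data: (links in message, account age in days) -> label.
FEATURES = ["links", "account_age_days"]
ROWS = [(0, 400), (1, 300), (0, 30), (1, 10),
        (5, 5), (7, 2), (6, 200), (8, 1), (6, 365)]
LABELS = ["ok", "ok", "ok", "ok", "spam", "spam", "spam", "spam", "ok"]

tree = build(ROWS, LABELS, max_depth=1)  # keep it shallow so it stays readable
rules = describe(tree, FEATURES)
print("\n".join(rules))
```

With this toy data the single split is on `links`, and the right-hand leaf holds four spam rows plus one ok row. A mixed leaf like that is the kind of spot where one might add a feature or recheck a label.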

WHICH SUPPORTS ITERATION

INTERPRETABILITY DELIVERS
Key search leader favors rule-based approach for key scoring algorithms.
Replaced regression with decision tree in local search model: gained accuracy and insight.
Using trees to recognize spam, analyze search abandonment, model / quantify social proof.

GO DEEP vs INTERPRETABILITY
A KEY DATA SCIENCE TRADE-OFF

ON TO INTERACTION

DON’T OVERPAY FOR PRECISION

BE FAST, CHEAP, AND 98% RIGHT
http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/
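
Trading a little precision for a lot of speed can be sketched with approximate distinct counting. The code below uses a k-minimum-values (KMV) sketch, chosen here for brevity; it is an assumption for illustration and not necessarily the estimator described in the linked post. The function names `hash01` and `estimate_distinct`, the value `k=1024`, and the synthetic `user-N` stream are all hypothetical.

```python
import hashlib
import heapq

def hash01(item):
    """Map an item to a roughly uniform float between 0 and 1 using SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def estimate_distinct(items, k=1024):
    """Estimate the number of distinct items from the k smallest hash values."""
    heap = []     # negated hashes, so heap[0] is the largest hash we keep
    kept = set()  # hashes currently in the heap, used to skip repeats
    for item in items:
        h = hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: count is exact
    return (k - 1) / -heap[0]    # KMV estimate from the k-th smallest hash

# Hypothetical stream: 100,000 events from 20,000 distinct users.
stream = (f"user-{i % 20_000}" for i in range(100_000))
estimate = estimate_distinct(stream)
print(f"estimated distinct users: {estimate:.0f} (true value: 20000)")
```

The sketch keeps at most `k` hash values no matter how long the stream is, and its relative error shrinks roughly as 1/sqrt(k), so a larger `k` buys accuracy at the cost of memory.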

ARE PEOPLE THAT IMPATIENT?
tolerable wait time for web users
0.1s increase in latency significantly reduces # of searches, ad revenue
tl;dr: YES

IMPATIENCE IS GOOD
SPEED MATTERS

INSIGHT

http://blog.takejune.com/archives/52334044.html

BE TRENDY AND NORMALIZE
vs

Sept. 11th
Abu Ghraib
Weapons Inspectors
SOLVE FOR INTERESTINGNESS
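
One hedged reading of these two slides: raw counts mostly track overall volume, so divide by the total first, then look for terms whose normalized share spikes. The sketch below is an assumption about what "normalize" and "interestingness" could mean in code, not the method from the talk. The `normalize` and `interestingness` helpers, the peak-over-median score, and all the counts are made up for illustration.

```python
from statistics import median

def normalize(term_counts, totals):
    """Share of each period's total volume that mentions the term."""
    return [c / t if t else 0.0 for c, t in zip(term_counts, totals)]

def interestingness(shares):
    """Peak share divided by median share; 0.0 when the median share is zero."""
    base = median(shares)
    return max(shares) / base if base else 0.0

# Made-up counts per period. Total volume doubles every period.
totals = [1000, 2000, 4000, 8000]
steady = [10, 20, 40, 80]    # raw count grows 8x, but only with overall volume
spiky = [10, 20, 400, 80]    # one period stands out even after normalizing

for name, counts in [("steady", steady), ("spiky", spiky)]:
    shares = normalize(counts, totals)
    print(name, [round(s, 3) for s in shares], round(interestingness(shares), 2))
```

The `steady` series looks like a trend in raw counts but is flat once normalized, while `spiky` keeps its peak. A score like this only ranks candidates; deciding which spikes matter is still a human call.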

COMPUTE POTENTIAL INSIGHTS
APPLY HUMAN INTUITION

SUMMARY: Let’s have a conversation with Big Data.
INTERPRETABILITY
INTERACTION
INSIGHT