![Page 1: UC#BERKELEY# - University of California, Los Angelescadlab.cs.ucla.edu/expeditions_pi_meeting/slides/MichaelFranklin... · • 700+ member Meetup group! • Best Paper Awards:](https://reader031.vdocuments.site/reader031/viewer/2022030408/5a880c2e7f8b9afc5d8e3554/html5/thumbnails/1.jpg)
Making Sense at Scale with Algorithms, Machines & People
PI: Michael Franklin, University of California, Berkeley
Expeditions in Computing PI Meeting
May 15, 2013
UC BERKELEY
The Berkeley AMPLab
Sources Driving Big Data
It’s All Happening On-line. Every:
• Click
• Ad impression
• Billing event
• Fast forward, pause, …
• Friend request
• Transaction
• Network message
• Fault
• …
User Generated (Web & Mobile)
Internet of Things / M2M
Scientific Computing
Challenge 1: Data is Big
[Chart: Projected Growth, increase over 2010 (y-axis 0 to 60), 2010 to 2015. Series: Moore's Law, Overall Data, Particle Accel., DNA Sequencers]
Data Grows Faster than Moore’s Law [IDC report, Kathy Yelick, LBNL]
Challenge 2: Data is Dirty
• Variety of diverse sources
• Uncurated
• No schema
• Inconsistent syntax and semantics
Dirty Data is worse than Big Data
Challenge 3: Complex Questions
• Hard questions
– What is the impact on traffic and home prices of building a new on-ramp?
• Detect real-time events
– Is there a cyber attack going on?
• Open-ended questions
– How many supernovae happened last year?
Our Vision: A Necessary Synergy
Algorithms, Machines, People
Challenge 1: Data is Big ✔ ✔
Challenge 2: Data is Dirty ✔ ✔ ✔
Challenge 3: Questions are Complex ✔ ✔ ✔
The AMPLab Big Bets
• Traditional intellectual borders hinder “Big Data” stacks
– Need Machine Learning/Systems/Database Co-Design
– Requires Cohabitation and Real Collaboration
• Now is a unique opportunity to rethink fundamental design points:
– Changing Latency Demands
– Changing Consistency Requirements
– Cloud-based Elastic Resources
– Huge Desire for New Solutions in the Marketplace
– Open Source is the key to Tech Transfer in Big Data
• Need to consider the role of people throughout the entire analytics lifecycle
AMPLab: Collaborative Research
An integration of Faculty Interests (*Directors):
• Alex Bayen (Mobile Sensing)
• Anthony Joseph (Sec./Privacy)
• Ken Goldberg (Crowdsourcing)
• Randy Katz (Systems)
• *Michael Franklin (Databases)
• Dave Patterson (Systems)
• Armando Fox (Systems)
• *Ion Stoica (Systems)
• *Mike Jordan (Machine Learning)
• Scott Shenker (Networking)
Twice-Yearly Research Retreats (industry & sponsors)
50+ amazing grad students, post-docs, undergrads, developers, staff & visitors
Co-Located for Collaboration
Collaboration: Industry + Government
AMPLab Launched January 2011 (5 yr plan)
Founding Sponsors:
Sponsors and Affiliates:
Federal Grants and Contracts:
• Expeditions in Computing
• XData Program
Collaboration: Applications
Participatory Sensing
• Mobile Millennium – Traffic
Collective Discovery
• Opinion Space – Opinions
• Carat – Smartphone energy
Urban Planning and Simulation
• UrbanSim – data integration
Cancer Genomics/Personalized Medicine (w/ UCSF and UCSC)
• SNAP: Fast Sequence Alignment
• Genome Data Warehouse
Shared Deliverable: Berkeley Data Analytics Stack (BDAS)
BDAS: Current Snapshot
[Diagram: the BDAS stack, with layers labeled Resource Mgmt., Data Mgmt., and Data Processing. Components shown: Mesos, HDFS, Tachyon, Hadoop, MPI, Storm, HIVE, Pig, Spark, Spark Streaming, Shark, BlinkDB, Spark Graph, MLBase. Legend: Released (BDAS), In development (BDAS), Existing open source stack]
BDAS components are being released under a BSD or Apache open source license
Big Data Landscape – Our Corner
Impact (so far)
• Open Source Release of BDAS components:
– Mesos: Cluster Virtualization
  – Business-critical services on 6000+ servers at Twitter
  – See “How Twitter Rebuilt Google’s Secret Weapon”, Wired, 3/13
– Spark: In-memory Computation Framework & Shark: Hive-Compatible SQL Query Engine on Spark
  – In use at large companies, start-ups, and govt. agencies
  – 100x Performance Improvement over Hadoop/Apache Hive
  – Available on Amazon Elastic MapReduce
  – 700+ member Meetup group
• Best Paper Awards: EuroSys 13, ICDE 13, NSDI 12, SIGCOMM 12; Best Demo Award: SIGMOD 12
• Students in high demand in academia and industry
Spark: Sys/ML Collaboration at Work
[Figure: Logistic Regression Performance per iteration (iter. 1, iter. 2, ...); 29 GB dataset on 20 EC2 m1.xlarge machines (4 cores each)]
Research Challenge Addressed: How to design a distributed memory abstraction that is both fault-tolerant and efficient?
Technical Challenge: disk-oriented Hadoop MapReduce is inefficient for iterative Machine Learning
Solution: Resilient Distributed Datasets (RDDs)
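The RDD idea can be illustrated with a toy sketch in plain Python. This is not Spark's API: the `ToyRDD` class, its `computed` counter, and `lose_partition` are hypothetical names used only for illustration. The sketch assumes the core mechanism is that each dataset records its lineage (its parent plus the transformation applied), keeps computed partitions in memory so iterative jobs can reuse them, and rebuilds only a lost partition from lineage instead of replicating data.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for a resilient distributed dataset (not Spark's API)."""

    def __init__(self,
                 source: Optional[List[List[float]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[float], float]] = None) -> None:
        self.source = source   # base partitions (stand-in for stable storage)
        self.parent = parent   # lineage: the dataset this one derives from
        self.fn = fn           # lineage: the per-element transformation
        self.cache: Dict[int, List[float]] = {}  # partitions held in memory
        self.computed = 0      # number of partition builds, for inspection

    def map(self, fn: Callable[[float], float]) -> "ToyRDD":
        # Transformations are lazy: only the lineage is recorded here.
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self) -> int:
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def partition(self, i: int) -> List[float]:
        # Serve from memory when possible; otherwise rebuild from lineage.
        if i in self.cache:
            return self.cache[i]
        if self.source is not None:
            data = list(self.source[i])
        else:
            data = [self.fn(x) for x in self.parent.partition(i)]
        self.computed += 1
        self.cache[i] = data
        return data

    def collect(self) -> List[float]:
        return [x for i in range(self.num_partitions())
                for x in self.partition(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one in-memory partition.
        self.cache.pop(i, None)


base = ToyRDD(source=[[1.0, 2.0], [3.0, 4.0]])
squared = base.map(lambda x: x * x)
print(squared.collect())    # builds both partitions
squared.lose_partition(1)   # simulate losing one partition
print(squared.collect())    # rebuilds only partition 1 from lineage
```

In an iterative algorithm such as logistic regression, each pass over the data would hit the in-memory partitions rather than re-reading from disk, which is the inefficiency the slide attributes to disk-oriented MapReduce.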
Impact: Carat Smartphone App
Over 500,000 downloads
MLBase – Declarative ML
Vision: Make Machine Learning usable by “mere mortals”
• Allow high-level (declarative) specification of ML tasks
• Use database-style “query optimization” to generate an efficient execution strategy
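The declarative idea can be sketched in a few lines of plain Python. This is not MLBase's interface: `classify`, `CANDIDATES`, and the two toy learners are hypothetical names invented for this illustration. The sketch assumes that "query optimization" here means the system, not the user, chooses among candidate execution strategies, which is approximated below by picking the candidate with the best validation accuracy.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]      # (single feature, label in {0, 1})
Model = Callable[[float], int]


def train_majority(data: List[Example]) -> Model:
    """Baseline learner: always predict the most common training label."""
    ones = sum(label for _, label in data)
    majority = 1 if 2 * ones >= len(data) else 0
    return lambda x: majority


def train_threshold(data: List[Example]) -> Model:
    """Learner that picks the cut point t maximizing 'x >= t means 1'."""
    best_t, best_correct = 0.0, -1
    for t in sorted({x for x, _ in data}):
        correct = sum((x >= t) == bool(label) for x, label in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return lambda x: int(x >= best_t)


# Hypothetical "plan space" the optimizer searches over.
CANDIDATES: Dict[str, Callable[[List[Example]], Model]] = {
    "majority": train_majority,
    "threshold": train_threshold,
}


def accuracy(model: Model, data: List[Example]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)


def classify(train: List[Example],
             validation: List[Example]) -> Tuple[str, Model]:
    """Declarative entry point: the caller states the task, not the algorithm."""
    scored = []
    for name, trainer in CANDIDATES.items():
        model = trainer(train)
        scored.append((accuracy(model, validation), name, model))
    _, best_name, best_model = max(scored, key=lambda s: s[0])
    return best_name, best_model


train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
validation = [(0.15, 0), (0.25, 0), (0.75, 1), (0.85, 1)]
chosen_name, chosen_model = classify(train, validation)
print(chosen_name, chosen_model(0.95))
```

The caller never names an algorithm; adding a new learner to `CANDIDATES` changes what the system can choose without changing any calling code.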
Hybrid Human/Machine Systems
Use machines for bulk data processing
Leverage human activity for data collection and event detection
Leverage human knowledge, reasoning and perception for:
• subjective entity comparisons
• complex predicates
• finding missing data
• disambiguating questions
[Diagram: e.g., CrowdDB Architecture. Components shown: CrowdSQL, Results, Parser, Optimizer, Executor, Statistics, MetaData, Files Access Methods, Disk 1, Disk 2, UI Creation, UI Template Manager, Form Editor, HIT Manager, Turker Relationship Manager]
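The "finding missing data" case can be sketched as follows. This is not CrowdDB's implementation or CrowdSQL syntax: `fill_missing`, `majority_vote`, and the simulated workers are hypothetical, and the workers are ordinary Python functions standing in for tasks posted to a crowdsourcing platform. The sketch assumes machines handle rows that are already complete, while people are asked only about missing values and their answers are reconciled by majority vote.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
AskWorker = Callable[[Row, str], str]   # (row, column) -> a worker's answer


def majority_vote(answers: List[str]) -> str:
    """Reconcile possibly conflicting human answers."""
    return Counter(answers).most_common(1)[0][0]


def fill_missing(rows: List[Row], column: str,
                 workers: List[AskWorker]) -> List[Row]:
    """Return copies of rows with missing `column` values crowd-filled."""
    filled = []
    for row in rows:
        if row.get(column) is None:
            # Only incomplete rows are routed to people.
            answers = [ask(row, column) for ask in workers]
            row = {**row, column: majority_vote(answers)}
        filled.append(row)
    return filled


# Simulated workers; a real system would post tasks and wait for replies.
workers = [
    lambda row, col: "Berkeley",
    lambda row, col: "Berkeley",
    lambda row, col: "Oakland",
]
table = [
    {"lab": "AMPLab", "city": None},
    {"lab": "Example Lab", "city": "Example City"},
]
print(fill_missing(table, "city", workers))
```

Asking several workers and voting trades cost for answer quality, which is one reason an optimizer that is aware of crowd operations is useful in such a system.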
Outreach
• AMPCamp I @ Berkeley, August 2012
• AMPCamp II @ Strata Conf., Feb 2013
• AMPCamp III @ Berkeley, August 2013
• AMPCamp Online: ampcamp.berkeley.edu
What do we get from Expeditions?
Simply put – the ability to “swing for the fences”
For More Information
amplab.cs.berkeley.edu
• Papers and Project Pages
• News updates and Blogs
Twitter: @amplab
GitHub and Apache
http://[email protected]