“Big Data” and Data-Intensive Science (eScience)
![Page 1: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/1.jpg)
“Big Data” and Data-Intensive Science (eScience)
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
University of Washington
July 2013
![Page 2: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/2.jpg)
Exponential improvements in technology and algorithms are enabling the “big data” revolution
- A proliferation of sensors
  - Think about the sensors on your phone
- More generally, the creation of almost all information in digital form
  - It doesn’t need to be transcribed in order to be processed
- Dramatic cost reductions in storage
  - You can afford to keep all the data
- Dramatic increases in network bandwidth
  - You can move the data to where it’s needed
![Page 3: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/3.jpg)
- Dramatic cost reductions and scalability improvements in computation
  - With Amazon Web Services, or Google App Engine, or Microsoft Azure, 1000 computers for 1 day cost the same as 1 computer for 1000 days!
- Dramatic algorithmic breakthroughs
  - Machine learning, data mining – fundamental advances in computer science and statistics
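
The "1000 computers for 1 day" equivalence holds whenever billing is linear in machine-hours. A minimal sketch of that arithmetic, assuming a flat hourly rate (the `HOURLY_RATE_USD` value and the `rental_cost` helper are hypothetical, chosen for illustration, not a quoted price from any provider):

```python
import math

# Hypothetical flat price per machine-hour; real cloud prices vary.
HOURLY_RATE_USD = 0.10


def rental_cost(machines: int, days: int, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of renting `machines` computers for `days` days under linear per-hour billing."""
    machine_hours = machines * days * 24
    return machine_hours * hourly_rate


wide = rental_cost(machines=1000, days=1)    # 1000 computers for 1 day
narrow = rental_cost(machines=1, days=1000)  # 1 computer for 1000 days

# Both runs consume the same number of machine-hours, so they cost the same.
print(f"1000 machines x 1 day:  ${wide:,.2f}")
print(f"1 machine x 1000 days:  ${narrow:,.2f}")
print("Same cost:", math.isclose(wide, narrow))
```

The cost is identical, but the wide configuration delivers the answer in a day instead of nearly three years, provided the work can be split across machines.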
![Page 4: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/4.jpg)
Some examples of “big data” in action
Collaborative filtering
![Page 5: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/5.jpg)
Fraud detection
![Page 6: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/6.jpg)
Price prediction
![Page 7: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/7.jpg)
Hospital re-admission prediction
![Page 8: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/8.jpg)
Travel time prediction under specific circumstances
![Page 9: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/9.jpg)
Sports
![Page 10: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/10.jpg)
Home energy monitoring
![Page 11: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/11.jpg)
Larry Smarr, UCSD
Gordon Bell, Microsoft Research
John Guttag & Collin Stultz, MIT
Google self-driving car
![Page 12: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/12.jpg)
Speech recognition
![Page 13: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/13.jpg)
Machine translation
- Speech -> text
- Text -> text translation
- Text -> speech in speaker’s voice
http://www.youtube.com/watch?v=Nu-nlQqFCKg&t=7m30s (7:30 – 8:40)
![Page 14: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/14.jpg)
Scientific discovery
Ocean Observatories Initiative
Gene Sequencing
Large Hadron Collider
Large Synoptic Survey Telescope
![Page 15: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/15.jpg)
Presidential campaigning
![Page 16: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/16.jpg)
Electoral forecasting
![Page 17: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/17.jpg)
Real data-driven decision-making (vs. MBA baloney) for every sector!
![Page 18: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/18.jpg)
eScience: Sensor-driven (data-driven) science and engineering
Transforming science (again!)
Jim Gray
![Page 19: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/19.jpg)
Theory
Experiment
Observation
![Page 20: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/20.jpg)
Theory
Experiment
Observation
![Page 21: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/21.jpg)
Theory
Experiment
Observation
[John Delaney, University of Washington]
![Page 22: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/22.jpg)
Theory
Experiment
Observation
Computational Science
![Page 23: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/23.jpg)
Theory
Experiment
Observation
Computational Science
eScience
![Page 24: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/24.jpg)
eScience is driven by data more than by cycles
Massive volumes of data from sensors and networks of sensors
Apache Point telescope, SDSS
80 TB of raw image data (80,000,000,000,000 bytes) over a 7-year period
![Page 25: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/25.jpg)
Large Synoptic Survey Telescope (LSST)
40 TB/day (an SDSS every two days), 100+ PB in its 10-year lifetime
400 Mbps sustained data rate between Chile and NCSA
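
The SDSS and LSST volume figures can be cross-checked with a few lines of arithmetic. This is a rough sketch using decimal units (1 TB = 10^12 bytes, matching the byte count given for SDSS) and assuming data is collected 365 days a year, which makes the lifetime total an upper bound:

```python
TB = 10**12  # decimal terabyte, matching "80,000,000,000,000 bytes"
PB = 10**15  # decimal petabyte

SDSS_TOTAL_BYTES = 80 * TB     # SDSS raw image data over 7 years (from the slides)
LSST_BYTES_PER_DAY = 40 * TB   # LSST daily volume (from the slides)
LSST_LIFETIME_YEARS = 10

# How many days LSST needs to produce one SDSS worth of data.
days_per_sdss = SDSS_TOTAL_BYTES / LSST_BYTES_PER_DAY

# Lifetime total, assuming (optimistically) 365 days of data per year.
lifetime_pb = LSST_BYTES_PER_DAY * 365 * LSST_LIFETIME_YEARS / PB

print(f"One SDSS every {days_per_sdss:.0f} days")
print(f"Up to {lifetime_pb:.0f} PB over {LSST_LIFETIME_YEARS} years")
```

Both results agree with the slide: an SDSS-sized data set every two days, and a lifetime total above 100 PB.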
![Page 26: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/26.jpg)
Large Hadron Collider
700 MB of data per second, 60 TB/day, 20 PB/year
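
The Large Hadron Collider per-day and per-year figures follow from the per-second rate. A small sketch of the conversion, using decimal units and assuming the rate is sustained around the clock (an assumption made here for illustration; actual running time is lower, which is one reason the yearly figure on the slide is rounded down):

```python
MB, TB, PB = 10**6, 10**12, 10**15
SECONDS_PER_DAY = 24 * 60 * 60

lhc_bytes_per_sec = 700 * MB  # rate quoted on the slide

# Scale the per-second rate up to a day and to a 365-day year.
per_day_tb = lhc_bytes_per_sec * SECONDS_PER_DAY / TB
per_year_pb = lhc_bytes_per_sec * SECONDS_PER_DAY * 365 / PB

print(f"{per_day_tb:.1f} TB/day")
print(f"{per_year_pb:.1f} PB/year if sustained all year")
```

The daily figure lands at about 60 TB, and the yearly figure at a little over 20 PB, in line with the rounded numbers above.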
![Page 27: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/27.jpg)
Illumina HiSeq 2000 Sequencer
~1 TB/day
Major labs have 25-100 of these machines
![Page 28: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/28.jpg)
Regional Scale Nodes of the NSF Ocean Observatories Initiative
1000 km of fiber optic cable on the seafloor, connecting thousands of chemical, physical, and biological sensors
![Page 29: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/29.jpg)
The Web
20+ billion web pages x 20 KB = 400+ TB
One computer can read 30-35 MB/sec from disk => 4 months just to read the web
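
The "4 months just to read the web" estimate can be reproduced directly from the page count, page size, and disk read rate. The sketch below uses decimal units; the `machines` parameter is an addition for illustration and assumes the data is split evenly with no coordination overhead, which real clusters only approximate:

```python
KB, MB, TB = 10**3, 10**6, 10**12
SECONDS_PER_DAY = 24 * 60 * 60

pages = 20 * 10**9            # 20 billion web pages
avg_page_bytes = 20 * KB      # 20 KB per page
web_bytes = pages * avg_page_bytes


def days_to_read(total_bytes: int, bytes_per_sec: int, machines: int = 1) -> float:
    """Days needed to scan `total_bytes` from disk, split evenly across `machines`."""
    return total_bytes / (bytes_per_sec * machines) / SECONDS_PER_DAY


slow_days = days_to_read(web_bytes, 30 * MB)  # at 30 MB/sec
fast_days = days_to_read(web_bytes, 35 * MB)  # at 35 MB/sec
cluster_hours = days_to_read(web_bytes, 35 * MB, machines=1000) * 24

print(f"Web size: {web_bytes / TB:.0f} TB")
print(f"One computer: {fast_days:.0f} to {slow_days:.0f} days")
print(f"1000 computers (idealized): {cluster_hours:.1f} hours")
```

A single machine needs roughly 130 to 155 days, which is the four-to-five-month range the slide rounds to "4 months"; spreading the same scan across many machines is what brings it down to hours.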
![Page 30: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/30.jpg)
eScience is about the analysis of data
- The automated or semi-automated extraction of knowledge from massive volumes of data
  - There’s simply too much of it to look at
- It’s not just a matter of volume
  - Volume
  - Rate
  - Complexity / dimensionality
![Page 31: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/31.jpg)
eScience utilizes a spectrum of computer science techniques and technologies
Sensors and sensor networks
Backbone networks
Databases
Data mining
Machine learning
Data visualization
Cluster computing
at enormous scale
![Page 32: “Big Data” and Data -Intensive Science (eScience)](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815f12550346895dcdd82c/html5/thumbnails/32.jpg)
eScience will be pervasive
- Simulation-oriented computational science has been transformational, but it has been a niche
  - As an institution (e.g., a university), you didn’t need to excel in order to be competitive
- eScience capabilities must be broadly available in any institution
  - If not, the institution will simply cease to be competitive