Petabytes for Peanuts! Making Sense of “Ambient Data”
Dave Campbell & Friends
Microsoft Corporation
SVC04
Key Takeaways…
> Massive shift in how we process data
> Incredible data volumes
> Remaking how we discover
> Changing the Scientific Method
> Reducing latency & impedance
> Extreme Scale Data Processing
> Stream Processing (Several Views)
> From “programs” to “queries”
> What’s up with this “anti-SQL” stuff anyhow?
“Free” Storage Power
1982: Storage Cost: ~$2000, Transfer Time: 1 day
1997: Storage Cost: ~$1.00, Transfer Time: ½ hour
2009: Storage Cost: ~0.1¢, Transfer Time: 8 sec.
Ambient Data?
Over 84 percent of Americans have cell phones, according to Steve Largent, president and CEO of CTIA. Two trillion minutes were used in 2007, an 18 percent increase over 2006 talk times.
More than 48 billion text messages were sent in the month of December 2007, an average of 1.6 billion messages per day. The rate of text messaging represented a 157 percent increase over December 2006 texting. http://www.clickz.com/3628985
Text message traffic in US: 160 GB / day, 58 TB / year
Voice traffic in US (GSM encoding): 200 PB / year
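The traffic estimates above can be roughly reproduced with back-of-envelope arithmetic. This is a sketch under assumptions the slide does not state: about 100 bytes per text message (the size implied by 160 GB/day over 1.6 billion messages), the 13 kbit/s GSM full-rate codec for voice, and decimal units (1 GB = 10^9 bytes).

```python
# Figures quoted on the slide.
TEXTS_PER_DAY = 1.6e9           # average messages per day, December 2007
VOICE_MINUTES_PER_YEAR = 2e12   # minutes used in 2007

# Assumptions, not stated in the source.
BYTES_PER_TEXT = 100            # implied by 160 GB/day over 1.6 billion messages
GSM_BITS_PER_SECOND = 13_000    # GSM full-rate voice codec
GB, TB, PB = 1e9, 1e12, 1e15    # decimal units

text_bytes_per_day = TEXTS_PER_DAY * BYTES_PER_TEXT
text_bytes_per_year = text_bytes_per_day * 365
voice_bytes_per_year = VOICE_MINUTES_PER_YEAR * 60 * GSM_BITS_PER_SECOND / 8

print(f"text:  {text_bytes_per_day / GB:.0f} GB/day, "
      f"{text_bytes_per_year / TB:.1f} TB/year")
print(f"voice: {voice_bytes_per_year / PB:.0f} PB/year")
```

Under these assumptions the text figures match the slide, and voice comes out just under the rounded 200 PB / year.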
The Old World
> Data volumes constrained by human typing speed
> App & Data formed closed system
[Diagram: App, DB]
Assume 200M people in US typing 8 hr / day @ 10K keystrokes / hour:
2 TB / hr or ~6 PB / year
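The typing estimate can be checked the same way. A minimal sketch, assuming one byte per keystroke and decimal units (neither is stated on the slide); the 200 PB / year voice figure is taken from the earlier slide for comparison.

```python
# Figures quoted on the slide.
PEOPLE = 200_000_000
KEYSTROKES_PER_HOUR = 10_000
HOURS_PER_DAY = 8
VOICE_PB_PER_YEAR = 200        # from the "Ambient Data?" slide

# Assumptions, not stated in the source.
BYTES_PER_KEYSTROKE = 1
TB, PB = 10**12, 10**15        # decimal units

typed_bytes_per_hour = PEOPLE * KEYSTROKES_PER_HOUR * BYTES_PER_KEYSTROKE
typed_bytes_per_year = typed_bytes_per_hour * HOURS_PER_DAY * 365

typed_tb_per_hour = typed_bytes_per_hour / TB
typed_pb_per_year = typed_bytes_per_year / PB
voice_to_typing_ratio = VOICE_PB_PER_YEAR / typed_pb_per_year

print(f"typing: {typed_tb_per_hour:.0f} TB/hr, {typed_pb_per_year:.2f} PB/year")
print(f"voice traffic is roughly {voice_to_typing_ratio:.0f}x typed data")
```

Even with everyone typing all day, human-entered data stays more than an order of magnitude below the voice traffic alone.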
The Old New World
Available Data
Questions to Answer
Design Schema
Design ETL
DW Nirvana!
Available data exploded
What data should we throw out?
What if we have a new question?
The New World of Abundant Data
Save All Available Data
New Question to Answer
Algorithmic Processing
Interesting Read: The Petabyte Age: Because More Isn't Just More — More Is Different
http://www.wired.com/science/discoveries/magazine/16-07/pb_intro
Hypothesize Theorize Test
Correlation is Enough!
Run “query” over data…
Analyze reduced data
Exploit Correlation…
The CMS front end of the Large Hadron Collider records 1TB/sec!
http://blogs.discovermagazine.com/cosmicvariance/2006/09/27/lhc-factoids/
Analyze Model Monitor
1. Event Stream both stored and processed
2. Analysis produces event correlation models
3. Models installed in event processing engine
4. Produce real time alerts and action
[Diagram labels: Event Stream, Analysis, Correlation Model, Event Processing Engine, Alerts & Action]
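The four steps above can be sketched as a small loop. This is an illustration only, not the StreamInsight API: `analyze`, `EventEngine`, and the sample "pump" readings are hypothetical, and a simple mean-plus-three-sigma threshold stands in for the correlation model.

```python
from statistics import mean, stdev

def analyze(stored_events):
    """Step 2: derive a model from stored history (per-key alert thresholds)."""
    by_key = {}
    for key, value in stored_events:
        by_key.setdefault(key, []).append(value)
    # Threshold = mean + 3 standard deviations; needs at least 2 samples per key.
    return {k: mean(v) + 3 * stdev(v) for k, v in by_key.items() if len(v) >= 2}

class EventEngine:
    """Hypothetical event processing engine holding the installed model."""
    def __init__(self):
        self.thresholds = {}

    def install(self, model):
        """Step 3: install the model produced by analysis."""
        self.thresholds = dict(model)

    def process(self, event):
        """Step 4: return an alert string if the event exceeds its threshold."""
        key, value = event
        limit = self.thresholds.get(key)
        if limit is not None and value > limit:
            return f"ALERT {key}: {value} > {limit:.1f}"
        return None

store = [("pump", 10), ("pump", 11), ("pump", 9), ("pump", 10)]  # made-up history
engine = EventEngine()
engine.install(analyze(store))

alerts = []
for event in [("pump", 10), ("pump", 25)]:   # made-up live stream
    store.append(event)                      # Step 1: stored...
    alert = engine.process(event)            # ...and processed
    if alert:
        alerts.append(alert)

print(alerts)
```

Because every live event is also appended to the store, rerunning `analyze` later refreshes the model with newer data.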
StreamInsight
Roman Schindlauer
Program Manager
SQL Data Stream Engine
demo
Extreme Scale Data Processing
1. Majority of data filtered or discarded
2. All data retained and reprocessed
[Diagram labels: Sources, Non-traditional Sources, ETL, DW, Analysis, Analysis / Reporting; regions labeled Traditional Data Warehouse and Extreme Scale Data Processing]
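The contrast between the two approaches can be sketched in a few lines. The event records, field names, and the `reprocess` helper below are hypothetical, chosen only to show why retained raw data can answer a question the original schema did not anticipate.

```python
from collections import Counter

# Made-up raw events for illustration.
raw_events = [
    {"user": "a", "page": "/home", "ms": 120, "agent": "mobile"},
    {"user": "b", "page": "/home", "ms": 340, "agent": "desktop"},
    {"user": "a", "page": "/buy",  "ms": 95,  "agent": "mobile"},
]

# Approach 1: ETL keeps only what the schema was designed for (page views);
# the other fields are discarded and cannot be queried later.
warehouse = Counter(e["page"] for e in raw_events)

# Approach 2: raw events are retained, so a new question is just a new
# pass over the same data.
def reprocess(events, key, value):
    """Average `value` grouped by `key` over the retained raw events."""
    totals, counts = Counter(), Counter()
    for e in events:
        totals[e[key]] += e[value]
        counts[e[key]] += 1
    return {k: totals[k] / counts[k] for k in totals}

latency_by_agent = reprocess(raw_events, "agent", "ms")

print(dict(warehouse))
print(latency_by_agent)
```

The warehouse answers "how many views per page?" but has already lost the latency and agent fields; the retained events still answer "what is the average latency per agent?".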
LINQ to “whatever”…
Erik Meijer
Architect (& more…)
BPD Cloud Programmability Team
demo
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.