chicago finance-big-data
DESCRIPTION
Talk about what scalability really means in terms of interacting processes and statistics of growth
TRANSCRIPT
© MapR Technologies - Confidential
Scalability in Hadoop and Similar Systems
Big is the next big thing
Big data and Hadoop are exploding
Companies are being funded
Books are being written
Applications are sprouting up everywhere
Slow Motion Explosion
Hadoop Explosion
Why Now?
But Moore’s law has applied for a long time
Why is Hadoop exploding now?
Why not 10 years ago?
Why not 20?
9/18/12
Size Matters, but …
If it were just the availability of data, existing big companies would have adopted big data technology first
They didn’t
Or Maybe Cost
If it were just net positive value, finance companies would have adopted first, because they have a higher opportunity value per byte
They didn’t
Backwards adoption
Under almost any threshold argument, startups would not have adopted big data technology first
They did
Everywhere at Once?
Something very strange is happening
– Big data is being applied at many different scales
– At many value scales
– By large companies and small
Why?
The Conventional Answer
More data is being produced more quickly
Data sizes are bigger than even a very large computer can hold
Cost to create and store continues to decrease
BUSTED!
Analytics Scaling Laws
Analytics scaling is all about the 80-20 rule
– Big gains for little initial effort
– Rapidly diminishing returns
The key to net value is how costs scale
– Old school: exponential scaling
– Big data: linear scaling, low constant
Cost/performance has changed radically
– IF you can use many commodity boxes
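The scaling argument above can be sketched numerically. This is a toy model, not the curves from the slides: the value curve, both cost curves, and every constant are assumptions chosen only to show how the effort with the best net value moves when cost goes from exponential to linear with a low constant.

```python
import math

def value(effort):
    """Assumed diminishing-returns value curve (80-20 shape), effort in [0, 1]."""
    return 1.0 - math.exp(-5.0 * effort)

def exponential_cost(effort):
    """Assumed 'old school' cost: cheap at first, then explosive."""
    return 0.01 * (math.exp(6.0 * effort) - 1.0)

def linear_cost(effort, slope=0.05):
    """Assumed 'big data' cost: linear in effort with a low constant."""
    return slope * effort

def best_effort(cost, steps=1000):
    """Grid-search effort in [0, 1] for the highest net value (value - cost)."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: value(e) - cost(e))

if __name__ == "__main__":
    for name, cost in [("exponential", exponential_cost), ("linear", linear_cost)]:
        e = best_effort(cost)
        print(f"{name:12s} best effort={e:.2f} net value={value(e) - cost(e):.2f}")
```

Under these assumed curves, the exponential-cost optimum sits well short of maximum effort, while the linear-cost optimum moves out toward it.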
We knew that
We should have known that
We didn’t know that!
You’re kidding, people do that?
Anybody with eyes
Intern with a spreadsheet
In-house analytics
Industry-wide data consortium
NSA, non-proliferation
Net value optimum has a sharp peak well before maximum effort
But scaling laws are changing both slope and shape
More than just a little
They are changing a LOT!
Initially, linear cost scaling actually makes things worse
A tipping point is reached and things change radically …
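One way to picture the tipping point is a sweep over the slope of the linear cost. The sketch below is an assumption-laden toy, not the model behind the slides: it gives the linear system a hypothetical fixed overhead (0.1) so that it starts out worse than the old-school system, then lowers the slope until the preferred system flips and the best effort jumps.

```python
import math

def value(effort):
    """Assumed diminishing-returns value curve, effort in [0, 1]."""
    return 1.0 - math.exp(-5.0 * effort)

def old_cost(effort):
    """Assumed old-school cost: exponential in effort."""
    return 0.01 * (math.exp(6.0 * effort) - 1.0)

def new_cost(effort, slope, overhead=0.1):
    """Assumed scale-out cost: hypothetical fixed overhead plus a linear term."""
    return overhead + slope * effort

def optimum(cost, steps=1000):
    """Return (best effort, best net value) from a grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda e: value(e) - cost(e))
    return best, value(best) - cost(best)

def preferred(slope):
    """Return (system name, best effort) for whichever system nets more."""
    old_e, old_net = optimum(old_cost)
    new_e, new_net = optimum(lambda e: new_cost(e, slope))
    return ("linear", new_e) if new_net > old_net else ("old school", old_e)

if __name__ == "__main__":
    for slope in (1.0, 0.5, 0.2, 0.1, 0.05):
        name, effort = preferred(slope)
        print(f"slope={slope:<5} preferred={name:10s} best effort={effort:.2f}")
```

In this toy, the preferred choice stays "old school" while the slope is high, then switches to "linear" once the slope is low enough, and the best effort changes discontinuously at the switch.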
Prerequisites for Tipping
To reach the tipping point, algorithms must scale out horizontally
– On commodity hardware
– That can and will fail
Data practice must change
– Denormalized is the new black
– Flexible data dictionaries are the rule
– Structured data becomes rare
Yeah… but wait
The Standard Sort of Model
People talk about the law of large numbers as if it were …
Well, as if it were a law
It’s not …
It is a context- and assumption-dependent theorem
What if …
These assumptions are:
Changes have a distribution that is
– stationary
– independent
– finite-variance
What happens if these assumptions are wrong?
And which of them is really wrong?
For Example
End point has nice tractable distribution
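A small simulation makes the tractable case concrete. This is an illustrative sketch: the ±1 coin-flip step, the walk length, and the number of walks are assumptions picked because they satisfy all three conditions (stationary, independent, finite variance). Under them, the end point of an n-step walk should be close to normal with standard deviation sqrt(n).

```python
import math
import random
import statistics

def walk_endpoint(n_steps, rng):
    """Sum of n_steps independent +1/-1 steps (finite variance, stationary)."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))

def endpoints(n_steps, n_walks, seed=0):
    """End points of n_walks independent random walks."""
    rng = random.Random(seed)
    return [walk_endpoint(n_steps, rng) for _ in range(n_walks)]

def fraction_within(ends, radius):
    """Share of end points whose distance from zero is at most radius."""
    return sum(abs(x) <= radius for x in ends) / len(ends)

if __name__ == "__main__":
    n = 400
    ends = endpoints(n, 2000)
    sigma = math.sqrt(n)
    print("observed std:", round(statistics.pstdev(ends), 2), "predicted:", sigma)
    print("share within one sigma:", fraction_within(ends, sigma))
```

The observed spread tracks the sqrt(n) prediction, and roughly two thirds of the walks end within one predicted standard deviation of the start, as a normal limit suggests.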
What if the Assumptions are Wrong?
Take the finite variance assumption as a simple example
Dropping it leads to Lévy stable distributions
Like the Cauchy distribution
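The difference shows up directly in sample means. The sketch below is illustrative only; the sample sizes, trial count, and helper names are assumptions. It measures the interquartile range of the sample mean across many trials. For normal draws that spread shrinks like 1/sqrt(n). For Cauchy draws the mean of n samples is itself standard Cauchy, so the spread does not shrink at all.

```python
import math
import random

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def normal(rng):
    """Standard normal draw (finite variance) for comparison."""
    return rng.gauss(0.0, 1.0)

def mean_spread(draw, n, trials=400, seed=1):
    """Interquartile range of the mean of n draws, estimated over many trials."""
    rng = random.Random(seed)
    means = sorted(sum(draw(rng) for _ in range(n)) / n for _ in range(trials))
    return means[3 * trials // 4] - means[trials // 4]

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n={n:5d} normal spread={mean_spread(normal, n):.3f} "
              f"cauchy spread={mean_spread(cauchy, n):.3f}")
```

More data narrows the normal column and leaves the Cauchy column where it started, which is the sense in which the law of large numbers fails here.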
Is it Really Different?
What About Real Life?
But is it Really Infinite Variance?
Or are there other kinds of phenomena that show this?
What about the independence assumption?
What if the supposedly independent components of the system communicate?
Like we do. Every day. All the time.
Why the Difference?
Law of large numbers
Infinite variance
Interacting agents
Apologies and credit to Simon DeDeo, SFI
The space of all things that change
The space of interacting things
What Happens with Interactions
Social phenomena defeat the law of large numbers
Distributions are well modeled by “rich get richer” processes
– Pitman-Yor process, Indian buffet process
Limiting distributions are heavy tailed, power law
We see these distributions everywhere
– price of cotton in the 19th century
– word frequencies
– popularity of GitHub projects
– equity pricing and volumes
– sizes of cities
– popularity of web-sites
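A "rich get richer" process is short to simulate. The sketch below uses the standard seating rule for the Pitman-Yor process, in its Chinese restaurant form: a new arrival joins an existing table with probability proportional to (table size - discount) and opens a new table with probability proportional to (concentration + discount × number of tables). The parameter values, seed, and function name are assumptions for illustration; the Indian buffet process is not shown.

```python
import random
import statistics

def pitman_yor_tables(n_customers, discount=0.5, concentration=1.0, seed=2):
    """Seat customers one at a time; return the list of table sizes."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_customers):
        # Existing tables attract in proportion to (size - discount).
        weights = [s - discount for s in sizes]
        # Last slot is the weight for opening a new table.
        weights.append(concentration + discount * len(sizes))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes

if __name__ == "__main__":
    sizes = sorted(pitman_yor_tables(5000), reverse=True)
    print("tables:", len(sizes))
    print("largest five:", sizes[:5])
    print("median size:", statistics.median(sizes))
```

Most tables stay tiny while a few grow very large, the heavy-tailed pattern that averaging over independent agents would not produce.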
What are the Implications?
In a Nutshell
Scalability is much more important than we thought
Mashups are more important than we thought
Network effects are more important than we thought
Exploration is more important than we thought
Hadoop-style linear scaling must be mixed with ad hoc analysis
Thank You
whoami?
Ted Dunning
– @ted_dunning
– [email protected] (MapR distribution for Hadoop)
– [email protected] (Mahout, Hadoop, Lucene, Zookeeper, Drill)
– [email protected] (me)
More info:
http://www.mapr.com/company/events/hadoop-in-finance-2012