![Page 1: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/1.jpg)
analyze(NoSQL,BigData);/* history, hype, opportunities */
// By: Vishy Poosala
// Head of Bell Labs, India
// @vishyp
![Page 2: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/2.jpg)
The dark ages of COBOL
![Page 3: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/3.jpg)
...then Codd said: let there be tables
Rows & Columns
Normal Forms
ACID
SQL
![Page 4: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/4.jpg)
www.data-for-humans.com
WHAT COLUMNS?
SET-VALUED ATTRIBUTES
Schema Evolution
XML
![Page 5: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/5.jpg)
Billions of Keys & Values
Cassandra
Dynamo
Hadoop
Big Table
GFS
![Page 6: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/6.jpg)
How would you build a super-fast, FB-scale chat service, in 2012?
(for example)
![Page 7: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/7.jpg)
I want my own DB!
• Main Memory: Memcached, redis
• Distr. K-V: MongoDB
• Versions: CouchDB
• Social Graphs: Neo4j
![Page 8: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/8.jpg)
| | 60's | 80-96 | 96-'07 | '07- |
| --- | --- | --- | --- | --- |
| BIG | KB | GB | TB | PB |
| Data | Files | Tables | Semi-Structured | Variety, Dynamic |
| Analytics | Stats | OLAP Cube | Apps | Mahout |
| Language | COBOL | SQL | XML | NoSQL |
![Page 9: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/9.jpg)
Following *AMAZING* Slides Courtesy: Gregory Piatetsky-Shapiro, kdnuggets.com
You can find all the slides from his talk at:
http://www.slideshare.net/gpiatetskyshapiro/analytics-and-data-mining-industry-overview
Analyzing Analytics,
Job Trends
![Page 10: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/10.jpg)
Data Tsunami
• In 2010 enterprises stored 7 exabytes = 7,000,000,000 GB of new data (McKinsey)
• 90 percent of the world's data has been generated in the past two years (IBM)
Image with apologies to KDD-2011
![Page 11: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/11.jpg)
Pre-history
From Google Ngram viewer – English language books
Note: Our analysis uses only English language data. Other languages, especially Chinese, need to be considered for the full picture.
Statistics is the biggest term in the 20th century, but data mining and analytics appear in the late 1990s.
![Page 12: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/12.jpg)
Recent History: Analytics, Data Mining, Knowledge Discovery
• Analytics has been used since 1800, but started to rise in 2005.
• Data Mining jumps around 1996 (soon after the first KDD conference) but declines after 2003 (TIA controversy, associated with government invasion of privacy).
• Knowledge Discovery appears in 1989, jumps in 1996, and plateaus after 2000.
![Page 13: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/13.jpg)
Google Trends: After 2006, Data Mining < Analytics
![Page 14: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/14.jpg)
Google Insights: searches for data mining, analytics -google are most popular in India, US
![Page 15: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/15.jpg)
Analytics > Data Mining > Data Science
![Page 16: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/16.jpg)
Data Science, Big Data
![Page 17: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/17.jpg)
Data Types Analyzed/Mined
www.KDnuggets.com/polls/2011/data-types-analyzed-mined.html
![Page 18: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/18.jpg)
Largest Dataset Analyzed?
2011 median dataset size ~10-20 GB, vs 8-10 GB in 2010.
Increase in the 10 GB to 1 PB range
www.KDnuggets.com/polls/2011/largest-dataset-analyzed-data-mined.html
![Page 19: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/19.jpg)
Which methods/algorithms did you use for data analysis in 2011?
Decision Trees
Regression
Clustering
Statistics
Visualization
Time series/Sequence analysis
Support Vector (SVM)
Association rules
Ensemble methods
Text Mining
Neural Nets
Boosting
Bayesian
Bagging
Factor Analysis
Anomaly/Deviation detection
Social Network Analysis
Survival Analysis
Genetic algorithms
Uplift modeling
(Bar chart: % analysts who used it, axis from 0% to 70%)
www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
![Page 20: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/20.jpg)
Cloud Analytics is not common (yet)
www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
![Page 21: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/21.jpg)
Shortage of Skills
• McKinsey: shortage by 2018 in the US of
– 140,000-190,000 people with deep analytical skills
– 1.5 M managers/analysts with the know-how to use the analysis of big data to make effective decisions.
Source: www.mckinsey.com/mgi/publications/big_data/
![Page 22: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/22.jpg)
Job data: Data Scientist
![Page 23: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/23.jpg)
Jobs: Data Mining >> Data Scientist
![Page 24: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/24.jpg)
“Ground” Analytics (LinkedIn Skills)
~ 75,000 with Data Mining skill
~ 7,000 with Predictive Modeling
Also ~ 20,000 with Predictive Analytics (not related to Predictive Modeling??)
![Page 25: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/25.jpg)
Analytics LinkedIn Skills
Machine Learning
Predictive Analytics
Text Mining
MapReduce
![Page 26: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/26.jpg)
Big Data Bubble?
Gartner Hype Cycle
Big Data
![Page 27: NoSQL & Big Data Analytics: History, Hype, Opportunities](https://reader031.vdocuments.site/reader031/viewer/2022013100/5483488d5906b5bc158b466f/html5/thumbnails/27.jpg)
@vishyp
http://innovation-edge.blogspot.com