Adatao Live Demo at the First Spark Summit
Adatao's live product demo at the first Spark Summit, December 2, 2013, Nikko Hotel, San Francisco.
Adatao Live Demo at the First Spark Summit, Dec 2, 2013, San Francisco (video at the end of this deck)
Christopher Nguyen, PhD, Co-Founder & CEO
DATA INTELLIGENCE FOR ALL
Hadoop distributed/streaming analytics, Yahoo Hadoop Eng, UIUC PhD
Machine learning & machine vision, US Army Research Lab, Johns Hopkins PhD
Big-Data Compute Engines, Google Apps Engineering Director, Google Founders’ Award, HKUST Prof, 2 successful enterprise exits, Stanford PhD
Deep engineering & business experience from Google, Yahoo et al. PhDs in data mining & machine learning from UIUC, Georgia Tech, Stanford, Berkeley, ...
Powerful In-Memory Data Mining & Machine Learning
Big Analytics Platform
BIG COMPUTE
BIG DATA (Hadoop HDFS, Cassandra, SQL DBMS, Streaming Data)
Business Users, Data Scientists, Data Engineers
Visually Beautiful, Interactive Data Exploration
Narrative Web App
BIG INSIGHTS
ONE Integrated Platform for Business & Data Science & Engineering
Architecture Design
One Integrated Platform for Business & Data Science & Engineering: Business Users, Data Scientists, Data Engineers
VS
OTHERS: a stack for business users, a stack for data science, a stack for data engineering
for Data Scientists & Engineers
Powerful In-Memory Data Mining & Machine Learning: Model Terabytes in Seconds
Interactive, Cluster-Scale Data Munging & Modeling with Native R, RStudio, Python, SQL, and Java Front Ends
Real-Time Scoring Directly From Trained Models
Share reproducible, live data analysis documents
Hadoop, Cassandra, RDBMS, Streaming Data
Big Data Mining & Machine Learning
for Business Users
A Beautiful New Way to Create & Share Visual Narratives of Your Analysis
Perform Ad Hoc Queries in Plain English
Publish Streaming, Interactive Dashboards
Collaborate With Others in Real Time
Query Terabytes in Seconds
Predictive Decision Making
Demo Deployment Diagram
(Figure: one CLIENT, one MASTER, and four WORKER nodes)
Demo Config
Cluster: 8 nodes x 8 cores x 30 GB RAM x 1 TB disk
Data Sets: 12–100 GB, 100M–1B rows
Airline Arrival Data, 1988–2008, from the U.S. DoT
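This calculation assumes the cluster line lists per-node specifications, which the slide does not state explicitly. Under that reading, the code works out the aggregate capacity and the share of memory each quoted data-set size would occupy. The helper `cluster_totals` is hypothetical.

```python
def cluster_totals(nodes, cores_per_node, ram_gb_per_node, disk_tb_per_node):
    """Aggregate resources of a homogeneous cluster (hypothetical helper)."""
    return {
        "cores": nodes * cores_per_node,
        "ram_gb": nodes * ram_gb_per_node,
        "disk_tb": nodes * disk_tb_per_node,
    }


NODES = 8  # ASSUMPTION: the other figures on the slide are per node
totals = cluster_totals(NODES, cores_per_node=8, ram_gb_per_node=30, disk_tb_per_node=1)
print(totals)  # {'cores': 64, 'ram_gb': 240, 'disk_tb': 8}

# Raw data-set sizes quoted on the slide: 12 GB to 100 GB.
# Deserialized in-memory objects are typically larger than the raw bytes,
# so these shares are a lower bound on actual memory use.
for size_gb in (12, 100):
    share = size_gb / totals["ram_gb"]
    print(f"{size_gb} GB -> {share:.0%} of aggregate RAM, {size_gb / NODES:.1f} GB per node")
```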
Algorithms
- LM & supporting statistics (AIC, log-likelihood, R², cross-validation)
- Binning
- Classification metrics: confusion matrix, ROC, AUC, F1
- Logistic Regression with Ref Level for Categorical Vars
- k-Means
- Random Forest
- Naive Bayes
- Linear SVM
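The classification metrics listed above have standard definitions. The sketch below computes them in pure Python for binary labels and is not Adatao's distributed implementation. AUC is computed with the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. This is O(P·N), so it only suits small illustrative inputs. The example labels and scores are made up.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise positive-vs-negative comparisons."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5

print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 2)
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))  # 0.778
```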
Algorithm Roadmap
- Hierarchical Clustering
- Text Mining (token, POS, LDA, …)
- SVD
- Markov Chain Models
- Ensemble Models
- …
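This is a generic illustration of one roadmap item, not Adatao's planned implementation. A first-order Markov chain can be fitted by counting observed state transitions and normalizing each row. The result is the maximum-likelihood estimate of P(next state | current state). The function name and the state labels `on_time` and `late` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Maximum-likelihood first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current, next) pair in the sequence.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, row in counts.items():
        total = sum(row.values())
        probs[cur] = {nxt: c / total for nxt, c in row.items()}
    return probs


sequences = [
    ["on_time", "late", "late", "on_time"],
    ["on_time", "on_time", "late"],
]
chain = fit_markov_chain(sequences)
print(chain["late"])  # {'late': 0.5, 'on_time': 0.5}
```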
Thank you!
See the demo video at:
http://youtu.be/5UAdk7oHoPE?t=7m