visual, interactive, predictive analytics for big data

31
Data Intelligence for All Christopher Nguyen, PhD Co-Founder & CEO Presented on December 2, 2013

Upload: adatao-inc

Post on 20-Aug-2015

518 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Visual, Interactive, Predictive Analytics for Big Data

Data Intelligence for All Christopher Nguyen, PhD Co-Founder & CEO

Presented on December 2, 2013

Page 2: Visual, Interactive, Predictive Analytics for Big Data

What do you get when you cross …

Atlantic Titanic

Page 3: Visual, Interactive, Predictive Analytics for Big Data

About Halfway!!

Page 4: Visual, Interactive, Predictive Analytics for Big Data

What do you get when you cross …

Page 5: Visual, Interactive, Predictive Analytics for Big Data

What do users need?

Page 6: Visual, Interactive, Predictive Analytics for Big Data

Let’s take a look!

Page 7: Visual, Interactive, Predictive Analytics for Big Data

… and that’s what we’re working on at Adatao

Page 8: Visual, Interactive, Predictive Analytics for Big Data

In the beginning, there was darkness...

1 9 8 0

Page 9: Visual, Interactive, Predictive Analytics for Big Data

$ $$$

Then came Business Intelligence

1 9 9 0

You could see what happened with your business

Page 10: Visual, Interactive, Predictive Analytics for Big Data

BIG DATA PROBLEMS

Huge Volume

High Velocity

Great Variety

2 0 0 5

Page 11: Visual, Interactive, Predictive Analytics for Big Data

HADOOPHIV E

MAPREDUCE

2 0 1 0

helped solve big data problems

Page 12: Visual, Interactive, Predictive Analytics for Big Data

BIG DATA + BIG COMPUTE = OPPORTUNITIES

Machine Learning

Predictive Analytics

Natural Language

automatic customer segmentation

2 0 1 3

BIG DATA= PROBLEMS

Page 13: Visual, Interactive, Predictive Analytics for Big Data

Powerful In-Memory Data Mining

Machine Learning Big Analytics Platform BIG

COMPUTE

(Hadoop HDFS, Cassandra, SQL DMBS, Streaming Data)BIG

DATA

Business Users Data Scientists Data Engineers

Visually Beautiful

Interactive DataExploration

Narrative Web App

BIG INSIGHTS

01100011

0110001

01100011

10001100

01100011

0110001

01100011

10001100

Page 14: Visual, Interactive, Predictive Analytics for Big Data

Hadoop distributed/streaming analytics, Yahoo Hadoop Eng, UIUC PhD

Machine learning & machine vision, US Army Research Lab, Johns Hopkins PhD

Big-Data Compute Engines, Google Apps Engineering Director, Google Founders’ Award, HKUST Prof, 2 successful enterprise exits, Stanford PhD

Deep engineering & business experience from Google, Yahoo et al. PhD’s in DM & ML from UIUC, Georgia Tech, Stanford, Berkeley, ...

Page 15: Visual, Interactive, Predictive Analytics for Big Data

Adatao pInsight demo

Page 16: Visual, Interactive, Predictive Analytics for Big Data

CLIENT WORKER WORKER WORKERWORKERMASTER

Demo Deployment Diagram

Page 17: Visual, Interactive, Predictive Analytics for Big Data

01100011

0110001

01100011

10001100

01100011

0110001

01100011

10001100

Adatao pAnalytics demo

Page 18: Visual, Interactive, Predictive Analytics for Big Data

Bonus Demo

Page 19: Visual, Interactive, Predictive Analytics for Big Data

Use Cases

Page 20: Visual, Interactive, Predictive Analytics for Big Data

Internet Service Provider

Interactive, Ad Hoc Business Query

Insight Discovery on Aggregated Operational Data

Finance

Engineering

Marketing

Sales

Page 21: Visual, Interactive, Predictive Analytics for Big Data

Customer Service Provider

Product Recommendation

Cross-channel User Experience Optimization

Page 22: Visual, Interactive, Predictive Analytics for Big Data

Sensor Network Analyticsfor Predictive Maintenance

Heavy Equipment Manufacturer

Page 23: Visual, Interactive, Predictive Analytics for Big Data

Mobile Ad Platform

Ad Targeting

CTR Prediction

Page 24: Visual, Interactive, Predictive Analytics for Big Data

Scaling Performance

Page 25: Visual, Interactive, Predictive Analytics for Big Data

Algorithm Run time (sec) for 50GB (800M rows)

Per-Core Throughput (MB/sec-core)

Per-Machine Throughput (MB/s)

bigr.lm (ridge) 3.04 130 1,040

bigr.lm 4.05 102 816

bigr.lm.gd 12.2 32 256

bigr.glm.gd 24.5 16 128

bigr.glm 36.1 11 88

bigr.kmeans 335 1.2 9.6

pAnalytics performance on building machine learning models with cluster Adatao16 (m3.2xlarge) on a 50GB data set of 5 features and 800 million rows. (Gradient descent algorithms are over 5 iterations)

Page 26: Visual, Interactive, Predictive Analytics for Big Data

Algorithm Run time (sec) for 1.1 TB dataset (1.6B rows)

Per-Core Throughput (MB/sec-core)

Per-Machine Throughput (MB/s)

bigr.lm (ridge) 70.9 130 1,040

bigr.lm 74.9 123 984

bigr.lm.gd 127 72.8 582

bigr.glm.gd 145 63.6 509

pAnalytics performance on building machine learning models with cluster Adatao40 (m3.2xlarge) on a 1.1 TB data set of 40 features and 1.6 billion rows. (Gradient descent algorithms are over 5 iterations)

Page 27: Visual, Interactive, Predictive Analytics for Big Data
Page 28: Visual, Interactive, Predictive Analytics for Big Data

Business Users

Fast & Easy Business Analytics !Natural Language !Beautiful Web UI

Big & Fast Data Science !R, Python, REST API !Data Mining & ML

Data Intelligence for All

Data Scientists & Engineers01100011

0110001

01100011

10001100

01100011

0110001

01100011

10001100

Page 29: Visual, Interactive, Predictive Analytics for Big Data

Thanks for contributing to

the Spark Community!

Page 30: Visual, Interactive, Predictive Analytics for Big Data
Page 31: Visual, Interactive, Predictive Analytics for Big Data

Thank you!

See demo video at !

http://youtu.be/5UAdk7oHoPE?t=7m