Real-Time Big Data Analytics

Real-Time Big Data Analytics: From Deployment to Production. David Smith, Revolution Analytics, @revodavid


Post on 23-Feb-2016


DESCRIPTION

David Smith, Revolution Analytics (@revodavid). Real-Time Big Data Analytics: From Deployment to Production. WHAT’S UP WITH THAT? Buzzword Bingo! REAL TIME. BIG DATA. PREDICTIVE ANALYTICS. Factors. Predictive Analytics Model. User ID, Browser, Time/Date/Location, Previous purchases. PowerPoint PPT Presentation.

TRANSCRIPT

Page 1: Real-Time Big Data Analytics


Real-Time Big Data Analytics: From Deployment to Production

David Smith, Revolution Analytics

@revodavid

Page 2

WHAT’S UP WITH THAT?

Page 3

REAL TIME

BIG DATA

PREDICTIVE ANALYTICS

Buzzword Bingo!

Page 4

Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0

Page 5

Predictive Analytics Model

"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0

Factors → Predictive Model → Scores → Scoring Rules → Prediction or Selection

Factors (any known information): User ID, Browser, Time/Date/Location, Previous purchases, Friend data

Predictive Model: Decision Tree, Logistic Regression, Neural Network, K-means clustering, Ensemble Model

Prediction or Selection: Product of most interest, Offer of most likely sale, Most relevant link, Forecast sale value, Optimal Bid
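To make the Factors → Scores → Scoring Rules flow concrete, here is a minimal Python sketch. It assumes a logistic regression (one of the model types listed on the slide) with made-up coefficients for three hypothetical offers; the offer names, feature names, weights, and the minimum-score rule are illustrative assumptions, not taken from the talk. The scoring rule picks the "offer of most likely sale", or nothing if no score clears the threshold.

```python
import math

# Hypothetical logistic-regression coefficients per offer (illustrative values only).
MODELS = {
    "blender_coupon": {"intercept": -2.0, "previous_purchases": 0.4, "mobile_browser": -0.5},
    "free_shipping": {"intercept": -1.0, "previous_purchases": 0.1, "mobile_browser": 0.3},
    "loyalty_signup": {"intercept": -3.0, "previous_purchases": 0.8, "mobile_browser": 0.0},
}

def score(coefs, factors):
    """Logistic regression score: estimated probability that the user accepts this offer."""
    z = coefs["intercept"] + sum(
        coefs[name] * value for name, value in factors.items() if name in coefs
    )
    return 1.0 / (1.0 + math.exp(-z))

def select_offer(factors, min_score=0.2):
    """Scoring rule: choose the highest-scoring offer, or None if nothing clears min_score."""
    scores = {offer: score(coefs, factors) for offer, coefs in MODELS.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_score else None), scores

offer, scores = select_offer({"previous_purchases": 3, "mobile_browser": 1})
print(offer, {k: round(v, 3) for k, v in scores.items()})
```

Keeping the model (scores) separate from the rule (selection) means the business threshold can change without re-estimating the model.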

Page 6

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0

Real-time Deployment

1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh

Page 7

1. Data Distillation in Hadoop

Unstructured Data (Log Files, Sensor Streams, Language Text) → HDFS Load → Map-Reduce (rmr) → Structured Data → Analytics Data Mart
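The slide does the distillation with rmr on HDFS. As a language-neutral illustration of the same map → shuffle → reduce pattern, here is a small in-memory Python sketch; it is not rmr or Hadoop code, and the tab-separated log format (user_id, event, amount) is a hypothetical assumption. Each unstructured log line is mapped to a key/value pair, grouped by user, and reduced to one structured row per user, the kind of record that would land in the analytics data mart.

```python
from collections import defaultdict

# Hypothetical raw web-log lines: user_id <TAB> event <TAB> amount (illustrative format).
RAW_LOGS = [
    "u1\tview\t0",
    "u2\tpurchase\t49.99",
    "u1\tpurchase\t19.50",
    "bad line without tabs",
    "u2\tview\t0",
    "u1\tview\t0",
]

def mapper(line):
    """Map phase: parse one unstructured line into (key, value); skip malformed lines."""
    parts = line.split("\t")
    if len(parts) != 3:
        return
    user_id, event, amount = parts
    yield user_id, (event, float(amount))

def reducer(user_id, values):
    """Reduce phase: collapse all events for one user into a single structured record."""
    views = sum(1 for event, _ in values if event == "view")
    purchases = [amount for event, amount in values if event == "purchase"]
    return {"user_id": user_id, "views": views,
            "purchases": len(purchases), "revenue": sum(purchases)}

def run_job(lines):
    # Shuffle step: group mapper output by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

for row in run_job(RAW_LOGS):
    print(row)
```

In a real cluster the mapper and reducer run in parallel across HDFS blocks; the sketch only shows the data flow.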

Page 8

2. The Model Development Cycle

Structured Data → Predictive Model, via a repeating cycle of: Feature Selection, Sampling, Aggregation, Variable Transformation, Model Estimation, Model Refinement, Model Comparison / Benchmarking

R White Paper: bit.ly/r-is-hot
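The sampling, estimation, and comparison/benchmarking steps can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the workflow from the talk: the data is synthetic, the candidate model is a one-feature logistic regression fitted by batch gradient descent, the benchmark is a constant "always predict the base rate" model, and the two are compared by log loss on a held-out sample.

```python
import math
import random

def make_data(n, seed=0):
    """Synthetic data (hypothetical): outcome is more likely when feature x is large."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        p = 1 / (1 + math.exp(-2 * x))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows

def fit_logistic(rows, lr=0.1, epochs=200):
    """Model estimation: fit intercept and slope by batch gradient descent on log loss."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            err = 1 / (1 + math.exp(-(b0 + b1 * x))) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / len(rows)
        b1 -= lr * g1 / len(rows)
    return lambda x: 1 / (1 + math.exp(-(b0 + b1 * x)))

def fit_baseline(rows):
    """Benchmark model: always predict the training-set positive rate."""
    rate = sum(y for _, y in rows) / len(rows)
    return lambda x: rate

def log_loss(model, rows, eps=1e-12):
    """Average negative log-likelihood on a sample (lower is better)."""
    total = 0.0
    for x, y in rows:
        p = min(max(model(x), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

data = make_data(1000)
train, holdout = data[:800], data[800:]  # sampling: simple train/holdout split
candidates = {"baseline": fit_baseline(train), "logistic": fit_logistic(train)}
results = {name: log_loss(model, holdout) for name, model in candidates.items()}
print(results, "->", min(results, key=results.get))
```

Scoring every candidate on the same holdout sample is what makes the comparison fair; refinement would loop back to feature selection or transformation and repeat.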

Page 9

3. Deployment Options

Factors → Scores

Factors known in advance: Batch Lookup Tables

Unknown factors: SQL / Rules Engine; Code (C++, Java, R, Hadoop); PMML Engine
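The split between the two deployment options can be illustrated with a short Python sketch. It assumes a hypothetical logistic model over a customer segment and a time-of-day bucket (both enumerable, so "known in advance") plus an optional basket value that only exists at request time. Every known combination is scored once in batch into a lookup table; anything involving the request-time factor is scored on demand, which is where a rules engine, compiled code, or a PMML engine would sit in production.

```python
import itertools
import math

# Hypothetical model weights (illustrative only, not from the talk).
SEGMENT_W = {"new": -1.0, "regular": 0.0, "vip": 1.0}
DAYPART_W = {"night": -0.5, "morning": 0.0, "afternoon": 0.2, "evening": 0.4}

def score(segment, daypart, basket_value=0.0):
    """Logistic score from a segment, a daypart, and an optional request-time basket value."""
    z = -0.5 + SEGMENT_W[segment] + DAYPART_W[daypart] + 0.01 * basket_value
    return 1 / (1 + math.exp(-z))

# Factors known in advance: every combination is enumerable, so score them all in batch
# and serve from a lookup table (no model code needed at request time).
LOOKUP = {(s, d): score(s, d) for s, d in itertools.product(SEGMENT_W, DAYPART_W)}

def serve(segment, daypart, basket_value=None):
    """Use the lookup table when no request-time factor is supplied; otherwise score live."""
    if basket_value is None:
        return LOOKUP[(segment, daypart)]
    # Unknown factor (basket value only exists at request time): score on demand.
    return score(segment, daypart, basket_value)

print(len(LOOKUP), round(serve("vip", "evening"), 3), round(serve("vip", "evening", 120.0), 3))
```

The lookup table trades storage for latency and only works while the factor space stays small; once a factor is continuous or unseen in advance, the model itself has to run at request time.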

Page 10

Why did I buy that blender?

• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog

Page 11

UpStream: Attribution Modeling
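Attribution modeling asks how much credit each touchpoint on the path to a sale deserves. The deck lists time-to-event and GAM survival models for this; the Python sketch below swaps in a much simpler, openly different heuristic, time-decay attribution, purely to show the shape of the problem. The purchase path, the day numbers, and the 7-day half-life are hypothetical assumptions.

```python
def time_decay_attribution(touchpoints, purchase_day, half_life_days=7.0):
    """Split one sale's credit across touchpoints; a touch half_life_days older gets half the weight."""
    weights = {}
    for channel, day in touchpoints:
        age = purchase_day - day
        weights[channel] = weights.get(channel, 0.0) + 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    return {channel: w / total for channel, w in weights.items()}

# Hypothetical path to the blender purchase (days since the start of the campaign).
path = [
    ("TV ad / magazine ad", 0),
    ("Coupon in the mail", 7),
    ("Webstore recommendation", 14),
]
credit = time_decay_attribution(path, purchase_day=14)
for channel, share in credit.items():
    print(f"{channel}: {share:.2f}")
```

A statistical attribution model would instead estimate each channel's effect on the time to purchase from many customers' histories, rather than fixing the decay curve by hand.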

Page 12

4. Model Scoring

• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data

• Exploratory data analysis
• Time-to-event models
• GAM survival models

• Scoring for inference
• Scoring for prediction

• 5 billion scores per day per retailer

UPSTREAM DATA FORMAT

CUSTOM VARIABLES (PMML)
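For a sense of scale, 5 billion scores per day per retailer works out to roughly 58,000 scores per second, assuming the load were spread evenly over 24 hours (real traffic peaks would be higher):

$$
\frac{5 \times 10^{9}\ \text{scores/day}}{86\,400\ \text{s/day}} \approx 57\,870\ \text{scores/s}
$$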

Page 13

5. Model Refresh

Factors → Scores, compared with Actual Outcomes
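One simple way to decide when a refresh is due is to join recent scores to the actual outcomes and re-estimate the model once accuracy has drifted too far. The Python sketch below assumes the Brier score as the accuracy measure and a fixed tolerance over the score measured at deployment; the metric choice, the 0.15 baseline, the 0.05 tolerance, and the monitoring data are all hypothetical.

```python
def brier_score(scores, outcomes):
    """Mean squared error between predicted probabilities and actual 0/1 outcomes (lower is better)."""
    return sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, baseline_brier, tolerance=0.05):
    """Flag the model for re-estimation when recent accuracy has degraded past the tolerance."""
    return brier_score(scores, outcomes) > baseline_brier + tolerance

# Hypothetical monitoring window: scores served last week, joined to what customers actually did.
baseline = 0.15  # Brier score on the validation sample at deployment (illustrative)
recent_scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
recent_outcomes = [1, 0, 0, 1, 0, 0]

print(round(brier_score(recent_scores, recent_outcomes), 3),
      needs_refresh(recent_scores, recent_outcomes, baseline))
```

Refreshing on a measured drift, rather than on a fixed calendar, avoids rebuilding models that are still performing well.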

Page 14

[Chart: Big Data vs. Real Time. Data volume: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes. Latency: Milliseconds, Seconds, Minutes, Hours.]

Page 15

PREDICTIVE ANALYTICS

BIG DATA

REAL TIME

WHAT’S UP WITH THAT?

Page 16

www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR

The leading enterprise provider of software and services for Open Source R

Real-Time Big Data Predictive Analytics: From Deployment to Production

Booth 618 / Office Hours Weds 1:30PM

David Smith, @revodavid