Real-Time Big Data Analytics

Post on 23-Feb-2016


DESCRIPTION

David Smith, Revolution Analytics (@revodavid). Real-Time Big Data Analytics: From Deployment to Production. PowerPoint presentation.

TRANSCRIPT

1

Real-Time Big Data Analytics: From Deployment to Production

David Smith, Revolution Analytics

@revodavid

2

WHAT’S UP WITH THAT?

3

REAL TIME

BIG DATA

PREDICTIVE ANALYTICS

Buzzword Bingo!

4

Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0

5

Predictive Analytics Model

"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0

Factors (any known information)

• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data

Predictive Model (scoring rules)

• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model

Scores (prediction or selection)

• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid
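The flow on this slide (factors in, scoring rules applied, score out) can be illustrated with a minimal sketch. It assumes a logistic regression as the scoring rule, which is only one of the model types listed; the factor names, intercept, and weights are hypothetical values chosen for illustration, not figures from the talk.

```python
import math

# Hypothetical coefficients; a real model would be estimated from historical data.
INTERCEPT = -2.0
WEIGHTS = {
    "previous_purchases": 0.4,  # number of earlier purchases by this user
    "is_returning": 0.5,        # 1 if the user has visited before, else 0
}

def score(factors: dict) -> float:
    """Turn known factors into a probability using a logistic scoring rule."""
    linear = INTERCEPT + sum(
        weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Factors that are missing from the input are treated as zero.
new_visitor = score({})
loyal_customer = score({"previous_purchases": 5, "is_returning": 1})
```

Once the coefficients are fixed, scoring is a handful of arithmetic operations per request, which is what makes this kind of rule cheap enough to evaluate in real time.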

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0

6

Real-time Deployment

1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh

7

1. Data Distillation in Hadoop

Unstructured Data

• Log Files
• Sensor Streams
• Language Text

HDFS Load

Map-Reduce (rmr)

Structured Data

Analytics Data Mart
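As a rough illustration of the distillation step, the sketch below mimics a map-reduce job in plain Python: a map function parses raw log lines into key/value pairs, and a reduce function aggregates them into structured rows. The slide names Hadoop and rmr; this sketch uses neither, and the log format (`user_id,timestamp,url`) and sample lines are made up for the example.

```python
from collections import defaultdict

def map_line(line: str) -> list:
    """Map step: parse one raw log line into (user_id, 1); skip malformed lines."""
    parts = line.strip().split(",")
    if len(parts) != 3 or not parts[0]:
        return []
    user_id, _timestamp, _url = parts
    return [(user_id, 1)]

def reduce_counts(pairs) -> dict:
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def distill(lines) -> dict:
    """Run map over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_counts(pairs)

# Made-up sample data in the assumed log format.
raw_log = [
    "u1,2020-01-01T10:00:00,/home",
    "u2,2020-01-01T10:00:05,/blender",
    "garbled line",
    "u1,2020-01-01T10:01:00,/blender",
]
page_views = distill(raw_log)  # {"u1": 2, "u2": 1}
```

The result is one structured row per user (here, a page-view count), which is the kind of table a model can then be trained on.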

8

2. The Model Development Cycle

• Feature Selection
• Sampling
• Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking

Structured Data

Predictive Model

R White Paper: bit.ly/r-is-hot

9

3. Deployment Options

Unknown factors

• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine

Factors known in advance

• Batch Lookup Tables

Factors

Scores
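The two deployment paths can be contrasted in a short sketch: when the factors are known in advance, scores are computed in batch and served from a lookup table; when they are not, the scoring rule runs at request time. The scoring rule, user IDs, and factor values below are hypothetical placeholders, not anything specified in the talk.

```python
def model_score(previous_purchases: int, is_returning: bool) -> float:
    """Hypothetical scoring rule standing in for a deployed model."""
    raw = 0.1 + 0.05 * previous_purchases + (0.2 if is_returning else 0.0)
    return min(1.0, raw)

# Batch path: factors known in advance, so scores are precomputed once.
KNOWN_USERS = {"u1": (3, True), "u2": (0, False)}
LOOKUP_TABLE = {uid: model_score(*factors) for uid, factors in KNOWN_USERS.items()}

def real_time_score(user_id: str, previous_purchases: int = 0,
                    is_returning: bool = False) -> float:
    """Serve a precomputed score if one exists, otherwise score on the fly."""
    if user_id in LOOKUP_TABLE:
        return LOOKUP_TABLE[user_id]
    return model_score(previous_purchases, is_returning)
```

The lookup path costs one dictionary access per request, while the on-the-fly path pays for the model evaluation each time, which is the trade-off between the two halves of the slide.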

10

Why did I buy that blender?

• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog

11

UpStream: Attribution Modeling

• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data

• Exploratory data analysis
• Time-to-event models
• GAM survival models

• Scoring for inference
• Scoring for prediction

• 5 billion scores per day per retailer
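To put the 5 billion figure in real-time terms, the arithmetic below converts it to an average rate per second. It assumes load is spread evenly over 24 hours, which real traffic is not, so peak rates would be higher; the 10 ms per-score latency used for the worker estimate is a hypothetical value.

```python
import math

SCORES_PER_DAY = 5_000_000_000  # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_scores_per_second() -> float:
    """Average rate, assuming scores are spread evenly across the day."""
    return SCORES_PER_DAY / SECONDS_PER_DAY

def workers_needed(latency_seconds: float) -> int:
    """Concurrent scorers needed to sustain the average rate at a given latency."""
    return math.ceil(average_scores_per_second() * latency_seconds)

rate = average_scores_per_second()  # about 57,870 scores per second
workers = workers_needed(0.010)     # assumed 10 ms per score
```

Under these assumptions, a single retailer averages roughly 58,000 scores per second, or about 17 microseconds per score if handled serially.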

UPSTREAM DATA FORMAT

CUSTOM VARIABLES (PMML)

4. Model Scoring

13

5. Model refresh

Factors

Scores

Actual Outcomes
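One way to decide when a refresh is due is to compare recent scores against the actual outcomes and re-estimate the model once the error drifts too high. The sketch below uses the Brier score (mean squared difference between predicted probability and the 0/1 outcome) as the error measure; both the choice of metric and the 0.25 threshold are assumptions for illustration, not something the slide specifies.

```python
def brier_score(scores, outcomes) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and equal in length")
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, threshold: float = 0.25) -> bool:
    """Flag the model for refresh when its recent error exceeds the threshold.

    The default 0.25 is the Brier score of always predicting 0.5.
    """
    return brier_score(scores, outcomes) > threshold

# Made-up examples: the first model tracks outcomes well, the second does not.
still_good = needs_refresh([0.9, 0.1], [1, 0])
gone_stale = needs_refresh([0.9, 0.1], [0, 1])
```

Feeding the observed outcomes back in this way closes the loop from scores to a new round of model estimation.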

14

Big Data

Real Time

Data labels: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes

Time labels: Milliseconds, Seconds, Minutes, Hours

15

PREDICTIVE ANALYTICS

BIG DATA

REAL TIME

WHAT’S UP WITH THAT?

16

www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR

The leading enterprise provider of software and services for Open Source R

Real-Time Big Data Predictive Analytics: From Deployment to Production

Booth 618 / Office Hours Weds 1:30PM

David Smith

@revodavid
