Introduction to real-time predictive modeling


Upload: lydung

Post on 13-Feb-2017


TRANSCRIPT

Page 1: Introduction to real-time predictive modeling
Page 5: Introduction to real-time predictive modeling

Factors

Scores / Classes

User Inputs

Prediction or Selection

Scoring Rules

Structured Data

Page 6: Introduction to real-time predictive modeling

EXAMPLES

Predictive Modeling Applications

Page 8: Introduction to real-time predictive modeling

• Crime mapping

“The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”

• Statistical forecasting

Page 9: Introduction to real-time predictive modeling

Legend: Operational, Announced

Central US: Iowa
West US: California
North Europe: Ireland
East US: Virginia
East US 2: Virginia
US Gov: Virginia
North Central US: Illinois
US Gov: Iowa
South Central US: Texas
Brazil South: Sao Paulo
West Europe: Netherlands
China North*: Beijing
China South*: Shanghai
Japan East: Saitama
Japan West: Osaka
India West: TBD
India East: TBD
East Asia: Hong Kong
SE Asia: Singapore
Australia West: Melbourne
Australia East: Sydney

* Operated by 21Vianet

Page 11: Introduction to real-time predictive modeling

http://blog.revolutionanalytics.com/2015/06/r-build-keynote.html/

Page 12: Introduction to real-time predictive modeling

REAL TIME

BIG DATA

PREDICTIVE ANALYTICS

Page 13: Introduction to real-time predictive modeling

Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0

Page 14: Introduction to real-time predictive modeling

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0

Page 15: Introduction to real-time predictive modeling

Structured Data

Log Files

Sensor Streams

Language Text

Extraction

Ingestion

Page 16: Introduction to real-time predictive modeling

Historical Data

“IO VAPOURA” by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0

Factors

Scores / Classes

Decision Tree

Logistic Regression

Neural Network

K-means clustering

Ensemble Model

User ID

Browser

Time/Date / Location

Previous purchases

Friend data

Any known information

Product of most interest

Offer of most likely sale

Most relevant link

Forecast sale value

Optimal Bid

Prediction or Selection

Scoring Rules
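
The flow on this slide (factors in, scoring rules applied, score or class out) can be illustrated with a minimal sketch. It assumes a logistic regression model whose coefficients were already estimated from historical data. The factor names, the coefficient values, and the `score` and `classify` helpers are hypothetical and are not taken from the talk.

```python
import math

# Hypothetical coefficients, as if estimated earlier from historical data.
COEFFICIENTS = {"previous_purchases": 0.8, "is_mobile_browser": -0.4}
INTERCEPT = -1.0


def score(factors):
    """Return the predicted probability of a sale for one set of factors."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * factors.get(name, 0.0) for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


def classify(factors, threshold=0.5):
    """Turn the score into a class label using a simple threshold rule."""
    return "likely_sale" if score(factors) >= threshold else "unlikely_sale"


# Factors known about one visitor at request time.
visitor = {"previous_purchases": 3, "is_mobile_browser": 1}
print(round(score(visitor), 3), classify(visitor))
```

Because scoring is only a weighted sum and one exponential, it is cheap enough to run per request, which is what makes the real-time step feasible once the model has been estimated offline.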

Page 17: Introduction to real-time predictive modeling

Feature Selection

Sampling

Aggregation

Variable Transformation

Model Estimation

Model Refinement

Model Comparison / Benchmarking

Known Factors

Known Outcomes

Predictive Model
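
As a rough sketch of the model estimation step, where known factors and known outcomes produce a predictive model, the following fits a one-factor logistic regression by batch gradient descent. The toy data, the learning rate, the number of epochs, and the `fit_logistic` and `predict` helpers are all assumptions made for illustration. A real project would more likely use an established modeling package.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, learning_rate=0.1, epochs=2000):
    """Estimate an intercept and weights by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, outcomes):
            linear = intercept + sum(w * v for w, v in zip(weights, x))
            error = sigmoid(linear) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        intercept -= learning_rate * grad_b / len(rows)
        weights = [
            w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)
        ]
    return intercept, weights


def predict(model, x):
    """Score one row of factors with the fitted model."""
    intercept, weights = model
    return sigmoid(intercept + sum(w * v for w, v in zip(weights, x)))


# Toy data: one factor (number of previous purchases) and a known outcome.
known_factors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
known_outcomes = [0, 0, 0, 1, 1, 1]

model = fit_logistic(known_factors, known_outcomes)
print(predict(model, [0.0]) < 0.5, predict(model, [5.0]) > 0.5)
```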

Page 18: Introduction to real-time predictive modeling

HDFS: Name Node, Data Node (x5)

MapReduce: Job Tracker, Task Tracker (x5)
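
To make the MapReduce layer concrete, here is a small single-process sketch of the map, shuffle, and reduce phases. It does not use Hadoop itself. The log format (`user_id,browser`) and the helper names are hypothetical, and on a real cluster the phases would run in parallel across the task trackers against data stored in HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, 1) pairs; the key is the browser field of a log line."""
    _user_id, browser = record.split(",")
    yield browser, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


log_lines = ["u1,chrome", "u2,firefox", "u3,chrome"]
pairs = [pair for line in log_lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```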

Page 20: Introduction to real-time predictive modeling

Factors

Score

Structured Data

Page 21: Introduction to real-time predictive modeling

Factors

Scores

Actual Outcomes

Structured Data
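
Once actual outcomes are available, they can be compared with the scores the model produced earlier. The sketch below assumes the scores are probabilities and the outcomes are 0/1 values. The `evaluate` helper, the 0.5 threshold, and the sample numbers are hypothetical. It reports accuracy and the Brier score (the mean squared difference between score and outcome).

```python
def evaluate(scores, actual_outcomes, threshold=0.5):
    """Compare predicted scores with the outcomes that actually occurred."""
    n = len(scores)
    correct = sum(
        (s >= threshold) == bool(y) for s, y in zip(scores, actual_outcomes)
    )
    brier = sum((s - y) ** 2 for s, y in zip(scores, actual_outcomes)) / n
    return {"accuracy": correct / n, "brier_score": brier}


scores = [0.9, 0.2, 0.7, 0.4]
actual_outcomes = [1, 0, 0, 1]
metrics = evaluate(scores, actual_outcomes)
print(metrics)
```

Tracking figures like these over time is one way to notice when a deployed model has drifted and needs to be re-estimated.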

Page 22: Introduction to real-time predictive modeling

Phase | “Big Data” | “Real Time”
Unstructured Data | Petabytes (or Exabytes!) | Minutes to Hours
Advanced Analytics | Gigabytes to Terabytes | Minutes
Deployment | Megabytes/second | Milliseconds
Consumption | Kilobytes | Seconds

Page 23: Introduction to real-time predictive modeling

powerbi.microsoft.com/en-us/industries/airline

Page 29: Introduction to real-time predictive modeling

Data

• SQL Server 2016: Big-data R analytics integrated with SQL Server database
• HDInsight: Cloud-based Hadoop clusters

Develop

• Microsoft R Server: Big-data R with distributed and in-database computing
• Visual Studio: R Tools for Visual Studio (integrated development environment for R)

Deploy

• Azure ML Studio: ML, Python and R in cloud-based Experiment workflows
• Cortana Analytics Suite: Cloud-based R APIs and Virtual Machines

Consume

• Power BI: Computations and charts from R scripts in dashboards
• Excel: With Azure ML Web Services plug-in

Page 30: Introduction to real-time predictive modeling

cloud computing

2011 to 2016: 5x increase

data science

Universities filling 300,000 US talent gap

90% of the data in the world today has been created in the last two years alone

big data

open source, including R, Linux, Hadoop

Page 31: Introduction to real-time predictive modeling

Getting Started with R tutorials:

• http://mran.microsoft.com/documents/getting-started/

Import/export data from SQL tables

• RODBC package: http://mran.microsoft.com/packages/info/?RODBC

Machine Learning Task View

• http://mran.microsoft.com/taskview/info/?MachineLearning

Applied Predictive Modeling (Kuhn & Johnson, 2014)

• http://appliedpredictivemodeling.com/ & R “caret” package

Page 32: Introduction to real-time predictive modeling

https://datainsightssummit.hubb.me/

Page 35: Introduction to real-time predictive modeling

http://blog.revolutionanalytics.com/2015/06/r-build-keynote.html/

Page 36: Introduction to real-time predictive modeling

Building a genetic disease risk application with R

Data

• Public genome data from 1000 Genomes

• About 2TB of raw data

Analytics Development

• Microsoft R Server

• VariantTools variant caller in R

Factors & Scores

• DNA Sample / genetic variations

• Risk association

Deployment and Consumption

• Expose as API

• Web page, phone app, etc.

Data Platform

• HDInsight Hadoop 1800 Nodes

• Raw genome sequence data in HDFS

Page 40: Introduction to real-time predictive modeling

The Ultimate Business Analytics Training

Business analytics training doesn’t end today. Join us at the upcoming PASS Business Analytics Conference to gain more Power BI and Excel skills through practical, hands-on training that you can put to use immediately.

Like What You Heard?

Join David Smith again at the PASS BA Conference in the session:

“Power BI Desktop Deep Dive including R Integration”

May 2 – 4, 2016

San Jose, CA

REGISTER TODAY: passbaconference.com

Use discount code BACDATA for $150 savings*

Please Note: Discount Codes cannot be applied retroactively.