
Open Source Analytics Visualization and Predictive Modeling of Big Data with R Michael E. Driscoll, Ph.D. July 22, 2009 OSCON




“Hard-working Middle Class” Hypothesis

(from Jessica Hagy’s thisisindexed.com)

Munge & Model OECD Data

gdp <- read.csv('gdp.csv')
hours <- read.csv('hours.csv')
gdp.hours <- merge(hours, gdp)                   # join on the column(s) both files share
gdp.hours$freetime <- 4380 - gdp.hours$hours     # free time = 4380 minus annual hours worked
plot(freetime ~ gdp, data = gdp.hours)

m <- lm(freetime ~ gdp, data = gdp.hours)        # straight-line fit
abline(m, col = 3, lwd = 2)                      # green line, width 2
pm <- loess(freetime ~ gdp, data = gdp.hours)    # local regression (loess) fit
with(gdp.hours, lines(spline(gdp, fitted(pm))))  # smooth curve through the loess fit
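To try the merge-and-fit steps without the original OECD files, here is a minimal sketch on invented data. The data frames, the `country` key column, and every number below are hypothetical stand-ins for gdp.csv and hours.csv; only the 4380 constant and the lm()/loess() calls come from the slide.

```r
set.seed(42)
# Hypothetical stand-ins for gdp.csv and hours.csv: 20 made-up countries
countries <- paste0("C", 1:20)
gdp <- data.frame(country = countries, gdp = runif(20, 10, 60))
hours <- data.frame(country = rev(countries),     # deliberately different row order
                    hours = round(runif(20, 1400, 2200)))

# merge() joins on the columns the two frames share (here: country),
# so the row order of the inputs does not matter
gdp.hours <- merge(hours, gdp)
gdp.hours$freetime <- 4380 - gdp.hours$hours

m  <- lm(freetime ~ gdp, data = gdp.hours)      # straight-line fit
pm <- loess(freetime ~ gdp, data = gdp.hours)   # locally weighted smoother
summary(m)$r.squared                            # share of variance the line explains
```

Because the numbers are random, the fitted coefficients mean nothing here; the point is the shape of the pipeline: read, merge on a key, derive a column, fit.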


Visualize the Analysis: is it True?

Modeling Big Data

100 thousand gene measures

1 million transactions during this presentation

If You Liked ____, You’ll Love ____!
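The "If you liked ____" pattern can be illustrated with the simplest form of item-to-item recommendation: count how often two items appear in the same basket and suggest the most frequent partner. This is a swapped-in textbook technique, not the method from the talk; the basket matrix, item names, and the `recommend()` helper are all hypothetical.

```r
# Hypothetical purchase data: rows = customers, columns = items, 1 = bought
baskets <- matrix(c(1, 1, 0, 0,
                    1, 1, 1, 0,
                    0, 1, 1, 0,
                    1, 0, 0, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("user", 1:4),
                                  c("bread", "butter", "jam", "tea")))

co <- crossprod(baskets)  # t(baskets) %*% baskets: item-by-item co-occurrence counts
diag(co) <- 0             # ignore an item's co-occurrence with itself

# Hypothetical helper: the item most often bought together with `item`
recommend <- function(item) names(which.max(co[item, ]))

recommend("bread")   # "butter" (shares 2 baskets with bread)
recommend("tea")     # "bread"
```

At the transaction volumes the slide mentions, the same counts would be accumulated incrementally or with sparse matrices rather than a dense product.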


1 billion clicks during this presentation

1 million pitches thrown since 2007

A Tale of Two Pitchers: Hamels and Webb

library(lattice)
xyplot(x ~ y, data=pitch)


xyplot(x ~ y, groups=type, data=pitch)


xyplot(x ~ y | type, data=pitch)
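The three xyplot() calls differ only in how pitch type enters the formula: not at all, as `groups=` (one panel, one colour per type), or after `|` (one panel per type). A minimal sketch with a hypothetical `pitch` data frame so the calls can run; the column meanings and all values are invented assumptions, not the talk's PITCHf/x data.

```r
library(lattice)
set.seed(1)
# Hypothetical stand-in for the slides' `pitch` data (invented values):
# x, y = where each pitch crossed the plate, type = pitch-type code
n <- 300
pitch <- data.frame(x = rnorm(n, mean = 2.5, sd = 0.8),
                    y = rnorm(n, mean = 0, sd = 0.8),
                    type = sample(c("FF", "CH", "CU"), n, replace = TRUE))

p.all     <- xyplot(x ~ y, data = pitch)                 # one panel, all pitches
p.grouped <- xyplot(x ~ y, groups = type, data = pitch)  # one panel, colour by type
p.panels  <- xyplot(x ~ y | type, data = pitch)          # one panel per type
# lattice objects are drawn when printed, e.g. print(p.panels)
```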

xyplot(x ~ y | type, data = pitch,
       fill.color = pitch$color,
       pch = 21,   # filled circles; fill= is only drawn for pch 21-25
       panel = function(x, y, fill.color, ..., subscripts) {
         fill <- fill.color[subscripts]   # colours for the rows in this panel
         panel.xyplot(x, y, fill = fill, ...)
       })
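The custom panel works because xyplot() forwards extra arguments such as `fill.color` to the panel function, and lattice supplies `subscripts` (the rows of `pitch` that fall in the current panel) whenever the panel function declares it. A minimal runnable sketch; the `pitch` data and the red/grey colouring rule are hypothetical.

```r
library(lattice)
set.seed(2)
# Hypothetical `pitch` data with a per-row colour column (invented values)
n <- 200
pitch <- data.frame(x = rnorm(n, 2.5, 0.8), y = rnorm(n, 0, 0.8),
                    type = sample(c("FF", "CH"), n, replace = TRUE))
# Hypothetical colouring rule: red inside a rough box, grey outside
inside <- abs(pitch$y) < 0.8 & abs(pitch$x - 2.5) < 1
pitch$color <- ifelse(inside, "red", "grey")

p <- xyplot(x ~ y | type, data = pitch,
            fill.color = pitch$color,  # extra argument, forwarded to the panel
            pch = 21,                  # filled symbol so fill= is visible
            panel = function(x, y, fill.color, ..., subscripts) {
              # subscripts = this panel's row numbers in `pitch`
              panel.xyplot(x, y, fill = fill.color[subscripts], ...)
            })
# print(p) draws the panels
```

Indexing the full colour vector by `subscripts` keeps each point's colour aligned with its row even after lattice splits the data by `type`.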


Visualizing Big Data

ggplot2 = grammar of graphics

library(ggplot2)
qplot(carat, price, data = diamonds)

qplot(log(carat), log(price), data = diamonds)

OR

qplot(carat, price, log = "xy", data = diamonds)


qplot(log(carat), log(price), data = diamonds, alpha = I(1/20))


qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color)
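Recent ggplot2 releases deprecate qplot() in favour of ggplot() plus explicit layers. A minimal sketch of the last slide's plot in that style, assuming a current ggplot2 installation; it is a translation of the qplot() call, not code from the talk.

```r
library(ggplot2)
# diamonds ships with ggplot2 (about 54,000 rows)
p <- ggplot(diamonds, aes(x = log(carat), y = log(price))) +
  geom_point(alpha = 1/20) +   # heavy transparency so dense regions read darker
  facet_grid(. ~ color)        # one column of panels per diamond colour grade
# print(p) draws the plot
```

With ggplot(), a constant alpha is set directly on the layer, so the I() wrapper that qplot() needed is no longer required.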


R on the cloud

Data

Desktop

Coding vs. Clicking

Linux, Apache, MySQL, R

http://labs.dataspora.com/gameday


Final thoughts
