A backstage tour of ggplot2 with Hadley Wickham
DESCRIPTION
ggplot2 is one of R’s most popular and widely used packages, developed by Rice University’s Hadley Wickham. Its exploratory graphics capabilities are driving the use of R as a complement to legacy analytics tools such as SAS. SAS is well regarded for its strength in data management and "production" statistics, where you know what you want to do and need to do it repeatedly. R, on the other hand, is strong in data analysis and exploration, where figuring out what is needed is the biggest challenge. In this way, SAS and R are strong companions. This webinar will provide an all-access pass to Hadley’s latest work. He’ll discuss:
* A brief overview of ggplot2, and how it differs from other plotting systems
* A sneak peek at some of the new features coming in the next version of ggplot2
* What has been learned about good development practices in the 5 years since he first started developing ggplot
* Some of the internals of ggplot2, and how he is gradually making it easier for others to contribute
TRANSCRIPT
February 2012
Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics
Rice University
ggplot2: A backstage tour
Wednesday, February 8, 12
1. Why ggplot2?
2. Sneak peek and new features
3. Best practices
4. Questions
Poll: What graphics system are you currently using?
Why ggplot2?
[Figure: whc plotted against day, 2004]
“Nothing is as practical as a good theory”—Kurt Lewin
“[A good model] will bring together in a coherent way things that previously appeared unrelated and which also will provide a basis for dealing systematically with new situations”—David Cox
A plot is made up of multiple layers.
A layer consists of data, a set of mappings between variables and aesthetics, a geometric object and a statistical transformation.
Scales control the details of the mapping.
All components are independent and reusable.
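A minimal sketch of this grammar in ggplot2 syntax, assuming the `mpg` data that ships with ggplot2. Each `geom_*()` call adds one layer with its own geom, stat and (optionally) extra mappings, and the colour scale is a separate component. The object names `smooth_layer`, `p1` and `p2` are illustrative only; storing the smoother in a variable shows that a layer can be reused in a second plot.

```r
library(ggplot2)

# A reusable layer: geom = line, stat = smooth (a fitted linear model, no band)
smooth_layer <- geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +   # data + default mappings
  geom_point(aes(colour = class)) +            # layer 1: geom = point, stat = identity
  smooth_layer +                               # layer 2: the reusable smoother
  scale_colour_brewer(palette = "Set2")        # scale: which colours represent class

# The same layer object dropped into a different plot
p2 <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  smooth_layer
```

Printing `p1` or `p2` draws the plot; until then each is just a description of data, layers and scales.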
Layered grammar + ggplot2
Interesting ggplot examples
James Cheshire, http://bit.ly/xqHhAs
Charlotte Wickham, http://cwick.co.nz/
David B Sparks, http://bit.ly/hn54NW
Claudia Beleites, http://bit.ly/yNqlpz
Poll: What resources are most helpful to you when improving your R skills?
Learning ggplot2
• ggplot2 mailing list: http://groups.google.com/group/ggplot2
• Stack Overflow: http://stackoverflow.com/tags/ggplot2
• Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
• Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
• ggplot2 book: http://amzn.com/0387981403
Sneak peek
Poll: Why do you use visualisation?
# Getting started
# To get the CRAN version
install.packages("ggplot2")

# To get the development version
install.packages("devtools")
library(devtools)
dev_mode() # don't overwrite your existing install
install_github("tidyverse/ggplot2") # the repository now lives under tidyverse
[Figure: development version compared with CRAN version]
New geoms to deal with overplotting (by Winston Chang)
[Figure: scatterplot of hwy against class; points with the same value overplot]
qplot(class, hwy, data = mpg)
[Figure: jittered points of hwy against class]
qplot(class, hwy, data = mpg, geom = "jitter")
[Figure: violin plots of hwy for each class]
qplot(class, hwy, data = mpg, geom = "violin")
[Figure: centred dot plots of hwy for each class]
qplot(class, hwy, data = mpg, geom = "dotplot", stackdir = "center",
  binaxis = "y", stackratio = 1, binwidth = 1)
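In recent ggplot2 releases `qplot()` is deprecated, so here is a sketch of the same three overplotting remedies written with `ggplot()` and explicit geoms, assuming the `mpg` data. The jitter `width`, `height` and `seed` values are illustrative choices, not taken from the talk.

```r
library(ggplot2)

# Shared base: class on x, highway mpg on y (mpg ships with ggplot2)
base <- ggplot(mpg, aes(x = class, y = hwy))

# Jitter: random horizontal noise of up to +/- 0.2 separates tied points;
# height = 0 leaves hwy untouched, seed makes the noise reproducible
p_jitter <- base +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))

# Violin: a mirrored kernel density estimate of hwy within each class
p_violin <- base + geom_violin()

# Dot plot: bin hwy along the y axis in 1 mpg bins, stacking dots around the centre
p_dots <- base +
  geom_dotplot(binaxis = "y", stackdir = "center", stackratio = 1, binwidth = 1)
```

The jitter keeps every observation but perturbs position, while the violin and dot plot summarise the distribution; which reads best depends on how many points fall in each class.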
Better legends (by Kohske Takahashi)
[Figure: scatterplot of y against x, coloured by the continuous variable colour]
df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x ^ 2 + y + runif(100))
qplot(x, y, data = df, colour = colour)
[Figure: the same plot, with the colour legend keys laid out in two rows]
qplot(x, y, data = df, colour = colour) +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))
[Figure: the same plot, with a continuous colour bar as the legend]
qplot(x, y, data = df, colour = colour) + guides(colour = guide_colorbar())
# Semi-transparent points: the legend keys are drawn with the same transparency
qplot(x, y, data = df, colour = colour, alpha = I(1/4))
# override.aes draws the legend keys opaque and larger, without changing the points
qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
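A sketch of the same legend controls in `ggplot()` form, assuming simulated data like the slide's (`set.seed()` is added only for reproducibility). `guides()` picks the guide for each aesthetic, and `override.aes` changes how the legend keys are drawn without touching the plotted points.

```r
library(ggplot2)

set.seed(2012)
df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x^2 + y + runif(100))

base <- ggplot(df, aes(x, y, colour = colour)) + geom_point()

# Continuous colour shown as a colour bar
p_bar <- base + guides(colour = guide_colourbar())

# The same scale shown as discrete keys in two rows, filled row by row
p_rows <- base + guides(colour = guide_legend(nrow = 2, byrow = TRUE))

# Transparent points, but opaque, larger keys in the legend
p_alpha <- ggplot(df, aes(x, y, colour = colour)) +
  geom_point(alpha = 1/4) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
```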
# Better layout
df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
qplot(x, y, data = df) + coord_fixed()
qplot(x, y, data = df) + facet_wrap(~ colour)

# Internally, there has been a big rewrite of the facetting data processing
# and rendering systems. This lays the foundation for new features, and
# fixes some annoying long-standing bugs.
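The two layout calls above in `ggplot()` form, as a sketch; `colour = 1:2` is recycled across the ten rows, so `facet_wrap(~ colour)` should give two panels of five points each.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)  # colour recycles to 1, 2, 1, 2, ...

# coord_fixed(): one data unit on x has the same on-screen length as one on y
p_fixed <- ggplot(df, aes(x, y)) + geom_point() + coord_fixed(ratio = 1)

# facet_wrap(): one panel per value of colour, wrapped into a grid
p_facet <- ggplot(df, aes(x, y)) + geom_point() + facet_wrap(~ colour)
```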
# Speed improvements
system.time( print(qplot(carat, price, data = diamonds)))
# Includes new tools for figuring out what's taking all the time
benchplot(qplot(carat, price, data = diamonds))

# See also geom_raster and geom_map

# Still a lot of work to do. The emphasis in ggplot2 is reducing the amount
# of thinking time by making it easier to go from the plot in your brain to
# the plot on the page.
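To see roughly what `benchplot()` reports, the stages can also be timed by hand: building the plot data (stats, scales, positions), turning it into a gtable, and drawing it. This is a hand-rolled sketch of the same idea, not `benchplot()`'s own code; it draws to a temporary PDF so it also runs without a screen, and the timings will vary by machine.

```r
library(ggplot2)
library(grid)

p <- ggplot(diamonds, aes(carat, price)) + geom_point()

pdf(tempfile(fileext = ".pdf"))                      # throwaway device
t_build  <- system.time(built <- ggplot_build(p))    # compute stats, train scales
t_render <- system.time(gt <- ggplot_gtable(built))  # create grobs, legends, layout
t_draw   <- system.time(grid.draw(gt))               # actually draw ~54,000 points
invisible(dev.off())

timings <- c(build  = t_build[["elapsed"]],
             render = t_render[["elapsed"]],
             draw   = t_draw[["elapsed"]])
timings
```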
30 s with geom_tile, 8 s with annotation_raster
library(ggplot2)
library(reshape2)
library(RgoogleMaps)
library(ggmap)

# Thefts from the Houston crime data that ships with ggmap
theft <- subset(crime, offense == "theft" & lat > 29 & lat < 30.2 & lon > -95.8)

lonr <- range(theft$lon)
latr <- range(theft$lat)

# Download a map tile covering the data and convert it to a raster
h_map <- GetMap.bbox(lonr, latr, size = c(1024, 1024))
h_raster <- as.raster(h_map$myTile)

# Approach 1: draw the map image once with annotation_raster
benchplot(ggplot(theft, aes(lon, lat)) +
  annotation_raster(h_raster, lonr[1], lonr[2], latr[1], latr[2]) +
  geom_density2d(colour = "black"))

# Approach 2: melt the image into one row per pixel and draw it with geom_tile
h_data <- melt(as.matrix(h_raster))
h_data$lat <- seq(latr[2], latr[1], length.out = 640)[h_data$Var1]
h_data$lon <- seq(lonr[1], lonr[2], length.out = 640)[h_data$Var2]

benchplot(ggplot(theft, aes(lon, lat)) +
  geom_tile(aes(fill = value), data = h_data) +
  scale_fill_identity() +
  geom_density2d(colour = "black"))
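The Google map download above is not easy to reproduce, so here is a self-contained sketch of the same comparison. A synthetic 64 x 64 grey gradient stands in for the map tile and random points stand in for the thefts; both, and the names `img`, `pts` and `pix`, are hypothetical. The point is structural: `annotation_raster()` adds one image grob, while the `geom_tile()` route turns every pixel into a row of data.

```r
library(ggplot2)

# Hypothetical stand-in for the map tile: a 64 x 64 grey gradient
n <- 64
shade <- outer(seq(0, 1, length.out = n), seq(0, 1, length.out = n),
               function(a, b) (a + b) / 2)
img <- as.raster(matrix(grey(shade), nrow = n))

# Hypothetical extent and points (roughly the Houston window used above)
lonr <- c(-95.8, -95.0)
latr <- c(29.0, 30.2)
set.seed(1)
pts <- data.frame(lon = runif(200, lonr[1], lonr[2]),
                  lat = runif(200, latr[1], latr[2]))

# Approach 1: one annotation layer drawing the whole image
p_annot <- ggplot(pts, aes(lon, lat)) +
  annotation_raster(img, lonr[1], lonr[2], latr[1], latr[2]) +
  geom_point()

# Approach 2: one data row per pixel, drawn as n * n tiles
pix <- expand.grid(row = seq_len(n), col = seq_len(n))
pix$fill <- as.vector(as.matrix(img))                      # column-major, like expand.grid
pix$lat  <- seq(latr[2], latr[1], length.out = n)[pix$row]  # row 1 is the top of the image
pix$lon  <- seq(lonr[1], lonr[2], length.out = n)[pix$col]

p_tile <- ggplot(pts, aes(lon, lat)) +
  geom_tile(aes(fill = fill), data = pix) +
  scale_fill_identity() +
  geom_point()
```

`geom_raster()`, mentioned on the speed slide, is ggplot2's faster special case of `geom_tile()` for equally sized tiles and could replace `geom_tile()` in the second approach.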
ggplot2 0.9 scheduled for release on March 1
Poll: How big is your data?
# Future work: big visualisation
# (Sponsored by Revolution Analytics)

# How can you make a plot of 100 million observations?
# In less than one minute.
~100,000 points: 0.06 s to bin, 0.20 s to convert, 6.0 s to plot
~1.2 million: 10 s to bin
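The timings above split the work into binning, converting and plotting. Here is a minimal base-R sketch of the binning step, assuming fixed-width rectangular bins and non-constant `x` and `y`: each point gets an integer bin index, and `tabulate()` counts points per cell in a single linear pass, so the plot only has to draw one tile per non-empty bin instead of one mark per point. The helper `bin_counts()` and its arguments are hypothetical, not part of ggplot2.

```r
# Hypothetical helper: count points in an nx x ny grid of equal-width bins
bin_counts <- function(x, y, nx = 100, ny = 100) {
  # Map each coordinate to a bin index in 1..nx (or 1..ny); the maximum
  # value would land in bin nx + 1, so clamp it into the last bin
  ix <- pmin(floor((x - min(x)) / diff(range(x)) * nx) + 1, nx)
  iy <- pmin(floor((y - min(y)) / diff(range(y)) * ny) + 1, ny)
  # Combine both indices into one cell id and count points per cell
  counts <- tabulate((iy - 1) * nx + ix, nbins = nx * ny)
  # expand.grid varies ix fastest, matching the cell id above
  cells <- expand.grid(ix = seq_len(nx), iy = seq_len(ny))
  cells$n <- counts
  cells[cells$n > 0, ]
}

set.seed(42)
n_points <- 1e6
x <- rnorm(n_points)
y <- x + rnorm(n_points)
bin_time <- system.time(binned <- bin_counts(x, y))
```

The binned data frame has at most 10,000 rows, so it is cheap to draw, for example with `ggplot(binned, aes(ix, iy, fill = n)) + geom_raster()`.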
Best practices
Poll: How do you learn about new packages?
Package best practices
• Namespace
• Documentation
• Unit tests
• Read the source!
• (ggplot2 is not always the best example: it was my second R package. I have now written around 30, and I know a lot more!)
# Namespaces
library(ggplot2)
ddply # Error: object 'ddply' not found

# Note that plyr, reshape, etc. aren't automatically attached. This is good
# development practice: it's better to be explicit than implicit.
# Look at the NAMESPACE file.
export("%+%")
export(aes_all)
export(aes_auto)
export(aes_string)
export(aes)
export(annotate)
export(annotation_custom)
export(annotation_map)
export(annotation_raster)
export(autoplot)
export(benchplot)
export(borders)
export(continuous_scale)
export(coord_cartesian)
export(coord_equal)
export(coord_fixed)
export(coord_flip)
export(coord_map)
export(coord_polar)
...
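A small check of the namespace point using base R functions: `getNamespaceExports()` lists what the NAMESPACE `export()` lines declare, and `search()` shows what `library()` actually attached. Whether current ggplot2 still uses plyr internally is not assumed here; the sketch only checks that plyr is not attached on ggplot2's behalf in a fresh session.

```r
library(ggplot2)

# Functions declared with export() in ggplot2's NAMESPACE file
exports <- getNamespaceExports("ggplot2")
"aes" %in% exports               # TRUE

# library() attaches ggplot2 itself to the search path ...
"package:ggplot2" %in% search()  # TRUE

# ... but not other packages, so in a fresh session a plyr function is only
# reachable via plyr::ddply() or library(plyr)
exists("ddply")                  # FALSE
```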
# Unit tests
# Look in tests/ or inst/tests/
library(testthat)
test_package("ggplot2")
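A sketch of what a test file in `tests/testthat/` could contain, assuming testthat's `test_that()` and `expect_equal()`; these two tests are hypothetical examples, not ones taken from ggplot2's own suite. `layer_data()` returns the computed data for a layer, which lets a plot be checked without inspecting pixels.

```r
library(testthat)
library(ggplot2)

test_that("geom_point draws one point per row", {
  p <- ggplot(mpg, aes(displ, hwy)) + geom_point()
  expect_equal(nrow(layer_data(p)), nrow(mpg))
})

test_that("facet_wrap creates one panel per level of the facetting variable", {
  p <- ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_wrap(~ drv)
  expect_equal(length(unique(layer_data(p)$PANEL)), length(unique(mpg$drv)))
})
```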
# Documentation
# Function level, in man/
?geom_point
?facet_wrap
package?ggplot2

# Vignettes in inst/doc
# (ggplot2 doesn't have any)

# Publications
citation("ggplot2")
Questions