Analyzing MLB data with ggplot
DESCRIPTION
Making basic, good-looking plots in Python is tough. Matplotlib gives you great control, but at the expense of having to spell out every detail. The rise of pandas has made Python the go-to language for data wrangling and munging, but many people are still reluctant to leave R because of its outstanding data viz packages. ggplot is a Python port of the popular R package ggplot2. It provides a high-level grammar that allows users to quickly and easily make good-looking plots. An example may be found here: http://blog.yhathq.com/posts/ggplot-for-python.html

Greg will show you how to use ggplot to analyze data from MLB's open data source, pitchf/x. He will take you through the basics of ggplot and show how easy it is to create histograms, plot smoothed curves, and customize colors and shapes.

http://www.meetup.com/PyData-Boston/events/184382092/
TRANSCRIPT
analyzing MLB data with ggplot
Greg Lamp
ggplot
● What is it?
● Alternatives
● How it works
● Why should I use it?
● Brief case study
● Questions
Here I am on the Internet.
Founder/CTO @ Yhat
Hi, I’m Greg!
What is ggplot?
DSL for graphics
scatterplot
histogram
labels
color
shape
What about matplotlib?
a quick example
matplotlib ggplot
it’s not all bad!
matplotlib
syntax, api, default themes, learning curve
matplotlib
maturity, ipython, customization, community
What about d3.js?
d3.js
ggplot
ggplot d3.js
How it works
Format
ggplot
data frame
“aesthetics”
Aesthetics
color
shape
size
...fill, alpha, slope, intercept, ymin, ymax, ...
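An aesthetic ties a column of the data frame to a visual property such as color or shape. The sketch below is a plain-Python illustration of that idea only: the rows, the column names (`pitch_type`, `start_speed`), the palette, and the helper `map_aesthetic` are all hypothetical and are not part of ggplot.

```python
from itertools import cycle

# Hypothetical rows standing in for a data frame; values are made up.
pitches = [
    {"pitch_type": "FF", "start_speed": 95.1},
    {"pitch_type": "CU", "start_speed": 79.4},
    {"pitch_type": "FF", "start_speed": 96.0},
    {"pitch_type": "SL", "start_speed": 86.2},
]

# Arbitrary palette chosen for the example.
PALETTE = ["red", "blue", "green"]


def map_aesthetic(rows, column, values):
    """Assign one visual value to each distinct level of `column`,
    in first-seen order, reusing values if the levels outnumber them."""
    assigned = {}
    supply = cycle(values)
    for row in rows:
        level = row[column]
        if level not in assigned:
            assigned[level] = next(supply)
    return assigned


color_scale = map_aesthetic(pitches, "pitch_type", PALETTE)
print(color_scale)
```

The point of the sketch is that the user names a column, and the mapping from its levels to concrete colors is worked out automatically.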
Geoms, Stats, & Scales
geom_point
geom_area
...there are many
stat_smooth
...there are a few
scale_color_brewer
scale_color_gradient
...there are many
Layers
ggplot()
ggplot() + geom_point()
ggplot() + geom_point() + stat_smooth()
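The layering above builds a plot by adding pieces together with `+`. One way that style of composition can be expressed in Python is operator overloading. The sketch below is a toy under that assumption: `Plot` and `Layer` are hypothetical names for illustration, not ggplot's actual classes, and nothing is drawn.

```python
class Layer:
    """Toy stand-in for a geom or stat; it only carries a name."""

    def __init__(self, name):
        self.name = name


class Plot:
    """Toy stand-in for a plot object that collects layers."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Return a new Plot so the left-hand plot is left unchanged
        # and can be reused as a base for other variations.
        return Plot(self.layers + (layer,))

    def describe(self):
        return " + ".join(["plot"] + [layer.name for layer in self.layers])


base = Plot()
scatter = base + Layer("geom_point")
smoothed = scatter + Layer("stat_smooth")
print(smoothed.describe())
```

Because each `+` returns a new object in this sketch, `base` and `scatter` remain available for trying a different next layer.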
Why is this good?
Makes “reasonable assumptions”
not real colors
matplotlib freaks
still not real colors
...but I can guess what you mean
Concise yet expressive
Looks pretty good (and is easy to customize)
Seaborn: github.com/mwaskom/seaborn
Case Study
pitch speed
103.4 mph
Load ggplot and pandas
Read in our pitch f/x data
define the x-axis
pass in your data frame
add a histogram
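The steps above end with a histogram of pitch speed. As a rough illustration of what a histogram computes, and not of ggplot's own code, the sketch below counts values in fixed-width bins using only the standard library. The helper `bin_counts`, the bin width, and every speed except the 103.4 mph mentioned earlier are made up for the example.

```python
import math
from collections import Counter

# Illustrative speeds in mph; only 103.4 comes from the talk, the rest are made up.
speeds_mph = [88.2, 91.5, 93.0, 94.7, 95.1, 95.9, 97.3, 103.4]


def bin_counts(values, width):
    """Count how many values fall into each bin [start, start + width),
    keyed by the bin's lower edge."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


hist = bin_counts(speeds_mph, 5)
for lower_edge, count in hist.items():
    # A crude text rendering: one '#' per pitch in the bin.
    print(f"{lower_edge:>3}-{lower_edge + 5:<3} {'#' * count}")
```

A plotting library takes care of this binning and of the drawing, which is why the slide version needs so few steps.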
How does fatigue impact velocity?
...not helpful
What about at the individual level?
Justin Verlander
ggplot lets you fail quicker
Finding Help
/tagged/python-ggplot
What’s next?
Thanks! @theglamp