Analyzing MLB data with ggplot

Greg Lamp

Upload: yhat

Post on 26-Jan-2015


DESCRIPTION

Making basic, good-looking plots in Python is tough. Matplotlib gives you great control, but at the cost of verbose, low-level code. The rise of pandas has made Python the go-to language for data wrangling and munging, but many people are still reluctant to leave R because of its outstanding data visualization packages. ggplot is a port of the popular R package ggplot2 to Python. It provides a high-level grammar that allows users to quickly and easily make good-looking plots. An example may be found here: http://blog.yhathq.com/posts/ggplot-for-python.html

Greg will show you how to use ggplot to analyze data from MLB's open data source, pitchf/x. He will take you through the basics of ggplot and show how easy it is to create histograms, plot smoothed curves, and customize colors and shapes. http://www.meetup.com/PyData-Boston/events/184382092/

TRANSCRIPT

Page 1: Analyzing mlb data with ggplot

analyzing MLB data with ggplot

Greg Lamp

Page 2: Analyzing mlb data with ggplot

ggplot

● What is it?
● Alternatives
● How it works
● Why should I use it?
● Brief case study
● Questions

Page 3: Analyzing mlb data with ggplot

Here I am on the Internet.

Founder/CTO @ Yhat

Hi, I’m Greg!

Page 4: Analyzing mlb data with ggplot

What is ggplot?

Page 5: Analyzing mlb data with ggplot
Page 6: Analyzing mlb data with ggplot

DSL for graphics

Page 7: Analyzing mlb data with ggplot

DSL for graphics

scatterplot

histogram

labels

color

shape

Page 8: Analyzing mlb data with ggplot

What about matplotlib?

Page 9: Analyzing mlb data with ggplot
Page 10: Analyzing mlb data with ggplot

a quick example

Page 11: Analyzing mlb data with ggplot
Page 12: Analyzing mlb data with ggplot

matplotlib ggplot

Page 13: Analyzing mlb data with ggplot

it’s not all bad!

Page 14: Analyzing mlb data with ggplot

matplotlib

syntax, API, default themes, learning curve

Page 15: Analyzing mlb data with ggplot

matplotlib

maturity, IPython, customization, community

syntax, API, default themes, learning curve

Page 16: Analyzing mlb data with ggplot

What about d3.js?

Page 17: Analyzing mlb data with ggplot

d3.js

Page 18: Analyzing mlb data with ggplot

ggplot

Page 19: Analyzing mlb data with ggplot

ggplot d3.js

Page 20: Analyzing mlb data with ggplot

How it works

Page 21: Analyzing mlb data with ggplot

Format

Page 22: Analyzing mlb data with ggplot

ggplot

Page 23: Analyzing mlb data with ggplot
Page 24: Analyzing mlb data with ggplot

data frame

Page 25: Analyzing mlb data with ggplot

“aesthetics”

Page 26: Analyzing mlb data with ggplot

Aesthetics

Page 27: Analyzing mlb data with ggplot
Page 28: Analyzing mlb data with ggplot
Page 29: Analyzing mlb data with ggplot
Page 30: Analyzing mlb data with ggplot

color

Page 31: Analyzing mlb data with ggplot

shape

Page 32: Analyzing mlb data with ggplot

size

Page 33: Analyzing mlb data with ggplot

...fill, alpha, slope, intercept, ymin, ymax, ...
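The idea behind the aesthetics slides can be sketched in plain Python: an aesthetic mapping binds a visual channel (x, color, shape, size, ...) to a column of the data frame. The snippet below is only a toy illustration with no plotting library involved; the rows, the column names `start_speed` and `pitch_type`, and the helper `resolve` are all hypothetical. In ggplot itself the mapping is written with `aes(...)`, for example `aes(x='start_speed', color='pitch_type')`.

```python
# Hypothetical pitch records standing in for rows of a data frame.
pitches = [
    {"start_speed": 95.2, "pitch_type": "FF"},
    {"start_speed": 84.7, "pitch_type": "CU"},
]

# An aesthetic mapping: visual channel -> column name.
mapping = {"x": "start_speed", "color": "pitch_type"}


def resolve(mapping, rows):
    """Look up, for every row, the value each visual channel should encode."""
    return [
        {channel: row[column] for channel, column in mapping.items()}
        for row in rows
    ]


resolved = resolve(mapping, pitches)
for point in resolved:
    print(point)
```

Changing what the plot encodes then means changing the mapping, not the data or the drawing code.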

Page 34: Analyzing mlb data with ggplot

Geoms, Stats, & Scales

Page 35: Analyzing mlb data with ggplot

geom_point

Page 36: Analyzing mlb data with ggplot

geom_area

Page 37: Analyzing mlb data with ggplot

...there are many

Page 38: Analyzing mlb data with ggplot

stat_smooth

Page 39: Analyzing mlb data with ggplot

...there are a few

Page 40: Analyzing mlb data with ggplot

scale_color_brewer

Page 41: Analyzing mlb data with ggplot

scale_color_gradient

Page 42: Analyzing mlb data with ggplot

...there are many

Page 43: Analyzing mlb data with ggplot

Layers

Page 44: Analyzing mlb data with ggplot

ggplot()

Page 45: Analyzing mlb data with ggplot

ggplot() + geom_point()

Page 46: Analyzing mlb data with ggplot

ggplot() + geom_point() + stat_smooth()

Page 47: Analyzing mlb data with ggplot

ggplot() + geom_point() + stat_smooth()

Page 48: Analyzing mlb data with ggplot

ggplot() + geom_point() + stat_smooth()
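The layering expression above reads naturally once you see that `+` is ordinary operator overloading. The following is a toy sketch of that mechanism, not ggplot's actual implementation: the classes `Plot` and `Layer` are hypothetical stand-ins, and a "layer" here is just a name.

```python
class Layer:
    """Toy stand-in for a geom or stat; it only carries a name."""

    def __init__(self, name):
        self.name = name


class Plot:
    """Toy stand-in for a plot object: an ordered tuple of layer names."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Return a new Plot so the base plot can be reused with other layers.
        return Plot(self.layers + (layer.name,))


base = Plot()
plot = base + Layer("geom_point") + Layer("stat_smooth")
print(plot.layers)
```

Because each `+` returns a new object, the same base plot can be extended in several directions while exploring the data.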

Page 49: Analyzing mlb data with ggplot

Why is this good?

Page 50: Analyzing mlb data with ggplot

Makes “reasonable assumptions”

Page 51: Analyzing mlb data with ggplot

not real colors

Page 52: Analyzing mlb data with ggplot

matplotlib freaks

Page 53: Analyzing mlb data with ggplot

still not real colors

...but i can guess what you mean

Page 54: Analyzing mlb data with ggplot
Page 55: Analyzing mlb data with ggplot

Concise yet expressive

Page 56: Analyzing mlb data with ggplot
Page 57: Analyzing mlb data with ggplot
Page 58: Analyzing mlb data with ggplot
Page 59: Analyzing mlb data with ggplot

Looks pretty good (and is easy to customize)

Page 60: Analyzing mlb data with ggplot
Page 61: Analyzing mlb data with ggplot

Seaborn

github.com/mwaskom/seaborn

Page 62: Analyzing mlb data with ggplot

Case Study

Page 63: Analyzing mlb data with ggplot
Page 64: Analyzing mlb data with ggplot
Page 65: Analyzing mlb data with ggplot
Page 66: Analyzing mlb data with ggplot

pitch speed

Page 67: Analyzing mlb data with ggplot
Page 68: Analyzing mlb data with ggplot

103.4 mph

Page 69: Analyzing mlb data with ggplot
Page 70: Analyzing mlb data with ggplot

Load ggplot and pandas

Page 71: Analyzing mlb data with ggplot

Read in our pitchf/x data

Page 72: Analyzing mlb data with ggplot

define the x-axis

pass in your data frame

Page 73: Analyzing mlb data with ggplot

add a histogram
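The code on these four slides did not survive in the transcript, so here is a minimal sketch of what the steps might look like, assuming yhat's `ggplot` package and pandas are installed and importable. The sample rows, the column name `start_speed`, and the helper names `load_pitches` and `speed_histogram` are assumptions made for illustration; the real pitchf/x columns may differ.

```python
# Hypothetical sample rows; a real pitchf/x export has many more columns,
# and the speed column name used here is an assumption.
SAMPLE_CSV = """pitcher,start_speed
pitcher_a,95.2
pitcher_a,93.8
pitcher_b,88.4
pitcher_b,90.1
"""
SPEED_COLUMN = "start_speed"


def load_pitches(source):
    """Read in the pitch data as a pandas DataFrame."""
    import pandas as pd  # imported lazily so this module loads without pandas

    return pd.read_csv(source)


def speed_histogram(df):
    """Build a histogram of pitch speed, following the slide steps."""
    from ggplot import ggplot, aes, geom_histogram  # yhat's ggplot package

    # aes(x=...) defines the x-axis, data= passes in the data frame,
    # and "+ geom_histogram()" adds the histogram layer.
    return ggplot(aes(x=SPEED_COLUMN), data=df) + geom_histogram()
```

With both libraries available, `speed_histogram(load_pitches(io.StringIO(SAMPLE_CSV)))` (after `import io`) builds the plot object; pass a path to your own CSV instead to plot real data.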

Page 74: Analyzing mlb data with ggplot

How does fatigue impact velocity?

Page 75: Analyzing mlb data with ggplot

...not helpful

Page 76: Analyzing mlb data with ggplot
Page 77: Analyzing mlb data with ggplot
Page 78: Analyzing mlb data with ggplot

What about at the individual level?

Page 79: Analyzing mlb data with ggplot
Page 80: Analyzing mlb data with ggplot
Page 81: Analyzing mlb data with ggplot

Justin Verlander

Page 82: Analyzing mlb data with ggplot
Page 83: Analyzing mlb data with ggplot
Page 84: Analyzing mlb data with ggplot
Page 85: Analyzing mlb data with ggplot

ggplot lets you fail quicker

Page 86: Analyzing mlb data with ggplot

Finding Help

Page 88: Analyzing mlb data with ggplot

http://ggplot.yhathq.com

Page 89: Analyzing mlb data with ggplot

What’s next?

Page 90: Analyzing mlb data with ggplot
Page 91: Analyzing mlb data with ggplot

Thanks!

@theglamp
