OSCON 2013 Jesse Anderson

Doing Data Science on the NFL Play by Play Dataset | Jesse Anderson, Curriculum Developer and Instructor | July 2013 v2

Upload: oscon-byrum

Post on 06-Dec-2014

DESCRIPTION

Jesse Anderson's OSCON 2013 talk

TRANSCRIPT

Page 1: Oscon 2013 Jesse Anderson

Doing Data Science on the NFL Play by Play Dataset

Jesse Anderson | Curriculum Developer and Instructor

July 2013 v2

Page 2: Oscon 2013 Jesse Anderson

Plays

• Advanced NFL Stats released all Play by Play since 2002 season
• 2,898 total games
• 471,392 plays

Page 3: Oscon 2013 Jesse Anderson

Full Play Entry

20121119_CHI@SF,3,17,48,SF,CHI,3,2,76,20,0,(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7,2012

Page 4: Oscon 2013 Jesse Anderson

Play Description

(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC

Page 5: Oscon 2013 Jesse Anderson

There's A Chart for That

Page 6: Oscon 2013 Jesse Anderson

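There's A Custom MapReduce Behind That

import java.io.IOException;
import java.util.regex.Matcher;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IncompletesMapper extends Mapper<LongWritable, Text, Text, PassWritable> {
    // incompletePass (a java.util.regex.Pattern with three capture groups) and
    // PassWritable are defined elsewhere in the project; the slide does not show them.

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();

        if (line.contains("incomplete")) {
            Matcher matcher = incompletePass.matcher(line);

            if (matcher.find()) {
                context.write(new Text(matcher.group(1) + "-" + matcher.group(2)),
                        new PassWritable(1, Integer.parseInt(matcher.group(3))));
            }
        }
    }
}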

Page 7: Oscon 2013 Jesse Anderson

The Hive Story

Enter the Query

Page 8: Oscon 2013 Jesse Anderson

Queryable Data

Give me every run play by New Orleans in the 2010 season

Page 9: Oscon 2013 Jesse Anderson

From the Data: Fourth Downs

15% of 4th down plays weren't kicks

Page 10: Oscon 2013 Jesse Anderson

Play by Play Pieces

(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
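
The pieces of a description like the one above can be pulled apart with a regular expression. The following is a minimal sketch, not the parser from the talk's repository: the class name `PassPlay`, its fields, and the pattern itself are assumptions made for illustration, and the pattern only covers completed passes phrased like this one example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not taken from the talk's repository. */
final class PassPlay {
    // Assumed shape of a completed pass:
    // "(M:SS) <passer> pass <modifiers> to <receiver> to <TEAM> <yard line> for <N> yard(s)"
    private static final Pattern COMPLETED_PASS = Pattern.compile(
            "^\\((\\d{1,2}):(\\d{2})\\)\\s+(\\S+) pass (?:\\w+ )*?to (\\S+) "
            + "to (\\w+) (\\d+) for (-?\\d+) yards?");

    final int minutes;      // game clock minutes, e.g. 2
    final int seconds;      // game clock seconds, e.g. 48
    final String passer;    // e.g. "C.Kaepernick"
    final String receiver;  // e.g. "M.Crabtree"
    final int yards;        // yards gained on the play

    private PassPlay(int minutes, int seconds, String passer, String receiver, int yards) {
        this.minutes = minutes;
        this.seconds = seconds;
        this.passer = passer;
        this.receiver = receiver;
        this.yards = yards;
    }

    /** Returns the parsed pieces, or empty when the text does not fit the assumed shape. */
    static Optional<PassPlay> parse(String description) {
        Matcher m = COMPLETED_PASS.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        // Groups 5 and 6 (team and yard line of the catch) are matched but not kept here.
        return Optional.of(new PassPlay(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                m.group(3),
                m.group(4),
                Integer.parseInt(m.group(7))));
    }
}
```

Calling `PassPlay.parse(...)` on the description above yields the clock, passer, receiver and yardage. Runs, kicks, penalties and incompletions are phrased differently, so each would likely need its own pattern.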

Page 11: Oscon 2013 Jesse Anderson

From the Data: Sacks

QB sacks and scrambles double on 3rd downs

Page 12: Oscon 2013 Jesse Anderson

Hive

• Abstraction on top of MapReduce
• Allows queries using a SQL-like language

Page 13: Oscon 2013 Jesse Anderson

Hive Query

Give me every run by New Orleans in the 2010 season:

SELECT * FROM playbyplay
WHERE playtype = "RUN"
  AND year = 2010
  AND game LIKE "%NO%";
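
For contrast with the one-line query, here is a minimal Java sketch of the same three conditions applied by hand to in-memory rows. The `Play` type and `RunPlayFilter` class are hypothetical and carry only the three columns the query touches; Hive itself would compile the query into MapReduce jobs rather than run code like this.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the query's WHERE clause; not part of the talk's code. */
final class RunPlayFilter {

    /** Assumed row type with only the columns the query uses. */
    static final class Play {
        final String game;      // game ID, e.g. "20121119_CHI@SF"
        final String playType;  // e.g. "RUN"
        final int year;         // season year

        Play(String game, String playType, int year) {
            this.game = game;
            this.playType = playType;
            this.year = year;
        }
    }

    /** Mirrors: playtype = "RUN" AND year = 2010 AND game LIKE "%NO%". */
    static boolean matches(Play p) {
        return "RUN".equals(p.playType)
                && p.year == 2010
                && p.game.contains("NO");
    }

    /** Returns the rows that satisfy the predicate, in their original order. */
    static List<Play> select(List<Play> plays) {
        List<Play> result = new ArrayList<>();
        for (Play p : plays) {
            if (matches(p)) {
                result.add(p);
            }
        }
        return result;
    }
}
```

One thing the sketch makes visible: matching "NO" anywhere in the game ID keeps every run in a New Orleans game, including runs by the opponent. Restricting to New Orleans on offense would need a separate offense column, which the full play entry shown earlier appears to carry.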

Page 14: Oscon 2013 Jesse Anderson

From the Data: Yards to Go

With 1 yard to go, 65% of plays are runs

Page 15: Oscon 2013 Jesse Anderson

Lost in data

Algorithm Alone

Page 16: Oscon 2013 Jesse Anderson

Data Janitorial

Page 17: Oscon 2013 Jesse Anderson

From the Data: Number of Plays By Yard Line

[Chart: share of plays by field zone, with the direction of offense marked. Labels as extracted: 9%, 41%, 28%, 18%, 3%]

Page 18: Oscon 2013 Jesse Anderson

Stadium

Page 19: Oscon 2013 Jesse Anderson

Figuring Out Stadium

20121119_CHI@SF

Date Played: 20121119 | Away Team: CHI | Home Team: SF
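
The three parts of the game ID can be split out as follows. This is a minimal sketch assuming every ID follows the `yyyyMMdd_AWAY@HOME` shape of the example; the `GameId` class and its field names are hypothetical and not taken from the talk's repository.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: splits a game ID such as "20121119_CHI@SF" into its parts. */
final class GameId {
    final LocalDate datePlayed;
    final String awayTeam;
    final String homeTeam;

    private GameId(LocalDate datePlayed, String awayTeam, String homeTeam) {
        this.datePlayed = datePlayed;
        this.awayTeam = awayTeam;
        this.homeTeam = homeTeam;
    }

    /** Parses "yyyyMMdd_AWAY@HOME"; throws IllegalArgumentException on any other shape. */
    static GameId parse(String id) {
        int underscore = id.indexOf('_');
        int at = id.indexOf('@', underscore + 1);
        if (underscore < 0 || at < 0) {
            throw new IllegalArgumentException("Unexpected game ID: " + id);
        }
        // BASIC_ISO_DATE reads dates written as yyyyMMdd, e.g. 20121119.
        LocalDate date = LocalDate.parse(
                id.substring(0, underscore), DateTimeFormatter.BASIC_ISO_DATE);
        return new GameId(date, id.substring(underscore + 1, at), id.substring(at + 1));
    }
}
```

The home team is the piece that matters for the stadium: it is the key that ties a play to a venue.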

Page 20: Oscon 2013 Jesse Anderson

From the Data: Stadium Attendance

Stadiums with the smallest capacities average the best scores 20.55-17.79

Page 21: Oscon 2013 Jesse Anderson

Stadium Data

Stadium             The capacity of the stadium
Expanded Capacity   The expanded capacity of the stadium
Location            The location of the stadium
Playing Surface     The type of grass, etc. that the stadium has
Is Artificial       Is the playing surface artificial
Team                The name of the team that plays at the stadium
Roof Type           The type of roof in the stadium (None, Retractable, Dome)
Elevation           The elevation of the stadium
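
Attaching these fields to a play comes down to a lookup keyed on the home team. The sketch below is illustrative only: `StadiumLookup` and the `Stadium` type are hypothetical, only a few of the listed fields are included, and it assumes the Team field uses the same abbreviation as the game ID.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical join helper: finds the stadium for a game via its home team. */
final class StadiumLookup {

    /** Assumed row type holding a subset of the stadium fields listed above. */
    static final class Stadium {
        final String team;        // assumed to match the abbreviation used in game IDs
        final int capacity;
        final String roofType;    // None, Retractable or Dome
        final boolean artificial; // whether the playing surface is artificial

        Stadium(String team, int capacity, String roofType, boolean artificial) {
            this.team = team;
            this.capacity = capacity;
            this.roofType = roofType;
            this.artificial = artificial;
        }
    }

    private final Map<String, Stadium> byTeam = new HashMap<>();

    /** Registers a stadium under its team abbreviation. */
    void add(Stadium stadium) {
        byTeam.put(stadium.team, stadium);
    }

    /** Returns the home team's stadium for an ID like "20121119_CHI@SF", or null if unknown. */
    Stadium forGame(String gameId) {
        int at = gameId.lastIndexOf('@');
        if (at < 0) {
            return null;
        }
        return byTeam.get(gameId.substring(at + 1));
    }
}
```

A lookup like this treats every game as played at the home team's own stadium, so games at neutral sites would need separate handling.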

Page 22: Oscon 2013 Jesse Anderson

From the Data: Stadium Elevation

There is a 1% increase in passes at Mile High versus sea-level stadiums

Page 23: Oscon 2013 Jesse Anderson

Weather

1,015 games had weather

Page 24: Oscon 2013 Jesse Anderson

From the Data: Fumble

Games with weather have a fumble 93% of the time compared to 56% without

Page 25: Oscon 2013 Jesse Anderson

Weather Data

STATION        Station identifier
STATION NAME   Station location name
READING DATE   Date of reading
PRCP           Precipitation
AWND           Average daily wind speed
WV20           Fog, ice fog, or freezing fog (may include heavy fog)
TMAX           Maximum temperature
TMIN           Minimum temperature

Page 26: Oscon 2013 Jesse Anderson

From the Data: Home Field Advantage

Baltimore has the biggest weather advantage 22-14

Page 27: Oscon 2013 Jesse Anderson

Arrests

Page 28: Oscon 2013 Jesse Anderson

Arrest Data

Season                      Season the player was arrested in (February to February)
Team                        Team the player played on
Player                      Name of the player arrested
Player Arrested             Was a player in the play arrested that season
Offense Player Arrested     Offense had a player arrested in season
Defense Player Arrested     Defense had a player arrested in season
Home Team Player Arrested   Home team had a player arrested in season
Away Team Player Arrested   Away team had a player arrested in season

Page 29: Oscon 2013 Jesse Anderson

Arrest = Win?

Whenever there are arrests either in the home team, away team or both, the home team wins 57% of the time.

From 2002 to 2012, each team had many arrests, from a low in 2002 of 56% to a high of 91%.

Page 30: Oscon 2013 Jesse Anderson

“The data didn't bear out some of my preconceived notions”

Page 31: Oscon 2013 Jesse Anderson

“If we had every piece of data about a game, could we determine its outcome?”

Page 32: Oscon 2013 Jesse Anderson

The Low Downs

• /me - http://www.jesse-anderson.com
• @jessetanderson
• Code - https://github.com/eljefe6a/nfldata

*I am not in any way affiliated with the NFL or any Team

Page 33: Oscon 2013 Jesse Anderson

Page 34: Oscon 2013 Jesse Anderson

From the Data: Weather

Wind had the most effect on games
At calm winds 41% pass and 37% run
At >30 MPH 34% pass and 46% run

Page 35: Oscon 2013 Jesse Anderson

From the Data: Field Goals

Weather only increases misses by 1%
14% of Field Goals are missed
21% of Field Goals are missed at 30-39 MPH average winds