Baseball statistics and an introduction to R

Source: emeyers.scripts.mit.edu/emeyers/wp-content/uploads/CS149...

Overview

Discussion of Big Data Baseball
Watch half an inning of the 2014 All-Star Game

Review of structured data and classic baseball statistics

Introduction to R!

Discussion of Big Data Baseball chapter 1

Statistics can get us beyond what we can “see” if we trust them (Phillip)
•  Should we just trust the analyses? What about players who have “heart”? (James)
•  How do we make the best decisions using both analysis and human judgment? (Campbell)

How do we find and quantify the relevant data? (on-base percentage, etc.) (Henne)
•  New/different statistics and analyses can give powerful new insights (Aodhan)
•  New computational systems can shed new insights (Kefentse)
•  What is the value of different hit types, e.g., singles vs. home runs? (Julia)

Why didn’t anyone realize that equidistant spacing of defensive players was suboptimal? (Helen)

•  Yes, the defensive changes will be explained more in future chapters (Ian)

How can we make changes that are within our reach? (Maddie)
•  With only a $15 million budget (Sheyla)
•  And taking on challenging situations (Matt)

Rules of the game are the same, but the way players are acquired has changed (Christopher)

2014 All-star game

National                                 American
Order  Player             Position       Order  Player            Position
1      Andrew McCutchen   CF             1      Derek Jeter       SS
2      Yasiel Puig        RF             2      Mike Trout        LF
3      Troy Tulowitzki    SS             3      Robinson Canó     2B
4      Paul Goldschmidt   1B             4      Miguel Cabrera    1B
5      Giancarlo Stanton  DH             5      José Bautista     RF
6      Aramis Ramírez     3B             6      Nelson Cruz       DH
7      Chase Utley        2B             7      Adam Jones        CF
8      Jonathan Lucroy    C              8      Josh Donaldson    3B
9      Carlos Gómez       LF             9      Salvador Pérez    C
       Adam Wainwright    P                     Félix Hernández   P

Score card

Statistics and structured data

statistics: numerical summaries of data
Statistics: the mathematics of collecting, organizing and interpreting data

Describing and summarizing data

Statistics that are used to summarize a dataset (sample of data) are called descriptive statistics
Examples:

•  Maximum value in the dataset
•  Minimum value in the dataset
•  Mean value of the dataset
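
As a minimal sketch, the three descriptive statistics above can be computed in R with the built-in max(), min() and mean() functions. The values reuse the games-per-season numbers shown for card.data$G later in these slides; the object name games is only chosen for illustration.

```r
# Games played per season (values from the card.data$G output later in the slides)
games <- c(9, 41, 59, 112, 89, 63, 87, 28, 48, 35, 1)

max(games)   # largest value in the dataset
min(games)   # smallest value in the dataset
mean(games)  # sum of the values divided by how many values there are
```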

Common baseball descriptive statistics

G = games
•  Number of games a player participated in (out of 162 games in a season)

AB = at bats
•  Number of times a batter was hitting and either got a hit or got out (does not include walks; reaching base on an error does count as an at bat)

R = runs
•  Number of runs the player scored

H = hits
•  Number of times a player hit the ball and got on base or hit a home run (sum of 1B, 2B, 3B, HR)

Common baseball statistics

BB = base on balls (walks)
•  Number of times a player got on base due to the pitcher throwing 4 balls

RBI = runs batted in
•  How many runs scored as a result of a player getting a hit

SB = stolen bases
•  Number of times a runner advanced by ‘stealing a base’

Common derived baseball statistics

AVG = batting average
•  Hits / (At bats) = H / AB = (1B + 2B + 3B + HR) / AB

SLG = slugging percentage
•  (1*1B + 2*2B + 3*3B + 4*HR) / AB
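
A short sketch of how these two derived statistics could be computed in R. The season totals below are made-up numbers for a hypothetical batter, not values from the Lahman data, and the object names are only for illustration. Slugging percentage weights each hit by the number of bases it is worth, so home runs count 4.

```r
# Hypothetical season totals for one batter (made-up numbers)
singles   <- 100
doubles   <- 30
triples   <- 5
home_runs <- 20
at_bats   <- 500

hits <- singles + doubles + triples + home_runs   # H = 1B + 2B + 3B + HR
avg  <- hits / at_bats                            # batting average

# Total bases: each hit weighted by the bases it is worth
total_bases <- 1 * singles + 2 * doubles + 3 * triples + 4 * home_runs
slg <- total_bases / at_bats                      # slugging percentage

avg
slg
```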

Lahman Database – Individual player yearly batting statistics

Cases

Variables

Data taken from the Lahman Batting dataset

Example Dataset – Individual player yearly statistics

Cases

Variables

Categorical and Quantitative Variables

Cases

Categorical Variable    Quantitative Variable

Another Dataset – 2014 Team statistics

Cases

Variables

A Question

Q: What programming language do the pirates use?
A: Arrrr

Q: Worst joke of the semester?
A: Wait and see…

Basics of R

Everyone log on to: https://asterius.hampshire.edu/
Create a new script to keep notes about your work

RStudio layout

1. R Markdown and scripts    3. Environment

2. Console                   4. Files, etc.

RStudio layout

2. Console

R as a calculator
> 2 + 2
> 7 * 5

R Basics

Arithmetic:
> 2 + 2
> 7 * 5

Assignment:

> a <- 4
> b <- 7
> D <- a + b
> D
[1] 11

Number journey…

Number journey

> a <- 7
> b <- 52
> d <- a * b
> d
[1] 364

Character strings and booleans

> a <- 7
> s <- "hello everyone"
> b <- TRUE
> class(a)
[1] "numeric"
> class(s)
[1] "character"
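
The slide stops before checking the boolean. As a small sketch using the same three objects (and assuming the string contains a space, which the extracted slide text does not preserve), class() can be applied to b as well, and the is.*() functions ask the question the other way round:

```r
a <- 7
s <- "hello everyone"
b <- TRUE

class(b)         # "logical" is R's name for the boolean type
is.numeric(a)    # does a hold a number?
is.character(s)  # does s hold a character string?
nchar(s)         # number of characters in the string, spaces included
```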

Functions

Functions use parentheses: functionName(x)
> sqrt(49)
> tolower("HELLO everyone")
To get help
> ?sqrt
One can add comments to your code
> sqrt(49)  # this takes the square root of 49
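
A minimal sketch of the same calls with their results stored in objects, so they can be reused later. The call to round() is an extra illustration, not from the slides, of passing a second argument by name; the object names are arbitrary.

```r
root    <- sqrt(49)                        # square root of 49
lowered <- tolower("HELLO everyone")       # converts every letter to lower case
rounded <- round(3.14159, digits = 2)      # second argument passed by name

root
lowered
rounded
```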

Getting help

You can get help about a function in R using the ? command.

> ?sqrt

Vectors

Vectors are ordered sequences of numbers or letters
The c() function is used to create vectors

> v <- c(5, 232, 5, 543)
One can access elements of a vector using square brackets []
> v[3]  # what will the answer be?
Works with strings too
> z <- c("a", "b", "c", "d")
> z[3]
Can add names to vector elements
> names(v) <- c("first", "second", "third", "fourth")
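
Once a vector has names, elements can be looked up by name as well as by position. A small sketch using the same vector v as above (try predicting each result before running it):

```r
v <- c(5, 232, 5, 543)
names(v) <- c("first", "second", "third", "fourth")

v[3]         # element at position 3 (R counts from 1)
v["fourth"]  # element looked up by its name
v[c(1, 2)]   # several elements at once, using a vector of positions
length(v)    # number of elements in the vector
```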

Question?

Q: What kind of grades did the Pirates get in Statistics class?
A: High Seas

Q: Worst joke of the semester?
A: Stay tuned…

Data types: data frames

Data frames are collections of vectors of the same length.
•  Each vector can have a different type of data
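
As a rough sketch, a data frame can be built by hand with the base R function data.frame(). The players and values below are hypothetical and only illustrate that each column is a vector of the same length but possibly a different type.

```r
# A tiny hand-made data frame (hypothetical values, not real player data)
players <- data.frame(
  name   = c("Player A", "Player B", "Player C"),  # text column
  games  = c(150, 98, 42),                         # numeric column
  rookie = c(FALSE, FALSE, TRUE)                   # logical column
)

nrow(players)  # number of cases (rows)
ncol(players)  # number of variables (columns)
str(players)   # lists each column together with its type
```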

Let’s look at a data frame

Load a function I wrote into R by typing:
source('/home/shared/baseball_stats_2017/baseball_class_functions.R')

If you load this correctly you should have a function in your Global Environment called get.Lahman.batting.data()

Let’s look at a data frame

Use this function to get batting data on a specific player:
> card.data <- get.Lahman.batting.data("Kelly", "Shoppach")
> View(card.data)

Let’s look at a data frame

Getting number of games (G) Kelly played each season:
> card.data$G
[1] 9 41 59 112 89 63 87 28 48 35 1
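
The class function get.Lahman.batting.data() only exists on the course server, so the sketch below builds a stand-in data frame to show how the $ operator works. Only the G values come from the slide; the row column is a hypothetical label added for illustration.

```r
# Stand-in for card.data: the G values are from the slide,
# the row column is a made-up label (1 to 11)
card.data <- data.frame(
  row = 1:11,
  G   = c(9, 41, 59, 112, 89, 63, 87, 28, 48, 35, 1)
)

card.data$G       # the G column as a plain vector
card.data[["G"]]  # an equivalent way to extract the same column
card.data$G[4]    # the fourth value in that column
```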

Computing statistics

One can compute statistics on vectors (columns of a data frame)
> sum(card.data$G)
[1] 572

Or we can assign vectors in a data frame to an object
> games <- card.data$G
> games
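
A sketch of the same idea that runs without the course server. The stand-in data frame below holds only the G values printed on the earlier slide; everything else about the real card.data is left out.

```r
# Stand-in data frame holding only the G values shown on the earlier slide
card.data <- data.frame(G = c(9, 41, 59, 112, 89, 63, 87, 28, 48, 35, 1))

games <- card.data$G        # copy the column into its own object
sum(games)                  # total games across all rows
length(games)               # number of rows (seasons) in the data
sum(games) / length(games)  # average games per row, the same value mean(games) gives
```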

Practice R with DataCamp!

Try chapters 1 and 2 of the Introduction to R DataCamp tutorial:
https://www.datacamp.com/courses/free-introduction-to-r

Read chapter 2 of Big Data Baseball and post a quote and reaction by midnight on Wednesday