algorithm for baseball prediction
Post on 13-Jul-2015
BASEBALL PREDICTION TWITTER PROJECT @HITORNOT_MLB
CONTACT
FB: /JOOYEONK2
E-MAIL: JYSCARDIOID AT GMAIL DOT COM
WEDNESDAY, NOVEMBER 20, 2013
AGENDA
OVERVIEW & OBJECTIVE
PREPARATION FOR TRAINING DATA
ALGORITHM
REAL-TIME DATA RETRIEVAL
STRATEGY TO GAIN TWITTER FOLLOWERS
FUTURE WORK
OVERVIEW
CURRENT # OF FOLLOWERS: 92
TOTAL # OF BROADCASTS: 3 (SINCE THE MIDDLE OF THE WORLD SERIES)
CONCEPT:
I ASSUMED THAT A ‘HIT’ IN A GAME RESULTS FROM THE INTERACTION BETWEEN BATTER AND PITCHER, AND THEREFORE CONSIDERED THE STATISTICS OF BOTH PLAYERS.
OBJECTIVE OF THIS PROJECT
PROVIDING REAL-TIME PREDICTIONS WITH HIGH ACCURACY
• LINEAR REGRESSION, SVR
• AVG, BABIP, K%
• HOME/AWAY, RECENT CONDITION, VS R, L
CREATING A VALUABLE SOCIAL NETWORK FOR MYSELF
• BASEBALL LOVERS HANG OUT
• STATISTICS LOVERS GAINING KNOWLEDGE
GAIN ATTENTION: BIGGER NETWORK
FEEDBACK: BETTER ACCURACY
PREPARATION FOR TRAINING DATA
DATA RETRIEVED FROM WWW.FANGRAPHS.COM
1. GET THE BATTER’S & PITCHER’S STATS
AVG, K%, BABIP FOR EACH OF 4 SPLITS: SEASONAL, RECENT (WITHIN A MONTH), HOME/AWAY, AND VS RIGHTY OR LEFTY
A TOTAL OF (3 STATS * 2 PLAYERS) * 4 SPLITS = 24 FEATURES
EX) WWW.FANGRAPHS.COM/LEADERS.ASPX?POS=ALL&STATS=BAT&LG=ALL&QUAL=Y&TYPE=C,35,23,41&SEASON=2013&MONTH=0&SEASON1=2013&IND=0&TEAM=0&ROST=0&AGE=0&FILTER=&PLAYERS=0
THE DATA CAN BE EXPORTED TO CSV FORMAT FROM THIS PAGE (EXPORT LINK HIGHLIGHTED IN THE SCREENSHOT)
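A minimal sketch of reading such a CSV export, assuming the file has `Name`, `AVG`, `BABIP`, and `K%` columns and that K% may carry a trailing percent sign. The column names, the sample rows, and the helper names `parse_pct` and `load_stats` are illustrative assumptions, not taken from the slides.

```python
import csv
import io

# Placeholder rows in the ASSUMED export layout; real column names may differ.
SAMPLE_CSV = """Name,Team,AVG,BABIP,K%
Batter A,LAD,0.324,0.356,23.2 %
Batter B,BOS,0.301,0.310,15.0 %
"""


def parse_pct(text):
    """Turn '23.2 %' or '23.2' into the float 23.2."""
    return float(text.replace("%", "").strip())


def load_stats(csv_text):
    """Map player name -> (AVG, BABIP, K%) from exported CSV text."""
    stats = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        stats[row["Name"]] = (
            float(row["AVG"]),
            float(row["BABIP"]),
            parse_pct(row["K%"]),
        )
    return stats


season_stats = load_stats(SAMPLE_CSV)
```

The same loader could be run once per split (seasonal, recent, home/away, vs R/L) to collect the four stat tables described above.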
2. GET RESULTS
FANGRAPHS.COM → PLAYER’S PAGE → GAME LOG
EX) BATTER: YASIEL PUIG (RIGHTY), PITCHER: JEFF FRANCIS (LEFTY), HOME/AWAY: HOME FOR DODGERS, RESULT: STRIKEOUT
BOTTOM OF THE INNING: HOME FOR BATTER
TAKE THIS INFO FROM THE GAME LOG!
3. COMBINE 1 AND 2 TO MAKE FEATURES
EX) YASIEL PUIG VS JOSE FERNANDEZ AT DODGER STADIUM
X = [SEASONAL | HOME/AWAY | VS R, L | RECENT | CONSTANT]
EACH OF THE 4 SPLITS HOLDS AVG, BABIP, K% FOR THE BATTER, THEN AVG, BABIP, K% FOR THE PITCHER (4*2*3=24 VALUES), FOLLOWED BY A CONSTANT 1
EX) X = (0.324, 0.356, 23.2, ..., 1): AVG, BABIP, K%, ..., CONSTANT
Y = 100 FOR HIT, 0 FOR OUT
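The feature layout above can be sketched as follows. This is an illustration under assumptions: the split keys, the function names `build_features` and `label`, and the dictionary input format are hypothetical, and the numbers in the usage note are placeholders rather than real player stats.

```python
# Order of the four splits in the feature vector (key names are assumptions).
SPLITS = ("seasonal", "home_away", "vs_hand", "recent")


def build_features(batter, pitcher):
    """Build X for one batter/pitcher matchup.

    batter and pitcher are dicts mapping each split name to an
    (AVG, BABIP, K%) tuple. The result holds 24 stats plus a constant 1.
    """
    x = []
    for split in SPLITS:
        x.extend(batter[split])   # batter AVG, BABIP, K% for this split
        x.extend(pitcher[split])  # pitcher AVG, BABIP, K% for this split
    x.append(1.0)                 # constant term
    return x


def label(is_hit):
    """Y as defined on the slide: 100 for a hit, 0 for an out."""
    return 100.0 if is_hit else 0.0
```

For example, passing a batter whose seasonal split is `(0.324, 0.356, 23.2)` yields a 25-element vector whose first three entries are those values and whose last entry is `1.0`.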
ALGORITHM FOR PREDICTION
TWO CANDIDATES: SUPPORT VECTOR REGRESSION (SVR) AND LINEAR REGRESSION
OPTIMIZATION FOR SVR (RBF KERNEL): GRID SEARCH
MEASUREMENT OF ERROR: MEAN SQUARED ERROR
IN-SAMPLE ERROR: SVR MUCH BETTER THAN LINEAR REGRESSION (1021.8 VS 2229.8)
OUT-OF-SAMPLE ERROR: SVR HAD A SLIGHT EDGE ON LINEAR REGRESSION (2230.3 VS 2258.3)
※BASELINE: ALWAYS PREDICT 50% REGARDLESS OF THE GIVEN CONDITIONS
\[ \mathrm{Error} = \frac{1}{N} \sum_{i=1}^{N} (y_i - p_i)^2 \]
WHERE y_i IS THE ACTUAL RESULT AND p_i IS THE PREDICTED OUTCOME. BOTH RANGE FROM 0 TO 100
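A short sketch of this error measure, included to show why the baseline always scores 2500: with outcomes of 0 or 100 and a constant prediction of 50, every squared difference is 50^2. The function name `mse` and the sample outcomes are illustrative.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Placeholder outcomes: 100 for a hit, 0 for an out.
outcomes = [100, 0, 0, 100, 0]

# Baseline: always predict 50 regardless of the matchup.
baseline = [50] * len(outcomes)
baseline_error = mse(outcomes, baseline)  # 2500.0 for any mix of 0s and 100s
```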
MSE            BASELINE  LIN-REG  SVR
IN-SAMPLE      2500.0    2229.8   1021.8
OUT-OF-SAMPLE  2500.0    2258.3   2230.3
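The comparison could be reproduced along these lines with scikit-learn. This is a sketch under assumptions, not the author's code: the data is synthetic, the grid values and the 3-fold cross-validation are arbitrary choices, and the constant feature is omitted because `LinearRegression` fits an intercept itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the 24 stats; real features would come from the CSV data.
rng = np.random.default_rng(0)
X = rng.random((200, 24))
p_hit = 0.2 + 0.3 * X[:, 0]                       # made-up hit probability
y = np.where(rng.random(200) < p_hit, 100.0, 0.0)  # 100 for hit, 0 for out

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate 1: ordinary linear regression.
lin = LinearRegression().fit(X_train, y_train)

# Candidate 2: RBF-kernel SVR tuned by grid search (grid values are assumptions).
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0], "epsilon": [1.0, 10.0]}
search = GridSearchCV(
    SVR(kernel="rbf"), param_grid, scoring="neg_mean_squared_error", cv=3
)
search.fit(X_train, y_train)
svr = search.best_estimator_

# In-sample and out-of-sample MSE for both candidates.
results = {}
for name, model in (("lin-reg", lin), ("svr", svr)):
    results[name] = {
        "in_sample": mean_squared_error(y_train, model.predict(X_train)),
        "out_of_sample": mean_squared_error(y_test, model.predict(X_test)),
    }
```

A large gap between in-sample and out-of-sample error, like the one reported for SVR in the table, is the usual sign of overfitting, which is why the out-of-sample row is the one to compare against the baseline.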
REAL-TIME DATA RETRIEVAL AND PREDICTION
1. USE THE TEXT DATA BROADCAST BY WWW.SPORTS.YAHOO.COM/MLB/
2. RETRIEVE LOCATION (HOME/AWAY), CURRENT PITCHER, AND BATTER’S NAME TO BUILD THE FEATURE VECTOR FOR EVERY AT-BAT (EVERY UPDATE)
3. USE THE TWITTER API TO AUTOMATICALLY REPORT THE RESULT TO TWITTER
FINAL RESULT!
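The three steps above can be sketched as one function per update. Everything here is hypothetical: `predict` stands in for the trained model plus feature lookup, `post` stands in for whatever Twitter API call is used, the message wording is invented, and clamping the prediction to 0..100 is an added assumption since a regression output can fall outside that range.

```python
def format_tweet(batter, pitcher, hit_chance):
    """Compose the message text for one at-bat."""
    chance = max(0.0, min(100.0, hit_chance))  # clamp to 0..100 (assumption)
    text = f"{batter} vs {pitcher}: {chance:.0f}% chance of a hit"
    return text[:140]                          # tweet length limit at the time


def report_at_bat(update, predict, post):
    """Handle one live update: predict, format, and publish.

    update  -- dict with 'batter', 'pitcher', 'batter_is_home' (assumed keys)
    predict -- callable returning a hit chance on the 0..100 scale
    post    -- callable that publishes a string (e.g. a Twitter API wrapper)
    """
    chance = predict(update["batter"], update["pitcher"], update["batter_is_home"])
    message = format_tweet(update["batter"], update["pitcher"], chance)
    post(message)
    return message


# Usage with stand-ins: a fixed placeholder prediction and a list as the "post" sink.
sent = []
report_at_bat(
    {"batter": "Yasiel Puig", "pitcher": "Jose Fernandez", "batter_is_home": True},
    predict=lambda batter, pitcher, at_home: 31.6,  # placeholder, not a real prediction
    post=sent.append,
)
```

Passing `predict` and `post` in as arguments keeps the sketch independent of any particular model or Twitter client library.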
STRATEGY TO GAIN TWITTER FOLLOWERS
THE SERVICE WAS LAUNCHED IN THE MIDDLE OF THE WORLD SERIES BETWEEN THE RED SOX AND THE CARDINALS
THE TARGETS WERE 3 GROUPS: CARDINALS FANS, RED SOX FANS, AND PEOPLE WHO ARE INTERESTED IN STATISTICS AND MACHINE LEARNING
SEARCHING KEYWORDS → FOLLOWING USERS WHO ARE LIKELY TO FOLLOW BACK
USED HASHTAGS SUCH AS #WORLDSERIES, #STLCARDS, #REDSOX