Data mining on /r/nba
Source: eecs.csuohio.edu/~sschung/cis660/data mining final...

Post on 10-Apr-2018


DATA MINING ON /R/NBA

ALEX CHENGELIS AND ANDREW YU

INTRODUCTION

• Raw data processing using an API

• Data processing and storage

• Comment heat maps

• Comment scores based on game action

• Word counting

• Naïve Bayes classifier

TECHNOLOGIES USED

• Python

• NLTK

• PRAW

• Tableau

• CSV and a little Excel

DATA PREPROCESSING

• Gather data using PRAW

• Create an agent for use with Reddit’s API

• Gather URLs to cycle through

• Write the comment, flair, and score to a CSV file
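The steps above could be sketched roughly as follows. This is a minimal illustration, not the authors' script: it assumes the current PRAW interface (`praw.Reddit`, `reddit.submission(url=...)`, `comments.replace_more`), and the credentials, the helper names, and the demo comments are all hypothetical placeholders.

```python
import csv
import io
from types import SimpleNamespace


def make_reddit(client_id, client_secret, user_agent):
    """Create the Reddit API agent (all three arguments are placeholders)."""
    import praw  # imported lazily so the CSV helper below works without PRAW
    return praw.Reddit(client_id=client_id,
                       client_secret=client_secret,
                       user_agent=user_agent)


def fetch_comments(reddit, url):
    """Return every comment of one thread as a flat list."""
    submission = reddit.submission(url=url)
    submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
    return submission.comments.list()


def write_comments(comments, out_file):
    """Write one CSV row per comment: body, flair, score."""
    writer = csv.writer(out_file)
    writer.writerow(["comment", "flair", "score"])
    for c in comments:
        writer.writerow([c.body, c.author_flair_text or "", c.score])


# Offline demonstration with stand-in comment objects (made up, not real data).
demo = [
    SimpleNamespace(body="What a block!", author_flair_text="Cavaliers", score=12),
    SimpleNamespace(body="Good game", author_flair_text=None, score=3),
]
buffer = io.StringIO()
write_comments(demo, buffer)
```

With real credentials, cycling through the gathered URLs would amount to calling `fetch_comments` once per URL and passing each result to `write_comments`.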

WHAT OUR DATA LOOKS LIKE

VISUALIZATION OF COMMENTS

| Team | City | State | Count | Score | Avg |
| --- | --- | --- | --- | --- | --- |
| Lakers | Los Angeles | California | 461 | 7982 | 17.31 |
| Hornets | Charlotte | North Carolina | 124 | 3293 | 26.56 |
| Celtics | Boston | Massachusetts | 337 | 9083 | 26.95 |
| Nuggets | Denver | Colorado | 98 | 3175 | 32.40 |
| Nets | Brooklyn | New York | 65 | 6062 | 93.26 |
| Bucks | Milwaukee | Wisconsin | 96 | 1034 | 10.77 |
| Pelicans | New Orleans | Louisiana | 66 | 222 | 3.36 |
| Bulls | Chicago | Illinois | 320 | 7364 | 23.01 |
| NBA | | | 170 | 2077 | 12.22 |
| Warriors | Oakland | California | 310 | 6312 | 20.36 |
| Pistons | Detroit | Michigan | 110 | 1859 | 16.90 |
| 76ers | Philadelphia | Pennsylvania | 153 | 4922 | 32.17 |
| Hawks | Atlanta | Georgia | 104 | 4071 | 39.14 |
| Suns | Phoenix | Arizona | 107 | 295 | 2.76 |
| Huskies | Hartford | Connecticut | 10 | 52 | 5.20 |
| Grizzlies | Memphis | Tennessee | 113 | 408 | 3.61 |
| Wizards | Washington, D.C. | | 123 | 1611 | 13.10 |
| West | | | 19 | 252 | 13.26 |
| Mavericks | Dallas | Texas | 100 | 3601 | 36.01 |
| Heat | Miami | Florida | 282 | 10087 | 35.77 |
| Rockets | Houston | Texas | 212 | 4929 | 23.25 |
| Raptors | Toronto | | 362 | 9086 | 25.10 |
| Kings | Sacramento | California | 99 | 1811 | 18.29 |
| Supersonics | Seattle | Washington | 126 | 4173 | 33.12 |
| Pacers | Indianapolis | Indiana | 61 | 147 | 2.41 |
| USA | | | 10 | 22 | 2.20 |
| Blazers | Portland | Oregon | 157 | 2471 | 15.74 |
| Thunder | Oklahoma City | Oklahoma | 276 | 11555 | 41.87 |
| Clippers | Los Angeles | California | 138 | 3935 | 28.51 |
| Cavaliers | Cleveland | Ohio | 960 | 15608 | 16.26 |
| Spurs | San Antonio | Texas | 310 | 3705 | 11.95 |
| Timberwolves | Minneapolis | Minnesota | 168 | 6164 | 36.69 |
| Knicks | New York | New York | 324 | 4403 | 13.59 |
| East | | | 13 | 112 | 8.62 |
| Bandwagon | | | 227 | 6568 | 28.93 |
| Jazz | Salt Lake City | Utah | 46 | 1649 | 35.85 |
| Magic | Orlando | Florida | 66 | 426 | 6.45 |
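The Count, Score, and Avg columns above can be derived from the comment CSV by grouping on flair. A minimal sketch, assuming the CSV has `comment`, `flair`, and `score` columns (the column names, the function name, and the demo rows are illustrative, not taken from the slides):

```python
import csv
import io
from collections import defaultdict


def summarize_by_flair(rows):
    """Return {flair: (comment count, total score, average score)}."""
    totals = defaultdict(lambda: [0, 0])  # flair -> [count, score sum]
    for row in rows:
        entry = totals[row["flair"]]
        entry[0] += 1
        entry[1] += int(row["score"])
    return {flair: (count, score, round(score / count, 2))
            for flair, (count, score) in totals.items()}


# Made-up rows standing in for the real CSV file.
demo_csv = io.StringIO(
    "comment,flair,score\n"
    "What a block!,Cavaliers,12\n"
    "Believeland,Cavaliers,18\n"
    "Good game,Warriors,3\n"
)
summary = summarize_by_flair(csv.DictReader(demo_csv))
```

The Avg column is simply Score divided by Count; for the Lakers row, 7982 / 461 rounds to 17.31.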

GAME 1 COMMENTS

GAME 1 COMMENT SCORES

GAME 2 COMMENTS

GAME 2 COMMENT SCORES

GAME 3 COMMENTS

GAME 3 COMMENT SCORES

GAME 4 COMMENTS

GAME 4 COMMENT SCORES

GAME 5 COMMENTS

GAME 5 COMMENT SCORES

GAME 6 COMMENTS

GAME 6 COMMENT SCORES

GAME 7 – CLEVELAND CHAMPS


USING TIME VARIANT

DOING SOME TEXT MINING

WHAT WE DID WITH WORDS

• Tried an inverted index but ran into problems.

• More than 50,000 comments

• Fell back on a simpler term-frequency count, ignoring the 100 most common English words.

Word Count

game 728

lebron 641

just 402

him 338

warriors 305

cavs 303

curry 287

com 254

team 253

love 236

3 222

he's 222

finals 218

fuck 214

had 205

even 203

nba 203

think 199

shit 197

it's 195

win 190

got 188

i'm 184

best 184

http 176

's 174

don't 173

7 172

series 171

cleveland 166

fucking 165

did 163

kyrie 163

good 162

after 158

back 158

player 157

ever 157

draymond 155

last 153

too 153
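A term-frequency count like the one behind the table above can be sketched with `collections.Counter`. The ignore list here is a short hypothetical stand-in for the 100-word list mentioned in the slides, and the tokenizing regular expression and sample comments are assumptions as well.

```python
import re
from collections import Counter

# Hypothetical stand-in for the 100 most common English words.
COMMON_WORDS = {"the", "a", "is", "and", "to", "of", "in", "that", "it", "for"}


def term_frequencies(comments, ignore=COMMON_WORDS):
    """Count how often each word occurs across all comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase, then keep runs of letters, digits, and apostrophes.
        words = re.findall(r"[a-z0-9']+", comment.lower())
        counts.update(w for w in words if w not in ignore)
    return counts


# Made-up sample comments.
sample = ["LeBron is the best", "The game is over", "What a game, LeBron!"]
counts = term_frequencies(sample)
```

Calling `counts.most_common(40)` on the real data would give a ranked list of the same shape as the table.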

CLEVELAND WINS WORD CLOUD

GOLDEN STATE WINS WORD CLOUD

CAN YOU DETERMINE WHO WON BASED ON A COMMENT?

NAÏVE BAYES CLASSIFIER - BASED ON GUIDE BY ANDY BROMBERG

HTTP://ANDYBROMBERG.COM/SENTIMENT-ANALYSIS-PYTHON/

HOW WE BUILT THE NAÏVE BAYES CLASSIFIER

• Used the same Cleveland Wins and Golden State Wins text files.

• Much like positive/negative sentiment analysis, but with wins as the labels.

• Use ¾ of the comments for training and ¼ for testing.

• Strip all punctuation and escape characters.
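The cleaning and splitting steps above might look like the sketch below. The function names are hypothetical, and the split simply cuts the list at the ¾ mark; whether the original shuffled the comments first is not stated in the slides.

```python
import string


def clean(comment):
    """Lowercase, strip punctuation, and collapse newlines/tabs to spaces."""
    no_punct = comment.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def split_train_test(comments, train_fraction=0.75):
    """Return (training comments, testing comments) cut at train_fraction."""
    cutoff = int(len(comments) * train_fraction)
    return comments[:cutoff], comments[cutoff:]


cleaned = clean("HOLY\n block!!! It's LeBron.")
train, test = split_train_test(list(range(8)))
```

Here `cleaned` is `"holy block its lebron"`, and eight comments split into six for training and two for testing.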

CONT.

• We call the classifier included with NLTK, then initialize the reference and test sets and populate them.

• Before this, we created a function that uses a chi-square test to score each word.

• Finally, we use the classifier to make predictions.
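A minimal sketch of the training and evaluation steps, using NLTK's `NaiveBayesClassifier` with bag-of-words features. It requires the `nltk` package; the comments, the labels `"cle"` and `"gsw"`, and the helper `word_features` are made up for illustration and stand in for the two text files.

```python
import collections

from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy
from nltk.metrics.scores import precision


def word_features(comment):
    """Bag-of-words features: every word in the comment maps to True."""
    return {word: True for word in comment.split()}


# Toy comments labelled by which team won the game they were posted about.
cle_train = ["lebron block was huge", "kyrie hit the shot",
             "lebron and kyrie were great"]
gsw_train = ["curry from deep range", "splash brothers on fire",
             "curry and klay splash again"]
train_set = ([(word_features(c), "cle") for c in cle_train] +
             [(word_features(c), "gsw") for c in gsw_train])
test_set = [(word_features("lebron shot"), "cle"),
            (word_features("curry range"), "gsw")]

classifier = NaiveBayesClassifier.train(train_set)

# Reference sets hold the true labels, test sets the predicted ones.
reference_sets = collections.defaultdict(set)
test_sets = collections.defaultdict(set)
for i, (features, label) in enumerate(test_set):
    reference_sets[label].add(i)
    test_sets[classifier.classify(features)].add(i)

test_accuracy = accuracy(classifier, test_set)
cle_precision = precision(reference_sets["cle"], test_sets["cle"])
```

On real data, `word_features` would be restricted to the best-scoring words before training, which is how the "N best" rows in the results differ from "All Words".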

RESULTS

| Features | Accuracy |
| --- | --- |
| All words | 57.713% |
| 10 best | 55.771% |
| 100 best | 60.302% |
| 1,000 best | 66.235% |
| 10,000 best | 60.949% |
| 15,000 best | 58.360% |
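The "N best" rows restrict the features to the N words with the highest chi-square scores. The sketch below hand-rolls the standard 2×2 chi-square statistic on word counts; it is an illustration under that assumption, not the authors' function, and the token lists are made up.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_words(cle_tokens, gsw_tokens, n_best):
    """Return the n_best words most strongly associated with either label."""
    cle_counts, gsw_counts = Counter(cle_tokens), Counter(gsw_tokens)
    cle_total, gsw_total = len(cle_tokens), len(gsw_tokens)
    scores = {}
    for word in set(cle_counts) | set(gsw_counts):
        a, b = cle_counts[word], gsw_counts[word]
        # a, b: occurrences of this word per label; c, d: all other tokens.
        scores[word] = chi_square(a, b, cle_total - a, gsw_total - b)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:n_best])


# Made-up token lists standing in for the two text files.
cle_tokens = "lebron block lebron game win lebron".split()
gsw_tokens = "curry splash game curry win curry".split()
top_two = best_words(cle_tokens, gsw_tokens, 2)
```

Words that appear equally often under both labels, such as "game" here, score 0 and are the first to be dropped. The table suggests a trade-off: too few words discard signal, while too many let uninformative words back in.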

INTERESTING RESULTS

• Shaun Livingston

• Bench player for the Warriors

• If his name appears in a comment:

• 95.28% chance that the Warriors won

INTERESTING RESULTS

• Harrison Barnes

• Part-time starter, part-time bench player, full-time punching bag

• If his name appears in a comment:

• 94.68% chance that Cleveland won

INTERESTING RESULTS

• Kyrie and LeBron

• In Game 5, both scored 41 points

• If "41" appears in a comment:

• 93.24% chance that Cleveland won

MOST TELLING WORDS FOR BOTH TEAMS

Warriors Most Useful

| Word | Chance |
| --- | --- |
| Shaun | 95.28% |
| fired | 92.91% |
| range | 92.00% |
| Thunder | 91.80% |
| healthy | 91.67% |
| talent | 90.74% |
| splash | 89.25% |

Cleveland Most Useful

| Word | Chance |
| --- | --- |
| Harrison | 94.48% |
| 41' | 93.24% |
| Sunday | 92.37% |
| tweet | 90.74% |
| road | 89.69% |
| calls | 90.29% |
| mad | 90.29% |
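The slides do not say how the percentages above were computed. One plausible reading, shown here purely as an assumption, is the share of comments containing a word that come from each winner's file. The function name and the sample comments are hypothetical.

```python
def win_chance_given_word(word, cle_comments, gsw_comments):
    """Share of comments containing `word` that fall under each winner."""
    in_cle = sum(word in c.lower().split() for c in cle_comments)
    in_gsw = sum(word in c.lower().split() for c in gsw_comments)
    total = in_cle + in_gsw
    if total == 0:
        return None  # the word never occurs, so no estimate is possible
    return {"Cleveland": in_cle / total, "Golden State": in_gsw / total}


# Made-up comments standing in for the two text files.
cle_comments = ["harrison missed again", "lebron with the block"]
gsw_comments = ["harrison barnes scores", "shaun livingston dunk", "shaun again"]
shaun = win_chance_given_word("shaun", cle_comments, gsw_comments)
harrison = win_chance_given_word("harrison", cle_comments, gsw_comments)
```

An estimate like this is only meaningful for words that occur often enough; a word seen a handful of times can reach a very high percentage by chance.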

ADDING TIME TO THE EQUATION

HTML SOURCE CODE

CSV TABLE

DATA PREPROCESSING (PYTHON, EXCEL, R)

DATA VISUALIZATION (R) - GAME 1

DATA VISUALIZATION (R) - GAME 2

DATA VISUALIZATION (R) - GAME 3

DATA VISUALIZATION (R) - GAME 4

DATA VISUALIZATION (R) - GAME 5

DATA VISUALIZATION (R) - GAME 6

DATA VISUALIZATION (R) - GAME 7

GAME 7 IN BROADCAST TIME (START @ 8 PM)

REDDIT.COM/R/NBA GAMETHREAD COMMENT DENSITY

COMMENT DENSITY: 10:28:20 – 10:31:40 ET

WOW

I think i speak for the free world when I say: go not GSW

I LOVE YOU BRON BRON

Hollllly s***

HOLY s*** LEBRON

Barnes scarred of the moment

Every time KLove bricks a shot, an angel gets its wings.

Im watching a really laggy stream bro and im behind

Holy F*** Lebron

NO REGARD FOR HUMAN LIFE

HOLY F***ING s***

NAAAAAAAH GET THE F*** OUT!

OH MY GOD

HILY DUCK

HOLY s*** THAT BLOCK

Omfg.....

DAE RIGGED

Where is the Love? The Love. The Loooove....

Holy s*** this game.

HOLY s***

Why is my heart pounding!?

OH s*** ITS DAT BRON

JAMES!!

OH MY GOD

WOW

lebron!!!

HOW THE F***

WOWOW

this defense is so sexy

Can anyone hit a shot
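Comment density over broadcast time amounts to counting comments per fixed-width time bin. The slides did this in R; the sketch below uses Python for illustration. The 100-second bin width, the function name, and the timestamps are assumptions, with only the 8 PM start taken from the slides.

```python
from collections import Counter
from datetime import datetime, timedelta


def comment_density(timestamps, start, bin_seconds=100):
    """Count comments per bin, keyed by each bin's start time."""
    counts = Counter()
    for ts in timestamps:
        offset = int((ts - start).total_seconds())
        if offset < 0:
            continue  # posted before the broadcast began
        bin_start = start + timedelta(seconds=offset - offset % bin_seconds)
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Illustrative timestamps only; the date is just needed to build datetimes.
game_start = datetime(2016, 6, 19, 20, 0, 0)
stamps = [
    datetime(2016, 6, 19, 22, 28, 25),
    datetime(2016, 6, 19, 22, 29, 0),
    datetime(2016, 6, 19, 22, 31, 39),
    datetime(2016, 6, 19, 22, 31, 45),
]
density = comment_density(stamps, game_start)
```

Plotting the bin counts against bin start times gives the density curve, and the comments inside a peak bin can then be listed as in the sample above.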
