Data Analysis of Tennis Matches

Posted on 22-Feb-2016


Data Analysis of Tennis Matches

Fatih Çalışır

Domain of the Data
4 Types of Tennis Tournaments

1. ATP World Tour 250: ATP 250 Brisbane, ATP 250 Sydney, ...
2. ATP World Tour 500: ATP 500 Memphis, ATP 500 Dubai
3. ATP World Tour 1000: ATP 1000 Paris, ATP 1000 Shanghai, ...
4. Grand Slams: Australian Open, Roland Garros, Wimbledon, US Open

Domain of the Data
• Men’s Singles
• Year 2010
• 11 ATP 500 Tournaments
• 9 ATP 1000 Tournaments
• 4 Grand Slams


Source of Data
• Internet
• Official Websites of the Players
• ATP (Association of Tennis Professionals) Home Page
• 2010 Result Archive

Data Construction
• Built from different tables
• Each table comes from a different website
• Tables combine easily

Data Construction
• Players Table

Data Construction
• Tournament Results Table

Data Construction
• Tournament Info Table

Data Construction
• Final Data Table
• 29 features
• 1453 instances
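The slides do not show how the three tables were combined. A minimal sketch of one way to do it, assuming each match result can be joined to the players table by player name and to the tournament info table by tournament name. All names, columns, and values below are hypothetical placeholders, not the project's data.

```python
# Hypothetical miniature versions of the three source tables.
players = {
    "Player A": {"height_cm": 185, "weight_kg": 85},
    "Player B": {"height_cm": 188, "weight_kg": 80},
}
tournaments = {
    "Tournament X": {"category": "ATP 500", "surface": "Hard"},
}
results = [
    {"tournament": "Tournament X", "player": "Player A",
     "opponent": "Player B", "won": True},
]


def build_final_table(results, players, tournaments):
    """Join each match result with player and tournament attributes."""
    rows = []
    for match in results:
        row = dict(match)
        # Prefix player attributes so both players' features can coexist.
        for key, value in players[match["player"]].items():
            row[f"player_{key}"] = value
        for key, value in players[match["opponent"]].items():
            row[f"opponent_{key}"] = value
        # Tournament attributes are shared by both players.
        row.update(tournaments[match["tournament"]])
        rows.append(row)
    return rows


final_table = build_final_table(results, players, tournaments)
```

Each output row is one instance; its keys play the role of the features in the final data table.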

Aim of the Project
• Classification
• Finding weights for attributes

Missing Values
• Players’ Height
• Players’ Weight
• Players’ BMI
• Players’ Date of Turning Professional

Missing Values
• Players’ Height: consider players with the same weight, take the average
• Players’ Weight: consider players with the same height, take the average

Missing Values
• Players’ Height and Weight: if both are missing, remove the row
• Players’ Date of Turning Professional: consider players of the same age, take the average
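The imputation rules above can be sketched as follows. This is an illustration under assumptions: the column names (`height`, `weight`, `age`, `turned_pro`) are hypothetical, the date of turning professional is treated as a year, and group averages are computed from observed values only. A value stays missing when no other player shares the grouping key.

```python
from statistics import mean


def group_means(rows, target, key):
    """Average of `target` over rows that share the same `key` value."""
    groups = {}
    for row in rows:
        if row[target] is not None and row[key] is not None:
            groups.setdefault(row[key], []).append(row[target])
    return {k: mean(v) for k, v in groups.items()}


def handle_missing(rows):
    # Rule 1: drop rows where both height and weight are missing.
    kept = [dict(r) for r in rows
            if r["height"] is not None or r["weight"] is not None]
    # Rule 2: (attribute to fill, attribute that defines "similar" players).
    rules = [("height", "weight"), ("weight", "height"), ("turned_pro", "age")]
    # Compute every group average before filling, so imputed values
    # never feed into another average.
    means = {target: group_means(kept, target, key) for target, key in rules}
    for row in kept:
        for target, key in rules:
            if row[target] is None:
                row[target] = means[target].get(row[key])
    return kept


# Hypothetical rows, not taken from the project's data.
rows = [
    {"name": "A", "height": 185, "weight": 80, "age": 25, "turned_pro": 2003},
    {"name": "B", "height": 191, "weight": 80, "age": 25, "turned_pro": 2005},
    {"name": "C", "height": None, "weight": 80, "age": 25, "turned_pro": None},
    {"name": "D", "height": None, "weight": None, "age": 30, "turned_pro": 2000},
]
cleaned = handle_missing(rows)
```

Here player C receives the average height of the other 80 kg players and the average turning-pro year of the other 25-year-olds, while player D is removed.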

Data Understanding
• Min, max, median, and average values for numeric attributes
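A small sketch of these four summary values for one numeric attribute, using only the Python standard library. The height values are hypothetical and stand in for any numeric column.

```python
from statistics import mean, median


def summarize(values):
    """Min, max, median, and average of a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "average": mean(values),
    }


heights_cm = [178, 183, 185, 188, 196]  # hypothetical values
summary = summarize(heights_cm)
```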

Data Understanding
• Occurrence table for categorical and numeric attributes
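An occurrence table counts how often each distinct value appears. A minimal sketch with `collections.Counter`; the surface values are hypothetical examples of a categorical attribute.

```python
from collections import Counter

surfaces = ["Hard", "Clay", "Hard", "Grass", "Hard", "Clay"]  # hypothetical

occurrences = Counter(surfaces)
# Print values from most to least frequent.
for value, count in occurrences.most_common():
    print(f"{value}: {count}")
```

The same call works for a numeric attribute, although binning the values first usually gives a more readable table.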

Data Understanding
• Histogram for numeric attributes

Data Understanding
• Box plot for the main characteristics of numeric attributes

Data Understanding
• Scatter plot to relate two attributes

Feature Selection
• Linear Correlation
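Linear correlation between two numeric attributes is usually measured with the Pearson coefficient. The slides do not say how the coefficient was used; a common approach, assumed here, is to drop one attribute from any pair whose absolute correlation is close to 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values: the second list is an exact multiple of the first,
# so the two attributes carry the same information.
r_redundant = pearson([1, 2, 3], [2, 4, 6])
r_opposite = pearson([1, 2, 3], [3, 2, 1])
```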

Feature Selection
• Backward Elimination
• Naive Bayes for Ranking
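One plausible reading of this slide is a wrapper approach: start with all attributes, repeatedly drop the attribute whose removal hurts a Naive Bayes classifier least, and stop when accuracy would fall. The sketch below follows that reading; the categorical Naive Bayes with Laplace smoothing, the stopping rule, and the toy attributes are all assumptions, not details from the slides.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels, features):
    """Fit a categorical Naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> value counts
    values_seen = defaultdict(set)       # feature -> distinct values
    for row, label in zip(rows, labels):
        for f in features:
            value_counts[(f, label)][row[f]] += 1
            values_seen[f].add(row[f])
    return class_counts, value_counts, values_seen


def predict_nb(model, row, features):
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total)
        for f in features:
            # Laplace smoothing avoids zero probabilities.
            numerator = value_counts[(f, label)][row[f]] + 1
            score += log(numerator / (count + len(values_seen[f])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(train, test, features):
    (train_rows, train_labels), (test_rows, test_labels) = train, test
    model = train_nb(train_rows, train_labels, features)
    hits = sum(predict_nb(model, r, features) == y
               for r, y in zip(test_rows, test_labels))
    return hits / len(test_rows)


def backward_elimination(train, test, features):
    selected = list(features)
    best = accuracy(train, test, selected)
    while len(selected) > 1:
        # Score every candidate subset obtained by dropping one feature.
        candidates = [
            (accuracy(train, test, [f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(candidates)
        if score < best:
            break  # every removal hurts accuracy, so stop
        best = score
        selected.remove(drop)
    return selected, best


# Hypothetical toy data: "higher_rank" decides the label, "noise" does not.
rows = [
    {"higher_rank": "yes", "noise": "a"},
    {"higher_rank": "yes", "noise": "b"},
    {"higher_rank": "no", "noise": "a"},
    {"higher_rank": "no", "noise": "b"},
]
labels = ["win", "win", "loss", "loss"]
selected, best = backward_elimination((rows, labels), (rows, labels),
                                      ["higher_rank", "noise"])
```

The toy example evaluates on the training rows only to stay short; a real run would score each subset on held-out data.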

Feature Selection
• 28 attributes reduced to 19 attributes
• The remaining attributes are meaningful

Weight of Attributes
• RIMARC to find weights
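The slides do not show how RIMARC derives its weights. The sketch below assumes the weighting usually described for it: rank instances by a single attribute, measure the area under the ROC curve (AUC) of that ranking, and set the weight to 2 × (AUC − 0.5), so an uninformative attribute gets 0 and a perfectly separating one gets 1. Treat the formula and the helper names as assumptions rather than the project's exact procedure.

```python
def feature_auc(values, labels):
    """AUC when instances are ranked by the positive rate of their value."""
    totals, positives = {}, {}
    for v, is_positive in zip(values, labels):
        totals[v] = totals.get(v, 0) + 1
        positives[v] = positives.get(v, 0) + (1 if is_positive else 0)
    # Score each instance by the share of positives among its value.
    scores = [positives[v] / totals[v] for v in values]
    pos_scores = [s for s, y in zip(scores, labels) if y]
    neg_scores = [s for s, y in zip(scores, labels) if not y]
    # AUC = probability that a positive outranks a negative (ties count half).
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def auc_based_weight(values, labels):
    # Assumed RIMARC-style weight: 0 for a random ranking, 1 for a perfect one.
    return 2 * (feature_auc(values, labels) - 0.5)


# Hypothetical attributes: one separates the classes, one does not.
labels = [True, True, False, False]
w_informative = auc_based_weight(["a", "a", "b", "b"], labels)
w_uninformative = auc_based_weight(["a", "b", "a", "b"], labels)
```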

Classification
• KNIME
• Decision Tree – C4.5
• Gain Ratio Quality Measure
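Gain ratio is the split criterion associated with C4.5: the information gain of a split divided by the split information (the entropy of the attribute's own value distribution), which penalizes attributes with many distinct values. A minimal sketch for a categorical attribute, with hypothetical toy values:

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * log2(c / total)
                for c in Counter(items).values())


def gain_ratio(values, labels):
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]
ratio_informative = gain_ratio(["a", "a", "b", "b"], labels)
ratio_uninformative = gain_ratio(["a", "b", "a", "b"], labels)
```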

Classification
• 1017 instances for training
• 436 instances for testing
• 842 positive instances
• 611 negative instances
• Training and test data are randomly selected
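These counts amount to a split of roughly 70% training and 30% testing over the 1453 instances (1017 + 436), and always predicting the positive class would be right for about 58% of them (842 / 1453). A minimal sketch of such a random split; the seed and the function name are assumptions, and the label list only mimics the class counts.

```python
import random


def random_split(instances, n_train, seed=0):
    """Shuffle a copy of the instances and cut it into train and test parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in labels with the class counts from the slide: 842 positive, 611 negative.
labels = [1] * 842 + [0] * 611
train, test = random_split(labels, n_train=1017)
```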

Classification
• Decision Tree

Classification
• Confusion Matrix

Classification
• Accuracy Statistics
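The confusion matrix and accuracy tables from the slides did not survive extraction. As a reminder of how such statistics follow from a two-class confusion matrix, here is a minimal sketch; the four counts are hypothetical and are not the project's results.

```python
def accuracy_statistics(tp, fp, fn, tn):
    """Common statistics derived from a two-class confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
stats = accuracy_statistics(tp=40, fp=10, fn=10, tn=40)
```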

Classification
• Naive Bayes Classifier
• Confusion Matrix

Classification
• Accuracy Statistics

Classification
C4.5 vs Naive Bayes
• Decision Tree (C4.5)
• Naive Bayes
