Data Analysis of Tennis Matches
Fatih Çalışır
Domain of the Data: 4 Types of Tennis Tournaments
1. ATP World Tour 250: ATP 250 Brisbane, ATP 250 Sydney, ...
2. ATP World Tour 500: ATP 500 Memphis, ATP 500 Dubai
3. ATP World Tour 1000: ATP 1000 Paris, ATP 1000 Shanghai, ...
4. Grand Slams: Australian Open, Roland Garros, Wimbledon, US Open
Domain of the Data
• Men’s Singles
• Year 2010
• 11 ATP 500 Tournaments
• 9 ATP 1000 Tournaments
• 4 Grand Slams
Domain of the Data: Source of Data: Internet
• Official websites of the players
• ATP (Association of Tennis Professionals) home page
• 2010 results archive
Data Construction: The data was built from several tables, each taken from a different website; the tables could be combined easily.
Data Construction: Players Table
Data Construction: Tournament Results Table
Data Construction: Tournament Info Table
Data Construction: Final Data Table (29 features, 1453 instances)
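The slides do not show how the three tables were joined into the final table. A minimal sketch, assuming each match row names a tournament and two players that can be looked up in the other two tables, could look like this. All table contents and column names (`height_cm`, `weight_kg`, `category`, `won`, the `player_`/`opponent_` prefixes) are hypothetical.

```python
# Hypothetical miniature versions of the three source tables; the slides do not
# list their real columns, so every key below is an illustrative assumption.
players = {
    "Player A": {"height_cm": 185, "weight_kg": 80},
    "Player B": {"height_cm": 190, "weight_kg": 88},
}
tournament_info = {
    "Tournament X": {"category": "ATP 500"},
}
tournament_results = [
    {"tournament": "Tournament X", "player": "Player A", "opponent": "Player B", "won": True},
]


def build_final_table(results, players, tournaments):
    """Join every match result with its tournament info and both players' attributes."""
    rows = []
    for match in results:
        row = dict(match)                               # copy the match-level fields
        row.update(tournaments[match["tournament"]])    # add tournament-level fields
        for role in ("player", "opponent"):
            # Prefix player attributes so the two players' columns do not collide.
            for key, value in players[match[role]].items():
                row[f"{role}_{key}"] = value
        rows.append(row)
    return rows


final_table = build_final_table(tournament_results, players, tournament_info)
print(final_table[0])
```

Each match becomes one flat row, which is the shape a classifier such as C4.5 expects.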
Aim of the Project
• Classification
• Finding weights for attributes
Missing Values: attributes with missing values
• Players’ height
• Players’ weight
• Players’ BMI
• Players’ date of turning professional
Missing Values
• Players’ height: consider players with the same weight and take the average of their heights.
• Players’ weight: consider players with the same height and take the average of their weights.
Missing Values
• Players’ height and weight: if both are missing, remove the row.
• Players’ date of turning professional: consider players with the same age and take the average.
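The imputation rules above can be sketched as group-mean imputation. This is a minimal sketch under several assumptions: matching is done on exact values, the date of turning professional is reduced to a year in a hypothetical `pro_year` field, and BMI is recomputed from height in cm and weight in kg. The slides do not say how BMI was filled.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, target, key):
    """Mean of `target` for each observed value of `key`, using complete rows only."""
    groups = defaultdict(list)
    for row in rows:
        if row[target] is not None and row[key] is not None:
            groups[row[key]].append(row[target])
    return {k: mean(v) for k, v in groups.items()}


def clean_players(rows):
    # Drop rows where both height and weight are missing.
    rows = [dict(r) for r in rows if not (r["height"] is None and r["weight"] is None)]
    # Compute every lookup from observed values before filling anything in.
    height_by_weight = group_means(rows, "height", "weight")
    weight_by_height = group_means(rows, "weight", "height")
    pro_year_by_age = group_means(rows, "pro_year", "age")
    for row in rows:
        # A value stays None if no other player shares the matching attribute.
        if row["height"] is None:
            row["height"] = height_by_weight.get(row["weight"])
        if row["weight"] is None:
            row["weight"] = weight_by_height.get(row["height"])
        if row["pro_year"] is None:
            row["pro_year"] = pro_year_by_age.get(row["age"])
        # Assumption: BMI is recomputed from height (cm) and weight (kg).
        if row["height"] is not None and row["weight"] is not None:
            row["bmi"] = row["weight"] / (row["height"] / 100) ** 2
    return rows


raw_players = [  # hypothetical records
    {"height": 185, "weight": 80, "age": 25, "pro_year": 2003},
    {"height": 187, "weight": 80, "age": 25, "pro_year": 2005},
    {"height": 190, "weight": 88, "age": 28, "pro_year": 2001},
    {"height": None, "weight": 80, "age": 30, "pro_year": 1998},
    {"height": 190, "weight": None, "age": 25, "pro_year": None},
    {"height": None, "weight": None, "age": 22, "pro_year": 2007},
]
cleaned = clean_players(raw_players)
```

With real measurements, exact matches may be rare. Rounding weights or heights into bins before grouping is one possible adjustment.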
Data Understanding: Min, Max, Median and Average values for numeric attributes
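A minimal sketch of these summary values, computed for one numeric attribute with Python's `statistics` module. It assumes missing values are marked as `None` and skipped. The sample heights are hypothetical.

```python
from statistics import mean, median


def describe(values):
    """Min, max, median and average of one numeric attribute, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "median": median(present),
        "average": mean(present),
    }


heights = [185, 190, None, 178, 196]  # hypothetical heights in cm
print(describe(heights))
```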
Data Understanding: Occurrence table for categorical and numeric attributes
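An occurrence table counts how often each distinct value appears. A minimal sketch with `collections.Counter` is shown below. The sample column of tournament categories is hypothetical.

```python
from collections import Counter


def occurrence_table(values):
    """(value, count) pairs, most frequent first."""
    return Counter(values).most_common()


categories = ["ATP 500", "ATP 1000", "ATP 500", "Grand Slam", "ATP 500"]  # hypothetical
table = occurrence_table(categories)
for value, count in table:
    print(f"{value}: {count}")
```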
Data Understanding: Histogram for numeric attributes
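A histogram groups a numeric attribute into equal-width bins and counts the values in each bin. The sketch below prints a text histogram. The bin width, starting point and sample values are illustrative assumptions; the slides do not say which binning was used.

```python
def histogram(values, bin_width, start):
    """Count values into equal-width bins [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        lower = start + k * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


heights = [178, 183, 185, 188, 190, 196]  # hypothetical heights in cm
bins = histogram(heights, bin_width=5, start=175)
for lower, count in bins.items():
    print(f"{lower}-{lower + 5}: {'#' * count}")
```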
Data Understanding: Box plot for the main characteristics of numeric attributes
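A box plot summarizes an attribute by its quartiles, whiskers and outliers. The sketch below computes those numbers with `statistics.quantiles` (Python 3.8+) and the common 1.5 × IQR whisker rule. Quartile conventions differ between tools, so KNIME's box plot may report slightly different values. The sample heights are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, 1.5 * IQR whiskers and outliers of one numeric attribute."""
    q1, med, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value within the fences
        "whisker_high": max(inside),   # largest value within the fences
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


sample_heights = [170, 178, 183, 185, 188, 190, 196, 230]  # hypothetical, in cm
print(box_plot_summary(sample_heights))
```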
Data Understanding: Scatter plot to relate two attributes
Feature Selection: Linear Correlation
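The slide does not say whether linear correlation was measured between pairs of attributes or against the class. A minimal sketch of the first use is shown below. It computes the Pearson coefficient and flags nearly redundant attribute pairs; BMI, for example, is derived from height and weight. The 0.9 threshold and the sample columns are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlated_pairs(columns, threshold=0.9):
    """Attribute pairs whose absolute correlation reaches `threshold`."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


columns = {  # hypothetical attribute columns
    "height_cm": [180, 185, 190, 195],
    "weight_kg": [75, 80, 86, 90],
    "age": [29, 22, 26, 24],
}
print(correlated_pairs(columns))
```

One attribute from each flagged pair is then a candidate for removal.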
Feature Selection: Backward Elimination, with Naive Bayes for ranking
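Backward elimination starts from all attributes and repeatedly drops the one whose removal hurts the evaluation score least. It stops when every removal makes the score worse. In the setting on the slide, the score would come from a Naive Bayes classifier. This sketch takes any `score(subset)` function instead and uses a hypothetical toy score so it runs on its own.

```python
def backward_elimination(features, score):
    """Greedy backward elimination; `score(subset)` returns a number, higher is better."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Evaluate every subset obtained by removing exactly one feature.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        cand_score, cand = max(((score(c), c) for c in candidates), key=lambda t: t[0])
        if cand_score < best:
            break  # every possible removal lowers the score
        best, selected = cand_score, cand
    return selected, best


# Hypothetical toy score: reward the "useful" features, small penalty per feature.
useful = {"a", "b"}


def toy_score(subset):
    return len(set(subset) & useful) - 0.1 * len(subset)


selected, best = backward_elimination(["a", "b", "c", "d"], toy_score)
print(selected, best)
```

Ties are accepted as removals, so the procedure prefers the smaller attribute set when the score does not change.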
Feature Selection: 28 attributes reduced to 19 attributes; the remaining attributes are meaningful.
Weight of Attributes: RIMARC to find weights
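RIMARC (Ranking Instances by Maximizing the Area under the ROC Curve) builds on the area under the ROC curve (AUC). The sketch below does not implement RIMARC or its weighting formula. It only illustrates the underlying quantity: the AUC obtained when a single numeric attribute is used to rank instances. An AUC near 0.5 means the attribute alone separates positive from negative instances no better than chance. The columns and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive instance
    scores higher than a random negative one (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def attribute_aucs(columns, labels):
    """AUC obtained when each numeric attribute alone is used to rank instances."""
    return {name: auc(values, labels) for name, values in columns.items()}


labels = [1, 1, 0, 0]  # hypothetical class labels
columns = {"height_cm": [190, 195, 180, 185], "constant": [1, 1, 1, 1]}
print(attribute_aucs(columns, labels))
```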
Classification: KNIME
• Decision Tree (C4.5)
• Gain Ratio quality measure
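The gain ratio used by C4.5 divides an attribute's information gain by its split information. Split information is the entropy of the attribute's own value distribution, and it penalizes attributes with many distinct values. A minimal sketch for nominal attributes is shown below. C4.5 handles numeric attributes by testing thresholds, which this sketch omits. The inputs are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting `labels` by `values`, divided by split information."""
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


print(gain_ratio(["x", "x", "y", "y"], [1, 1, 0, 0]))
```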
Classification
• 1017 instances for training, 436 instances for testing
• 842 positive instances, 611 negative instances
• Training and test data were randomly selected
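The 1017/436 split is roughly 70% training and 30% testing of the 1453 instances. A minimal sketch of a random split with a fixed seed for reproducibility is shown below. The seed and the use of `random.Random.shuffle` are assumptions; the slides do not say how KNIME's partitioning was configured.

```python
import random


def train_test_split(rows, train_size, seed=0):
    """Shuffle a copy of `rows` and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


instances = list(range(1453))  # stand-ins for the 1453 instances
train, test = train_test_split(instances, train_size=1017)
print(len(train), len(test))
```

A stratified split, which keeps the positive/negative proportion equal in both parts, is a common alternative when the classes are unbalanced.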
Classification: Decision Tree
Classification: Confusion Matrix
Classification: Accuracy Statistics
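The confusion matrix and accuracy statistics follow directly from the four counts of true/false positives and negatives. A minimal sketch of the usual derived measures is shown below. The label lists are hypothetical and are not the project's results.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_statistics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f_measure": f_measure, "accuracy": accuracy}


actual = [1, 1, 0, 0, 1]      # hypothetical true classes
predicted = [1, 0, 0, 1, 1]   # hypothetical predictions
counts = confusion_matrix(actual, predicted)
stats = accuracy_statistics(*counts)
print(counts, stats)
```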
Classification: Naive Bayes Classifier, Confusion Matrix
Classification: Accuracy Statistics
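A naive Bayes classifier picks the class that maximizes the prior times the product of per-attribute likelihoods, assuming the attributes are independent given the class. The sketch below is a Gaussian variant for numeric attributes only. It works with log probabilities to avoid underflow and adds a small variance floor. KNIME's Naive Bayes nodes may treat attributes differently, for example nominal attributes via counts. The training data is hypothetical.

```python
from collections import defaultdict
from math import log, pi
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Per class: log prior plus mean and variance of every numeric attribute."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "log_prior": log(len(rows) / len(X)),
            # Small floor keeps the variance strictly positive.
            "params": [(mean(c), pvariance(c) + 1e-9) for c in columns],
        }
    return model


def predict(model, row):
    """Class with the highest log prior + sum of Gaussian log likelihoods."""
    def log_likelihood(x, mu, var):
        return -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)

    best_label, best_score = None, float("-inf")
    for label, m in model.items():
        score = m["log_prior"] + sum(
            log_likelihood(x, mu, var) for x, (mu, var) in zip(row, m["params"]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [5.0, 1.0], [5.2, 1.5], [4.8, 0.5]]
y = [1, 1, 1, 0, 0, 0]  # hypothetical labels
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 10.2]), predict(model, [5.1, 1.1]))
```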
Classification: Decision Tree (C4.5) vs Naive Bayes