Using R to Win Kaggle Data Mining Competitions. Chris Raimondi, November 1, 2012
TRANSCRIPT
- Slide 1
- Using R to win Kaggle Data Mining Competitions Chris Raimondi November 1, 2012
- Slide 2
- Overview of talk: What I hope you get out of this talk; Life before R; Simple model example; R programming language; Background/Stats/Info; How to get started; Kaggle
- Slide 3
- Overview of talk: Individual Kaggle competitions: HIV Progression; Chess; Mapping Dark Matter; dunnhumby's Shopper Challenge; Online Product Sales
- Slide 4
- What I want you to leave with: Belief that you don't need to be a statistician to use R, NOR do you need to fully understand machine learning in order to use it. Motivation to use Kaggle competitions to learn R. Knowledge of how to start.
- Slide 5
- My life before R: Lots of Excel. Had tried programming in the past and got frustrated. Read a NY Times article in January 2009 about R and Google. Installed R, but gave up after a couple of minutes. Months later...
- Slide 6
- My life before R: Was using Excel to run PageRank calculations that took hours and were very messy. Was experimenting with Pajek, a Windows-based network/link analysis program. Was looking for a similar program that did PageRank calculations. Revisited R as a possibility.
- Slide 7
- My life before R: Came across the R Graph Gallery. Saw this graph:
- Slide 8
- Slide 9
- Addicted to R in one line of code: pairs(iris[1:4], main = "Edgar Anderson's Iris Data", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)]) Here pairs is a function and iris is a data frame.
- Slide 10
- What do we want to do with R? Machine learning, a.k.a. (or more specifically) making models. We want to TRAIN a model on a set of data with KNOWN answers/outcomes in order to PREDICT the answer/outcome for similar data where the answer is not known.
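The train-then-predict idea on this slide can be sketched in a few lines of R. This is a minimal illustration and not code from the talk: it assumes the built-in iris data, uses rpart (a decision tree package that ships with standard R installs) as a stand-in method, and the 30-row hold-out and the variable names are arbitrary choices made for the example.

```r
# Sketch: train on rows with KNOWN outcomes, then predict rows whose outcome is withheld.
# Assumptions (not from the talk): iris data, rpart as the method, a 30-row hold-out.
library(rpart)

set.seed(42)                                   # make the random split reproducible
test_idx <- sample(nrow(iris), 30)             # 30 rows stand in for "unknown" data
train    <- iris[-test_idx, ]                  # 120 rows with known Species
unknown  <- iris[test_idx, names(iris) != "Species"]  # same features, answer removed

fit  <- rpart(Species ~ ., data = train, method = "class")  # TRAIN on known answers
pred <- predict(fit, newdata = unknown, type = "class")     # PREDICT the missing answers

# We can score the predictions only because we held the true answers back ourselves.
accuracy <- mean(pred == iris$Species[test_idx])
print(accuracy)
```

In a Kaggle setting the "unknown" rows would be the competition's test file, and the score would come from the leaderboard rather than from a local comparison like the one above.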
- Slide 11
- Slide 12
- How to train a model: R allows for the training of models using probably over 100 different machine learning methods. To train a model you need to provide: 1. The name of the function (which machine learning method). 2. The name of the dataset. 3. Your response variable and the features you are going to use.
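The three ingredients listed above map directly onto a typical R modelling call: the function name picks the method, the data argument names the dataset, and a formula of the form response ~ features says what to predict from what. The sketch below is illustrative only; the choice of lm and rpart, the iris data, and the particular columns are assumptions made for the example, not the speaker's code.

```r
# Sketch of the three ingredients: 1. function (method), 2. dataset, 3. formula.
# Assumptions (not from the talk): iris data, lm and rpart as the two methods.
library(rpart)

# Linear regression: response Petal.Width, two hand-picked features.
fit_lm <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)

# Recursive partitioning: response Species, "." means "all other columns".
fit_tree <- rpart(Species ~ ., data = iris, method = "class")

print(coef(fit_lm))   # intercept plus one coefficient per feature
print(fit_tree)       # text description of the fitted tree's splits
```

Switching to a different method is mostly a matter of changing the function name; the dataset and formula usually carry over unchanged, although individual packages may need extra arguments.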
- Slide 13
- Example machine learning methods available in R: Bagging; Boosted Trees; Elastic Net; Gaussian Processes; Generalized Additive Model; Generalized Linear Model; K Nearest Neighbor; Linear Regression; Nearest Shrunken Centroids; Neural Networks; Partial Least Squares; Principal Component Regression; Projection Pursuit Regression; Quadratic Discriminant Analysis; Random Forests; Recursive Partitioning; Rule-Based Models; Self-Organizing Maps; Sparse Linear Discriminant Analysis; Support Vector Machines
- Slide 14
- Code used to train decision tree library(party) irisct