What does it take to win the Kaggle/Yandex competition

WHAT DOES IT TAKE TO WIN THE KAGGLE/YANDEX COMPETITION Christophe Bourguignat Kenji Lefèvre-Hasegawa Paul Masurel @Dataiku Matthieu Scordia @Dataiku

Upload: kenjil

Post on 27-Jan-2015


DESCRIPTION

Feedback on how we won the Kaggle/Yandex competition

TRANSCRIPT

Page 1: What does it take to win the Kaggle/Yandex competition

WHAT DOES IT TAKE TO WIN THE KAGGLE/YANDEX COMPETITION

Christophe Bourguignat
Kenji Lefèvre-Hasegawa
Paul Masurel @Dataiku
Matthieu Scordia @Dataiku

Page 2: What does it take to win the Kaggle/Yandex competition

OUTLINE OF THE TALK

• Review of the Kaggle/Yandex Challenge
• How we worked (team work & tools)
• The winning model

Page 3: What does it take to win the Kaggle/Yandex competition

GOAL Re-rank URLs returned by Yandex according to the personal preferences of the users

[Figure: a result list (url1, url2, url3, url4) shown next to a re-ranked list (url3, url2, url1, url4).]

ML CHALLENGE Predict the relevance of each URL to the user and re-rank the result set accordingly

The Kaggle/Yandex challenge

Page 4: What does it take to win the Kaggle/Yandex competition

GIVEN
• 30 days of logs (test: 3 days, train: 27 days)
• Users' historical queries, clicks & dwell-times
• Prior activity in the test session: queries, clicks & dwell-times

SIZE
• 15 GB

The Kaggle/Yandex challenge

[Figure: a test session (Q, Q, T, where T is the test query to predict) alongside a row of earlier queries (Q, Q, Q, Q).]

Page 5: What does it take to win the Kaggle/Yandex competition

QUALITY METRIC

• One test query per user, in the last 3 days
• The NDCG metric penalizes relevance errors on top-ranked URLs

• No A/B test
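To make the second bullet concrete, here is a minimal sketch of NDCG using a common formulation (gain `2^grade - 1`, discount `log2(position + 1)`). The slides do not spell out the exact variant used by the competition, and the relevance grades below are hypothetical values chosen for illustration only.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of graded relevances given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (assumption, not taken from the slides).
grades = {"url1": 2, "url2": 1, "url3": 1, "url4": 0}

def score(order):
    """NDCG of a ranking expressed as a list of URL names."""
    return ndcg([grades[url] for url in order])

ideal_order = ["url1", "url2", "url3", "url4"]
error_at_top = ["url3", "url2", "url1", "url4"]     # best URL pushed down to rank 3
error_at_bottom = ["url1", "url2", "url4", "url3"]  # only the last two swapped

for name, order in [("ideal", ideal_order),
                    ("error at top", error_at_top),
                    ("error at bottom", error_at_bottom)]:
    print(f"{name}: {score(order):.3f}")
```

With these grades, the ranking that misplaces the best URL scores lower than the one that only swaps the two bottom results, which is the behaviour the bullet describes.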

The Kaggle/Yandex challenge

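[Figure: three rankings of url1 to url4, labelled Prediction, Another ranking and Kaggle: (url1, url2, url3, url4), (url3, url2, url1, url4) and (url1, url2, url4, url3). One comparison is marked BAD, the other OK.]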

Page 6: What does it take to win the Kaggle/Yandex competition

TEAM DATAIKU SCIENCE STUDIO / KAGGLE

• Christophe Bourguignat Engineer, data enthusiast

• Kenji Lefèvre-Hasegawa Ph.D. math, new to ML

• Paul Masurel Software Engineer @dataiku

• Matthieu Scordia Data Scientist @dataiku

First meeting: October 16th, 2013

How we worked (Team work & tools)

Page 7: What does it take to win the Kaggle/Yandex competition

WE’VE USED

• Related papers (mainly Microsoft's)
• 12 cores, 64 GB
• Python scikit-learn
• Dataiku Science Studio
• Java RankLib

How we worked (Team work & tools)

Page 8: What does it take to win the Kaggle/Yandex competition

DATAIKU SCIENCE STUDIO

How we worked (Team work & tools)

[Figure: Dataiku Science Studio workflow. DATA DRIVEN COMPUTATION: Original train → Split train & validation → Labels and Features → Features & labels. FEATURES CONSTRUCTION: team members work independently. LEARNING: team members work independently.]
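The "Split train & validation" step can be sketched as a time-based split. This is only an illustration under stated assumptions: the slides do not say how the split was made, and the record layout (`session_id`, `day`), the function name `split_by_day` and the choice of holding out the last 3 days (mirroring the 3-day test period) are all hypothetical.

```python
def split_by_day(sessions, n_validation_days=3):
    """Hold out the last n_validation_days of the log as a validation set."""
    last_day = max(s["day"] for s in sessions)
    cutoff = last_day - n_validation_days
    train = [s for s in sessions if s["day"] <= cutoff]
    validation = [s for s in sessions if s["day"] > cutoff]
    return train, validation

# Toy log: one hypothetical session per listed day of the 27-day training period.
sessions = [{"session_id": i, "day": d}
            for i, d in enumerate([1, 5, 12, 24, 25, 26, 27])]

train, validation = split_by_day(sessions)
print(len(train), "train sessions,", len(validation), "validation sessions")
```

Splitting by time rather than at random keeps the validation setup close to the test setup, where predictions are made for days that come after the history.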

Page 9: What does it take to win the Kaggle/Yandex competition

HOW MUCH WORK?
• 960+ emails
• 360+ features
• 50+ ideas grid tuned (300+ models fitted)
• Server heavily loaded the last 3 weeks
• 56 Kaggle submissions
• 196 teams, 264 players, 3570 submissions

How we worked (Team work & tools)

[Figure: leaderboard timeline. Intervals: 1/2 month, 1 week, 1 week, 1 week. Positions: Top 25, Top 10, 5th, 1st, 3rd, 1st. Annotation at 2014-01-01: future top 2 & 3 enter race.]

Page 10: What does it take to win the Kaggle/Yandex competition

PROBLEM ANALYSIS

Query

Result Set
• Rank
• URL snippet quality
• URL is skipped, clicked or missed

Reading URL
• URL & domain relevance, with dwell-time

CLICK
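The skipped / clicked / missed distinction can be sketched as follows. The slides do not define the three outcomes, so this assumes a common cascade-style reading: a URL shown above the lowest click but not clicked was seen and skipped, and a URL below the lowest click was probably never seen (missed). The function name `label_results` is hypothetical.

```python
def label_results(ranked_urls, clicked_urls):
    """Label each displayed URL as 'clicked', 'skipped' or 'missed'."""
    clicked = set(clicked_urls)
    click_positions = [pos for pos, url in enumerate(ranked_urls) if url in clicked]
    # Position of the lowest click on the page; -1 when nothing was clicked.
    last_click = max(click_positions) if click_positions else -1

    labels = {}
    for pos, url in enumerate(ranked_urls):
        if url in clicked:
            labels[url] = "clicked"
        elif pos < last_click:
            labels[url] = "skipped"   # shown above a click, so presumably seen
        else:
            labels[url] = "missed"    # below the lowest click, presumably unseen
    return labels

print(label_results(["url1", "url2", "url3", "url4"], ["url3"]))
```

Under this reading, a page with no click at all labels every URL as missed; whether that matches the team's definition is not stated in the slides.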

The winning model

Page 11: What does it take to win the Kaggle/Yandex competition

FEATURES
• Rank
• User habits, query specificity (entropy, frequency, …)
• Snippet relevance
• Missed, Skipped, Clicked
• URL & domain relevance
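As an illustration of the query-specificity features, one possible reading of "entropy" is click entropy: the Shannon entropy of the distribution of clicked URLs for a query. The slides do not say which entropy was computed, so this is a sketch under that assumption, and `click_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicked-URL distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A navigational query: everyone clicks the same URL, so entropy is zero.
print(click_entropy(["a", "a", "a", "a"]))
# An ambiguous query: clicks are spread evenly over four URLs.
print(click_entropy(["a", "b", "c", "d"]))
```

Low entropy suggests a query whose answer is the same for everyone, where personalization has little to add; high entropy suggests that different users want different results.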

Variants of Missed, Skipped & Clicked
• Probability, stimuli frequency, Mean Reciprocal Rank (MRR)
• For each user: history, previous activity in the test session, and their aggregate
• For all users
• Computed over all queries and over the same query
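A minimal sketch of how such features might be computed for one outcome (for example "clicked") from the past displays of a URL. Several choices here are assumptions rather than facts from the slides: additive smoothing of the probability, reading "stimuli frequency" as the number of times the URL was displayed, and the function name `outcome_features`.

```python
def outcome_features(displays, outcome, prior=1.0, n_outcomes=3):
    """Features for one outcome from a list of (rank, outcome) displays.

    rank is 1-based; outcome is 'clicked', 'skipped' or 'missed'.
    """
    n_displays = len(displays)
    hit_ranks = [rank for rank, o in displays if o == outcome]
    # Additive smoothing (assumption): avoids 0 or 1 estimates on tiny histories.
    probability = (len(hit_ranks) + prior) / (n_displays + prior * n_outcomes)
    # Mean reciprocal rank of the displays that ended with this outcome.
    mrr = sum(1.0 / rank for rank in hit_ranks) / len(hit_ranks) if hit_ranks else 0.0
    return {"probability": probability, "frequency": n_displays, "mrr": mrr}

# Hypothetical history of one URL for one user.
history = [(1, "clicked"), (2, "clicked"), (4, "skipped"), (3, "missed")]
print(outcome_features(history, "clicked"))
```

The same function can be applied to different slices of the log (one user's history, the current session, all users, all queries or only the same query), which is how a small set of base features multiplies into several hundred.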

The winning model

Page 12: What does it take to win the Kaggle/Yandex competition

MODELS

• Random Forest (predict proba) + maximize E(NDCG)

• LambdaMART (gradient boosted trees optimized for NDCG) wins!
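The first approach can be sketched as follows, assuming the classifier outputs, for each URL, a probability for each relevance grade (as scikit-learn's `predict_proba` would). Sorting by expected gain maximizes the expected DCG; for NDCG it is only a heuristic, because the normalizer depends on the true grades. The slides do not say how E(NDCG) was actually maximized, and the probabilities below are made up for illustration.

```python
def expected_gain(probabilities):
    """Expected DCG gain, given P(grade = k) for k = 0, 1, 2, ..."""
    return sum(p * (2 ** grade - 1) for grade, p in enumerate(probabilities))

def rerank(url_probabilities):
    """Return URLs sorted by decreasing expected gain (stable for ties)."""
    return sorted(url_probabilities,
                  key=lambda url: -expected_gain(url_probabilities[url]))

# Hypothetical per-URL grade probabilities, in the original ranking order.
probas = {
    "url1": [0.7, 0.2, 0.1],
    "url2": [0.2, 0.3, 0.5],
    "url3": [0.4, 0.6, 0.0],
    "url4": [0.9, 0.1, 0.0],
}

print(rerank(probas))
```

LambdaMART, by contrast, is trained on a ranking objective directly instead of predicting grades first and sorting afterwards.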

The winning model

Kaggle/Yandex Top 1 then 3rd

Page 13: What does it take to win the Kaggle/Yandex competition

QUESTIONS?

Thank you!