Kaggle Facebook Recruiting Contest
THE DATA
• Data set: 7 million+ Stack Overflow posts with tags
• Test set: 2 million Stack Overflow posts without tags
• Goal: supply tags for the test set Stack Overflow posts
ATTACK 1
• Build Bayesian classifiers for each tag in the training set
• 1) Tried applying entire posts to each classifier - classifiers became too large and slow
• 2) Applied a POS tagger to posts and used only nouns to train the classifiers - classifiers were still too large and slow
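A minimal sketch of the one-classifier-per-tag idea, assuming a binary multinomial naive Bayes model with add-one smoothing and whitespace tokenization. The slides do not say which Bayesian variant or library was used, so `TagClassifier`, `train_all`, and the toy posts are hypothetical. The sketch also hints at the size problem: every tag keeps its own word-count tables over the whole vocabulary.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


class TagClassifier:
    """Binary naive Bayes for a single tag (hypothetical helper)."""

    def __init__(self):
        # Separate word counts for posts with and without the tag.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, text, has_tag):
        words = tokenize(text)
        self.word_counts[has_tag].update(words)
        self.doc_counts[has_tag] += 1
        self.vocab.update(words)

    def log_score(self, text, label):
        total_docs = sum(self.doc_counts.values())
        # Smoothed class prior.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def predict(self, text):
        return self.log_score(text, True) > self.log_score(text, False)


def train_all(posts):
    """Train one classifier per tag; posts is a list of (text, tags) pairs."""
    all_tags = {tag for _, tags in posts for tag in tags}
    classifiers = {}
    for tag in all_tags:
        clf = TagClassifier()
        for text, tags in posts:
            clf.train(text, tag in tags)
        classifiers[tag] = clf
    return classifiers


# Toy data, invented for illustration only.
posts = [
    ("how do i parse json in python", {"python", "json"}),
    ("python list comprehension syntax", {"python"}),
    ("java null pointer exception in servlet", {"java"}),
    ("java generics wildcard question", {"java"}),
]
classifiers = train_all(posts)
predicted = {tag for tag, clf in classifiers.items()
             if clf.predict("java servlet error")}
```

Training cost and memory both grow with the number of tags times the vocabulary size, which is consistent with the "too large and slow" outcome reported above. Restricting the input to nouns shrinks the vocabulary but not the number of classifiers.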
ATTACK 2
• Text search of training set posts for a list of high-frequency training set tags
• 1) Naive matching caused too many false positives
• 2) Finally rated each tag in the list by its false-positive-to-positive ratio and removed problematic tags from the list
284/367 - hoorah
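The tag-search approach with pruning could look roughly like the sketch below. It assumes whole-word matching on lowercase text, reads "positive" as a correct match, and uses an arbitrary cutoff; `find_tags`, `false_positive_ratios`, `prune_tags`, `max_ratio`, and the toy posts are all hypothetical names and values, not taken from the slides.

```python
from collections import Counter


def find_tags(text, candidate_tags):
    """Return the candidate tags that appear as whole words in the text."""
    words = set(text.lower().split())
    return {tag for tag in candidate_tags if tag in words}


def false_positive_ratios(posts, candidate_tags):
    """Rate each tag by false positives divided by correct matches.

    posts is a list of (text, true_tags) pairs from the tagged data.
    """
    true_pos = Counter()
    false_pos = Counter()
    for text, true_tags in posts:
        for tag in find_tags(text, candidate_tags):
            if tag in true_tags:
                true_pos[tag] += 1
            else:
                false_pos[tag] += 1
    ratios = {}
    for tag in candidate_tags:
        if true_pos[tag] == 0:
            # No correct matches: worst possible rating if it ever fired.
            ratios[tag] = float("inf") if false_pos[tag] else 0.0
        else:
            ratios[tag] = false_pos[tag] / true_pos[tag]
    return ratios


def prune_tags(posts, candidate_tags, max_ratio=0.5):
    """Keep only tags whose ratio is at or below the (assumed) cutoff."""
    ratios = false_positive_ratios(posts, candidate_tags)
    return {tag for tag in candidate_tags if ratios[tag] <= max_ratio}


# Toy data, invented for illustration only.
tagged_posts = [
    ("python script to read a file", {"python", "file"}),
    ("read a file in java", {"java", "file"}),
    ("how to open a file dialog in c#", {"c#"}),
    ("python string formatting", {"python"}),
    ("string class in java", {"java"}),
    ("compare string values", {"php"}),
]
candidates = {"python", "java", "file", "string"}
ratios = false_positive_ratios(tagged_posts, candidates)
kept = prune_tags(tagged_posts, candidates, max_ratio=0.5)
suggested = find_tags("Reading a file with Python", kept)
```

In this toy run a common word such as "string" matches often but is never the right tag, so it is dropped, while "file" survives with one false positive against two correct matches. The cutoff trades coverage for precision and would need tuning against the tagged data.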