Industrialize Sentiment Analysis for Comment Moderation

Maggie Xiong, Huffington Post

Post on 25-Jul-2015

Page 1: Industrialize Sentiment Analysis for Comment Moderation

Industrialize Sentiment Analysis for Comment Moderation

Maggie Xiong

Huffington Post

Page 2: Industrialize Sentiment Analysis for Comment Moderation
Page 3: Industrialize Sentiment Analysis for Comment Moderation

Basic Comment Moderation Process

User comments on an article

Moderator publishes or rejects a comment based on a set of guidelines (“10 commandments”)

Comments for different articles come in every second. We would need a small army to handle the moderation.

The comment should contribute to the discussion, conveying a respectful message, thought or idea, whether or not it agrees with another user or the author.

The comment should not intentionally misspell words, use non-alphabetic characters, or use extra or missing spaces to bypass moderation.

The comment should not attack, demean, belittle, or stereotype any person or group....

Page 4: Industrialize Sentiment Analysis for Comment Moderation

JuLiA to the Rescue

Sentiment analysis suite - JuLiA

Supports various preprocessing options: stemming, stopwords, etc.

Includes a number of popular ML algorithms: SVM, naïve Bayes, AdaBoost (decision tree), etc.

Uses Hadoop for parallelizing the training of different models and for the exploration of the parameter space

Train 1000s of models with different parameter setups in parallel

Pick the winner for production

Ensemble the different winners for even higher accuracy
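The preprocessing options listed on this slide can be pictured as switches on a tokenizer. The following is a minimal sketch, not JuLiA's actual code: the function names, the tiny stopword list, and the suffix-stripping rule (a crude stand-in for a real stemmer) are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "or", "in", "it"}

# Naive suffix stripping; a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, remove_stopwords: bool = True, stem: bool = True) -> List[str]:
    """Lowercase and tokenize, then apply whichever options are switched on."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens
```

Each combination of switches yields a different feature set, which is one axis of the parameter space that the parallel training explores.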

Page 5: Industrialize Sentiment Analysis for Comment Moderation

Training Data

Goldset

About 20000 comments (~13000 train, ~7000 holdout)

Publish-or-reject votes from 3 moderators

Article: Christian and Gay? One Politician's Personal Interview (VIDEO)
Comment: I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's your interpretation of the scripture then make sure you abide by it.

Article: Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
Comment: what an angry petty little man he is. issues too. lots of issues he needs to work on. He certainly has nothing of value to offer or to say. he's a screwed up little prick

Article: Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
Comment: You seem to take a negative view of democrats and draw reference to a study "I co-authored with Robert Book".....sort of like a Muslim professor writing a book on Christianity your biases disqualify you from offering anything other than a self serving opinion....now of course I'm just using republican/fox news logic here"
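The slide does not say how the three moderator votes are combined into one label or how the train/holdout split is drawn. One plausible approach is a majority vote followed by a shuffled split. The sketch below assumes that approach; the label encoding (1 = publish, 0 = reject), the function names, and the 0.65 fraction (13000 of 20000) are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def majority_label(votes: Sequence[int]) -> int:
    """Return 1 (publish) if more than half the votes are 1, else 0 (reject)."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def split_goldset(rows: List, train_fraction: float = 0.65, seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the rows and cut it into train and holdout parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

With an odd number of moderators a majority vote never ties, which may be one reason to collect three votes per comment.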

Page 6: Industrialize Sentiment Analysis for Comment Moderation

Training Process

73 923 balanced_winnow 5 1 10 …
73 923 balanced_winnow 5 2 10 …
73 923 balanced_winnow 5 3 10 …
73 923 balanced_winnow 5 1 20 …
73 923 balanced_winnow 5 2 20 …
73 923 balanced_winnow 5 3 20 …

Train Request (a parameter set per line)
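A train request of this shape looks like a parameter grid written out one combination per line. The sketch below reproduces only the visible prefix of the six lines with `itertools.product`. The slide does not say what the fields mean and elides the rest of each line with "…", so the names `FIXED_PREFIX`, `FIFTH_FIELD`, and `SIXTH_FIELD` are placeholders and nothing past the sixth field is generated.

```python
from itertools import product
from typing import List

# Fields that stay constant across the visible lines (meanings not given in the slide).
FIXED_PREFIX = ["73", "923", "balanced_winnow", "5"]
# The two fields that vary in the visible lines.
FIFTH_FIELD = [1, 2, 3]
SIXTH_FIELD = [10, 20]


def train_request_lines() -> List[str]:
    """Emit one space-separated parameter set per line, fifth field varying fastest."""
    lines = []
    for sixth, fifth in product(SIXTH_FIELD, FIFTH_FIELD):
        lines.append(" ".join(FIXED_PREFIX + [str(fifth), str(sixth)]))
    return lines
```

Adding another value to either list multiplies the number of lines, which is how a request can grow to thousands of parameter sets.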

Investments are taxed as capital gains..... 1
It was the overleveraged and underregulated banks … 1
I am afraid we may be headed for … 1
In the famous words of Homer Simpson, “it takes 2 to lie …” 0

Training Data

[Diagram: Hadoop Cluster training Model 1, Model 2, Model 3, Model 4, Model 5, ..., Model k]
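The fan-out in this diagram, one model per parameter set with the best one kept, can be imitated on a single machine. The sketch below uses a thread pool as a stand-in for the Hadoop cluster and a deliberately simple word-flagging rule as a stand-in for a real learner. The `min_count` parameter, the label encoding (1 = publish, 0 = reject), and every function name are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

Example = Tuple[str, int]  # (comment text, label); 1 = publish, 0 = reject (assumed)


def train_toy_model(params: Dict[str, int], train: List[Example]) -> Set[str]:
    """Toy 'training': flag words seen at least min_count times in rejected
    comments and never in published ones."""
    rejected_counts: Dict[str, int] = {}
    published_words: Set[str] = set()
    for text, label in train:
        for word in text.lower().split():
            if label == 0:
                rejected_counts[word] = rejected_counts.get(word, 0) + 1
            else:
                published_words.add(word)
    return {
        word
        for word, count in rejected_counts.items()
        if count >= params["min_count"] and word not in published_words
    }


def predict(flagged: Set[str], text: str) -> int:
    """Reject (0) if the comment contains any flagged word, else publish (1)."""
    return 0 if any(word in flagged for word in text.lower().split()) else 1


def evaluate(params: Dict[str, int], train: List[Example], holdout: List[Example]):
    """Train one model and return (holdout accuracy, params)."""
    model = train_toy_model(params, train)
    correct = sum(predict(model, text) == label for text, label in holdout)
    return correct / len(holdout), params


def pick_winner(param_sets, train, holdout, workers: int = 4):
    """Evaluate every parameter set in parallel and keep the most accurate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: evaluate(p, train, holdout), param_sets))
    return max(results, key=lambda result: result[0])
```

Because each parameter set is trained and scored independently, the same loop maps naturally onto one cluster task per line of the train request.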

Page 7: Industrialize Sentiment Analysis for Comment Moderation

Results

Single best model: Naïve Bayes
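For readers unfamiliar with the winning algorithm, here is a minimal multinomial naïve Bayes with add-one (Laplace) smoothing. It is a textbook sketch, not the implementation behind the talk; the class name, the label encoding (1 = publish, 0 = reject), and the choice to ignore unseen words at prediction time are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


class NaiveBayes:
    """Multinomial naive Bayes over token lists with add-one smoothing."""

    def fit(self, examples: List[Tuple[List[str], int]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts: Dict[int, Counter] = {label: Counter() for label in self.class_counts}
        for tokens, label in examples:
            self.word_counts[label].update(tokens)
        self.vocab = {word for counts in self.word_counts.values() for word in counts}
        self.total = len(examples)
        return self

    def log_score(self, tokens: List[str], label: int) -> float:
        """log P(label) + sum of log P(token | label) over known tokens."""
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total)
        for token in tokens:
            if token in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[token] + 1) / denominator)
        return score

    def predict(self, tokens: List[str]) -> int:
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))
```

Working in log space avoids numeric underflow when a comment contains many tokens.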

Page 8: Industrialize Sentiment Analysis for Comment Moderation

Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
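A breakdown of model decisions by goldset label, as charted on this slide, could be tallied along the following lines. This is a sketch with an assumed label encoding (1 = approved/publish, 0 = rejected); it contains no figures from the talk.

```python
from typing import Dict, Sequence


def decision_breakdown(gold: Sequence[int], predicted: Sequence[int]) -> Dict[int, Dict[str, int]]:
    """For each goldset label, count how often the model published or rejected."""
    table = {1: {"publish": 0, "reject": 0}, 0: {"publish": 0, "reject": 0}}
    for gold_label, model_label in zip(gold, predicted):
        decision = "publish" if model_label == 1 else "reject"
        table[gold_label][decision] += 1
    return table


def agreement_rate(table: Dict[int, Dict[str, int]], gold_label: int) -> float:
    """Fraction of comments with this goldset label where the model agreed."""
    row = table[gold_label]
    agreed = row["publish"] if gold_label == 1 else row["reject"]
    total = row["publish"] + row["reject"]
    return agreed / total if total else 0.0
```

Reporting the two goldset classes separately matters for moderation, since wrongly publishing a rejected comment and wrongly rejecting an approved one carry different costs.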

Page 9: Industrialize Sentiment Analysis for Comment Moderation

Pool for Better Results

Logistic regression using multiple model results
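Pooling in this sense means treating each base model's decision as an input feature and fitting a logistic regression on top. The sketch below is one way to do that with plain batch gradient descent; the learning rate, epoch count, 0/1 feature encoding, and function names are assumptions, not details from the talk.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def pooled_decision(weights: Sequence[float], bias: float, model_outputs: Sequence[float]) -> int:
    """Combine the base models' outputs into one publish (1) / reject (0) decision."""
    probability = sigmoid(bias + sum(w * x for w, x in zip(weights, model_outputs)))
    return 1 if probability >= 0.5 else 0
```

The learned weights indicate how much each base model is trusted, so a consistently accurate model ends up with more influence than a noisy one.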

Page 10: Industrialize Sentiment Analysis for Comment Moderation

Pool for Better Results

Model decision on goldset approved comments

Model decision on goldset rejected comments

Page 11: Industrialize Sentiment Analysis for Comment Moderation

Further Steps

Improve the training data set: data gathered within moderators' normal workflow, more votes per comment, more comments

Per-vertical models

Incorporate comment-to-article similarity
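The slide does not say how comment-to-article similarity would be measured. One common choice is cosine similarity between bag-of-words vectors, sketched below under that assumption; the tokenization rule and function names are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score like this could be fed to the classifier as one more feature, so that off-topic comments are distinguishable from on-topic ones.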

Page 12: Industrialize Sentiment Analysis for Comment Moderation

In addition to saving his own life, Zimmerman likely save a couple other lives as well.

Page 13: Industrialize Sentiment Analysis for Comment Moderation

Thanks!

Conversation and Machine Learning teams

We are hiring!
