Relevance-Based Ranking of Video Comments on YouTube


Page 1: Relevance based ranking of video comments on YouTube

Relevance-Based Ranking of Video Comments on YouTube

Authors: Andrei Șerbănoiu, Traian Rebedea [email protected]

University Politehnica of Bucharest


Overview

• Introduction
• Motivation
• System architecture
• Classification of relevant comments
• Ranking of relevant comments
• Results
• Conclusions

12.04.23 Bachelor's Thesis Session - July 2012 2


Introduction

• Text classification and ranking for comments on YouTube videos
– First: classify whether the comment is relevant or not for the given video
– Second: rank the relevant comments
• Focus on identifying relevant information
• Comments contain very few words: sometimes fewer than 10, and on the order of tens on average
• Relevance is evaluated with respect to the information collected from other online sources about the video

12.04.23 CSCS 2013 – Bucharest, Romania 3


Existing research

• We have not been able to identify any previous research in the direction of identifying relevant comments
• YouTube research
– Identify the relevant features of community acceptance (comments with many “likes”)
– Extract the sentiment orientation
– Differentiate between clean and noisy comments
• Other research
– Ranking Comments on the Social Web (uses Digg)


Motivation

• The Police – Every Breath You Take


Motivation

• Most commented video: “10 questions that every intelligent Christian must answer”
• 1,429,425 comments on 30 May 2013 (early morning)
• How many of these comments are spam?
• Which ones would be most relevant to the video?


Solution

• Ranking of the comments according to relevance
• Steps:
1. Automatically link the video with other online sources relevant to it
2. Filter the comments to remove noisy ones
3. Rank the remaining comments according to relevance computed using NLP techniques
• Our solution works for music videos


System architecture


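Processing pipeline

// Fetch the comments, drop the noisy ones, then extract topics from each comment
comments = fetchYouTubeComments();
comments = filterComments(comments);
commentTopics = createCommentTopics(comments);
// Collect the external resources that describe the video
resources = getResources(wikipedia, allmusic, lyrics);
// Score the topics of each comment against those resources
for (int i = 0; i < commentTopics.length; i++) {
    computeRelevance(commentTopics[i], resources);
}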


Preprocessing

• Comments retrieved with the YouTube Data API
– Only the last 100 comments per video were used
• Comments not written in English are filtered out using JLangDetect
• The main topics of each comment are extracted using Mallet => 5 topics per comment
• The topics are expanded with synonyms and hypernyms from WordNet
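The topic expansion step can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: a small in-memory map stands in for the WordNet synonym and hypernym lookups, and the class name `TopicExpander` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: expands comment topics with related words.
// The lexicon map is an assumed stand-in for WordNet lookups.
final class TopicExpander {
    private final Map<String, List<String>> lexicon;

    TopicExpander(Map<String, List<String>> lexicon) {
        this.lexicon = lexicon;
    }

    // Returns each topic followed by its related words, lower-cased,
    // without duplicates, in first-seen order.
    Set<String> expand(List<String> topics) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String topic : topics) {
            String key = topic.toLowerCase(Locale.ROOT);
            expanded.add(key);
            List<String> related =
                    lexicon.getOrDefault(key, Collections.<String>emptyList());
            for (String word : related) {
                expanded.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return expanded;
    }
}
```

Topics without an entry in the lexicon are kept unchanged, so the expansion never loses a topic that was extracted from the comment.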


Pre-classification of comments

• Objective: to reduce the number of comments considered for ranking by identifying noise
• Classification based on a neural network using a set of simple linguistic features
• Multilayer perceptron implemented in Weka
• Features
– Number of non-ASCII characters
– Number of capital letters
– Number of newlines
– Number of digits
– Number of trivial and swear-words
– Number of words in comment
– Average word size
– Number of punctuation marks
– Common text spam count
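A minimal sketch of how some of the features listed above could be computed from a raw comment is given below. The class name `CommentFeatures`, the feature keys and the punctuation set are assumptions made for this example. The word-list features (trivial words, swear-words, common spam phrases) are left out because the slides do not give the lists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: surface features of a comment, computed in one pass.
final class CommentFeatures {
    // Assumed punctuation set; the slides do not define it.
    private static final String PUNCTUATION = ".,;:!?'\"()-";

    static Map<String, Double> extract(String comment) {
        int nonAscii = 0, capitals = 0, newlines = 0, digits = 0, punctuation = 0;
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c > 127) nonAscii++;
            if (Character.isUpperCase(c)) capitals++;
            if (c == '\n') newlines++;
            if (Character.isDigit(c)) digits++;
            if (PUNCTUATION.indexOf(c) >= 0) punctuation++;
        }

        // Words are whitespace-separated tokens; attached punctuation
        // counts towards the word size in this simplified version.
        String trimmed = comment.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int totalLength = 0;
        for (String word : words) {
            totalLength += word.length();
        }
        double avgWordSize = words.length == 0 ? 0.0 : (double) totalLength / words.length;

        Map<String, Double> features = new LinkedHashMap<>();
        features.put("nonAscii", (double) nonAscii);
        features.put("capitalLetters", (double) capitals);
        features.put("newlines", (double) newlines);
        features.put("digits", (double) digits);
        features.put("words", (double) words.length);
        features.put("avgWordSize", avgWordSize);
        features.put("punctuation", (double) punctuation);
        return features;
    }
}
```

The resulting map could then be turned into one numeric feature vector per comment for whichever classifier is used.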


Pre-classification of comments

• Trained on a small corpus with 100 relevant comments and 100 noisy comments
• Examples of noisy comments:
– "Step 1: Pause this video Step 2: Google 'Rainymood' Step 3: Click the first link Step 4: Unpause this video Step 5: Thumbs up this comment, enjoy and thank me later"
– "Those 3,175 haters listen to 'Techno'."
– "IF YOU LIKE DIRTY DIANA SONG THE SINGER '' STEFANO GIORGINI '' DID A GREAT REMAKE STEFANO IS A VERY GOOD SINGER SONGWRITER I THINK YOU WILL LIKE HIS VERSION JUST LOOK FOR '' STEFANO GIORGINI '' DIRTY DIANA"


Pre-classification of comments

• Results of pre-classification stage

Type of instances                   No. of instances   %
Correctly classified instances      174                87.46
Incorrectly classified instances    26                 12.54
Total number of instances           200                -


Relevance scoring stage

• Initial approach
• Extract topics from comments as previously mentioned (Mallet + WordNet)
• Fetch Wikipedia articles for the artist and song name
• Score computed based on the number of appearances of the comment topics in the articles
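The counting idea behind this first approach can be sketched as below. The sketch assumes that a topic is a single word and that an "appearance" is a case-insensitive whole-word match in the article text. The slides do not specify either detail, and the class name `OccurrenceScorer` is hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: score = total number of times the comment topics
// occur as whole words in the article text.
final class OccurrenceScorer {
    static int score(List<String> topics, String articleText) {
        // Split on anything that is not a letter or a digit.
        String[] tokens = articleText.toLowerCase(Locale.ROOT)
                .split("[^\\p{L}\\p{Nd}]+");
        int score = 0;
        for (String topic : topics) {
            String wanted = topic.toLowerCase(Locale.ROOT);
            for (String token : tokens) {
                if (token.equals(wanted)) {
                    score++;
                }
            }
        }
        return score;
    }
}
```

Because the score is a raw count, long articles and comments with many topics tend to receive higher values under this sketch.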


Relevance scoring stage

• Second approach: topic-based scoring
• Similar to the previous one, but topics are also extracted from the Wikipedia articles with Mallet
• Scoring is done based on:
– Number of topics extracted from each comment
– Wikipedia topic matches for each comment
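The slides name the two quantities used for scoring but not how they are combined. One possible combination, shown purely as an assumption, is the fraction of a comment's topics that also appear among the Wikipedia topics. The class name `TopicMatchScorer` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: fraction of comment topics matched by article topics.
// This ratio is an assumed formula, not the one used by the authors.
final class TopicMatchScorer {
    static double score(Set<String> commentTopics, Set<String> articleTopics) {
        if (commentTopics.isEmpty()) {
            return 0.0;
        }
        // Copy first so the caller's set is not modified by retainAll.
        Set<String> matches = new HashSet<>(commentTopics);
        matches.retainAll(articleTopics);
        return (double) matches.size() / commentTopics.size();
    }
}
```

Normalizing by the number of comment topics keeps the score between 0 and 1, so comments with many extracted topics are not favoured automatically.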


Relevance scoring stage

• Third approach: multiple-source topic-based scoring
• Additional sources added to the Wikipedia articles
– Information from the allmusic.com website on artists and songs
– Information from song lyrics
• Topics matched between comments and Wikipedia + Allmusic articles, plus exact match of lyrics
• Final relevance score is a weighted sum of the previous factors
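The weighted sum can be sketched as follows. The weight values are placeholders chosen for this example, since the slides do not report the actual weights. The way lyric matches are counted (a lyric line appearing verbatim in the comment, ignoring case) is also an assumption, and the class name `WeightedRelevance` is hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a weighted-sum relevance score over three sources.
final class WeightedRelevance {
    // Placeholder weights; the real values are not given in the slides.
    static final double WIKIPEDIA_WEIGHT = 1.0;
    static final double ALLMUSIC_WEIGHT = 1.0;
    static final double LYRICS_WEIGHT = 2.0;

    // Counts the lyric lines that occur verbatim (case-insensitively)
    // inside the comment. Blank lines are ignored.
    static int exactLyricMatches(String comment, List<String> lyricLines) {
        String text = comment.toLowerCase(Locale.ROOT);
        int matches = 0;
        for (String line : lyricLines) {
            String needle = line.trim().toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && text.contains(needle)) {
                matches++;
            }
        }
        return matches;
    }

    // Combines the per-source match counts into one relevance score.
    static double score(int wikipediaMatches, int allmusicMatches, int lyricMatches) {
        return WIKIPEDIA_WEIGHT * wikipediaMatches
                + ALLMUSIC_WEIGHT * allmusicMatches
                + LYRICS_WEIGHT * lyricMatches;
    }
}
```

With these placeholder weights, a comment with 3 Wikipedia matches, 2 Allmusic matches and 1 quoted lyric line would score 3 + 2 + 2 = 7.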


Results

Comment: maybe your friend should know that being english, have a picture in abbey road and "sing" all you need is love" won't make one direction a group like the beatles...
Relevance: 662

Comment: my mom said she doesn't like the beatles and she said that john was only good to look at not to hear. my dad said, " haha so true!." i'm an orphan now.
Relevance: 968

Comment: you shouldn't be listening to the beatles since these seem to turn your friends into enemies! beatles are all about peace! you are not getting their message!
Relevance: 983

Comment: please read this ! hey i know u just wanna listen to the song but i still have to write this hoping someone will see it and that someone will care .i'm a young musician from croatia so this spam is my only chance to get noticed.please check out my channel and i promise u won't be sorry.i appreciate your time because music means everything to me, thank you!
Relevance: 1309

Comment: i didn't mean fight other places. i meant focus on the hurt people in your own country first, then expand to the others. if people don't agree with peace that's an opinion. not a fact, and people often take offense to opinions. there isn't anything to take offense to, they say something that's all it is. they said it, don't put meaning to it. world peace - i meant the whole world having peace there
Relevance: 1639


Results

• Difficult to assess the impact of the relevance measure
• Interpreting the comments is subjective
– Need human annotators
• The order of the comments is completely different from the one currently shown on YouTube (correlation lower than 0.031 for the first 100 comments)
• Method 1 is also not correlated with the other two methods
• Methods 2 and 3 have a higher correlation: 0.124
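The slides do not say which correlation coefficient was used to compare the orderings. As an illustration only, the sketch below computes Spearman's rank correlation between two orderings of the same comments, assuming there are no ties. The class name `RankCorrelation` is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Spearman's rho for two orderings of the same
// distinct items, using rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
final class RankCorrelation {
    static double spearman(List<String> orderA, List<String> orderB) {
        int n = orderA.size();
        if (n < 2 || orderB.size() != n) {
            throw new IllegalArgumentException("need two orderings of equal size >= 2");
        }
        // Position of every item in the second ordering.
        Map<String, Integer> positionInB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            positionInB.put(orderB.get(i), i);
        }
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < n; i++) {
            Integer j = positionInB.get(orderA.get(i));
            if (j == null) {
                throw new IllegalArgumentException("orderings contain different items");
            }
            double d = i - j;
            sumSquaredDiff += d * d;
        }
        return 1.0 - 6.0 * sumSquaredDiff / (n * ((double) n * n - 1.0));
    }
}
```

A value near 1 means the two methods order the comments almost identically, a value near 0 means the orderings are unrelated, and a value near -1 means one ordering is close to the reverse of the other.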


Conclusions

• 2-stage method for ranking comments on YouTube
• The first stage removes noisy comments
• The second stage tries to link the comments with information from other web pages relevant to the video
• Relevance is computed based on topic modeling with Mallet
• Results are encouraging, but a more rigorous method of assessing them is needed
• Results are better than the usual results provided by YouTube; however, the processing time for each video should not be neglected


Thank you!

• Questions?

• Discussion
