recommending #-tags in twitter

Recommending #-Tags in Twitter, by Eva Zangerle, Wolfgang Gassler and Günther Specht

Upload: evazangerle

Post on 26-Jan-2015


1

Recommending #-Tags in Twitter

Eva Zangerle, Wolfgang Gassler and Günther Specht

2

Outline

• Motivation
• Approach
• Ranking Methods
• Evaluation
• Future Directions

3

Hashtags

• Tags for Tweets
• (Manual) Categorization of conversations
• Follow streams of conversation

4

Motivation

• Only 20% of tweets contain hashtags
• Hashtags can be chosen freely
– #umap2011? #umap11? #umap?
– Synonymous hashtags
– Heterogeneity
– Search capability limited

5

Motivation

Are hashtag recommendations possible?

6

Goals

• Recommendation of suitable hashtags while the user is entering a tweet
• Encourage use of hashtags
– Improve search capabilities
– Better categorization
• Fight heterogeneity
– Avoid use of synonymous hashtags

7

Approach

• First Attempt
• Crawl set of tweets containing hashtags
• Analysis of dataset
• Can it be done based on content?
• Compare entered tweet to existing tweets

8

Content-based Approach

User enters message

Retrieve 500 most similar messages

Retrieve candidate-set of Hashtags

Ranking of Hashtags

Top-k Recommendations
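
The first three steps of this pipeline (message in, similar messages retrieved, candidate hashtags collected) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `candidate_hashtags`), the tokenizer, and the exact tf-idf weighting are assumptions. Only the tf/idf cosine similarity and the limit of 500 similar messages come from the slides.

```python
import math
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def tokenize(text):
    """Lowercase word tokens; hashtags are removed first (assumed tokenizer)."""
    return re.findall(r"\w+", HASHTAG.sub(" ", text.lower()))


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf; the exact weighting scheme is an assumption.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_hashtags(message, corpus, n_similar=500):
    """Return (similarity, hashtags) pairs for the most similar corpus tweets."""
    vectors, idf = tfidf_vectors([tokenize(t) for t in corpus])
    # Terms unseen in the corpus get weight 0 in the query vector.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(message)).items()}
    scored = [(cosine(query, vec), HASHTAG.findall(tweet.lower()))
              for vec, tweet in zip(vectors, corpus)]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_similar]
```

The returned pairs are the candidate set; ranking them and cutting to the top k is a separate step. A corpus of millions of tweets would need an inverted index rather than this linear scan.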

9

Crawled Dataset

• Crawled July 2010 – February 2011
• 16,034,195 messages in total
• 3,209,281 messages containing hashtags (20%) -> used as dataset for evaluation
• Top five hashtags contained in 8% of all messages containing hashtags (#jobs, #nowplaying, #zodiacfacts, #news and #fb)

10

Hashtags per Tweet

11

Hashtags per Tweet

RT @Bhupesh tweet: #Quad #loop-http://bit.ly/ciHX2U #retweet #India #Jobs #World #news #canada #ad #win #USA #tdf #oea #hacking #icantstop #sdcc #game

12

Longtail Distribution

13

Ranking Methods

Input: Set of Candidate Hashtags (from 500 similar tweets)
Output: Ranked Candidate List -> top k shown

1. Similarity Rank
– Use similarity measure of tweets for ranking (tf/idf cosine similarity)
– The higher the similarity of the tweets, the higher the ranking of the corresponding hashtags

2. Overall Popularity Rank
– Most popular hashtags over whole dataset
– The more popular, the higher the ranking within the candidate hashtags
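
A rough sketch of these two rankings, assuming the candidate set arrives as `(similarity, hashtags)` pairs and that dataset-wide hashtag frequencies are available in a dictionary (`global_counts`, a hypothetical name). The slides do not say how a hashtag that appears in several similar tweets is scored under Similarity Rank; taking the best similarity is an assumption made here, as is the alphabetical tie-break.

```python
def similarity_rank(similar_tweets, k=5):
    """Rank hashtags by the highest similarity of any tweet that contains them."""
    best = {}
    for similarity, tags in similar_tweets:
        for tag in tags:
            # Assumption: a hashtag inherits the best score among its tweets.
            best[tag] = max(best.get(tag, 0.0), similarity)
    return sorted(best, key=lambda tag: (-best[tag], tag))[:k]


def overall_popularity_rank(similar_tweets, global_counts, k=5):
    """Rank candidate hashtags by their frequency over the whole dataset."""
    candidates = {tag for _, tags in similar_tweets for tag in tags}
    return sorted(candidates, key=lambda tag: (-global_counts.get(tag, 0), tag))[:k]
```

Note that the second method ignores the entered message once the candidates are fixed: globally frequent tags such as #jobs or #news always rise to the top of any candidate set that contains them.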

14

Ranking Methods

Input: Set of Candidate Hashtags (from 500 similar tweets)
Output: Ranked Candidate List -> top k shown

3. Recommendation Popularity Rank
– Count number of occurrences of each hashtag within candidate list
– The more similar tweets feature the hashtag, the higher the rank of the hashtag
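
The counting behind this third method might look like the sketch below. The input is assumed to be one list of hashtags per similar tweet. Counting each hashtag at most once per tweet and breaking ties alphabetically are illustrative choices, not something the slides specify.

```python
from collections import Counter


def recommendation_popularity_rank(similar_tweets, k=5):
    """Rank hashtags by how many of the similar tweets contain them."""
    counts = Counter()
    for tags in similar_tweets:
        counts.update(set(tags))  # each tweet votes once per distinct hashtag
    return sorted(counts, key=lambda tag: (-counts[tag], tag))[:k]
```

Unlike Overall Popularity Rank, the counts here are local to the retrieved tweets, so a hashtag that is rare globally can still rank first if it dominates the similar messages.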

15

Evaluation

Randomly select tweet t from dataset

Remove hashtags from t

Use t as input for recommendation algorithm

Compute hashtag recommendations for t

Use three proposed ranking methods

Compare top-k recommendations
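
Building the test cases for this procedure could be sketched as follows: the hashtags removed from a tweet become the ground truth that the recommendations are later compared against. The helper names, the hashtag regular expression, and detecting retweets by an `RT ` prefix are assumptions. The limit of five hashtags and the exclusion of retweets come from the test-run description in the slides.

```python
import random
import re

HASHTAG = re.compile(r"#\w+")


def make_test_case(tweet):
    """Split a tweet into (text without hashtags, set of removed hashtags)."""
    truth = {tag.lower() for tag in HASHTAG.findall(tweet)}
    stripped = " ".join(HASHTAG.sub(" ", tweet).split())
    return stripped, truth


def sample_test_cases(tweets, n, max_hashtags=5, seed=0):
    """Randomly pick up to n eligible tweets and turn them into test cases."""
    rng = random.Random(seed)
    eligible = [
        t for t in tweets
        if not t.startswith("RT ")  # assumed retweet marker
        and 1 <= len(set(HASHTAG.findall(t.lower()))) <= max_hashtags
    ]
    return [make_test_case(t) for t in rng.sample(eligible, min(n, len(eligible)))]
```

In a full evaluation the sampled tweet would also have to be left out of the corpus searched for similar messages, since it would otherwise match itself.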

16

Evaluation

• Dataset
– 3,209,281 messages
– 5,097,545 hashtags
– 510,170 distinct hashtags
• Testrun
– 10,000 randomly chosen tweets (max. 5 hashtags)
– Retweets excluded
– 30,000 testruns (3 ranking methods)

17

Evaluation - Precision

Avg. number of hashtags per message in dataset

18

Evaluation - Recall

Top-3 recommendations enough?
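
For reference, precision and recall at k for a single test tweet can be computed as in this sketch. The function name is hypothetical, and dividing precision by k even when fewer than k hashtags were recommended is one common convention, assumed here rather than taken from the slides.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one tweet.

    recommended: ranked list of hashtags produced by a ranking method
    relevant:    set of hashtags originally removed from the tweet
    """
    hits = sum(1 for tag in recommended[:k] if tag in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

The dataset holds 5,097,545 hashtags in 3,209,281 messages, about 1.6 per message, so precision necessarily drops as k grows beyond that, while recall can only stay equal or rise.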

19

What we showed…

• Motivation for recommendation of hashtags
• Content-based recommendations
• Simple, straightforward approach
• 40% Recall@3
• … so it can be done!

20

{ "coordinates": null, "favorited": false, "created_at": "Thu Jul 15 23:26:04 +0000 2010", "truncated": false, "text": "RT @ApeyBaby44: Labels r run by lawyer & accountants. http://tl.gd/2hkmas", "contributors": null, "id": 18639444000, "geo": null, "in_reply_to_user_id": null, "place": null, "in_reply_to_screen_name": null, "user": { "name": "DIGGZ", "profile_sidebar_border_color": "F2E195", "profile_background_tile": true, "profile_sidebar_fill_color": "FFF7CC", "created_at": "Fri Apr 03 03:16:01 +0000 2009", "profile_image_url": "http://a3.twimg.com/profile_images/1079346239/untitled_normal.JPG", "location": "ATL, NC, VA, NY all day!", "profile_link_color": "FF0000", "follow_request_sent": null, "url": "http://thisisseriousbiz.com", "favourites_count": 42, "contributors_enabled": false, "utc_offset": -18000, "id": 28489988, "profile_use_background_image": true, "profile_text_color": "0C3E53", "protected": false, "followers_count": 588, "lang": "en", "notifications": null, "time_zone": "Quito", "verified": false, "profile_background_color": "BADFCD", "geo_enabled": true, "description": "Half of Production duo SeriousBIZ circa 2008\r\n#teamSERIOUSBIZ\r\n#teamblackberry PIN 315442C9\r\n#teamfollowback", "friends_count": 477, "statuses_count": 6269, "profile_background_image_url": "http://a1.twimg.com/profile_background_images/118926256/_MG_43571.JPG", "following": null, "screen_name": "DIGGZSeriousBIZ" }, "source": "<a href=\"http://www.ubertwitter.com/bb/download.php\" rel=\"nofollow\">\u00dcberTwitter</a>", "in_reply_to_status_id": null }

We've barely scratched the surface…

• Exploited only a small fraction of the available information
• 90% of a tweet's data is metadata

21

Thank You!

#ideas?

22

Future Challenges

• User Interface
• Social Graph
• Realtime Recommendations
• Synonymous Tags?
• Ranking
• Real User Tests