Post on 18-Dec-2014

ELIS – Multimedia Lab

Towards Twitter Hashtag Recommendation Using Distributed Word Representations and a Deep Feed Forward Neural Network

CSSC-2014, New Delhi, 24 September 2014

Abhineshwar Tomar, Frederic Godin, Baptist Vandersmissen, Wesley De Neve, Rik Van de Walle

Multimedia Lab, Ghent University – iMinds, Belgium
Image and Video Systems Lab, KAIST, South Korea

• Introduction
• Goal
• Motivation
• Methodology
• Results
• Conclusion
• Future work

Overview

Twitter

• An online social network service that enables users to send and read short 140-character text messages, called "tweets" or "microposts"

• Tweet or micropost
• Retweet (sharing)
• Favorite (like or bookmark)
• Mention (starts with @)
• Hashtag (starts with #)

Famous Tweets

Note the presence of both textual and (embedded) visual information!

Twitter Statistics

• Usage in general
- 271 million monthly active users
- 500 million Tweets are sent per day

• Hashtags
- Only 8% of the tweets contain hashtags
- 3% of the hashtags are used more than 5 times

Hashtags on Twitter

Hashtag usage:
- topic-based indexing & search
  • #socialnetwork
  • #Reddit
- conversational/event clustering
  • #www2014

Observation: only 8% of tweets contain a hashtag

Goal

Generate hashtags that adhere to the semantic and linguistic regularity of a tweet

Why

• Hashtags
- Content categorization and discovery
- Effective search of tweets

• Our approach
- Connect similar hashtags (topics)
- Promote the use of hashtags
  • By understanding the semantics of the tweet

Methodology (1/3)

• Preprocessing
- Remove non-English words
- Remove non-ASCII characters
- Remove mentions (@USER)
- Remove URLs
- Remove RT @ from retweets

• Feature vector generation

• Training of a feed forward neural network

• Evaluation
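The preprocessing and feature vector steps on this slide could look roughly like the sketch below. The slides do not give the exact rules, so the regular expressions, the order of the steps, the optional `english_vocab` word set, and the averaging of word vectors into a tweet vector are all assumptions made for illustration.

```python
import re

import numpy as np

# Patterns are illustrative assumptions, not the authors' exact rules.
URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"\bRT\s+@\w+:?")
MENTION_RE = re.compile(r"@\w+")


def preprocess(tweet, english_vocab=None):
    """Strip retweet markers, URLs, mentions and non-ASCII characters.

    english_vocab is a hypothetical set of lower-cased English words; when
    given, tokens outside it are dropped ("remove non-English words").
    """
    text = RETWEET_RE.sub(" ", tweet)   # "RT @user:" goes first, as one unit
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.encode("ascii", errors="ignore").decode("ascii")
    tokens = text.split()
    if english_vocab is not None:
        tokens = [t for t in tokens
                  if t.strip(".,!?:;\"'()").lower() in english_vocab]
    return " ".join(tokens)


def tweet_vector(tokens, embeddings):
    """Average the vectors of the known tokens (an assumed composition rule)."""
    known = [np.asarray(embeddings[t], dtype=float)
             for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token of the tweet is in the vocabulary")
    return np.mean(known, axis=0)
```

In the setting of the slides, `embeddings` would map words to 300-D word2vec vectors; any mapping from token to vector works for experimenting.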

Methodology (2/3)

• Training: learning the relation between tweets and hashtags

Tweet → word2vec → 300-D tweet vector
Hashtag → word2vec → 300-D hashtag vector

Deep feed-forward neural network
- 300-D input layer
- 1000-D hidden layer
- 500-D hidden layer
- 400-D hidden layer
- 300-D output layer

Example
- Tweet: Elizabeth Warren Taking on Hillary as New Democratic Powerhouse
- Hashtag: #politics
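A minimal NumPy sketch of a network with the layer sizes shown on this slide follows. Only the sizes (300, 1000, 500, 400, 300) come from the slides; the tanh hidden units, the linear output layer, the mean squared error loss, the plain gradient descent update, and the function names are assumptions chosen to keep the example short.

```python
import numpy as np

LAYER_SIZES = [300, 1000, 500, 400, 300]  # layer widths taken from the slide


def init_params(sizes, seed=0):
    """Create (weights, bias) pairs for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the activations of every layer, input included."""
    activations = [x]
    for i, (w, b) in enumerate(params):
        z = activations[-1] @ w + b
        is_last = i == len(params) - 1
        # Assumed: tanh hidden layers, linear output layer.
        activations.append(z if is_last else np.tanh(z))
    return activations


def train_step(params, x, y, lr=0.01):
    """One gradient descent step on mean squared error.

    Updates params in place and returns the loss measured before the update.
    """
    acts = forward(params, x)
    diff = acts[-1] - y
    loss = float(np.mean(diff ** 2))
    grad = 2.0 * diff / diff.size          # gradient w.r.t. the output
    for i in reversed(range(len(params))):
        w, b = params[i]
        grad_w = acts[i].T @ grad
        grad_b = grad.sum(axis=0)
        grad_in = grad @ w.T               # gradient w.r.t. this layer's input
        if i > 0:
            grad_in = grad_in * (1.0 - acts[i] ** 2)  # tanh derivative
        params[i] = (w - lr * grad_w, b - lr * grad_b)
        grad = grad_in
    return loss
```

Here `x` would hold one 300-D tweet vector per row and `y` the 300-D vector of the matching hashtag.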

Methodology (3/3)

• Testing: recommending hashtags to tweets

Tweet → word2vec → 300-D tweet vector → deep feed-forward neural network → 300-D hashtag vector → vec2word → hashtags

Deep feed-forward neural network
- 300-D input layer
- 1000-D hidden layer
- 500-D hidden layer
- 400-D hidden layer
- 300-D output layer

Example
- Tweet: House Democrats suggest Obama impeachment is imminent to raise cash
- Hashtags: #politics, #crisis
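The slides do not say how the "vec2word" step works. One plausible reading, sketched below purely as an assumption, is a nearest-neighbour lookup: the predicted 300-D vector is compared with the vectors of candidate words by cosine similarity and the closest ones are returned as hashtags. The function name and the 3-D toy vectors are made up for illustration.

```python
import numpy as np


def nearest_words(query, embeddings, k=5):
    """Return the k words whose vectors have the highest cosine similarity
    to `query`, as (word, similarity) pairs, best match first."""
    words = list(embeddings)
    matrix = np.stack([np.asarray(embeddings[w], dtype=float) for w in words])
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    scores = matrix @ q
    order = np.argsort(-scores)[:k]
    return [(words[i], float(scores[i])) for i in order]


# Toy 3-D vectors standing in for 300-D word2vec vectors (not real data).
toy = {
    "politics": [1.0, 0.0, 0.0],
    "crisis": [0.8, 0.6, 0.0],
    "sports": [0.0, 0.0, 1.0],
}
predicted = [0.9, 0.1, 0.0]  # stands in for the network output
top2 = [word for word, _ in nearest_words(predicted, toy, k=2)]
```

With these toy vectors, `top2` ranks "politics" ahead of "crisis" and leaves "sports" out.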

word2vec

• Developed by Google Research

• Computes vector representations for words
- Through the use of neural network technology

• Trained on part of the Google News dataset (+/- 100 billion words)

• The model contains vectors for 3 million words and phrases
- Capture the semantic meaning of a word

• Example word vector properties
- vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome')
- vector('king') - vector('man') + vector('woman') ≈ vector('queen')
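The vector arithmetic in the examples above can be tried out on a toy vocabulary. The 3-D vectors below are hand-made so that the analogy works exactly; they are not real word2vec output, and the helper names are hypothetical. With the pretrained Google News vectors, a library such as gensim offers comparable similarity queries.

```python
import numpy as np

# Hand-made toy vectors with axes (royalty, male, female); NOT real word2vec data.
TOY = {
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man": np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.1, 0.1, 0.1]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a, b, c, embeddings):
    """Return the word closest to vector(a) - vector(b) + vector(c),
    skipping the three query words themselves."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(analogy("king", "man", "woman", TOY))  # prints "queen"
```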

Results (1/3)

Tweets and their recommended hashtags:

1. Tweet: Someone dm/text me bc I’m so bored
   Recommended hashtags: madd, Oh noes, rainnwilson, sooooooo, fricken

2. Tweet: The good life is one inspired by love and guided by knowledge.
   Recommended hashtags: Ahh yes, FIVE THINGS About, YANKEES TALK, Kinder gentler, Ya gotta love

3. Tweet: Method of Losing Weight http://t.co/rs64CEuo5W
   Recommended hashtags: Shape Shifting, Treat Acne, Detect Cancer, Warps, Calorie Burn

4. Tweet: I hate today cause its room cleaning day for me!!!
   Recommended hashtags: FAN ’S ATTIC, Puh leez, Mopping robot, % #F######## 3v.jsn, InterestEURO JAP

5. Tweet: SPELLS AND SPELL-CASTING:ENCYCLOPEDIA OF 5000 SPELLS ( JUDIKA ILLES ):BLACKSMITH’S WATER HEALING SPELL: A... http://t.co/k0TfrqJFQW
   Recommended hashtags: DEBUTS NEW, NOW AVAILABLE FOR, TO PUBLISH, DESIGNED TO, IS READY TO

Results (2/3)

Results (3/3)

Hit-rate of top-k recommendation:

   Top-k     She et al.   Our approach
1  Top-5     82%          83.33%
2  Top-10    89%          86.67%
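The slides do not define the hit-rate. A common definition, assumed in the sketch below, counts a tweet as a hit when at least one of its top-k recommended hashtags is judged relevant, and reports the fraction of tweets that are hits. The function name and the toy data are illustrative only.

```python
def hit_rate_at_k(recommendations, relevant, k):
    """Fraction of tweets with at least one relevant hashtag in the top k.

    recommendations: one ranked list of hashtags per tweet
    relevant: one collection of acceptable hashtags per tweet
    """
    if len(recommendations) != len(relevant):
        raise ValueError("need one relevance set per recommendation list")
    hits = sum(1 for recs, rel in zip(recommendations, relevant)
               if set(recs[:k]) & set(rel))
    return hits / len(recommendations)


# Toy data, not taken from the experiments in the slides.
recs = [["politics", "crisis", "news"], ["sports", "music", "food"]]
gold = [{"crisis"}, {"travel"}]
rate_top1 = hit_rate_at_k(recs, gold, k=1)
rate_top3 = hit_rate_at_k(recs, gold, k=3)
```

In this toy case no tweet is a hit at k=1, and one of the two tweets is a hit at k=3.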

Conclusion

• Introduced a novel approach for hashtag recommendation, using distributed word representations and a feed forward neural network

• Learns semantic and linguistic regularities without requiring careful feature engineering

• Can easily take advantage of temporal information

• Supports the automatic creation of new hashtags/trends

Future Work

• Use of more than four days of data

• Use word representations from different data sources

• Investigate impact of the quality of the word representations created

• Investigate impact of the use of DBpedia and Freebase
