Mapping Tweets to Conference Talks: A Goldmine for Semantics Milan Stankovic, Hypios, Paris-Sorbonne, FR & Matthew Rowe, KMI, Open University, UK

Upload: milstan

Post on 19-Jun-2015


TRANSCRIPT

Page 1: Mapping Tweets to Conference Talks: A Goldmine for Semantics

Mapping Tweets to Conference Talks: A Goldmine for Semantics

Milan Stankovic, Hypios, Paris-Sorbonne, FR & Matthew Rowe, KMI, Open University, UK

Page 2

On Conference We Tweet

Page 3

Is there a Correspondence?

?

Page 4

Why?

[Diagram: tweet → is about → talk]

Page 5

Why?

[Diagram: user → made → tweet → is about → talk; talk → has topic → Topic 1, Topic 2, Topic 3]

Page 6

Why?

[Diagram: user → made → tweet → is about → talk; talk → has topic → Topic 1, Topic 2, Topic 3; plus a link marked "interest ?"]

Page 7

Why?

[Diagram: two users each made a tweet that is about the same talk; question: "were at the same talk ?"]

Page 8

Potential Benefits

• Digital memory
• Conference feedback
  – number of tweets for a talk
  – conversational aspects
  – sentiment analysis
• User profiling and expert finding
• Trending topics

Page 9

Rich Activity Twitter Event Data

• We take Twitter archives from TwapperKeeper

• We enrich tweets with relevant DBpedia concepts using Zemanta

• We rely on existing Linked Data about talks to perform the mappings.

Page 10

ESWC Dataset

• Collected during the Extended Semantic Web Conference 2010
  – Any tweets tagged with “eswc”
• 1082 tweets
• 213 tweets enriched with concepts
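As a rough illustration of the "tagged with eswc" criterion, the sketch below keeps a tweet if it carries the hashtag #eswc, compared case-insensitively. The helper name `is_tagged` and the exact-match rule are assumptions; the slides do not say whether variants such as #eswc2010 were also collected.

```python
import re

# Hypothetical helper: the slides only say tweets "tagged with eswc" were kept.
HASHTAG = re.compile(r"#(\w+)")


def is_tagged(text: str, tag: str = "eswc") -> bool:
    """Return True if the tweet contains #<tag> (case-insensitive, exact match)."""
    return any(h.lower() == tag.lower() for h in HASHTAG.findall(text))
```

Switching the comparison to `h.lower().startswith(tag)` would also catch year-suffixed tags, at the cost of matching unrelated tags that merely share the prefix.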

Page 11

Aligning Tweets with Talks

• Goal: label tweets with talks
• Method:
  – Induce a labelling function $f: X \rightarrow Y$ to perform the alignment
  – Labelled data = events from the Web of Data: $\{(x_i, y_i)\}_{i=1}^{L}$
  – Unlabelled data = tweets: $\{x_i\}_{i=1}^{U}$

Page 12

Aligning Tweets with Talks

1. Feature Extraction:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix swrc: <http://swrc.ontoware.org/ontology#> .
@prefix swc: <http://data.semanticweb.org/ns/swc/ontology#> .
@prefix dog: <http://data.semanticweb.org> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23> rdf:type swrc:InProceedings ;
    dc:subject "Knowledge Acquisition" ;
    dc:subject "Semantic Analysis" ;
    dc:subject "Social Web" ;
    dc:subject "Microblogs" ;
    dc:title "Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams" ;
    swrc:abstract "Although one might argue that little wisdom can be conveyed in messages of 140 ..." ;
    swrc:author <http://data.semanticweb.org/person/claudia-wagner> .
<http://data.semanticweb.org/person/claudia-wagner> rdf:type foaf:Person ;
    foaf:name "Claudia Wagner" ;
    swrc:affiliation <http://data.semanticweb.org/organization/joanneum-research> ;
    foaf:based_near <http://dbpedia.org/resource/Austria> .

Page 13

Aligning Tweets with Talks

1. Feature Extraction: F1 - Immediate Resource Leaves


Knowledge Acquisition Semantic Analysis Social Web Microblogs Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams Although one might argue that little wisdom can be conveyed in messages of 140 http://data.semanticweb.org/person/claudia-wagner
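A minimal sketch of F1, assuming the talk's RDF has already been loaded into a list of (subject, predicate, object) string triples, and that rdf:type statements are skipped (the leaf list on the slide leaves out swrc:InProceedings). The function name `immediate_leaves` is hypothetical, and a real pipeline would more likely parse the Turtle with an RDF library.

```python
# Triples are (subject, predicate, object) tuples of plain strings (assumption).
RDF_TYPE = "rdf:type"


def immediate_leaves(triples, resource):
    """F1 sketch: objects attached directly to `resource`, skipping rdf:type."""
    return [o for s, p, o in triples if s == resource and p != RDF_TYPE]


TALK = "http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23"
AUTHOR = "http://data.semanticweb.org/person/claudia-wagner"

# A subset of the statements about the talk shown on the slide.
triples = [
    (TALK, RDF_TYPE, "swrc:InProceedings"),
    (TALK, "dc:subject", "Knowledge Acquisition"),
    (TALK, "dc:subject", "Social Web"),
    (TALK, "swrc:author", AUTHOR),
    (AUTHOR, "foaf:name", "Claudia Wagner"),
]
```

Calling `immediate_leaves(triples, TALK)` yields the two subjects and the author IRI, but not the author's name, which is one hop further away.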


Page 14

Aligning Tweets with Talks

1. Feature Extraction: F2 – 1-step Resource Leaves


http://data.semanticweb.org/person/claudia-wagner Claudia Wagner http://data.semanticweb.org/organization/joanneum-research http://dbpedia.org/resource/Austria
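A sketch of F2 under the same assumption of plain string triples: for each resource the talk links to (ignoring rdf:type), it emits that resource and then the resource's own non-type leaves, which reproduces the four items listed on the slide. The names `is_resource` and `one_step_leaves` are hypothetical, and treating any http(s) string as a resource is a simplification that would misclassify literal URLs.

```python
RDF_TYPE = "rdf:type"


def is_resource(term):
    # Heuristic (assumption): resources are absolute http(s) IRIs, all else is a literal.
    return term.startswith(("http://", "https://"))


def one_step_leaves(triples, resource):
    """F2 sketch: each resource linked from `resource`, followed by its own leaves."""
    leaves = []
    for s, p, o in triples:
        if s == resource and p != RDF_TYPE and is_resource(o):
            leaves.append(o)  # the neighbouring resource itself
            leaves.extend(o2 for s2, p2, o2 in triples if s2 == o and p2 != RDF_TYPE)
    return leaves


TALK = "http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23"
AUTHOR = "http://data.semanticweb.org/person/claudia-wagner"

triples = [
    (TALK, "dc:subject", "Microblogs"),
    (TALK, "swrc:author", AUTHOR),
    (AUTHOR, RDF_TYPE, "foaf:Person"),
    (AUTHOR, "foaf:name", "Claudia Wagner"),
    (AUTHOR, "swrc:affiliation", "http://data.semanticweb.org/organization/joanneum-research"),
    (AUTHOR, "foaf:based_near", "http://dbpedia.org/resource/Austria"),
]
```

Only one hop is followed, so nothing attached to the affiliation or to Austria is pulled in.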


Page 15

Aligning Tweets with Talks

1. Feature Extraction: F3 – DBpedia Concepts


http://dbpedia.org/resource/Twitter

http://dbpedia.org/resource/Social_Web

http://dbpedia.org/resource/Microblogs

Page 16

Aligning Tweets with Talks

2. Feature Vector Composition

Knowledge Acquisition Semantic Analysis Social Web Microblogs Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams Although one might argue that little wisdom can be conveyed in messages of 140 http://data.semanticweb.org/person/claudia-wagner


knowledge acquisition semantic analysis social web microblogs exploring wisdom tweets knowledge acquisition social awareness streams wisdom messages

Indexer

knowledge 2
acquisition 2
semantic 1
analysis 1
social 2
web 1
microblogs 1
exploring 1
wisdom 2
tweets 1
awareness 1
streams 1
messages 1
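The indexing step can be approximated as a bag-of-words count. This is a minimal sketch: the slides do not name the indexer or its term filter, so the URI stripping, the regular-expression tokenizer and the small `STOPWORDS` set below are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; the slides do not say which stoplist was used.
STOPWORDS = {"a", "an", "and", "be", "can", "from", "in", "of", "that", "the", "to"}


def feature_vector(text: str) -> Counter:
    """Bag-of-words sketch: drop URIs, lowercase, split on non-letters, remove stopwords, count."""
    text = re.sub(r"https?://\S+", " ", text)
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)
```

Because this stoplist is deliberately small, it would keep words such as "although" and "argue" from the abstract, which the slide's indexer dropped; matching its output exactly would need the original filtering rules.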

Page 17

Aligning Tweets with Talks

3. Inducing the Labelling Function
– Both tweets and events are provided as feature vectors
– Induce a labelling function $f: X \rightarrow Y$

Choose the most likely event ($y$) given the tweet ($x$)

Page 18

Aligning Tweets with Talks

3. Inducing the Labelling Function: Proximity-based Clustering

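– Build a centroid vector for each event
  • From the event's feature vectors
– Compare each tweet vector with each centroid
  • Choose the event ($y$) whose centroid is closest

$$\hat{y} = \arg\min_{y \in Y} d(\mathbf{x}, \mathbf{c}_y)$$

where $\mathbf{c}_y$ is the centroid of event $y$ and $d$ is one of:

$$d_{\text{manhat}}(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^{n} |x_i - x'_i| \qquad d_{\text{eucl}}(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$$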
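A minimal sketch of proximity-based labelling over sparse term-count vectors stored as dicts. The input shape is an assumption: `event_vectors` maps each event to a list of its feature vectors (for instance one per feature set), and `label_tweet`, `centroid`, `manhattan` and `euclidean` are hypothetical names. Missing terms are treated as zero weight.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse term->weight vectors (dicts)."""
    total = defaultdict(float)
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}


def manhattan(a, b):
    # Sum of absolute differences over the union of terms.
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


def euclidean(a, b):
    # Square root of the sum of squared differences over the union of terms.
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys()))


def label_tweet(tweet, event_vectors, distance=euclidean):
    """Return the event whose centroid is closest to the tweet vector."""
    centroids = {event: centroid(vs) for event, vs in event_vectors.items()}
    return min(centroids, key=lambda event: distance(tweet, centroids[event]))
```

With raw counts, talks with long descriptions sit far from every short tweet; normalising the vectors first is a common adjustment, though the slides do not say whether one was applied.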

Page 19

Aligning Tweets with Talks

3. Inducing the Labelling Function: Naive Bayes Classification

– Assigns the most probable event label given the tweet features

– Using Bayes' theorem, we write this as:

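$$
\begin{aligned}
\hat{y} &= \arg\max_{y \in Y} P(y \mid x_1, x_2, \ldots, x_n) \\
&= \arg\max_{y \in Y} \frac{P(x_1, x_2, \ldots, x_n \mid y)\,P(y)}{P(x_1, x_2, \ldots, x_n)} \\
&= \arg\max_{y \in Y} P(x_1, x_2, \ldots, x_n \mid y)\,P(y) \\
&= \arg\max_{y \in Y} P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}
$$

The denominator does not depend on $y$, so it can be dropped; the last step applies the naive assumption that the features are conditionally independent given the event.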
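A sketch of the last line of the derivation as a multinomial Naive Bayes scorer, computed in log space to avoid underflow. Three details are assumptions not stated on the slides: each event is a single term-count dict, $P(y)$ is uniform, and $P(x_i \mid y)$ uses add-alpha (Laplace) smoothing so that a tweet term absent from an event's description does not zero out its score. `naive_bayes_label` is a hypothetical name.

```python
import math


def naive_bayes_label(tweet, event_vectors, alpha=1.0):
    """Return argmax_y [log P(y) + sum_i n_i * log P(x_i | y)] (multinomial model)."""
    vocab = set(tweet).union(*event_vectors.values())
    log_prior = math.log(1.0 / len(event_vectors))  # uniform P(y): an assumption
    best_event, best_score = None, -math.inf
    for event, counts in event_vectors.items():
        total = sum(counts.values())
        score = log_prior
        for term, n in tweet.items():
            # Add-alpha smoothing keeps P(x_i | y) > 0 for unseen terms.
            p = (counts.get(term, 0) + alpha) / (total + alpha * len(vocab))
            score += n * math.log(p)
        if score > best_score:
            best_event, best_score = event, score
    return best_event
```

With a uniform prior the $P(y)$ term cannot change the argmax; it is kept only so the code mirrors the formula, and could be replaced by per-talk tweet frequencies if those were known.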

Page 20

Experiments

• Dataset
  – Corpus of tweets collected during ESWC 2010
• Gold standard construction
  – Used 3 raters to label a portion of the tweet corpus
    • 200 tweets labelled
  – Measured inter-rater agreement
    • Using the Kappa statistic
  – Initial agreement was too low: 0.328
  – Utilised the Delphi method to improve agreement
  – Second round of labelling produced: 0.820
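The slides do not say which kappa variant was used. With three raters labelling the same tweets, Fleiss' kappa is one common choice, so this sketch assumes it. Each row is one tweet, each column one candidate label, and each cell counts how many raters chose that label; `fleiss_kappa` is a hypothetical name.

```python
def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = number of raters who put item i in category j.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(table)
    n_raters = sum(table[0])
    n_cats = len(table[0])
    # Share of all rating assignments that went to each category.
    p = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)]
    # Observed agreement for each item, then averaged over items.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in table]
    observed = sum(per_item) / n_items
    # Agreement expected if raters picked categories at their overall rates.
    expected = sum(pj * pj for pj in p)
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement at chance level; the formula is undefined when every rating falls in a single category.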

Page 21

Experiments

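• Evaluation measures
  – Precision: proportion of the tweets assigned to an event that are correctly labelled
  – Recall: proportion of an event's tweets that are successfully returned for it
  – F-measure: harmonic mean of precision and recall
• Placed emphasis on precision over recall ($\beta < 1$ weights precision more heavily):

$$F_\beta = \frac{(1 + \beta^2)\,P\,R}{\beta^2 P + R}, \qquad \beta \in \{0.2, 0.5, 1\}$$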
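A sketch of the evaluation measures, assuming micro-averaging over tweets (the slides do not specify the averaging): `predicted` maps each tweet to a talk or `None` when the method abstains, and `gold` maps each gold-standard tweet to its talk. The function names are hypothetical.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall of tweet-to-talk labels."""
    assigned = {t: y for t, y in predicted.items() if y is not None}
    correct = sum(1 for t, y in assigned.items() if gold.get(t) == y)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + b^2) P R / (b^2 P + R); beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, labelling four gold tweets with one abstention and one error gives P = 2/3 and R = 1/2, so F rises from 4/7 at beta = 1 to 0.625 at beta = 0.5, reflecting the extra weight on precision.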

Page 22

Results

Page 23

Imagine…

Page 24

Imagine user profiling

ESWC dataset, user Matthew Rowe

Page 25

Imagine conference feedback

ESWC dataset

directly from Tweets

from mappings (Talks)

Page 26

We Challenge You

Page 27

We Challenge You!

• Beat us in mappings!
• We provide the human-generated gold standard mappings
• Can you find a more precise way to do tweet-talk mappings?
• Can you find other uses? Let us know!

Page 28

We Challenge You!

• You can find the gold standard data here:

http://research.hypios.com/?page_id=131

• You can find all the data (and automated mappings) here:

http://data.hypios.com/tweets/sparql

Page 29

We Challenge You!

http://data.hypios.com/tweets/sparql

PREFIX lode: <http://linkedevents.org/ontology/>
SELECT ?tweet ?talk WHERE {
  ?tweet lode:illustrate ?talk .
}
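A sketch of issuing this query from Python with only the standard library, assuming the endpoint follows the SPARQL 1.1 Protocol (query passed as the `query` parameter of a GET request) and can return the standard SPARQL JSON results format. The endpoint URL comes from the slides but may no longer be online; `build_request` and `parse_bindings` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL taken from the slides; it may no longer be online.
ENDPOINT = "http://data.hypios.com/tweets/sparql"
QUERY = """SELECT ?tweet ?talk WHERE {
  ?tweet <http://linkedevents.org/ontology/illustrate> ?talk .
}"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol query via GET, asking for the JSON results format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: str):
    """Turn a SPARQL JSON results document into (tweet, talk) pairs."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [(b["tweet"]["value"], b["talk"]["value"]) for b in bindings]
```

Sending the request is then `urllib.request.urlopen(build_request(ENDPOINT, QUERY), timeout=30)`, passing the decoded response body to `parse_bindings`; whether this particular endpoint honours the Accept header is not stated in the slides.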

Page 30

brought to you by

[email protected] & [email protected]
2010, Shanghai, China