Transcript
Page 1: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Raphaël Troncy

@rtroncy

Page 2: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Conferences and natural disaster

14/05/2013 Real-Time Analysis and Mining of Social Streams (RAMSS) - Rio de Janeiro - 2

Page 3: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Page 4: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Page 6: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Page 7: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Social Media: some definitions

Media Item: a photo or a video that is shared on a social network

Micropost: a text status message that can optionally accompany a media item

Social Network: an online service that focuses on building and reflecting social relationships among people sharing interests or activities

Media Sharing Platforms: emphasis on sharing media, but with blurred boundaries with social networks since users are encouraged to react to media content (like, comment, favorite, etc.)

Page 8: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Social networks and media items

First-order support: Posting requires the inclusion of a media item. Example: Flickr, YouTube

Second-order support: Possibility to post media items but also text-only messages. Example: Facebook

Third-order support: No direct support for media items; relies on third-party applications to host them. Example: Twitter before the introduction of native photo support

Page 9: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Server

Composition of media item extractors (12 SNs)

Relies on search APIs + a fixed 30s timeout window to provide results

Falls back on screen scraping when necessary (Twitter ecosystem)

Implemented as a NodeJS server

Serialize results in a common schema (JSON)
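
The fan-out described on this slide can be sketched as follows. This is a minimal illustration in Python, not the actual NodeJS Media Server code: the two extractors, the field names (`network`, `mediaUrl`, `micropost`) and the function names are hypothetical stand-ins, and only the idea of "query every network, keep what arrives within a fixed window, serialize to one JSON schema" comes from the slide.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical extractors: each takes a search term and returns media items
# already mapped to a common schema. Real extractors would call a network's
# search API; these stand-ins only simulate latency.
def fast_extractor(term):
    return [{"network": "fast",
             "mediaUrl": f"http://example.org/{term}.jpg",
             "micropost": f"photo about {term}"}]

def slow_extractor(term):
    time.sleep(1.0)  # simulates a network that answers after the window closes
    return [{"network": "slow",
             "mediaUrl": f"http://example.org/{term}.mp4",
             "micropost": f"video about {term}"}]

def search_all(term, extractors, timeout_s=30.0):
    """Query every extractor in parallel; keep what arrives before the deadline."""
    pool = ThreadPoolExecutor(max_workers=len(extractors))
    futures = {pool.submit(fn, term): name for name, fn in extractors.items()}
    done, _late = wait(futures, timeout=timeout_s)
    results = {}
    for future in done:
        if future.exception() is None:  # a failing network is simply skipped
            results[futures[future]] = future.result()
    pool.shutdown(wait=False)  # do not block on extractors that missed the window
    return results

def to_json(results):
    """Serialize the per-network results into a single JSON document."""
    return json.dumps(results, sort_keys=True)
```

Calling `search_all("rio", {"fast": fast_extractor, "slow": slow_extractor}, timeout_s=0.2)` returns only the items from the extractor that answered in time, which is the trade-off a fixed window makes: bounded latency at the cost of possibly incomplete results.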

Page 10: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Deep link Permalink

Clean text for NLP processing

Aggregate view of ALL social interactions

12 Social Networks

Page 11: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (www2013)

Page 12: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (zooming on media items)

Page 13: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (timeline view)

Page 14: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Named Entities are Pivotal

Standalone software: GATE, Stanford CoreNLP, Temis

Web APIs

http://nerd.eurecom.fr/

Page 15: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

What is NERD?

ontology [1], REST API [2], UI [3]

[1] http://nerd.eurecom.fr/ontology
[2] http://nerd.eurecom.fr/api/application.wadl
[3] http://nerd.eurecom.fr

The NERD ontology has been integrated into the NIF project, an EU FP7 effort in the context of LOD2: Creating Knowledge out of Interlinked Data

Page 16: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

NERD REST API

GET, POST, PUT, DELETE

/document /user /annotation/{extractor} /extraction /evaluation ...

JSON/RDF*

"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]
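
To show how a client might consume such a response, here is a small sketch. It uses only the fields visible in the fragment above; the input sentence, the function name and the confidence threshold are hypothetical, and treating `endChar` as an exclusive offset is an assumption (it is consistent with 45 - 30 being the length of "Tim Berners-Lee").

```python
import json

# The response fragment from the slide, wrapped in an object so that it parses.
response = json.loads("""
{"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]}
""")

# Hypothetical input text, chosen so that the offsets above line up with it.
text = "A talk on the Semantic Web by Tim Berners-Lee"

def confident_entities(payload, source_text, min_confidence=0.5):
    """Return (surface form, NERD class, URI) for sufficiently confident entities."""
    found = []
    for item in payload["entities"]:
        if item["confidence"] < min_confidence:
            continue
        # Assumption: startChar is inclusive and endChar is exclusive.
        surface = source_text[item["startChar"]:item["endChar"]]
        # Keep only the local name of the NERD ontology class, e.g. "Person".
        nerd_class = item["nerdType"].rsplit("#", 1)[-1]
        found.append((surface, nerd_class, item["uri"]))
    return found
```

Slicing the source text by offsets, rather than trusting the `entity` string alone, keeps each annotation tied to its exact position in the micropost.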

Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.

Page 17: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder Architecture

Media items harvesting using the Media Server

http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}

https://github.com/vuknje/media-server (@tomayac fork)

Image near de-duplication

DCT signature on image and video frame

Hamming distance between image pairs

Clustering and disambiguation

Named Entity Extraction using NERD

Topic Generation using LDA
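
The near de-duplication step above (a DCT signature compared by Hamming distance) can be sketched as follows. The slide does not say how the signature is derived from the DCT, so thresholding the low-frequency coefficients against their median (in the style of a perceptual hash) is an assumption, as are the function names and the 64-bit size. The input is assumed to be a small square grayscale image already downscaled by some other tool.

```python
import math

def dct_signature(pixels, low=8):
    """Build a (low*low)-bit signature from the low-frequency 2D DCT-II block.

    `pixels` is a square grayscale image given as a list of rows (for example
    32x32 after downscaling). Only the top-left `low` x `low` coefficients are
    computed, since they carry the coarse structure of the image.
    """
    n = len(pixels)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n)), the 1D DCT-II basis
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(low)]
    coeffs = []
    for u in range(low):
        for v in range(low):
            total = 0.0
            for x in range(n):
                row, bu = pixels[x], basis[u][x]
                for y in range(n):
                    total += row[y] * bu * basis[v][y]
            coeffs.append(total)
    # Assumption: threshold against the median of the AC terms; the DC term
    # (index 0) is excluded from the median because it only encodes brightness.
    ac = sorted(coeffs[1:])
    median = ac[len(ac) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")
```

Two media items would then be treated as near-duplicates when `hamming(sig_a, sig_b)` falls under a small threshold; the threshold value is a tuning choice that the slide does not specify.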

Page 18: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (named entities clustering)

Page 19: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (zooming in a cluster)

Page 20: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder

Live Topic Generation from Event Streams

Meet us at WWW 2013 Demo Session

http://www.youtube.com/watch?v=8iRiwz7cDYY

Page 21: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Tracking an event: Italian Election

Repeated queries over a period of time

We tracked and analyzed media posts tagged as elezioni2013 from 2013-02-26 to 2013-03-03

Cron job: every 30 minutes over the 6 days

Slice the data into 24-hour slots

Research question: Can we re-create the news headlines?

Storyboarding: http://mediafinder.eurecom.fr/story/elezioni2013
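
The 24-hour slicing mentioned above can be sketched as a simple bucketing of posts by elapsed time since the start of the tracking period. The `published` field and the function name are hypothetical; the slide does not describe how posts are stored or where slot boundaries were placed, so midnight of the first day is assumed as the origin in the usage below.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def slice_by_slot(posts, start, slot=timedelta(hours=24)):
    """Group posts into consecutive fixed-length slots counted from `start`.

    Each post is a dict with a `published` datetime (hypothetical field name).
    Returns a dict mapping the slot index (0 for the first slot) to its posts.
    """
    slots = defaultdict(list)
    for post in posts:
        # Floor division of two timedeltas yields an integer slot index.
        index = (post["published"] - start) // slot
        slots[index].append(post)
    return dict(slots)
```

For example, `slice_by_slot(posts, datetime(2013, 2, 26))` places everything published on 2013-02-26 in slot 0 and everything published on 2013-03-03 in slot 5, so each slot can be clustered and summarized on its own.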

Page 22: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Tracking an event: Italian Election

Dataset:

~16501 microposts containing (duplicate) media items

~21087 Named Entities extracted

Clustering: NER and LDA

Generate a Bag of Entities (BOE), each disambiguated with a DBpedia URI

Examples: Monti, Bersani, Italia, Berlusconi, Grillo, Stelle
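
A Bag of Entities can be pictured as a frequency table keyed by DBpedia URI, so that different surface forms of the same entity end up in one bucket. The sketch below is one possible reading of the slide, not the authors' implementation: the input layout (`entities` lists with a `uri` field) is hypothetical, and counting each URI at most once per micropost is a design assumption.

```python
from collections import Counter

def bag_of_entities(microposts):
    """Count, per DBpedia URI, how many microposts mention that entity."""
    bag = Counter()
    for post in microposts:
        # Assumption: a URI counts once per micropost, so that one post
        # repeating a name does not dominate the bag. Entities without a
        # URI (not disambiguated) are ignored.
        uris = {e["uri"] for e in post["entities"] if e.get("uri")}
        bag.update(uris)
    return bag
```

`bag_of_entities(posts).most_common(10)` then gives the entities that dominate a time slot, which is the kind of list the examples above (Monti, Bersani, Berlusconi, Grillo) suggest.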

Page 23: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Tracking an event: Italian Election

Tracking and Analyzing The 2013 Italian Election

To appear at ESWC 2013 Demo Session

http://www.youtube.com/watch?v=jIMdnwMoWnk

Page 24: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Take Home Message

Media Server / Media Finder:

Aggregating fresh social media items

Making sense of media collection for video hyper-linking

NERD platform for extracting key information

Vision: adoption of semantic multimedia technologies will foster a European market for media fragment re-purposing and re-selling

Sneak preview: Interact with a Kinect and discover enriched hypervideo

http://www.youtube.com/watch?v=4mSC685AG7k

Page 25: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Credits

Vuk Milicic … interaction designer

Giuseppe Rizzo … NERD guru

José Luis Redondo Garcia … triplification and clustering

Thomas Steiner … Media Server original code

Page 26: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

http://www.slideshare.net/troncy
