MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd
Raphaël Troncy
@rtroncy
Conferences and natural disasters
14/05/2013 Real-Time Analysis and Mining of Social Streams (RAMSS) - Rio de Janeiro
Social Media: some definitions
Media Item: a photo or a video that is shared on a social network
Micropost: a text status message that can optionally accompany a media item
Social Network: an online service that focuses on building and reflecting social relationships among people sharing interests or activities
Media Sharing Platform: an online service that emphasizes sharing media; the boundary with social networks is blurred since users are encouraged to react to media content (like, comment, favorite, etc.)
Social networks and media items
First-order support: Posting requires the inclusion of a media item. Example: Flickr, YouTube
Second-order support: It is possible to post media items but also text-only messages. Example: Facebook
Third-order support: No direct support for media items; relies on third-party applications to host them. Example: Twitter before the introduction of native photo support
Media Server
Composition of media item extractors (12 SNs)
Relies on search APIs plus a fixed 30 s timeout window to provide results
Falls back on screen scraping when necessary (Twitter ecosystem)
Implemented as a Node.js server
Serializes results in a common schema (JSON)
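The extractor composition described above (per-network extractors queried in parallel, a fixed 30 s window, scraping as a fallback) can be sketched with Python's asyncio. This is only an illustration of the pattern: the real Media Server is a Node.js application, and the extractor functions, the (search_api, scraper) pairs and the network/mediaUrl fields below are hypothetical stand-ins.

```python
import asyncio
import json

TIMEOUT_S = 30.0  # fixed result window taken from the slide


async def extract(search_api, scraper, term):
    """Query one network's search API; fall back to scraping if it fails."""
    try:
        return await search_api(term)
    except Exception:
        if scraper is None:
            raise
        return await scraper(term)


async def search_all(extractors, term, timeout=TIMEOUT_S):
    """Run every extractor concurrently and keep what finished in time.

    `extractors` maps a network name to a (search_api, scraper_or_None) pair.
    """
    tasks = {
        asyncio.create_task(extract(api, scraper, term)): name
        for name, (api, scraper) in extractors.items()
    }
    done, pending = await asyncio.wait(set(tasks), timeout=timeout)
    for task in pending:
        task.cancel()  # networks that miss the window are dropped
    return {tasks[t]: t.result() for t in done if t.exception() is None}


# --- hypothetical stand-ins for real per-network extractors ---
async def flickr_api(term):
    return [{"network": "flickr", "mediaUrl": "https://example.org/a.jpg"}]


async def twitter_api(term):
    raise RuntimeError("no usable search API")


async def twitter_scraper(term):
    return [{"network": "twitter", "mediaUrl": "https://example.org/b.jpg"}]


async def slow_api(term):
    await asyncio.sleep(5)
    return [{"network": "slow", "mediaUrl": "https://example.org/c.jpg"}]


EXTRACTORS = {
    "flickr": (flickr_api, None),
    "twitter": (twitter_api, twitter_scraper),
    "slow": (slow_api, None),
}

if __name__ == "__main__":
    # A short window for the demo; the slide's value is 30 s.
    results = asyncio.run(search_all(EXTRACTORS, "elezioni2013", timeout=0.5))
    print(json.dumps(results, indent=2))
```

With a 0.5 s window the demo returns Flickr results, Twitter results obtained through the scraper fallback, and nothing for the slow network. A single shared deadline, rather than a per-call timeout, keeps the total response time bounded no matter how many networks are queried.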
Screenshot callouts: deep link; permalink; clean text for NLP processing; aggregate view of ALL social interactions; 12 social networks
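The callouts suggest that each harvested item carries a deep link and a permalink, a cleaned-up text ready for NLP, and one aggregated view of its social interactions. A minimal normalization sketch follows. It assumes that "deep link" means the URL of the media file and "permalink" the URL of the post, and it uses hypothetical raw field names (media_url, post_url, text, counts). The actual Media Server schema is not given on the slides.

```python
import html
import re
from dataclasses import dataclass, asdict

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
SPACE_RE = re.compile(r"\s+")


def strip_markup(raw):
    """Drop HTML tags, decode entities and remove URLs before NLP processing."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


@dataclass
class MediaItem:
    network: str
    deep_link: str           # hypothetical: URL of the photo/video file itself
    permalink: str           # hypothetical: URL of the post that shares it
    clean_text: str
    interactions: dict       # per-type counts (likes, shares, comments, ...)
    total_interactions: int  # aggregate over all interaction types


def normalize(network, raw):
    """Map one raw, network-specific result onto the common record.

    The raw field names are placeholders; each real extractor would need
    its own mapping.
    """
    counts = {kind: int(n or 0) for kind, n in raw.get("counts", {}).items()}
    return MediaItem(
        network=network,
        deep_link=raw["media_url"],
        permalink=raw["post_url"],
        clean_text=strip_markup(raw.get("text", "")),
        interactions=counts,
        total_interactions=sum(counts.values()),
    )
```

`asdict(normalize(...))` gives a plain dictionary that can be passed to `json.dumps`. Missing counts are treated as zero so that networks exposing fewer interaction types can still be aggregated.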
Media Finder (www2013)
Media Finder (zooming on media items)
Media Finder (timeline view)
Named Entities are Pivotal
Standalone software: GATE, Stanford CoreNLP, Temis
Web APIs
http://nerd.eurecom.fr/
What is NERD?
A REST API (http://nerd.eurecom.fr/api/application.wadl), an ontology (http://nerd.eurecom.fr/ontology) and a UI (http://nerd.eurecom.fr)
The NERD ontology has been integrated into NIF (NLP Interchange Format), developed in the context of the EU FP7 project LOD2: Creating Knowledge out of Interlinked Data
NERD REST API
Methods: GET, POST, PUT, DELETE
Resources: /document, /user, /annotation/{extractor}, /extraction, /evaluation ...
Formats: JSON/RDF*
{
  "entities": [{
    "entity": "Tim Berners-Lee",
    "type": "Person",
    "uri": "http://dbpedia.org/resource/Tim_berners_lee",
    "nerdType": "http://nerd.eurecom.fr/ontology#Person",
    "startChar": 30,
    "endChar": 45,
    "confidence": 1,
    "relevance": 0.5
  }]
}
Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European Chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.
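A client consuming the entity list shown above might check the character offsets against the submitted text and group disambiguated URIs by NERD type. The sketch below parses the slide's example output verbatim. It assumes that endChar is exclusive (45 - 30 equals the 15 characters of "Tim Berners-Lee"), and it uses a hypothetical input sentence in which the mention starts at offset 30. It does not call the NERD service.

```python
import json

# The entity list from the slide, as parseable JSON.
NERD_OUTPUT = """
{"entities": [{"entity": "Tim Berners-Lee", "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30, "endChar": 45, "confidence": 1, "relevance": 0.5}]}
"""

# Hypothetical input text; the mention starts at character 30.
TEXT = "The Web was first proposed by Tim Berners-Lee at CERN."


def entities_by_type(text, payload, min_confidence=0.0):
    """Group entity URIs by NERD type after checking their offsets.

    Assumes startChar is inclusive and endChar exclusive.
    """
    grouped = {}
    for e in json.loads(payload)["entities"]:
        if text[e["startChar"]:e["endChar"]] != e["entity"]:
            raise ValueError(f"offset mismatch for {e['entity']!r}")
        if e["confidence"] < min_confidence:
            continue
        nerd_type = e["nerdType"].rsplit("#", 1)[-1]  # e.g. "Person"
        grouped.setdefault(nerd_type, []).append(e["uri"])
    return grouped


if __name__ == "__main__":
    print(entities_by_type(TEXT, NERD_OUTPUT))
```

Grouping on the normalized nerdType rather than on the extractor-specific type field is the point of the NERD ontology: results from different extractors then land in the same buckets.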
Media Finder Architecture
Harvesting media items using the Media Server: http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}, https://github.com/vuknje/media-server (@tomayac fork)
Near-duplicate image detection: DCT signature on images and video frames, Hamming distance between image pairs
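A DCT signature compared by Hamming distance is the idea behind perceptual hashes such as pHash. The pure-Python sketch below implements that technique. It assumes frames have already been decoded, converted to grayscale and resized to 32x32 (resizing is out of scope here). The 8x8 low-frequency block and the 10-bit threshold are illustrative choices, not values from the slides.

```python
import math
import random
from statistics import median

NEAR_DUP_THRESHOLD = 10  # hypothetical cut-off in bits; tune on real data


def dct_1d(values):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
            for x, v in enumerate(values))
        for u in range(n)
    ]


def dct_2d(matrix):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dct_signature(pixels, keep=8):
    """64-bit signature: one bit per low-frequency coefficient above the median.

    `pixels` is assumed to be a 32x32 grid of grayscale values.
    """
    coeffs = dct_2d(pixels)
    low = [coeffs[v][u] for v in range(keep) for u in range(keep)]
    med = median(low[1:])  # skip the DC term, which only tracks brightness
    signature = 0
    for c in low:
        signature = (signature << 1) | int(c > med)
    return signature


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def is_near_duplicate(sig_a, sig_b, threshold=NEAR_DUP_THRESHOLD):
    return hamming(sig_a, sig_b) <= threshold


if __name__ == "__main__":
    rng = random.Random(7)
    frame = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    brighter = [[p + 20 for p in row] for row in frame]  # uniform shift
    other = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    s1, s2, s3 = (dct_signature(img) for img in (frame, brighter, other))
    print(hamming(s1, s2), hamming(s1, s3))
```

A uniform brightness change only moves the DC coefficient, so the brightened copy keeps the same signature. An unrelated frame differs in roughly half of the 64 bits.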
Clustering and disambiguation: named entity extraction using NERD, topic generation using LDA
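One simple reading of clustering by named entities is to group media items whose microposts mention the same disambiguated entity. The sketch below does only that. It assumes each item's NERD output has already been reduced to a list of DBpedia URIs, and it does not reproduce the LDA topic step or the actual Media Finder clustering. The item ids and URIs in the demo are made-up example data.

```python
from collections import defaultdict


def cluster_by_entity(items, min_size=2):
    """Group media items by the DBpedia URI of each entity they mention.

    `items` maps a media item id to the entity URIs extracted from its
    micropost. An item mentioning several entities joins several clusters.
    """
    clusters = defaultdict(set)
    for item_id, uris in items.items():
        for uri in set(uris):
            clusters[uri].add(item_id)
    kept = {uri: ids for uri, ids in clusters.items() if len(ids) >= min_size}
    # Largest clusters first; ties broken by URI for a stable order.
    return sorted(kept.items(), key=lambda kv: (-len(kv[1]), kv[0]))


if __name__ == "__main__":
    DBR = "http://dbpedia.org/resource/"  # example data only
    items = {
        "m1": [DBR + "Mario_Monti", DBR + "Italy"],
        "m2": [DBR + "Silvio_Berlusconi", DBR + "Italy"],
        "m3": [DBR + "Mario_Monti"],
    }
    for uri, ids in cluster_by_entity(items):
        print(uri, sorted(ids))
```

Keying clusters on URIs rather than surface strings is what disambiguation buys: different spellings of the same entity in different languages end up in the same cluster.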
Media Finder (named entities clustering)
Media Finder (zooming in a cluster)
Media Finder
Live Topic Generation from Event Streams
Meet us at the WWW 2013 Demo Session: http://www.youtube.com/watch?v=8iRiwz7cDYY
Tracking an event: Italian Election
Repeated queries over a period of time: we tracked and analyzed media posts tagged as elezioni2013 from 2013-02-26 to 2013-03-03
Cron job: every 30 minutes over the 6 days
Slice the data into 24-hour slots
Research question: can we re-create the news headlines?
Storyboarding: http://mediafinder.eurecom.fr/story/elezioni2013
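The tracking set-up above combines a polling schedule with a slicing step: 6 days at 48 polls per day gives 288 runs of the same query. A sketch of both follows. It assumes UTC timestamps and treats midnight after 2013-03-03 as the exclusive end of the period. It also uses hypothetical item fields id and published. Because consecutive polls return overlapping results, items are deduplicated by id before being bucketed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are handled in UTC; the slides do not say.
START = datetime(2013, 2, 26, tzinfo=timezone.utc)
END = datetime(2013, 3, 4, tzinfo=timezone.utc)  # exclusive: 2013-03-03 included
POLL_EVERY = timedelta(minutes=30)
SLOT = timedelta(hours=24)
N_SLOTS = (END - START) // SLOT  # 6 daily slots


def poll_times(start=START, end=END, every=POLL_EVERY):
    """Times at which a cron-like job would re-run the same query."""
    t = start
    while t < end:
        yield t
        t += every


def slot_index(ts, start=START, slot=SLOT):
    """Index of the 24-hour slot a timestamp falls into (0 = first day)."""
    return (ts - start) // slot


def merge_polls(polls):
    """Deduplicate items returned by several polls, then bucket them by slot.

    Each item is a dict with hypothetical "id" and "published" fields;
    "published" must be a timezone-aware datetime.
    """
    first_seen = {}
    for results in polls:
        for item in results:
            first_seen.setdefault(item["id"], item)  # keep first sighting
    slots = {}
    for item in first_seen.values():
        idx = slot_index(item["published"])
        if 0 <= idx < N_SLOTS:  # ignore items outside the tracked period
            slots.setdefault(idx, []).append(item["id"])
    return slots
```

Bucketing by publication time rather than by poll time means an item seen late still lands in the day it was posted.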
Tracking an event: Italian Election
Dataset: ~16,501 microposts containing media items (duplicates included); ~21,087 named entities extracted
Clustering: NER and LDA; generate a Bag of Entities (BOE), each entity disambiguated with a DBpedia URI
Examples: Monti, Bersani, Italia, Berlusconi, Grillo, Stelle
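A Bag of Entities can be read as an entity-frequency table per time slot. The sketch below ranks entities per slot under two assumptions: each micropost has already been reduced to its DBpedia URIs, and the microposts have already been bucketed into 24-hour slots. Comparing such rankings with the day's news headlines is one possible way to approach the research question, not the method reported on the slides.

```python
from collections import Counter


def bag_of_entities(slots, top_k=5):
    """Rank DBpedia URIs per slot by the number of microposts mentioning them.

    `slots` maps a slot index to a list of microposts, each given as the
    list of entity URIs extracted from it. A URI counts once per micropost.
    """
    ranking = {}
    for slot, microposts in slots.items():
        bag = Counter(uri for uris in microposts for uri in set(uris))
        # Highest count first; ties broken alphabetically for stable output.
        ordered = sorted(bag.items(), key=lambda kv: (-kv[1], kv[0]))
        ranking[slot] = ordered[:top_k]
    return ranking
```

Counting each URI once per micropost keeps a single post that repeats a name from inflating that entity's weight.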
Tracking an event: Italian Election
Tracking and Analyzing the 2013 Italian Election
To appear at the ESWC 2013 Demo Session: http://www.youtube.com/watch?v=jIMdnwMoWnk
Take Home Message
Media Server / Media Finder: aggregating fresh social media items and making sense of media collections for video hyperlinking
NERD platform for extracting key information
Vision: adoption of semantic multimedia technologies will foster a European market for media fragment re-purposing and re-selling
Sneak preview: Interact with a Kinect and discover enriched hypervideo http://www.youtube.com/watch?v=4mSC685AG7k
Credits
Vuk Milicic … interaction designer
Giuseppe Rizzo … NERD guru
José Luis Redondo Garcia … triplification and clustering
Thomas Steiner … Media Server original code
http://www.slideshare.net/troncy