mediaeval 2015 - synchronization of multi-user event media at mediaeval 2015: task description,...
Synchronization of
Multi-User Event Media (SEM) Task
2015
Nicola Conci (Univ. of Trento)
Francesco G.B. De Natale (Univ. of Trento)
Vasileios Mezaris (ITI – CERTH)
Mike Matton (VRT)
Motivation
• People collect and share dozens of media items through social networks, cloud services, and the Internet.
• Having access to all this data, users can create their own version of the event:
– Summaries.
– Stories, presenting the media on a single timeline.
– Personalized albums, allowing the selection of media that concern a specific user.
– Contextualized albums, containing information about the event captured by different users.
Motivation
• Such a large amount of data is often unstructured and heterogeneous.
• It is desirable to find a consistent way of presenting the media galleries captured during an event.
• This task is not trivial, since timing and location information attached to the captured media (mostly timestamps and GPS) could be inaccurate or missing.
Aims and Objectives
• Assuming a multi-user scenario (10+ users), with each user collecting a certain number of media items (photos, videos, audio files), the goal is to align them along a common timeline (time synchronization).
• Detect the main sub-events in the entire gallery (sub-event clustering).
Given N image collections (galleries) taken by different users/devices at the same event, find the best (relative) time alignment among them and detect the significant sub-events over the whole gallery.
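The alignment goal stated above can be illustrated with a minimal Python sketch. It is not the organizers' or any participant's method: it only shows what "a common timeline" means once a relative offset per gallery has been estimated. The names `MediaItem` and `merge_on_common_timeline`, the file names, and the one-hour clock skew are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MediaItem:
    name: str
    timestamp: datetime  # time reported by the capturing device (may be skewed)


def merge_on_common_timeline(galleries, offsets):
    """Shift every gallery by its offset and merge all items chronologically.

    galleries: dict mapping gallery id -> list of MediaItem
    offsets:   dict mapping gallery id -> timedelta relative to the reference
               gallery (galleries without an entry are left unshifted)
    Returns a list of (corrected_time, gallery_id, item_name) tuples.
    """
    timeline = []
    for gallery_id, items in galleries.items():
        shift = offsets.get(gallery_id, timedelta(0))
        for item in items:
            timeline.append((item.timestamp + shift, gallery_id, item.name))
    timeline.sort()  # chronological order on the reference clock
    return timeline


# Hypothetical data: the clock of gallery "g1" runs one hour behind the reference.
galleries = {
    "ref": [MediaItem("a.jpg", datetime(2015, 1, 23, 10, 0, 0)),
            MediaItem("c.jpg", datetime(2015, 1, 23, 10, 1, 0))],
    "g1": [MediaItem("b.jpg", datetime(2015, 1, 23, 9, 0, 30))],
}
offsets = {"g1": timedelta(hours=1)}

for when, gallery_id, name in merge_on_common_timeline(galleries, offsets):
    print(when.isoformat(), gallery_id, name)
```

Estimating the offsets is the actual task; the sketch assumes they are already known.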
Datasets
• The working assumptions are as follows:
– Media of each dataset are split into galleries (the media of a single user).
– Each gallery may be composed of photos and video clips (or audio files) taken from the same device.
– Each gallery will be consistent in terms of time and location information, when available.
– Teams can use any kind of available information related to the media items: tags, annotation, timestamp, GPS, content, as well as possibly related information available on the internet.
Datasets
• We have provided 4 different datasets:
– Tour De France 2014 (TDF14): Photos taken during an annual multi-stage bicycle race and collected from Flickr.
– NAMM Show 2015 (NAMM15): One of the world's largest trade-only events for the music products industry, with several booths and live shows.
– Salford Test Shoot (SAL): A series of musical performances by an ensemble of ten musicians from the BBC Philharmonic Orchestra, captured using both professional- and consumer-grade equipment.
– Spring Parti Salesiani 2015 (SPS15): A dataset recorded during a local music and food event in Trento, Italy, composed of videos and photos captured by the attendees.
Tour De France 2014 Dataset
Leeds – Harrogate (United Kingdom), July 5th, 2014
Évry – Paris Champs-Élysées, July 27th, 2014
Gérardmer – Mulhouse, July 13th, 2014
Saint-Gaudens – Saint-Lary-Soulan Pla d’Adet, July 24th, 2014
NAMM Show 2015 Dataset
Josh Damigo performing on the Marriott Stage, January 23rd, 2015
Deer Park Avenue performing in the Gibson Guitars showroom, January 25th, 2015
The Bangles performing at the WiMN "She Rocks" Awards, January 23rd, 2015
Dilana performing on the GoPro stage, January 24th, 2015
Salford Test Shoot Dataset
Session #1
Session #2
Session #7
Session #9
Spring Parti Salesiani Dataset
Preparation
Bands live show
Testimonies & Speeches
DJ Party
Datasets
Number of photos, videos, audio files, galleries, and sub-events in each dataset:

Dataset   Photos   Videos   Audio files   Galleries   Sub-events
TDF14     2471     -        -             33          89
NAMM15    420      32       -             19          97
SAL       -        129      894           34          10
SPS15     189      101      -             11          4
Datasets
• Datasets consist of various media types (photos, videos and audio files). The videos also have an audio track.
• The ground truth for the datasets was built from the acquisition time of the media and then manually verified for consistency with the captured event.
Datasets
• SEM 2015 datasets are publicly available for download and use by the research community at:
http://mmlab.disi.unitn.it/MediaEvalSEM2015/
except the SAL dataset, which is available at:
https://icosole.lab.vrt.be/viewer/home
(dataset + ground truth + evaluation script)
• SEM 2014 datasets (Vancouver and London Olympic games) are also available at:
http://mmlab.disi.unitn.it/MediaEvalSEM2014/
Metrics for evaluation
• For synchronization, the goal is to maximize the number of galleries for which the synchronization error, measured with respect to a reference gallery, is below a predefined threshold.
– Precision is the ratio of the number of synchronized galleries (M) to the total number of galleries (N-1, excluding the reference): Precision = M / (N-1).
– Accuracy is the average temporal offset calculated over the synchronized collections, normalized with respect to ∆E_max.
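The two synchronization metrics can be sketched in Python as follows. Precision follows directly from the definition above. The exact accuracy formula is not reproduced in this transcript, so the form used here (one minus the mean error of the synchronized galleries divided by the threshold) is an assumption, chosen because it yields values close to 1 for small errors, as in the reported scores. The function name `sync_scores`, its parameters, and the sample offsets are hypothetical.

```python
def sync_scores(estimated, ground_truth, max_error):
    """Return (precision, accuracy) for one synchronization run.

    estimated, ground_truth: dicts mapping gallery id -> offset in seconds
        relative to the reference gallery (the reference itself is excluded).
    max_error: threshold in seconds below which a gallery counts as synchronized.
    """
    # Absolute synchronization error for each gallery that has an estimate.
    errors = [abs(estimated[g] - ground_truth[g])
              for g in ground_truth if g in estimated]
    synced = [e for e in errors if e < max_error]  # the M synchronized galleries

    total = len(ground_truth)  # N - 1 galleries, reference excluded
    precision = len(synced) / total if total else 0.0

    # ASSUMPTION: accuracy = 1 - (mean error over synchronized galleries) / max_error.
    if synced:
        accuracy = 1.0 - sum(synced) / (len(synced) * max_error)
    else:
        accuracy = 0.0
    return precision, accuracy


# Hypothetical offsets in seconds, with a 60-second threshold.
truth = {"g1": 0.0, "g2": 0.0, "g3": 0.0}
guess = {"g1": 10.0, "g2": 100.0, "g3": 0.0}
print(sync_scores(guess, truth, max_error=60.0))
```

In this example two of the three galleries fall below the threshold, so precision is 2/3, while accuracy only reflects how small the errors of those two galleries are.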
Metrics for evaluation
• For the sub-event clustering evaluation we use the F1 score, computed over pairs of photos: F1 = 2·TP / (2·TP + FP + FN).
In the formulation above, we declare a true positive (TP) when two photos related to the same sub-event are put in the same cluster. A false positive (FP) occurs when two photos are assigned to the same cluster although they belong to different sub-events, and a false negative (FN) occurs when two photos belonging to the same sub-event are assigned to different clusters.
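A pair-counting F1 score of this kind can be sketched in Python as below. The sketch uses the usual convention: a true positive is a same-sub-event pair placed in one cluster, a false positive is a different-sub-event pair placed in one cluster, and a false negative is a same-sub-event pair split across clusters. The function name `pairwise_f1` and the photo ids are hypothetical, and this is not the official evaluation script.

```python
from itertools import combinations


def pairwise_f1(predicted, truth):
    """Pair-counting F1 between a predicted clustering and the ground truth.

    predicted, truth: dicts mapping photo id -> cluster label / sub-event label.
    Both dicts are assumed to cover the same photo ids.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(truth), 2):
        same_cluster = predicted[a] == predicted[b]
        same_event = truth[a] == truth[b]
        if same_cluster and same_event:
            tp += 1  # correctly grouped pair
        elif same_cluster:
            fp += 1  # grouped together, but from different sub-events
        elif same_event:
            fn += 1  # same sub-event, but split across clusters
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical example: four photos, two true sub-events.
truth = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
predicted = {"p1": "a", "p2": "a", "p3": "a", "p4": "b"}
print(pairwise_f1(predicted, truth))
```

Here one pair is a true positive (p1, p2), two are false positives (p1, p3 and p2, p3), and one is a false negative (p3, p4), giving F1 = 2 / (2 + 2 + 1) = 0.4.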
Team scores
Time Synchronization (P = Precision, A = Accuracy)

Team                            TDF14 P   TDF14 A   NAMM15 P   NAMM15 A   SAL P    SAL A    SPS15 P   SPS15 A
JRS                             0.4062    0.7661    0.0556     0.9444     -        -        -         -
CERTH-ITI-MM (task organizer)   0.1250    0.8446    0.8330     0.9083     0.4242   0.9998   -         -
Team scores
Sub-event Clustering (F1 score)

Team                            TDF14    NAMM15   SAL      SPS15
JRS                             0.2538   0.1454   -        -
CERTH-ITI-MM (task organizer)   0.1134   0.3658   0.1640   -
Conclusions
• Datasets this year contain a mix of different file types (still photos, various formats of video files, audio files).
• Due to the considerable diversity of datasets, we conclude that it is very challenging for a single approach to effectively handle this data.
• We notice that, depending on the dataset, teams achieved either good precision (most of the galleries synchronized) or good accuracy (galleries synchronized correctly).
• With only 2 participants and a total of 6 runs, it is difficult to draw more detailed conclusions.
Thank you for your attention! Questions?
More information and contact: Dr. Vasileios Mezaris [email protected] http://www.iti.gr/~bmezaris