mediaeval 2015 - synchronization of multi-user event media at mediaeval 2015: task description,...

Post on 20-Jan-2017


Synchronization of Multi-User Event Media (SEM) Task 2015

Nicola Conci (Univ. of Trento)

Francesco G.B. De Natale (Univ. of Trento)

Vasileios Mezaris (ITI – CERTH)

Mike Matton (VRT)

Motivation

• People collect and share dozens of media items through social networks, cloud services, and the Internet.

• Having access to all this data, users can create their own version of the event:

– Summaries.

– Stories, presenting the media on a single timeline.

– Personalized albums, allowing the selection of media that concern a specific user.

– Contextualized albums, containing information about the event captured by different users.

Motivation

• Such a large amount of data is often unstructured and heterogeneous.

• It is desirable to find a consistent way of presenting the media galleries captured during an event.

• This task is not trivial, since timing and location information attached to the captured media (mostly timestamps and GPS) could be inaccurate or missing.

Aims and Objectives

• Assuming a multi-user scenario (10+ users), each user collecting a certain number of media items (photos, videos, audio files), the goal is to align them along a common timeline (time synchronization).

• Detect the main sub-events in the entire gallery (sub-event clustering).

Given N image collections (galleries) taken by different users/devices at the same event, find the best (relative) time alignment among them and detect the significant sub-events over the whole gallery.
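To make the two goals concrete, the following is a minimal sketch, not any team's method. It assumes that a per-gallery clock offset relative to the reference gallery has already been estimated somehow, and that sub-events can be approximated by splitting the merged timeline at large time gaps. The names `MediaItem`, `merge_on_common_timeline`, `split_by_time_gaps`, and the `max_gap` parameter are hypothetical and do not come from the task description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MediaItem:
    gallery: str      # hypothetical gallery (user/device) identifier
    name: str         # hypothetical file name
    timestamp: float  # capture time in seconds, as reported by the device clock


def merge_on_common_timeline(items: List[MediaItem],
                             offsets: Dict[str, float]) -> List[MediaItem]:
    """Shift each item by its gallery's offset (0 for the reference gallery)
    and return all items sorted by corrected time."""
    corrected = [
        MediaItem(it.gallery, it.name, it.timestamp + offsets.get(it.gallery, 0.0))
        for it in items
    ]
    return sorted(corrected, key=lambda it: it.timestamp)


def split_by_time_gaps(timeline: List[MediaItem],
                       max_gap: float) -> List[List[MediaItem]]:
    """Naive sub-event grouping (an assumption of this sketch): start a new
    group whenever two consecutive items are more than max_gap seconds apart."""
    groups: List[List[MediaItem]] = []
    for it in timeline:
        if groups and it.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(it)
        else:
            groups.append([it])
    return groups


# Gallery "A" is the reference; gallery "B" is assumed to lag by 70 seconds.
items = [
    MediaItem("A", "a1", 100.0),
    MediaItem("A", "a2", 130.0),
    MediaItem("B", "b1", 40.0),
    MediaItem("B", "b2", 1000.0),
]
timeline = merge_on_common_timeline(items, {"B": 70.0})
groups = split_by_time_gaps(timeline, max_gap=60.0)
```

In practice the offsets are the unknowns of the task, and time gaps alone are a weak cue for sub-events; the sketch only shows how the two outputs relate to each other.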

Datasets

• The working assumptions are as follows:

– Media of each dataset are split into galleries (the media of a single user).

– Each gallery may be composed of photos and video clips (or audio files) taken from the same device.

– Each gallery will be consistent in terms of time and location information, when available.

– Teams can use any kind of available information related to the media items: tags, annotation, timestamp, GPS, content, as well as possibly related information available on the internet.

Datasets

• We have provided 4 different datasets:

– Tour De France 2014 (TDF14): Photos taken during an annual multiple-stage bicycle race and collected from Flickr.

– NAMM Show 2015 (NAMM15): One of the world's largest trade-only events for the music products industry, with several booths and live shows.

– Salford Test Shoot (SAL): A series of musical performances by an ensemble of ten musicians from the BBC Philharmonic Orchestra, captured using both professional- and consumer-grade equipment.

– Spring Parti Salesiani 2015 (SPS15): A dataset recorded during a local music and food event in Trento, Italy, composed of videos and photos captured by the attendees during the event.

Tour De France 2014 Dataset

Leeds – Harrogate (United Kingdom), July 5th, 2014

Évry – Paris Champs-Élysées, July 27th, 2014

Gérardmer – Mulhouse, July 13th 2014

Saint-Gaudens – Saint-Lary-Soulan Pla d’Adet, July 24th, 2014


NAMM Show 2015 Dataset

Josh Damigo performing on the Marriott Stage, January 23rd, 2015

Deer Park Avenue performing in the Gibson Guitars showroom, January 25th, 2015

The Bangles performing at the WiMN "She Rocks" Awards, January 23rd, 2015

Dilana performing on the GoPro stage, January 24th, 2015


Salford Test Shoot Dataset

[Figure: example frames from Session #1, Session #2, Session #7 and Session #9]

Spring Parti Salesiani Dataset

[Figure: example images labeled Preparation, Bands live show, Testimonies & Speeches, and DJ Party]

Datasets

Number of photos, videos, audio files, galleries, and sub-events in each dataset:

Dataset   Photos   Videos   Audio files   Galleries   Sub-events
TDF14     2471     -        -             33          89
NAMM15    420      32       -             19          97
SAL       -        129      894           34          10
SPS15     189      101      -             11          4

Datasets

• Datasets consist of various media types (photos, videos and audio files). The videos also have an audio track.

• The ground truth for the datasets was built from the acquisition time of the media and then manually verified for consistency with the captured event.

Datasets

• SEM 2015 datasets are publicly available for download and use by the research community at:

http://mmlab.disi.unitn.it/MediaEvalSEM2015/

except the SAL dataset, which is available at:

https://icosole.lab.vrt.be/viewer/home

(dataset + ground truth + evaluation script)

• SEM 2014 datasets (Vancouver and London Olympic games) are also available at:

http://mmlab.disi.unitn.it/MediaEvalSEM2014/

Metrics for evaluation

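• For the synchronization, the goal is to maximize the number of galleries for which the synchronization error is below a predefined threshold (with respect to a reference gallery).

– Precision measures the number of synchronized galleries (M) over the total number of galleries (N-1, excluding the reference):

$P = \frac{M}{N-1}$

– Accuracy is the average temporal offset calculated over the synchronized collections, normalized with respect to $\Delta E_{\max}$:

$A = 1 - \frac{\sum_{j=1}^{M} \Delta E_j}{M \cdot \Delta E_{\max}}$

where $\Delta E_j$ is the synchronization error of the j-th synchronized gallery and $\Delta E_{\max}$ is the maximum accepted error.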
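The two synchronization metrics can be sketched in a few lines. This is not the official evaluation script: it assumes that accuracy is reported as one minus the mean error of the synchronized galleries divided by the threshold (so that higher is better, as in the reported scores), and that a gallery counts as synchronized when its absolute error is strictly below the threshold. The function name `synchronization_scores` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def synchronization_scores(errors: Sequence[float],
                           max_error: float) -> Tuple[float, float]:
    """Return (precision, accuracy) for the N-1 non-reference galleries.

    errors    -- synchronization error of each non-reference gallery, in the
                 same time unit as max_error
    max_error -- predefined threshold below which a gallery is considered
                 synchronized (assumed to also be the normalization term)
    """
    if not errors:
        raise ValueError("at least one non-reference gallery is required")
    if max_error <= 0:
        raise ValueError("max_error must be positive")

    # Galleries whose absolute error is below the threshold (M of them).
    synced = [abs(e) for e in errors if abs(e) < max_error]
    m = len(synced)

    precision = m / len(errors)  # M / (N - 1)
    if m == 0:
        # Accuracy is undefined with no synchronized gallery; returning 0.0
        # is a convention of this sketch only.
        return precision, 0.0

    accuracy = 1.0 - sum(synced) / (m * max_error)
    return precision, accuracy


# Four non-reference galleries, threshold of 100 seconds:
# three are synchronized, with errors of 10, 20 and 30 seconds.
precision, accuracy = synchronization_scores([10.0, 20.0, 500.0, 30.0], 100.0)
```

Under these assumptions the two scores are complementary: precision counts how many galleries fall within the threshold, while accuracy describes how tight the alignment is among those galleries only.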

Metrics for evaluation

• For the sub-event clustering evaluation we use the F1 score:

$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$

In the formulation above we declare a true positive (TP) when two photos related to the same sub-event are put in the same cluster. A false positive (FP) occurs when two photos are assigned to the same cluster although they belong to different sub-events, and a false negative (FN) when two photos belonging to the same sub-event are assigned to different clusters.
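The pair-counting F1 score can be sketched as follows. This is an illustrative implementation rather than the official evaluation script; it uses the standard pairwise convention in which a false negative is a pair from the same sub-event that ends up in different clusters. The function name `pairwise_f1` and the dictionary-based input format are assumptions.

```python
from itertools import combinations
from typing import Dict, Hashable


def pairwise_f1(truth: Dict[str, Hashable], predicted: Dict[str, Hashable]) -> float:
    """Pair-counting F1 between ground-truth sub-events and predicted clusters.

    truth     -- maps each media item to its ground-truth sub-event label
    predicted -- maps each media item to its predicted cluster label
    """
    if truth.keys() != predicted.keys():
        raise ValueError("both labelings must cover the same media items")

    tp = fp = fn = 0
    for a, b in combinations(sorted(truth), 2):
        same_event = truth[a] == truth[b]
        same_cluster = predicted[a] == predicted[b]
        if same_event and same_cluster:
            tp += 1   # correctly grouped pair
        elif same_cluster:
            fp += 1   # grouped together, but from different sub-events
        elif same_event:
            fn += 1   # same sub-event, but split across clusters

    denominator = 2 * tp + fp + fn
    # With no positive pairs at all, F1 is set to 0.0 (a convention of this sketch).
    return 2 * tp / denominator if denominator else 0.0


truth = {"a": 1, "b": 1, "c": 2, "d": 2}
predicted = {"a": "x", "b": "x", "c": "x", "d": "y"}
score = pairwise_f1(truth, predicted)  # TP=1, FP=2, FN=1
```

The loop visits every pair of items, so the cost grows quadratically with the gallery size; for datasets of a few thousand photos this is still manageable.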

Team scores: Time Synchronization

                               TDF14                 NAMM15                SAL                   SPS15
Team                           Precision  Accuracy   Precision  Accuracy   Precision  Accuracy   Precision  Accuracy
JRS                            0.4062     0.7661     0.0556     0.9444     -          -          -          -
CERTH-ITI-MM (task organizer)  0.1250     0.8446     0.8330     0.9083     0.4242     0.9998     -          -

Team scores: Sub-event Clustering (F1 score)

Team                           TDF14    NAMM15   SAL      SPS15
JRS                            0.2538   0.1454   -        -
CERTH-ITI-MM (task organizer)  0.1134   0.3658   0.1640   -

Conclusions

• Datasets this year contain a mix of different file types (still photos, various formats of video files, audio files).

• Due to the considerable diversity of datasets, we conclude that it is very challenging for a single approach to effectively handle this data.

• We notice that, depending on the dataset, teams achieved either good precision (synchronized most of the galleries) or good accuracy (synchronized galleries correctly).

• With only 2 participants and a total of 6 runs, it is difficult to draw more detailed conclusions.

Thank you for your attention! Questions?

More information and contact: Dr. Vasileios Mezaris bmezaris@iti.gr http://www.iti.gr/~bmezaris
