VidiVideo and IM3I

Automatic Metadata Extraction, Marco Bertini, Università di Firenze - MICC, www.micc.unifi.it, Thursday 1 July 2010

DESCRIPTION

Presentation given by Marco Bertini at the first EUscreen Open Workshop in Mykonos, Greece, on June 23 and 24, 2010, on the VidiVideo and IM3I projects

TRANSCRIPT

Page 1: Vidivideo and IM3I

Automatic Metadata Extraction

Marco Bertini

Università di Firenze - MICC

www.micc.unifi.it

Thursday, 1 July 2010

Page 2: Vidivideo and IM3I

The problem

The massive increase in digital audio-visual information poses high demands on advanced storage and search engines for consumers and professional archives.

Video is now a natural form of communication for the Internet and mobile devices.

Video search engines are the product of progress in many technologies: visual and audio analysis, machine learning techniques, as well as visualization and interaction.

Page 3: Vidivideo and IM3I

Two solutions

www.im3i.eu

www.vidivideo.info

Page 4: Vidivideo and IM3I

VidiVideo: project overview

The VidiVideo project addressed the challenge of creating a substantially enhanced semantic access to video, implemented in a search engine.

The outcome of the project is an audio-visual search engine composed of two parts: an automatic annotation part, which runs off-line and uses a thesaurus of detectors for more than 1000 semantic concepts to process and automatically annotate the video, and an interactive part, which provides a video search engine for both technical and non-technical users.

Page 5: Vidivideo and IM3I

VidiVideo: project results

The automatic annotation part of the system performs audio and video segmentation, speech recognition, speaker clustering and semantic concept detection.

The VidiVideo system has achieved the highest performance in the most important object and concept recognition international contests (PASCAL VOC and TRECVID).

The interactive part provides a desktop-based and a web-based search engine. The system permits different query modalities (free text, natural language, graphical composition of concepts using Boolean and temporal relations, and query by visual example) and visualizations for video retrieval and browsing.

Page 6: Vidivideo and IM3I

VidiVideo: project partners

EUTV – Adaptive Channels in Europe (© 2009 EUTV Consortium, CONFIDENTIAL)

Type: CP

Call Identifier: FP7-SME-2010-1

Submitted: 03 December 2009

Name of the co-ordinating person: Dr.-Ing. Georgios Ioannidis

E-Mail: [email protected]

Fax: +49-179-33-2286677

No. | Participant Name | Type | Short Name | Country
1 | IN2 search interfaces development Ltd | SME | IN2 | UK
2 | spring techno GmbH | SME | SPRING | DE
3 | VISup Srl | SME | VISUP | IT
4 | Hogeschool voor de Kunsten Utrecht | RTDP | HKU | NL
5 | University Firenze | RTDP | UNIFI | IT
6 | Instituto de Engenharia de Sistemas e Computadores | RTDP | INESC-ID | PT

Page 7: Vidivideo and IM3I

IM3I: project overview

IM3I aims to provide the creative media sector with new ways of searching, summarising and visualising large multimedia archives.

IM3I will provide a service-oriented architecture that allows multiple viewpoints upon multimedia data available in a repository, and provides better ways to interact with and share rich media. This paves the way for a multimedia information management platform which is more flexible, adaptable and customisable than current repository software.

This in turn enables new opportunities for content owners to exploit their digital assets.

Page 8: Vidivideo and IM3I

IM3I: project results

Developed a set of tools for automatic audio-visual annotation and search

Developed a set of web services to manage, create and orchestrate the indexing services

Developed a set of specialized search and management interfaces

IM3I authoring platform: allows professional users to import and publish repositories of digital media, author web-based environments for end users, and create elaborate workflow patterns and search & retrieval interfaces that support a diversity of end-user interactions and scenarios

Page 9: Vidivideo and IM3I

IM3I: project partners

Page 10: Vidivideo and IM3I

The IM3I backend

Page 11: Vidivideo and IM3I

Visual annotation

• Split a video by detecting shots and large content changes with a very fast algorithm

• Use different annotation strategies and types of detectors:

• low level (color, B/W, motion)

• Haar-based boosted classifiers

• HOG + SVMs

• Bag-of-words

• k-NN + voting (for tag suggestion)

• simple MPEG-7 XML format (full and fragment)
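
The slide does not say which shot detection algorithm is used. As a minimal sketch only, assuming grayscale frames held as NumPy arrays, a cut can be flagged whenever the intensity histograms of two consecutive frames differ strongly. The function names and the threshold value are hypothetical.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of one grayscale frame (values 0..255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i starts a new shot.

    A cut is declared when the L1 distance between consecutive
    histograms exceeds `threshold` (hypothetical value, tune per corpus).
    """
    boundaries = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

# Synthetic clip: three dark frames followed by three bright frames.
dark = np.full((4, 4), 20, dtype=np.uint8)
bright = np.full((4, 4), 220, dtype=np.uint8)
clip = [dark, dark, dark, bright, bright, bright]
cuts = detect_shot_boundaries(clip)
```

A single pass over per-frame histograms keeps the cost low, which is in the spirit of the "very fast" requirement, but gradual transitions would need a more elaborate test.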

Page 12: Vidivideo and IM3I

Baseline: typical BoW

Learning: feature extraction, hierarchical clustering, visual words histogram
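
The last step of this pipeline, turning local descriptors into a visual-word histogram, might look like the sketch below. It assumes the vocabulary (cluster centres) has already been learned; the arrays are toy data and the function name is hypothetical.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize descriptors against a visual vocabulary.

    descriptors: (n, d) array of local features (e.g. SURF vectors).
    vocabulary:  (k, d) array of cluster centres ("visual words").
    Returns a length-k histogram normalized to sum to 1.
    """
    # Squared Euclidean distance between every descriptor and every word.
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    words = dists.argmin(axis=1)  # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Toy 2-D data standing in for real descriptors and cluster centres.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])
hist = bow_histogram(descriptors, vocabulary)
```

The resulting fixed-length histogram is what a classifier would consume, regardless of how many points were sampled from the image.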

Page 13: Vidivideo and IM3I

Fusion schemes

• Early fusion: integrates unimodal features before learning concepts.

• Late fusion: first reduces unimodal features to separately learned concept scores, then integrates these scores to learn concepts.
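
The difference between the two schemes is mainly where the modalities are joined. The sketch below only illustrates the data flow; the feature vectors, scores and function names are made up, and no classifier is trained.

```python
import numpy as np

def early_fusion(visual, audio):
    """Concatenate unimodal feature vectors; one classifier is then trained on the result."""
    return np.concatenate([visual, audio])

def late_fusion(visual_score, audio_score):
    """Stack per-modality concept scores; a second-stage learner is then trained on them."""
    return np.array([visual_score, audio_score])

visual = np.array([0.2, 0.7, 0.1])  # hypothetical visual feature vector
audio = np.array([0.9, 0.4])        # hypothetical audio feature vector

fused_features = early_fusion(visual, audio)  # one joint feature vector
fused_scores = late_fusion(0.8, 0.3)          # one score per modality
```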

Page 15: Vidivideo and IM3I

Early fusion approach

• Hypothesis: MSERs isolate semantically relevant information.

• Idea: represent points by their spatial relation with the regions: inside, outside, or just on the border

• Sampling: SIFT-SURF, dense.
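
How the spatial relation is computed is not given on the slide. One possible reading, sketched under the assumption that a region is available as a binary mask, labels a sampled point by looking at its 4-neighbourhood. The function name and the toy mask are hypothetical.

```python
import numpy as np

def point_region_relation(mask, x, y):
    """Classify pixel (x, y) relative to a binary region mask.

    Returns "inside", "border" or "outside". A region pixel counts as
    border when at least one of its 4-neighbours lies outside the region
    (or outside the image).
    """
    h, w = mask.shape
    if not mask[y, x]:
        return "outside"
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if not (0 <= nx < w and 0 <= ny < h) or not mask[ny, nx]:
            return "border"
    return "inside"

# Hypothetical 5x5 mask with a 3x3 region in the centre.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
relations = [point_region_relation(mask, x, y) for x, y in [(2, 2), (1, 2), (0, 0)]]
```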

Hierarchical clustering

Page 16: Vidivideo and IM3I

Late fusion approach

• Use SURF/SIFT + MSER

• Use geometric descriptors for MSERs

[Figure: late fusion diagram with two hierarchical clustering branches]

Page 17: Vidivideo and IM3I

Test: baseline

• Best: SURF 64 Grid 10 (accuracy, computational cost)

• SURF 64 Grid 5: +7-8% accuracy, +300% time

• the number of points influences accuracy

(Table columns: Method, Sampling, # points, Time, Time, Avg. accuracy, Max accuracy)

Page 18: Vidivideo and IM3I

Test: early fusion

• Best: EF SURF 64 Grid 10 (accuracy, computational cost)

• EF SURF 64 Borders: many points, accuracy ~ that of Grid 10 but higher computational costs

• EF SURF 64 Grid 10 is worse than SURF 64 Grid 10, but much faster (50% of execution time)

(Table columns: Method, Sampling, # points, Time, Time, Avg. accuracy, Max accuracy)

Page 19: Vidivideo and IM3I

Test: late fusion

• weights of 0.6 (better method) and 0.4 (worse method) lead to good results

• best performance: dense sampling + sparse sampling

• best combination: SURF 64 + EF SURF 64 Grid 10 (improved accuracy, modest computational cost increase)
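
The 0.6 / 0.4 weighting can be read as a linear combination of the two methods' scores. The sketch below assumes each method outputs one score per class; the scores themselves are invented for illustration.

```python
import numpy as np

def weighted_late_fusion(best_scores, worst_scores, w_best=0.6, w_worst=0.4):
    """Linear combination of per-class scores from two methods."""
    return w_best * np.asarray(best_scores) + w_worst * np.asarray(worst_scores)

# Hypothetical per-class scores for one image from two classifiers.
scores_a = [0.2, 0.5, 0.3]  # stronger method
scores_b = [0.6, 0.1, 0.3]  # weaker method
fused = weighted_late_fusion(scores_a, scores_b)
predicted_class = int(np.argmax(fused))
```

In this toy case the stronger method alone would pick class 1, while the fused scores pick class 0, which shows how one method can override the other when it is confident enough.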

(Table columns: Method 1, Method 2, Accuracy)

Page 20: Vidivideo and IM3I

Conclusions

• Early fusion strategies:

• ~ baseline accuracy

• faster

• Late fusion strategies:

• better accuracy than baseline

• each method corrects some errors made by the other

• fuse keypoints/regions (SURF, fusion of SURF and MSER)

• IM3I users will be able to choose what’s best for them

Page 21: Vidivideo and IM3I

The users

Page 22: Vidivideo and IM3I

Video search engine

Our goal is to provide a search engine for videos for both technical and non-technical users.

Provide different interfaces that permit different query modalities: free-text, natural language, graphical composition of concepts using boolean and temporal relations and query by visual example.

In addition, exploit ontologies and their structure to encode semantic relations between concepts, which makes it possible, for example, to expand queries to synonyms and concept specializations.
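
Query expansion over an ontology can be sketched as a graph traversal. The toy ontology below, with its concept names and two relation tables, is entirely hypothetical and stands in for a real ontology and reasoner.

```python
# Hypothetical toy ontology; concept names are illustrative only.
SYNONYMS = {"car": ["automobile"]}
SPECIALIZATIONS = {"vehicle": ["car", "truck"], "car": ["taxi"]}

def expand_query(concept):
    """Return the concept plus its synonyms and all transitive specializations."""
    expanded, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.append(current)
        stack.extend(SYNONYMS.get(current, []))
        stack.extend(SPECIALIZATIONS.get(current, []))
    return sorted(expanded)

terms = expand_query("vehicle")
```

A search for "vehicle" would then also match shots annotated with any of the expanded terms.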

Page 23: Vidivideo and IM3I

Sirio and Orione

• System features:

• Sirio is a Rich Internet Application (in Adobe Flex) front end.

• Orione is a web-service search engine

• Support for multiple ontologies and ontology reasoning

• Results are in Media RSS format (queries treated as RSS feeds)

• New search engine able to scale to a large number of instances of ontology concepts

• System interface query options:

• ontology exploration using a graph-based view

• compact keyframe-based results presentation / streaming videos

• concept drag&drop facility (to build complex queries)

• natural language query (with Boolean/temporal ops.)

• free text query (for Google-like search)

• Design goals/assumptions:

• semantic content-based retrieval

• efficient web-based interface
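
Returning results as Media RSS means that each query can be consumed like a feed. The sketch below builds such a feed with the Python standard library. The hit fields, titles and URLs are placeholders, and the exact elements emitted by Orione are not documented here, so this is an assumption about the general shape only.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # Media RSS namespace
ET.register_namespace("media", MEDIA_NS)

def results_to_media_rss(query, results):
    """Serialize search hits as an RSS 2.0 feed with Media RSS elements."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Results for: {query}"
    for hit in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit["title"]
        ET.SubElement(item, f"{{{MEDIA_NS}}}content",
                      {"url": hit["video_url"], "medium": "video"})
        ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail", {"url": hit["keyframe_url"]})
    return ET.tostring(rss, encoding="unicode")

# Hypothetical hit; the URLs are placeholders.
feed = results_to_media_rss("face AND outdoor", [{
    "title": "Shot 12",
    "video_url": "http://example.org/video.mp4",
    "keyframe_url": "http://example.org/shot12.jpg",
}])
```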

Pages 24-31: Vidivideo and IM3I

Sirio and Orione

Page 32: Vidivideo and IM3I

Andromeda

• Design goals/assumptions:

• semantic content-based browsing

• efficient web-based interface using RIA

• System features:

• Query manager as a Rich Internet Application (in Adobe Flex). Connects to web service (search engine)

• Support for multiple ontologies and ontology reasoning

• System interface query options:

• Shows the concepts with the most instances in a concept cloud view

• Graph representation of semantic data structure

• Multiple automatic layout algorithms for spatial positioning and manual drag & drop

• Thumbnails view of the instances of each concept

• Access to video metadata and video streaming

• Access to social content related to ontology concepts (Flickr, YouTube, and real time tweets from Twitter)
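
A concept cloud of this kind can be derived from instance counts alone. The sketch below is an assumption about one simple way to do it: keep the most frequent concepts and scale a font size linearly with the count. The size range and sample data are hypothetical.

```python
from collections import Counter

def concept_cloud(annotations, top_n=3, min_size=10, max_size=30):
    """Map the top_n most frequent concepts to font sizes scaled linearly by instance count."""
    counts = Counter(annotations).most_common(top_n)
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts are equal
    return {
        concept: min_size + (count - lowest) * (max_size - min_size) // span
        for concept, count in counts
    }

# Hypothetical concept instances found in an archive.
annotations = ["face"] * 5 + ["car"] * 3 + ["sky"] * 1 + ["boat"] * 1
cloud = concept_cloud(annotations, top_n=3)
```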

Pages 33-38: Vidivideo and IM3I

Andromeda

Page 39: Vidivideo and IM3I

Pan

• Design goals/assumptions:

• complete/correct automatic annotations

• help in training new automatic concept detectors

• System features:

• Rich Internet Application (in Adobe Flex).

• video streaming using the same system as Sirio and Andromeda

• new backend

• geotagging using Google Maps

• System interface options

• Integrated with web-based search engine and automatic video annotation

• Multiple user profiles: a simple user may change his own annotations, while a super user can import the annotations of other users, e.g. to supervise the annotation process within an organization.
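
The two user profiles could be enforced with checks like the ones below. This is only a sketch: the role names, the `Annotation` class and the rule that a super user may edit any annotation are assumptions, since the slide only states that simple users change their own annotations and super users can import those of others.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    author: str
    concept: str

def can_edit(user, role, annotation):
    """A simple user may only change their own annotations; a super user may change any (assumed)."""
    return role == "super" or annotation.author == user

def import_annotations(role, annotations, from_user):
    """Only a super user may import another user's annotations."""
    if role != "super":
        raise PermissionError("only super users can import annotations")
    return [a for a in annotations if a.author == from_user]

# Hypothetical users and annotations.
notes = [Annotation("alice", "face"), Annotation("bob", "car")]
imported = import_annotations("super", notes, "bob")
```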

Pages 40-45: Vidivideo and IM3I

Pan

Page 46: Vidivideo and IM3I

Daphnis

• Design goals/assumptions:

• build on image tagging made popular by Flickr and tag clouds

• connect to social web sites

• allow CBIR

• System features:

• Rich Internet Application (in Adobe Flex).

• Connects to Flickr (and also Facebook, if needed)

• Approximate nearest neighbour search using MPEG-7 descriptors, to scale to large number of images

• System interface options

• users can tag images and retrieve images based on tags, or use tags to filter the results of similarity based retrieval.

• Ongoing work:

• merging with automatic video annotation for automatic tagging

• adoption of mechanisms for tag suggestion, based on recent research work in this field (use content, tags and geolocalization)
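
Tag suggestion by k-NN voting can be sketched as follows, using content features only. Exact brute-force search is used here for brevity, whereas the system described above relies on approximate nearest neighbour search over MPEG-7 descriptors. The descriptors, tags and function name are toy assumptions.

```python
import numpy as np
from collections import Counter

def suggest_tags(query, features, tags, k=3, top=2):
    """Suggest tags by majority vote among the k nearest tagged images.

    Exact brute-force search is used here for brevity; a large collection
    would need an approximate index instead.
    """
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(tag for i in nearest for tag in tags[i])
    return [tag for tag, _ in votes.most_common(top)]

# Hypothetical 2-D descriptors and user tags for four images.
features = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [5.0, 5.0]])
tags = [["beach", "sea"], ["beach"], ["beach", "sand"], ["city"]]
suggested = suggest_tags(np.array([0.05, 0.05]), features, tags, k=3, top=1)
```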

Pages 47-50: Vidivideo and IM3I

Daphnis

Page 51: Vidivideo and IM3I

IM3I: authoring platform

A CMS approach to repository analysis, authoring and publication

Page 52: Vidivideo and IM3I

IM3I: authoring platform

Authoring IM3I end-user functionality typically covers five distinct stages:

• Importing an existing repository from RSS and various XML streams

• Extending the associated datamodel

• Editing layout and editing features

• Editing Search and Retrieval interfaces

• Embedding the IM3I end-user interfaces in a (corporate) website

Page 53: Vidivideo and IM3I

Editing workflow demo

•Step 1: Importing a video-repository

•Step 2: Enhancing the datamodel

•Step 3: Authoring layouts

•Step 4: Publishing the repository

Page 54: Vidivideo and IM3I

I: Importing a repository

•Importing an existing repository to an internal and flexible datamodel

•Aggregating and harmonizing multiple repositories

•Visualisation of markup and preview of contents

•Flexible mapping by drag-and-drop

Page 55: Vidivideo and IM3I

I: Importing a repository

Mapping the contents of video RSS to an IM3I Datamodel
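
In the authoring platform this mapping is done by drag-and-drop. Expressed in code, it amounts to a table from RSS tags to datamodel element names. The sketch below assumes a plain RSS 2.0 feed; the element names in `FIELD_MAP` are hypothetical and not the actual IM3I datamodel.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from RSS item tags to datamodel element names.
FIELD_MAP = {"title": "name", "link": "media_url", "pubDate": "published"}

def import_items(rss_text, field_map):
    """Convert each RSS <item> into a dict keyed by datamodel element names."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        record = {}
        for rss_tag, element_name in field_map.items():
            node = item.find(rss_tag)
            if node is not None and node.text:
                record[element_name] = node.text.strip()
        records.append(record)
    return records

rss_text = """<rss version="2.0"><channel>
  <item><title>Interview</title><link>http://example.org/a.mp4</link></item>
</channel></rss>"""
records = import_items(rss_text, FIELD_MAP)
```

Fields missing from an item (here `pubDate`) are simply left out of the record, so repositories with different markup can share one mapping.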

Page 56: Vidivideo and IM3I

II: Enhancing the Datamodel

•Datamodels contain the descriptions of your repository and in this way stipulate what can be shown to or retrieved by an end-user.

•Datamodels can reference each other

•Datamodels can be extended over time by adding elements

•Elements are based on types: media files, URIs, date, string, etc.

•Elements can be shared across datamodels to allow search & retrieval across multiple collections
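
The properties listed above (typed elements, extension over time, sharing across datamodels) can be pictured with a small data structure. The class and function names below are hypothetical, and the type set covers only the types named above, not the full "etc.".

```python
from dataclasses import dataclass, field

ALLOWED_TYPES = {"media file", "URI", "date", "string"}  # only the types named above

@dataclass(frozen=True)
class Element:
    name: str
    type: str

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown element type: {self.type}")

@dataclass
class Datamodel:
    name: str
    elements: list = field(default_factory=list)

    def add(self, element):
        """Extend the datamodel with one more element."""
        self.elements.append(element)

def shared_elements(a, b):
    """Elements present in both datamodels, i.e. fields searchable across both collections."""
    return sorted(set(a.elements) & set(b.elements), key=lambda e: e.name)

# Hypothetical datamodels for two collections.
title = Element("title", "string")
news = Datamodel("news", [title, Element("clip", "media file")])
films = Datamodel("films", [title])
films.add(Element("translation", "string"))
common = shared_elements(news, films)
```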

Pages 57-58: Vidivideo and IM3I

II: Enhancing the Datamodel

Adding a ‘translation’ element to the datamodel

Page 59: Vidivideo and IM3I

III: Layout and Functionality

Easy manipulation of a repository's layout by:

•Table metaphor (easy editing of table characteristics)

•Drag and drop graphical elements

•Drag and drop contents of repository in cells

•Easy manipulation of look and feel

•Easy addition of editing functionalities to a layout

•Easy preview and markup functionalities

Page 60: Vidivideo and IM3I

III: Layout and Functionality

Defining a layout table

Page 61: Vidivideo and IM3I

III: Layout and Functionality

Dragging repository contents to layout

Page 62: Vidivideo and IM3I

III: Layout and Functionality

Previewing layout

Page 63: Vidivideo and IM3I

IV: Embedding in website

Easy blend-in of layouts in corporate websites

•By means of plugins for CMSs (e.g. WebManager, WordPress, Typo3)

•Using <embed> </embed>

•Allowing for elaborate workflow patterns in combining multiple layouts

Page 64: Vidivideo and IM3I

IV: Embedding in website

Original contents

Added translation functionality

Page 65: Vidivideo and IM3I

The super users

Page 66: Vidivideo and IM3I

Atlante - process manager

• Web application used for the creation, technical administration and monitoring of the IM3I processing pipeline (e.g. automatic annotation process, media transcoding, etc.)

• This web application supports multiple user profiles:

• managers

• administrators

• Main functions of this application are:

• creation of a new type of (distributed) process

• parameter setting for a new type of process

• creation of a “Multiprocess” composed of sets of single (distributed) processes

• starting/pausing/stopping a process

• monitoring running processes
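
The process life cycle managed by Atlante can be pictured as a small state machine. The sketch below is not Atlante's implementation: the state names and transition table are assumptions, as the slide only names the start, pause and stop actions, and the process names and parameters are invented.

```python
# Allowed state transitions (an assumption; the slide only names start, pause and stop).
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "stopped",
    ("paused", "stop"): "stopped",
}

class Process:
    def __init__(self, name, **params):
        self.name, self.params, self.state = name, params, "created"

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state} process")
        self.state = TRANSITIONS[key]

class Multiprocess:
    """A set of single processes controlled together."""
    def __init__(self, processes):
        self.processes = list(processes)

    def apply(self, action):
        for process in self.processes:
            process.apply(action)

    def status(self):
        """Monitoring view: current state of every process."""
        return {p.name: p.state for p in self.processes}

# Hypothetical pipeline made of two processes.
pipeline = Multiprocess([Process("transcoding", codec="h264"), Process("annotation")])
pipeline.apply("start")
pipeline.apply("pause")
status = pipeline.status()
```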

Pages 67-69: Vidivideo and IM3I

Atlante

Page 70: Vidivideo and IM3I

Gaia - media manager

• Web application that will be used for technical administration and monitoring of the database

• Main functions of this application are:

• media management

• configuration of metadata, broadcasters, annotation types, concept types and media types

• media annotations monitoring by technical backend

Pages 71-72: Vidivideo and IM3I

Gaia

Page 73: Vidivideo and IM3I

One more thing...

Page 76: Vidivideo and IM3I

ACM MM 2010 Workshop

3rd International Workshop on Automated Information Extraction in Media Production

AIEMPro'10

Organizers:

Dr. Robbie De Sutter

Vlaamse Radio- en Televisieomroep - Medialab

Jean-Pierre Evain

European Broadcasting Union - Union Européenne de Radiotélévision

Dr. Gerald Friedland

ICSI (International Computer Science Institute)

Dr. Alberto Messina

RAI Radiotelevisione Italiana, Centre for Research and Technological Innovation

Dr. Masanori Sano

NHK (Japan Broadcasting Corporation) Science and Technology Research Laboratories
