

Page 1: Web System Ontology Based

8/6/2019 Web System Ontology Based

http://slidepdf.com/reader/full/web-system-ontology-based 1/4

A WEB SYSTEM FOR ONTOLOGY-BASED MULTIMEDIA ANNOTATION, BROWSING AND SEARCH

M. Bertini, G. Becchi, A. Del Bimbo, A. Ferracani, D. Pezzatini

University of Florence - MICC, Firenze, Italy

ABSTRACT

In this paper we present a complete system for semantic and syntactic annotation, browsing and search of multimedia data, based on a service-oriented architecture, with web-based interfaces developed following the Rich Internet Application paradigm. The system has been designed to be: i) flexible and extendable, allowing users to select only the services they need or to add their own tools to the multimedia processing pipelines; ii) distributed, with services that can be executed in a cloud computing infrastructure and accessed through web applications; iii) user-friendly, with interfaces that look and behave uniformly on every platform and offer a level of interaction similar to that of desktop applications. Extensive user trials in a real-world setup, performed by archive and broadcaster professionals, have shown the efficacy and usability of the proposed solution.

Index Terms— Multimedia database, multimedia authoring, content analysis, content-based retrieval.

1. INTRODUCTION

Recently, two surveys to gather user requirements for video annotation and search systems have been conducted within the EU-funded research projects VidiVideo1 and IM3I2. More than 50 professionals working in broadcasters, national video archives, photographic archives and cultural heritage organizations have participated in the surveys. One of the main outcomes is that multimedia annotation and management systems have to be web-based. In fact, this requirement was deemed “mandatory” by 75% of the interviewees and “desirable” by another 20% [1, 2]. Other interesting results are that controlled lexicons and ontologies are widely used, by 64% and 39% of the interviewees respectively, and that 71% of users requested the possibility to combine search mechanisms that account for structured data (e.g. metadata, controlled lexicons and ontologies) and unstructured data (e.g. free text and transcriptions). However, most of the annotation and search systems developed by the scientific multimedia community are desktop applications [3, 4, 5, 6, 7, 8] whose search and browsing tools are designed for participation in scientific competitions, like TRECVID and VideOlympics, rather than for end-users, like broadcaster and video archive professionals. Recently, some video search engines have been designed as web applications [9, 10, 11] because of the convenience of using browsers as clients that access a common search engine.

To satisfy the needs expressed by the surveys, we have developed a system that offers an integrated service-oriented environment for processing, analysing, indexing, tagging, and searching multimedia content, at the syntactic and semantic level.

1 http://www.vidivideo.info
2 http://www.im3i.eu
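The combination of structured and unstructured search that the interviewees requested can be sketched with a small example. The data model, field names and matching logic below are hypothetical illustrations, not the system's actual API:

```python
# Sketch of combining structured (metadata) and unstructured (free-text)
# search criteria over a toy in-memory archive. Field names are invented.

def search(archive, metadata_filters=None, text_query=None):
    """Return ids of items matching all metadata filters AND the text query."""
    results = []
    for item in archive:
        # Structured part: exact match on controlled-lexicon metadata fields.
        if metadata_filters and any(
                item.get(k) != v for k, v in metadata_filters.items()):
            continue
        # Unstructured part: naive keyword match over a free-text transcription.
        if text_query and text_query.lower() not in item.get("transcript", "").lower():
            continue
        results.append(item["id"])
    return results

archive = [
    {"id": "v1", "genre": "news", "transcript": "The election results were announced."},
    {"id": "v2", "genre": "sport", "transcript": "The match ended in a draw."},
    {"id": "v3", "genre": "news", "transcript": "A new stadium opened for the match."},
]

print(search(archive, metadata_filters={"genre": "news"}, text_query="match"))  # ['v3']
```

A real deployment would replace the exact-match filter with controlled-lexicon lookups and the substring test with a full-text index, but the combination principle is the same.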

2. THE SYSTEM

The system presented in this paper3 provides a service-oriented architecture (SOA) that allows for multiple viewpoints of the multimedia data inside repositories, providing better ways to reuse, repurpose and share rich media. This paves the way for a multimedia information management platform that is more flexible, adaptable, and customizable. In fact, a SOA provides methods for systems development and integration by packaging system functionalities as interoperable services, which are the building blocks of the system. A SOA infrastructure allows different applications to communicate with one another, in a loosely coupled way, by passing data in a shared format or by orchestrating the activity of the services. One of the outcomes of this architectural choice is that deploying the system in existing infrastructures and workflows does not require redesigning them, since it becomes possible to simply complement them, adding only the services that are required. This latter point is particularly important when considering organizations like broadcasters or national video archives, which cannot completely redesign their existing systems.
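The loose coupling described above can be sketched in a few lines: services exchange messages in a shared format (JSON) and an orchestrator only routes messages, never touching service internals. Service names and message fields below are hypothetical, not the actual interfaces of the system:

```python
import json

# Minimal sketch of loosely coupled services exchanging JSON messages,
# as in a SOA. Either service could be replaced or moved to another host
# without changing the orchestrator, since only the message format is shared.

def transcoding_service(message: str) -> str:
    """Consume a JSON request, return a JSON response describing the new asset."""
    request = json.loads(message)
    return json.dumps({"asset": request["asset"], "format": "mp4", "status": "done"})

def annotation_service(message: str) -> str:
    """Annotate an asset described by a JSON message from another service."""
    asset = json.loads(message)
    asset["concepts"] = ["outdoor", "crowd"]  # placeholder annotation output
    return json.dumps(asset)

def orchestrate(asset_id: str) -> dict:
    """Route messages between services; knows nothing of their internals."""
    transcoded = transcoding_service(json.dumps({"asset": asset_id}))
    annotated = annotation_service(transcoded)
    return json.loads(annotated)

print(orchestrate("video-001"))
```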

An overview of the system architecture, composed of four main layers, is shown in Fig. 1: the Interface layer, the Authoring layer, the Architecture layer and the Analysis layer. Communication between the analysis and interface layers is routed through the architecture layer, which also takes care of the main repository functions.

The Analysis layer is responsible for extracting low-level features and semantic annotations from media files, through a series of processing pipelines that can be executed in a cloud of servers, orchestrated by dedicated services. The Interface and Authoring layers are composed of several components, ranging from specialized interfaces for annotation and search to basic UI widgets. A main component in the figure is the authoring layer. This component is dedicated to the composition and creation of search, browsing, and editing interfaces for end-users, combining ready-made interface building blocks.

Automatic multimedia annotation is performed by user-definable processing pipelines; the system provides a number of services for syntactic and semantic audio and video content annotation. These services can be combined in processing pipelines to create more complex services. For example, a video annotation pipeline that can be created, modified and managed using some of the services provided by this system is shown in Fig. 2.
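Composing services into a user-defined pipeline, in the spirit of Fig. 2, can be sketched as function chaining. The stage names come from the figure; the composition mechanism itself is a hypothetical illustration, not the system's implementation:

```python
from functools import reduce

# Each "service" takes a media description and returns an enriched copy.
def ingest(media):       return dict(media, transcoded=True)
def segment(media):      return dict(media, shots=[(0, 120), (120, 300)])
def bow_annotate(media): return dict(media, concepts=["indoor", "face"])
def cbir_index(media):   return dict(media, indexed=True)

def make_pipeline(*stages):
    """Chain services: the output of each stage feeds the next."""
    return lambda media: reduce(lambda m, stage: stage(m), stages, media)

# Users pick only the services they need, or insert their own stage.
video_pipeline = make_pipeline(ingest, segment, bow_annotate, cbir_index)

result = video_pipeline({"id": "video-001"})
print(result["concepts"])  # ['indoor', 'face']
```

In the real system each stage would be a remote service invocation rather than a local function, but the ordering and data-flow contract are the same.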

Annotation of visual content is performed using an implementation of the Bag-of-Visual-Words paradigm, based on a fusion of MSER [12], SURF [13] and SIFT [14] features and the Pyramid

3 Available for testing at: URL hidden for double-blind review


Fig. 1. Overall view of the system architecture: the Interface and Authoring layers (end-user interfaces, authoring environment), the SOA Architecture layer (system repository, database, media and file storage) and the Analysis layer (video/image, audio and user-defined analysis pipelines).

Fig. 2. Example of the automatic annotation pipeline for videos (ingestion/transcoding, segmentation, BoW annotation, CBIR indexing, video streaming transcoding, and audio extraction feeding the audio processing pipeline), built using services provided by the system. Users can create their own processing pipelines combining the services provided by the system, or other existing pipelines.

Matching Kernel [15]. Audio annotation is based on a fusion of timbral texture features, such as ZCR, MFCCs, chroma and spectral features, with SVM classifiers. CBIR is performed using rhythm and pitch features for audio and MPEG-7 features for visual data, in particular using a combination of Scalable Color, Color Layout and Edge Histogram descriptors. To address the problem of scalability in large-scale archives, these features are indexed using the approximate similarity search approach presented in [16]. Semantic-level search and browsing is performed through a search engine that uses the ontology design presented in [17]; ontology-based reasoning using concept relations, subsumption and WordNet synonyms is employed for query expansion. The graph of the ontology concepts is also used to browse the media archives (Fig. 4). The search engine also works with free-text annotations and transcriptions, and can be used as a web service or through specific web applications. Other services and specialized interfaces allow tagging and syntactic-level content-based retrieval. Publishing functionalities are provided by a set of services and interfaces of the authoring platform. This platform allows importing and publishing existing media repositories and authoring web-based environments that let end-users interact with the repositories. Authors can create elaborate workflow patterns and search interfaces, which can be embedded in a variety of commercial CMS systems.

Fig. 3. Screenshots of some of the annotation tools; top) AJAX tool for tagging, ontology-based and audio transcription video annotation, bottom) adding geographical metadata to concept annotations.

Fig. 4. Screenshot of the browse application: the concept cloud is used to start browsing, the graph shows a reduced view of the ontology around a selected concept. Users can inspect instances of the concept stored in the system or search for them in other repositories like YouTube and Flickr.
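The quantization step at the heart of the Bag-of-Visual-Words annotation mentioned above can be sketched compactly. The 2-D "descriptors" and the three-word vocabulary below are invented for illustration; a real system would use high-dimensional SIFT/SURF/MSER descriptors and a vocabulary learned with k-means over millions of samples:

```python
# Toy sketch of Bag-of-Visual-Words quantization: each local descriptor is
# assigned to its nearest visual word, and the assignments are pooled into
# a normalized histogram that represents the whole image.

def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((d - v) ** 2
                                 for d, v in zip(descriptor, vocabulary[i])))

def bovw_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts]

vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]               # 3 visual words
descriptors = [(0.1, 0.2), (0.9, 1.1), (0.0, 0.9), (0.2, 0.1)]  # one image

print(bovw_histogram(descriptors, vocabulary))  # [0.5, 0.25, 0.25]
```

The Pyramid Matching Kernel of [15] then compares such histograms at multiple resolutions rather than with a single flat pooling, but the word-assignment step is the same.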

Using AJAX, Flash/Flex, Silverlight and other Rich Internet Application (RIA) technologies [18] makes it possible to develop web applications that are highly responsive [19] and allow more advanced interaction. The quality of interaction, made essential for users by modern desktop applications and operating systems, is achieved by means of drag&drop, advanced widgets and advanced multimedia support, which are not available in traditional web-based applications [20]. Other benefits are improved server performance, since part of the computational burden is distributed to the client, and easy distribution of new versions of the application, which is downloaded by the clients every time it is used. All the web applications of the system have been developed according to the RIA paradigm. In particular, the applications of the Interface and Authoring layers are developed in AJAX and Flash/Flex, while data is exchanged using SOAP, RSS and JSON for metadata and RTMP for video streaming. Figs. 3, 4 and 5 show some screenshots of the manual annotation (to check automatic annotations, add metadata or create ground-truth annotations to train new automatic concept detectors), browse, search (using different modalities) and tagging/CBIR tools.

Fig. 5. Screenshots of some of the search tools; top) advanced ontology-based video search (Google-like search is also available), bottom) CBIR and image tagging. Video keyframes can be used to select visually similar videos and images.
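The ontology-based query expansion used by the search engine (concept relations, subsumption and WordNet synonyms) can be illustrated with a miniature concept graph. The ontology below is invented for illustration and is not the ontology design of [17]:

```python
# Toy sketch of ontology-based query expansion: a query concept is expanded
# with its transitive subclasses (subsumption) and with synonyms of each
# concept reached (WordNet-style), so a query for "vehicle" also retrieves
# media annotated with "taxi" or "automobile".

subclasses = {
    "vehicle": ["car", "truck", "bicycle"],
    "car": ["taxi"],
}
synonyms = {
    "car": ["automobile"],
    "bicycle": ["bike"],
}

def expand_query(concept):
    """Expand a concept with all its transitive subclasses and their synonyms."""
    expanded, frontier = set(), [concept]
    while frontier:
        c = frontier.pop()
        if c in expanded:
            continue
        expanded.add(c)
        frontier.extend(subclasses.get(c, []))  # subsumption: descend the graph
        expanded.update(synonyms.get(c, []))    # synonym expansion
    return sorted(expanded)

print(expand_query("vehicle"))
```

The expanded concept set is then matched against the annotations in the repository instead of the single query term.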

3. EXPERIMENTS

The system presented in this paper has been thoroughly tested in several field trials with the participation of 19 multimedia archive and broadcaster professionals in The Netherlands, Hungary, Italy and Germany. The system was running on our servers while users were at the venues of their organizations, using the same PCs they use for their daily work.

The goal of the field trials was to assess the usability of the system, in particular letting the users interact with the search engine and its interfaces and pose semantic- and syntactic-level queries, but also annotate, automatically and manually, some videos. The methodology used follows the practices defined in the ISO 9241 standard, and considered the following four factors: usability, effectiveness, efficiency and satisfaction. A set of activities that involved the various interfaces was selected. These activities allowed testing several aspects of both the automatic and manual annotation system (this activity was performed by a subset of 6 users) and the features of the search/browse engines using different search modalities (structured/unstructured/similarity-based). The trials were followed by a debriefing of the users, who filled in a questionnaire to evaluate their impressions of the system and its perceived effectiveness and usability. Given that such systems are not yet in widespread use, and that the interfaces of these types of systems may require understanding the meaning and scope of various widgets, a user manual was prepared to let the users obtain a basic understanding of the system. In addition to the short manual, a simple system walkthrough (about 10 minutes long) was presented to the users by test monitors. These monitors also took observational notes and recorded verbal feedback from users during the tests. These notes and the questionnaires have been considered in a second stage of system design to improve the overall usability, considering interface and workflow design.

Fig. 6. Overview of usability evaluation for the search tests: overall usability of the system, usability of the combination of search modalities.

The overall experience was very positive and the system proved to be easy to use, despite the objective difficulty of interacting with a complex system for which the testers received only very limited training. Fig. 6 reports two results for the search activities. Users appreciated the combination of different interfaces and functions. The search modality that proved most suitable for the majority of the users is the advanced interface, because it allows building queries with Boolean/temporal relations between concepts and concept relations, and because of the possibility to use geographical and video metadata, which is appealing for professional archivists.

Also the usability of the annotation components, both automatic


and manual, was satisfactory, although some concerns remain regarding the precision of automatic annotation, which is still too low for the high standards of archivists. Fig. 7 reports two results for the automatic and manual annotation tools. None of the users had any previous work experience with automatic video annotation systems, but they had been trained in using a manual annotation tool developed within their organization.

In general, the comments recorded during the trials and those gathered with the anonymous questionnaire have shown a high degree of satisfaction with the system, and have provided interesting hints for further improvement of the interfaces that, in part, have already been taken into account for further development.

Fig. 7. Overview of usability evaluation for the annotation tests: overall usability of the automatic annotation system, usability of the manual annotation tool.

4. CONCLUSIONS

In this paper we have presented a system, based on a SOA back-end and a RIA front-end, that has been jointly designed by industrial and academic partners of EU-funded research projects. The system architecture makes it easily deployable, also in organizations that have a well-established multimedia management workflow.

The system provides functionalities for the management of automatic multimedia analysis pipelines, manual annotation tools, searching and browsing tools and authoring interfaces. It has been thoroughly tested in a real-world setup by industry professionals, with good results, and is still under active development within the scope of an EU-funded technology transfer project.

5. REFERENCES

[1] “Deliverable D7.6 - validation of the user interface of the VIDI-Video system,” Tech. Rep., VidiVideo consortium, 2009.

[2] “Deliverable D2.1 - initial user requirements study,” Tech. Rep., IM3I consortium, 2009.

[3] J. Pickens, J. Adcock, M. Cooper, and A. Girgensohn, “FXPAL interactive search experiments for TRECVID 2008,” in Proc. of the TRECVID Workshop, 2008.

[4] A. Natsev, W. Jiang, M. Merler, J.R. Smith, J. Tesic, L. Xie, and R. Yan, “IBM Research TRECVid-2008 video retrieval system,” in Proc. of the TRECVID Workshop, 2008.

[5] J. Cao, Y.-D. Zhang, B.-L. Feng, L. Bao, L. Pang, and J.-T. Li, “TRECVID 2009 of MCG-ICT-CAS,” in Proc. of the TRECVID Workshop, 2009.

[6] C.G.M. Snoek, K.E.A. van de Sande, O. de Rooij, B. Huurnink, J.R.R. Uijlings, M. van Liempt, M. Bugalho, I. Trancoso, F. Yan, M.A. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J.-M. Geusebroek, T. Gevers, M. Worring, D.C. Koelma, and A.W.M. Smeulders, “The MediaMill TRECVID 2009 semantic video search engine,” in Proc. of the TRECVID Workshop, Gaithersburg, USA, November 2009.

[7] O. de Rooij and M. Worring, “Browsing video along multiple threads,” IEEE Transactions on Multimedia (TMM), vol. 12, no. 2, pp. 121–130, 2010.

[8] Y.-T. Zheng, S.-Y. Neo, X. Chen, and T.-S. Chua, “VisionGo: towards true interactivity,” in Proc. of CIVR, 2009.

[9] M. Bertini, G. D’Amico, A. Ferracani, M. Meoni, and G. Serra, “Sirio, Orione and Pan: an integrated web system for ontology-based video search and annotation,” in Proc. of ACM MM, 2010.

[10] W. Bailer, W. Weiss, G. Kienast, G. Thallinger, and W. Haas, “A video browsing tool for content management in postproduction,” International Journal of Digital Multimedia Broadcasting, 2010.

[11] S. Vrochidis, A. Moumtzidou, P. King, A. Dimou, V. Mezaris, and I. Kompatsiaris, “VERGE: A video interactive retrieval engine,” in Proc. of CBMI, 2010.

[12] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.

[13] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346–359, 2008.

[14] D.G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004.

[15] K. Grauman and T. Darrell, “The pyramid match kernel: Efficient learning with sets of features,” Journal of Machine Learning Research (JMLR), vol. 8, pp. 725–760, 2007.

[16] G. Amato and P. Savino, “Approximate similarity search in metric spaces using inverted files,” in Proc. of InfoScale, 2008.

[17] L. Ballan, M. Bertini, A. Del Bimbo, and G. Serra, “Video annotation and retrieval using ontologies and rule learning,” IEEE MultiMedia, vol. 17, no. 4, pp. 80–88, Oct.-Dec. 2010.

[18] P. Fraternali, G. Rossi, and F. Sanchez-Figueroa, “Rich internet applications,” IEEE Internet Computing, vol. 14, pp. 9–12, 2010.

[19] T. Leighton, “Improving performance on the internet,” Communications of the ACM, vol. 52, pp. 44–51, February 2009.

[20] P. Fraternali, S. Comai, A. Bozzon, and G.T. Carughi, “Engineering rich internet applications with a model-driven approach,” ACM Transactions on the Web (TWEB), vol. 4, pp. 7:1–7:47, April 2010.