
R1 CUBRIK INTEGRATED PLATFORM RELEASE

Human-enhanced time-aware multimedia search

CUBRIK Project IST-287704

Deliverable D9.3 WP9

Deliverable Version 1.0 – 31 August 2012

Document ref.: cubrik.D93.ENG.WP9.V1.0

R1 CUbRIK Integrated Platform release D9.3 Version 1.0

Programme Name: IST
Project Number: 287704
Project Title: CUBRIK
Partners: Coordinator: ENG (IT)
Contractors: UNITN, TUD, QMUL, LUH, POLMI, CERTH, NXT, MICT, ATN, FRH, INN, HOM, CVCE, EIPCM

Document Number: cubrik.D93.ENG.WP9.V1.0
Work-Package: WP9
Deliverable Type: Document
Contractual Date of Delivery: 31 August 2012
Actual Date of Delivery: 31 August 2012
Title of Document: R1 CUBRIK Integrated Platform Release
Author(s): Vincenzo Croce (ENG)
Approval of this report:
Summary of this report: CUbRIK Release 1 Accompanying document
History:
Keyword List:
Availability: This report is public

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. This work is partially funded by the EU under grant IST-FP7-287704


Disclaimer

This document contains confidential information in the form of the CUbRIK project findings, work and products and its use is strictly regulated by the CUbRIK Consortium Agreement and by Contract no. FP7-ICT-287704.

Neither the CUbRIK Consortium nor any of its officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein.

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7-ICT-2011-7) under grant agreement n° 287704.

The contents of this document are the sole responsibility of the CUbRIK consortium and can in no way be taken to reflect the views of the European Union.


Table of Contents

EXECUTIVE SUMMARY

1. CUBRIK RELEASE 1 DESCRIPTION
1.1 CUBRIK PIPELINE
1.2 PIPELINES VS SEARCH PROBLEM
1.3 H-DEMO CONCEPT
1.4 WHAT IS IN R1
1.4.1 Release 1 in a Nutshell
1.5 CUBRIK SVN INFRASTRUCTURE
1.6 DEPLOYMENT ENVIRONMENT
1.6.1 SMILA 1.0 environment installation
1.6.2 SMILA 1.1 environment installation

2. PIPELINES OF R1
2.1 CUBRIK H-DEMO LOGO DETECTION
2.1.1 Logo detection H-Demo vs Cubrik pipeline
2.1.2 Data set description
2.1.3 Architecture overview
2.1.4 Third party library
2.1.5 Components integrated in SMILA for CUBRIK R1
2.1.6 How to install the Logo detection h-demo
2.2 CUBRIK MEDIA ENTITY ANNOTATION H-DEMO
2.2.1 Media Entity Annotation H-Demo vs Cubrik pipeline
2.2.2 Media Entity Annotation H-demo architecture
2.2.3 Multimedia crawlers and SMILA pipelets
2.2.4 How to install and run the Media Entity Annotation h-demo

3. CONCLUSION

Figures / Tables

Figure 1: CUbRIK Pipeline structure
Figure 2: CUbRIK Pipelines – Processes tier
Figure 3: Logo Detection H-Demo: Pipeline Kinds Vs Search Problem
Figure 4: CUbRIK H-Demos per Release
Figure 5: CUbRIK R1 composition
Figure 6: Logo Detection - Content Analysis and Enrichment Pipeline
Figure 7: Search for non-indexed Brand name
Figure 8: Logo Detection H-Demo: Pipelines
Figure 9: Logo Detection H-Demo GUI
Figure 10: Media Entity Annotation h-demo architecture
Figure 11: SMILA pipeline for Media Entity Annotation h-demo

Table 1: Release 1 in a nutshell


Executive Summary

This deliverable accompanies the first release of the CUbRIK platform, as planned in D9.2 Delivery Management Plan and Testing Specification. D9.2 introduces the concept of the CUbRIK Pipeline and describes in detail what is planned for each platform release. Some of those concepts are summarised in this document for the reader's convenience. Moreover, the release schedule from D9.2, chapter 2, is reported here so that the plan can be checked against what is actually delivered.

Chapter 1 describes what the release contains; the introduction given in D9.2 is extended to provide a comprehensive description of the release parts and their arrangement. A description of the CUbRIK SVN infrastructure and of the deployment environment is also provided.

Chapter 2 provides an in-depth description of the structure and installation details of the H-Demos, the pipelines belonging to them, their components and the third-party libraries.

In Chapter 3 the Release Plan is checked against what was actually delivered, with references to the paragraphs that describe each delivery.


1. CUbRIK Release 1 description

CUbRIK is not developed as a monolithic, do-it-all architecture. It follows a differential approach based on SMILA [1] as the underlying framework for supporting workflow definition and execution. CUbRIK relies on a framework for executing processes (aka pipelines), consisting of collections of tasks. To understand the CUbRIK structure, which is reflected in the Release, the reader needs a summary of the CUbRIK Pipeline concept. For this release the H-Demo concept also has to be introduced.

1.1 CUbRIK pipeline

A CUbRIK pipeline is a conceptual workflow constituting a fragment of the business logic of a search application. It is composed of Jobs (automatic operations) and human activities chained in a sequence. Each pipeline is described by a workflow of tasks allocated to executors. Task executors can be software components (e.g., data analysis algorithms, metadata indexing tools, search engines of different nature, result presentation modules, etc.); tasks can also be allocated to individual human users (e.g., via a gaming interface) or to an entire community (e.g., by a crowdsourcing component).

Figure 1: CUbRIK Pipeline structure

Each CUbRIK Job is implemented as a SMILA workflow that consists of an aggregation of Actions; an Action can be a:

1. Worker: a single processing component in an asynchronous workflow

2. Pipelet: a reusable component in a BPEL workflow used to process data contained in records

3. Pipeline: a synchronous BPEL process (or workflow) that orchestrates pipelets and other BPEL services (e.g. web services).
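The relations among pipelines, Jobs, human activities and Actions described above can be sketched as a small data model. This is only an illustration in Python under our own naming: the class and field names (CubrikPipeline, Job, HumanActivity, Action, ActionKind) are hypothetical and are not part of SMILA or of the CUbRIK code base.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ActionKind(Enum):
    WORKER = "worker"      # single processing component in an asynchronous workflow
    PIPELET = "pipelet"    # reusable component in a BPEL workflow
    PIPELINE = "pipeline"  # synchronous BPEL process orchestrating pipelets


@dataclass
class Action:
    name: str
    kind: ActionKind


@dataclass
class Job:
    """An automatic operation: an aggregation of Actions."""
    name: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class HumanActivity:
    """A task allocated to a person or to a crowd rather than to software."""
    name: str


@dataclass
class CubrikPipeline:
    """A chained sequence of Jobs and human activities."""
    name: str
    steps: List[object] = field(default_factory=list)

    def job_names(self) -> List[str]:
        # Only the automatic steps; human activities are skipped.
        return [step.name for step in self.steps if isinstance(step, Job)]


# Illustrative pipeline: two automatic Jobs around one human activity.
example = CubrikPipeline(
    "query",
    [
        Job("retrieve logo URLs", [Action("image crawler", ActionKind.WORKER)]),
        HumanActivity("validate logos"),
        Job("extract descriptors", [Action("descriptor extractor", ActionKind.PIPELET)]),
    ],
)
```

Keeping Jobs and human activities in one ordered list mirrors the idea that both kinds of step are chained in a single sequence.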

1.2 Pipelines vs Search problem

The vast majority of search applications can be represented with a high-level architecture structured in tiers, each one grouping functional modules, with the specific goal of connecting users with content.

In the context of CUbRIK this is facilitated by three main kinds of processes, the Pipelines, which constitute a comprehensive three-step approach:

• Content Analysis and Enrichment pipelines

• Query execution pipelines

• Feedback acquisition and processing pipelines

[1] http://www.eclipse.org/smila/

The architecture figure below shows how the Process tier groups the three kinds of pipelines.

Figure 2: CUbRIK Pipelines – Processes tier

Content Analysis and Enrichment pipelines make content searchable: they extract low-level features from the media and then infer high-level features that build on them. Additionally, according to the human-in-the-loop approach, the pipelines manage the integration with human-executed tasks to enrich the “understanding” of the media content.

Query pipelines are responsible for managing the user's search experience. Simple and federated queries are analysed, refined, transformed, expanded and personalized. The pipelines draw on the User Profile and on Community knowledge to personalize and fine-tune query processing. The latter is another facet of the human-in-the-loop approach: query effectiveness is improved by exploiting the evaluation of the Crowd.

Feedback acquisition and processing pipelines focus on feeding the feedback gathered from the Community back into content analysis and enrichment. In particular, relevance feedback via social computation techniques is managed in order to develop feedback experiments, both in a closed lab environment and using crowdsourcing.

A typical example of a search-driven application realized on the CUbRIK Platform is the Logo Detection H-Demo, which addresses the specific domain of logo detection in videos of grocery interiors. The following figure depicts the big picture of the Pipeline vs Search problem approach:

Figure 3: Logo Detection H-Demo: Pipeline Kinds Vs Search Problem

1.3 H-Demo concept

In the DoW, two CUbRIK applications are planned, one for each of the identified vertical domains: History of Europe and SME Innovation. In addition to these applications, some demonstrators were realized; the aim is to have intermediate demonstrators, not planned in the DoW, to prove and demonstrate some general-purpose functionalities. These refer to horizontal domains identified as potential demonstration fields.

In CUbRIK seven H-Demos were defined according to the analysis of the domains of practice:

1. Logo detection

2. News history

3. People Identification

4. LikeLines: Time-point specific search via implicit-user derived information

5. Media Entity Annotation

6. Crosswords

7. Accessibility aware Relevance feedback

Logo Detection and Media Entity Annotation are the two H-Demos planned to be part of this CUbRIK release. The following table is reported from D9.2 Delivery Management Plan:


Figure 4: CUbRIK H-Demos per Release

1.4 What is in R1

D9.2 Delivery Management Plan describes the structure of each release; it consists essentially of:

1. Release accompanying Document

2. Artefacts

Figure 5: CUbRIK R1 composition

The first is the release guideline covering the complete platform installation, including environment set-up and dependency management. It is this document.

As already introduced, the CUbRIK framework is not a monolithic piece of software. It is structured in four layers. The layers group the CUbRIK framework artefacts, which are: Interfaces, CUbRIK Apps, Pipelines (including automatic and human tasks), components of different kinds, and core services. According to this structure, each CUbRIK release is composed of a group of artefacts corresponding to the advancement achieved at that specific time, provided in packaged form.

In general, three different kinds of artefacts belong to each release:

• Vertical demos

• Horizontal demos

• Platform services

Vertical demos are planned to be released as part of the final CUbRIK release, R5 (M36).

So, for this Release 1, what is planned to be delivered and packaged are Horizontal demos and Platform services.

Figure 4: CUbRIK H-Demos per Release reports the two H-Demos planned for this release:

• Logo Detection: a use case in which a User query consisting of a brand name produces a report that identifies all the occurrences of logos of that brand in a given set of video files recorded inside a mall.

• Media Entity Annotation: a use case demonstrating the harvesting of representative images for named entities stored in the entity repository; the goal of this version is to enhance the entity repository so that it also includes multimedia content, which can be used to visualize the named entities, e.g. in entity search results.

Lastly, Platform services implement services such as the execution engine, task management, and persistency and cache support. As described in D9.8 Architecture Description, SMILA is exploited as the CUbRIK framework supporting workflow definition and execution; it essentially corresponds to the Platform Services layer. Starting from some existing services (execution engine, task management, persistency and cache support), SMILA is extended in the course of the project with the goal of achieving full functional coverage of the CUbRIK Platform Services layer. Based on the feature list of SMILA v.0.9 (October 2011), an analysis was performed to identify the extensions to be realized; it is reported in D9.2 Delivery Management Plan. In particular, for this Release 1 the plan includes:

o Self-scaling ETL - dynamic scaling of data import.

1.4.1 Release 1 in a Nutshell

The table below, from D9.2 Delivery Management Plan, provides Release 1 in a nutshell; it reports the plan for the first year of the project. The “What is going to be delivered” column groups, at the same level, the pipelines, components and related datasets belonging to the artefacts of R1; the “In detail” column adds the details:

Delivery date: M12
Release version: 1.0

What is going to be delivered, with details:

Pipelines for multimodal content analysis & enrichment
• Logo Detection H-Demo
• Media Entity Annotation H-Demo (space related)

Pipelines for query processing
• Logo Detection H-Demo (crowd source content tagging and query expansion)

Pipelines for relevance feedback
• Logo Detection H-Demo (crowd source for conflict resolution)

Component and pipeline support services
• Components belonging to Logo Detection and Media Entity Annotation

Space & Time extension
• Data-set for space domain, including the methodology for cleaning the data-set

Table 1: Release 1 in a nutshell

As depicted in Figure 4: CUbRIK H-Demos per Release, the Logo Detection demo includes three pipelines covering the three different kinds of CUbRIK Pipelines: Content Analysis and Enrichment, Query, and Feedback acquisition and processing; a detailed description is provided in section 2.1 of this document. In contrast, the Media Entity Annotation demo implements only a Content Analysis and Enrichment pipeline; more details are reported in section 2.2.

Pipelines for multimodal content analysis & enrichment are described in deliverable D5.1 [2]; Pipelines for query processing are described in D6.1 [3]. Regarding Pipelines for relevance feedback, even though one pipeline is already delivered as part of this release, the corresponding deliverable D7.1 is planned for release at M17, so it is not yet available as a consolidated version.

In implementing its Pipelines, each H-Demo relies on components for automatic Job execution (Actions) and exploits frameworks for human activities. Figure 1: CUbRIK Pipeline structure depicts the relations among Pipelines, Human activities, Jobs and Actions. The open approach of CUbRIK fosters the re-use of existing components; in general, however, proprietary components may also be part of the Pipelines and therefore of the Demos. Sections 2.1 and 2.2 report, respectively, the components exploited and the ones developed for each specific H-Demo.

Deliverable D8.1 [4] gives an overall description of the components for content analysis (e.g. text, audio, image) that are available in the open source landscape; starting from this state-of-the-art analysis, some components were selected for reuse in the CUbRIK platform. The pipelines developed exploit a SMILA-integrated version of these components: the work consisted in developing wrappers (OSGi services) around these components so that they can be used in the CUbRIK pipelines running on SMILA. SMILA was chosen as the underlying framework for supporting workflow definition and execution and is part of the Platform Services.

The Space & Time extension prepares media content to serve as the data-set for the H-Demo; the data-set is the collection of entities resulting from the processing of the Media Entity Annotation H-Demo over a collection of images related to Italian monuments. More details can be found in deliverable D4.1 [5].

With respect to Platform services, as reported above, the plan specifies SMILA extensions in the course of the project to achieve full functional coverage of the CUbRIK Platform Services layer. For this Release 1 the plan foresees the support of:

o Self-scaling ETL - dynamic scaling of data import.

At the time of this document's release, the latest version of SMILA is v.1.1 (July 2012); it fully supports feed crawler implementations for self-scaling ETL and the integration of Solr 3.5. This is a further upgrade of the self-scaling ETL previously released in SMILA v.1.0 (February 2012). The planned support is therefore accomplished.

1.5 CUbRIK SVN infrastructure

Release 1 of the CUbRIK platform, as defined in D9.2 Delivery Management Plan, is composed of a group of artefacts and an accompanying document. The artefacts are Interfaces, CUbRIK Apps, Pipelines (including automatic and human tasks), components of different kinds, and core services. The Release is conceived as a consolidated version of these artefacts, delivered at an established time and available in a common repository with software versioning and revision control features. The technology used is Apache Subversion™.

The repository is hosted on an ENG server and is reachable via HTTPS.

The CUbRIK SVN structure reflects the list of CUbRIK components, following the organization of the CUbRIK platform architecture. The root folder contains all the component folders that developers use to check files out and in. Each CUbRIK component corresponds to a sub-folder named “componentname_partnername”, e.g. <FOO_ENG>. This is the first nesting level.

[2] R1 Pipelines for multimodal content analysis & enrichment
[3] R1 Pipelines for query processing
[4] R1 Component and pipeline support services
[5] Space and time entity repository


Beside the component folders there is a Demos folder containing the demos provided by the project team. Inside it, for R1 (M12), there are two folders containing the two released demos:

• LogoDetection_POLMI_ENG

• MediaEntityAnnotation_UNITN_CERTH_LUH

Each of these folders is further structured in three sub-folders:

• docs: contains the provided documentation, such as the description of the demo, installation instructions and licence information

• src: contains the code produced. The source code is structured in Eclipse projects; each folder is a module of the demo

• dist: contains the software released
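As an illustration of the naming and folder conventions above, the following Python sketch splits a folder name into component and partner names and reports which of the three expected sub-folders are missing from a local working copy. The function names are hypothetical helpers, not project tooling; the partner short names are those listed on the cover page of this deliverable.

```python
from pathlib import Path
from typing import List, Tuple

# Partner short names as listed on the cover page of this deliverable.
PARTNERS = {"ENG", "UNITN", "TUD", "QMUL", "LUH", "POLMI", "CERTH", "NXT",
            "MICT", "ATN", "FRH", "INN", "HOM", "CVCE", "EIPCM"}
EXPECTED_SUBFOLDERS = ("docs", "src", "dist")


def split_folder_name(folder: str) -> Tuple[str, List[str]]:
    """Split e.g. 'LogoDetection_POLMI_ENG' into its component and partner names."""
    tokens = folder.split("_")
    partners: List[str] = []
    # Peel partner names off the end; whatever remains is the component name.
    while len(tokens) > 1 and tokens[-1] in PARTNERS:
        partners.insert(0, tokens.pop())
    return "_".join(tokens), partners


def missing_subfolders(demo_dir: Path) -> List[str]:
    """Return the expected sub-folders that are absent from a checked-out demo."""
    return [name for name in EXPECTED_SUBFOLDERS if not (demo_dir / name).is_dir()]
```

For example, split_folder_name("MediaEntityAnnotation_UNITN_CERTH_LUH") yields the component name "MediaEntityAnnotation" and the partners UNITN, CERTH and LUH.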

The repository is accessible via HTTPS using a registered account. Each partner has some accounts to read from and write to the repository. Specific credentials were created to allow the Release to be downloaded:

Demos repository URL: https://89.97.237.243/svn/CUBRIK/Demos

Account name: XXXXXXXXXX

Account password: XXXXXXXXXX

1.6 Deployment Environment

The two demos exploit the CUbRIK Platform services via the installation of the SMILA framework. At this time the latest release of SMILA is v.1.1, which is the one the Media Entity Annotation H-Demo relies on. However, development of the Logo Detection H-Demo started earlier, using the previous version, v.1.0. So even though the main part of the environment is common to both demos, a few parts require a different configuration.

Both H-Demos are installed on a server with the following software requirements:

OS type: 32-bit operating system
OS updates: Windows XP Service Pack 3
JDK version: 1.6 (Logo Detection), 1.7 (Media Entity Annotation)

For SMILA, the table below collects the requirements and also provides the download links.

Eclipse SDK

Logo Detection H-demo: version 3.7.1, available at:
http://archive.eclipse.org/eclipse/downloads/drops/R-3.7.1-201109091335/

Media Entity H-demo: version 4.2.0, available at:
http://www.eclipse.org/downloads/download.php?file=/eclipse/downloads/drops4/R-4.2-201206081400/eclipse-SDK-4.2-win32.zip

Eclipse Delta Pack

Logo Detection H-demo: version 3.7.1, available at:
http://archive.eclipse.org/eclipse/downloads/drops/R-3.7.1-201109091335/download.php?dropFile=eclipse-3.7.1-delta-pack.zip

Media Entity H-demo: version 4.2.0, available at:
http://download.eclipse.org/eclipse/downloads/drops4/R-4.2-201206081400/download.php?dropFile=eclipse-4.2-delta-pack.zip

SMILA source code

Logo Detection H-demo: version 1.0, available at:
http://www.eclipse.org/downloads/download.php?file=/rt/smila/releases/1.0/SMILA-1.0-core-source.zip

Media Entity H-demo: version 1.1, available at:
http://www.eclipse.org/downloads/download.php?file=/rt/smila/releases/1.1/SMILA-1.1-core-source.zip

JDK

Logo Detection H-demo: version 1.6, available at:
http://www.oracle.com/technetwork/java/javaee/downloads/java-ee-sdk-6u3-jdk-6u29-downloads-523388.html

Media Entity H-demo: version 1.7, available at:
http://www.oracle.com/technetwork/java/javase/downloads/java-se-jdk-7-download-432154.html

1.6.1 SMILA 1.0 environment installation

As mentioned, one of the main requirements for running a CUbRIK-based application is an installed SMILA environment. For the SMILA 1.0 installation, the following items have to be downloaded, properly installed and configured:

• Eclipse SDK 3.7.1

• Eclipse Delta Pack for Eclipse 3.7.1

• SMILA 1.0 source code zip pack

To install the SMILA 1.0 environment, follow the instructions below:

Download the eclipse-SDK-3.7.1-win32.zip file and extract it twice, into two different folders, one called “Eclipse SDK 3.7.1” and the other called “Eclipse SDK 3.7.1 Delta Pack”; this gives two directories such as D:\Eclipse SDK 3.7.1 and D:\Eclipse SDK 3.7.1 Delta Pack. Note that each of them must contain the content of the eclipse folder produced by unpacking the zip file, not the eclipse folder itself.

Download the eclipse-3.7.1-delta-pack.zip file, extract it and copy both the features and plugins folders (overwriting if asked) into the Eclipse SDK 3.7.1 Delta Pack folder you already created.

Create an empty folder called “H-DEMOLogoDetection”, which will be used as the workspace folder.

Download the SMILA 1.0 source code zip file and extract it into the SMILA-1.0-core-source folder.
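The first extraction step can also be scripted. The sketch below, in Python, extracts an archive into a target folder while dropping its top-level eclipse folder, so that the target holds the content rather than the folder itself. It is an illustration only: the function name is hypothetical, and it assumes the archive wraps everything in a single top-level folder named eclipse, as the note in the instructions implies.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_without_top_folder(zip_path: Path, target: Path, top: str = "eclipse") -> int:
    """Extract zip_path into target, dropping the leading `top` folder.

    Returns the number of files written.
    """
    written = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if parts and parts[0] == top:
                parts = parts[1:]  # keep the content, not the folder itself
            if not parts or info.is_dir() or ".." in parts:
                continue  # skip directory entries and unsafe paths
            destination = target.joinpath(*parts)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            written += 1
    return written


# Usage with hypothetical local paths, following the D:\ layout in the text:
# sdk_zip = Path(r"D:\downloads\eclipse-SDK-3.7.1-win32.zip")
# for folder in (r"D:\Eclipse SDK 3.7.1", r"D:\Eclipse SDK 3.7.1 Delta Pack"):
#     extract_without_top_folder(sdk_zip, Path(folder))
```

Calling the function once per target folder reproduces the "extract it twice" step; copying the features and plugins folders of the Delta Pack remains a separate step.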

1.6.2 SMILA 1.1 environment installation

Similarly to what is described in the previous paragraph, for the SMILA 1.1 installation the following items have to be downloaded, properly installed and configured:

• Eclipse SDK 4.2.0

• Eclipse Delta Pack for Eclipse 4.2.0

• SMILA 1.1 source code zip pack

Download the eclipse-SDK-4.2.0-win32.zip file and extract it twice, into two different folders, one called “Eclipse SDK 4.2.0” and the other called “Eclipse SDK 4.2.0 Delta Pack”; this gives two directories such as D:\Eclipse SDK 4.2.0 and D:\Eclipse SDK 4.2.0 Delta Pack. Note that each of them must contain the content of the eclipse folder produced by unpacking the zip file, not the eclipse folder itself.

Download the eclipse-4.2.0-delta-pack.zip file, extract it and copy both the features and plugins folders (overwriting if asked) into the Eclipse SDK 4.2.0 Delta Pack folder you already created.

Create an empty folder called “H-DemoMultimediaEntityAnnotation”, which will be used as the workspace folder.

Download the SMILA 1.1 source code zip file and extract it into the SMILA-1.1-core-source folder.


2. Pipelines of R1

Table 1: Release 1 in a nutshell, in the previous chapter, reports the details of the four pipelines belonging to the two H-Demos that are part of Release 1. Logo Detection is a comprehensive application including all three types of pipeline, whereas Media Entity Annotation has a single pipeline, which is a Content Analysis and Enrichment pipeline:

Logo Detection:

• Content Analysis and Enrichment pipeline

• Query pipeline

• Feedback acquisition and processing pipeline

Media Entity Annotation:

• Content Analysis and Enrichment pipeline

2.1 CUbRIK H-demo Logo detection

The logo detection demo is an artificial use case conceived to showcase all the main technical capabilities of CUbRIK. The goal is to receive from a user a query consisting of a brand name and to produce a report that identifies all the occurrences of logos of that brand in a given set of video files.

The CUbRIK H-demo Logo detection offers the user three different functionalities:

• Inject a new video into a collection and make it searchable

• Search for a brand logo and produce a report including all the occurrences of the brand logos in a video collection

• Search for a new brand logo not processed yet

The first is essentially an operation of content injection into the video collection, followed by analysis to make the content searchable. It is an administration function, typical of a Content Provider's domain of action.

Figure 6: Logo Detection - Content Analysis and Enrichment Pipeline

To be searchable, the video is segmented and its key-frames are indexed. Moreover, the new video is matched against the logo instances available from previous queries.

The second allows the user to detect brand logos in collections of videos through keyword-based queries. In this case, matches of the queried brand in the available collection have already been identified and indexed. Lastly, the third allows the user to perform a search even when the keyword corresponds to a non-indexed brand name. The image below shows this third case:

Figure 7: Search for non-indexed Brand name

In detail: the User provides as input a textual keyword indicating the brand (Step A); a list of available images representing that brand logo is downloaded from an external web service, e.g. Google Images, and shown to the crowd, which validates their relevance to the brand name (Step B). Once the logo images are validated, the image of the brand logo is matched against the collection of videos, so that the video frames containing occurrences of the logos of the brand are retrieved. The retrieved videos are then validated in order to refine the result list and its ranking; matches of the queried brand are added to the index (Step C).

2.1.1 Logo detection H-Demo vs Cubrik pipeline

The three main functionalities of this H-Demo, described above, are implemented via three CUbRIK Pipelines:

• Multimedia content analysis

• Query processing

• Relevance feedback evaluation

The Multimedia content analysis pipeline is responsible for video content processing. Each video provided in the Grozi dataset is first converted into a set of defined formats, then segmented, and its keyframes are extracted. Each keyframe is analyzed by extracting its SIFT descriptors; the frame image file is then stored on a data server used to collect both video frames and converted video files. At this point the extracted keyframes are matched against the searched logos through comparison of SIFT descriptors; for each frame-logo match a record is created. Depending on the score obtained by the frame-logo match, the record is either stored or forwarded for validation via crowdsourcing.

The Query processing pipeline works as follows: it is triggered by the User providing the brand name as a textual keyword. The keyword is used to collect URLs of images that are candidate brand logos. These URLs are collected through a federated search among various web image search engines (Google, Brand of the World). Once retrieved, these brand logo images are validated by the crowd via a dedicated crowd application. Validation consists in voting on the correct association between brand name and logo images. When the crowd application has collected enough feedback from the crowd, it sends the result to the last portion of the pipeline, where the logo images are collected, analyzed by extracting their SIFT descriptors, and stored on a data server for later retrieval.
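The crowd validation step can be pictured with a minimal vote-aggregation sketch in Python. The text does not specify how much feedback counts as "enough" nor how votes are combined, so the minimum number of votes and the majority rule used here are assumptions made for illustration, as are all the names.

```python
from collections import Counter
from typing import Iterable, List, Tuple

MIN_VOTES = 3          # assumption: votes needed before an image is decided
APPROVAL_RATIO = 0.5   # assumption: share of positive votes that must be exceeded


def validated_logos(votes: Iterable[Tuple[str, bool]],
                    min_votes: int = MIN_VOTES,
                    approval_ratio: float = APPROVAL_RATIO) -> List[str]:
    """Return the image URLs the crowd accepted as logos of the queried brand.

    `votes` is a sequence of (image_url, is_correct_logo) pairs.
    """
    total: Counter = Counter()
    approved: Counter = Counter()
    for url, is_logo in votes:
        total[url] += 1
        if is_logo:
            approved[url] += 1
    # An image is validated once it has enough votes and a majority of approvals.
    return sorted(url for url, count in total.items()
                  if count >= min_votes and approved[url] / count > approval_ratio)
```

Images that have not yet received enough votes are simply left out of the result, which corresponds to waiting for more feedback before passing them on to descriptor extraction.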

Relevance Feedback is executed at the end of the matching process and involves uncertain matches between video keyframes and logo images, which are sent to a specific crowd application to be validated. During validation, crowd experts check the relevance between logo image and video frame. The confirmed matches go back into the pipeline, where the events list is updated and serialized. They are then converted into index entries and stored in the Solr engine, where the automated matches with a good score were previously archived. In this way, matches from the Multimedia content analysis pipeline are enriched by exploiting the Community evaluation.

2.1.2 Data set description

The H-demo Logo detection exploits the Grozi-120 dataset [6]. It is a multimedia database of 120 grocery products. The database contains both a collection of images (representing the products as isolated objects in ideal imaging conditions) and a collection of 29 videos (taken in a shop). The matching phase is performed against this video collection.

The dataset comprises a ground truth, i.e., each video is provided with annotations about the possible occurrences of logos in each frame.

2.1.3 Architecture overview

This section provides a description of jobs and human activities implementing the pipelines according to CUbRIK pipeline structure:

Figure 8: Logo Detection H-Demo: Pipelines

Content Analysis Pipeline

Apart from the video crawler used to inject videos into the collection, the pipeline is composed of 2 jobs, which process the videos and match the already processed logos against them:

1. Job 1 groups the logical steps of Video processing and Analysis. Video processing converts the videos into several formats, segments them and extracts the keyframes. The Analysis computes the SIFT descriptors of the keyframes and then uploads videos and frames to a storage server.

2. Job 2 matches previously processed logos against the keyframes and creates the matches, if any. This is part of the Match Logo.

[6] http://grozi.calit2.net/grozi.html

Query Processing Pipeline

It is composed of 3 jobs and 1 human activity. Following the User's textual query formulation, the Pipeline harvests candidate logo images, asks the crowd which are the most representative ones, processes the selected logo images and matches them against the analysed videos:

1. Job 1: retrieves the logo URLs from a specific source, or from more than one source when a federated search is implemented.

2. Human activity: sends the collection of logo URLs to an expert Crowd for validation. The Crowd responds by sending back the validated logo images.

3. Job 2: extracts the SIFT descriptors of the logo images.

4. Job 3: relying on the video keyframe descriptors already computed by the Content Analysis Pipeline, this job finds the matches of the validated logos' SIFT descriptors among the video keyframe descriptors.

Relevance feedback

It is composed of 2 jobs and 1 human activity. After a Query Processing Pipeline run, the automated matches with a confidence level under a configured threshold are sent to an expert crowd. The aim is to collect human evaluation of logo matches for which the automatic process did not achieve a sufficient level of confidence. The validated matches are merged into the indexer server.

1. Job 1: when a match is performed, its level of confidence is calculated. Matches with a high confidence are stored in the High confidence results set for the production of the Report. Matches with a confidence level lower than the established threshold are saved into Low confidence results.

2. Human activity: Low confidence results are sent to an expert crowd, which returns its evaluation of which of them, if any, are good matches. The results are stored in Validated results.

3. Job 2: if the Crowd selected some matches, this Job merges the Validated results into the result set for the production of the Report.
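The three steps above amount to a threshold split followed by a merge, which can be sketched in Python as follows. This is a simplified illustration rather than the pipeline code: the Match record, the function names and the threshold value used in the test data are hypothetical, and the crowd's answer is modelled as a plain set of approved (keyframe, logo) pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Match:
    keyframe_id: str
    logo_id: str
    confidence: float


def split_by_confidence(matches: Iterable[Match],
                        threshold: float) -> Tuple[List[Match], List[Match]]:
    """Job 1: separate High confidence results from Low confidence results."""
    high: List[Match] = []
    low: List[Match] = []
    for match in matches:
        # Matches below the threshold go to the crowd; the rest go to the Report.
        (high if match.confidence >= threshold else low).append(match)
    return high, low


def merge_validated(high: List[Match], low: List[Match],
                    crowd_approved: Set[Tuple[str, str]]) -> List[Match]:
    """Job 2: add the crowd-validated low confidence matches to the report set."""
    validated = [m for m in low if (m.keyframe_id, m.logo_id) in crowd_approved]
    return high + validated
```

If the crowd approves nothing, merge_validated returns the High confidence results unchanged, which corresponds to Job 2 having nothing to merge.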

2.1.4 Third party library

The H-Demo relies on a third-party library for video handling:

• ffmpeg 32 bit (current link: http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-latest-win32-static.7z)

2.1.5 Components integrated in SMILA for CUBRIK R1

For the realization of this H-Demo, the following were specifically developed as components integrated in SMILA:

• Google Image Search (footnote 7) for image retrieval

• OpenCV/SIFT for similarity matching and descriptor extraction

• IDMT temporal video segmentation (e.g. shot detection, key-frame detection)

7 https://developers.google.com/image-search/


The Google Image Search API is a third-party, web-based component integrated in SMILA in order to retrieve image URLs from the Google image search engine. The query pipeline was also developed in a federated version, which performs the query on both Google Image Search and Brands of the World (footnote 8) through a specific component. OpenCV/SIFT is a third-party library integrated in SMILA in order to extract descriptors from images (e.g. key-frames of the video provided as input) and to perform similarity matching. The executable files are available in the SVN project repository:

• https://89.97.237.243/svn/CUBRIK/Demos/LogoDetection_POLMI_ENG_v0.2/dist/Indexer/

• https://89.97.237.243/svn/CUBRIK/Demos/LogoDetection_POLMI_ENG_v0.2/dist/Matcher/

The IDMT temporal video segmentation is a component provided by a partner within the consortium; it was integrated in SMILA in order to perform temporal segmentation on the video provided as input. Both the executable file and the dll files for the CUbRIK demo implementation are available in the SVN project repository at:

https://89.97.237.243/svn/CUBRIK/Demos/LogoDetection_POLMI_ENG_v0.2/dist/segmenter/bin/win32/

Please remember to download the entire content of the SVN directory; both the .exe file and the dll files have to be put in a dedicated folder named after the third-party library, such as Indexer for OpenCV/SIFT.

2.1.6 How to install the Logo detection h-demo

This section gives a detailed description of the H-Demo installation. The procedure results in the installation of the three pipelines integrated in a single search-driven application.

CUbRIK environment configuration (SMILA)

Assuming that the SMILA environment was properly installed as described in section 1.6.1, please follow the steps below to proceed with the configuration:

1. Create an empty folder that will be used as workspace folder called “H-DEMOLogoDetection”

2. Run the eclipse.exe file under the Eclipse SDK 3.7.1 folder and select the “H-DEMOLogoDetection” folder you created in step 1 as workspace

3. Check your Java installation: from the top menu, select Window -> Preferences and go to Java -> Installed JREs; if jdk6 is not listed, click on Add, select Standard VM and then Next; browse your directory and select the Java JDK 1.6 folder, click on OK and, once Eclipse has loaded the libraries, click on Finish. At this point jdk6 is listed under Java -> Installed JREs and you have to select it. Please refer to the following image:

8 http://www.brandsoftheworld.com/


1. Check the target platform:

a. From the top menu, select Window -> Preferences and go to Plug-in Development -> Target Platform; click on Add and choose Nothing: Start with an empty target definition; click Next and enter CUbRIK target platform in the Name field; then go to Add -> Installation -> Next and browse to the location of the Eclipse SDK 3.7.1 Delta Pack; select it and click on Finish. Please refer to the following image:


b. In the Target Platform view, remove the default target platform and check CUbRIK target platform (created at step a); then click on Edit, then Add, choose Directory, click Next and browse to the location of the SMILA-1.0-source folder; select the SMILA-1.0-source\SMILA.extension\eclipse\plugin folder and click on Finish

2. Delete all references to junit plugin v4: move to the Content tab and type junit in the filter field; at this point only the plugins related to junit are shown. Uncheck all references to junit plugin v4 as depicted in the image below and click on Finish.

3. At this point, if the procedure was properly performed, the following should appear in your installation:


Click on Apply and then OK

4. Import the SMILA source in the workspace: in the Package Explorer panel, right click and select Import -> Existing projects into Workspace -> Next -> Browse and select the directory where the SMILA-1.0-source folder is located; make sure that the option to copy the projects into the workspace is unchecked and click on Finish. Please remember to remove both the SMILA.application and SMILA.launch projects from the Package Explorer panel


At this point you can proceed with importing the CUbRIK pipelines, as described in the following section.

SMILA extension for CUbRIK pipeline

This section explains how to import the CUbRIK pipelines implemented for the CUbRIK Logo detection H-Demo into the SMILA environment. Please proceed as follows:

1. Download the CUbRIK LogoDetection demo source code: create a folder called CUbRIKLogoDetection_source and check out the source code using Tortoise as SVN client (we assume it is already installed on your PC; other SVN clients can be used if preferred)

a. Right click on the CUbRIKLogoDetection_source folder and choose SVN Checkout in the context menu, then enter the following SVN URL in the repository URL text box: https://89.97.237.243/svn/CUBRIK/Demos/LogoDetection_POLMI_ENG_v0.2/src/, confirm and wait for the download to complete.

2. Import the CUbRIK LogoDetection source code in the workspace: in the Package Explorer panel, right click and select Import -> Existing projects into Workspace -> Next -> Browse, select the directory where the CUbRIKLogoDetection_source folder is located and then click on Finish.

Note that the SMILA extension for the CUbRIK pipeline is now available in the workspace, but a configuration phase is required before running the CUbRIK Logo detection demo. The configuration phase consists of modifying the specified properties files and some parameters; in detail, the following need to be set up:

1. the paths related to the 3rd party library and components integrated in SMILA for CUbRIK Release 1

2. the path related to the video content processed by the crawler

3. the path of the output folder used to store some files produced during the process

Configuration of the 3rd party library and components integrated in SMILA for CUbRIK R1

This configuration is required so that the CUbRIK pipelines can properly exploit the functionalities provided by both the third-party library and the components integrated in SMILA for CUbRIK R1. Please proceed as follows:

1. In the Package Explorer view click on the SMILA.application -> configuration -> cubrikproject.service.polmi.VideoProcessing folder and have a look at the 2 configuration files, DetectorWrapper.properties and FFMpegWrapper.properties, related respectively to the IDMT temporal segmentation and the ffmpeg executable file.

2. Open the DetectorWrapper.properties file, which contains the path of the IDMT temporal segmentation executable file and the path of the output folder where the results of the video segmentation processing are stored; update both in the following way:

a. Modify the value of the property segmenterPath with the path of the TemporalVideoSegmentationInterfaceDemo.exe file

b. Modify the value of the property outFile with the path of the folder where the xml file produced by the IDMT temporal video segmentation will be stored

3. Open the FFMpegWrapper.properties file, which contains the path of the ffmpeg executable file, and modify the value of ffmpegPath with the path of ffmpeg.exe

4. In the Package Explorer view click on SMILA.application -> configuration -> cubrikproject.service.polmi.Indexing, open the SIFTIndexerWrapper.properties file and modify the value of indexerPath with the path of the OpenCV/SIFT executable file (Indexer.exe)

5. In the Package Explorer view click on SMILA.application -> configuration -> cubrikproject.service.polmi.Matching, open the SIFTMatcherWrapper.properties file and modify the value of matcherPath with the path of the OpenCV/SIFT executable file (Matcher.exe)
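Before launching SMILA, the four executable paths set in the steps above can be checked with a short script. The Python sketch below is only an illustration under stated assumptions: the file and property names are those listed in the steps, while the function names, the location passed as config_dir (the configuration folder of SMILA.application) and the simplified key=value parsing are assumptions introduced here. It does not handle the escape sequences of the full Java properties format, so paths are expected to use forward slashes.

```python
from pathlib import Path

# Properties file (relative to the configuration folder) -> property holding an executable path.
# File and property names are taken from the configuration steps; everything else is illustrative.
EXECUTABLE_PROPERTIES = {
    "cubrikproject.service.polmi.VideoProcessing/DetectorWrapper.properties": "segmenterPath",
    "cubrikproject.service.polmi.VideoProcessing/FFMpegWrapper.properties": "ffmpegPath",
    "cubrikproject.service.polmi.Indexing/SIFTIndexerWrapper.properties": "indexerPath",
    "cubrikproject.service.polmi.Matching/SIFTMatcherWrapper.properties": "matcherPath",
}


def read_properties(path):
    """Parse simple key=value lines; blank lines and comments (# or !) are skipped."""
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_problems(config_dir, expected=EXECUTABLE_PROPERTIES):
    """Return one message per properties file whose executable path is missing or invalid."""
    problems = []
    for relative_file, key in expected.items():
        props_file = Path(config_dir) / relative_file
        if not props_file.is_file():
            problems.append(f"{relative_file}: file not found")
            continue
        value = read_properties(props_file).get(key)
        if not value:
            problems.append(f"{relative_file}: property {key} is not set")
        elif not Path(value).is_file():
            problems.append(f"{relative_file}: {key} does not point to an existing file ({value})")
    return problems
```

Calling find_problems with the path of the configuration folder returns an empty list when every property points to an existing file.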

Configuration of video files path

This configuration is required to specify the folder where the video files to be crawled and processed are located. Please proceed as follows:

1. In the Package Explorer view click on the SMILA.application -> configuration -> org.eclipse.smila.jobmanager folder and open the jobs.json file

2. Search for the crawlFilesystem string inside it and modify the value of rootFolder so that it points to the folder where you put the video files.
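The same change can be scripted. The Python sketch below updates rootFolder only inside the JSON object that mentions crawlFilesystem. The layout of jobs.json is not reproduced in this document, so the function makes no assumption about nesting depth; the function name and the sample structure used to try it out are hypothetical.

```python
import json


def set_root_folder(node, new_root, inside_job=False):
    """Set every "rootFolder" found beneath an object that has "crawlFilesystem"
    as one of its string values. Returns the number of values replaced."""
    changed = 0
    if isinstance(node, dict):
        # An object is treated as the crawl job if any of its string values is "crawlFilesystem".
        inside = inside_job or "crawlFilesystem" in [v for v in node.values() if isinstance(v, str)]
        for key, value in node.items():
            if key == "rootFolder" and inside and isinstance(value, str):
                node[key] = new_root
                changed += 1
            else:
                changed += set_root_folder(value, new_root, inside)
    elif isinstance(node, list):
        for item in node:
            changed += set_root_folder(item, new_root, inside_job)
    return changed


def update_jobs_file(path, new_root):
    """Load jobs.json, update rootFolder for the crawlFilesystem job and write the file back."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    changed = set_root_folder(data, new_root)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    return changed
```

A return value of 0 from update_jobs_file means that no matching rootFolder was found, in which case the file should be edited by hand as described in step 2.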

Configuration of output files path

This configuration is required to specify the folders where the results of the processing performed by the CUbRIK pipeline will be stored. Please proceed as follows:

1. In the Package Explorer view click on the SMILA.application->configuration-> org.eclipse.smila.processing.bpel-> pipelines folder

2. Open the LogoProcessingPipeline.bpel file, search for the LogosDownloadPipelet string inside it and modify the value of outputDir so that it points to the folder where the logo image files will be saved

3. In the LogoProcessingPipeline.bpel file search for the DescriptorExtractionPipelet string and modify the value of imagesDir so that it points to the folder where the logo image files will be saved, and the value of descriptorsDir so that it points to the folder where the SIFT file will be saved

4. Open the VideoProcessingPipeline.bpel file, search for the VideoConvertionPipelet string inside it and modify the value of outputDir so that it points to the folder where the converted video files will be saved.

5. In the VideoProcessingPipeline.bpel file search for the KeyframeExtractionPipelet string and modify the value of outputDir so that it points to the folder where the extracted frame files will be saved.

6. In the VideoProcessingPipeline.bpel file search for the DescriptorExtractionPipelet string and modify the value of imagesDir so that it points to the folder where the extracted frame files will be saved, and the value of descriptorsDir so that it points to the folder where the SIFT file will be saved

How to check the SMILA extension for CUbRIK pipeline

1. Open the “Run” main menu in Eclipse and click on “Run Configurations…”

2. In the tree menu on the left of the newly opened window go to “OSGi Framework”, open it and select “SMILA”

3. If you are behind a proxy follow step 4, otherwise jump to step 5

4. In order to set the proxy please proceed as follows:

a. In the “Bundles” tab, type the word “proxy” in the filter textbox; the “cubrikproject.service.polmi.proxy” bundle will appear.

b. Select and check it. Don’t change the default value for both “Start level” and “Auto-Start”

c. In the “Arguments” tab there is a “VM arguments” section containing some settings; at the end of this section add the following lines, replacing the “your_proxy_...” values with the right ones:

-Dproxy.name=your_proxy_name
-Dproxy.port=your_proxy_port
-Dproxy.user=your_proxy_account_name
-Dproxy.pwd=your_proxy_account_password

5. Click on the “Run” button; the “Console” panel will appear showing the log output, which makes it possible to check your installation. At the bottom of the main Eclipse window you should read something like:

...

[INFO ] HTTP server started successfully on port 8080.

and just after the first lines, log info like the following will be shown:

...

[INFO ] File utils service started!

[INFO ] Indexing service started!

[INFO ] SIFTIndexerWrapper created using following setting

[INFO ] -indexerPath=C:/CUBRIKPRJ/Demos/LogoDetection/apps/Indexer/Indexer.exe

[INFO ] Logo search service started!

[INFO ] Matching service started!

[INFO ] SIFTMatcherWrapper created using following setting

[INFO ] -matcherPath=C:/CUBRIKPRJ/Demos/LogoDetection/apps/Matcher/Matcher.exe

[INFO ] Video processing service started!

[INFO ] FFMpegWrapper created using following setting

[INFO ] -ffmpegPath=C:/CUBRIKPRJ/Demos/LogoDetection/apps/ffmpeg/bin/ffmpeg.exe

[INFO ] -skipExistingFile=true

[INFO ] DetectorWrapper created using following setting

[INFO ] -segmenterPath=C:/CUBRIKPRJ/Demos/LogoDetection/apps/segmenter/bin/win32/TemporalVideoSegmentationInterfaceDemo.exe

[INFO ] -outFile=C:/CUBRIKPRJ/Demos/LogoDetection/apps/segmenter/bin/win32/temporalvideosegmentation_results.xml

[INFO ] FFMpegWrapper created using following setting

[INFO ] -ffmpegPath=C:/CUBRIKPRJ/Demos/LogoDetection/apps/ffmpeg/bin/ffmpeg.exe

...

6. If any wrapper service cannot find its third-party executable file, you will read an ERROR message like the following:

...

ERROR 120 Exception occurred while creating new instance of component Component[

name = SIFTIndexerWrapper

activate = activate

deactivate = deactivate

modified =

configuration-policy = optional

factory = null

autoenable = true

immediate = true

implementation = eu.cubrikprj.service.polmi.Indexing.wrappers.SIFTIndexerWrapper

state = Unsatisfied

properties =

serviceFactory = false

serviceInterface = [eu.cubrikprj.service.polmi.Indexing.interfaces.IndexerWrapper]

references = null

located in bundle = cubrikproject.service.polmi.Indexing_1.0.0.qualifier [17]

]

org.eclipse.smila.utils.config.ConfigurationLoadException: I need that indexerPath property in cubrikproject.service.polmi.Indexing/SIFTIndexerWrapper.properties points to an existing file

at eu.cubrikprj.service.polmi.Indexing.wrappers.SIFTIndexerWrapper.<init>(SIFTIndexerWrapper.java:50)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

…

7. In this case, check the steps of “Configuration of the 3rd party library and components integrated in SMILA for CUbRIK R1”

How to inject the content data set

In order to run the H-demo, a preliminary phase of content data set injection is required. The videos belonging to the content data-set defined in section 2.1.2 have to be put into the folder created as described in the Configuration of video files path section. At this point, please proceed as follows:

1. Open the “Run” main menu in Eclipse and click on “Run Configurations…”

2. On the left of the newly opened window go to “Java Application”, right click on it and then select “New”

3. Give a customized name to this configuration, like “H-Demo Logo Detection Pipeline manager”

4. In the “Main” tab click on the “Browse…” button and select the project “cubrikproject.launcher.LogoDetection”

5. With the “Search…” button select the “GuiPipelineConsole” class as Main class.

6. Click on Apply to save this new run configuration

7. Click on “Run” and the “Gui Pipeline Console” window will open.

8. Click on the first “Start” button at the “LogoDetection Application” line to start all jobs

9. Click on the “Crawl videos” button to start the video injection process

How to run the h-demo

In order to run the h-demo, a GUI was developed and it is available at http://localhost:8080/SMILA/index.html


Figure 9: Logo Detection H-Demo GUI

2.2 CUBRIK Media Entity Annotation H-demo

The media entity annotation horizontal demo (h-demo) demonstrates the harvesting of representative images for named entities stored in the entity repository. The goal of the first version of the h-demo (R1) is to enhance the entity repository so that it also includes multimedia content which can be used to visualize the named entities, e.g. in entity search results.

As an initial input data-set for this h-demo we used a set of famous Italian monuments with expert-generated metadata. In total, experts collected a set of ~100 monuments located in different Italian cities such as Rome and Florence. Entities related to the cities or monuments were also collected. The goal of the components and pipelets that were developed is to crawl online multimedia social networks in order to fetch multimedia content (in particular images) related to monuments and to update the records of the entity repository with the freshly retrieved multimedia content and its metadata. The related entities, relations, and attributes are also imported. In this first release, only the automated components are available, but in the following CUBRIK releases the multimedia analysis and crowdsourcing will be introduced.

2.2.1 Media Entity Annotation H-Demo vs Cubrik pipeline

In this first release of the Media Entity Annotation h-demo, the multimedia content analysis pipeline is used to fetch content (mainly images) from online services and to update the Entity records in the Entity repository. In future releases, the enhanced media analysis and crowdsourcing components (relevance feedback pipeline) will also be used in order to clean the multimedia content for better and more representative results.


2.2.2 Media Entity Annotation H-demo architecture

Figure 10: Media Entity Annotation h-demo architecture

Figure 10 presents the architecture of the h-demo and the partners involved in the implementation of the components. The aim of the h-demo is to present the processes of harvesting media content, cleaning it of noisy and redundant information and making it available for search and retrieval. The first release of the h-demo concentrates on media harvesting.

2.2.3 Multimedia crawlers and SMILA pipelets

The initial implementation of the h-demo is built around a SMILA pipeline that interconnects the different pipelets and will also integrate crowdsourcing and other modules in future releases.

2.2.4 How to install and run the Media Entity Annotation h-demo

The procedure to install the Media Entity Annotation h-demo is completely analogous to the procedure for installing the Logo Detection h-demo. After installing the SMILA environment, the required pipelets and pipelines can be downloaded via SVN from the following location:

https://89.97.237.243/svn/CUBRIK/Demos/MediaEntityAnnotation_UNITN_CERTH_LUH/

The SMILA pipeline can be called by sending the following request:

POST: http://localhost:8080/smila/pipeline/MediaEntityAnnotation/process/

body: {"query" : "San Petronio Basilica"}

The query “San Petronio Basilica” is used to retrieve the corresponding entity from the Entity Repository. A set of crawlers is then started to retrieve related images, which are connected to the entity. Finally, the Entity Repository is updated.
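The request above can also be sent from a script. The following Python sketch uses only the standard library; the URL and the body are those given above, while the function names, the Content-Type header and the assumption that the pipeline answers with a JSON document are illustrative choices not confirmed by this document.

```python
import json
import urllib.request

# Endpoint as given in this section; it requires a running local SMILA instance.
PIPELINE_URL = "http://localhost:8080/smila/pipeline/MediaEntityAnnotation/process/"


def build_request(query, url=PIPELINE_URL):
    """Build the POST request carrying the query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # header is an assumption
        method="POST",
    )


def call_pipeline(query, url=PIPELINE_URL, timeout=60):
    """Send the request and decode the response, assumed here to be JSON."""
    with urllib.request.urlopen(build_request(query, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_pipeline("San Petronio Basilica") would trigger the pipeline for the entity used in the example above, provided that SMILA is running locally.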


Figure 11: SMILA pipeline for Media Entity Annotation h-demo

Under the umbrella of the Media Entity Annotation h-demo, a set of multimedia crawlers that fetch content from online services was implemented. These are:

• Crawler for Flickr images

• Crawler for Panoramio images

• Crawler for Picasa images

• Crawlers for images shared through Twitter

• YouTube crawlers for video metadata

The following SMILA pipelets were implemented for this release of the h-demo:

• MediaCrawlerPipeletFlickr.java

• MediaCrawlerPipeletPanoramio.java

• MediaCrawlerPipeletPicasa.java

• MediaCrawlerPipeletTwitter.java

Further SMILA pipelets were implemented in order to communicate with the entity repository for searching and updating entities:

• EntitySearchPipelet.java

• EntityUpdatePipelet.java


3. Conclusion

The document provides a comprehensive overview of the H-Demos, the related pipelines and the exploited components that belong to the first release of the CUbRIK Platform. Moreover, a step-by-step guideline is provided for setting up the proper environment, covering the operating system, third-party libraries and Integrated Development Environment (IDE) for the usage of SMILA.

The summary table of what is released as R1, taken from Section 1.4.1 (Release 1 in a Nutshell), is reported below with an additional References column. The latter refers to the sections of this document where the delivery of the corresponding artefact is described.

Delivery Date: M12

Release version: 1.0

Each entry below gives the item in detail, followed after the colon by the reference of the artefact delivery.

Pipelines for multimodal content analysis & enrichment

• Logo Detection H-Demo: 2.1.6 How to install the Logo detection h-demo

• Media Entity Annotation H-Demo (space related): 2.2.4 How to install and run the Media Entity Annotation h-demo

Pipelines for query processing

• Logo Detection H-Demo (crowd source content tagging and query expansion)

Pipelines for relevance feedback

• Logo Detection H-Demo (crowd source for conflict resolution)

Component and pipeline support services

• Components belonging to Logo Detection: 2.1.4 Third party library; 2.1.5 Components integrated in SMILA for CUBRIK R1

• Components belonging to Media Entity Annotation: 2.2.3 Multimedia crawlers and SMILA pipelets

Space & Time extension

• data-set for space domain: the actual dataset is the result of the Media Entity Annotation H-Demo over a collection of images related to Italian monuments

• methodology for cleaning the data-set: 2.2.2 Methodology is the one implemented in Media Entity Annotation h-demo

CUbRIK Platform services

• Self-scaling ETL - dynamic scaling of data import: first implementation in SMILA v.1.0 (February 2012); improvement in SMILA v.1.1 (July 2012)
