Costantino Grana, Roberto Vezzani, Rita Cucchiara

http://imagelab.ing.unimo.it

Università degli Studi di Modena e Reggio Emilia, Dipartimento di Ingegneria dell’Informazione

Prototypes selection with context based intra-class clustering for video annotation with MPEG-7 features

DELOS Conference, Tirrenia (PI), Italy, 13-14 February 2007


INTRODUCTION

A system for the semi-automatic annotation of videos by means of Pictorially Enriched Ontologies is presented. These are ontologies for context-based video digital libraries, enriched by pictorial concepts for video annotation, summarization and similarity-based retrieval. The extraction of pictorial concepts through video clip clustering, the storage of the ontology in MPEG-7, and the use of the ontology for the annotation of stored videos are described.

[Figure: processing pipeline. Steps shown: video shot and sub-shot detection, feature extraction, classification with ontology, saving of results in MPEG-7. Example label: camera car]



SIMILARITY OF VIDEO CLIPS

1. The color histogram, in 256 bins of HSV color space (Scalable Color Descriptor).

2. The 64 spatial color distributions: to account for the spatial distribution of the colors, an 8x8 grid is superimposed on the frame and the mean YCbCr color is computed for each area (Color Layout Descriptor).

3. The four main motion vectors: they are computed as the median of the MPEG motion vectors extracted in each quarter of the frame. The median value has been adopted since MPEG motion vectors are not always reliable and are often affected by noise.
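Items 2 and 3 can be illustrated with a minimal sketch. It assumes a decoded frame given as rows of (Y, Cb, Cr) tuples and motion vectors given as (x, y, dx, dy) tuples; the function names, the data layout, the component-wise median and the split into four quadrants are assumptions made for illustration. It does not reproduce the full MPEG-7 descriptor extraction.

```python
from statistics import median


def grid_means(frame, rows=8, cols=8):
    """Mean colour of each cell of a rows x cols grid laid over the frame.

    frame: list of pixel rows, each pixel a (Y, Cb, Cr) tuple (assumed layout).
    Returns rows * cols mean colours in row-major order.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(tuple(sum(p[ch] for p in cell) / len(cell)
                               for ch in range(3)))
    return means


def quadrant_motion(vectors, width, height):
    """Component-wise median motion vector in each quarter of the frame.

    vectors: list of (x, y, dx, dy), a block position and its motion vector.
    Returns four (dx, dy) pairs: top-left, top-right, bottom-left, bottom-right.
    """
    quarters = [[], [], [], []]
    for x, y, dx, dy in vectors:
        index = (1 if x >= width / 2 else 0) + (2 if y >= height / 2 else 0)
        quarters[index].append((dx, dy))
    result = []
    for q in quarters:
        if q:
            result.append((median(v[0] for v in q), median(v[1] for v in q)))
        else:
            result.append((0, 0))  # assumption: an empty quarter counts as no motion
    return result
```

With two vectors of (1, 0) and one outlier of (50, -40) in the same quarter, the median stays at (1, 0), whereas the mean would be pulled towards the outlier.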



SIMILARITY OF VIDEO CLIPS

• Generalization of image similarity

• Usually single key frame per shot, but variation may be too large.

• Our approach:

1. M representative frames uniformly selected from the shot

2. Remove the worst match to keep only reliable correspondences

$$d(S_u, S_v) = \frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{N} k_j \left| F_j^{u_i} - F_j^{v_i} \right| = \frac{1}{M}\sum_{i=1}^{M} k^{T} \left| V^{u_i} - V^{v_i} \right|$$
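The clip dissimilarity can be sketched as follows. This is a minimal illustration, assuming each clip is already reduced to M frames, each frame to N feature vectors, and that features are compared with an L1 distance; the function names, the sampling rule and the normalization by the number of remaining frames after dropping the worst match are assumptions, not the authors' implementation.

```python
def sample_indices(length, m):
    """Pick m frame indices spread uniformly over a shot of `length` frames."""
    return [int((i + 0.5) * length / m) for i in range(m)]


def frame_distance(va, vb, k):
    """Weighted sum of per-feature L1 distances between two frames.

    va, vb: lists of N feature vectors; k: list of N weights.
    """
    return sum(kj * sum(abs(a - b) for a, b in zip(fa, fb))
               for kj, fa, fb in zip(k, va, vb))


def clip_dissimilarity(clip_u, clip_v, k, drop_worst=True):
    """Average frame distance over corresponding representative frames.

    clip_u, clip_v: lists of M frames, each a list of N feature vectors.
    drop_worst: discard the largest frame distance before averaging
    (assumed reading of "remove the worst match").
    """
    per_frame = [frame_distance(vu, vv, k) for vu, vv in zip(clip_u, clip_v)]
    if drop_worst and len(per_frame) > 1:
        per_frame.remove(max(per_frame))
    return sum(per_frame) / len(per_frame)
```

With `drop_worst=False` the function reduces to the plain average over all M frame pairs.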



PICTORIALLY ENRICHED ONTOLOGY CREATION

1. After the definition of the textual domain ontology, a pictorially enriched ontology requires the selection of the prototypal clips that can constitute pictorial concepts as specializations of each ontology category. A large training set of clips must be defined for each category, and an automatic process extracts some visual prototypes for every category.

2. Using the previously defined features and dissimilarity function, we employ a hierarchical clustering method, based on Complete Link.

3. This technique guarantees that each clip is similar to every other clip in its cluster, and that any clip outside the cluster has a dissimilarity greater than the maximum distance between cluster elements.

4. For this clustering method the dissimilarity between two clusters is defined as

$$D(W_i, W_j) = \max_{S_x \in W_i,\; S_y \in W_j} d(S_x, S_y)$$
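A minimal sketch of hierarchical clustering with the Complete Link rule is given below. It assumes the pairwise clip dissimilarities are available as a symmetric matrix; the function name and the choice to return the partition at every level are illustrative, and the quadratic search for the closest pair is kept for readability rather than speed.

```python
def complete_link_levels(dist):
    """Agglomerative clustering with the Complete Link rule.

    dist: symmetric matrix (list of lists) of clip dissimilarities.
    Returns the partition at each level, from singletons to a single cluster.
    """
    def cluster_distance(a, b):
        # Complete Link: the largest dissimilarity between members of a and b.
        return max(dist[x][y] for x in a for y in b)

    clusters = [[i] for i in range(len(dist))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)  # merge the two closest clusters
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels
```

For four clips placed at 0, 1, 10 and 11 on a line, the first two merges group the two nearby pairs before the final merge joins everything.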



Automatic clustering level selection

• A rule has to be based on cluster topology concerns: a trade-off between data representation and a small number of clusters.

• It is not possible to choose the right level in general: even if we can define an objective function, real data may fit that definition badly.

• An example is provided by our experience with Dunn’s Separation Index, which was conceived for this particular clustering approach.

• Better results (in terms of a subjective evaluation) have been obtained with the following approach: start from the definition of diameter and delta distance

$$\Delta(W_i) = D(W_i, W_i), \qquad \delta(W_i, W_j) = \min_{S_x \in W_i,\; S_y \in W_j} d(S_x, S_y)$$

• We can obtain from these the corresponding maximum and minimum at level n.

$$\Delta_n = \max_{W_i \in E_n} \Delta(W_i), \qquad \delta_n = \min_{W_i, W_j \in E_n,\; i \neq j} \delta(W_i, W_j)$$
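The diameter, the delta distance and their extremes at one clustering level can be sketched as below. The sketch assumes a symmetric dissimilarity matrix and a partition given as lists of clip indices; the function names are illustrative, and returning infinity for the delta of a single-cluster partition is an assumption made so the function is always defined.

```python
def diameter(cluster, dist):
    """Largest dissimilarity between two clips of the same cluster."""
    return max(dist[x][y] for x in cluster for y in cluster)


def delta(a, b, dist):
    """Smallest dissimilarity between a clip of a and a clip of b."""
    return min(dist[x][y] for x in a for y in b)


def level_extremes(partition, dist):
    """Maximum diameter and minimum delta distance for one partition."""
    max_diameter = max(diameter(c, dist) for c in partition)
    if len(partition) < 2:
        # Assumption: with a single cluster there is no pair to compare.
        return max_diameter, float("inf")
    min_delta = min(delta(a, b, dist)
                    for i, a in enumerate(partition)
                    for b in partition[i + 1:])
    return max_diameter, min_delta
```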



Automatic clustering level selection

• We define the Clustering Score at level n as:

$$CS_n = \min\left(\Delta_1 - \Delta_n,\; \delta_n\right)$$

• The selected level is the one which maximizes the clustering score. (Note that the paper says “minimize”… sigh)

• Note that both the maximum diameter and the minimum delta distance between clusters are monotonically increasing, so we can stop the clustering when a local maximum is reached (leading to a computational improvement).
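The level selection with early stopping can be sketched as follows. The sketch assumes that the score has the form min(Δ_1 − Δ_n, δ_n), that Δ_1 denotes the diameter of the whole set of clips (the single-cluster level), and that levels are visited in merge order, so that the maximum diameters and minimum delta distances are non-decreasing. The function name and argument layout are illustrative.

```python
def select_level(max_diameters, min_deltas, full_diameter):
    """Return (level index, score) of the level with the highest score.

    max_diameters, min_deltas: per-level values in merge order, both assumed
    non-decreasing. full_diameter: diameter of the whole set (assumed to be
    the reference diameter in the score).
    """
    best_level, best_score = None, float("-inf")
    for level, (diam, dlt) in enumerate(zip(max_diameters, min_deltas)):
        score = min(full_diameter - diam, dlt)
        if score > best_score:
            best_level, best_score = level, score
        elif score < best_score:
            # One term only shrinks and the other only grows, so once the
            # score drops it cannot rise again: stop clustering here.
            break
    return best_level, best_score
```

For four clips at 0, 1, 10 and 11 on a line, the per-level values are diameters 0, 1, 1, 11 and delta distances 1, 1, 9, infinity, and the selected level is the one with two clusters.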



Intra-class clustering with context data

• The presented choice of prototypes is guided by how similar the original clips are in the feature space, without considering the elements belonging to the other classes (context data).

• This may lead to a prototype selection which is indeed representative of the class but lacks the properties useful for discrimination purposes.

[Figure: scatter plot of example clips in a two-dimensional feature space, with points A, B, C and D marked]




• We define an isolation coefficient for each clip as

$$\gamma(S_u) = \sum_{\substack{i=1 \\ i \neq j}}^{L} \; \sum_{S_v \in C_i} \frac{1}{d(S_u, S_v)}, \qquad S_u \in C_j$$

• Then we can introduce a class-based dissimilarity measure between two clips as:

$$\tilde{d}(S_u, S_v) = d(S_u, S_v)\, \gamma(S_u)\, \gamma(S_v)$$

• Even if the central points (B, C) are closer to each other than to the corresponding colored ones (A and D respectively), the interposed purple distribution largely increases their dissimilarity measure, preventing their merge into a single cluster.
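The effect of the context data can be sketched with a small example. It assumes the isolation coefficient of a clip is the sum of inverse dissimilarities to all clips of other classes, and that the class-based measure multiplies the plain dissimilarity by the coefficients of both clips. The function names and the one-dimensional toy data are illustrative, and clips of different classes are assumed never to have zero dissimilarity.

```python
def isolation(u, labels, dist):
    """Sum of inverse dissimilarities from clip u to clips of other classes."""
    return sum(1.0 / dist[u][v]
               for v in range(len(labels)) if labels[v] != labels[u])


def class_based_dissimilarity(u, v, labels, dist):
    """Plain dissimilarity scaled by the isolation coefficients of both clips."""
    return dist[u][v] * isolation(u, labels, dist) * isolation(v, labels, dist)


# Toy data: A, B, C, D belong to one class; three context clips of another
# class sit between B and C.
points = [-8.0, -2.0, 2.0, 8.0, -0.5, 0.0, 0.5]
labels = ["x", "x", "x", "x", "p", "p", "p"]
dist = [[abs(a - b) for b in points] for a in points]
A, B, C, D = 0, 1, 2, 3

plain_ab, plain_bc = dist[A][B], dist[B][C]
ctx_ab = class_based_dissimilarity(A, B, labels, dist)
ctx_bc = class_based_dissimilarity(B, C, labels, dist)
```

In this toy case B and C are closer than A and B in plain distance, but the context clips between them make their class-based dissimilarity the larger of the two.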




                 AB      BC      CD
distance         6.9     3.7     7.7
dissimilarity    849     2135    1050

[Figure: scatter plot of the example clips, with points A, B, C and D marked]



ONTOLOGIES IN MPEG-7

Ontologies may be effectively defined with OWL, but this language does not contain any construct for including a pictorial representation. On the other hand, such a feature is present in the MPEG-7 standard. MPEG-7 has much less sophisticated tools for knowledge representation, since its purpose of standardization limits the definition of new data types, concepts and complex structures. Nevertheless, the MPEG-7 standard can naturally include pictorial elements such as objects, key-frames, clips and visual descriptors in the ontology description.

Therefore, our system stores the pict-en ontology following the directions of the MPEG-7 standard, and in particular uses a double description provided by the ClassificationSchemeDescriptionType DS combined with a ModelDescriptionType DS, which includes a CollectionModelType DS.

ClassificationSchemeDescriptionType DS

<Description xsi:type="ClassificationSchemeDescriptionType">
  <ClassificationScheme uri="urn:mpeg:mpeg7:cs:OntologiaMpeg7">
    <Term termID="CameraCar"/>
    <Term termID="External car view "/>
    <Term termID="Spectators"/>
    <Term termID="People"/>
  </ClassificationScheme>
</Description>

ModelDescriptionType DS

<Description xsi:type="ModelDescriptionType">
  <Model xsi:type="CollectionModelType">
    <Label href="urn:mpeg:mpeg7:cs:OntologiaMpeg7:CameraCar"/>
    <Collection xsi:type="ContentCollectionType" id="prototype0">
      ……………………………
    <Collection xsi:type="ContentCollectionType" id="prototype0">
      ……………………………
  </Model>
</Description>



Example results

                          Ski            Bob            F1
# Training set            300            300            500
# Test set                912            1122           1839

# of Visual Concepts
  NN                      300            300            500
  CL                      84             126            191
  CBCL                    78             122            203

Results on training set
  NN                      300 (100%)     300 (100%)     500 (100%)
  CL                      299 (99.7%)    292 (97.3%)    478 (95.6%)
  CBCL                    300 (100%)     285 (95%)      492 (98.4%)

Results on test set
  NN                      660 (72.4%)    854 (75.8%)    1181 (64.2%)
  CL                      654 (71.7%)    846 (75.1%)    1159 (63.0%)
  CBCL                    657 (72%)      852 (75.7%)    1209 (65.7%)



Semi-automatic annotation analysis (annotate and correct)

[Figure: Automatic Classification Score (axis from 0.6 to 1) and Total Annotation Time [s] (axis from 0 to 1800) plotted against % Manually annotated clips (0 to 80)]



Screenshot of the classification scheme manager window of the semi-automatic annotation framework, showing the classification scheme together with the selected prototypes.



Conclusions and future work

• We presented a system for the creation of a specific domain ontology, enriched with visual features and references to multimedia objects.

• The ontology is stored in MPEG-7 compliant format, and can be used to annotate new videos.

• This approach allows a system to behave differently by simply providing a different ontology, thus expanding its applicability to mixed-source digital libraries.

• We are working on the extension of the similarity measures from key frames to clip matching, by means of the Mallows distance (a linear programming based distance).

• Other features are being considered, and extensive tests are being performed, in order to assess the scalability of the proposed approach.
