ISSN 1054-6618, Pattern Recognition and Image Analysis, 2012, Vol. 22, No. 1, pp. 180–187. © Pleiades Publishing, Ltd., 2012.

REPRESENTATION, PROCESSING, ANALYSIS AND UNDERSTANDING OF IMAGES

Recognition of Dynamic Visual Images Based on Group Transformations

M. N. Favorskaya

Siberian State Aerospace University, ul. Krasnoyarskii rabochii 31, Krasnoyarsk, 660014 Russia
email: [email protected]

Abstract—A new dynamic image recognition method for image sequences based on two-level recognition of individual states and of a set of states as a whole is considered in the paper. Integrated and invariant estimations of internal automorphisms are suggested. The modified collective decision-making method for dynamic image recognition uses four types of pseudodistances in order to obtain a measure of similarity of input dynamic images with dynamic reference images, depending on the representation of dynamic features as sets of numerical parameters, sets of vectors, or sets of functions. Experimental data on dynamic image recognition based on various measures of similarity of the sample with the reference pattern, which take into account isomorphic and permissible homomorphic transformations of visual projections, are presented.

Keywords: integrated and invariant features of projections of moving objects, isomorphic and homomorphic transformations.

DOI: 10.1134/S1054661812010154

Received August 17, 2011

INTRODUCTION

Recognition of dynamic visual images extends static image recognition theory owing to the set of features that characterize a moving object at successive time instants. Observation of a dynamic object makes it possible to provide several projections for recognition (the prehistory), when it is known beforehand that they belong to the same actual object. Recognition of dynamic objects is one of the fundamental problems in surveillance systems of various purposes [1, 2].

The peculiarity is that the observed projections of the same object undergo isomorphic and/or homomorphic transformations owing to close timings. It is reasonable to use the modified collective decision-making method in this case, which makes it possible to obtain good projections of the object, particularly ones with no overlap with projections of other objects. In order to compare the shapes of visual objects, metrics based on the Hausdorff, Gromov–Hausdorff, and Fréchet distances, as well as ones based on natural pseudodistances, are widely used [3–6]. However, these metrics cannot guarantee the correctness of the obtained results in actual problems, for instance, when there is noise in video sequences, since the lower boundary of the functionals is not a minimum.

PROBLEM STATEMENT

We assume that each state of a moving object is defined by a set of features (including instantaneous speed and acceleration and their directions) and that the vector corresponding to the object's description may change at different perspectives, generating a certain trajectory in the space of features. Thus, the video object recognition problem may be defined as classification of sets of states, classification of trajectories, or both combined.

We introduce the operators p^{ZsOs}: IS^{Zs} → IS^{Os}, p^{OsPs}: IS^{Os} → IS^{Ps}, p^{PsFs}: IS^{Ps} → IS^{Fs}, and p^{FsRs}: IS^{Fs} → IS^{Rs}, which characterize the processes of generation of the set of images of normalized objects IS^{Os} from the set of reference patterns IS^{Zs}, of the set of observed objects IS^{Ps} in arbitrary projections from the set of normalized objects IS^{Os}, of the set of filtered images IS^{Fs} from the set of observed images IS^{Ps}, and of the actual images IS^{Rs} from the filtered images IS^{Fs}. The peculiarity of the considered kind of problems is that the mapping p^{PsFs}: IS^{Ps} → IS^{Fs} and the corresponding inverse mapping p^{FsPs}: IS^{Fs} → IS^{Ps} depend not only on the nuisance parameter β but also on the time t. We designate the mapping function that changes in time as p^{PsFs}(⋅)_{β, t}. The dynamic image model of the observed object belonging to the image V_j^{IS} is determined by the equation

IS_j^{rs} = p^{FsRs}(IS_j^{fs}, h),   IS_j^{fs} = (p^{PsFs}(p^{OsPs}((p^{ZsOs}(IS_j^{zs}))^γ)))_{β, t},   (1)

where γ is the parameter that takes into consideration the prehistory of the object's behavior. A simplified


model with γ = const may be considered. (However, it should be remembered that the transformation p^{OsPs} becomes much more complex owing to the concept of multilevel motion in an image sequence.) Then

IS_j^{rs} = p^{FsRs}(IS_j^{fs}, h),   IS_j^{fs} = (p^{PsFs}(p^{OsPs}(IS_j^{zs})))_{β, t}.   (2)

Such a model, which reflects the object's dynamic parameters, is significantly different from the static image model. The static image model implies that there is always a set of transformations g ∈ G for any transition β_k → β_v for which the following equation is satisfied:

g IS_k^{fs} = g(p^{ZsOs}(p^{PsFs}(IS_j^{rs})_{β_k})) = p^{ZsOs}(p^{PsFs}(IS_j^{rs})_{β_v}) = IS_v^{fs},   (3)

where IS_k^{fs}, IS_v^{fs} ∈ IS^{Fs}. The difference of the generalized model (2) is that all transitions (β_k → β_v) ∈ B_c possess property (3), while transitions t_n → t_{n+1} do not; i.e., no correspondence similar to (3) may be established between the set of transitions (t_n → t_{n+1}) ∈ t_c and the set of transformations on the set IS^{Fs} generated by them. It is this peculiarity of image sequences that makes it impossible to use ordinary recognition methods for their classification, although the problem statement does not change its formal content.

PERMISSIBLE TRANSFORMATIONS OF PROJECTIONS

We set the topological mapping ϕ of a certain neighborhood Ω_U of the identity of the group G into the region Ω_V of the Euclidean coordinate space ℜ^E, in which the group unit is mapped into the origin of coordinates. Then a system of real numbers p^1, …, p^r representing the coordinates of the point ϕ(p) ∈ ℜ^E corresponds to each point p ∈ Ω_U of the local group G.

Let G be a local Lie group and D a certain differentiable system of coordinates in the group G. We designate the coordinates of an arbitrary point p in the system D as p^i. Let

ϕ^i(p) = ϕ^i(p^1, …, p^r),   i = 1, …, r,   (4)

be a system of triply continuously differentiable functions for which

ϕ^i(e) = ϕ^i(0, …, 0) = 0.   (5)

We suppose

P_j^i = ∂ϕ^i(e)/∂p^j   (6)

and assume the determinant of the matrix P_j^i to be nonzero. Then the system of equations

p'^i = ϕ^i(p^1, …, p^r),   i = 1, …, r,   (7)

may be interpreted as a new system of coordinates D', in which the numbers p'^i are taken as the new coordinates of the point p. Apparently, the system D' is differentiable, and it is analytical if the system D and the transformation (7) are analytical. The transition from the system D to D' is called a differentiable (respectively, analytical) transformation of coordinates. The inverse transition from D' to D is a differentiable (respectively, analytical) transformation of coordinates as well.

The following statement is derived from the theorem on the invariance of transformations of Lie group coordinates proved by Pontryagin [7]: any local automorphism ϕ of the local group G is described in the system of coordinates by the equations

p'^j = ϕ^j(p^1, …, p^r),   j = 1, …, s,   (8)

with triply continuously differentiable (analytical) functions ϕ^j and a nonzero functional determinant (s is a point in the neighborhood Ω_U). The functions ϕ^j are linear if the systems of coordinates are canonical ones of the first kind. The internal automorphism of the group is written as ϕ_a(p) = a^{-1}pa, where a is a fixed element of the group G.

Brightness functions of multigradation images come within the purview of the definition of triply continuously differentiable functions (in practice, it is sufficient to have twice differentiable functions of binary masks), and the method for estimating the interframe difference makes it possible to determine the regions of object displacement and to find the "immovable region," which is the internal automorphism of the group coordinates describing the object.

It can be easily shown that the Euclidean space ℜ^E is a topological space for which the closure M̄ of a set M is determined as the set of all points belonging to M or being limiting for M. Let G' and G be two linearly connected topological groups of the space ℜ^E corresponding to projections of the same object in two neighboring frames, where the group G is simply connected and locally connected. Let f be a certain local homomorphism of the group G to the group G'. In this case, it is possible to uniquely continue the local homomorphism f to a homomorphism ϕ of the whole group G to the group G'. Continuation of the homomorphism f is understood in the sense that f and ϕ coincide in a certain vicinity W of the identity of the group G. Actually, in the vicinity W, the functions f and ϕ coincide, and, since the function f is continuous, the function ϕ is continuous everywhere. If the group G' is simply connected and locally connected, and the function f is a certain local isomorphism, the homomorphism ϕ is an isomorphism. Thus, the subgroups G, G', …, being the projections of the same object in the orthogonal plane and close owing to a small interframe shift, are either isomorphic under a constant direction and velocity of relative movement or homomorphic if the direction of motion is changed. Moreover, the found regions of projection shift are also compact subgroups, and by


their variation, it is possible to clarify the behavior of motion for separate parts of objects (regions).

If projections of several objects overlap in the time series, it is possible to follow the generation of the projection homomorphism for the background objects into a global homomorphism until the total disappearance of the projection. It is evident that the same situation takes place if the object projection appears in or disappears from the sensor's limited field of view. To recognize the situation in the scene, it is required to analyze the permissible transitions between the groups of transformations. Since 2D images have a dual nature (they map both the projection variation of the individual object and object interaction), the transitions between the groups of transformations become wider. Affine and projective groups of transformations, as well as transformations not described by a group, may be considered isomorphic. Homomorphic transitions are divided into global transformations (when projections change sharply) and local transformations (when projections overlap).
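As a quick numeric illustration of the internal automorphism ϕ_a(p) = a^{-1}pa mentioned above, the sketch below (a toy example, not code from the paper; the 2 × 2 invertible matrices are chosen arbitrarily) checks that ϕ_a fixes the group identity and preserves the group operation:

```python
# Toy check of the internal automorphism phi_a(p) = a^{-1} p a on
# invertible 2x2 matrices; a, p, q below are arbitrary examples.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[ a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det,  a[0][0] / det]]

def inner_automorphism(a, p):
    """phi_a(p) = a^{-1} p a."""
    return mat_mul(mat_mul(mat_inv(a), p), a)

a = [[2.0, 1.0], [1.0, 1.0]]
e = [[1.0, 0.0], [0.0, 1.0]]
print(inner_automorphism(a, e))  # the identity is fixed by phi_a

# phi_a preserves the group operation: phi_a(pq) = phi_a(p) phi_a(q)
p = [[1.0, 2.0], [0.0, 1.0]]
q = [[1.0, 0.0], [3.0, 1.0]]
lhs = inner_automorphism(a, mat_mul(p, q))
rhs = mat_mul(inner_automorphism(a, p), inner_automorphism(a, q))
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-9
          for i in range(2) for j in range(2)))
```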

PREHISTORY MODEL OF OBJECT MOTION

Let us accept that the dynamic and static structures are simplified graphical primitives of the real images of the objects. The integral estimations of the variation of object projections in n sequenced frames are as follows: the shape of the contour K_c of the common part of the projection between conditionally neighboring frames, represented as a set of boundary pixels or as a set of normalized vectors, and the area of the common part S_c. These parameters are calculated by using the estimations of the motion of dynamic structures obtained with the local motion vector [8]. They properly describe the isomorphic transformations of technogenic object images. The following estimations are invariant to object shape variation: the correlation function of the common parts of the projections F_cor and the structural constants c_{jk}^i of the Lie group make it possible to estimate the variability and to reveal the behavior of object motion. Analysis of the integral and invariant estimations makes it possible to find the cases where the projections overlap or objects disappear/appear in the sensor's field of view. If all four parameters K_c, S_c, F_cor, and c_{jk}^i change greatly, it means that the object track is "broken." The rules in accordance with which the groups of transformations are classified for anthropogenic objects are developed at the stage of system learning.

Assume that the motion masks MFM of the individual regions obtained at the stage of image sequence processing within a certain observation interval t_{i–n}, …, t_i, …, t_{i+n}, where [i – n, i + n] are discrete time instants, are known. Since the steps between the discrete time instants are small, it may be accepted that a certain stable isolated set of motion masks characterizes the same object, which is to be recognized, and that the change of projections occurs smoothly. We will consider that the lighting of the scene and the weather conditions do not change within the time of observation of a single object.

Then, the prehistory of object motion will be represented by multidimensional time series of the following variables:

1. Motion trajectories TRC are constructed on the basis of the motion of the object's center of gravity or according to a certain characteristic point of the object (used for video surveillance of the recognized object).

2. Changes in the object shape SHO in accordance with its motion in 3D space (used for object recognition).

3. Changes in the object shape SHI caused by interaction of objects in the scene and appearance/disappearance of the object in the sensor's view (used for recognition of actions and events in the scene).

Case 1 is elementary and does not require individual consideration. In cases 2 and 3, identical shape features of an external contour are used; however, case 3 is complicated owing to the extension of the permissible transformation groups of projections (homomorphic transformations, when object projections overlap).
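The integral estimations K_c and S_c can be sketched for binary motion masks as follows (the masks, the pixelwise intersection, and the 4-neighbourhood boundary rule are illustrative assumptions, not the paper's implementation):

```python
# Sketch: area S_c of the common part of two projections (binary masks)
# and its boundary contour K_c under a 4-neighbourhood rule.

def common_part(mask_a, mask_b):
    """Pixelwise intersection of two binary masks."""
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]

def area(mask):
    """S_c: number of pixels in the common part."""
    return sum(sum(row) for row in mask)

def contour(mask):
    """K_c: pixels of the common part having a 4-neighbour outside it."""
    h, w = len(mask), len(mask[0])
    def inside(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x]
    return sorted((y, x) for y in range(h) for x in range(w)
                  if mask[y][x] and not all(inside(y + dy, x + dx)
                  for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))))

a = [[0, 1, 1, 0],
     [0, 1, 1, 1],
     [0, 1, 1, 1]]
b = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 0]]
c = common_part(a, b)
print(area(c), len(contour(c)))
```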

The motion prehistory model of the scene M_sc includes a set of static structures SS and dynamic structures DS, as well as their interactions:

M_sc = ∫_Ω (Σ_{i=1}^{N_ss} SS_i + Σ_{j=1}^{N_ds} DS_j + Σ_{k=1}^{N_ss,ds} (SS_k^I + DS_k^I)) dt,   (9)

where Ω is the set of points of the observed image (scene); SS_i and DS_j are single static and dynamic structures, i = 1, …, N_ss, where N_ss is the number of single static structures in the scene, and j = 1, …, N_ds, where N_ds is the number of single dynamic structures in the scene; SS_k^I and DS_k^I represent static and dynamic structures in the case of overlapping, k = 1, …, N_ss,ds, where N_ss,ds is the number of overlapping structures in the scene; and t is the observation time.

Since the interframe changes in the contour shape of a moving region are rather small, we will calculate the normalized correlation function R_norm(τ) throughout k frames with a preliminary transition to polar coordinates:


R_norm(τ) = Σ_{(ρ, ϕ) ∈ STRUC} struc(ρ, ϕ) struc'(ρ – τ, ϕ – τ) / (Σ_{(ρ, ϕ) ∈ STRUC} struc²(ρ, ϕ) Σ_{(ρ, ϕ) ∈ STRUC} struc'²(ρ – τ, ϕ – τ))^{1/2},   (10)

where SS, DS, SS^I, DS^I ⊂ STRUC, and τ is the time delay between the selected frames.

We introduce threshold values of the correlation function for the same object in different frames for the case when the projections coincide with accuracy up to noise, TH_h, and for the cases when they do not coincide:

—with accuracy up to the selected group of transformations (initial hypothesis), TH_g;

—with accuracy up to a change in the transformation group due to object motion in 3D space, TH_d;

—with accuracy up to a change in the transformation group due to overlapping of several projections, TH_c;

—with accuracy up to a change in the transformation group due to disappearance or appearance of the object in the sensor's view, TH_e.

The parameter values TH_h, TH_g, and TH_d are obtained using test samples, whereas the parameters TH_c and TH_e are determined on the basis of the coordinates of the location of the moving region in the sensor's view and a sharp change in the motion mask area. Thus, we obtain the multidimensional time series of the scene (T, TRC, SHO, SHI), with each object possessing its own specific set of fixed variables (TRC^j, SHO^j, SHI^j) at each instant T = t_{i–n}, …, t_i, …, t_{i+n}.
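A sketch of the normalized correlation (10) for structure functions sampled on a flattened polar grid; the 1D circular indexing and the test signal are illustrative assumptions:

```python
# Sketch of the normalized correlation (10); struc and struc2 are
# hypothetical structure functions sampled on a flattened polar grid
# with circular indexing, so the shift by tau wraps around.
import math

def r_norm(struc, struc2, tau):
    """Normalized correlation: struc2 is evaluated at (index - tau), cf. (10)."""
    n = len(struc)
    pairs = [(struc[i], struc2[(i - tau) % n]) for i in range(n)]
    num = sum(s * t for s, t in pairs)
    den = math.sqrt(sum(s * s for s, _ in pairs) *
                    sum(t * t for _, t in pairs))
    return num / den

sig = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
lead = [sig[(i + 2) % 6] for i in range(6)]  # sig advanced by two samples
print(round(r_norm(sig, lead, 2), 6))        # exact match at tau = 2
```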

RECOGNITION OF DYNAMIC IMAGES BASED ON PREHISTORY

To analyze the object prehistory, let us modify the procedure for collective decision making in the following way. We will generate decisions S_{tl} = R(X_{tl}), with t_l = t_1, t_2, …, t_L being discrete moments of time, where S_{tl} is the decision obtained with the help of the unique algorithm R under the varying situation X_{tl}. In this case, it is known that the decisions S_{tl} are connected by isomorphic or homomorphic transformations and characterize the object's dynamics.

The function F defines the way of generalizing the individual decisions S_{t1}, S_{t2}, …, S_{tL} of the members of the collective. It is possible to choose the procedure of equitable voting for the members of the collective, under which a decision is reached by the majority of votes:

S_t = F(S_{t1}, S_{t2}, …, S_{tL}, X_t) = S_{ti},   if K_{ti} = max_{r ∈ M} K_r(X_t),   (11)

where K_r(X_t) is the number of votes obtained by decision S_{tr} under situation X_t; M is the number of possible decisions; r = 1, 2, …, M is the number of the alternative decision; and S_{ti} is the decision in favor of the ith image. Since it is forbidden to vote for the benefit of more than one decision, the following condition is true:

Σ_{r=1}^{M} K_r = L.   (12)

However, voting flexibility is reached by weighting the decision of each member of the collective and organizing the voting process with weights. For this purpose, in function (11), it is sufficient to determine

K_r = Σ_{j ∈ L_r} η_j,   (13)

where η_j ≥ 0 is the weight of the rule R_j, and L_r is the set of collective members that voted in favor of decision r. The weights of the decision rules are selected such that their sum is equal to 1 for all possible X_t. In this case, the decision of the collective of decision rules S_t is determined by the decision of that rule R_{tl} to whose region of authority B_{tl} the output image IS_t^R belongs.

Such an approach means a two-level procedure of recognition. At the first level, it is recognized whether the image IS_t^R belongs to one region of authority or another. At the second level, the decision rule whose authority is highest in the given region of authority enters into force (synthesis of the algorithm F). The decision of this rule is identified with the decision of the whole collective. In general, the two-level rule of the collective can be written as follows:

IS_t^R ∈ V_{tj},   if F(S_{t1}, S_{t2}, …, S_{tL}) = S_{ti} = V_{tj},   where t_i = argmax_{l ∈ L} η_{tl}.   (14)

In this case, the preference is given to the t_i decision rule to whose region of authority the image IS_t^R belongs, i.e., where the authority coefficient is maximal. If the decision of this rule is S_{ti} = V_{tj}, the decision of the collective is determined as S_t = S_{ti} = V_{tj}; i.e., IS_t^R ∈ V_{tj}. The basic stage of the two-level organization of collective decision making is to learn to recognize the regions of authority of the decision rules. The regions of authority are generated on the basis of probabilistic rules, the compactness hypothesis, random


subregion selection, etc. To find the regions of authority, it is possible to use self-learning algorithms, for which it is not necessary to set the number of classes. In this case, first of all, the unknown abstract images are selected. After that, for each abstract image, its own decision rule is generated. The reaction of the decision rule to any image belonging to the given abstract image is identified with the collective decision.
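The voting rules (11)–(13) can be sketched as follows (the candidate decisions and the weights, which sum to 1, are hypothetical examples):

```python
# Sketch of the collective decision rules: majority voting (11)-(12)
# and weighted voting (13); decisions and weights are hypothetical.

def majority_vote(decisions):
    """Eq. (11): the decision gathering the maximum number of votes."""
    counts = {}
    for s in decisions:
        counts[s] = counts.get(s, 0) + 1   # each member votes once, cf. (12)
    return max(counts, key=counts.get)

def weighted_vote(decisions, weights):
    """Eq. (13): K_r is the sum of weights of members voting for r."""
    scores = {}
    for s, w in zip(decisions, weights):
        scores[s] = scores.get(s, 0.0) + w
    return max(scores, key=scores.get)

decisions = ["car", "car", "person", "car", "person"]
print(majority_vote(decisions))
# weights of the decision rules sum to 1, as required in the text
weights = [0.05, 0.05, 0.4, 0.1, 0.4]
print(weighted_vote(decisions, weights))
```

Note that weighting can overturn the equitable majority when a few highly authoritative rules disagree with the rest.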

Then, we consider the classical ways of obtaining the measure of similarity and extend them to objects having morphisms. It is known that, to obtain the measure of similarity of functions in the space of features, the Hausdorff distance, the Gromov–Hausdorff distance, the Fréchet distance, and the natural pseudodistance are used. Let there be two topological spaces X^{Os} and X^{Zs} for which the epimorphic mapping p^{OsZs} ⊆ X^{Os} × X^{Zs} is fulfilled (for the sake of simplicity, the designation p will be used instead of p^{OsZs}, O instead of X^{Os}, and Z instead of X^{Zs}). The inverse mapping p^{-1} is epimorphic in this case as well. Then the classical Hausdorff distance d_H(O, Z) between two nonempty compact sets O and Z in the metric space (S, d_S) will be determined as inf_{p ∈ C} sup_{(o, z) ∈ p} d_S(o, z), where C is the set of correspondences between the elements o and z [3]. The Gromov–Hausdorff and Fréchet distances are calculated similarly, and a subset of the set of all permissible correspondences of elements is sometimes considered in the case of the Fréchet distance. It should be noted that the expression inf_p F(p), where F represents a certain functional, is sought in all three cases. Thus, if the Hausdorff distance is used, the equation F(p) = sup_{(o, z) ∈ p} d_S(o, z) is satisfied only when o ≠ z.
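For finite point sets, the infimum over correspondences in the Hausdorff distance reduces to the familiar max/min form; a minimal sketch (the example sets are arbitrary):

```python
# Sketch of the classical Hausdorff distance between two finite point
# sets O and Z in the plane; for finite sets sup/inf become max/min.
import math

def euclid(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def directed_hausdorff(src, dst):
    """sup over src of the distance to the nearest point of dst."""
    return max(min(euclid(o, z) for z in dst) for o in src)

def hausdorff(O, Z):
    return max(directed_hausdorff(O, Z), directed_hausdorff(Z, O))

O = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Z = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(hausdorff(O, Z))
```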

The concept of the metric pseudodistance is introduced for continuous topological spaces characterized by homomorphisms. If two sets O and Z having mappings ϕ: O → ℜ and ψ: Z → ℜ may be considered as a set Hom(O, Z) of all homomorphisms between O and Z, then the extended pseudodistance is obtained as inf_{h ∈ Hom(O, Z)} max_{o ∈ O} |ϕ(o) – ψ(h(o))| or +∞, depending on whether the elements of the sets O and Z are homomorphic [5].

We simplify the expressions presented above, assuming that there is a homomorphism between the elements of the topological spaces O and Z, and demonstrate that the metric spaces are compact in this case.
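For finite objects, the infimum over Hom(O, Z) can be computed by brute force; the sketch below assumes, purely for illustration, that the admissible homomorphisms are the bijections between two equal-sized finite sets with measuring functions ϕ and ψ:

```python
# Sketch of the extended pseudodistance for finite objects:
# inf over morphisms of max_o |phi(o) - psi(h(o))|, and +inf when the
# morphism set is empty. The morphism set is hypothetically taken as
# all bijections between two equal-sized finite sets.
from itertools import permutations

def natural_pseudodistance(O, Z, phi, psi):
    if len(O) != len(Z):          # no bijective morphisms exist
        return float("inf")
    best = float("inf")
    for perm in permutations(Z):  # each h: O -> Z is a bijection
        h = dict(zip(O, perm))
        cost = max(abs(phi[o] - psi[h[o]]) for o in O)
        best = min(best, cost)
    return best

O, Z = ("o1", "o2", "o3"), ("z1", "z2", "z3")
phi = {"o1": 0.0, "o2": 1.0, "o3": 2.0}   # measuring function on O
psi = {"z1": 0.2, "z2": 2.1, "z3": 1.0}   # measuring function on Z
print(natural_pseudodistance(O, Z, phi, psi))
```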

Let C be a certain category with nonempty sets of objects Obj(C) and morphisms Mor(C) for which the following conditions are satisfied:

1. All the objects belong to topological spaces.

2. Each set (possibly an empty one) of morphisms Mor(O, Z) between O and Z is a subset of the set of correspondences that includes all the possible homomorphisms of O into Z.

3. If p ∈ Mor(O, Z), then p^{-1} ∈ Mor(Z, O).

We transform (O, Z) into the set Obj(C) × Obj(C). We introduce the family of functionals F(O, Z): Mor(O, Z) → ℜ possessing the following properties:

1. For each p ∈ Mor(O, Z), the inequality F(O, Z)(p) ≥ 0 is satisfied.

2. If iz_O is an isomorphism of O, then F(O, O)(iz_O) = 0.

3. For each p ∈ Mor(O, Z), the equation F(O, Z)(p) = F(Z, O)(p^{-1}) is satisfied.

4. If p ∈ Mor(O, Z) and q ∈ Mor(Z, W), then the condition F(O, W)(q ∘ p) ≤ F(O, Z)(p) + F(Z, W)(q) is satisfied.

We determine the extended pseudodistance δ(O, Z) on the set Obj(C) based on the family of functionals F(O, Z) as follows:

δ(O, Z) = inf_{p ∈ Mor(O, Z)} F(O, Z)(p) for Mor(O, Z) ≠ ∅, and δ(O, Z) = +∞ for Mor(O, Z) = ∅.   (15)

The pseudodistance turns into an ordinary distance in the specific case when the following axiom is satisfied: d(O, Z) = 0 ⇔ O = Z.

We redefine the distances considered above with (15) taken into account:

—Hausdorff pseudodistance. The category C includes objects belonging to nonempty compact subsets of the space (S, d_S). Then we establish the functional F(O, Z)(p) = sup_{(o, z) ∈ p} d_S(o, z) for each pair (O, Z) ∈ Obj(C) × Obj(C) and p ∈ Mor(O, Z).

—Gromov–Hausdorff pseudodistance. The category C includes objects belonging to the set of nonempty compact metric spaces (W, d_W). Then we introduce the functional F(O, Z)(p) = inf_{(W, d_W), f, g} sup_{(o, z) ∈ p} d_W(f(o), g(z)), where the functions f(o) and g(z) describe all the possible mappings of the objects of the sets O and Z into the metric spaces W, for each pair (O, Z) ∈ Obj(C) × Obj(C) and p ∈ Mor(O, Z).

—Fréchet pseudodistance. The category C includes objects represented by curves γ: [0, 1] → ℜ^n. Morphisms of two curves γ1 and γ2 will be written as (γ1(α(t)), γ2(β(t))), where t ∈ [0, 1], and α, β: [0, 1] → [0, 1] are two nondecreasing continuous functions. Then F(O, Z)(p) = sup_{(o, z) ∈ p} ||o – z||.

—Metric (natural) pseudodistance. The category C includes the objects described by continuous functions ϕ: O → ℜ, where O is the covering of all varieties. Then, morphisms of two functions ϕ: O → ℜ and ψ: Z → ℜ provide the homomorphism h of the objects from O into Z, and the functional F(O, Z)(h) is as follows: F(O, Z)(h) = max_{o ∈ O} |ϕ(o) – ψ(h(o))|.
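For polygonal curves, the Fréchet pseudodistance is commonly approximated by its discrete (coupling) variant; a sketch with arbitrarily chosen example curves:

```python
# Sketch of the discrete Frechet (coupling) distance between two
# polygonal curves P and Q, a standard approximation of the Frechet
# pseudodistance for sampled curves.
from functools import lru_cache
import math

def discrete_frechet(P, Q):
    def d(i, j):
        return math.hypot(P[i][0] - Q[j][0], P[i][1] - Q[j][1])
    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j): coupling cost over the prefixes P[:i+1], Q[:j+1]
        if i == 0 and j == 0:
            return d(0, 0)
        if i == 0:
            return max(c(0, j - 1), d(0, j))
        if j == 0:
            return max(c(i - 1, 0), d(i, 0))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d(i, j))
    return c(len(P) - 1, len(Q) - 1)

P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(P, Q))  # two parallel segments at distance 1
```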


Thus, four types of pseudodistances may be used in order to obtain the measure of similarity of input dynamic images with dynamic reference images, depending on the representation of the dynamic feature as a set of numerical parameters, a set of vectors, or a set of functions. Comparison of the input and the reference images occurs with the permissible morphological transformations, which are observed at a certain time interval, taken into account. In this case, unsuccessful observations of input images (when object projections overlap, the scene is disturbed by light sources, etc.) may be rejected, and the most suitable observations may be selected.

EXPERIMENTAL RESULTS

To find the measure of function similarity in the feature space, we use the extended Hausdorff, Gromov–Hausdorff, and Fréchet pseudodistances, and also the natural pseudodistance under the assumption that there is a homomorphism between the elements of the topological spaces of objects O and patterns Z [3–6]. During the investigation, the functionals for all kinds of pseudodistances were found. The Hausdorff pseudodistance is calculated for the objects belonging to nonempty compact subsets of the space (S, d_S), the Gromov–Hausdorff pseudodistance is calculated for the objects belonging to the set of nonempty compact metric spaces (W, d_W), the Fréchet pseudodistance is calculated for the objects that are represented by curves, and the natural pseudodistance is calculated for the objects that are described by continuous functions.

In the experiments, we used test image sequences (Hamburg taxi, Rubik's cube, Silent, and video and infrared sequences taken from the test base OTCBVS). For learning and recognition, we used the integral normalized estimations of the shape of the contour K_c of the common part of the object projection between conditionally neighboring frames and of the area of the common part S_c. The correlation function of the common parts of the projections F_cor was used as the invariant estimation. Table 1 depicts the averaged results of recognition of the technogenic objects (cars) taken from the video sequence Hamburg taxi for a series of ten conditionally neighboring frames on the basis of the Hausdorff and Gromov–Hausdorff pseudodistances, respectively. As is seen from the presented data, the recognition according to the Gromov–Hausdorff pseudodistance, which considers both the top and the bottom exact boundaries, is more accurate. The noise present in the video sequence influences the segmentation accuracy and, respectively, the recognition accuracy in the separate frames. In particular, owing to glares, the Object 3 image was not segmented absolutely clearly, which resulted in worse recognition.

Table 2 depicts the averaged results of recognition of the anthropogenic objects (people) taken from the video sequence Sequence 1b (test base OTCBVS) for a series of 20 conditionally neighboring frames on the basis of the Fréchet pseudodistance and the natural pseudodistance. In this case, we use the estimations of the contour shape K_c and the correlation function of the common parts of the projections F_cor, since the objects are characterized by high variability of the regions. From the presented data, it follows that the best results are obtained if the natural pseudodistance is used. It considers the visual projection morphisms more accurately. Worse results of recognition relative to the technogenic objects were obtained since in some frames several people moved in the same direction, which was treated by the recognizing system as an integrated object with a "break" of the separate object tracking.

Table 1. Results of recognition of the objects taken from the video sequence Hamburg taxi

Object      Hausdorff pseudodistance               Gromov–Hausdorff pseudodistance
            accuracy, %   false recognition, %     accuracy, %   false recognition, %
Object 1    95.34         2.01                     96.98         1.96
Object 2    95.01         2.39                     95.39         2.30
Object 3    94.12         2.61                     94.87         2.13
Object 4    95.22         2.32                     95.80         2.21

Table 2. Results of recognition of the objects taken from the video sequence Sequence 1b

Object      Fréchet pseudodistance                 Natural pseudodistance
            accuracy, %   false recognition, %     accuracy, %   false recognition, %
Object 1    93.02         2.69                     93.87         2.65
Object 2    93.41         2.88                     93.64         2.71
Object 3    93.08         2.70                     93.17         2.83
Object 4    87.62         3.95                     87.81         3.62
Object 5    86.98         4.28                     87.02         3.96
Object 6    93.25         2.85                     93.38         2.79

Data for Object 4 and Object 5 are significantly different from the remaining data, since in this case people moved close to each other for a certain period of time, and they were segmented as an integrated object. From here, it is possible to conclude that the segmentation of real video sequences greatly influences the results of recognition. But if the modified method for

collective decision making, which considers the resultsof recognition of n sequence frames, is used, it is possible to increase the recognition accuracy additionallyby 2.4–2.9%, on average.
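The paper does not give the modified collective decision-making procedure in code. One plausible minimal sketch, assuming each of the n frames yields a (label, confidence, validity) triple (all names and the triple format are assumptions for illustration), rejects unsuccessful observations and takes a confidence-weighted vote over the remaining frames:

```python
from collections import Counter

def collective_decision(frame_results, min_conf=0.5):
    """Aggregate per-frame recognition results into one decision.

    frame_results: iterable of (label, confidence, valid) triples, one
    per frame; `valid` is False for unsuccessful observations (e.g.,
    overlapping projections or glare). Low-confidence and invalid
    frames are discarded, then the label with the largest summed
    confidence wins. Returns None if every frame was rejected.
    """
    weights = Counter()
    for label, conf, valid in frame_results:
        if valid and conf >= min_conf:  # reject unsuccessful observations
            weights[label] += conf
    if not weights:
        return None
    return weights.most_common(1)[0][0]
```

With results such as [("car", 0.9, True), ("car", 0.8, True), ("bus", 0.95, True), ("car", 0.7, False)], the occluded third "car" frame is ignored, yet "car" still wins on accumulated confidence (1.7 vs. 0.95).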

The experimental estimations of the recognition accuracy and the number of computer operations (including processing and output of the results to the screen) for three methods, i.e., the contour template [9] and 3D gradient [10] methods, as well as the developed method based on segmentation with respect to motion, are presented in Fig. 1 for the video sequence Sequence 6b (OTCBVS test base). It can be seen from the constructed graphs that the accuracy of the proposed method is comparable to that of the 3D gradient method, but the required computational effort is lower.

CONCLUSIONS

The investigations carried out make it possible to find integral and invariant features for moving-object projections that have undergone isomorphic and homomorphic transformations over a given period of time.

Four kinds of pseudodistances are suitable for finding the measure of similarity between input dynamic images and dynamic reference images (depending on the dynamic feature representation: sets of numerical parameters, vectors, or functions). The input and pattern samples were compared by considering permissible morphological transformations taking place within a certain time interval.

If the modified method for collective decision making is used, it becomes possible to reject unsuccessful observations of the input images (cases where the object projections overlap, the scene is disturbed by light sources, etc.) and to choose the most suitable observations. The modified procedure for collective decision making increases the recognition accuracy by 2.4–2.9%, on average.

REFERENCES

1. S. Ali and M. Shah, "A Lagrangian Particle Dynamics Approach for Crowd Flow Segmentation and Stability Analysis," in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (Minneapolis, 2007), pp. 1–6.

2. D. Weinland, R. Ronfard, and E. Boyer, "Free Viewpoint Action Recognition Using Motion History Volumes," Comp. Vision Image Understand. 104 (2), 249–257 (2006).

3. D. Burago and S. Ivanov, "Boundary Rigidity and Filling Volume Minimality of Metrics Close to a Flat One," Ann. Math. 171 (2), 1183–1211 (2010).

4. F. Memoli, "On the Use of Gromov–Hausdorff Distances for Shape Comparison," in Proc. Eurographics

Symp. on Point-Based Graphics (Prague, 2007), pp. 81–90.

Fig. 1. Experimental results of comparison of the contour template method, the 3D gradient method, and the developed segmentation method with respect to motion, for frames 750–786 (Tests 1–3): (a) recognition accuracy, % (missed data indicate the lack of the observed object in the video sequence); (b) required computer operations, mln.

5. P. Donatini and, P. Frosini, “Natural Pseudodistancesbetween Closed Curves,” Forum Math., No. 6, 981–999 (2009).

6. L. Pestov and G. Uhlmann, “Two–Dimensional Compact Simple Riemannian Manifolds Are Boundary Distance Rigid,” Ann. Math., No. 2, 1093–1110 (2005).

7. L. S. Pontryagin, Continuous Groups (Nauka, Moscow,1984) [in Russian].

8. M. N. Favorskaya, “The Way to Estimate Objects’Motion in Complicated Scenes on the Base of TensorApproach,” in Digital Signals Processing (Moscow,2010), No. 1, pp. 2–9.

9. S. Lee, I. D. Yun, and S. U. Lee, “Robust Bilayer VideoSegmentation by Adaptive Propagation of GlobalShape and Local Appearance,” J. Vision Commun.Image R, No. 21, 665–676 (2010).

10. A. Klaser, M. Marsza ek, and C. Schmid, “A SpatioTemporal Descriptor Based on 3DGradients,” inProc. British Machine Vision Conf. (London, 2004),pp. 995–1004 (2008).


Margarita Nikolaevna Favorskaya. Born 1958. Graduated from Rybinsk Aviation Technology Institute in 1980. Received candidate's degree in 1985. Postdoctoral researcher at Reshetnev Siberian State Aerospace University. Scientific interests: digital image and video sequence processing, pattern recognition, fractal image processing, artificial intelligence, information technologies, remote methods for natural resource monitoring. Author of one monograph and 26 articles.