ARTICLE IN PRESSJID: PATREC [m5G;October 16, 2014;16:37]
Pattern Recognition Letters 000 (2014) 1–7
Contents lists available at ScienceDirect
Pattern Recognition Letters
journal homepage: www.elsevier.com/locate/patrec
A robust perception based method for iris tracking ✩
Vittoria Bruni a,∗, Domenico Vitulano b
a Department of Scienze di Base e Applicate per l'Ingegneria, University of Rome La Sapienza, Via A. Scarpa 16, Rome 00161, Italy
b Istituto per le Applicazioni del Calcolo M. Picone—C.N.R., Via dei Taurini 19, Rome 00185, Italy
a r t i c l e i n f o
Article history:
Received 8 May 2014
Available online xxx
Keywords:
Iris tracking
Human visual system
Jensen–Shannon divergence
a b s t r a c t
The paper presents an application of the modified kernel based object tracking to iris tracking. Human
perception rules are used for defining a proper feature space for iris that mainly accounts for the fact that
eyes represent instinctive visual fixation regions. In addition, a similarity metric that is close to the way human
vision compares and perceives the difference between scene components has been employed for finding the
exact iris location in subsequent frames. As a result, just one iteration of the mean shift algorithm allows us
to get a faithful estimation of iris location in subsequent frames. This property makes the proposed algorithm
implementable on mobile devices and useful for real time applications. Experimental results performed on
the MICHE database show that the proposed method is robust to changes in illumination or scale, to partial
or total occlusion of the iris for some subsequent frames, and to blinks.
© 2014 Elsevier B.V. All rights reserved.
✩ This paper has been recommended for acceptance by Michele Nappi.
∗ Corresponding author. Tel.: +39 06 49766648; fax: +39 06 4957647.
E-mail address: [email protected], [email protected] (V. Bruni).
http://dx.doi.org/10.1016/j.patrec.2014.09.001
0167-8655/© 2014 Elsevier B.V. All rights reserved.
Please cite this article as: V. Bruni, D. Vitulano, A robust perception based method for iris tracking, Pattern Recognition Letters (2014), http://dx.doi.org/10.1016/j.patrec.2014.09.001

1. Introduction

Iris detection and tracking represent two important tasks, not only in biometric applications. Several recent studies and applications proved that iris movement is very informative and can be used in different ways in several fields. It is not only a fingerprint for human beings, but it also plays a significant role in psychology, robotics, security, and especially in neurological studies. Eye movement is actually a widely investigated topic in neurology and in vision research, since it represents the basis of perceptual learning and gives information about the way the brain codes visual information, reacts to visual stimuli, learns from what it sees, and selects what is important in a visual scenario [31,1,12,37,28,32]. On the other hand, neurological studies gave new impulse to image processing applications, since they offered new instruments for processing and understanding visual information. In fact, independently of its origin and the way it has been transmitted, an image is received and processed by the human brain, which judges its quality and learns its content. As a result, it makes sense to process image information in a perceptually guided way. Saliency maps [25], which represent the image content in a hierarchical way from the most visible to the least visible region, are a representative example of this new way of processing images. Following this concept, it has been shown that image anomalies can be detected as those image components that are visible at first glance [7]. In fact, they immediately capture human attention, since they are perceived as foreign objects in the scene. In the same way, image content can be coded in a non-uniform way according to its visual importance [40,22,39]. More in general, the use of the mechanisms that regulate human vision allows one to optimize a wide class of image processing based systems and applications, like compression, restoration, printing, watermarking, segmentation, displaying, and also object tracking [40,22,39,5,6,33,8,17]. In fact, in a recent paper the perceptual interpretation (version) of a well known tracker allowed it to achieve satisfying and quite impressive results [5,6]. Inspired by this work, in this paper we are interested in studying to what extent the use of some features that are related to the way the iris is perceived by the human eye can help in its tracking, making tracking robust and at the same time quite fast. Current iris tracking algorithms are based on optimization procedures like mean shift, Kalman filtering, particle filtering, or their combination, in order to compensate for the limits of each technique [24,2,43,20,42,35]. The major effort consists of making those trackers robust to changes in illumination, eye closure, face orientation, distance from the camera, etc. It often means defining specific and distinctive features for the iris, such as its geometrical appearance (circle or ellipse), the presence of a bright pupil effect, the high contrast with respect to the white part of the eye, the color appearance, and so on. That is why some models look at the iris contours, characterize their profile and track them in subsequent frames. However, those methods, even though well performing, can be computationally expensive, since more distinctive target features, especially the ones related to the geometrical appearance, can require additional and not negligible computational load. In addition, they might not be robust to occlusions or changes in illumination or scale. One of the most interesting neurological results is that human eyes are attracted by very few points (regions) of the observed scene [18,27,30]. In addition, the human eye reacts to a moving
stimulus, which can be modeled as a combination of luminance modulated (first order motion) and contrast modulated (second order motion) sinusoidal stimuli [36,23,19,29]. With regard to face processing, eyes belong to these attentional points [21] and the iris represents their prominent and highly contrasted component; moreover, the iris is a moving and textured eye component. These concepts have also been confirmed by recent studies in biometrics [12], in which eye movements (saccades) and fixations are deemed special features of individuals to be used for human identification. Hence, the main aim and novelty of the presented work is the use of perceptual concepts to track the organ that is responsible for visual perception. To this aim, the mean shift algorithm is still used in the optimization process, but some neurological results are exploited for defining a proper feature space, which takes into account the sensitivity to a moving and textured stimulus as well as the measures that better characterize fixation points, namely luminance and contrast. Finally, the Jensen–Shannon divergence (JSD) is used as similarity metric due to its correlation with the visual system when comparing different objects [4,8,3]. JSD is able to recognize the iris in subsequent frames even if the iris appearance changes due to partial eye closure, iris movement, changes in illumination, or distance from the camera. In fact, JSD is less sensitive to changes in iris geometrical features thanks to its dependence on a more global measure, namely the distribution of luminance and contrast in the iris region.

As a result, the focus of the paper, and its main difference with respect to the existing literature on the topic, is not the iris itself but the way it is perceived by the observer. The human eye is sensitive to the iris since it belongs to a region of the face that is fixated at first glance and that is instinctively used in the recognition process. An additional perceptual property of the iris is that it is a moving and highly contrasted component of the scene, and thus it attracts human attention more than other parts of the eye. As a result, the use of specific characteristics of fixation points in terms of luminance and temporal contrast, as well as the use of a metric that weights image information in a way that is consistent with the one employed in the vision process, allows us to define a very simple and almost inexpensive tracker that, on the one hand, is able to compete with existing trackers in terms of tracking precision without exploiting specific objective characteristics of the iris, and, on the other hand, outperforms them in terms of required computational load, making it suitable for real time applications and easily implementable on mobile devices.

Experimental results show very promising tracking performance. The combination of the contrast/luminance based feature space and the similarity metric allows mean shift to converge after one iteration, with a considerable reduction of the computing time and without introducing annoying drifting effects. In addition, the tracker is robust to blinks, changes in illumination, zoom, scale and contrast, and it does not require Kalman filtering or more specific morphological and/or geometrical iris features to get satisfying performance.

The remainder of the paper is organized as follows. The next section first gives in-depth motivations for the feature space and the similarity metric, then describes the tracking algorithm. Section 3 presents some experimental results and gives implementation details. Finally, Section 4 draws the conclusions and gives guidelines for future research.

2. Perception based feature space and similarity metric

In this section visual perception is used for defining a proper feature space for the iris and for selecting a proper similarity metric to be used in the iris tracking process. It is worth observing that the visual features are defined just for tracking and not for iris detection. Iris detection is a delicate topic and is out of the scope of this paper. In the paper we assume that the iris has been detected in the first frame of the video sequence using an existing iris detector [9,10], or that it has been manually indicated by the external user.
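Since the tracker only assumes an initial iris region given in the first frame, that region can be represented as a simple boolean elliptical mask. A minimal sketch in Python with NumPy; the function name and parameterization are illustrative, not taken from the paper:

```python
import numpy as np

def ellipse_mask(shape, center, axes):
    """Boolean mask of an elliptical iris region selected in the first
    frame; center = (cy, cx) and axes = (a, b) are the ellipse semi-axes
    along the row and column directions."""
    cy, cx = center
    a, b = axes
    ys, xs = np.ogrid[:shape[0], :shape[1]]
    return ((ys - cy) / a) ** 2 + ((xs - cx) / b) ** 2 <= 1.0
```

All subsequent histograms are then computed only on the pixels selected by this mask.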
2.1. Luminance and contrast based iris feature

Several neurological studies proved that during the observation process few points attract human attention, and those few points are necessary for understanding the scene content in the early vision [18,27,30]. Those fixation points depend on both the image subject and the luminance and contrast characteristics of the scene content [18,27,11]. With regard to face processing, it has been shown that the human eye has a sort of preference for the regions containing eyes and nose, with particular and immediate importance for the left side [21]. This is due to the fact that the right part of the brain is the one sensitive to low frequencies, and the early vision actually corresponds to a more global vision of the scene (a sort of adaptive low pass filtering). Despite the complexity of human vision, luminance and contrast remain the two main measurements involved in the observation process, especially in the early vision, and this is true for both static and moving scenes. The two interesting aspects of this kind of vision process are that luminance and contrast seem to have a sort of independence in correspondence to fixation points [18,27]. Specifically, light adaptation (luminance gain) and contrast gain are the two rapid mechanisms that control the gain of neural responses in the early vision, and they operate almost independently. It means that luminance and contrast can be considered as two independent sources and the visual stimulus is the linear combination of these two sources. More precisely, for each point (x, y) of the image I, the visual stimulus can be modeled as

S(x, y) = (1/2) L(x, y) + (1/2) C(x, y),    (1)

where L(x, y) and C(x, y) represent the luminance and the visual contrast measures at the image point (x, y). The visual contrast is, in general, measured as a sort of normalized spatial variation of the luminance [41]. The most common definitions are the one based on Weber's law, i.e. C_W(x, y) = σ_L(x, y)/μ_L(x, y), where σ_L(x, y) and μ_L(x, y) respectively are the image local standard deviation and mean of the luminance, which measures the visibility of an object with uniform luminance with respect to a uniform background, and the one based on the definition of the Michelson contrast, i.e. C_M(x, y) = (max L(x, y) − min L(x, y))/(max L(x, y) + min L(x, y)), which measures the visibility of a sinusoidal stimulus. In this work, the Michelson visual contrast has been considered, due to the textured nature of the iris and the fact that there is a close relationship between the distributions of Michelson contrasts in natural images and the contrast response functions of neurons [34].

However, the persistence of vision is another characteristic of human vision. It derives from the fact that neurons in the visual system summate information over both space and time. It turns out that the receptive fields have spatio-temporal characteristics and not only spatial ones [36]. It means that the evolution of luminance and contrast over time in a fixed location cannot be neglected. In particular, it is possible to define a first order motion stimulus, which is related to the spatio-temporal variation of the luminance, and a second order motion stimulus, which is related to the motion of a contrast modulated texture [36,23,19,29]. Also in this case the two stimuli are almost independent, and the global visual stimulus is the sum of the two different spatio-temporal stimuli. In general, the second order motion is detected and modeled as the ratio between the temporal variation of the luminance and its spatial variation. Based on these observations, the feature space, which can be seen as the image of the perceived stimuli, can be roughly modeled as the weighted sum of the luminance and contrast components, whenever the contrast assumes a temporal significance. In other words, the stimulus at time t can be written as

S(x, y, t) = (1/2) L(x, y, t) + (1/2) C(x, y, t),    (2)

with C(x, y, t) = (L(x, y, t) − L(x, y, t − 1))/(L(x, y, t) + L(x, y, t − 1)). In addition, since the perception of scene content at the first glance and
for fast moving objects does not imply the perception of the actual value of the luminance but a sort of local mean value, the quantized versions of L and C have been considered before their summation, and the histogram of the resulting visual stimuli image S in the iris region has been defined as the iris feature to be detected in subsequent frames.

Fig. 1. Top: three different portions of the grayscale version of a video sequence in the MICHE database. Middle: the corresponding visual stimuli images, defined as in Eq. (2). Bottom: histograms of the visual stimuli images.

By denoting with u the bin, m the number of bins, x_i = (x, y) the spatial location of the pixels in the iris region, b the function that maps x_i into the corresponding bin in the quantized feature space, and k the kernel function, the histogram q_u of the reference iris is

q_u = c ∑_{i=1}^{n} k(‖x_i‖²) δ(b(x_i) − u),

with δ the delta function, n the number of pixels in the iris region and c a constant such that ∑_{u=1}^{m} q_u = 1, that is c = 1/(∑_{i=1}^{n} k(‖x_i‖²)), assuming the target centered at the spatial location 0 with normalized coordinates. Similarly, the candidate iris feature is defined as

p_u(y) = c_h ∑_{i=1}^{n_h} k(‖(y − x_i)/h‖²) δ(b(x_i) − u),    (3)

with c_h = 1/(∑_{i=1}^{n_h} k(‖(y − x_i)/h‖²)), y the center of the iris in the current frame, n_h the number of iris pixels in the current frame and h a scaling parameter that sets the bandwidth, accounting for the morphological changes of the target in the sequence. As can be observed in Fig. 1, even though the quantized S reduces image information in terms of density of luminance values, it is still able to characterize the iris region well, since it embeds the main luminance and contrast variations. The iris feature in the middle frame is more complex than the one of the iris in the rightmost frame, since it belongs to a group of frames in which the iris moves very fast. It is not so for the rightmost frame, which belongs to a static scene.
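As a rough illustration, the feature construction above (the stimulus image of Eq. (2) and the kernel-weighted histogram of Eq. (3)) can be sketched in Python with NumPy. This is a simplified sketch, not the authors' implementation: the bin mapping b(·) (a per-region min–max quantization) and the Epanechnikov-like kernel profile are plausible assumptions.

```python
import numpy as np

def stimulus_image(frame_t, frame_prev):
    """Visual stimulus S = (1/2) L + (1/2) C of Eq. (2), with the temporal
    Michelson contrast C computed between two consecutive frames."""
    L = frame_t.astype(np.float64)
    Lp = frame_prev.astype(np.float64)
    C = (L - Lp) / (L + Lp + 1e-12)  # temporal Michelson contrast
    return 0.5 * L + 0.5 * C

def iris_feature(stimulus, mask, m=64):
    """Kernel-weighted histogram q_u of the quantized stimulus inside the
    iris mask; pixels near the region centre weigh more, as in Eq. (3)."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    r2 = (ys - cy) ** 2 + (xs - cx) ** 2
    r2 = r2 / (r2.max() + 1e-12)        # normalized squared radius in [0, 1]
    k = 1.0 - r2                        # Epanechnikov-like profile k(x) = 1 - x
    vals = stimulus[ys, xs]
    span = vals.max() - vals.min() + 1e-12
    bins = np.minimum(((vals - vals.min()) / span * m).astype(int), m - 1)
    q = np.bincount(bins, weights=k, minlength=m)
    return q / q.sum()                  # normalized so that sum_u q_u = 1
```

The same routine serves for both the reference histogram q_u and the candidate p_u(y), once the candidate mask is recentred at y.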
2.2. The similarity metric

In the previous section, the histogram of the iris region in the stimulus space S has been defined as the iris feature. It turns out that the tracking procedure must rely on a similarity metric between probability distributions. Among the plethora of distances available in the literature, the classical mean shift algorithm for iris tracking uses the Bhattacharyya distance for this purpose, mainly due to its theoretical relationships with the Bayes error and the Fisher measure of information [20,14]. However, in a perceptual scenario also the metric that measures how close two objects are must rely on some visual perception concepts. The Jensen–Shannon divergence has both these properties: it measures the distance between two density functions and it correlates well with the human visual system. In addition, it has information theoretic foundations that allow it to correlate well with the vision system whenever the vision process is seen as an encoding/decoding system in which the human eye represents the receiver (decoder) [4,8,3].

If X and Y are two random variables with distributions p and q, the relative entropy or Kullback–Leibler divergence [15] is

D_KL(p‖q) = ∫_{−∞}^{+∞} p(x) log (p(x)/q(x)) dx.    (4)

D_KL ∈ [0, +∞); it is not symmetric and it does not satisfy the triangular inequality. One of its symmetric versions is the Jensen–Shannon divergence (JSD) [16]. The Jensen–Shannon divergence of N pdfs p_1, . . . , p_N is defined as a convex sum of D_KL computed over suitable pdfs, i.e.

JSD(p_1, . . . , p_N) = ∑_{j=1}^{N} α_j D_KL(p_j‖r),    (5)

where r = ∑_{j=1}^{N} α_j p_j is a mixture of the N pdfs p_1, . . . , p_N with weights α_1, . . . , α_N such that ∑_{j=1}^{N} α_j = 1.

In the tracking process N = 2 and then p_1 = p and p_2 = q, where p and q respectively are the pdfs of the reference and candidate target. r = αp + (1 − α)q is the mixture of p and q and depends on the parameter α, which measures the probability that the random variable having r as pdf is equal to the variable having p as pdf. By setting α = 1/2, we get the classical definition of the Jensen–Shannon divergence, which assumes the same probability for p and q. It is worth observing that JSD can also be rewritten as JSD(p, q) = H((p + q)/2) − (1/2) H(p) − (1/2) H(q), where H(·) denotes the Shannon entropy.

The use of the Jensen–Shannon divergence in the tracking process is equivalent to modeling the perception of a modification of the target in successive frames as a problem of information transmission rate. In fact, the perceived difference between the two objects not only depends on changes in luminance, shape and content of the scene, but it also depends on the way human vision reacts to the different modifications of the target. Hence, human vision works as a sort of additional noise in the information transmission process and the JSD measures the capacity of a noisy information channel with two inputs, p and q, giving the output r [16,38].
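A minimal numerical sketch of the JSD of Eqs. (4)–(5) for N = 2 histograms, in Python with NumPy; the epsilon guard against empty bins is an implementation detail, not part of the paper:

```python
import numpy as np

def jsd(p, q, alpha=0.5, eps=1e-12):
    """Jensen-Shannon divergence, Eq. (5) with N = 2; for alpha = 1/2 it
    equals H((p + q)/2) - H(p)/2 - H(q)/2 and is bounded by log 2."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p = p / p.sum()
    q = q / q.sum()
    r = alpha * p + (1.0 - alpha) * q          # mixture pdf

    def kl(a, b):                              # D_KL(a || b), Eq. (4), discrete form
        nz = a > 0
        return float(np.sum(a[nz] * np.log(a[nz] / (b[nz] + eps))))

    return alpha * kl(p, r) + (1.0 - alpha) * kl(q, r)
```

For identical histograms the divergence is 0, while for histograms with disjoint supports it reaches its maximum log 2.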
Under the hypothesis that the target location does not considerably change from one frame to the successive one, if y_0 denotes the location of the center of the iris in the previous frame, JSD(p(y), q) evaluated at a generic point y in the current frame can be well approximated by the first order Taylor expansion of JSD around the point p(y_0). More precisely,

JSD(p(y), q) ≈ JSD(p(y_0), q) + (∂/∂p) JSD(p(y), q)|_{y=y_0} (p(y) − p(y_0))
= (1/2) D_KL(q, (p(y_0) + q)/2) − ∑_u log((p(y_0) + q)/(2 p(y_0))) (p(y)/2)
= (1/2) D_KL(q, (p(y_0) + q)/2) + (1/2) log 2 − (1/2) ∑_u p(y) log(1 + q/p(y_0))
≤ (1/2) D_KL(q, (p(y_0) + q)/2) + (1/2) log 2 − (1/2) ∑_u p(y) (p(y_0) q/(p(y_0) + q)),
where the inequality log(1 + x) ≥ x/(x + 1), ∀ x ≥ 0, has been used. Using Eq. (3) it becomes

JSD(p(y), q) ≤ (1/2) D_KL(q, (p(y_0) + q)/2) + (1/2) log 2 − (1/2) ∑_u (p(y_0) q/(p(y_0) + q)) · c_h ∑_{i=1}^{n_h} k(‖(y − x_i)/h‖²) δ(b(x_i) − u),    (6)

where p and q respectively are the iris histograms in the current and reference frames of the analyzed sequence. Hence, the point y in the current frame that minimizes JSD(p(y), q) is the one that best detects the iris in that frame. Since only the last term of the previous equation depends on y, the minimum value of JSD(p(y), q) with respect to y is attained at the point that realizes the maximum of the last term of the previous equation. This term represents the density estimate computed with the kernel profile at y in the current frame, where the data have been weighted by the following weights w_i:

w_i = ∑_u (p(y_0) q/(p(y_0) + q)) δ(b(x_i) − u).

Hence, the mean shift algorithm can be used for iteratively finding the point y_1 that realizes the maximum value of the rightmost term of Eq. (6). Following the mean shift optimization procedure [14,13], at each iteration

y_1 = (∑_{i=1}^{n_h} w_i x_i g(‖(y_0 − x_i)/h‖²)) / (∑_{i=1}^{n_h} w_i g(‖(y_0 − x_i)/h‖²)),

where g(·) is the first derivative of the kernel function k. It is worth stressing that in the approximation of the JSD an upper bound of the metric has been employed. This allows us to get a simpler and less computationally expensive form for the weights w_i to use in the mean shift algorithm.

2.3. The algorithm

The proposed iris tracking algorithm is a modification of the kernel-based object tracking [14] and consists of the following steps:

1. Select the iris region in the first frame (time t = 1) by defining an ellipse containing the target;
2. Compute the weighted histogram of the stimuli space S (iris feature) q_u = c ∑_{i=1}^{n} k(‖x_i‖²) δ(b(x_i) − u) of the reference iris in the reference frame;
3. Compute the histogram p_t(y_0) of the candidate iris at the current time t and evaluate the quantity JSD(p_t(y_0), q);
4. Set the weights w_i = ∑_u (p_t(y_0) q/(p_t(y_0) + q)) δ(b(x_i) − u);
5. Find the candidate iris central location y_1 = (∑_{i=1}^{n} x_i w_i g(‖(y_0 − x_i)/h‖²)) / (∑_{i=1}^{n} w_i g(‖(y_0 − x_i)/h‖²)) via the mean shift algorithm [14], using a linear kernel function k;
6. Compute the new histogram p_t(y_1) of the candidate iris and evaluate JSD(p_t(y_1), q);
7. While JSD(p_t(y_1), q) > JSD(p_t(y_0), q), set y_1 = (1/2)(y_1 + y_0) and evaluate JSD(p_t(y_1), q);
8. If ‖y_1 − y_0‖ < ε then stop; else if |JSD(p_t(y_1), q) − JSD(p_{t−1}(y_1), q)| > T, then set h = τh and go to Step 3; otherwise set y_0 = y_1 and go to Step 3.

The last step has been introduced to make the algorithm robust to blinks or complete occlusion of the iris for some consecutive frames. The central location of the iris region is kept fixed while the ellipse containing the iris is enlarged until the iris is detected again, as shown in Fig. 2.
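Steps 4–5 above (the JSD-derived weights and a single mean shift update) can be sketched as follows. This is a simplified Python/NumPy illustration, assuming a linear (Epanechnikov-type) profile k(x) = 1 − x, whose derivative g is constant on the kernel support; the function and argument names are illustrative:

```python
import numpy as np

def mean_shift_update(coords, bin_idx, p0, q, y0, h, eps=1e-12):
    """One mean shift step: per-bin weights w_u = p_u(y0) q_u / (p_u(y0) + q_u)
    (Step 4), spread to pixels through their bin index b(x_i), then the
    weighted mean of the pixel coordinates (Step 5)."""
    w_u = p0 * q / (p0 + q + eps)       # per-bin weight
    w = w_u[bin_idx]                    # w_i for each pixel x_i
    d2 = np.sum(((coords - y0) / h) ** 2, axis=1)
    g = (d2 <= 1.0).astype(float)       # |k'| is constant for k(x) = 1 - x,
                                        # so g reduces to the support indicator
    num = np.sum((w * g)[:, None] * coords, axis=0)
    den = np.sum(w * g) + eps
    return num / den                    # new centre y1
```

In the default configuration of the paper this single update is already taken as the final iris location for the frame.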
Fig. 3. Top: eight frames of a video sequence in the MICHE database showing blinks and iris movement from the center to the left, back to the center, then toward the right and again back to the center. Bottom: the corresponding measured Jensen–Shannon divergence. The red and yellow rectangles respectively indicate blinks and iris movements. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Jensen–Shannon divergence computed during iris tracking in a video sequence from the MICHE database having a male subject. The sequence is characterized by blinks and unstable luminance.

3. Experimental results and discussions

The proposed iris tracking algorithm has been tested on several video sequences. In particular, the performance of the algorithm has been evaluated on the MICHE video database available at http://biplab.unisa.it/MICHE/database/. It is composed of 113 sequences acquired by mobile devices, such as the Samsung Galaxy S4. The subjects of those sequences are faces of different people, gender and
age, acquired in different illumination conditions, pose, orientation, zooming and so on. Before presenting and discussing some results, it is necessary to spend some words about parameter settings.

For the computation of the visual stimuli feature space S, 16 bins have been assigned to the luminance L, which is normalized to half the range of gray levels (i.e., 128), whereas 4 bins have been assigned to the temporal contrast C, which is normalized to a range of width 128. It is not necessary to assign more bins to the temporal contrast, since we are interested in emphasizing fast moving and highly contrasted regions that are able to immediately catch human eye attention.

With regard to the scaling parameter h, it is initially fixed equal to 1; at each iteration of the mean shift algorithm, the iris location is estimated using three different values of the parameter h. More precisely, by denoting with h_0 the current h value, the algorithm runs for h = h_0, h = 1.1 h_0 and h = 0.9 h_0, and the one that gives the smallest JSD between the features of the reference q and candidate p(y) iris is the one giving the best iris candidate in the current frame. This allows the algorithm to be robust to changes in scale and zooming. The multiplicative coefficients 1.1 and 0.9 are the same used in the kernel based object tracking algorithm [14]. The parameter ε can be set by the user and its default value is equal to 0.1, according to tolerance values assigned in some existing iris tracking algorithms based on the mean shift algorithm. However, the default version of the proposed algorithm does not use any value for ε, since just one iteration of the mean shift algorithm is used, as will be discussed later. On the contrary, the threshold T is used for detecting those frames where the iris is partially or totally occluded. The value of T has been set equal to 10 JSD(p_2(y_0), q), i.e., occlusion is declared when the variation of the distances between targets in successive frames considerably differs from the distance between targets at the beginning of the video sequence. The selected threshold is able to detect frames in which there is a significant occlusion of the analyzed target, and it is therefore not triggered by local changes of iris appearance in the video sequence due to different pose and orientation of the subject.
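The quantization just described (16 luminance bins over a half-range of 128, 4 temporal contrast bins) can be sketched as a joint bin index b(x). In this Python/NumPy illustration the mapping of the width-128 contrast range onto [−1, 1] is an assumption on our part, since the temporal Michelson contrast of Eq. (2) naturally lies in that interval:

```python
import numpy as np

def quantize_feature(L, C):
    """Joint bin index b(x) for the stimulus histogram: 16 bins for the
    luminance L (assumed pre-normalized to [0, 128)) and 4 bins for the
    temporal contrast C (assumed rescaled to [-1, 1]), giving a
    16 x 4 = 64-bin feature space."""
    lb = np.clip((L / 128.0 * 16).astype(int), 0, 15)      # luminance bin
    cb = np.clip(((C + 1.0) / 2.0 * 4).astype(int), 0, 3)  # contrast bin
    return lb * 4 + cb                                     # joint index in [0, 63]
```

The coarse contrast axis deliberately keeps only four levels, matching the paper's choice of emphasizing fast moving, highly contrasted regions rather than resolving fine contrast differences.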
In order to qualitatively assess to what extent the use of perceptual measures can give a substantial contribution to iris tracking, we
valuate the precision in target localization in the presence of lumi-
ance instability, image defocusing, movement of the target (more
r less close to the camera) and blinks. Fig. 3 shows some consec-
tive frames of a video sequence that contains blinks, a movement
f the iris first toward the right and then toward the left. As it can
e observed, the proposed tracker is robust to the blinks thanks to
he enlargement of the ellipse that is caused by the large variation of
he similarity metric in consecutive frames. Moreover, even though
here are not enough frames to reduce the ellipse and better reseg-
enting the iris, the tracker is able to track the iris in its left to right
ovement thanks to its sensitiveness to moving objects. Fig. 3 also
hows the plot of the Jensen–Shannon divergence computed during
ris tracking in the whole sequence. The red and yellow rectangles
ndicate respectively the frames containing the blink and the ones
ontaining iris movement.
Fig. 4 depicts the Jensen–Shannon divergence relative to a different
equence: the subject is a boy and the orientation is seascape. The
equence contains two blinks and the first one occurs in the first
rames of the sequence. The sequence is also characterized by a global
uminance instability. Even though the ellipse is enlarged in the first
rames of the sequence, the tracker is able to recapture the iris in
he successive frames and the Jensen–Shannon divergence remains
maller than 0.1. On the contrary Fig. 5 depicts the Jensen–Shannon
ivergence relative to a third sequence where the subject is a girl
hile the orientation is again seascape. The sequence contains more
han one blink and image defocusing, as emphasized in the figure.
ven in this case the proposed tracker is not influenced by these two
henomena thanks to the combination of the visual stimuli feature
pace, that accounts for both luminance and temporal contrast, as
ell as the adopted similarity metric, that is able to catch the main
isual similarities in iris appearance in subsequent frames. These two
ngredients also contribute to the reduced computational complexity
f the proposed method. In fact, it is worth stressing that all the
resented results have been achieved by selecting the solution of the
ean shift optimization procedure after just one iteration, that is very
lose to the actual solution.
With regard to the computational complexity of the proposed
rocedure, it is also worth stressing that the computation of the
isual stimuli feature space is itself inexpensive since it simply re-
uires sum and difference of consecutive frames. In fact, the num-
er of operations that are required for processing one frame of the
ased method for iris tracking, Pattern Recognition Letters (2014),
6 V. Bruni, D. Vitulano / Pattern Recognition Letters 000 (2014) 1–7
ARTICLE IN PRESSJID: PATREC [m5G;October 16, 2014;16:37]
0 50 100 150 2000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
no. frame
JSD(p,q)
Fig. 5. Jensen–Shannon divergence computed during iris tracking in a video sequence
from MICHE database having a female subject. The groups of frames containing blinks
(red), image defocusing (yellow) and defocusing and camera movement (blue) are
indicated by colored rectangles. (For interpretation of the references to color in this
figure legend, the reader is referred to the web version of this article.)
0 10 20 30 40 50 60 70 80 900
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
no. frame
distance(p,q)
Fig. 6. Jensen–Shannon divergence (solid line) and Bhattacharyya distance (dotted
line) computed during iris tracking in a video sequence from MICHE database having
a male subject. Only one iteration of the mean shift algorithm has been performed in
both cases.
Fig. 7. Jensen–Shannon divergence JSD(p,q) per frame, computed during iris tracking in a video sequence from MICHE database having a male subject using only one iteration (solid) and three iterations (dotted) of the mean shift algorithm.
video sequence is n log2(m^(1/3)) + 2 n_h log2(m^(1/3)) + 2 m^(1/3) n + 8n + 20 n_h + 5 m^(1/3) n_h + 4 m^(1/3) − 3, where just 4n operations are required by the visual stimuli feature space, while the main computational load, i.e. n_h log2(m) + 2 m n_h + 4 n_h, is required by the computation of the histogram of the feature space. The same happens in the original kernel-based object tracking and in its modification for iris tracking [24]; however, in those algorithms three color components are considered as target features, resulting in an increased computational load for the histograms (m^(1/3) becomes m in those algorithms). Hence, the computational load of the proposed tracker is comparable to one iteration of some existing and well performing trackers, such as [26,24], that are based on iterative procedures.
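The inexpensive part of the pipeline described above, i.e. a feature space built only from sums and differences of consecutive frames together with its histogram, can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the half-sum definition of luminance, the joint-histogram construction and the bin count are hypothetical choices.

```python
import numpy as np

def visual_stimuli_features(prev_frame, curr_frame):
    """Illustrative feature space from two consecutive grayscale frames:
    local luminance as the (half) sum of the frames, and temporal
    contrast as their difference -- only sums and differences."""
    prev_frame = prev_frame.astype(float)
    curr_frame = curr_frame.astype(float)
    luminance = (prev_frame + curr_frame) / 2.0
    temporal_contrast = curr_frame - prev_frame
    return luminance, temporal_contrast

def feature_histogram(luminance, contrast, n_bins=8):
    """Joint (luminance, contrast) histogram, normalized to sum to 1,
    usable as the target/candidate distribution in the tracker."""
    h, _, _ = np.histogram2d(luminance.ravel(), contrast.ravel(),
                             bins=n_bins)
    return h / h.sum()
```

As the complexity count above suggests, the feature computation itself costs only a few operations per pixel; the histogram construction dominates the per-frame cost.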
Even though direct comparisons with existing trackers have not been provided in this paper, the closest method to the proposed one is that presented in [24], which uses the color space as iris feature and the mean shift combined with the Bhattacharyya distance for tracking. As can be observed in Fig. 6, if the Bhattacharyya distance is used in place of the Jensen–Shannon divergence, tracking results are less precise: the Bhattacharyya values are larger than the corresponding values of the Jensen–Shannon divergence; in addition, the Bhattacharyya distance tends to increase as the number of frames grows, while this is not the case for the Jensen–Shannon divergence. It turns out that the Jensen–Shannon divergence allows the mean shift algorithm to converge faster than the Bhattacharyya distance: it immediately gives the best location of the candidate target in subsequent frames without requiring additional iterations. To further stress this point, Fig. 7 compares the Jensen–Shannon divergence evaluated during the iris tracking process by performing one and three iterations of the mean shift algorithm. As can be observed, the temporal evolution of the JSD is almost the same, confirming that the mean shift algorithm converges faster whenever the JSD and a perceptual feature space able to represent the visual distinctive features of the target are embedded in the mean shift optimization procedure.

Please cite this article as: V. Bruni, D. Vitulano, A robust perception based method for iris tracking, Pattern Recognition Letters (2014), http://dx.doi.org/10.1016/j.patrec.2014.09.001
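For reference, the two similarity measures compared in Fig. 6 can be computed from normalized target and candidate histograms p and q as follows. This is a minimal sketch: the base-2 Jensen–Shannon divergence follows the definition in [16], and the Bhattacharyya-coefficient-based distance follows [14]; the function names are hypothetical.

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p+q)/2) - (H(p) + H(q))/2 in bits,
    bounded in [0, 1] for base-2 logarithms [16]."""
    p = p / p.sum()
    q = q / q.sum()
    m = (p + q) / 2.0

    def entropy(d):
        d = d[d > 0]  # 0 * log 0 = 0 by convention
        return -(d * np.log2(d)).sum()

    return entropy(m) - (entropy(p) + entropy(q)) / 2.0

def bhattacharyya_distance(p, q):
    """d(p, q) = sqrt(1 - sum_i sqrt(p_i * q_i)), the distance
    used in the kernel-based tracker of [14]."""
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sqrt(p * q).sum()  # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))  # clamp against round-off
```

Both measures vanish for identical histograms and reach 1 for histograms with disjoint support, so the per-frame curves in Figs. 5–7 can be read on a common [0, 1] scale.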
4. Conclusions
The paper presented a fast iris tracking algorithm that is mainly based on the simulation of the human visual system in looking at human faces. The main ingredients of the work are the consideration of visual and instinctive features of irises in the definition of the target feature space, as well as the use of a metric that correlates well with the way human vision processes and compares information. The presented results are satisfying and encouraging, since they are comparable to those of existing competing trackers that need much more expensive procedures for target localization. In fact, the embedding of simple rules that regulate the perception mechanisms, especially in early vision (such as the independence of luminance and contrast in correspondence to fixation points, the sensitiveness to motion, and human vision seen as an information transmission process), allows the use of simple and inexpensive operations in the tracking process. That is why the proposed work does not focus on the optimal setting of parameters or on the definition of more distinctive iris features: in this context the iris has been considered as a region that attracts human attention, since it is a highly contrasted, textured (more complicated than the remaining parts of the eye) and moving object. Future research will be oriented to a better setting of the parameters involved in the tracking procedure, again using perceptual arguments. The main goal is to make the whole procedure automatic, adaptive and unsupervised.
For instance, in the definition of the visual stimuli feature space a more complex and precise definition of the visual contrast could be considered, for example by embedding ad hoc measures that are more suitable for textured regions. In addition, a motion estimator could refine the computation of the temporal contrast, as well as measures that take into account the sensitiveness to second-order motion, which is strictly related to the sensitiveness to motion of textured patterns. It is worth stressing that, even though luminance and contrast are related to simple image statistics, they represent significant measures that guide the vision process, especially in early vision. That is why a feature space based on these measures could be powerful not only in tracking but also in other image processing applications. The refinement of the visual stimuli feature space could also allow the development of an automatic method for iris detection. The detection algorithm would be useful as input to the proposed tracking procedure, but it could also be used for checking, and eventually correcting, the results provided by the tracking algorithm. In addition, the modification and refinement of the visual stimuli space (definition of luminance and contrast) or of the similarity metric (generalization of the Jensen–Shannon divergence) make the proposed tracker flexible and adaptable to different kinds of applications, as well as to the tracking of different objects.
References
[1] T. Arnow, A. Bovik, Foveated visual search for corners, IEEE Trans. Image Proc. 16 (3) (2007) 813–823.
[2] I. Bacivarov, M. Ionita, P. Corcoran, Statistical models of appearance for eye tracking and eye blink detection and measurement, IEEE Trans. Consumer Electr. 54 (3) (2008) 1312–1320.
[3] V. Bruni, D. Vitulano, Evaluation of degraded images using adaptive Jensen-Shannon divergence, in: Proceedings of 8th International Symposium on Image and Signal Processing and Analysis (ISPA 2013), IEEE, 2013, pp. 536–541.
[4] V. Bruni, E. Rossi, D. Vitulano, On the equivalence between Jensen-Shannon divergence and Michelson contrast, IEEE Trans. Inform. Theory 58 (7) (2012) 4278–4288.
[5] V. Bruni, E. Rossi, D. Vitulano, Perceptual object tracking, in: IEEE Workshop on BIOMS, 2012, pp. 1–7.
[6] V. Bruni, D. Vitulano, A perception-based interpretation of the kernel-based object tracking, in: Lecture Notes in Computer Science, vol. 8192, Proceedings of ACIVS 2013, Springer, 2013, pp. 596–607.
[7] V. Bruni, D. Vitulano, A generalized model for scratch detection, IEEE Trans. Image Proc. 13 (1) (2004) 44–50.
[8] V. Bruni, E. Rossi, D. Vitulano, Jensen-Shannon divergence for visual quality assessment, in: Z. Wang, V. Bruni, D. Vitulano (Eds.), Signal Image and Video Processing, Special Issue on Human Vision and Information Theory, 2013.
[9] M. De Marsico, C. Galdi, M. Nappi, D. Riccio, FIRME: face and iris recognition for mobile engagement, Image Vision Comput., 2014. Available online.
[10] M. De Marsico, M. Nappi, D. Riccio, Noisy iris recognition integrated scheme, Pattern Recogn. Lett. 33 (8) (2012) 1006–1011 (Special issue on the recognition of visible wavelength iris images captured at-a-distance and on-the-move).
[11] V. Bruni, G. Ramponi, D. Vitulano, Image quality assessment through a subset of the image data, in: Proceedings of IEEE ISPA 2011, Dubrovnik, Croatia, 2011, pp. 414–419.
[12] V. Cantoni, G. Galdi, M. Nappi, M. Porta, D. Riccio, GANT: gaze analysis technique for human identification, Pattern Recogn., 2014. Available online.
[13] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of IEEE Conference on CVPR, vol. 2, 2000, pp. 142–149.
[14] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Trans. PAMI 25 (2) (2003) 564–577.
[15] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[16] J. Lin, Divergence measures based on the Shannon entropy, IEEE Trans. Inform. Theory 37 (1) (1991) 145–151.
[17] S. Dubunque, T. Coffman, P. McCarley, A.C. Bovik, C.W. Thomas, A comparison of foveated acquisition and tracking performance relative to uniform resolution approaches, Proc. SPIE 7321 (2009).
[18] R. Frazor, W. Geisler, Local luminance and contrast in natural images, Vision Res. 46 (2006) 1585–1598.
[19] R. Goutcher, G. Loffler, Motion transparency from opposing luminance modulated and contrast modulated gratings, Vision Res. 49 (2009) 660–670.
[20] D.W. Hansen, A.E.C. Pece, Iris tracking with feature free contour, in: IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2003, pp. 208–214.
[21] J.H. Hsiao, G.W. Cottrell, Two fixations suffice in face recognition, Psychol. Sci. 9 (10) (2008) 998–1006.
[22] I.S. Hontsch, L.J. Karam, Adaptive image coding with perceptual distortion control, IEEE Trans. Image Proc. 11 (3) (2002) 213–222.
[23] C.V. Hutchinson, T. Ledgeway, Sensitivity to spatial and temporal modulations of first-order and second-order motion, Vision Res. 46 (2006) 324–335.
[24] M.M. Ibrahim, J.J. Soraghan, L. Petropoulakis, Non rigid eye movement tracking and eye state quantification, in: Proceedings of IEEE IWSSIP, 2012, pp. 280–283.
[25] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. PAMI 20 (1998) 1254–1259.
[26] F. Martinez, A. Carbone, E. Pissaloux, Radial symmetry guided particle filter for robust iris tracking, in: Proceedings of International Conference on Computer Analysis of Images and Patterns (CAIP), Lecture Notes in Computer Science, vol. 6855, 2011, pp. 531–539.
[27] V. Mante, R. Frazor, V. Bonin, W. Geisler, M. Carandini, Independence of luminance and contrast in natural scenes and in the early visual system, Nat. Neurosci. 8 (12) (2005).
[28] J. Najemnik, W.S. Geisler, Optimal eye movement strategies in visual search, Nature 434 (2005) 387–391.
[29] A.A. Petrov, T.R. Hayes, Asymmetric transfer of perceptual learning of luminance- and contrast-modulated motion, J. Vision 10 (14) (2010) 1–22.
[30] R. Raj, W.S. Geisler, R.A. Frazor, A.C. Bovik, Contrast statistics for foveated visual systems: fixation selection by minimizing contrast entropy, J. Opt. Soc. Am. A 22 (10) (2005) 2039–2049.
[31] U. Rajashekar, A.C. Bovik, L.K. Cormack, Visual search in noise: revealing the influence of structural cues by gaze-contingent classification image analysis, J. Vision 6 (2006) 379–386.
[32] T.O. Salmon, Fixational Eye Movement, VS III: Ocular Motility and Binocular Vision, NE State University, 2001.
[33] H.R. Sheikh, A.C. Bovik, Image information and visual quality, IEEE Trans. Image Proc. 15 (2) (2006) 430–444.
[34] E.P. Simoncelli, B.A. Olshausen, Natural image statistics and neural representation, Ann. Rev. Neurosci. 24 (2001) 1193–1216.
[35] S. Sirohey, A. Rosenfeld, Z. Duric, A method of detecting and tracking irises and eyelids in video, Pattern Recogn. 35 (6) (2002) 1389–1401.
[36] M. Spering, A. Montagnini, Do we track what we see? Common versus independent processing for motion perception and smooth pursuit eye movements: a review, Vision Res. 51 (2011) 836–852.
[37] A. Tavassoli, I. van der Linde, A.C. Bovik, L.K. Cormack, An efficient technique for revealing visual search strategies with classification images, Percept. Psychophys. 69 (1) (2007) 103–112.
[38] F. Topsoe, Some inequalities for information divergence and related measures of discrimination, IEEE Trans. Inform. Theory 46 (4) (2000) 1602–1609.
[39] Z. Wang, L. Lu, A.C. Bovik, Foveation scalable video coding with automatic fixation selection, IEEE Trans. Image Proc. 12 (2) (2003) 243–254.
[40] A.B. Watson, DCTune: a technique for visual optimization of DCT quantization matrices for individual images, Soc. Inf. Display Dig. Tech. Papers XXIV (1993) 946–949.
[41] S. Winkler, Digital Video Quality: Vision Models and Metrics, John Wiley and Sons, 2005.
[42] P. Li, T. Zhang, A.E.C. Pece, Visual contour tracking based on particle filters, Image Vision Comput. 21 (1) (2003) 111–123.
[43] Z. Zhu, Q. Ji, K. Fujimura, K. Lee, Combining Kalman filtering and mean shift for real time eye tracking under active IR illumination, in: Proceedings of IEEE ICPR, 2002, pp. 318–321.