ARTICLE IN PRESSJID: PATREC [m5G;October 16, 2014;16:37]
Pattern Recognition Letters 000 (2014) 1–7
Contents lists available at ScienceDirect
Pattern Recognition Letters
journal homepage: www.elsevier.com/locate/patrec
A robust perception based method for iris tracking ✩
Vittoria Bruni a,∗, Domenico Vitulano b
a Department of Scienze di Base e Applicate per l'Ingegneria, University of Rome La Sapienza, Via A. Scarpa 16, Rome 00161, Italy
b Istituto per le Applicazioni del Calcolo M. Picone—C.N.R., Via dei Taurini 19, Rome 00185, Italy
a r t i c l e i n f o
Article history:
Received 8 May 2014
Available online xxx
Keywords:
Iris tracking
Human visual system
Jensen–Shannon divergence
a b s t r a c t
The paper presents an application of the modified kernel based object tracking to iris tracking. Human
perception rules are used for defining a proper feature space for iris that mainly accounts for the fact that
eyes represent instinctive visual fixation regions. In addition, a similarity metric that is close to the way human
vision compares and perceives the difference between scene components has been employed for finding the
exact iris location in subsequent frames. As a result, just one iteration of the mean shift algorithm allows us
to get a faithful estimation of iris location in subsequent frames. This property makes the proposed algorithm
implementable on mobile devices and useful for real time applications. Experimental results performed on
the MICHE database show that the proposed method is robust to changes in illumination or scale, to partial
or total occlusion of the iris for some subsequent frames, and to blinks.
© 2014 Elsevier B.V. All rights reserved.
✩ This paper has been recommended for acceptance by Michele Nappi.
∗ Corresponding author. Tel.: +39 06 49766648; fax: +39 06 4957647.
E-mail address: [email protected], [email protected] (V. Bruni).
http://dx.doi.org/10.1016/j.patrec.2014.09.001
0167-8655/© 2014 Elsevier B.V. All rights reserved.
Please cite this article as: V. Bruni, D. Vitulano, A robust perception based method for iris tracking, Pattern Recognition Letters (2014), http://dx.doi.org/10.1016/j.patrec.2014.09.001

1. Introduction

Iris detection and tracking represent two important tasks, not only in biometric applications. Several recent studies and applications proved that iris movement is very informative and can be used in different ways in several fields. It is not only a fingerprint for human beings, but it also plays a significant role in psychology, robotics, security, and especially in neurological studies. Eye movement is actually a widely investigated topic in neurology and in vision research, since it represents the basis of perceptual learning and gives information about the way the brain codes visual information, reacts to visual stimuli, learns from what it sees, and selects what is important in a visual scenario [31,1,12,37,28,32]. On the other hand, neurological studies gave new impulse to image processing applications, since they offered new instruments for processing and understanding visual information. In fact, independently of its origin and the way it has been transmitted, an image is received and processed by the human brain, which judges its quality and learns its content. As a result, it makes sense to process image information in a perceptually guided way. Saliency maps [25], which represent the image content in a hierarchical way from the most visible to the least visible region, are a representative example of this new way of processing images. Following this concept, it has been shown that image anomalies can be detected as those image components that are visible at first glance [7]. In fact, they immediately capture human attention, since they are perceived as foreign objects in the scene. In the same way, image content can be coded in a non-uniform way according to its visual importance [40,22,39]. More in general, the use of the mechanisms that regulate human vision allows one to optimize a wide class of image processing based systems and applications, like compression, restoration, printing, watermarking, segmentation, displaying, and also object tracking [40,22,39,5,6,33,8,17]. In fact, in a recent paper the perceptual interpretation (version) of a well known tracker allowed it to achieve satisfying and quite impressive results [5,6]. Inspired by this work, in this paper we are interested in studying to what extent the use of some features that are related to the way the iris is perceived by the human eye can help in its tracking, making tracking robust and at the same time quite fast. Current iris tracking algorithms are based on optimization procedures like mean shift, Kalman filtering, particle filtering, or their combination, in order to compensate for the limits of each technique [24,2,43,20,42,35]. The major effort consists of making those trackers robust to changes in illumination, eye closure, face orientation, distance from the camera, etc. It often means defining specific and distinctive features for the iris, such as its geometrical appearance (circle or ellipse), the presence of a bright pupil effect, the high contrast with respect to the white part of the eye, the color appearance, and so on. That is why some models look at the iris contours, characterize their profile and track them in subsequent frames. However, those methods, even though well performing, can be computationally expensive, since more distinctive target features, especially the ones related to the geometrical appearance, can require additional and not negligible computational load. In addition, they might not be robust to occlusions or changes in illumination or scale. One of the most interesting neurological results is that human eyes are attracted by very few points (regions) of the observed scene [18,27,30]. In addition, the human eye reacts to a moving
stimulus, which can be modeled as a combination of luminance modulated (first order motion) and contrast modulated (second order motion) sinusoidal stimuli [36,23,19,29]. With regard to face processing, eyes belong to these attentional points [21] and the iris represents their prominent and highly contrasted component; moreover, the iris is a moving and textured eye component. These concepts have also been confirmed by recent studies in biometrics [12], in which eye movements (saccades) and fixations are deemed special features of individuals to be used for human identification. Hence, the main aim and novelty of the presented work is the use of perceptual concepts to track the organ that is responsible for visual perception. To this aim, the mean shift algorithm is still used in the optimization process, but some neurological results are exploited for defining a proper feature space, which takes into account the sensitivity to a moving and textured stimulus as well as the measures that better characterize fixation points, namely luminance and contrast. Finally, the Jensen–Shannon divergence (JSD) is used as similarity metric due to its correlation with the visual system when comparing different objects [4,8,3]. JSD is able to recognize the iris in subsequent frames even if the iris appearance changes due to partial eye closure, iris movement, changes in illumination, or distance from the camera. In fact, JSD is less sensitive to changes in iris geometrical features thanks to its dependence on a more global measure, namely the distribution of luminance and contrast in the iris region.

As a result, the focus of the paper, and its main difference with respect to the existing literature on the topic, is not the iris itself but the way it is perceived by the observer. The human eye is sensitive to the iris since it belongs to a region of the face that is fixated at first glance and that is instinctively used in the recognition process. An additional perceptual property of the iris is that it is a moving and highly contrasted component of the scene, and thus it attracts human attention more than other parts of the eye. As a result, the use of specific characteristics of fixation points in terms of luminance and temporal contrast, as well as the use of a metric that weights image information in a way that is consistent with the one employed in the vision process, allows us to define a very simple and almost inexpensive tracker that, on the one hand, is able to compete with existing trackers in terms of tracking precision without exploiting specific objective characteristics of the iris, and, on the other hand, outperforms them in terms of required computational load, making it suitable for real time applications and easily implementable on mobile devices.

Experimental results show very promising tracking performance. The combination of the contrast/luminance based feature space and the similarity metric allows mean shift to converge after one iteration, with a considerable reduction of the computing time and without introducing annoying drifting effects. In addition, the tracker is robust to blinks, changes in illumination, zoom, scale and contrast, and it does not require Kalman filtering or more specific morphological and/or geometrical iris features to get satisfying performance.

The remainder of the paper is organized as follows. The next section first gives in-depth motivations for the feature space and the similarity metric, then describes the tracking algorithm. Section 3 presents some experimental results and gives implementation details. Finally, Section 4 draws the conclusions and gives guidelines for future research.

2. Perception based feature space and similarity metric

In this section visual perception is used for defining a proper feature space for the iris and for selecting a proper similarity metric to be used in the iris tracking process. It is worth observing that the visual features are defined just for tracking and not for iris detection. Iris detection is a delicate topic and is out of the scope of this paper. In the paper we assume that the iris has been detected in the first frame of the video sequence using an existing iris detector [9,10], or that it has been manually indicated by the external user.
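Since the tracker only assumes an initial iris region given in the first frame, that region can be represented as a simple boolean elliptical mask. A minimal sketch in Python with NumPy; the function name and parameterization are illustrative, not taken from the paper:

```python
import numpy as np

def ellipse_mask(shape, center, axes):
    """Boolean mask of an elliptical iris region selected in the first
    frame; center = (cy, cx) and axes = (a, b) are the ellipse semi-axes
    along the row and column directions."""
    cy, cx = center
    a, b = axes
    ys, xs = np.ogrid[:shape[0], :shape[1]]
    return ((ys - cy) / a) ** 2 + ((xs - cx) / b) ** 2 <= 1.0
```

All subsequent histograms are then computed only on the pixels selected by this mask.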
2.1. Luminance and contrast based iris feature

Several neurological studies proved that during the observation process few points attract human attention, and those few points are necessary for understanding the scene content in the early vision [18,27,30]. Those fixation points depend on both the image subject and the luminance and contrast characteristics of the scene content [18,27,11]. With regard to face processing, it has been shown that the human eye has a sort of preference for the regions containing eyes and nose, with particular and immediate importance for the left side [21]. This is due to the fact that the right part of the brain is the one sensitive to low frequencies, and the early vision actually corresponds to a more global vision of the scene (a sort of adaptive low pass filtering). Despite the complexity of human vision, luminance and contrast remain the two main measurements involved in the observation process, especially in the early vision, and this is true for both static and moving scenes. The two interesting aspects of this kind of vision process are that luminance and contrast seem to have a sort of independence in correspondence to fixation points [18,27]. Specifically, light adaptation (luminance gain) and contrast gain are the two rapid mechanisms that control the gain of neural responses in the early vision, and they operate almost independently. It means that luminance and contrast can be considered as two independent sources and the visual stimulus is the linear combination of these two sources. More precisely, for each point (x, y) of the image I, the visual stimulus can be modeled as

S(x, y) = (1/2) L(x, y) + (1/2) C(x, y),    (1)

where L(x, y) and C(x, y) represent the luminance and the visual contrast measures at the image point (x, y). The visual contrast is, in general, measured as a sort of normalized spatial variation of the luminance [41]. The most common definitions are the one based on Weber's law, i.e. C_W(x, y) = σ_L(x, y)/μ_L(x, y), where σ_L(x, y) and μ_L(x, y) respectively are the image local standard deviation and mean of the luminance, which measures the visibility of an object with uniform luminance with respect to a uniform background, and the one based on the definition of the Michelson contrast, i.e. C_M(x, y) = (max L(x, y) − min L(x, y))/(max L(x, y) + min L(x, y)), which measures the visibility of a sinusoidal stimulus. In this work, the Michelson visual contrast has been considered, due to the textured nature of the iris and the fact that there is a close relationship between the distributions of Michelson contrasts in natural images and the contrast response functions of neurons [34].

However, the persistence of vision is another characteristic of human vision. It derives from the fact that neurons in the visual system summate information over both space and time. It turns out that the receptive fields have spatio-temporal characteristics and not only spatial ones [36]. It means that the evolution of luminance and contrast over time in a fixed location cannot be neglected. In particular, it is possible to define a first order motion stimulus, which is related to the spatio-temporal variation of the luminance, and a second order motion stimulus, which is related to the motion of a contrast modulated texture [36,23,19,29]. Also in this case the two stimuli are almost independent, and the global visual stimulus is the sum of the two different spatio-temporal stimuli. In general, the second order motion is detected and modeled as the ratio between the temporal variation of the luminance and its spatial variation. Based on these observations, the feature space, which can be seen as the image of the perceived stimuli, can be roughly modeled as the weighted sum of the luminance and contrast components, whenever the contrast assumes a temporal significance. In other words, the stimulus at time t can be written as

S(x, y, t) = (1/2) L(x, y, t) + (1/2) C(x, y, t),    (2)

with C(x, y, t) = (L(x, y, t) − L(x, y, t − 1))/(L(x, y, t) + L(x, y, t − 1)). In addition, since the perception of scene content at the first glance and
for fast moving objects does not imply the perception of the actual value of the luminance but a sort of local mean value, the quantized versions of L and C have been considered before their summation, and the histogram of the resulting visual stimuli image S in the iris region has been defined as the iris feature to be detected in subsequent frames.

Fig. 1. Top: three different portions of the grayscale version of a video sequence in the MICHE database. Middle: the corresponding visual stimuli images, defined as in Eq. (2). Bottom: histograms of the visual stimuli images.

By denoting with u the bin, m the number of bins, x_i = (x, y) the spatial location of the pixels in the iris region, b the function that maps x_i into the corresponding bin in the quantized feature space, and k the kernel function, the histogram q_u of the reference iris is

q_u = c ∑_{i=1}^{n} k(‖x_i‖²) δ(b(x_i) − u),

with δ the delta function, n the number of pixels in the iris region and c a constant such that ∑_{u=1}^{m} q_u = 1, that is c = 1/(∑_{i=1}^{n} k(‖x_i‖²)), assuming the target centered at the spatial location 0 with normalized coordinates. Similarly, the candidate iris feature is defined as

p_u(y) = c_h ∑_{i=1}^{n_h} k(‖(y − x_i)/h‖²) δ(b(x_i) − u),    (3)

with c_h = 1/(∑_{i=1}^{n_h} k(‖(y − x_i)/h‖²)), y the center of the iris in the current frame, n_h the number of iris pixels in the current frame and h a scaling parameter that sets the bandwidth, accounting for the morphological changes of the target in the sequence. As can be observed in Fig. 1, even though the quantized S reduces image information in terms of density of luminance values, it is still able to characterize the iris region well, since it embeds the main luminance and contrast variations. The iris feature in the middle frame is more complex than the one of the iris in the rightmost frame, since it belongs to a group of frames in which the iris moves very fast. It is not so for the rightmost frame, which belongs to a static scene.
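As a rough illustration, the feature construction above (the stimulus image of Eq. (2) and the kernel-weighted histogram of Eq. (3)) can be sketched in Python with NumPy. This is a simplified sketch, not the authors' implementation: the bin mapping b(·) (a per-region min–max quantization) and the Epanechnikov-like kernel profile are plausible assumptions.

```python
import numpy as np

def stimulus_image(frame_t, frame_prev):
    """Visual stimulus S = (1/2) L + (1/2) C of Eq. (2), with the temporal
    Michelson contrast C computed between two consecutive frames."""
    L = frame_t.astype(np.float64)
    Lp = frame_prev.astype(np.float64)
    C = (L - Lp) / (L + Lp + 1e-12)  # temporal Michelson contrast
    return 0.5 * L + 0.5 * C

def iris_feature(stimulus, mask, m=64):
    """Kernel-weighted histogram q_u of the quantized stimulus inside the
    iris mask; pixels near the region centre weigh more, as in Eq. (3)."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    r2 = (ys - cy) ** 2 + (xs - cx) ** 2
    r2 = r2 / (r2.max() + 1e-12)        # normalized squared radius in [0, 1]
    k = 1.0 - r2                        # Epanechnikov-like profile k(x) = 1 - x
    vals = stimulus[ys, xs]
    span = vals.max() - vals.min() + 1e-12
    bins = np.minimum(((vals - vals.min()) / span * m).astype(int), m - 1)
    q = np.bincount(bins, weights=k, minlength=m)
    return q / q.sum()                  # normalized so that sum_u q_u = 1
```

The same routine serves for both the reference histogram q_u and the candidate p_u(y), once the candidate mask is recentred at y.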
2.2. The similarity metric

In the previous section, the histogram of the iris region in the stimulus space S has been defined as the iris feature. It turns out that the tracking procedure must rely on a similarity metric between probability distributions. Among the plethora of distances available in the literature, the classical mean shift algorithm for iris tracking uses the Bhattacharyya distance for this purpose, mainly due to its theoretical relationships with the Bayes error and the Fisher measure of information [20,14]. However, in a perceptual scenario also the metric that measures how close two objects are must rely on some visual perception concepts. The Jensen–Shannon divergence has both these properties: it measures the distance between two density functions and it correlates well with the human visual system. In addition, it has information theoretic foundations that allow it to correlate well with the vision system whenever the vision process is seen as an encoding/decoding system in which the human eye represents the receiver (decoder) [4,8,3].

If X and Y are two random variables with distributions p and q, the relative entropy or Kullback–Leibler divergence [15] is

D_KL(p‖q) = ∫_{−∞}^{+∞} p(x) log (p(x)/q(x)) dx.    (4)

D_KL ∈ [0, +∞); it is not symmetric and it does not satisfy the triangular inequality. One of its symmetric versions is the Jensen–Shannon divergence (JSD) [16]. The Jensen–Shannon divergence of N pdfs p_1, . . . , p_N is defined as a convex sum of D_KL computed over suitable pdfs, i.e.

JSD(p_1, . . . , p_N) = ∑_{j=1}^{N} α_j D_KL(p_j‖r),    (5)

where r = ∑_{j=1}^{N} α_j p_j is a mixture of the N pdfs p_1, . . . , p_N with weights α_1, . . . , α_N such that ∑_{j=1}^{N} α_j = 1.

In the tracking process N = 2 and then p_1 = p and p_2 = q, where p and q respectively are the pdfs of the reference and candidate target. r = αp + (1 − α)q is the mixture of p and q and depends on the parameter α, which measures the probability that the random variable having r as pdf is equal to the variable having p as pdf. By setting α = 1/2, we get the classical definition of the Jensen–Shannon divergence, which assumes the same probability for p and q. It is worth observing that JSD can also be rewritten as JSD(p, q) = H((p + q)/2) − (1/2) H(p) − (1/2) H(q), where H(·) denotes the Shannon entropy.

The use of the Jensen–Shannon divergence in the tracking process is equivalent to modeling the perception of a modification of the target in successive frames as a problem of information transmission rate. In fact, the perceived difference between the two objects not only depends on changes in luminance, shape and content of the scene, but it also depends on the way human vision reacts to the different modifications of the target. Hence, human vision works as a sort of additional noise in the information transmission process and the JSD measures the capacity of a noisy information channel with two inputs, p and q, giving the output r [16,38].
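A minimal numerical sketch of the JSD of Eqs. (4)–(5) for N = 2 histograms, in Python with NumPy; the epsilon guard against empty bins is an implementation detail, not part of the paper:

```python
import numpy as np

def jsd(p, q, alpha=0.5, eps=1e-12):
    """Jensen-Shannon divergence, Eq. (5) with N = 2; for alpha = 1/2 it
    equals H((p + q)/2) - H(p)/2 - H(q)/2 and is bounded by log 2."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p = p / p.sum()
    q = q / q.sum()
    r = alpha * p + (1.0 - alpha) * q          # mixture pdf

    def kl(a, b):                              # D_KL(a || b), Eq. (4), discrete form
        nz = a > 0
        return float(np.sum(a[nz] * np.log(a[nz] / (b[nz] + eps))))

    return alpha * kl(p, r) + (1.0 - alpha) * kl(q, r)
```

For identical histograms the divergence is 0, while for histograms with disjoint supports it reaches its maximum log 2.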
Under the hypothesis that the target location does not considerably change from one frame to the successive one, if y_0 denotes the location of the center of the iris in the previous frame, JSD(p(y), q) evaluated at a generic point y in the current frame can be well approximated by the first order Taylor expansion of JSD around the point p(y_0). More precisely,

JSD(p(y), q) ≈ JSD(p(y_0), q) + (∂/∂p) JSD(p(y), q)|_{y=y_0} (p(y) − p(y_0))
= (1/2) D_KL(q, (p(y_0) + q)/2) − ∑_u log((p(y_0) + q)/(2 p(y_0))) (p(y)/2)
= (1/2) D_KL(q, (p(y_0) + q)/2) + (1/2) log 2 − (1/2) ∑_u p(y) log(1 + q/p(y_0))
≤ (1/2) D_KL(q, (p(y_0) + q)/2) + (1/2) log 2 − (1/2) ∑_u p(y) (p(y_0) q/(p(y_0) + q)),
where the inequality log(1 + x) ≥ x/(x + 1), ∀ x ≥ 0, has been used. Using Eq. (3) it becomes

JSD(p(y), q) ≤ (1/2) D_KL(q, (p(y_0) + q)/2) + (1/2) log 2 − (1/2) ∑_u (p(y_0) q/(p(y_0) + q)) · c_h ∑_{i=1}^{n_h} k(‖(y − x_i)/h‖²) δ(b(x_i) − u),    (6)

where p and q respectively are the iris histograms in the current and reference frames of the analyzed sequence. Hence, the point y in the current frame that minimizes JSD(p(y), q) is the one that best detects the iris in that frame. Since only the last term of the previous equation depends on y, the minimum value of JSD(p(y), q) with respect to y is attained at the point that realizes the maximum of the last term of the previous equation. This term represents the density estimate computed with the kernel profile at y in the current frame, where the data have been weighted by the following weights w_i:

w_i = ∑_u (p(y_0) q/(p(y_0) + q)) δ(b(x_i) − u).

Hence, the mean shift algorithm can be used for iteratively finding the point y_1 that realizes the maximum value of the rightmost term of Eq. (6). Following the mean shift optimization procedure [14,13], at each iteration

y_1 = (∑_{i=1}^{n_h} w_i x_i g(‖(y_0 − x_i)/h‖²)) / (∑_{i=1}^{n_h} w_i g(‖(y_0 − x_i)/h‖²)),

where g(·) is the first derivative of the kernel function k. It is worth stressing that in the approximation of the JSD an upper bound of the metric has been employed. This allows us to get a simpler and less computationally expensive form for the weights w_i to use in the mean shift algorithm.

2.3. The algorithm

The proposed iris tracking algorithm is a modification of the kernel-based object tracking [14] and consists of the following steps:

1. Select the iris region in the first frame (time t = 1) by defining an ellipse containing the target;
2. Compute the weighted histogram of the stimuli space S (iris feature) q_u = c ∑_{i=1}^{n} k(‖x_i‖²) δ(b(x_i) − u) of the reference iris in the reference frame;
3. Compute the histogram p_t(y_0) of the candidate iris at the current time t and evaluate the quantity JSD(p_t(y_0), q);
4. Set the weights w_i = ∑_u (p_t(y_0) q/(p_t(y_0) + q)) δ(b(x_i) − u);
5. Find the candidate iris central location y_1 = (∑_{i=1}^{n} x_i w_i g(‖(y_0 − x_i)/h‖²)) / (∑_{i=1}^{n} w_i g(‖(y_0 − x_i)/h‖²)) via the mean shift algorithm [14], using a linear kernel function k;
6. Compute the new histogram p_t(y_1) of the candidate iris and evaluate JSD(p_t(y_1), q);
7. While JSD(p_t(y_1), q) > JSD(p_t(y_0), q), set y_1 = (1/2)(y_1 + y_0) and evaluate JSD(p_t(y_1), q);
8. If ‖y_1 − y_0‖ < ε then stop; else if |JSD(p_t(y_1), q) − JSD(p_{t−1}(y_1), q)| > T, then set h = τh and go to Step 3; otherwise set y_0 = y_1 and go to Step 3.

The last step has been introduced to make the algorithm robust to blinks or complete occlusion of the iris for some consecutive frames. The central location of the iris region is kept fixed while the ellipse containing the iris is enlarged until the iris is detected again, as shown in Fig. 2.
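Steps 4–5 above (the JSD-derived weights and a single mean shift update) can be sketched as follows. This is a simplified Python/NumPy illustration, assuming a linear (Epanechnikov-type) profile k(x) = 1 − x, whose derivative g is constant on the kernel support; the function and argument names are illustrative:

```python
import numpy as np

def mean_shift_update(coords, bin_idx, p0, q, y0, h, eps=1e-12):
    """One mean shift step: per-bin weights w_u = p_u(y0) q_u / (p_u(y0) + q_u)
    (Step 4), spread to pixels through their bin index b(x_i), then the
    weighted mean of the pixel coordinates (Step 5)."""
    w_u = p0 * q / (p0 + q + eps)       # per-bin weight
    w = w_u[bin_idx]                    # w_i for each pixel x_i
    d2 = np.sum(((coords - y0) / h) ** 2, axis=1)
    g = (d2 <= 1.0).astype(float)       # |k'| is constant for k(x) = 1 - x,
                                        # so g reduces to the support indicator
    num = np.sum((w * g)[:, None] * coords, axis=0)
    den = np.sum(w * g) + eps
    return num / den                    # new centre y1
```

In the default configuration of the paper this single update is already taken as the final iris location for the frame.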
Fig. 3. Top: eight frames of a video sequence in the MICHE database showing blinks and iris movement from the center to the left, back to the center, then toward the right and again back to the center. Bottom: the corresponding measured Jensen–Shannon divergence. The red and yellow rectangles respectively indicate blinks and iris movements. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Jensen–Shannon divergence computed during iris tracking in a video sequence from the MICHE database having a male subject. The sequence is characterized by blinks and unstable luminance.

3. Experimental results and discussions

The proposed iris tracking algorithm has been tested on several video sequences. In particular, the performance of the algorithm has been evaluated on the MICHE video database available at http://biplab.unisa.it/MICHE/database/. It is composed of 113 sequences acquired by mobile devices, such as the Samsung Galaxy S4. The subjects of those sequences are faces of different people, gender and
age, acquired in different illumination conditions, pose, orientation, zooming and so on. Before presenting and discussing some results, it is necessary to spend some words about parameter settings.

For the computation of the visual stimuli feature space S, 16 bins have been assigned to the luminance L, which is normalized to half the range of gray levels (i.e., 128), whereas 4 bins have been assigned to the temporal contrast C, which is normalized to a range of width 128. It is not necessary to assign more bins to the temporal contrast, since we are interested in emphasizing fast moving and highly contrasted regions that are able to immediately catch human eye attention.

With regard to the scaling parameter h, it is initially fixed equal to 1; at each iteration of the mean shift algorithm, the iris location is estimated using three different values of the parameter h. More precisely, by denoting with h_0 the current h value, the algorithm runs for h = h_0, h = 1.1 h_0 and h = 0.9 h_0, and the one that gives the smallest JSD between the features of the reference q and candidate p(y) iris is the one giving the best iris candidate in the current frame. This allows the algorithm to be robust to changes in scale and zooming. The multiplicative coefficients 1.1 and 0.9 are the same used in the kernel based object tracking algorithm [14]. The parameter ε can be set by the user and its default value is equal to 0.1, according to tolerance values assigned in some existing iris tracking algorithms based on the mean shift algorithm. However, the default version of the proposed algorithm does not use any value for ε, since just one iteration of the mean shift algorithm is used, as will be discussed later. On the contrary, the threshold T is used for detecting those frames where the iris is partially or totally occluded. The value of T has been set equal to 10 JSD(p_2(y_0), q), i.e., occlusion is declared when the variation of the distances between targets in successive frames considerably differs from the distance between targets at the beginning of the video sequence. The selected threshold is able to detect frames in which there is a significant occlusion of the analyzed target, and it is therefore not triggered by local changes of iris appearance in the video sequence due to different pose and orientation of the subject.
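The quantization just described (16 luminance bins over a half-range of 128, 4 temporal contrast bins) can be sketched as a joint bin index b(x). In this Python/NumPy illustration the mapping of the width-128 contrast range onto [−1, 1] is an assumption on our part, since the temporal Michelson contrast of Eq. (2) naturally lies in that interval:

```python
import numpy as np

def quantize_feature(L, C):
    """Joint bin index b(x) for the stimulus histogram: 16 bins for the
    luminance L (assumed pre-normalized to [0, 128)) and 4 bins for the
    temporal contrast C (assumed rescaled to [-1, 1]), giving a
    16 x 4 = 64-bin feature space."""
    lb = np.clip((L / 128.0 * 16).astype(int), 0, 15)      # luminance bin
    cb = np.clip(((C + 1.0) / 2.0 * 4).astype(int), 0, 3)  # contrast bin
    return lb * 4 + cb                                     # joint index in [0, 63]
```

The coarse contrast axis deliberately keeps only four levels, matching the paper's choice of emphasizing fast moving, highly contrasted regions rather than resolving fine contrast differences.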
In order to qualitatively assess to what extent the use of perceptual measures can give a substantial contribution to iris tracking, we
valuate the precision in target localization in the presence of lumi-
ance instability, image defocusing, movement of the target (more
r less close to the camera) and blinks. Fig. 3 shows some consec-
tive frames of a video sequence that contains blinks, a movement
f the iris first toward the right and then toward the left. As it can
e observed, the proposed tracker is robust to the blinks thanks to
he enlargement of the ellipse that is caused by the large variation of
he similarity metric in consecutive frames. Moreover, even though
here are not enough frames to reduce the ellipse and better reseg-
enting the iris, the tracker is able to track the iris in its left to right
ovement thanks to its sensitiveness to moving objects. Fig. 3 also
hows the plot of the Jensen–Shannon divergence computed during
ris tracking in the whole sequence. The red and yellow rectangles
ndicate respectively the frames containing the blink and the ones
ontaining iris movement.
Fig. 4 depicts the Jensen–Shannon divergence relative to a different
equence: the subject is a boy and the orientation is seascape. The
equence contains two blinks and the first one occurs in the first
rames of the sequence. The sequence is also characterized by a global
uminance instability. Even though the ellipse is enlarged in the first
rames of the sequence, the tracker is able to recapture the iris in
he successive frames and the Jensen–Shannon divergence remains
maller than 0.1. On the contrary Fig. 5 depicts the Jensen–Shannon
ivergence relative to a third sequence where the subject is a girl
hile the orientation is again seascape. The sequence contains more
han one blink and image defocusing, as emphasized in the figure.
ven in this case the proposed tracker is not influenced by these two
henomena thanks to the combination of the visual stimuli feature
pace, that accounts for both luminance and temporal contrast, as
ell as the adopted similarity metric, that is able to catch the main
isual similarities in iris appearance in subsequent frames. These two
ngredients also contribute to the reduced computational complexity
f the proposed method. In fact, it is worth stressing that all the
resented results have been achieved by selecting the solution of the
ean shift optimization procedure after just one iteration, that is very
lose to the actual solution.
With regard to the computational complexity of the proposed
rocedure, it is also worth stressing that the computation of the
isual stimuli feature space is itself inexpensive since it simply re-
uires sum and difference of consecutive frames. In fact, the num-
er of operations that are required for processing one frame of the
ased method for iris tracking, Pattern Recognition Letters (2014),
6 V. Bruni, D. Vitulano / Pattern Recognition Letters 000 (2014) 1–7
ARTICLE IN PRESSJID: PATREC [m5G;October 16, 2014;16:37]
0 50 100 150 2000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
no. frame
JSD(p,q)
Fig. 5. Jensen–Shannon divergence computed during iris tracking in a video sequence
from MICHE database having a female subject. The groups of frames containing blinks
(red), image defocusing (yellow) and defocusing and camera movement (blue) are
indicated by colored rectangles. (For interpretation of the references to color in this
figure legend, the reader is referred to the web version of this article.)
0 10 20 30 40 50 60 70 80 900
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
no. frame
distance(p,q)
Fig. 6. Jensen–Shannon divergence (solid line) and Bhattacharyya distance (dotted
line) computed during iris tracking in a video sequence from MICHE database having
a male subject. Only one iteration of the mean shift algorithm has been performed in
both cases.
Fig. 7. Jensen–Shannon divergence JSD(p,q) per frame, computed during iris tracking in a video sequence from MICHE database having a male subject using only one iteration (solid) and three iterations (dotted) of the mean shift algorithm.
video sequence is n log2(m^(1/3)) + 2 n_h log2(m^(1/3)) + 2 m^(1/3) n + 8n + 20 n_h + 5 m^(1/3) n_h + 4 m^(1/3) − 3, where just 4n operations are required by the visual stimuli feature space, while the main computational load, i.e. n_h log2(m) + 2 m n_h + 4 n_h, is required by the computation of the histogram of the feature space. The same happens in the original kernel-based object tracking and in its modification for iris tracking [24]; however, in those algorithms three color components are considered as target features, resulting in an increased computational load for the histograms (m^(1/3) becomes m in those algorithms). Hence, the computational load of the proposed tracker is comparable to one iteration of some existing and well performing trackers, such as [26,24], that are based on iterative procedures.
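The inexpensive part of the pipeline described above, i.e. a feature space built only from sums and differences of consecutive frames together with its histogram, can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the half-sum definition of luminance, the joint-histogram construction and the bin count are hypothetical choices.

```python
import numpy as np

def visual_stimuli_features(prev_frame, curr_frame):
    """Illustrative feature space from two consecutive grayscale frames:
    local luminance as the (half) sum of the frames, and temporal
    contrast as their difference -- only sums and differences."""
    prev_frame = prev_frame.astype(float)
    curr_frame = curr_frame.astype(float)
    luminance = (prev_frame + curr_frame) / 2.0
    temporal_contrast = curr_frame - prev_frame
    return luminance, temporal_contrast

def feature_histogram(luminance, contrast, n_bins=8):
    """Joint (luminance, contrast) histogram, normalized to sum to 1,
    usable as the target/candidate distribution in the tracker."""
    h, _, _ = np.histogram2d(luminance.ravel(), contrast.ravel(),
                             bins=n_bins)
    return h / h.sum()
```

As the complexity count above suggests, the feature computation itself costs only a few operations per pixel; the histogram construction dominates the per-frame cost.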
Even though direct comparisons with existing trackers have not been provided in this paper, the closest method to the proposed one is that presented in [24], which uses the color space as iris feature and the mean shift combined with the Bhattacharyya distance for tracking. As can be observed in Fig. 6, if the Bhattacharyya distance is used in place of the Jensen–Shannon divergence, tracking results are less precise: the Bhattacharyya values are larger than the corresponding values of the Jensen–Shannon divergence; in addition, the Bhattacharyya distance tends to increase as the number of frames grows, while this is not the case for the Jensen–Shannon divergence. It turns out that the Jensen–Shannon divergence allows the mean shift algorithm to converge faster than the Bhattacharyya distance: it immediately gives the best location of the candidate target in subsequent frames without requiring additional iterations. To further stress this point, Fig. 7 compares the Jensen–Shannon divergence evaluated during the iris tracking process by performing one and three iterations of the mean shift algorithm. As can be observed, the temporal evolution of the JSD is almost the same, confirming that the mean shift algorithm converges faster whenever the JSD and a perceptual feature space able to represent the visual distinctive features of the target are embedded in the mean shift optimization procedure.

Please cite this article as: V. Bruni, D. Vitulano, A robust perception based method for iris tracking, Pattern Recognition Letters (2014), http://dx.doi.org/10.1016/j.patrec.2014.09.001
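For reference, the two similarity measures compared in Fig. 6 can be computed from normalized target and candidate histograms p and q as follows. This is a minimal sketch: the base-2 Jensen–Shannon divergence follows the definition in [16], and the Bhattacharyya-coefficient-based distance follows [14]; the function names are hypothetical.

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p+q)/2) - (H(p) + H(q))/2 in bits,
    bounded in [0, 1] for base-2 logarithms [16]."""
    p = p / p.sum()
    q = q / q.sum()
    m = (p + q) / 2.0

    def entropy(d):
        d = d[d > 0]  # 0 * log 0 = 0 by convention
        return -(d * np.log2(d)).sum()

    return entropy(m) - (entropy(p) + entropy(q)) / 2.0

def bhattacharyya_distance(p, q):
    """d(p, q) = sqrt(1 - sum_i sqrt(p_i * q_i)), the distance
    used in the kernel-based tracker of [14]."""
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sqrt(p * q).sum()  # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))  # clamp against round-off
```

Both measures vanish for identical histograms and reach 1 for histograms with disjoint support, so the per-frame curves in Figs. 5–7 can be read on a common [0, 1] scale.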
4. Conclusions
The paper presented a fast iris tracking algorithm that is mainly based on the simulation of the human visual system in looking at human faces. The main ingredients of the work are the consideration of visual and instinctive features of irises in the definition of the target feature space, as well as the use of a metric that correlates well with the way human vision processes and compares information. The presented results are satisfying and encouraging, since they are comparable to those of existing competing trackers that need much more expensive procedures for target localization. In fact, the embedding of simple rules that regulate the perception mechanisms, especially in early vision (such as the independence of luminance and contrast in correspondence to fixation points, the sensitiveness to motion, and human vision seen as an information transmission process), allows the use of simple and inexpensive operations in the tracking process. That is why the proposed work does not focus on the optimal setting of parameters or on the definition of more distinctive iris features: in this context the iris has been considered as a region that attracts human attention, since it is a highly contrasted, textured (more complicated than the remaining parts of the eye) and moving object. Future research will be oriented to a better setting of the parameters involved in the tracking procedure, again using perceptual arguments. The main goal is to make the whole procedure automatic, adaptive and unsupervised.
For instance, in the definition of the visual stimuli feature space a more complex and precise definition of the visual contrast could be considered, for example by embedding ad hoc measures that are more suitable for textured regions. In addition, a motion estimator could refine the computation of the temporal contrast, as well as measures that take into account the sensitiveness to second-order motion, which is strictly related to the sensitiveness to motion of textured patterns. It is worth stressing that, even though luminance and contrast are related to simple image statistics, they represent significant measures that guide the vision process, especially in early vision. That is why a feature space based on these measures could be powerful not only in tracking but also in other image processing applications. The refinement of the visual stimuli feature space could also allow the development of an automatic method for iris detection. The detection algorithm would be useful as input to the proposed tracking procedure, but it could also be used for checking, and eventually correcting, the results provided by the tracking algorithm. In addition, the modification and refinement of the visual stimuli space (definition of luminance and contrast) or of the similarity metric (generalization of the Jensen–Shannon divergence) make the proposed tracker flexible and adaptable to different kinds of applications, as well as to the tracking of different objects.
References
[1] T. Arnow, A. Bovik, Foveated visual search for corners, IEEE Trans. Image Proc. 16 (3) (2007) 813–823.
[2] I. Bacivarov, M. Ionita, P. Corcoran, Statistical models of appearance for eye tracking and eye blink detection and measurement, IEEE Trans. Consumer Electr. 54 (3) (2008) 1312–1320.
[3] V. Bruni, D. Vitulano, Evaluation of degraded images using adaptive Jensen-Shannon divergence, in: Proceedings of 8th International Symposium on Image and Signal Processing and Analysis (ISPA 2013), IEEE, 2013, pp. 536–541.
[4] V. Bruni, E. Rossi, D. Vitulano, On the equivalence between Jensen-Shannon divergence and Michelson contrast, IEEE Trans. Inform. Theory 58 (7) (2012) 4278–4288.
[5] V. Bruni, E. Rossi, D. Vitulano, Perceptual object tracking, in: IEEE Workshop on BIOMS, 2012, pp. 1–7.
[6] V. Bruni, D. Vitulano, A perception-based interpretation of the kernel-based object tracking, in: Lecture Notes in Computer Science, vol. 8192, Proceedings of ACIVS 2013, Springer, 2013, pp. 596–607.
[7] V. Bruni, D. Vitulano, A generalized model for scratch detection, IEEE Trans. Image Proc. 13 (1) (2004) 44–50.
[8] V. Bruni, E. Rossi, D. Vitulano, Jensen-Shannon divergence for visual quality assessment, in: Z. Wang, V. Bruni, D. Vitulano (Eds.), Signal Image and Video Processing, Special Issue on Human Vision and Information Theory, 2013.
[9] M. De Marsico, C. Galdi, M. Nappi, D. Riccio, FIRME: face and iris recognition for mobile engagement, Image Vision Comput., 2014. Available online.
[10] M. De Marsico, M. Nappi, D. Riccio, Noisy iris recognition integrated scheme, Pattern Recogn. Lett. 33 (8) (2012) 1006–1011 (Special issue on the recognition of visible wavelength iris images captured at-a-distance and on-the-move).
[11] V. Bruni, G. Ramponi, D. Vitulano, Image quality assessment through a subset of the image data, in: Proceedings of IEEE ISPA 2011, Dubrovnik, Croatia, 2011, pp. 414–419.
[12] V. Cantoni, G. Galdi, M. Nappi, M. Porta, D. Riccio, GANT: gaze analysis technique for human identification, Pattern Recogn., 2014. Available online.
[13] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of IEEE Conference on CVPR, vol. 2, 2000, pp. 142–149.
[14] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Trans. PAMI 25 (2) (2003) 564–577.
[15] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[16] J. Lin, Divergence measures based on the Shannon entropy, IEEE Trans. Inform. Theory 37 (1) (1991) 145–151.
[17] S. Dubunque, T. Coffman, P. McCarley, A.C. Bovik, C.W. Thomas, A comparison of foveated acquisition and tracking performance relative to uniform resolution approaches, Proc. SPIE 7321 (2009).
[18] R. Frazor, W. Geisler, Local luminance and contrast in natural images, Vision Res. 46 (2006) 1585–1598.
[19] R. Goutcher, G. Loffler, Motion transparency from opposing luminance modulated and contrast modulated gratings, Vision Res. 49 (2009) 660–670.
[20] D.W. Hansen, A.E.C. Pece, Iris tracking with feature free contour, in: IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2003, pp. 208–214.
[21] J.H. Hsiao, G.W. Cottrell, Two fixations suffice in face recognition, Psychol. Sci. 9 (10) (2008) 998–1006.
[22] I.S. Hontsch, L.J. Karam, Adaptive image coding with perceptual distortion control, IEEE Trans. Image Proc. 11 (3) (2002) 213–222.
[23] C.V. Hutchinson, T. Ledgeway, Sensitivity to spatial and temporal modulations of first-order and second-order motion, Vision Res. 46 (2006) 324–335.
[24] M.M. Ibrahim, J.J. Soraghan, L. Petropoulakis, Non rigid eye movement tracking and eye state quantification, in: Proceedings of IEEE IWSSIP, 2012, pp. 280–283.
[25] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. PAMI 20 (1998) 1254–1259.
[26] F. Martinez, A. Carbone, E. Pissaloux, Radial symmetry guided particle filter for robust iris tracking, in: Proceedings of International Conference on Computer Analysis of Images and Patterns (CAIP), Lecture Notes in Computer Science, vol. 6855, 2011, pp. 531–539.
[27] V. Mante, R. Frazor, V. Bonin, W. Geisler, M. Carandini, Independence of luminance and contrast in natural scenes and in the early visual system, Nat. Neurosci. 8 (12) (2005).
[28] J. Najemnik, W.S. Geisler, Optimal eye movement strategies in visual search, Nature 434 (2005) 387–391.
[29] A.A. Petrov, T.R. Hayes, Asymmetric transfer of perceptual learning of luminance- and contrast-modulated motion, J. Vision 10 (14) (2010) 1–22.
[30] R. Raj, W.S. Geisler, R.A. Frazor, A.C. Bovik, Contrast statistics for foveated visual systems: fixation selection by minimizing contrast entropy, J. Opt. Soc. Am. A 22 (10) (2005) 2039–2049.
[31] U. Rajashekar, A.C. Bovik, L.K. Cormack, Visual search in noise: revealing the influence of structural cues by gaze-contingent classification image analysis, J. Vision 6 (2006) 379–386.
[32] T.O. Salmon, Fixational Eye Movement, VS III: Ocular Motility and Binocular Vision, NE State University, 2001.
[33] H.R. Sheikh, A.C. Bovik, Image information and visual quality, IEEE Trans. Image Proc. 15 (2) (2006) 430–444.
[34] E.P. Simoncelli, B.A. Olshausen, Natural image statistics and neural representation, Ann. Rev. Neurosci. 24 (2001) 1193–1216.
[35] S. Sirohey, A. Rosenfeld, Z. Duric, A method of detecting and tracking irises and eyelids in video, Pattern Recogn. 35 (6) (2002) 1389–1401.
[36] M. Spering, A. Montagnini, Do we track what we see? Common versus independent processing for motion perception and smooth pursuit eye movements: a review, Vision Res. 51 (2011) 836–852.
[37] A. Tavassoli, I. van der Linde, A.C. Bovik, L.K. Cormack, An efficient technique for revealing visual search strategies with classification images, Percept. Psychophys. 69 (1) (2007) 103–112.
[38] F. Topsoe, Some inequalities for information divergence and related measures of discrimination, IEEE Trans. Inform. Theory 46 (4) (2000) 1602–1609.
[39] Z. Wang, L. Lu, A.C. Bovik, Foveation scalable video coding with automatic fixation selection, IEEE Trans. Image Proc. 12 (2) (2003) 243–254.
[40] A.B. Watson, DCTune: a technique for visual optimization of DCT quantization matrices for individual images, Soc. Inf. Display Dig. Tech. Papers XXIV (1993) 946–949.
[41] S. Winkler, Digital Video Quality: Vision Models and Metrics, John Wiley and Sons, 2005.
[42] P. Li, T. Zhang, A.E.C. Pece, Visual contour tracking based on particle filters, Image Vision Comput. 21 (1) (2003) 111–123.
[43] Z. Zhu, Q. Ji, K. Fujimura, K. Lee, Combining Kalman filtering and mean shift for real time eye tracking under active IR illumination, in: Proceedings of IEEE ICPR, 2002, pp. 318–321.