TRANSCRIPT
PHOTOMETRIC NORMALIZATION FOR FACE
RECOGNITION USING LOCAL DISCRETE
COSINE TRANSFORM

HEYDI MÉNDEZ-VÁZQUEZ
Pattern Recognition Department
Advanced Technologies Application Center
7a # 21812 b/ 218 and 222, Siboney, Playa, P.C. 12200, Havana, Cuba

JOSEF KITTLER* and CHI HO CHAN†
Center for Vision, Speech and Signal Processing
University of Surrey, Guildford, Surrey, GU2 7XH, UK
*[email protected]

EDEL GARCÍA-REYES
Pattern Recognition Department
Advanced Technologies Application Center
7a # 21812 b/ 218 and 222, Siboney, Playa, P.C. 12200, Havana, Cuba

Received 30 June 2011
Accepted 10 August 2012
Published 29 May 2013
Variations in illumination are one of the major limiting factors of face recognition system performance. The effect of changes in the incident light on face images is analyzed, as well as its influence on the low frequency components of the image. Starting from this analysis, a new photometric normalization method for illumination invariant face recognition is presented. Low-frequency Discrete Cosine Transform coefficients in the logarithmic domain are used in a local way to reconstruct a slowly varying component of the face image which is caused by illumination. After smoothing, this component is subtracted from the original logarithmic image to compensate for illumination variations. Compared to other preprocessing algorithms, our method achieved a very good performance, with a total error rate very similar to that produced by the best performing state-of-the-art algorithm. An in-depth analysis of the two preprocessing methods revealed notable differences in their behavior, which is exploited in a multiple classifier fusion framework to achieve further performance improvement. The superiority of the proposal is demonstrated in both face verification and identification experiments.

Keywords: Face recognition; illumination variations; photometric normalization; local discrete cosine transform.
International Journal of Pattern Recognition and Artificial Intelligence
Vol. 27, No. 3 (2013) 1360005 (27 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218001413600057
Int. J. Patt. Recogn. Artif. Intell. 2013.27. Downloaded from www.worldscientific.com by UNIVERSITY OF OTAGO on 09/30/13. For personal use only.
1. Introduction
Face recognition is one of the most popular biometric techniques. Although a great
number of algorithms have been developed, face recognition is still an open and very
challenging problem, especially in applications where the imaging conditions are
changing. In different face recognition studies it has been shown that variation in lighting is one of the major limiting factors of face recognition system performance.22

To cope with the problem of face recognition under illumination variation, several methods have been proposed. Conceptually, they can be grouped into five main categories: preprocessing, invariant feature extraction, face image acquisition modeling, illumination variation learning, and postprocessing. Preprocessing methods normalize the input face image, aiming to obtain a stable representation of the face under different lighting conditions. The second approach attempts to extract facial features invariant to illumination. The third one simultaneously constructs a 3D face model and a lighting model that give rise to the observed image. Once the 3D face model is estimated, the illumination conditions can be normalized by re-lighting the model with a canonical illumination source. The fourth alternative involves collecting a large database of training face images which are representative of a vast range of illumination conditions. Using such a database, generative or discriminative models with the capacity to represent face images under all possible imaging conditions can be learnt. Last, but not least, the effect of illumination variation can be minimized by postprocessing techniques, which normalize the impact of illumination changes on the similarity score computed for the query and target gallery image pair using that observed for reference nontarget pairs.
Each of these five approaches has different merits and may not be appropriate for some application scenarios. For instance, it may be difficult to collect a sufficiently large database of images that would be representative of all the possible imaging conditions. For some verification applications it may be inconvenient to employ postprocessing techniques, as a cohort of nontarget gallery images may not be available. There are also differences in the computational complexities of the potential solutions. By the same token, these techniques are not mutually exclusive, and it may be beneficial to combine a number of these disparate measures to maximize the biometric system's resilience to variations in lighting.

The above classification ignores the near infrared imaging approach, which obviates the effect of illumination variation by transposing the face recognition problem outside the visible light spectrum by means of active illumination. Although in the case of active imaging the problem of illumination variation is considerably alleviated, the techniques discussed in this paper are still relevant and are likely to improve the system performance. However, their main significance is in the context of conventional, passive face imaging in the visible light spectrum, and we shall concentrate on this challenging problem. Although all the above approaches to illumination invariance in face recognition and verification are important in their own right, the focus of this paper is on the preprocessing methods. For the discussion
on other facets of illumination invariant face recognition the reader is referred to
Refs. 4 and 36.
Preprocessing methods are usually efficient and have been the most widely used in real applications.8,36 Many preprocessing methods have been proposed in the literature: histogram equalization, gamma intensity correction, homomorphic filtering, multi-scale retinex, the self-quotient image and anisotropic smoothing10,23,28,31 are among the most used preprocessing techniques for face recognition, but newer approaches such as the total variation quotient image7 and the preprocessing sequence proposed by Tan and Triggs29 cope better with illumination variations. Most of these methods have been compared with each other, and the main conclusion that can be drawn is that the better they deal with the illumination problem, the less stable their behavior on images obtained in normal lighting conditions and in the presence of other kinds of variations in the face image data.8,26,29 Most of the filters and transforming functions that are used to remove illumination variations introduce negative effects or remove valuable discriminatory information from normally illuminated images. Better approaches are still needed in order to balance the advantages of preprocessing for illumination-degraded images against the loss of performance on normally illuminated images.

Most preprocessing methods can be used either in a holistic or a local way; however, it has been shown that local approaches confront the illumination problem better than global ones.27,30
In this work we extend and describe in detail a new photometric normalization method, first introduced in Ref. 18, which uses the local Discrete Cosine Transform (DCT) in the logarithmic domain to compensate for the illumination effect while preserving the face discriminatory information. A photometrically normalized face image is obtained by subtracting a compensation term from the original image. The compensation term is estimated by smoothing the image constructed using low-frequency coefficients extracted from the local DCT of the original image in the logarithmic domain. The proposed method was tested on the XM2VTS face database and compared with state-of-the-art photometric normalization methods. Our method (LDCT) and the preprocessing sequence (PS) exhibit a similar performance as measured in terms of average error rates, and both are superior to the other photometric normalization methods. An in-depth analysis of the two methods (LDCT and PS) revealed differences in their performance on individual images, suggesting that the methods provide complementary information. Drawing on their diversity, we propose to use them jointly to improve the results for face recognition under varying lighting conditions, while at the same time ensuring that good results are obtained for normally illuminated images. Significant improvements in performance are experimentally demonstrated in both face verification and identification frameworks.

This paper is organized as follows. Section 2 analyzes the effect of illumination changes on face images under the commonly used Lambertian model. Section 3
introduces the proposed photometric normalization method. Section 4 describes the experimental setup adopted to evaluate it. Section 5 discusses the experiments conducted to select the parameters of the proposed method. Section 6 compares the proposed method with some of the state-of-the-art photometric normalization methods. Section 7 presents a novel face verification scheme which combines the outputs of face recognition experts employing the proposed photometric normalization and the PS method, and reports on the experimental results. In Sec. 8, additional experiments with the proposal in a face identification framework are reported and compared with state-of-the-art methods on a face database with great variations in illumination conditions. Finally, Sec. 9 concludes the paper.
2. Compensation for Illumination Variations in the Low Frequencies
Let us consider an image acquisition system deploying a conventional camera. We assume that the imaged objects have a Lambertian surface. This assumption is reasonably well justified in the case of faces. We shall ignore the effect of interface reflection, which would distort only a small part of the face image due to the saturation of the camera. In any case, in regions giving rise to total reflection, the spectral content of the reflected light would be dominated by the illuminant, rather than the face skin, and would not provide useful information for discriminatory purposes.

Suppose the scene is illuminated by a spatially invariant illumination source of spectral distribution $e(\lambda)$, where $\lambda$ represents the wavelength of the incident light. Under the Lambertian assumption, the light emitted from the scene will be a function of the material properties of the scene objects, their albedo, which we denote by $\rho(x, y, \lambda)$, and the relative angles between the direction of illumination and the normal to the surface patch imaged by the $(x, y)$ pixel of a camera sensor. The effect of the geometry will be to scale down the incident light by a factor $s(x, y)$. The output of the sensor at pixel position $(x, y)$ will then be given by

$$I(x, y) = \int_{\lambda_1}^{\lambda_2} \sigma(\lambda)\,\rho(x, y, \lambda)\,s(x, y)\,e(\lambda)\,d\lambda, \qquad (1)$$

where $\sigma(\lambda)$ is the spectral response of the sensor and $\lambda_i$, $i = 1, 2$ are the limits of the visible frequency spectrum.

Assuming the response of the sensor is flat, $\sigma(\lambda) = c_1$, and that the illuminant also has a broad flat spectrum which can be approximated by $e(\lambda) = c_2$, the intensity image acquired by the camera will be given by

$$I(x, y) = c_1 c_2\, s(x, y) \int_{\lambda_1}^{\lambda_2} \rho(x, y, \lambda)\,d\lambda, \qquad (2)$$

which can be written as

$$I(x, y) = L(x, y)\,R(x, y), \qquad (3)$$
where $L(x, y)$ is the intensity of the incident light and $R(x, y)$ is the reflectance property of the surface material.

The reflectance, which represents the shape and texture of the surface, is unique for each face and is what, ideally, allows one individual to be discriminated from another. If we have the pixel intensity values of an image and the spectral distribution of the incident light (luminance) at each point is known, then the reflectance can be recovered. However, the luminance value will vary according to the geometry of the scene, the angle of incidence of the illuminant and the viewing angle. A priori knowledge of all these factors is possible only in a very controlled laboratory setting. It is not possible to have such information in real conditions, hence the identification of a person starting from a typical face image becomes a difficult task. Methods are needed to estimate the luminance component, or to compensate for variations in lighting conditions, in such a way that the discriminatory features related to reflectance can be recovered.
Most photometric normalization methods are based on Eq. (3), trying to eliminate the luminance value ($L$) and recover the reflectance ($R$) to discriminate between different faces. Successful methods, such as homomorphic filtering, make the assumption that the luminance changes slowly over the scene and is therefore a low frequency phenomenon, whereas reflectance, which characterizes skin texture, contributes a higher frequency content. Due to the multiplicative nature of the image generation model in Eq. (3), these two information sources can be easily separated by filtering in the logarithmic space, as in Eq. (4). Hence, several methods include a logarithmic transformation step to eliminate the effect of lighting:

$$\log I(x, y) = \log R(x, y) + \log L(x, y). \qquad (4)$$
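As a quick numerical illustration of this separability (a minimal sketch with synthetic luminance and reflectance arrays, not the paper's data), the multiplicative model of Eq. (3) becomes exactly additive after the logarithmic mapping of Eq. (4):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic positive luminance L and reflectance R fields (illustrative only).
L = 0.5 + rng.random((8, 8))   # slowly varying incident-light term L(x, y)
R = 0.5 + rng.random((8, 8))   # surface reflectance term R(x, y)

I = L * R                      # multiplicative image model, Eq. (3)

# In the log domain the two factors separate additively, Eq. (4).
log_I = np.log(I)
assert np.allclose(log_I, np.log(L) + np.log(R))

# A low pass estimate of log L can therefore be subtracted from log I,
# which is the basis of homomorphic-filtering style normalization.
```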
Although these photometric normalization methods are demonstrably effective, the underlying premise is somewhat flawed. Referring to Eq. (2), the luminance term is a byproduct of the incident light and the surface orientation. For a surface with a slowly varying surface normal, the luminance term will be a low frequency function. Indeed, to a first approximation, many researchers model the human head geometrically as a cylinder. Such a 3D profile would give rise to a smooth, low frequency luminance image which could easily be filtered out. However, the face contains morphological features such as eyes, nose, mouth, and wrinkles, which inject high frequency components into the luminance function. The shading caused by these surface undulations contributes information about the face's 3D structure which should be preserved to aid face discrimination.

Similarly, the reflectance term contains low and high frequency information. The dominant skin characteristic is of low frequency. The skin texture is basically homogeneous, changing very slowly over the face surface. However, in the locality of facial features, such as eyebrows, lips, eyes, skin defects, beauty spots and facial hair, the albedo changes rapidly, introducing a high frequency signal into the reflectance function.
The above analysis suggests that both the luminance and reflectance components of a face image contain low and high frequencies. Clearly this makes it difficult to separate the luminance effect from reflectance purely on the basis of frequency content. However, in both cases the high frequency content is the one containing the most important information for discriminative purposes, while the low frequency content varies with the lighting. Therefore, when low-frequency components of the image are eliminated, variations in lighting are compensated, but these components do not correspond only to luminance; they also contain information from the reflectance. To maximize the effectiveness of photometric normalization, the low pass filter has to be designed carefully, so that the high frequency discriminatory information content is not compromised.

In order to identify a suitable compensation method, let us investigate the effect of illumination conditions on the observed face image. In general the illuminant can be a very complex function. For simplicity, we shall assume that the scene is illuminated by a distant point source. We are less concerned with variations in intensity, which can be handled simply by scaling. More challenging is the effect of changes in the incident angle of the light.
In Fig. 1, some images corresponding to a 3D face surface with constant albedo, for different incident light angles, are shown. Differences between the images correspond to changes in the luminance component, since they come from the same face model with constant albedo. As can be appreciated, although the luminance images mostly exhibit the behavior of the incident illumination, they also depict information about the face surface. The effect of changes in the angle of the incident light over the 3D face surface can be appreciated in Fig. 2, which plots the variance of the log luminance function as a function of the incident light angle. We can see from the graph that for frontal, or near frontal, illumination the variance is low, but it increases dramatically for more pronounced side illumination.

From the above observations, the changes due to illumination, although slowly varying, may have influence over a broad frequency spectrum. Thus methods that strive to minimize the illumination effect by low frequency suppression6,26,29 will either fail to eliminate the effect of illumination fully, or will compromise the information content deemed useful for biometric analysis, depending on the bandwidth.

It follows that to maximize the effectiveness of photometric normalization, it is more convenient to carefully estimate and subtract the low frequency content affected by

Fig. 1. Sample luminance face images with constant albedo at different incident illumination angles.
illumination rather than suppressing it radically, so as not to compromise the discriminatory information. Moreover, it must be taken into account that image variations due to changes in the incident light do not affect all face regions uniformly;27 thus it is not convenient to apply a homogeneous filter to the complete image.

To put photometric normalization on a proper footing, we adopt an alternative model for image generation. We take a purely signal-processing viewpoint and consider a face image to be generated by amplitude modulation. Let $l(x, y)$ be a low frequency base signal and $h(x, y)$ the information-conveying high frequency signal. The slowly varying signal $l(x, y)$ captures jointly the low frequency luminance phenomenon, reflecting the global shape of the face, and the low frequency variations in albedo. $h(x, y)$, in the first instance, represents the rapid luminance changes produced by the surface properties of structural facial features, but in addition it models micro changes in albedo associated with surface markings and facial hair. The function $l(x, y)$ is assumed to be non-negative, whereas the modulating function $h(x, y)$ is bound to the interval $[-1, 1]$. The image $I(x, y)$ is generated by amplitude modulation as

$$I(x, y) = l(x, y)\,[h(x, y) + 1]. \qquad (5)$$

Under the constraints imposed on the modulating signal, Eq. (5) will produce a non-negative composite signal. The model in Eq. (5) is a quite realistic generator of face images. In poorly lit scenes the intensity variations induced by surface undulations and skin texture will be low, but in well lit environments the overall image brightness will be much higher and local image changes associated with facial features will have much greater contrast. The model does not differentiate between the face image appearance caused by structure and that caused by skin texture. In any case, for intensity
Fig. 2. Variance of the log luminance for a 3D face with constant albedo, as a function of the incident light angle.
images, these are inseparable and the model correctly reflects this. The implicit assumption of the model is that the function $l(x, y)$ is sensitive to illumination changes, whereas the function $h(x, y)$ is illumination invariant.

The multiplicative nature of amplitude modulation allows us to separate the base signal from the person-specific face information encoded by the modulating component. A logarithmic mapping transforms the composite signal into an additive mixture:

$$i(x, y) = \log I(x, y) = \log l(x, y) + \log[h(x, y) + 1]. \qquad (6)$$

Under the assumption that the base signal can be estimated, the low frequency variations can be removed from the compound signal to achieve photometric normalization.
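The generative model of Eqs. (5) and (6) can be checked numerically (an illustrative sketch; the base and modulating signals here are synthetic random fields, not derived from face data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative base and modulating signals.
l = 0.1 + rng.random((16, 16))          # non-negative low frequency base l(x, y)
h = rng.uniform(-0.9, 0.9, (16, 16))    # modulating signal h(x, y), bound to [-1, 1]

# Amplitude modulation model of Eq. (5).
I = l * (h + 1.0)
assert (I >= 0).all()                    # the composite signal stays non-negative

# The logarithmic mapping of Eq. (6) turns the product into an additive mixture,
# so an estimate of log l(x, y) can be subtracted to normalize the image.
i = np.log(I)
assert np.allclose(i, np.log(l) + np.log(h + 1.0))
```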
3. The Photometric Normalization Algorithm
In order to facilitate the optimization of the filtering, we shall represent the image content in the frequency domain. Different methods can be used to transform an image from the spatial domain to the frequency domain. The DCT is commonly used in signal and image processing because of its simplicity, low computational complexity and good energy compaction, being asymptotically equivalent to the Karhunen-Loève Transform (KLT) for Markov-1 signals with a correlation coefficient close to one.24
The DCT of an $M \times N$ image is defined as:

$$C(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} i(x, y) \cos\left[\frac{\pi (2x + 1) u}{2M}\right] \cos\left[\frac{\pi (2y + 1) v}{2N}\right], \qquad (7)$$

where

$$\alpha(u) = \begin{cases} \dfrac{1}{\sqrt{M}}, & u = 0, \\[2mm] \sqrt{\dfrac{2}{M}}, & u = 1, \ldots, M - 1, \end{cases} \qquad (8)$$

and

$$\alpha(v) = \begin{cases} \dfrac{1}{\sqrt{N}}, & v = 0, \\[2mm] \sqrt{\dfrac{2}{N}}, & v = 1, \ldots, N - 1. \end{cases} \qquad (9)$$
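These definitions can be implemented directly (a minimal numpy sketch; the helper names `dct_matrix`, `dct2` and `idct2` are ours, not from the paper). Because the basis defined by Eqs. (7)-(9) is orthonormal, the inverse transform is simply the transpose:

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II basis A[u, x] = alpha(u) cos(pi (2x+1) u / (2M)),
    with alpha(u) as in Eqs. (8) and (9)."""
    x = np.arange(M)
    u = np.arange(M).reshape(-1, 1)
    A = np.cos(np.pi * (2 * x + 1) * u / (2 * M))
    A[0] *= 1.0 / np.sqrt(M)       # alpha(0) = 1 / sqrt(M)
    A[1:] *= np.sqrt(2.0 / M)      # alpha(u) = sqrt(2 / M), u > 0
    return A

def dct2(i):
    """2D DCT of an M x N image, Eq. (7), written as C = A_M i A_N^T."""
    M, N = i.shape
    return dct_matrix(M) @ i @ dct_matrix(N).T

def idct2(C):
    """Inverse 2D DCT; the basis matrices are orthogonal, so the
    inverse is just the transpose."""
    M, N = C.shape
    return dct_matrix(M).T @ C @ dct_matrix(N)
```

With this normalization, `idct2(dct2(i))` recovers the image exactly, and `dct2(i)[0, 0]` equals the scaled sum that Eq. (10) gives for the DC coefficient.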
A method using the DCT to compensate for illumination variations was presented in Ref. 6. The authors proposed setting to zero the low-frequency DCT coefficients of an image in the logarithm domain as an approximation of the compensation term that needs to be subtracted from the image. This method outperformed many of
the existing methods dealing with illumination variations in comparisons on the Yale B database.6 Recently, different extensions of this method have been proposed.1,14 In all cases the aim has been to increase efficiency; none of them has shown better classification results. An example of a photometrically normalized face image using this method is shown in Fig. 3(b).

By setting the low frequency coefficients of the DCT to zero, we are able to create an ideal filter with a perfect transition band. However, this type of filtering creates ripples in the spatial domain. This effect is illustrated in Fig. 4, where an image containing a Dirac impulse intensity function, in Fig. 4(a), is transformed by a global DCT and the low frequency coefficients are zeroed. The image reconstructed in the spatial domain is shown in Fig. 4(b), in which the ripple extends over the whole image. For face images these ripples would have a perturbing effect on the subsequent methods of face description and classification. To minimize this phenomenon, we shall adopt a local filtering method as advocated in Ref. 17. The beneficial effect of local processing is apparent from Fig. 4(c), where the ripple is confined to the local window.
Fig. 3. (a) Original image and photometrically normalized images using (b) global DCT and (c) local DCT.

Fig. 4. A representation of the effect of removing the low frequencies of an image in a global way and in a local way. (a) Original image, (b) global process and (c) local process.
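The contrast that Fig. 4 illustrates can be reproduced in one dimension (an illustrative sketch; the signal length, block size and cutoff are arbitrary choices of ours). Zeroing low-frequency DCT coefficients of a Dirac impulse globally spreads a ripple over the whole signal, while doing so block by block leaves every block that does not contain the impulse untouched:

```python
import numpy as np

def dct_matrix(M):
    # Orthonormal 1D DCT-II basis, analogous to Eqs. (7)-(9).
    x = np.arange(M)
    u = np.arange(M).reshape(-1, 1)
    A = np.cos(np.pi * (2 * x + 1) * u / (2 * M))
    A[0] *= 1.0 / np.sqrt(M)
    A[1:] *= np.sqrt(2.0 / M)
    return A

def zero_low_freq(signal, n_low):
    # Zero the n_low lowest DCT coefficients and reconstruct.
    A = dct_matrix(signal.size)
    c = A @ signal
    c[:n_low] = 0.0
    return A.T @ c

# A Dirac impulse, as in Fig. 4(a).
s = np.zeros(64)
s[10] = 1.0

# Global filtering: the ripple spreads over the whole signal.
global_out = zero_low_freq(s, 4)

# Local filtering: process 16-sample blocks independently.
local_out = np.concatenate(
    [zero_low_freq(s[k:k + 16], 4) for k in range(0, 64, 16)])

# Blocks that do not contain the impulse are all-zero signals, so local
# filtering leaves them exactly zero: the ripple stays in one block.
assert np.abs(local_out[16:]).max() == 0.0
assert np.abs(global_out[16:]).max() > 1e-3
```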
The method in Ref. 17, which involved discarding low-frequency DCT coefficients in the logarithmic domain in a local way, was shown to improve the results obtained using the global DCT. In this method, the face image is divided into regular regions and the low-frequency DCT coefficients of each region are discarded. Uniform Local Binary Pattern (LBP) histograms2 are then computed for each region and used for classification. Note that the same region division is used for the normalization with the DCT and for the classification step using the LBP, so in this case the photometric normalization is tightly coupled with the image structure used for feature extraction and classification. This skillfully avoids the corrupting effect on the facial image representation of the blockiness of the photometrically normalized image obtained using a local DCT approach, which can be quite pronounced, as shown in Fig. 3(c). If the image block structures used for preprocessing and feature extraction are incongruent, the effect of the high frequency information injected by the block structure can be devastating and can negate any positive benefits of photometric normalization.

Unfortunately, the number of face image representation methods where this kernel congruency between photometric normalization and feature extraction exists naturally is severely limited. Thus the objective of our work is to develop a photometric normalization method which retains its local sensitivity without introducing any blocky artefacts. Such a method can be used with any feature descriptor or classifier, regardless of image partitioning.

Our proposed method is a modification of the earlier technique that deals with the blockiness artefact and, as a result, makes the photometrically normalized images usable by any general face representation approach. Moreover, we intend not to radically suppress the low frequency information.
Based on the model presented in Eq. (5), the first step of the method is to transform the image to the log intensity domain to obtain an additive mixture following Eq. (6). The second step consists of estimating the low frequency component in the log domain, which is based on the use of the local DCT. The reconstructed low frequency image is then subtracted from the image in the log domain to obtain the photometrically normalized image, which contains information only about the luminance changes produced by the surface properties of structural facial features and the micro changes in albedo. The photometrically normalized image can be restored to the original spatial domain; however, this is only a scale transformation, and it has been shown that it can introduce incorrect adjustments to the normalized values.6 Taking this into account, in our proposal the photometrically normalized images are recognized directly in the logarithmic domain.

In this process, estimating the compensation term based on low frequency components is fundamental to obtaining a suitable photometric normalization.
3.1. Estimating the low frequency component
Since we want to make use of the local information instead of the global one, the
face image is divided into rectangular blocks and the DCT is computed over them.
Using only the low-frequency coefficients of each block and setting the remaining ones to zero, a low pass version of the log image can be reconstructed by applying the inverse DCT.

In a DCT block, the top left coefficients, selected in a zig-zag scan manner, correspond to the low frequency information. However, the $C(0, 0)$ coefficient, usually called the DC coefficient, is related to the mean intensity value of the block, representing most of the energy of the region. This can be appreciated in Eq. (10), which is a simplification of Eq. (7), considering that the cosine of zero is one:

$$C(0, 0) = \frac{1}{\sqrt{M}\sqrt{N}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} i(x, y). \qquad (10)$$
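A small numerical check of Eq. (10) (with a synthetic block; the block size is arbitrary): at $u = v = 0$ every cosine in Eq. (7) equals one, so the DC coefficient reduces to a scaled block mean:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 8
blk = rng.random((M, N))             # one log-image block (synthetic)

# Eq. (7) at u = v = 0: all cosines are 1 and alpha(0) = 1/sqrt(M), 1/sqrt(N).
C00 = (1.0 / np.sqrt(M)) * (1.0 / np.sqrt(N)) * blk.sum()

# So the DC coefficient is just a scaled block mean, as Eq. (10) states.
assert np.isclose(C00, np.sqrt(M * N) * blk.mean())
```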
The effect of the DC coefficient can be seen in Fig. 5. The first row shows the DC values of each block of the logarithmically transformed images of 3D face models with a constant albedo for different persons, while the second row shows the DC values for images of the same model under different incident lighting. All images in the figure were resampled so that every pixel represents the DC value corresponding to an image block. As can be appreciated, although this coefficient reflects a local energy, the DC coefficients of a face image are highly related to the incident illumination. Because of their local computation, together they also contain some high frequency information associated with the luminance changes of structural facial features such as the nose, eyes and lips.

Since we want to estimate the low frequency component to subtract it from the image in the log domain, it is necessary to modify the DC coefficient of each block in a way that reflects changes in the incident illumination. From Fig. 5 it is apparent that, in images with normal incident illumination, the DC coefficients of the respective blocks show little dispersion in their values. In contrast, as the lighting variation increases, the differences in the DC coefficient values become greater. This
Fig. 5. A representation of the DC coefficient of each block for luminance images with constant albedo. (a) Different models with the same incident lighting. (b) Same model at different angles of incident lighting.
can be measured objectively in terms of the variance of the DC coefficients. In Fig. 6, the variance of the DC coefficients of the log luminance of a 3D face surface with constant albedo is plotted as a function of the incident light angle. It is apparent that the larger the illumination variation, the greater the dispersion of the DC values.

If a constant value, representing a "good" DC value, is subtracted from each DC coefficient, the resulting value represents the information injected into this coefficient by the variation in lighting. Then, to obtain the image which represents the low frequency component, we use the low-frequency DCT coefficients of each block, replacing the DC coefficient by its originally computed value minus a constant reference value.

The reference value was determined by computing the mean value of the DC coefficient over a set of 80 face images of different subjects under the same frontal incident light. In Fig. 7, the mean value obtained for each image and the overall mean value are plotted. As can be appreciated, for the same incident illumination, the mean of the DC values of images from different subjects is very similar. The overall mean value obtained for this frontal lighting will be used as a reference value to normalize all face images.

The reconstructed low pass image exhibits a block effect produced by the image subdivision. In order to reduce this effect, we apply a low pass smoothing filter to the reconstructed image before subtracting it from the original image in the logarithmic domain. Using a low pass filter over the reconstructed image does not contradict the aim of suppressing the low frequencies; on the contrary, it eliminates the high frequencies that can appear in the estimated compensation image.

The proposed procedure can be summarized in the following steps: (1) to apply the logarithmic transformation to the original face image, (2) to reconstruct the low
Fig. 6. Variance of the DC coefficient as a function of the incident illumination angle.
H. Méndez-Vázquez et al.
1360005-12
pass version of the log image using the low-frequency DCT coefficients and modifying the local DC value, (3) to smooth the resulting image and (4) to subtract the smoothed compensation term from the original image in the logarithmic domain. The effect of each step is illustrated in Fig. 8, which shows at the end the photometrically normalized image obtained with the proposed method.
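The four steps above can be sketched in code. This is a minimal illustration, not the paper's implementation: the smoothing of step (3) is omitted for brevity, the reference value `REF_DC` is an illustrative stand-in for the mean DC of the 80 frontally lit training faces, and the 8 × 8 block size and 15 retained coefficients anticipate the choices of Sec. 5.

```python
import math

BLOCK = 8      # block size selected in Sec. 5
N_COEFFS = 15  # low-frequency DCT coefficients kept per block (Sec. 5)
REF_DC = 35.0  # illustrative reference DC value (mean DC of frontally
               # lit training images in the paper)

def _c(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2(b):
    """Orthonormal 2D DCT-II of a square block."""
    n = len(b)
    return [[_c(u, n) * _c(v, n) * sum(
                b[x][y]
                * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2(C):
    """Inverse of dct2 (same orthonormal convention)."""
    n = len(C)
    return [[sum(_c(u, n) * _c(v, n) * C[u][v]
                 * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                 * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

def low_freq_positions(n, keep):
    """First `keep` (u, v) positions in a zig-zag (anti-diagonal) scan."""
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[1] if (p[0] + p[1]) % 2 else p[0]))
    return set(order[:keep])

def ldct_normalize(img):
    """Steps (1)-(4) of the method; pixels are assumed positive and the
    smoothing of step (3) is omitted here for brevity."""
    h, w = len(img), len(img[0])
    log_img = [[math.log(p) for p in row] for row in img]   # step (1)
    comp = [[0.0] * w for _ in range(h)]
    keep = low_freq_positions(BLOCK, N_COEFFS)
    for by in range(0, h, BLOCK):                           # step (2)
        for bx in range(0, w, BLOCK):
            block = [row[bx:bx + BLOCK] for row in log_img[by:by + BLOCK]]
            C = dct2(block)
            C = [[C[u][v] if (u, v) in keep else 0.0
                  for v in range(BLOCK)] for u in range(BLOCK)]
            C[0][0] -= REF_DC   # replace DC by (DC - reference value)
            rec = idct2(C)
            for i in range(BLOCK):
                for j in range(BLOCK):
                    comp[by + i][bx + j] = rec[i][j]
    # step (4): subtract the compensation term in the log domain
    return [[log_img[i][j] - comp[i][j] for j in range(w)]
            for i in range(h)]
```

A multiplicative change of the incident light becomes an additive constant in the log domain and is absorbed entirely by the per-block DC terms, so this normalization maps an image and a globally re-lit copy of it to essentially the same output.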
4. Experimental Setup
The XM2VTS database with the Lausanne protocol20 was used to evaluate the performance of the proposed photometric normalization method. The XM2VTS database contains 2360 images of 295 subjects, captured in four different sessions. The database is divided into a training set composed of images of 200 subjects as clients, an evaluation set (Eval) with images of the same 200 subjects as clients and of 25 additional subjects as imposters, and a test set with 70 further subjects as imposters. The training, evaluation and test sets are composed of images under controlled
Fig. 7. DC mean values of different face images under the same frontal illumination.
Fig. 8. An example of the effect of each of the steps of the preprocessing method: (a) original image, (b) logarithm transformation, (c) illumination compensation image with block effect, (d) smoothed compensation image and (e) result image after subtraction.
illumination conditions. There is an additional "dark" set which contains images of the same subjects illuminated from either the left or the right.
There are two configurations of the evaluation protocol known as the Lausanne protocol.20 Here, we use Configuration I, in which the images for training and evaluation are from the first three acquisition sessions. For training, three images per person are used, and the number of accesses or comparisons in the other subsets can be summarized as:
                     Eval     Test      Dark
Clients accesses     600      400       800
Imposters accesses   40 000   112 000   56 000
Total accesses       40 600   112 400   56 800
The Equal Error Rate (EER) is the point at which the False Rejection Rate (FRR) equals the False Acceptance Rate (FAR). The threshold obtained by the classification method at this point on the Eval set is used for the acceptance or rejection decision on the Test and Dark sets. The Total Error Rate (TER) is the sum of FRR and FAR; the lower this value, the better the recognition performance.
In our experiments, all face images were closely cropped to include only the face region. The extracted face images were geometrically normalized by the centers of the two eyes to a standard size of 120 × 144 (width × height) pixels.
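As a hedged illustration of how the operating threshold is derived (the paper does not give code for this, and `eer_threshold` is our name), the EER point can be located by sweeping a threshold over the client and imposter similarity scores of the Eval set:

```python
def eer_threshold(client_scores, imposter_scores):
    """Return (threshold, FAR, FRR) at the point where FAR and FRR are
    closest. Scores are similarities: accept when score >= threshold."""
    candidates = sorted(set(client_scores) | set(imposter_scores))
    best = None
    for t in candidates:
        frr = sum(s < t for s in client_scores) / len(client_scores)
        far = sum(s >= t for s in imposter_scores) / len(imposter_scores)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best
```

The threshold fixed this way on the Eval set is then reused unchanged on the Test and Dark sets, where TER = FAR + FRR is reported.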
4.1. Face description and classification
The Local Binary Pattern (LBP) operator is used for representing and classifying the photometrically normalized face images. This face image representation can be computed directly in the log image domain, as it is invariant to monotonic transformations. This helps to deal with any residual illumination problem.
The original LBP operator, introduced by Ojala et al. in Ref. 21, labels each pixel of an image with a value called the LBP code, a binary number that represents the pixel's relation to its 3 × 3 local neighborhood. Different extensions of the original operator have been used for face recognition.16
The first and most prevalently used LBP operator for face recognition was presented in Ref. 2. In this case, a neighborhood of eight pixels at a radius of two, (8, 2), is used to compute the LBP codes, but only those binary codes with at most two bitwise transitions from zero to one (01) or vice versa (10), called uniform patterns, are considered. The face image is divided into rectangular regions and histograms of the uniform LBP codes are calculated over each of them. The histograms of the regions are concatenated into a single augmented histogram which then represents the face image. A nearest neighbor classifier with the χ² dissimilarity measure is used to compare the histograms of two different images.
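A minimal sketch of the basic 3 × 3 LBP operator and the χ² histogram comparison follows; the descriptor used in the experiments additionally employs the (8, 2) neighborhood, uniform patterns and per-region histograms, which are omitted here for brevity:

```python
def lbp_code(img, i, j):
    """Basic 3x3 LBP code of pixel (i, j): threshold the 8 neighbors
    against the center and read them as a binary number."""
    center = img[i][j]
    # neighbors, clockwise from the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (di, dj) in enumerate(offs):
        if img[i + di][j + dj] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over the interior pixels."""
    hist = [0] * 256
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            hist[lbp_code(img, i, j)] += 1
    return hist

def chi2(h1, h2, eps=1e-10):
    """Chi-square dissimilarity between two histograms."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

Because each code only records whether a neighbor is above or below the center, the representation is unchanged by any monotonic gray-level transformation, which is why it can be applied directly in the log domain.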
One of the most recent advances of LBP in face recognition is the Multi-Scale LBP (MLBP) representation with Linear Discriminant Analysis (LDA).4 First,
LBP codes at R different radii or scales are computed over the face image, generating R LBP images. All the LBP images are of the same size and they are divided into the same number of nonoverlapping rectangular regions. The LBP histograms are computed for each region in every LBP image, and the histograms corresponding to the same region at different scales are concatenated into a single vector providing a multiresolution regional face descriptor. A regional discriminative facial descriptor is then defined by projecting the multiresolution regional face descriptors into an LDA space, in which a normalized correlation is used as a similarity measure to compare the projections corresponding to the same region of two different images. Finally, the similarities of the different regions are summed to obtain a global measure of similarity of two face images.
In this work, both the traditional LBP using the uniform patterns derived from a (8, 2) neighborhood and the recent MLBP representation, with the χ² dissimilarity measure and the normalized correlation in the LDA space respectively, are used to test the proposed normalization method.
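The per-region matching in the LDA space reduces to a normalized correlation, with regional similarities summed into the global score. A minimal sketch, in which plain lists stand in for the LDA-projected regional descriptors (the LDA training itself is not shown):

```python
import math

def normalized_correlation(u, v):
    """Cosine-style normalized correlation of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def face_similarity(regions_a, regions_b):
    """Sum of per-region normalized correlations; each element is an
    LDA-projected regional descriptor of one face image."""
    return sum(normalized_correlation(a, b)
               for a, b in zip(regions_a, regions_b))
```

Summing rather than averaging the regional similarities leaves the ranking of candidate identities unchanged while keeping the score accumulation trivially simple.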
5. Parameter Selection
There are some parameters that can be chosen to optimize the performance of the proposed algorithm. The first parameter is the size of the blocks into which the images are divided to apply the local DCT. If the block is too small, more computational effort is needed; on the other hand, the larger the block, the more it will be affected by the illumination variations. We chose to divide the image into w × w equally sized rectangular blocks and tested the method for different values of w to select the best one.
Another parameter is the number of DCT coefficients used from each block to estimate the low frequency content of the image, which is a crucial step in the proposed methodology. The DCT coefficients are usually scanned in a zig-zag manner, from the low frequencies to the high ones. We tested three different cut-off points in the list of coefficients ordered by the zig-zag scan for the different values of w. In Fig. 9(a), the performance of the proposed method with different w values and different numbers of DCT coefficients is presented for the three subsets of the XM2VTS database. Because of its simplicity, we used LBP+χ² to run the experiments for the parameter optimization. However, the general behavior of all the tested verification methods is similar, i.e. performance improvements are correlated.
As can be appreciated, the best performance in all cases is obtained by dividing the images into 8 × 8 blocks, which is also the traditional division for other applications of the DCT. The optimal number of coefficients is, as expected, more difficult to select; however, we decided to use 15 low-frequency DCT coefficients because this choice shows the most stable performance over the three sets and the different block sizes.
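The zig-zag ordering used to pick the low-frequency coefficients can be sketched as follows; an illustrative implementation, since the paper only specifies the standard zig-zag scan:

```python
def zigzag_order(n):
    """(u, v) index pairs of an n x n DCT block in zig-zag scan order:
    anti-diagonals of increasing u + v, traversed in alternating
    directions as in JPEG."""
    order = []
    for s in range(2 * n - 1):            # s = u + v, the anti-diagonal
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order
```

Note that `zigzag_order(8)[:15]` covers exactly the first five anti-diagonals (1 + 2 + 3 + 4 + 5 = 15 positions), which makes 15 a natural cut-off point in the ordered coefficient list.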
The final parameters to select are the filter and the kernel size for the smoothing operation applied to the estimated term. There are many smoothing filters defined in digital image processing. We tested an averaging filter
with a square and a circular kernel, as well as a Gaussian filter. These filters were chosen for their simplicity and widespread use. In Fig. 9(b), we report the TER of the proposed method, using the selected block size and number of coefficients, for different smoothing filters on the three subsets of the database. The x-axis plots the kernel size of the filters, h, from which, in the case of the Gaussian filter, the standard deviation can be obtained as σ = (h − 1)/2.
From the plots, one can appreciate that the three filters exhibit very similar performance. In general, as the size of the kernel increases, the classification error decreases. However, the larger the kernel, the higher the computational cost. The best performance on the Dark set was achieved using a circular averaging filter with a kernel of size 11, which also performs very well on the Eval and Test sets. This kernel size is also a good value for the computational complexity trade-off.
Fig. 9. Performance (TER) of the proposed method with (a) different w × w block divisions and numbers of DCT coefficients and (b) different smoothing filters of different sizes, on the Evaluation (top), Test (middle) and Dark (bottom) sets of the XM2VTS database.
Using the optimized parameters, we tested the proposed photometric normalization method (LDCT) with different combinations of face description and decision-making methods: LBP+χ², LBP+LDA, MLBP+χ² and MLBP+LDA. Table 1 shows the obtained total error rates and compares them with the results for the original images (OI) without any preprocessing. As can be appreciated, for all classifiers, although the performance for the well-illuminated images is somewhat degraded by LDCT preprocessing (a normal behavior of photometric normalization methods, as can also be seen in Table 2), the improvement achieved for the Dark set is significant. In general, the MLBP face description outperforms LBP, while LDA is the best classifier. The subsequent evaluation of the proposed method has been carried out exclusively with MLBP+LDA.
6. Comparison with Other Methods
The proposed method, in conjunction with MLBP+LDA, has been compared with state-of-the-art photometric normalization methods using the same configuration of the XM2VTS database. Table 2 shows the TER on each subset of the database for the original images (OI) and for well-known photometric normalization methods such as Histogram Equalization (HE), Homomorphic Filtering (HF), Self-Quotient Image (SQI) and Anisotropic Smoothing (AS), as well as the more recent approaches including the Total Variation Quotient Image (TVQI) and the Processing Sequence (PS).
Among the tested methods, PS shows the best results on the Dark set, followed by our LDCT method. On the other hand, on the Test set, where the images do not present large illumination variations, PS shows a slightly worse performance than LDCT. Since the performance of PS and LDCT is very close, we will compare these methods in more detail.
6.1. PS versus LDCT
The PS method was proposed by Tan and Triggs.29 It is composed of a series of steps aiming to reduce the effects of illumination variations, local shadowing and
Table 1. Comparison of different LBP-based classifiers in terms of TER (%).

                    Eval    Test    Dark
LBP+χ²      OI      10.3    7.12    95.7
            LDCT    16.0    11.8    65.2
LBP+LDA     OI      3.66    2.94    17.1
            LDCT    4.67    3.64    14.5
MLBP+χ²     OI      9.69    7.81    89.6
            LDCT    12.7    10.6    48.4
MLBP+LDA    OI      1.90    1.16    13.7
            LDCT    2.00    1.32    4.55
highlights, while still keeping the essential visual appearance information for use in recognition. The first step is to apply a gamma correction, a nonlinear gray level transformation replacing each pixel value I with I^γ, where γ > 0. The second step involves a Difference of Gaussians (DoG) filtering. This band-pass filter not only suppresses the low frequency information caused by the illumination gradient, but also reduces the high frequency noise. The final step is a global contrast equalization which re-normalizes the image intensities to standardize the overall contrast, where the large values are truncated and their influence is reduced. After this step, the image may still contain extreme values, so a nonlinear function that compresses very large values is optionally applied.
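The PS chain can be sketched as follows. This is a simplified illustration of the steps just described (gamma correction, DoG filtering, a one-stage contrast equalization with tanh truncation); the parameter defaults are typical values, not taken from this paper, and the real contrast equalization of Tan and Triggs has two normalization stages:

```python
import math

def gaussian_kernel(sigma):
    """Normalized 2D Gaussian kernel with radius about 3*sigma."""
    r = max(1, int(3 * sigma))
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]

def convolve(img, kernel):
    """Naive 2D convolution with replicated borders."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out

def preprocessing_sequence(img, gamma=0.2, sigma1=1.0, sigma2=2.0,
                           tau=10.0):
    """Simplified PS chain on a positive-valued gray image."""
    g = [[p ** gamma for p in row] for row in img]        # gamma correction
    lo = convolve(g, gaussian_kernel(sigma1))             # DoG band-pass
    hi = convolve(g, gaussian_kernel(sigma2))
    dog = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(lo, hi)]
    a = 0.1                                               # contrast equalization
    mean = (sum(abs(p) ** a for row in dog for p in row)
            / (len(dog) * len(dog[0]))) ** (1.0 / a)
    eq = [[p / (mean + 1e-8) for p in row] for row in dog]
    # truncate extreme values with a saturating nonlinearity
    return [[tau * math.tanh(p / tau) for p in row] for row in eq]
```

The final `tanh` stage bounds every output pixel to (−tau, tau), which is the "compression of very large values" mentioned above.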
Comparing PS and the proposed method, the most important difference between them is in the frequency information that is retained and suppressed in the main step of each algorithm. The first step of the two methods, gamma correction and logarithm transformation respectively, works in the same way, enhancing the dark image intensity values while compressing the bright ones. For both methods, the second step is the fundamental one. In PS, the DoG filtering attenuates the lower and higher frequencies, retaining the information in the middle of the frequency spectrum, while our method uses the low frequency DCT coefficients to construct the compensation term which is subtracted from the original log image, thus suppressing the low frequency content. The high frequency attenuation in PS could be the cause of its worse performance on well-illuminated images, since the important facial features mainly lie in the high frequency band. The subsequent steps in each method have different purposes: in the PS case, the filtered image is postprocessed to improve its overall contrast, while in LDCT, the compensation image constructed with the low frequency DCT coefficients is filtered to remove blockiness. These differences inject diversity which leads to different outputs being generated in each case.
Noting that PS and LDCT work differently but the total error rates achieved by them on the XM2VTS database are very similar, it was pertinent to check whether the specific misclassifications committed by each method were correlated. In Ref. 33, a statistical test, known as the z statistic, to determine whether two classifiers deliver different outputs is described.
Table 2. Comparison of different photometric normalization methods on the XM2VTS database using MLBP+LDA.

        Eval    Test    Dark
OI      1.90    1.16    13.7
HE      2.10    1.17    13.5
HF      2.35    1.35    12.7
SQI     2.31    1.84    11.6
TVQI    2.65    1.98    6.98
AS      2.08    1.50    6.15
PS      2.00    1.56    3.72
LDCT    2.00    1.32    4.55
Let us introduce the following notation:
n00 = number of samples misclassified by both PS and LDCT,
n01 = number of samples misclassified by PS but not by LDCT,
n10 = number of samples misclassified by LDCT but not by PS,
n11 = number of samples misclassified by neither PS nor LDCT.
This information is best represented as a confusion matrix.
The z statistic is defined as:

z = (|n01 - n10| - 1) / sqrt(n10 + n01).  (11)

If |z| > 1.96, we can say that the two methods do not have the same error (with a 0.05 probability of an incorrect decision).
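Equation (11) is straightforward to compute from the disagreement counts; a small sketch of this continuity-corrected, McNemar-style test:

```python
import math

def z_statistic(n01, n10):
    """z test of Eq. (11) on the disagreement counts: n01 samples
    misclassified only by PS, n10 only by LDCT."""
    return (abs(n01 - n10) - 1) / math.sqrt(n10 + n01)

def differ_significantly(n01, n10, critical=1.96):
    """True if the two classifiers' errors differ at the 0.05 level."""
    return abs(z_statistic(n01, n10)) > critical
```

Only the disagreements enter the statistic; the jointly correct (n11) and jointly wrong (n00) samples carry no information about which classifier errs more.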
In Table 3, we show the confusion matrix for each set of the XM2VTS database and, in Table 4, the corresponding z statistic.
Note that the statistical test in all cases is higher than 1.96. Thus the two methods misclassify images in different ways. A deeper analysis of the coincident misclassifications (n00), reported in Table 5, shows that for both methods, less than half of the incorrectly classified images are also misclassified by the other method.
Table 3. Comparison of the number of images misclassified by both methods in each set of the XM2VTS.
These results clearly show that the methodological differences between the two methods inject diversity into the outputs generated by the face recognition method. This diversity can be exploited to improve the recognition performance by multiple expert fusion, as discussed in the next section.
7. Classifier Fusion
It is well known that multiple classifier fusion is an effective method to improve the performance of pattern recognition systems. The prerequisite is that the component classifiers, the outputs of which are fused, provide complementary information. Classifier diversity can be achieved in many different ways. The options include different feature spaces, different classifiers, different metrics, and even different classifier parameter learning procedures. In our approach, the face recognition system, including its method of representation and matching, is the same for both component systems. The diversity is achieved by using different face image preprocessing techniques to perform photometric normalization. In particular, we combine the solutions obtained with the PS and LDCT preprocessing, as illustrated in Fig. 10.
The essential ingredients in multiple classifier fusion are the fusion architecture, score normalization, and the score fusion rule. In our work, we combine only two outputs, so there are no architectural issues. Score normalization is normally required if the fusion rule adopted is simple (untrained). For trained fusion rules, score normalization is not needed, as the appropriate weighting of the inputs is learnt during the fusion rule inference process. We opted for a simple fusion by a fixed rule, the sum, as we desire a solution that depends on training as little as possible. The sum fusion rule is known to be effective and also robust to noise.12 The use of a simple fusion rule avoids the problems of generalization to data sets affected by drift caused by various phenomena, such as illumination changes. As we use a single expert with two different inputs, the scores to be fused are in the same range and normalization is not strictly required.
Table 4. The z statistic computed on each set of the XM2VTS.

       Eval    Test     Dark
|z|    8.19    15.26    8.15

Table 5. Proportion of coincident misclassifications for the PS and LDCT methods.

        Eval      Test      Dark
PS      35.80%    36.28%    35.55%
LDCT    45.39%    46.92%    26.72%
Thus, let us denote the score delivered by the face recognition system for an input image photometrically normalized by LDCT as sLDCT, and that delivered for the same input with the PS preprocessing as sPS. The fused score is then simply given as

s = sLDCT + sPS.  (12)
The merit of this simple fusion method can be gleaned from Table 6. Using the proposed photometric normalization and classifier fusion scheme, a significant improvement in performance was achieved for all data sets, regardless of whether the images were affected by illumination variations or not.
Table 7 compares our proposal with the reported results of some state-of-the-art systems tested on the XM2VTS database under Configuration I, especially those reported on the Dark set.
The performance of 2.87% TER on the Dark set is very close to the best ever error rate reported on the Dark set in the ICB 2006 competition.19 However, the winning performance in the ICB 2006 competition was achieved by training the face
Table 6. Fusion results.

                   Eval    Test    Dark
PS         FAR     1.00    1.06    0.47
           FRR     1.00    0.50    3.25
           TER     2.00    1.56    3.72
LDCT       FAR     1.00    1.07    0.42
           FRR     1.00    0.25    4.13
           TER     2.00    1.32    4.55
PS+LDCT    FAR     0.90    0.97    0.37
           FRR     0.89    0.20    2.50
           TER     1.79    1.17    2.87
Fig. 10. Proposed combination scheme: the original image is normalized by both LDCT and PS, each result is classified with MLBP+LDA, and the two outputs are fused into the classification result.
recognition system on poorly illuminated face images. In our approach the improvement is achieved entirely through photometric normalization. This is of practical significance, as in real scenarios it would be impossible to collect representative data for all illumination conditions. Thus a solution that involves no training is preferable. In any case, the brute force machine learning approach has the disadvantage that the system performance on good quality data is degraded. This can be seen from the results in Table 7: the solution optimized for the Dark set (second entry, ICB06-Best) dropped in performance on well-illuminated images (Eval) from 1.63% to 2.35%.
8. Identification of Face Images with Variable Lighting Using the Proposed Combination Scheme
In order to corroborate the obtained results with the proposed combination scheme, we evaluate it in a face identification framework.
The Yale B Face Database9 is used to conduct the experiments. This database is widely used in the evaluation of face recognition methods that cope with illumination variations. It contains images of 10 subjects in 64 different lighting conditions, obtained with different angles between the light source direction and the camera axis. The larger the angle, the more unfavorable the lighting conditions. The database is usually divided into five subsets according to this angle. The face image of every subject with an angle of 0° between the incident light and the camera is used as gallery, and the recognition performance on each of the five subsets is tested.
In Table 8, the correct classification error using the proposed scheme is compared with some of the most important methods reported on this database. Although some methods exhibit good results on subsets S3 and S4, they do not
Table 7. TER of face recognition methods on the XM2VTS database.

                    Eval     Test    Dark
LBP MAP16           —        2.84    25.8
LBP AdaBoost16      —        7.80    71.20
LBP LDA11           —        9.12    18.22
LBP HMM11           —        2.74    19.22
AS LDA19            6.50     9.76    25.24
AS HMM19            10.50    8.38    24.00
ICB06-Best(a)19     1.63     0.96    —
ICB06-Best(b)19     2.35     —       2.02
PS+LDCT             1.79     1.17    2.87

(a) Trained and tested on well-illuminated images. (b) Trained and tested on variably illuminated images.
report on subset S5 (the most difficult one), as is the case of the gradient angle method.5
The Extended Yale B Database13 is a newer version of the database to which the face images of 28 subjects were added. Several novel algorithms have been evaluated on the extended database. Some of the most distinctive are those based on subspaces created from the quotient image, which compensate for illumination variations in the HSV color space (QI HSV),32 the Canonical Stiefel Quotient (CSQ)15 and the classification based on sparse representations (SRC).34 Some of them have presented their results with the same subdivision of the database presented before; others report only the average recognition rate obtained. Table 9 compares these methods with our proposal.
From the comparisons in Tables 8 and 9 it can be concluded that the proposed scheme, which combines the two photometric normalization methods, outperforms most of the relevant methods that have been proposed to deal with the illumination problem in face recognition, achieving very good results on the Yale B database.
Table 8. Correct classification error (%) for different algorithms on the Yale B database.

                                     S1    S2    S3     S4      S5
Illumination ratio image35           0     0     3.3    18.6    —
Linear subspaces3                    0     0     0      15.0    —
Illumination cones9                  0     0     0      8.6     —
9 light points13                     0     0     0      2.8     —
Gradient angle5                      0     0     0      1.4     —
Quotient illumination relighting25   0     0     0      9.4     17.5
QI HSV32                             0     0     0      8.3     15.7
Global DCT PCA6                      0     0     0      0.18    1.71
PS+LDCT                              0     0     0      0       0.51
Table 9. Comparison of recognition rates (%) of different algorithms on the Extended Yale B Database.

             S1     S2     S3       S4       S5       Average
CGHP15       —      —      54.28    32.63    15.65    —
Geodesic15   —      —      78.60    63.71    29.30    —
CSQ15        —      —      99.78    97.88    51.78    —
QI HSV32     100    100    93.75    90.63    84.37    93.75
PS LTP29     100    100    98.0     99.2     94.1     —
NN34         —      —      —        —        —        90.7
NS34         —      —      —        —        —        94.1
SVM34        —      —      —        —        —        97.7
SRC34        —      —      —        —        —        98.1
PS+LDCT      100    100    99.06    99.34    96.53    98.98
9. Conclusion
A new face image photometric normalization method based on the local DCT in the logarithmic domain has been proposed. A low pass version of the image is subtracted from the original face image to compensate for illumination variations. To construct the low pass image, a local DCT is applied to the original image in the logarithmic domain. A modified DC term and the low-frequency DCT coefficients are used to reconstruct the illumination compensating image by applying the inverse DCT.
The proposed LDCT photometric normalization process, in conjunction with the MLBP+LDA classification method, was tested on the XM2VTS face database. Compared to other preprocessing algorithms, our method achieved a very good performance, with a total error rate very similar to that produced by the PS method, the winning algorithm on the Dark set of the database, as shown in Table 2.
Despite the similarities in the average error rates of PS and LDCT, an in-depth analysis of the two preprocessing methods revealed notable differences in their behavior. The diversity in the observed performance of these two methods on individual images motivated a new recognition framework based on score level fusion. The proposed classifier fusion scheme, involving the LDCT photometric normalization method and PS, achieved a very good performance on all data sets of the XM2VTS database, regardless of whether the images were affected by illumination variations or not. The method was compared with the state-of-the-art systems tested on the XM2VTS database, and found to be comparable with the best method ever reported on the Dark set of the database, which requires training on poorly illuminated images and degrades on good quality images. Moreover, the method was tested in a face identification framework on the Yale B database and outperforms the most relevant approaches tested on it. The practical advantage of our approach, which is applicable without the need for any data collection or training, is extremely valuable.
Acknowledgment
This work was supported in part by the EU-funded Mobio project grant IST-214324
and TSB grant TP/6/ICT/6/S/K15331.
References
1. A. Abbas, M. I. Khalil, S. AbdelHay and H. M. A. Fahmy, Illumination invariant face recognition in logarithm discrete cosine transform domain, Int. Conf. Image Processing (ICIP) (2009), pp. 4157-4160.
2. T. Ahonen, A. Hadid and M. Pietikäinen, Face recognition with local binary patterns, European Conf. Computer Vision (ECCV 2004) (2004), pp. 469-481.
3. P. N. Belhumeur and D. J. Kriegman, What is the set of images of an object under all possible illumination conditions? Int. J. Comput. Vis. 28(3) (1998) 245-260.
4. C. Chan, J. Kittler and K. Messer, Multi-scale local binary pattern histograms for face recognition, Adv. Biometrics 4642 (2007) 809-818.
5. H. F. Chen, P. N. Belhumeur and D. W. Jacobs, In search of illumination invariants, IEEE Conf. Computer Vision and Pattern Recognition (2000), pp. 254-261.
6. W. Chen, M. J. Er and S. Wu, Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain, IEEE Trans. Syst. Man Cybern. B 36(2) (2006) 458-466.
7. T. Chen, W. Yin, X. S. Zhou, D. Comaniciu and T. S. Huang, Illumination normalization for face recognition and uneven background correction using total variation based image models, IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR) 2 (2005) 532-539.
8. B. Du, S. Shan, L. Qing and W. Gao, Empirical comparisons of several preprocessing methods for illumination insensitive face recognition, IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '05) 2 (2005) 981-984.
9. A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23(6) (2001) 643-660.
10. R. Gross and V. Brajovic, An image preprocessing algorithm for illumination invariant face recognition, 4th Int. Conf. Audio- and Video-Based Biometric Person Authentication (AVBPA03) (2003), pp. 10-18.
11. G. Heusch, Y. Rodriguez and S. Marcel, Local binary patterns as an image preprocessing for face authentication, in FGR '06: Proc. 7th Int. Conf. Automatic Face and Gesture Recognition (2006), pp. 9-14.
12. J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 226-239.
13. K. Ch. Lee, J. Ho and D. J. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Pattern Anal. Mach. Intell. 27(5) (2005) 684-698.
14. H. F. Liau and D. Isa, New illumination compensation method for face recognition, Int. J. Comput. Netw. Secur. 2(3) (2010) 5-12.
15. Y. M. Lui, J. R. Beveridge and M. Kirby, Canonical Stiefel quotient and its application to generic face recognition in illumination spaces, BTAS09: Third IEEE Int. Conf. Biometrics Theory, Applications and Systems (2009), pp. 1-8.
16. S. Marcel, Y. Rodriguez and G. Heusch, On the recent use of local binary patterns for face authentication, Int. J. Image Video Process., Special Issue on Facial Image Processing, IDIAP-RR06-34 (2006).
17. H. Méndez-Vázquez, E. García and Y. Condes, A new combination of local appearance based methods for face recognition under varying lighting conditions, in Progress in Pattern Recognition, Image Analysis and Applications, Lecture Notes in Computer Science, Vol. 5197 (Springer, 2008), pp. 535-542.
18. H. Méndez-Vázquez, J. Kittler, C. H. Chan and E. García, On combining local DCT with preprocessing sequence for face recognition under varying lighting conditions, in Progress in Pattern Recognition, Image Analysis and Applications, Lecture Notes in Computer Science, Vol. 6419 (Springer, 2010), pp. 410-417.
19. K. Messer, J. Kittler, J. Short, G. Heusch, F. Cardinaux, S. Marcel, Y. Rodriguez, S. Shan, Y. Su, W. Gao and X. Chen, Performance characterisation of face recognition algorithms and their sensitivity to severe illumination changes, in Proc. Int. Conf. Biometrics, ICB (2006), pp. 1-11.
20. K. Messer, J. Matas, J. Kittler and K. Jonsson, XM2VTSDB: The extended M2VTS database, Second Int. Conf. Audio and Video-Based Biometric Person Authentication (1999), pp. 72-77.
21. T. Ojala, M. Pietikäinen and D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recogn. 29(1) (1996) 51-59.
Photometric Normalization for Face Recognition using Local DCT
Int. J. Patt. Recogn. Artif. Intell. 2013.27. Downloaded from www.worldscientific.com by UNIVERSITY OF OTAGO on 09/30/13. For personal use only.
22. J. Phillips, T. Scruggs, A. O'Toole, P. Flynn, K. Bowyer, C. Schott and M. Sharpe, FRVT 2006 and ICE 2006 large-scale results, Technical Report, National Institute of Standards and Technology (NIST), March 2007.
23. Z. Rahman, D. Jobson and G. Woodell, Multi-scale retinex for color image enhancement, in Int. Conf. Image Processing (ICIP), Vol. III (1996), pp. 1003–1006.
24. K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications (Academic Press Professional, Inc., San Diego, CA, USA, 1990).
25. S. G. Shan, W. Gao, B. Cao and D. B. Zhao, Illumination normalization for robust face recognition against varying lighting conditions, in IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures: AMFG (2003), pp. 157–164.
26. J. Short, J. Kittler and K. Messer, A comparison of photometric normalisation algorithms for face verification, in FGR '04: Proc. 6th Int. Conf. Automatic Face and Gesture Recognition (AFGR) (2004), pp. 254–259.
27. J. Short, J. Kittler and K. Messer, Photometric normalisation for component-based face verification, in FGR '06: Proc. 7th Int. Conf. Automatic Face and Gesture Recognition, Washington, DC, USA (IEEE Computer Society, 2006), pp. 114–119.
28. T. Stockham, Image processing in the context of a visual model, Proc. IEEE 60(7) (1972) 828–842.
29. X. Tan and B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, in IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures: AMFG (2007), pp. 168–182.
30. M. Villegas and R. Paredes, Comparison of illumination normalization methods for face recognition, in Third COST 275 Workshop: Biometrics on the Internet (2005), pp. 27–30.
31. H. Wang, S. Z. Li and Y. Wang, Face recognition under varying lighting conditions using self quotient image, in FGR '04: Proc. 6th Int. Conf. Automatic Face and Gesture Recognition (AFGR) (2004), p. 819.
32. Y. H. Wang, X. J. Ning, C. X. Yang and Q. F. Wang, A method of illumination compensation for human face image based on quotient image, Inform. Sci. 178(12) (2008) 2705–2721.
33. A. R. Webb, Statistical Pattern Recognition, 2nd edn. (John Wiley and Sons Ltd, 2002), Chap. 8.3, pp. 266–271.
34. J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31(2) (2009) 210–227.
35. J. Zhao, Y. Su, D. J. Wang and S. W. Luo, Illumination ratio image: Synthesizing and recognition with varying illuminations, Pattern Recogn. Lett. 24(15) (2003) 2703–2710.
36. X. Zou, J. Kittler and K. Messer, Illumination invariant face recognition: A survey, in First IEEE Int. Conf. Biometrics: Theory, Applications, and Systems (BTAS), September 2007.
Heydi Méndez-Vázquez received her B.S. degree (with Honors) in Computing in 2005 and her Ph.D. in Automatic and Computing in 2010, both from the Polytechnic University of Havana (CUJAE), Cuba. Since 2005, she has been with the Biometric Research Group of the Advanced Technologies Application Center (CENATAV), in Cuba. Her research interests include computer vision, biometrics, face recognition and image processing. She is a member of the Cuban Association of Pattern Recognition (ACRP) and the International Association for Pattern Recognition (IAPR). She received a best student paper award at the 5th International Summer School on Biometrics endorsed by the IAPR, in Alghero, Italy, 2008, and was awarded one of the annual prizes by the Cuban Academy of Sciences in 2011.
Josef Kittler received his B.A., Ph.D., and D.Sc. degrees from the University of Cambridge in 1971, 1974, and 1991, respectively. He heads the Centre for Vision, Speech and Signal Processing at the School of Electronics and Physical Sciences, University of Surrey, U.K. He teaches and conducts research in the subject area of machine intelligence, with a focus on biometrics, video and image database retrieval, automatic inspection, medical data analysis, and cognitive vision. He published a Prentice-Hall textbook, Pattern Recognition: A Statistical Approach, and several edited volumes, as well as more than 600 scientific papers, including more than 170 journal papers. He serves on the Editorial Board of several scientific journals in pattern recognition and computer vision.
Chi Ho Chan received his Ph.D. from the University of Surrey, U.K. in 2008. He is currently a research fellow at the Centre for Vision, Speech and Signal Processing, University of Surrey. From 2002 to 2004, he served as a researcher at ATR International (Japan). His research interests include image processing, pattern recognition, biometrics, and vision-based human-computer interaction.
Edel García-Reyes received his B.S. degree in Mathematics and Cybernetics from Havana University in 1986 and his Ph.D. in Technical Sciences from the Technical Military Institute "José Martí" of Havana in 1997. Currently, he is working as a researcher at the Advanced Technologies Application Center (CENATAV). Dr. Edel has focused his research on digital image processing of remote sensing data, biometrics and video surveillance. He has participated as a member of both technical committees and experts groups and has been a reviewer for different events and journals such as Pattern Recognition Letters, Journal of Real-Time Image Processing, etc. Dr. Edel worked at the Cuban Institute of Geodesy and Cartography (1986–1995) and in the Enterprise Group GeoCuba (1995–2001), where he directed the Agency of the Centre of Data and Computer Science of GeoCuba Investigation and Consultancy (1998–2001).