TRANSCRIPT
PHOTOMETRIC NORMALIZATION FOR FACE
RECOGNITION USING LOCAL DISCRETE
COSINE TRANSFORM

HEYDI MÉNDEZ-VÁZQUEZ
Pattern Recognition Department
Advanced Technologies Application Center
7a # 21812 b/ 218 and 222, Siboney, Playa, P.C. 12200, Havana, Cuba

JOSEF KITTLER* and CHI HO CHAN†
Center for Vision, Speech and Signal Processing
University of Surrey, Guildford, Surrey, GU2 7XH, UK
*[email protected]

EDEL GARCÍA-REYES
Pattern Recognition Department
Advanced Technologies Application Center
7a # 21812 b/ 218 and 222, Siboney, Playa, P.C. 12200, Havana, Cuba

Received 30 June 2011
Accepted 10 August 2012
Published 29 May 2013
Variations in illumination are one of the major limiting factors of face recognition system performance. The effect of changes in the incident light on face images is analyzed, as well as its influence on the low frequency components of the image. Starting from this analysis, a new photometric normalization method for illumination invariant face recognition is presented. Low-frequency Discrete Cosine Transform coefficients in the logarithmic domain are used in a local way to reconstruct a slowly varying component of the face image which is caused by illumination. After smoothing, this component is subtracted from the original logarithmic image to compensate for illumination variations. Compared to other preprocessing algorithms, our method achieved a very good performance, with a total error rate very similar to that produced by the best performing state-of-the-art algorithm. An in-depth analysis of the two preprocessing methods revealed notable differences in their behavior, which is exploited in a multiple classifier fusion framework to achieve further performance improvement. The superiority of the proposal is demonstrated in both face verification and identification experiments.

Keywords: Face recognition; illumination variations; photometric normalization; local discrete cosine transform.
International Journal of Pattern Recognition and Artificial Intelligence
Vol. 27, No. 3 (2013) 1360005 (27 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218001413600057
Int. J. Patt. Recogn. Artif. Intell. 2013.27. Downloaded from www.worldscientific.com by UNIVERSITY OF OTAGO on 09/30/13. For personal use only.
1. Introduction
Face recognition is one of the most popular biometric techniques. Although a great
number of algorithms have been developed, face recognition is still an open and very
challenging problem, especially in applications where the imaging conditions are
changing. In different face recognition studies it has been shown that variation in lighting is one of the major limiting factors of face recognition system performance.22

To cope with the problem of face recognition under illumination variation, several methods have been proposed. Conceptually, they can be grouped into five main categories: preprocessing, invariant feature extraction, face image acquisition modeling, illumination variation learning, and postprocessing. Preprocessing methods normalize the input face image, aiming to obtain a stable representation of the face under different lighting conditions. The second approach attempts to extract facial features invariant to illumination. The third one simultaneously constructs a 3D face model and a lighting model that give rise to the observed image. Once the 3D face model is estimated, the illumination conditions can be normalized by re-lighting the model with a canonical illumination source. The fourth alternative involves collecting a large database of training face images which are representative of a vast range of illumination conditions. Using such a database, generative or discriminative models with the capacity to represent face images under all possible imaging conditions can be learnt. Last, but not least, the effect of illumination variation can be minimized by postprocessing techniques, which normalize the impact of illumination changes on the similarity score computed for the query and target gallery image pair using that observed for reference nontarget pairs.
Each of these five approaches has different merits and may not be appropriate for some application scenarios. For instance, it may be difficult to collect a sufficiently large database of images that would be representative of all the possible imaging conditions. For some verification applications it may be inconvenient to employ postprocessing techniques, as a cohort of nontarget gallery images may not be available. There are also differences in the computational complexities of the potential solutions. By the same token, these techniques are not mutually exclusive, and it may be beneficial to combine a number of these disparate measures to maximize the biometric system's resilience to variations in lighting.

The above classification ignores the near infrared imaging approach, which obviates the effect of illumination variation by transposing the face recognition problem outside the visible light spectrum by means of active illumination. Although in the case of active imaging the problem of illumination variation is considerably alleviated, the techniques discussed in this paper are still relevant and are likely to improve the system performance. However, their main significance is in the context of conventional, passive face imaging in the visible light spectrum, and we shall concentrate on this challenging problem. Although all the above approaches to illumination invariance in face recognition and verification are important in their own right, the focus of this paper is on the preprocessing methods. For the discussion
on other facets of illumination invariant face recognition the reader is referred to
Refs. 4 and 36.
Preprocessing methods are usually efficient and have been the most widely used in real applications.8,36 Many preprocessing methods have been proposed in the literature: histogram equalization, gamma intensity correction, homomorphic filtering, multi-scale retinex, the self-quotient image and anisotropic smoothing10,23,28,31 are among the most used preprocessing techniques for face recognition, but newer approaches such as the total variation quotient image7 and the preprocessing sequence proposed by Tan and Triggs29 cope better with illumination variations. Most of these methods have been compared with each other, and the main conclusion that can be drawn is that the better they deal with the illumination problem, the less stable their behavior on images obtained in normal lighting conditions and in the presence of other kinds of variations in the face image data.8,26,29 Most of the filters and transforming functions that are used to remove illumination variations introduce negative effects or remove valuable discriminatory information from normally illuminated images. Better approaches are still needed in order to balance the advantages of preprocessing for illumination-degraded images against the loss of performance on normally illuminated images.

Most preprocessing methods can be used either in a holistic or a local way; however, it has been shown that local approaches confront the illumination problem better than global ones.27,30
In this work we extend and describe in detail a new photometric normalization method, first introduced in Ref. 18, which uses the local Discrete Cosine Transform (DCT) in the logarithmic domain to compensate for the illumination effect while preserving the face discriminatory information. A photometrically normalized face image is obtained by subtracting a compensation term from the original image. The compensation term is estimated by smoothing the image constructed using low-frequency coefficients extracted from the local DCT of the original image in the logarithmic domain. The proposed method was tested on the XM2VTS face database and compared with state-of-the-art photometric normalization methods. Our method (LDCT) and the preprocessing sequence (PS) exhibit a similar performance as measured in terms of average error rates, and both are superior to the other photometric normalization methods. An in-depth analysis of the two methods (LDCT and PS) revealed differences in their performance on individual images, suggesting that the methods provide complementary information. Drawing on their diversity, we propose to use them jointly to improve the results for face recognition under varying lighting conditions, while at the same time ensuring that good results are obtained for normally illuminated images. Significant improvements in performance are experimentally demonstrated in both face verification and identification frameworks.

This paper is organized as follows. Section 2 analyzes the effect of illumination changes on face images under the commonly used Lambertian model. Section 3
introduces the proposed photometric normalization method. Section 4 describes the experimental setup adopted to evaluate it. Section 5 discusses the experiments conducted to select the parameters of the proposed method. Section 6 compares the proposed method with some of the state-of-the-art photometric normalization methods. Section 7 presents a novel face verification scheme which combines the outputs of face recognition experts employing the proposed photometric normalization and the PS method, and reports on the experimental results. In Sec. 8, additional experiments with the proposal in a face identification framework are reported and compared with state-of-the-art methods on a face database with great variations in illumination conditions. Finally, Sec. 9 concludes the paper.
2. Compensation for Illumination Variations in the Low Frequencies
Let us consider an image acquisition system deploying a conventional camera. We assume that the imaged objects have a Lambertian surface. This assumption is reasonably well justified in the case of faces. We shall ignore the effect of interface reflection, which would distort only a small part of the face image due to the saturation of the camera. In any case, in regions giving rise to total reflection, the spectral content of the reflected light would be dominated by the illuminant, rather than the face skin, and would not provide useful information for discriminatory purposes.

Suppose the scene is illuminated by a spatially invariant illumination source of spectral distribution $e(\lambda)$, where $\lambda$ represents the wavelength of the incident light. Under the Lambertian assumption, the light emitted from the scene will be a function of the material properties of the scene objects, their albedo, which we denote by $\rho(x, y, \lambda)$, and the relative angles between the direction of illumination and the normal to the surface patch imaged by the $(x, y)$ pixel of a camera sensor. The effect of the geometry will be to scale down the incident light by a factor $s(x, y)$. The output of the sensor at pixel position $(x, y)$ will then be given by

$$I(x, y) = \int_{\lambda_1}^{\lambda_2} \sigma(\lambda)\,\rho(x, y, \lambda)\,s(x, y)\,e(\lambda)\,d\lambda, \qquad (1)$$

where $\sigma(\lambda)$ is the spectral response of the sensor and $\lambda_i$, $i = 1, 2$ are the limits of the visible frequency spectrum.

Assuming the response of the sensor is flat, $\sigma(\lambda) = c_1$, and that the illuminant also has a broad flat spectrum which can be approximated by $e(\lambda) = c_2$, the intensity image acquired by the camera will be given by

$$I(x, y) = c_1 c_2\, s(x, y) \int_{\lambda_1}^{\lambda_2} \rho(x, y, \lambda)\,d\lambda, \qquad (2)$$

which can be written as

$$I(x, y) = L(x, y)\,R(x, y), \qquad (3)$$
where $L(x, y)$ is the intensity of the incident light and $R(x, y)$ is the reflectance property of the surface material.

The reflectance, which represents the shape and texture of the surface, is unique for each face and is what, ideally, allows one individual to be discriminated from another. If we have the pixel intensity values of an image and the spectral distribution of the incident light (luminance) at each point is known, then the reflectance can be recovered. However, the luminance value will vary according to the geometry of the scene, the angle of incidence of the illuminant and the viewing angle. A priori knowledge of all these factors is possible only in a very controlled laboratory setting. It is not possible to have such information in real conditions, hence the identification of a person starting from a typical face image becomes a difficult task. Methods are needed to estimate the luminance component, or to compensate for variations in lighting conditions, in such a way that the discriminatory features related to reflectance can be recovered.
Most photometric normalization methods are based on Eq. (3), trying to eliminate the luminance value ($L$) and recover the reflectance ($R$) to discriminate between different faces. Successful methods, such as homomorphic filtering, make the assumption that the luminance changes slowly over the scene and is therefore a low frequency phenomenon, whereas reflectance, which characterizes skin texture, contributes a higher frequency content. Due to the multiplicative nature of the image generation model in Eq. (3), these two information sources can be easily separated by filtering in the logarithmic space, as in Eq. (4). Hence, several methods include a logarithmic transformation step to eliminate the effect of lighting:

$$\log I(x, y) = \log R(x, y) + \log L(x, y). \qquad (4)$$
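As a quick numerical illustration of this separability (a minimal sketch with synthetic luminance and reflectance arrays, not the paper's data), the multiplicative model of Eq. (3) becomes exactly additive after the logarithmic mapping of Eq. (4):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic positive luminance L and reflectance R fields (illustrative only).
L = 0.5 + rng.random((8, 8))   # slowly varying incident-light term L(x, y)
R = 0.5 + rng.random((8, 8))   # surface reflectance term R(x, y)

I = L * R                      # multiplicative image model, Eq. (3)

# In the log domain the two factors separate additively, Eq. (4).
log_I = np.log(I)
assert np.allclose(log_I, np.log(L) + np.log(R))

# A low pass estimate of log L can therefore be subtracted from log I,
# which is the basis of homomorphic-filtering style normalization.
```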
Although these photometric normalization methods are demonstrably effective, the underlying premise is somewhat flawed. Referring to Eq. (2), the luminance term is a byproduct of the incident light and the surface orientation. For a surface with a slowly varying surface normal, the luminance term will be a low frequency function. Indeed, to a first approximation, many researchers model the human head geometrically as a cylinder. Such a 3D profile would give rise to a smooth, low frequency luminance image which could easily be filtered out. However, the face contains morphological features such as eyes, nose, mouth, and wrinkles, which inject high frequency components into the luminance function. The shading caused by these surface undulations contributes information about the face's 3D structure which should be preserved to aid face discrimination.

Similarly, the reflectance term contains low and high frequency information. The dominant skin characteristic is of low frequency. The skin texture is basically homogeneous, changing very slowly over the face surface. However, in the locality of facial features, such as eyebrows, lips, eyes, skin defects, beauty spots and facial hair, the albedo changes rapidly, introducing a high frequency signal into the reflectance function.
The above analysis suggests that both the luminance and reflectance components of a face image contain low and high frequencies. Clearly this makes it difficult to separate the luminance effect from reflectance purely on the basis of frequency content. However, in both cases the high frequency content is the one containing the most important information for discriminative purposes, while the low frequency content varies with the lighting. Therefore, when low-frequency components of the image are eliminated, variations in lighting are compensated, but these components do not correspond only to luminance; they also contain information from the reflectance. To maximize the effectiveness of photometric normalization, the low pass filter has to be designed carefully, so that the high frequency discriminatory information content is not compromised.

In order to identify a suitable compensation method, let us investigate the effect of illumination conditions on the observed face image. In general the illuminant can be a very complex function. For simplicity, we shall assume that the scene is illuminated by a distant point source. We are less concerned with variations in intensity, which can be handled simply by scaling. More challenging is the effect of changes in the incident angle of the light.
In Fig. 1, some images corresponding to a 3D face surface with constant albedo, for different incident light angles, are shown. Differences between the images correspond to changes in the luminance component, since they come from the same face model with constant albedo. As can be appreciated, although the luminance images mostly exhibit the behavior of the incident illumination, they also depict information about the face surface. The effect of changes in the angle of the incident light over the 3D face surface can be appreciated in Fig. 2, which plots the variance of the log luminance function as a function of the incident light angle. We can see from the graph that for frontal, or near frontal, illumination the variance is low, but it increases dramatically for more pronounced side illumination.

From the above observations, the changes due to illumination, although slowly varying, may have influence over a broad frequency spectrum. Thus methods that strive to minimize the illumination effect by low frequency suppression6,26,29 will either fail to eliminate the effect of illumination fully, or will compromise the information content deemed useful for biometric analysis, depending on the bandwidth.

It follows that to maximize the effectiveness of photometric normalization, it is more convenient to carefully estimate and subtract the low frequency content affected by

Fig. 1. Sample luminance face images with constant albedo at different incident illumination angles.
illumination rather than suppressing it radically, so as not to compromise the discriminatory information. Moreover, it must be taken into account that image variations due to changes in the incident light do not affect all face regions uniformly;27 thus it is not convenient to apply a homogeneous filter to the complete image.

To put photometric normalization on a proper footing, we adopt an alternative model for image generation. We take a purely signal-processing viewpoint and consider a face image to be generated by amplitude modulation. Let $l(x, y)$ be a low frequency base signal and $h(x, y)$ the information-conveying high frequency signal. The slowly varying signal $l(x, y)$ captures jointly the low frequency luminance phenomenon, reflecting the global shape of the face, and the low frequency variations in albedo. $h(x, y)$, in the first instance, represents the rapid luminance changes produced by the surface properties of structural facial features, but in addition it models micro changes in albedo associated with surface markings and facial hair. The function $l(x, y)$ is assumed to be non-negative, whereas the modulating function $h(x, y)$ is bound to the interval $[-1, 1]$. The image $I(x, y)$ is generated by amplitude modulation as

$$I(x, y) = l(x, y)\,[h(x, y) + 1]. \qquad (5)$$

Under the constraints imposed on the modulating signal, Eq. (5) will produce a non-negative composite signal. The model in Eq. (5) is a quite realistic generator of face images. In poorly lit scenes the intensity variations induced by surface undulations and skin texture will be low, but in well lit environments the overall image brightness will be much higher and local image changes associated with facial features will have much greater contrast. The model does not differentiate between the face image appearance caused by structure and that caused by skin texture. In any case, for intensity
Fig. 2. Variance of the log luminance for a 3D face with constant albedo, as a function of the incident light angle.
images, these are inseparable and the model correctly reflects this. The implicit assumption of the model is that the function $l(x, y)$ is sensitive to illumination changes, whereas the function $h(x, y)$ is illumination invariant.

The multiplicative nature of amplitude modulation allows us to separate the base signal from the person-specific face information encoded by the modulating component. A logarithmic mapping transforms the composite signal into an additive mixture:

$$i(x, y) = \log I(x, y) = \log l(x, y) + \log[h(x, y) + 1]. \qquad (6)$$

Under the assumption that the base signal can be estimated, the low frequency variations can be removed from the compound signal to achieve photometric normalization.
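The generative model of Eqs. (5) and (6) can be checked numerically (an illustrative sketch; the base and modulating signals here are synthetic random fields, not derived from face data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative base and modulating signals.
l = 0.1 + rng.random((16, 16))          # non-negative low frequency base l(x, y)
h = rng.uniform(-0.9, 0.9, (16, 16))    # modulating signal h(x, y), bound to [-1, 1]

# Amplitude modulation model of Eq. (5).
I = l * (h + 1.0)
assert (I >= 0).all()                    # the composite signal stays non-negative

# The logarithmic mapping of Eq. (6) turns the product into an additive mixture,
# so an estimate of log l(x, y) can be subtracted to normalize the image.
i = np.log(I)
assert np.allclose(i, np.log(l) + np.log(h + 1.0))
```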
3. The Photometric Normalization Algorithm
In order to facilitate the optimization of the filtering, we shall represent the image content in the frequency domain. Different methods can be used to transform an image from the spatial domain to the frequency domain. The DCT is commonly used in signal and image processing because of its simplicity, low computational complexity and good energy compaction, being asymptotically equivalent to the Karhunen-Loève Transform (KLT) for Markov-1 signals with a correlation coefficient close to one.24
The DCT of an $M \times N$ image is defined as:

$$C(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} i(x, y) \cos\left[\frac{\pi (2x + 1) u}{2M}\right] \cos\left[\frac{\pi (2y + 1) v}{2N}\right], \qquad (7)$$

where

$$\alpha(u) = \begin{cases} \dfrac{1}{\sqrt{M}}, & u = 0, \\[2mm] \sqrt{\dfrac{2}{M}}, & u = 1, \ldots, M - 1, \end{cases} \qquad (8)$$

and

$$\alpha(v) = \begin{cases} \dfrac{1}{\sqrt{N}}, & v = 0, \\[2mm] \sqrt{\dfrac{2}{N}}, & v = 1, \ldots, N - 1. \end{cases} \qquad (9)$$
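These definitions can be implemented directly (a minimal numpy sketch; the helper names `dct_matrix`, `dct2` and `idct2` are ours, not from the paper). Because the basis defined by Eqs. (7)-(9) is orthonormal, the inverse transform is simply the transpose:

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II basis A[u, x] = alpha(u) cos(pi (2x+1) u / (2M)),
    with alpha(u) as in Eqs. (8) and (9)."""
    x = np.arange(M)
    u = np.arange(M).reshape(-1, 1)
    A = np.cos(np.pi * (2 * x + 1) * u / (2 * M))
    A[0] *= 1.0 / np.sqrt(M)       # alpha(0) = 1 / sqrt(M)
    A[1:] *= np.sqrt(2.0 / M)      # alpha(u) = sqrt(2 / M), u > 0
    return A

def dct2(i):
    """2D DCT of an M x N image, Eq. (7), written as C = A_M i A_N^T."""
    M, N = i.shape
    return dct_matrix(M) @ i @ dct_matrix(N).T

def idct2(C):
    """Inverse 2D DCT; the basis matrices are orthogonal, so the
    inverse is just the transpose."""
    M, N = C.shape
    return dct_matrix(M).T @ C @ dct_matrix(N)
```

With this normalization, `idct2(dct2(i))` recovers the image exactly, and `dct2(i)[0, 0]` equals the scaled sum that Eq. (10) gives for the DC coefficient.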
A method using the DCT to compensate for illumination variations was presented in Ref. 6. The authors proposed setting to zero the low-frequency DCT coefficients of an image in the logarithm domain as an approximation of the compensation term that needs to be subtracted from the image. This method outperformed many of
the existing methods dealing with illumination variations in comparisons on the Yale B database.6 Recently, different extensions of this method have been proposed.1,14 In all cases the aim has been to increase efficiency; none of them has shown better classification results. An example of a photometrically normalized face image using this method is shown in Fig. 3(b).

By setting the low frequency coefficients of the DCT to zero, we are able to create an ideal filter with a perfect transition band. However, this type of filtering creates ripples in the spatial domain. This effect is illustrated in Fig. 4, where an image containing a Dirac impulse intensity function, in Fig. 4(a), is transformed by a global DCT and the low frequency coefficients are zeroed. The image reconstructed in the spatial domain is shown in Fig. 4(b), in which the ripple extends over the whole image. For face images these ripples would have a perturbing effect on the subsequent methods of face description and classification. To minimize this phenomenon, we shall adopt a local filtering method as advocated in Ref. 17. The beneficial effect of local processing is apparent from Fig. 4(c), where the ripple is confined to the local window.
Fig. 3. (a) Original image and photometrically normalized images using (b) global DCT and (c) local DCT.

Fig. 4. A representation of the effect of removing the low frequencies of an image in a global way and in a local way. (a) Original image, (b) global process and (c) local process.
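The contrast that Fig. 4 illustrates can be reproduced in one dimension (an illustrative sketch; the signal length, block size and cutoff are arbitrary choices of ours). Zeroing low-frequency DCT coefficients of a Dirac impulse globally spreads a ripple over the whole signal, while doing so block by block leaves every block that does not contain the impulse untouched:

```python
import numpy as np

def dct_matrix(M):
    # Orthonormal 1D DCT-II basis, analogous to Eqs. (7)-(9).
    x = np.arange(M)
    u = np.arange(M).reshape(-1, 1)
    A = np.cos(np.pi * (2 * x + 1) * u / (2 * M))
    A[0] *= 1.0 / np.sqrt(M)
    A[1:] *= np.sqrt(2.0 / M)
    return A

def zero_low_freq(signal, n_low):
    # Zero the n_low lowest DCT coefficients and reconstruct.
    A = dct_matrix(signal.size)
    c = A @ signal
    c[:n_low] = 0.0
    return A.T @ c

# A Dirac impulse, as in Fig. 4(a).
s = np.zeros(64)
s[10] = 1.0

# Global filtering: the ripple spreads over the whole signal.
global_out = zero_low_freq(s, 4)

# Local filtering: process 16-sample blocks independently.
local_out = np.concatenate(
    [zero_low_freq(s[k:k + 16], 4) for k in range(0, 64, 16)])

# Blocks that do not contain the impulse are all-zero signals, so local
# filtering leaves them exactly zero: the ripple stays in one block.
assert np.abs(local_out[16:]).max() == 0.0
assert np.abs(global_out[16:]).max() > 1e-3
```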
The method in Ref. 17, which involved discarding low-frequency DCT coefficients in the logarithmic domain in a local way, was shown to improve the results obtained using the global DCT. In this method, the face image is divided into regular regions and the low-frequency DCT coefficients of each region are discarded. Uniform Local Binary Pattern (LBP) histograms2 are then computed for each region and used for classification. Note that the same region division is used for the normalization with the DCT and for the classification step using the LBP, so in this case the photometric normalization is tightly coupled with the image structure used for feature extraction and classification. This skillfully avoids the corrupting effect on the facial image representation of the blockiness of the photometrically normalized image obtained using a local DCT approach, which can be quite pronounced, as shown in Fig. 3(c). If the image block structures used for preprocessing and feature extraction are incongruent, the effect of the high frequency information injected by the block structure can be devastating and can negate any positive benefits of photometric normalization.

Unfortunately, the number of face image representation methods where this kernel congruency between photometric normalization and feature extraction exists naturally is severely limited. Thus the objective of our work is to develop a photometric normalization method which retains its local sensitivity without introducing any blocky artefacts. Such a method can be used with any feature descriptor or classifier, regardless of image partitioning.

Our proposed method is a modification of the earlier technique that deals with the blockiness artefact and, as a result, makes the photometrically normalized images usable by any general face representation approach. Moreover, we intend not to radically suppress the low frequency information.
Based on the model presented in Eq. (5), the first step of the method is to transform the image to the log intensity domain to obtain an additive mixture following Eq. (6). The second step consists of estimating the low frequency component in the log domain, which is based on the use of the local DCT. The reconstructed low frequency image is then subtracted from the image in the log domain to obtain the photometrically normalized image, which contains information only about the luminance changes produced by the surface properties of structural facial features and the micro changes in albedo. The photometrically normalized image can be restored to the original spatial domain; however, this is only a scale transformation, and it has been shown that it can introduce incorrect adjustments to the normalized values.6 Taking this into account, in our proposal the photometrically normalized images are recognized directly in the logarithmic domain.

In this process, estimating the compensation term based on low frequency components is fundamental to obtaining a suitable photometric normalization.
3.1. Estimating the low frequency component
Since we want to make use of the local information instead of the global one, the
face image is divided into rectangular blocks and the DCT is computed over them.
Using only the low-frequency coefficients of each block and setting the remaining ones to zero, a low pass version of the log image can be reconstructed by applying the inverse DCT.

In a DCT block, the top left coefficients, selected in a zig-zag scan manner, correspond to the low frequency information. However, the $C(0, 0)$ coefficient, usually called the DC coefficient, is related to the mean intensity value of the block, representing most of the energy of the region. This can be appreciated in Eq. (10), which is a simplification of Eq. (7), considering that the cosine of zero is one:

$$C(0, 0) = \frac{1}{\sqrt{M}\sqrt{N}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} i(x, y). \qquad (10)$$
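A small numerical check of Eq. (10) (with a synthetic block; the block size is arbitrary): at $u = v = 0$ every cosine in Eq. (7) equals one, so the DC coefficient reduces to a scaled block mean:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 8
blk = rng.random((M, N))             # one log-image block (synthetic)

# Eq. (7) at u = v = 0: all cosines are 1 and alpha(0) = 1/sqrt(M), 1/sqrt(N).
C00 = (1.0 / np.sqrt(M)) * (1.0 / np.sqrt(N)) * blk.sum()

# So the DC coefficient is just a scaled block mean, as Eq. (10) states.
assert np.isclose(C00, np.sqrt(M * N) * blk.mean())
```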
The effect of the DC coefficient can be seen in Fig. 5. The first row shows the DC values of each block of the logarithmically transformed images of 3D face models with a constant albedo for different persons, while the second row shows the DC values for images of the same model under different incident lighting. All images in the figure were resampled so that every pixel represents the DC value corresponding to an image block. As can be appreciated, although this coefficient reflects a local energy, the DC coefficients of a face image are highly related to the incident illumination. Because of their local computation, together they also contain some high frequency information associated with the luminance changes of structural facial features such as the nose, eyes and lips.

Since we want to estimate the low frequency component to subtract it from the image in the log domain, it is necessary to modify the DC coefficient of each block in a way that reflects changes in the incident illumination. From Fig. 5 it is apparent that, in images with normal incident illumination, the DC coefficients of the respective blocks show little dispersion in their values. In contrast, as the lighting variation increases, the differences in the DC coefficient values become greater. This
Fig. 5. A representation of the DC coefficient of each block for luminance images with constant albedo. (a) Different models with the same incident lighting. (b) Same model at different angles of incident lighting.
can be measured objectively in terms of the variance of the DC coefficients. In Fig. 6, the variance of the DC coefficients of the log luminance of a 3D face surface with constant albedo is plotted as a function of the incident light angle. It is apparent that the larger the illumination variation, the greater the dispersion of the DC values.

If a constant value, representing a "good" DC value, is subtracted from each DC coefficient, the resulting value represents the information injected into this coefficient by the variation in lighting. Then, to obtain the image which represents the low frequency component, we use the low-frequency DCT coefficients of each block, replacing the DC coefficient by its originally computed value minus a constant reference value.

The reference value was determined by computing the mean value of the DC coefficient over a set of 80 face images of different subjects under the same frontal incident light. In Fig. 7, the mean value obtained for each image and the overall mean value are plotted. As can be appreciated, for the same incident illumination, the mean of the DC values of images from different subjects is very similar. The overall mean value obtained for this frontal lighting will be used as a reference value to normalize all face images.

The reconstructed low pass image exhibits a block effect produced by the image subdivision. In order to reduce this effect, we apply a low pass smoothing filter to the reconstructed image before subtracting it from the original image in the logarithmic domain. Using a low pass filter over the reconstructed image does not contradict the aim of suppressing the low frequencies; on the contrary, it eliminates the high frequencies that can appear in the estimated compensation image.

The proposed procedure can be summarized in the following steps: (1) to apply the logarithmic transformation to the original face image, (2) to reconstruct the low
Fig. 6. Variance of the DC coefficient as a function of the incident illumination angle.
H. Méndez-Vázquez et al.
1360005-12
pass version of the log image using the low-frequency DCT coefficients and modifying the local DC value, (3) to smooth the resulting image and (4) to subtract the smoothed compensation term from the original image in the logarithmic domain. The effect of each step is illustrated in Fig. 8, which shows at the end the photometrically normalized image obtained with the proposed method.
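The four steps above can be sketched in code. This is a minimal illustration, not the paper's implementation: the smoothing of step (3) is omitted for brevity, the reference value `REF_DC` is an illustrative stand-in for the mean DC of the 80 frontally lit training faces, and the 8 × 8 block size and 15 retained coefficients anticipate the choices of Sec. 5.

```python
import math

BLOCK = 8      # block size selected in Sec. 5
N_COEFFS = 15  # low-frequency DCT coefficients kept per block (Sec. 5)
REF_DC = 35.0  # illustrative reference DC value (mean DC of frontally
               # lit training images in the paper)

def _c(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2(b):
    """Orthonormal 2D DCT-II of a square block."""
    n = len(b)
    return [[_c(u, n) * _c(v, n) * sum(
                b[x][y]
                * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2(C):
    """Inverse of dct2 (same orthonormal convention)."""
    n = len(C)
    return [[sum(_c(u, n) * _c(v, n) * C[u][v]
                 * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                 * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

def low_freq_positions(n, keep):
    """First `keep` (u, v) positions in a zig-zag (anti-diagonal) scan."""
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[1] if (p[0] + p[1]) % 2 else p[0]))
    return set(order[:keep])

def ldct_normalize(img):
    """Steps (1)-(4) of the method; pixels are assumed positive and the
    smoothing of step (3) is omitted here for brevity."""
    h, w = len(img), len(img[0])
    log_img = [[math.log(p) for p in row] for row in img]   # step (1)
    comp = [[0.0] * w for _ in range(h)]
    keep = low_freq_positions(BLOCK, N_COEFFS)
    for by in range(0, h, BLOCK):                           # step (2)
        for bx in range(0, w, BLOCK):
            block = [row[bx:bx + BLOCK] for row in log_img[by:by + BLOCK]]
            C = dct2(block)
            C = [[C[u][v] if (u, v) in keep else 0.0
                  for v in range(BLOCK)] for u in range(BLOCK)]
            C[0][0] -= REF_DC   # replace DC by (DC - reference value)
            rec = idct2(C)
            for i in range(BLOCK):
                for j in range(BLOCK):
                    comp[by + i][bx + j] = rec[i][j]
    # step (4): subtract the compensation term in the log domain
    return [[log_img[i][j] - comp[i][j] for j in range(w)]
            for i in range(h)]
```

A multiplicative change of the incident light becomes an additive constant in the log domain and is absorbed entirely by the per-block DC terms, so this normalization maps an image and a globally re-lit copy of it to essentially the same output.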
4. Experimental Setup
The XM2VTS database with the Lausanne protocol20 was used to evaluate the performance of the proposed photometric normalization method. The XM2VTS database contains 2360 images of 295 subjects, captured in four different sessions. The database is divided into a training set composed of images of 200 subjects as clients, an evaluation set (Eval) with images of the same 200 subjects as clients and of 25 additional subjects as imposters, and a test set with 70 further subjects as imposters. The training, evaluation and test sets are composed of images under controlled
Fig. 7. DC mean values of different face images under the same frontal illumination.
Fig. 8. An example of the effect of each of the steps of the preprocessing method: (a) original image, (b) logarithm transformation, (c) illumination compensation image with block effect, (d) smoothed compensation image and (e) result image after subtraction.
illumination conditions. There is an additional "dark" set which contains images of the same subjects illuminated from either the left or the right.
There are two configurations of the evaluation protocol known as the Lausanne protocol.20 Here, we use Configuration I, in which the images for training and evaluation are from the first three acquisition sessions. For training, three images per person are used, and the number of accesses or comparisons in the other subsets can be summarized as:
                     Eval     Test      Dark
Clients accesses     600      400       800
Imposters accesses   40 000   112 000   56 000
Total accesses       40 600   112 400   56 800
The Equal Error Rate (EER) is the point at which the False Rejection Rate (FRR) equals the False Acceptance Rate (FAR). The threshold obtained by the classification method at this point on the Eval set is used for the acceptance or rejection decision on the Test and Dark sets. The Total Error Rate (TER) is the sum of FRR and FAR; the lower this value, the better the recognition performance.
In our experiments, all face images were closely cropped to include only the face region. The extracted face images were geometrically normalized by the centers of the two eyes to a standard size of 120 × 144 (width × height) pixels.
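As a hedged illustration of how the operating threshold is derived (the paper does not give code for this, and `eer_threshold` is our name), the EER point can be located by sweeping a threshold over the client and imposter similarity scores of the Eval set:

```python
def eer_threshold(client_scores, imposter_scores):
    """Return (threshold, FAR, FRR) at the point where FAR and FRR are
    closest. Scores are similarities: accept when score >= threshold."""
    candidates = sorted(set(client_scores) | set(imposter_scores))
    best = None
    for t in candidates:
        frr = sum(s < t for s in client_scores) / len(client_scores)
        far = sum(s >= t for s in imposter_scores) / len(imposter_scores)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best
```

The threshold fixed this way on the Eval set is then reused unchanged on the Test and Dark sets, where TER = FAR + FRR is reported.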
4.1. Face description and classification
The Local Binary Pattern (LBP) operator is used for representing and classifying the photometrically normalized face images. This face image representation can be computed directly in the log image domain, as it is invariant to monotonic transformations. This helps to deal with any residual illumination problem.
The original LBP operator, introduced by Ojala et al. in Ref. 21, labels each pixel of an image with a value called the LBP code, a binary number that represents the pixel's relation to its 3 × 3 local neighborhood. Different extensions of the original operator have been used for face recognition.16
The first and most prevalently used LBP operator for face recognition was presented in Ref. 2. In this case, a neighborhood of eight pixels at a radius of two, (8, 2), is used to compute the LBP codes, but only those binary codes with at most two bitwise transitions from zero to one (01) or vice versa (10), called uniform patterns, are considered. The face image is divided into rectangular regions and histograms of the uniform LBP codes are calculated over each of them. The histograms of the regions are concatenated into a single augmented histogram which then represents the face image. A nearest neighbor classifier with the χ² dissimilarity measure is used to compare the histograms of two different images.
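A minimal sketch of the basic 3 × 3 LBP operator and the χ² histogram comparison follows; the descriptor used in the experiments additionally employs the (8, 2) neighborhood, uniform patterns and per-region histograms, which are omitted here for brevity:

```python
def lbp_code(img, i, j):
    """Basic 3x3 LBP code of pixel (i, j): threshold the 8 neighbors
    against the center and read them as a binary number."""
    center = img[i][j]
    # neighbors, clockwise from the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (di, dj) in enumerate(offs):
        if img[i + di][j + dj] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over the interior pixels."""
    hist = [0] * 256
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            hist[lbp_code(img, i, j)] += 1
    return hist

def chi2(h1, h2, eps=1e-10):
    """Chi-square dissimilarity between two histograms."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

Because each code only records whether a neighbor is above or below the center, the representation is unchanged by any monotonic gray-level transformation, which is why it can be applied directly in the log domain.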
One of the most recent advances of LBP in face recognition is the Multi-Scale LBP (MLBP) representation with Linear Discriminant Analysis (LDA).4 First,
LBP codes at R different radii or scales are computed over the face image, generating R LBP images. All the LBP images are of the same size and they are divided into the same number of nonoverlapping rectangular regions. The LBP histograms are computed for each region in every LBP image, and the histograms corresponding to the same region at different scales are concatenated into a single vector providing a multiresolution regional face descriptor. A regional discriminative facial descriptor is then defined by projecting the multiresolution regional face descriptors into an LDA space, in which a normalized correlation is used as a similarity measure to compare the projections corresponding to the same region of two different images. Finally, the similarities of the different regions are summed to obtain a global measure of similarity of two face images.
In this work, both the traditional LBP using the uniform patterns derived from a (8, 2) neighborhood and the recent MLBP representation, with the χ² dissimilarity measure and the normalized correlation in the LDA space respectively, are used to test the proposed normalization method.
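The per-region matching in the LDA space reduces to a normalized correlation, with regional similarities summed into the global score. A minimal sketch, in which plain lists stand in for the LDA-projected regional descriptors (the LDA training itself is not shown):

```python
import math

def normalized_correlation(u, v):
    """Cosine-style normalized correlation of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def face_similarity(regions_a, regions_b):
    """Sum of per-region normalized correlations; each element is an
    LDA-projected regional descriptor of one face image."""
    return sum(normalized_correlation(a, b)
               for a, b in zip(regions_a, regions_b))
```

Summing rather than averaging the regional similarities leaves the ranking of candidate identities unchanged while keeping the score accumulation trivially simple.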
5. Parameter Selection
There are some parameters that can be chosen to optimize the performance of the proposed algorithm. The first parameter is the size of the blocks into which the images are divided to apply the local DCT. If the block is too small, more computational effort is needed; on the other hand, the larger the block, the more it will be affected by the illumination variations. We chose to divide the image into w × w equally sized rectangular blocks and tested the method for different values of w to select the best one.
Another parameter is the number of DCT coefficients used from each block to estimate the low frequency content of the image, which is a crucial step in the proposed methodology. The DCT coefficients are usually scanned in a zig-zag manner, from the low frequencies to the high ones. We tested three different cut-off points in the list of coefficients ordered by the zig-zag scan for the different values of w. In Fig. 9(a), the performance of the proposed method with different w values and different numbers of DCT coefficients is presented for the three subsets of the XM2VTS database. Because of its simplicity, we used LBP+χ² to run the experiments for the parameter optimization. However, the general behavior of all the tested verification methods is similar, i.e. performance improvements are correlated.
As can be appreciated, the best performance in all cases is obtained by dividing the images into 8 × 8 blocks, which is also the traditional division for other applications of the DCT. The optimal number of coefficients is, as expected, more difficult to select; however, we decided to use 15 low-frequency DCT coefficients because this choice shows the most stable performance over the three sets and the different block sizes.
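The zig-zag ordering used to pick the low-frequency coefficients can be sketched as follows; an illustrative implementation, since the paper only specifies the standard zig-zag scan:

```python
def zigzag_order(n):
    """(u, v) index pairs of an n x n DCT block in zig-zag scan order:
    anti-diagonals of increasing u + v, traversed in alternating
    directions as in JPEG."""
    order = []
    for s in range(2 * n - 1):            # s = u + v, the anti-diagonal
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order
```

Note that `zigzag_order(8)[:15]` covers exactly the first five anti-diagonals (1 + 2 + 3 + 4 + 5 = 15 positions), which makes 15 a natural cut-off point in the ordered coefficient list.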
The final parameters to select are the filter and the kernel size for the smoothing operation applied to the estimated term. There are many smoothing filters defined in digital image processing. We tested an averaging filter
with a square and a circular kernel, as well as a Gaussian filter. These filters were chosen for their simplicity and widespread use. In Fig. 9(b), we report the TER of the proposed method, using the selected block size and number of coefficients, for different smoothing filters on the three subsets of the database. The x-axis plots the kernel size of the filters, h, from which, in the case of the Gaussian filter, the standard deviation can be obtained as σ = (h − 1)/2.
From the plots, one can appreciate that the three filters exhibit very similar performance. In general, as the size of the kernel increases, the classification error decreases. However, the larger the kernel, the higher the computational cost. The best performance on the Dark set was achieved using a circular averaging filter with a kernel of size 11, which also performs very well on the Eval and Test sets. This kernel size is also a good value for the computational complexity trade-off.
Fig. 9. Performance (TER) of the proposed method with (a) different w × w block divisions and numbers of DCT coefficients and (b) different smoothing filters of different sizes, on the Evaluation (top), Test (middle) and Dark (bottom) sets of the XM2VTS database.
Using the optimized parameters, we tested the proposed photometric normalization method (LDCT) with different combinations of face description and decision-making methods: LBP+χ², LBP+LDA, MLBP+χ² and MLBP+LDA. Table 1 shows the obtained total error rates and compares them with the results for the original images (OI) without any preprocessing. As can be appreciated, for all classifiers, although the performance for the well-illuminated images is somewhat degraded by LDCT preprocessing (a normal behavior of photometric normalization methods, as can also be seen in Table 2), the improvement achieved for the Dark set is significant. In general, the MLBP face description outperforms LBP, while LDA is the best classifier. The subsequent evaluation of the proposed method has been carried out exclusively with MLBP+LDA.
6. Comparison with Other Methods
The proposed method, in conjunction with MLBP+LDA, has been compared with state-of-the-art photometric normalization methods using the same configuration of the XM2VTS database. Table 2 shows the TER on each subset of the database for the original images (OI) and for well-known photometric normalization methods such as Histogram Equalization (HE), Homomorphic Filtering (HF), Self-Quotient Image (SQI) and Anisotropic Smoothing (AS), as well as the more recent approaches including the Total Variation Quotient Image (TVQI) and the Processing Sequence (PS).
Among the tested methods, PS shows the best results on the Dark set, followed by our LDCT method. On the other hand, on the Test set, where the images do not present large illumination variations, PS shows a slightly worse performance than LDCT. Since the performance of PS and LDCT is very close, we will compare these methods in more detail.
6.1. PS versus LDCT
The PS method was proposed by Tan and Triggs.29 It is composed of a series of steps aiming to reduce the effects of illumination variations, local shadowing and
Table 1. Comparison of different LBP-based classifiers in terms of TER (%).

                    Eval    Test    Dark
LBP+χ²      OI      10.3    7.12    95.7
            LDCT    16.0    11.8    65.2
LBP+LDA     OI      3.66    2.94    17.1
            LDCT    4.67    3.64    14.5
MLBP+χ²     OI      9.69    7.81    89.6
            LDCT    12.7    10.6    48.4
MLBP+LDA    OI      1.90    1.16    13.7
            LDCT    2.00    1.32    4.55
highlights, while still keeping the essential visual appearance information for use in recognition. The first step is to apply a gamma correction, a nonlinear gray level transformation replacing each pixel value I with I^γ, where γ > 0. The second step involves a Difference of Gaussians (DoG) filtering. This band-pass filter not only suppresses the low frequency information caused by the illumination gradient, but also reduces the high frequency noise. The final step is a global contrast equalization which re-normalizes the image intensities to standardize the overall contrast, where the large values are truncated and their influence is reduced. After this step, the image may still contain extreme values, so a nonlinear function that compresses very large values is optionally applied.
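The PS chain can be sketched as follows. This is a simplified illustration of the steps just described (gamma correction, DoG filtering, a one-stage contrast equalization with tanh truncation); the parameter defaults are typical values, not taken from this paper, and the real contrast equalization of Tan and Triggs has two normalization stages:

```python
import math

def gaussian_kernel(sigma):
    """Normalized 2D Gaussian kernel with radius about 3*sigma."""
    r = max(1, int(3 * sigma))
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]

def convolve(img, kernel):
    """Naive 2D convolution with replicated borders."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out

def preprocessing_sequence(img, gamma=0.2, sigma1=1.0, sigma2=2.0,
                           tau=10.0):
    """Simplified PS chain on a positive-valued gray image."""
    g = [[p ** gamma for p in row] for row in img]        # gamma correction
    lo = convolve(g, gaussian_kernel(sigma1))             # DoG band-pass
    hi = convolve(g, gaussian_kernel(sigma2))
    dog = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(lo, hi)]
    a = 0.1                                               # contrast equalization
    mean = (sum(abs(p) ** a for row in dog for p in row)
            / (len(dog) * len(dog[0]))) ** (1.0 / a)
    eq = [[p / (mean + 1e-8) for p in row] for row in dog]
    # truncate extreme values with a saturating nonlinearity
    return [[tau * math.tanh(p / tau) for p in row] for row in eq]
```

The final `tanh` stage bounds every output pixel to (−tau, tau), which is the "compression of very large values" mentioned above.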
Comparing PS and the proposed method, the most important difference between them is in the frequency information that is retained and suppressed in the main step of each algorithm. The first step of the two methods, gamma correction and logarithm transformation respectively, works in the same way, enhancing the dark image intensity values while compressing the bright ones. For both methods, the second step is the fundamental one. In PS, the DoG filtering attenuates the lower and higher frequencies, retaining the information in the middle of the frequency spectrum, while our method uses the low frequency DCT coefficients to construct the compensation term which is subtracted from the original log image, thus suppressing the low frequency content. The high frequency attenuation in PS could be the cause of its worse performance on well-illuminated images, since the important facial features mainly lie in the high frequency band. The subsequent steps in each method have different purposes: in the PS case, the filtered image is postprocessed to improve its overall contrast, while in LDCT, the compensation image constructed with the low frequency DCT coefficients is filtered to remove blockiness. These differences inject diversity which leads to different outputs being generated in each case.
Noting that PS and LDCT work differently but the total error rates achieved by them on the XM2VTS database are very similar, it was pertinent to check whether the specific misclassifications committed by each method were correlated. In Ref. 33, a statistical test, known as the z statistic, to determine whether two classifiers deliver different outputs is described.
Table 2. Comparison of different photometric normalization methods on the XM2VTS database using MLBP+LDA.

        Eval    Test    Dark
OI      1.90    1.16    13.7
HE      2.10    1.17    13.5
HF      2.35    1.35    12.7
SQI     2.31    1.84    11.6
TVQI    2.65    1.98    6.98
AS      2.08    1.50    6.15
PS      2.00    1.56    3.72
LDCT    2.00    1.32    4.55
Let us introduce the following notation:
n00 = number of samples misclassified by both PS and LDCT,
n01 = number of samples misclassified by PS but not by LDCT,
n10 = number of samples misclassified by LDCT but not by PS,
n11 = number of samples misclassified by neither PS nor LDCT.
This information is best represented as a confusion matrix.
The z statistic is defined as:

z = (|n01 - n10| - 1) / sqrt(n10 + n01).  (11)

If |z| > 1.96, we can say that the two methods do not have the same error (with a 0.05 probability of an incorrect decision).
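Equation (11) is straightforward to compute from the disagreement counts; a small sketch of this continuity-corrected, McNemar-style test:

```python
import math

def z_statistic(n01, n10):
    """z test of Eq. (11) on the disagreement counts: n01 samples
    misclassified only by PS, n10 only by LDCT."""
    return (abs(n01 - n10) - 1) / math.sqrt(n10 + n01)

def differ_significantly(n01, n10, critical=1.96):
    """True if the two classifiers' errors differ at the 0.05 level."""
    return abs(z_statistic(n01, n10)) > critical
```

Only the disagreements enter the statistic; the jointly correct (n11) and jointly wrong (n00) samples carry no information about which classifier errs more.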
In Table 3, we show the confusion matrix for each set of the XM2VTS database and, in Table 4, the corresponding z statistic.
Note that the statistical test in all cases is higher than 1.96. Thus the two methods misclassify images in different ways. A deeper analysis of the coincident misclassifications (n00), reported in Table 5, shows that for both methods, less than half of the incorrectly classified images are also misclassified by the other method.
Table 3. Comparison of the number of images misclassified by both methods in each set of the XM2VTS.
These results clearly show that the methodological differences between the two methods inject diversity into the outputs generated by the face recognition method. This diversity can be exploited to improve the recognition performance by multiple expert fusion, as discussed in the next section.
7. Classifier Fusion
It is well known that multiple classifier fusion is an effective method to improve the performance of pattern recognition systems. The prerequisite is that the component classifiers, the outputs of which are fused, provide complementary information. Classifier diversity can be achieved in many different ways. The options include different feature spaces, different classifiers, different metrics, and even different classifier parameter learning procedures. In our approach, the face recognition system, including its method of representation and matching, is the same for both component systems. The diversity is achieved by using different face image preprocessing techniques to perform photometric normalization. In particular, we combine the solutions obtained with the PS and LDCT preprocessing, as illustrated in Fig. 10.
The essential ingredients in multiple classifier fusion are the fusion architecture, score normalization, and the score fusion rule. In our work, we combine only two outputs, so there are no architectural issues. Score normalization is normally required if the fusion rule adopted is simple (untrained). For trained fusion rules, score normalization is not needed, as the appropriate weighting of the inputs is learnt during the fusion rule inference process. We opted for a simple fusion by a fixed rule, the sum, as we desire a solution that depends on training as little as possible. The sum fusion rule is known to be effective and also robust to noise.12 The use of a simple fusion rule avoids the problems of generalization to data sets affected by drift caused by various phenomena, such as illumination changes. As we use a single expert with two different inputs, the scores to be fused are in the same range and normalization is not strictly required.
Table 4. The z statistic computed on each set of the XM2VTS.

       Eval    Test     Dark
|z|    8.19    15.26    8.15

Table 5. Proportion of coincident misclassifications for the PS and LDCT methods.

        Eval      Test      Dark
PS      35.80%    36.28%    35.55%
LDCT    45.39%    46.92%    26.72%
Thus, let us denote the score delivered by the face recognition system for an input image photometrically normalized by LDCT as sLDCT, and that delivered for the same input with the PS preprocessing as sPS. The fused score is then simply given as

s = sLDCT + sPS.  (12)
The merit of this simple fusion method can be gleaned from Table 6. Using the proposed photometric normalization and classifier fusion scheme, a significant improvement in performance was achieved for all data sets, regardless of whether the images were affected by illumination variations or not.
Table 7 compares our proposal with the reported results of some state-of-the-art systems tested on the XM2VTS database under Configuration I, especially those reported on the Dark set.
The performance of 2.87% TER on the Dark set is very close to the best ever error rate reported on the Dark set in the ICB 2006 competition.19 However, the winning performance in the ICB 2006 competition was achieved by training the face
Table 6. Fusion results.

                   Eval    Test    Dark
PS         FAR     1.00    1.06    0.47
           FRR     1.00    0.50    3.25
           TER     2.00    1.56    3.72
LDCT       FAR     1.00    1.07    0.42
           FRR     1.00    0.25    4.13
           TER     2.00    1.32    4.55
PS+LDCT    FAR     0.90    0.97    0.37
           FRR     0.89    0.20    2.50
           TER     1.79    1.17    2.87
Fig. 10. Proposed combination scheme: the original image is normalized by both LDCT and PS, each result is classified with MLBP+LDA, and the two outputs are fused into the classification result.
recognition system on poorly illuminated face images. In our approach the improvement is achieved entirely through photometric normalization. This is of practical significance, as in real scenarios it would be impossible to collect representative data for all illumination conditions. Thus a solution that involves no training is preferable. In any case, the brute force machine learning approach has the disadvantage that the system performance on good quality data is degraded. This can be seen from the results in Table 7: the solution optimized for the Dark set (second entry, ICB06-Best) dropped in performance on well-illuminated images (Eval) from 1.63% to 2.35%.
8. Identification of Face Images with Variable Lighting Using the Proposed Combination Scheme
In order to corroborate the obtained results with the proposed combination scheme, we evaluate it in a face identification framework.
The Yale B Face Database9 is used to conduct the experiments. This database is widely used in the evaluation of face recognition methods that cope with illumination variations. It contains images of 10 subjects in 64 different lighting conditions, obtained with different angles between the light source direction and the camera axis. The larger the angle, the more unfavorable the lighting conditions. The database is usually divided into five subsets according to this angle. The face image of every subject with an angle of 0° between the incident light and the camera is used as gallery, and the recognition performance on each of the five subsets is tested.
In Table 8, the correct classification error using the proposed scheme is compared with some of the most important methods reported on this database. Although some methods exhibit good results on subsets S3 and S4, they do not
Table 7. TER of face recognition methods on the XM2VTS database.

                    Eval     Test    Dark
LBP MAP16           —        2.84    25.8
LBP AdaBoost16      —        7.80    71.20
LBP LDA11           —        9.12    18.22
LBP HMM11           —        2.74    19.22
AS LDA19            6.50     9.76    25.24
AS HMM19            10.50    8.38    24.00
ICB06-Best(a)19     1.63     0.96    —
ICB06-Best(b)19     2.35     —       2.02
PS+LDCT             1.79     1.17    2.87

(a) Trained and tested on well-illuminated images. (b) Trained and tested on variably illuminated images.
report on subset S5 (the most difficult one), as is the case of the gradient angle method.5
The Extended Yale B Database13 is a newer version of the database to which the face images of 28 subjects were added. Several novel algorithms have been evaluated on the extended database. Some of the most distinctive are those based on subspaces created from the quotient image, which compensate for illumination variations in the HSV color space (QI HSV),32 the Canonical Stiefel Quotient (CSQ)15 and the classification based on sparse representations (SRC).34 Some of them have presented their results with the same subdivision of the database presented before; others report only the average recognition rate obtained. Table 9 compares these methods with our proposal.
From the comparisons in Tables 8 and 9 it can be concluded that the proposed scheme, which combines the two photometric normalization methods, outperforms most of the relevant methods that have been proposed to deal with the illumination problem in face recognition, achieving very good results on the Yale B database.
Table 8. Correct classification error (%) for different algorithms on the Yale B database.

                                     S1    S2    S3     S4      S5
Illumination ratio image35           0     0     3.3    18.6    —
Linear subspaces3                    0     0     0      15.0    —
Illumination cones9                  0     0     0      8.6     —
9 light points13                     0     0     0      2.8     —
Gradient angle5                      0     0     0      1.4     —
Quotient illumination relighting25   0     0     0      9.4     17.5
QI HSV32                             0     0     0      8.3     15.7
Global DCT PCA6                      0     0     0      0.18    1.71
PS+LDCT                              0     0     0      0       0.51
Table 9. Comparison of recognition rates (%) of different algorithms on the Extended Yale B Database.

             S1     S2     S3       S4       S5       Average
CGHP15       —      —      54.28    32.63    15.65    —
Geodesic15   —      —      78.60    63.71    29.30    —
CSQ15        —      —      99.78    97.88    51.78    —
QI HSV32     100    100    93.75    90.63    84.37    93.75
PS LTP29     100    100    98.0     99.2     94.1     —
NN34         —      —      —        —        —        90.7
NS34         —      —      —        —        —        94.1
SVM34        —      —      —        —        —        97.7
SRC34        —      —      —        —        —        98.1
PS+LDCT      100    100    99.06    99.34    96.53    98.98
9. Conclusion
A new face image photometric normalization method based on the local DCT in the logarithmic domain has been proposed. A low pass version of the image is subtracted from the original face image to compensate for illumination variations. To construct the low pass image, a local DCT is applied to the original image in the logarithmic domain. A modified DC term and the low-frequency DCT coefficients are used to reconstruct the illumination compensating image by applying the inverse DCT.
The proposed LDCT photometric normalization process, in conjunction with the MLBP+LDA classification method, was tested on the XM2VTS face database. Compared to other preprocessing algorithms, our method achieved a very good performance, with a total error rate very similar to that produced by the PS method, the winning algorithm on the Dark set of the database, as shown in Table 2.
Despite the similarities in the average error rates of PS and LDCT, an in-depth analysis of the two preprocessing methods revealed notable differences in their behavior. The diversity in the observed performance of these two methods on individual images motivated a new recognition framework based on score level fusion. The proposed classifier fusion scheme, involving the LDCT photometric normalization method and PS, achieved a very good performance on all data sets of the XM2VTS database, regardless of whether the images were affected by illumination variations or not. The method was compared with the state-of-the-art systems tested on the XM2VTS database, and found to be comparable with the best method ever reported on the Dark set of the database, which requires training on poorly illuminated images and degrades on good quality images. Moreover, the method was tested in a face identification framework on the Yale B database and outperforms the most relevant approaches tested on it. The practical advantage of our approach, which is applicable without the need for any data collection or training, is extremely valuable.
Acknowledgment
This work was supported in part by the EU-funded Mobio project grant IST-214324
and TSB grant TP/6/ICT/6/S/K15331.
References
1. A. Abbas, M. I. Khalil, S. AbdelHay and H. M. A. Fahmy, Illumination invariant face recognition in logarithm discrete cosine transform domain, Int. Conf. Image Processing (ICIP) (2009), pp. 4157-4160.
2. T. Ahonen, A. Hadid and M. Pietikäinen, Face recognition with local binary patterns, European Conf. Computer Vision (ECCV 2004) (2004), pp. 469-481.
3. P. N. Belhumeur and D. J. Kriegman, What is the set of images of an object under all possible illumination conditions? Int. J. Comput. Vis. 28(3) (1998) 245-260.
4. C. Chan, J. Kittler and K. Messer, Multi-scale local binary pattern histograms for face recognition, Adv. Biometrics 4642 (2007) 809-818.
5. H. F. Chen, P. N. Belhumeur and D. W. Jacobs, In search of illumination invariants, IEEE Conf. Computer Vision and Pattern Recognition (2000), pp. 254-261.
6. W. Chen, M. J. Er and S. Wu, Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain, IEEE Trans. Syst. Man Cybern. B 36(2) (2006) 458-466.
7. T. Chen, W. Yin, X. S. Zhou, D. Comaniciu and T. S. Huang, Illumination normalization for face recognition and uneven background correction using total variation based image models, IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR) 2 (2005) 532-539.
8. B. Du, S. Shan, L. Qing and W. Gao, Empirical comparisons of several preprocessing methods for illumination insensitive face recognition, IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '05) 2 (2005) 981-984.
9. A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23(6) (2001) 643-660.
10. R. Gross and V. Brajovic, An image preprocessing algorithm for illumination invariant face recognition, 4th Int. Conf. Audio- and Video-Based Biometric Person Authentication (AVBPA03) (2003), pp. 10-18.
11. G. Heusch, Y. Rodriguez and S. Marcel, Local binary patterns as an image preprocessing for face authentication, in FGR '06: Proc. 7th Int. Conf. Automatic Face and Gesture Recognition (2006), pp. 9-14.
12. J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 226-239.
13. K. Ch. Lee, J. Ho and D. J. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Pattern Anal. Mach. Intell. 27(5) (2005) 684-698.
14. H. F. Liau and D. Isa, New illumination compensation method for face recognition, Int. J. Comput. Netw. Secur. 2(3) (2010) 5-12.
15. Y. M. Lui, J. R. Beveridge and M. Kirby, Canonical Stiefel quotient and its application to generic face recognition in illumination spaces, BTAS09: Third IEEE Int. Conf. Biometrics Theory, Applications and Systems (2009), pp. 1-8.
16. S. Marcel, Y. Rodriguez and G. Heusch, On the recent use of local binary patterns for face authentication, Int. J. Image Video Process., Special Issue on Facial Image Processing, IDIAP-RR06-34 (2006).
17. H. Méndez-Vázquez, E. García and Y. Condes, A new combination of local appearance based methods for face recognition under varying lighting conditions, in Progress in Pattern Recognition, Image Analysis and Applications, Lecture Notes in Computer Science, Vol. 5197 (Springer, 2008), pp. 535-542.
18. H. Méndez-Vázquez, J. Kittler, C. H. Chan and E. García, On combining local DCT with preprocessing sequence for face recognition under varying lighting conditions, in Progress in Pattern Recognition, Image Analysis and Applications, Lecture Notes in Computer Science, Vol. 6419 (Springer, 2010), pp. 410-417.
19. K. Messer, J. Kittler, J. Short, G. Heusch, F. Cardinaux, S. Marcel, Y. Rodriguez, S. Shan, Y. Su, W. Gao and X. Chen, Performance characterisation of face recognition algorithms and their sensitivity to severe illumination changes, in Proc. Int. Conf. Biometrics, ICB (2006), pp. 1-11.
20. K. Messer, J. Matas, J. Kittler and K. Jonsson, XM2VTSDB: The extended M2VTS database, Second Int. Conf. Audio and Video-Based Biometric Person Authentication (1999), pp. 72-77.
21. T. Ojala, M. Pietikäinen and D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recogn. 29(1) (1996) 51-59.
Photometric Normalization for Face Recognition using Local DCT
Int. J. Patt. Recogn. Artif. Intell. 2013.27. Downloaded from www.worldscientific.com by UNIVERSITY OF OTAGO on 09/30/13. For personal use only.
22. J. Phillips, T. Scruggs, A. O'Toole, P. Flynn, K. Bowyer, C. Schott and M. Sharpe, FRVT 2006 and ICE 2006 large-scale results, Technical Report, National Institute of Standards and Technology (NIST), March 2007.
23. Z. Rahman, D. Jobson and G. Woodell, Multi-scale retinex for color image enhancement, in Int. Conf. Image Processing (ICIP), Vol. III (1996), pp. 1003–1006.
24. K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications (Academic Press Professional, Inc., San Diego, CA, USA, 1990).
25. S. G. Shan, W. Gao, B. Cao and D. B. Zhao, Illumination normalization for robust face recognition against varying lighting conditions, in IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures: AMFG (2003), pp. 157–164.
26. J. Short, J. Kittler and K. Messer, A comparison of photometric normalisation algorithms for face verification, in FGR '04: Proc. 6th Int. Conf. Automatic Face and Gesture Recognition (AFGR) (2004), pp. 254–259.
27. J. Short, J. Kittler and K. Messer, Photometric normalisation for component-based face verification, in FGR '06: Proc. 7th Int. Conf. Automatic Face and Gesture Recognition, Washington, DC, USA (IEEE Computer Society, 2006), pp. 114–119.
28. T. Stockham, Image processing in the context of a visual model, Proc. IEEE 60(7) (1972) 828–842.
29. X. Tan and B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, in IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures: AMFG (2007), pp. 168–182.
30. M. Villegas and R. Paredes, Comparison of illumination normalization methods for face recognition, in Third COST 275 Workshop: Biometrics on the Internet (2005), pp. 27–30.
31. H. Wang, S. Z. Li and Y. Wang, Face recognition under varying lighting conditions using self quotient image, in FGR '04: Proc. 6th Int. Conf. Automatic Face and Gesture Recognition (AFGR) (2004), p. 819.
32. Y. H. Wang, X. J. Ning, C. X. Yang and Q. F. Wang, A method of illumination compensation for human face image based on quotient image, Inform. Sci. 178(12) (2008) 2705–2721.
33. A. R. Webb, Statistical Pattern Recognition, 2nd edn. (John Wiley and Sons Ltd, 2002), Chap. 8.3, pp. 266–271.
34. J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31(2) (2009) 210–227.
35. J. Zhao, Y. Su, D. J. Wang and S. W. Luo, Illumination ratio image: Synthesizing and recognition with varying illuminations, Pattern Recogn. Lett. 24(15) (2003) 2703–2710.
36. X. Zou, J. Kittler and K. Messer, Illumination invariant face recognition: A survey, in First IEEE Int. Conf. Biometrics: Theory, Applications, and Systems (BTAS), September 2007.
Heydi Méndez-Vázquez received her B.S. degree (with Honors) in Computing in 2005 and her Ph.D. in Automatic and Computing in 2010, both from the Polytechnic University of Havana (CUJAE), Cuba. Since 2005, she has been with the Biometric Research Group of the Advanced Technologies Application Center (CENATAV), in Cuba. Her research interests include computer vision, biometrics, face recognition and image processing. She is a member of the Cuban Association of Pattern Recognition (ACRP) and the International Association for Pattern Recognition (IAPR). She received a best student paper award at the 5th International Summer School on Biometrics endorsed by the IAPR, in Alghero, Italy, 2008, and was awarded one of the annual prizes by the Cuban Academy of Sciences in 2011.
Josef Kittler received his B.A., Ph.D., and D.Sc. degrees from the University of Cambridge in 1971, 1974, and 1991, respectively. He heads the Centre for Vision, Speech and Signal Processing at the School of Electronics and Physical Sciences, University of Surrey, U.K. He teaches and conducts research in the subject area of machine intelligence, with a focus on biometrics, video and image database retrieval, automatic inspection, medical data analysis, and cognitive vision. He published a Prentice-Hall textbook, Pattern Recognition: A Statistical Approach, and several edited volumes, as well as more than 600 scientific papers, including more than 170 journal papers. He serves on the Editorial Board of several scientific journals in pattern recognition and computer vision.
Chi Ho Chan received his Ph.D. from the University of Surrey, U.K. in 2008. He is currently a research fellow at the Centre for Vision, Speech and Signal Processing, University of Surrey. From 2002 to 2004, he served as a researcher at ATR International (Japan). His research interests include image processing, pattern recognition, biometrics, and vision-based human-computer interaction.
Edel García-Reyes received his B.S. degree in Mathematics and Cybernetics from Havana University in 1986 and his Ph.D. in Technical Sciences from the Technical Military Institute "José Martí" of Havana in 1997. Currently, he is working as a researcher at the Advanced Technologies Application Center (CENATAV). Dr. Edel has focused his research on digital image processing of remote sensing data, biometrics and video surveillance. He has participated as a member of both technical committees and experts groups and has been a reviewer for different events and journals such as Pattern Recognition Letters, Journal of Real-Time Image Processing, etc. Dr. Edel worked at the Cuban Institute of Geodesy and Cartography (1986–1995) and in the Enterprise Group GeoCuba (1995–2001), where he directed the Agency of the Centre of Data and Computer Science of GeoCuba Investigation and Consultancy (1998–2001).