
Characterising Virtual Eigensignatures for General Purpose Face Recognition

Daniel B Graham and Nigel M Allinson

Image Engineering and Neural Computing Group, Department of Electrical Engineering and Electronics, University of Manchester Institute of Science and Technology, Manchester M60 1QD, UK.

Abstract. We describe an eigenspace manifold for the representation and recognition of pose-varying faces. The distribution of faces in this manifold allows us to determine theoretical recognition characteristics which are then verified experimentally. Using this manifold a framework is proposed which can be used for both familiar and unfamiliar face recognition. A simple implementation demonstrates the pose dependent nature of the system over the transition from unfamiliar to familiar face recognition. Furthermore we show that multiple test images, whether real or virtual, can be used to augment the recognition process. The results compare favourably with reported human face recognition experiments. Finally, we describe how this framework can be used as a mechanism for characterising faces from video for general purpose recognition.

1 Introduction

One of the fundamental problems that challenge face recognition systems, whether human or machine, is that natural behaviour and conditions can significantly change the appearance of a facial image. Given a sufficient change in conditions it is most probable that, in the representation of the faces, the in-condition change will be greater than the in-person change and so lead to an incorrect identification or, at best, an incorrect rejection. The human face recognition system, whilst not invariant to such changes, can usually accommodate the most frequently occurring changes with reasonable success - the performance often decaying once condition changes tend towards the extreme. Many psychological experiments have attempted to capture the extent of the human ability to recognise faces under such changes as illumination, planar rotation, depth rotation, expression, disguise, photographic inversion and occlusion. This paper will describe a system which models the characteristic manner in which faces change over pose, and a method for developing an individual's characteristic which is dependent upon the degree of familiarity of that individual.

H. Wechsler et al. (eds.), Face Recognition, © Springer-Verlag Berlin Heidelberg 1998


Figure 1: Pose Varying Images

2 Face Representation

2.1 Pose Varying Eigenspace

The use of eigenspace methods for facial image analysis has been common since early papers by Sirovich and Kirby [9] and, the more often cited, Turk and Pentland [10]. The majority of such systems have shown that separating the shape information (e.g. by morphing) from the texture information yields additional performance enhancements, as in Costen et al. [2]. The view-based eigenspaces of Moghaddam and Pentland [4] have also shown that separate eigenspaces perform better than a single combined eigenspace of the pose-varying images. This approach is essentially several discrete systems (multiple observers) and so highly dependent upon the number of views chosen to sample the viewing sphere and on the accuracy of the alignment of the views. Producing an eigenspace from all of the different views (a pose varying eigenspace) can continuously describe an individual through the eigenspace in the form of a convex curve. This has been shown by McKenna et al. [3] for faces and by Murase and Nayar [6] for 3D objects. We state simply the standard eigenvalue formulation for an image set X, where X is formed by concatenating the rows of each image to produce a single column vector for each image, which are then placed into the rows of the matrix X (i.e. transposed).

The eigenvectors μ_a of X are described by the following equation:

X'X μ_a = λ_a μ_a    (1)

where λ_a is the eigenvalue corresponding to the eigenvector μ_a, and X'X is the covariance matrix of the image set X. The eigenvectors of this matrix can be obtained by diagonalising X'X. However, for image-sized matrices it is generally computationally easier to compute the eigenvectors of the alternative covariance matrix XX', which is of size N × N, where N is the number of images in the set, and use the relationship:

μ_a = (1/√λ_a) X' v_a    (2)


where v_a is the eigenvector of the covariance matrix XX'. The eigenvectors of the covariance matrix XX' are readily computable for small image sets in a reasonable amount of time, as standard algorithms are available to achieve the required diagonalisation. The eigenvalues of a novel image W are then given by:

ψ_a = μ_a' W    (3)

It is in the space spanned by these eigenvalues, ψ_a, that the consequent analysis is performed, and we will refer to this space as the eigenspace.
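Concretely, eqns 1-3 can be worked through in a few lines of array code. The sketch below uses random synthetic images; the sizes, the array names and the use of NumPy are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Synthetic stand-in for the image set: N images of d pixels, one per row of X.
rng = np.random.default_rng(0)
N, d = 40, 32 * 32
X = rng.standard_normal((N, d))

# Diagonalise the small N x N matrix X X' instead of the d x d matrix X'X.
lam, v = np.linalg.eigh(X @ X.T)
order = np.argsort(lam)[::-1]            # largest eigenvalues first
lam, v = lam[order], v[:, order]

# eqn 2: mu_a = (1 / sqrt(lam_a)) X' v_a gives the eigenvectors of X'X.
mu = (X.T @ v) / np.sqrt(lam)            # column a is mu_a

# Each mu_a is a unit eigenvector of X'X with the same eigenvalue lam_a.
a = 0
assert np.allclose(X.T @ (X @ mu[:, a]), lam[a] * mu[:, a])
assert np.isclose(np.linalg.norm(mu[:, a]), 1.0)

# eqn 3: project a novel image W onto the first ten eigenvectors.
W = rng.standard_normal(d)
psi = mu[:, :10].T @ W                   # the ten "eigenvalues" of W
```

The trick is that any eigenvector v_a of XX' maps to an eigenvector X'v_a of X'X with the same eigenvalue; dividing by √λ_a simply restores unit length.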

Our pose varying eigenspaces are constructed from images like those shown in Figure 1. An eigenspace constructed from such images captures the manner in which faces change over pose. Figure 2 shows the mean face and the effect of the first eigenvector (as in Valentin and Abdi [11]). As can be seen, the first eigenvector indicates the major face orientation, with negative values of the first eigenvector giving a more profile-oriented face and positive values a frontal view. This effect can be used as the basis of a pose determination system as in [3].

Figure 2: The effect of the first eigenvector (mean face, first eigenvector, and their combination)

Figure 3 shows the characteristic curves of the two individual faces in Figure 1 in this eigenspace as they rotate from profile to frontal view. Note that these curves are represented here by ten 3D points corresponding to the first three eigenvalues of the images in an eigenspace constructed from 40 images randomly sampled from a database of 563 pose varying images of 20 people.

We have called these loops in the eigenspace eigensignatures as each one corresponds uniquely to a specific individual.


Figure 3: Eigensignatures of two people (plotted against the first three eigenvectors, each axis spanning approximately -0.3 to 0.3, with the frontal ends of the curves marked)

2.2 Properties of this Manifold

We have seen in Section 2.1 that the faces in our database are represented in the pose-varying eigenspace as a convex curve. The properties of the distribution of faces in this manifold are of interest.

Figure 4: Average Distance between faces over Pose (average distance, approximately 0.14-0.19, plotted against pose angle, 10°-90°)

In particular, the distance between faces over pose allows us to predict the pose dependency of a recognition system that uses this manifold. If faces are further apart they will be easier to recognise using distance measures in the eigenspace. These measures can be used to set thresholds or confidence limits upon the identity of a test face.

Figure 4 shows the average Euclidean distance between the people in the database over the pose angles sampled (cubic fit). From this we can predict that faces should be easiest to recognise around the 60° range and, consequently, that the best pose samples to use for an analysis should be concentrated around this range. Additionally, we would expect faces to be easier to recognise at the frontal view than at the profile. The experiments of Section 3 will attempt to confirm these predictions.
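The quantity plotted in Figure 4 - the average pairwise distance between people at each pose - could be computed along the following lines; the coordinates here are random stand-ins shaped like the paper's data (20 people, ten poses, ten coefficients), so only the procedure, not the numbers, reflects the experiment:

```python
import numpy as np

# Stand-in eigenspace coordinates: sig[person, pose] is a 10-dim point.
rng = np.random.default_rng(1)
sig = rng.standard_normal((20, 10, 10))   # 20 people, 10 poses, 10 coeffs

P, V, _ = sig.shape
avg_dist = np.zeros(V)
for pose in range(V):
    pts = sig[:, pose]                            # (P, k) points at this pose
    diff = pts[:, None, :] - pts[None, :, :]      # pairwise differences
    dmat = np.linalg.norm(diff, axis=-1)          # (P, P) distance matrix
    avg_dist[pose] = dmat[np.triu_indices(P, k=1)].mean()

# Larger average distance at a pose implies easier recognition there;
# on the real data this peaks around the 60 degree views (Figure 4).
easiest_pose_deg = 10 * int(np.argmax(avg_dist))
```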

2.3 Virtual Eigensignatures

Consider the pose varying eigenspace described in Section 2.1, in which a unified pose/identity subspace is generated that captures the manner in which faces change over varying pose and quantifies the extent of that change in terms of distances in the subspace. It can be seen in Figure 3 that individuals all differ in this subspace but that each subject undergoes a characteristic motion through the subspace. As the motion we are capturing is the same in each case and the 3D structures of each subject are closely related, it is not unreasonable to assume that the general nature of these characteristic curves can be obtained, and that a curve may be estimated from a single given point.

Formally, the recognition of faces in previously unseen views requires a function f which maps a real point p to a virtual eigensignature T. This virtual eigensignature has a confidence factor δ(p) which depends upon the initial point p. That is:

f(p) = T, δ(p)    (4)

Given further real points p_i we can generate further virtual eigensignatures T_i, each with its own confidence factor δ(p_i). We can then combine virtual eigensignatures to produce a refinement of the virtual eigensignature which approaches the true eigensignature Ω. Given that the confidence factors δ(p_i) lie in the range [0,1], we can define the weight w_i for each virtual eigensignature:

w_i = δ(p_i) / √( Σ_{j=0}^{N} δ(p_j)² )    (5)

We can combine the eigensignatures to produce:

Ω ≈ Σ_{i=0}^{N} w_i T_i    (6)
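As a sketch, eqns 5 and 6 reduce to a couple of lines of array code; the confidence values and signature shapes below are invented for illustration:

```python
import numpy as np

# Four hypothetical virtual eigensignatures, each ten eigenspace points
# of dimension three, with confidence factors delta_i in [0, 1].
rng = np.random.default_rng(2)
T = rng.standard_normal((4, 10, 3))
delta = np.array([1.0, 0.6, 0.4, 0.2])

# eqn 5: w_i = delta_i / sqrt(sum_j delta_j^2), so the weights have unit norm.
w = delta / np.sqrt(np.sum(delta ** 2))

# eqn 6: refined eigensignature Omega ~ sum_i w_i T_i.
omega = np.tensordot(w, T, axes=1)        # shape (10, 3)
```

Note that, as observed below, this weighting is sub-optimal: a real point with δ = 1.0 is still diluted by the other signatures rather than being kept fixed.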


Note that this framework is independent of the chosen representation of the eigensignatures, the confidence factors and the weight function. Additionally, the weight definition (eqn 5) is sub-optimal, in that real points in the eigenspace (δ(p_i) = 1.0) should remain in the eigensignature and not be influenced by other points. The development of an algorithm for effectively combining multiple eigensignatures will be described in a later paper.

In order to investigate the above formulation we define an eigensignature as consisting of ten points in the eigenspace sampled from profile to frontal view in ~10° steps. Virtual eigensignatures T_i are generated from a test point p_i using a Radial Basis Function Network (RBFN - see Moody and Darken [5]) as the mapping function f. The RBFN was trained on one view (p_i) to produce the full eigensignature T_i. The output from the RBFN thus gives ten points in the eigenspace for an individual, which estimate the characteristic curve of that face in the eigenspace. Each RBFN was trained on the true eigensignatures of 19 of the subjects, and the remaining subject's eigensignature was generated from the RBFN to form a virtual eigensignature; this was repeated for each of the 20 subjects (leave-one-out cross-validation) to produce 20 virtual eigensignatures. To investigate the pose dependent nature of the method an RBFN was trained using each of the ten views (producing ten virtual eigensignatures per person) and the performance of each of these eigensignatures was compared. In total 200 virtual eigensignatures were produced.
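A minimal Gaussian RBFN of the kind cited (Moody and Darken [5]) can be sketched as follows; the choice of one centre per training point, a fixed width, and a least-squares output layer are our assumptions rather than the authors' exact configuration, and the data are random stand-ins:

```python
import numpy as np

# Stand-in training data: one eigenspace point (3 coefficients) per subject
# as input, the subject's full 10-point eigensignature (flattened) as output.
rng = np.random.default_rng(3)
k, n_train = 3, 19
X_in = rng.standard_normal((n_train, k))
Y_out = rng.standard_normal((n_train, 10 * k))

def rbf_design(X, centres, width):
    """Gaussian radial basis activations for each (input, centre) pair."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

centres, width = X_in, 1.0                           # one centre per example
Phi = rbf_design(X_in, centres, width)
W_out, *_ = np.linalg.lstsq(Phi, Y_out, rcond=None)  # linear output weights

def virtual_eigensignature(p):
    """Map a single test point p to a full 10-point virtual eigensignature."""
    return (rbf_design(p[None, :], centres, width) @ W_out).reshape(10, k)

T_i = virtual_eigensignature(rng.standard_normal(k))
```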

Recognition is performed by matching a test image (i.e. a test point in the eigenspace) to one of the virtual eigensignatures using a nearest neighbour Euclidean distance. Different metrics in this eigenspace, such as the Mahalanobis distance, have also been investigated and found to perform similarly. It should be noted that simple point matching in this formulation describes the base-level performance achievable. Matching with multiple, ordered, points - whether real or virtual - would improve the performance of such systems. See Section 3.4 for an example of this.
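Nearest neighbour matching of a test point against a bank of eigensignatures can be sketched as follows (the subject count and dimensions are illustrative, not the paper's):

```python
import numpy as np

# Hypothetical bank of eigensignatures: 20 subjects, ten poses, five coeffs.
rng = np.random.default_rng(4)
sigs = rng.standard_normal((20, 10, 5))

def identify(test_point, signatures):
    """Return the subject index whose signature contains the closest point."""
    dists = np.linalg.norm(signatures - test_point, axis=-1)  # (subjects, poses)
    return int(np.unravel_index(np.argmin(dists), dists.shape)[0])

# A point taken from subject 7's own signature matches subject 7 at distance 0.
assert identify(sigs[7, 4], sigs) == 7
```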

3 Experimental Results

3.1 Train/Test View Interaction

The performance of this approach is dependent on several factors as described in Section 2.3. Here we establish the baseline performance of the system by matching the real eigensignatures (omitted during the RBFN training) with the virtual eigensignatures generated by the RBFN. Table 1 shows the percentage of correct identifications at each train/test view. It can be seen from the Mean row that there is a clear advantage to testing at the 40° to 50° view. This is normally referred to as the 3/4 view and is often reported as the best performing pose in human face recognition experiments, e.g. Bruce et al. [1], Valentin et al. [12] and partially in Patterson and Baddeley [7].


Table 1: Train/Test View Interaction (Pose 0 = Profile, 90 = Frontal)

Training                        Test View
View      0    10    20    30    40    50    60    70    80    90
  0     100    90    60    40    20    20    20    15    15    15
 10      90   100    70    60    50    45    50    40    15    15
 20      35    65   100    70    30    20    30    20    20    25
 30      35    75    85   100    70    60    25    25    20    20
 40      35    25    40    70   100    80    55    50    25    30
 50      25    30    40    65    95   100    60    55    30    30
 60      15    15    30    30    60    85   100    80    55    45
 70      10    10    25    30    45    55    90   100    75    50
 80      15    20    35    20    20    30    45    70   100    65
 90      20    10    10    20    20    15    25    30    60   100

Mean     38    44  49.5  50.5    51    51    50  48.5  41.5  39.5

A similar result would be observed by averaging over testing view to determine the relative performance of each training view, but it was felt that such an interpretation would be biased in favour of the central views by the window effects of the data around the end views of 0° and 90°. As such it is difficult to determine the optimal training view. However, were we to assume that all tests were to be carried out in this pose range, we would have reason to suggest the 3/4 view as the preferred training view.

3.2 Multiple Training Images

Recognition of a face when having only seen one previous image of that face is classed as unfamiliar face recognition. As the number of images increases, the process tends towards familiar face recognition. The system presented in Section 2.3 provides a general purpose formulation for these two types of recognition. Section 3.1 has shown the baseline performance for unfamiliar face recognition and examined the pose dependent nature of the system. Here we examine the effect of increasing the number of training images used to form the refined eigensignature according to eqns 5 & 6. In a simple experiment we show the effect of increasing N (the number of virtual eigensignatures) and the pose dependency of this increase. For this evaluation we have used a confidence factor, centred around a test pose p_i, which decays sharply with distance from p_i. Namely:

(7)

where p_i is the pose used to train the RBFN. Table 2 shows the performance of this system as N increases, and how this performance varies over pose. The results shown are the percentage of correct identifications at each pose for every possible combination of N virtual eigensignatures from ten.

Table 2: Refined Eigensignature Performance (%)

                            Test View
 N      0     10     20     30     40     50     60     70     80     90    m_i
 1   27.0   32.0   40.5   45.0   45.5   54.5   49.5   48.0   43.5   36.0   42.1
 2   36.9   43.7   53.0   57.8   58.2   69.1   66.0   64.3   61.2   49.6   56.0
 3   44.5   52.9   61.5   65.4   68.3   78.3   75.8   74.8   73.9   59.4   65.5
 4   50.3   59.7   67.2   69.2   76.5   84.7   82.2   81.2   82.8   68.3   72.2
 5   56.1   65.5   72.0   72.3   83.1   89.3   86.6   85.7   87.7   76.0   77.4
 6   62.1   71.0   77.2   74.8   88.3   92.4   90.1   89.4   90.8   82.2   81.8
 7   68.4   77.2   82.8   78.3   92.7   95.1   92.8   92.4   93.0   87.6   86.0
 8   75.9   84.4   87.9   84.2   95.7   97.3   94.3   94.6   94.0   92.0   90.0
 9   81.0   89.5   93.0   91.5   98.0   98.5   94.5   95.5   93.5   96.0   93.1

m_N  55.8   64.0   70.6   70.9   78.5   84.4   81.3   80.6   80.0   71.9

As in Section 3.1 it can be clearly seen that, on average (m_N), the 50° test view outperforms all other views. There is also a clear trend of performance increasing with N (m_i). Furthermore we see a preference for testing at frontal views over profile views - another common observation in human face recognition experiments [1]. This preference is more pronounced for unfamiliar faces (low N) than for familiar faces (high N) - also noted in [1].

These results show the maximum performance increase obtainable with multiple training views as the multiple views are all pose-aligned on the test views. However, we would expect similar improvements in local test areas for non-aligned images due to the nature of the eigensignature combination (eqns 5 & 6) and the confidence factor (eqn 7).

3.3 Multiple Testing Images

The experiments described in Sections 3.1 & 3.2 have demonstrated the use of virtual eigensignatures for recognition, including the case where multiple training images are available. Conversely, in real world systems, the number of training images may be low and fixed whereas the number of test images may be large and variable (e.g. video monitoring). We show here the simple situation where multiple test images are used to produce a total Euclidean distance from which we again attempt recognition. There were 363 test images of the same twenty people in the database, none of which were used at any stage during the RBFN training. These images were taken at the same time and setting as the previous images but were considered to lie in views intermediate to the 10° views used in the previous experiments.
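The total-distance matching over several unordered test images of one person might be sketched as follows; the signature bank and the noise level are invented for illustration:

```python
import numpy as np

# Hypothetical bank of virtual eigensignatures: 20 subjects, 10 poses, 5 coeffs.
rng = np.random.default_rng(5)
sigs = rng.standard_normal((20, 10, 5))

def identify_multi(test_points, signatures):
    """Sum each test image's nearest-point distance per subject, then argmin."""
    total = np.zeros(len(signatures))
    for p in test_points:
        d = np.linalg.norm(signatures - p, axis=-1)  # (subjects, poses)
        total += d.min(axis=1)                       # best match per subject
    return int(np.argmin(total))

# Three noisy samples of subject 3's signature should still identify subject 3.
tests = sigs[3, [1, 4, 7]] + 0.05 * rng.standard_normal((3, 5))
assert identify_multi(tests, sigs) == 3
```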

Figure 5 shows the recognition improvements gained by using increasing numbers of test images for each of the virtual eigensignatures. The lines shown indicate the percentage of correct identifications for all possible combinations of N test images of the same subject, where N is shown below each line.


Figure 5: Use of Multiple Test Images (recognition rate, %, against training pose for N = 1 to 5 test images)

The solid line represents the best case of N = 5. As would be expected, there is a clear improvement from using additional test images. There is little change in the performance of the system at the frontal and profile ends of the training views as the number of images increases. However, there is a marked improvement (as N increases from 1 to 5) at the 50° training view of some 17% (compared with 2-3% at the extremes), providing further evidence for the preference of this view as the best training view to use, but with the same reservations as in Section 3.1. Conversely, we see that the performance is marginally better at profile than at frontal. It is thought that the manual alignment procedure and the actual quality of pose from each subject affect the performance at the frontal view.

3.4 Virtual Test Images

Section 3.3 has shown the minimum improvement available from the use of multiple real images which are not considered to be in any particular order. If we have, or can generate, multiple test images that are of known relative pose, then we may begin to utilise the nature of each eigensignature over a local region for recognition purposes. In this case we may calculate the total Euclidean distance between the ordered test images and an ordered section of the eigensignature.

To demonstrate this we take the 363 test images as used in Section 3.3 and, for each image, generate a further two virtual images by rotating the image ±10° using a cylindrical model of face shape. We then calculate the total Euclidean distance between these three images and each set of three ordered points from the true eigensignatures.
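A minimal version of such a cylindrical rotation might look like the sketch below; the cylinder radius and the nearest-neighbour column resampling are our assumptions rather than the paper's exact model:

```python
import numpy as np

def rotate_cylindrical(img, delta_deg, radius=None):
    """Approximate a head rotation by delta_deg using a cylindrical model:
    each image column sits at an angle on a cylinder, rotation shifts the
    angle, and columns are resampled (nearest neighbour) from the source."""
    h, w = img.shape
    radius = radius or w / 2.0
    xs = np.arange(w) - w / 2.0 + 0.5               # columns centred on 0
    theta = np.arcsin(np.clip(xs / radius, -1, 1))  # angle on the cylinder
    src_x = radius * np.sin(theta + np.radians(delta_deg)) + w / 2.0 - 0.5
    src = np.clip(np.round(src_x).astype(int), 0, w - 1)
    return img[:, src]

face = np.tile(np.arange(8.0), (8, 1))              # toy 8x8 "image"
left = rotate_cylindrical(face, -10.0)
right = rotate_cylindrical(face, +10.0)
assert left.shape == face.shape == right.shape
```

Rotating by 0° leaves the image unchanged, which is a useful sanity check on the angle arithmetic.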


Figure 6: Use of Ordered Virtual Test Images (recognition rate, %, against number of eigenvectors, 2-9, for "Real and Virtual" matching compared with "Real only")

Figure 6 shows the performance of this approach compared to simple point matching for increasing numbers of eigenvectors. From the graph we can see a clear improvement at all points - especially when using a small number of eigenvectors. For ten eigenvectors (as used in all other experiments in this paper) we see an improvement of approximately four percent.

4 Conclusion

We have described a manifold for the recognition of pose-varying faces and examined its properties. Experiments have shown that the proposed framework performs face recognition in a manner similar to reported human face recognition. The basis of the technique does not depend upon the sampling viewpoints, as long as the eigenspace constructed is sufficiently representative of the test viewpoints. Furthermore, an eigenspace that is representative of the test viewpoints need not represent any individual face image especially well. This is evident in the above experiments, where only ten eigenvectors are needed for near-ideal performance.

Future work on the technique will concentrate on the automatic construction of eigensignatures, image acquisition, eigenspace optimisation and intelligent matching algorithms. Given these components we can effectively construct an automatic visual system which could continually adapt using eqn 6 in an unconstricted manner. This approach will be used as the basis for a video surveillance system for monitoring and characterising individuals from their motion and behaviour. Initial experiments in this direction have shown that automatically constructed eigensignatures can classify as well as recognise, and further experiments will test the use of such eigensignatures.

References

[1] Bruce, V., Valentine, T., Baddeley, A. (1987) The Basis of the 3/4 View Advantage in Face Recognition. App. Cog. Psych., 1, 109-120

[2] Costen, P., Craw, I., Robertson, G., Akamatsu, S. (1996) Automatic face recognition: What representation? Computer Vision, ECCV'96, LNCS, Springer-Verlag, 1064, 504-513

[3] McKenna, S., Gong, S., Collins, J. (1996) Face Tracking and Pose Representation. British Machine Vision Conference, Edinburgh

[4] Moghaddam, B. and Pentland, A. (1994) Face Recognition using view-based and modular eigenspaces. SPIE, 2277, 12-21

[5] Moody, J. and Darken, C. (1989) Fast Learning in Networks of Locally-Tuned Processing Units. Neural Computation, 1, 281-294

[6] Murase, H. and Nayar, S. (1993) Learning Object Models from Appearance. Proc. of the AAAI, Washington, 836-843

[7] Patterson, K. and Baddeley, A. (1977) When Face Recognition Fails. J. of Exp. Psychology: Learning Memory and Cognition, 3(4), 406-417

[8] Pentland, A., Moghaddam, B., Starner, T. (1994) View-Based and Modular Eigenspaces for Face Recognition. IEEE Conf. CVPR, 84-91

[9] Sirovich, L. and Kirby, M. (1987) Low Dimensional procedure for the characterisation of human faces. J.O.S.A., 4(3), 519-525

[10] Turk, M. and Pentland, A. (1991) Eigenfaces for Recognition. J. of Cognitive Neuroscience, 3(2), 71-86

[11] Valentin, D. and Abdi, H. (1996) Can a Linear Autoassociator Recognize Faces From New Orientations. J.O.S.A., 13(4), 717-724

[12] Valentin, D., Abdi, H., Edelman, B. (1997) What Represents a Face: A Computational Approach for the Integration of Physiological and Psychological Data. Perception, 26

[13] Vetter, T. and Poggio, T. (1995) Linear Object Classes and Image Synthesis from a Single Example Image. TR 16, Max-Planck-Institut für biologische Kybernetik