Recognition of JPEG compressed face images based on statistical methods

S. Eickeler*, S. Müller, G. Rigoll

Gerhard-Mercator-University—Duisburg, Department of Computer Science, Faculty of Electrical Engineering, Bismarckstraße 90, 47057 Duisburg, Germany

*Corresponding author. Tel.: +49-2033791139; fax: +49-2033794363. E-mail address: [email protected] (S. Eickeler).

Received 29 September 1998; received in revised form 30 June 1999; accepted 13 July 1999

Abstract

A face recognition system based on 2D DCT features and pseudo 2D Hidden Markov Models is presented. An extension of the system is capable of recognizing faces by using JPEG compressed image data. Experiments to evaluate the proposed approach are carried out on the Olivetti Research Laboratory (ORL) face database. The recognition rates are 100% for the uncompressed original images and 99.5% for JPEG compressed domain recognition. A comparison with other face recognition systems evaluated on the ORL database shows that these are the best recognition results ever reported on this database. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Face recognition; Pseudo 2D hidden Markov models; Compressed domain recognition

1. Introduction

Face recognition has become one of the major topics in the research areas of image processing and pattern recognition in recent years. The applications are manifold, like access control, advanced human–computer interaction, video surveillance, and automatic indexing of image and video databases.

Many approaches to face recognition have been developed. Chellappa et al. present in Ref. [1] an overview of the different face recognition techniques. The most popular face recognition algorithm is the Eigenface Method [2,3]. The Bunch Graph Matching [4] is a relatively new technique. Several methods use Neural Networks for face recognition [5,6]. Various types of Hidden Markov Models (HMMs) are applied to face recognition in Refs. [7–11].

In this paper, we present an advanced face recognition system that is based on the use of pseudo 2D HMMs. We show that this approach is very suitable for face recognition problems due to the two-dimensional warping capabilities of pseudo 2D HMMs. We also demonstrate that we can further enhance our approach by using special features, such as improved initialization techniques, two-dimensional feature extraction methods or mirrored images. Another major novelty of our approach is the fact that our face recognition system works directly with JPEG-compressed face images. It directly uses the DCT features provided by the JPEG standard, without any necessity of decompressing the image before recognition. We consider this a major practical advantage compared to other face recognition systems.

This paper is organized as follows: Section 2 gives a brief introduction to Hidden Markov Models for the one-dimensional and the pseudo two-dimensional case. A short overview of the JPEG standard is given in Section 3. The proposed face recognition system is described in Section 4, while Section 5 presents the experimental setup and the results.

2. Hidden Markov Models

The proposed face recognition approach is based on pseudo two-dimensional Hidden Markov Models. These models are extensions of the one-dimensional models that are well known from speech recognition [12]. HMMs are statistical models that have several states. At each step, a transition to another state is performed according to a transition probability matrix, and a symbol is emitted according to a probability density function (pdf) assigned to each state. Fig. 1 shows a one-dimensional Hidden Markov Model with four states and assigned pdfs. Like artificial Neural Networks, HMMs can be trained on training data.

2.1. One-dimensional Hidden Markov Models

In the one-dimensional case a Hidden Markov Model consists of N states Q = \{s_1, s_2, \ldots, s_N\} and a two-dimensional transition matrix A = (a_{ij}), where

a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i), \quad 1 \le i, j \le N,   (1)

and q_t denotes the state at time t of the state sequence q = q_1, q_2, \ldots, q_T, q_t \in Q. The probability density function is a sum of Gaussian mixtures with the mean vector \mu_{kj} and the covariance matrix \Sigma_{kj} for state j and mixture k:

b_j(O_t) = \sum_{k=1}^{K} c_{kj} \, \mathcal{N}(O_t; \mu_{kj}, \Sigma_{kj})   (2)

The initial state distribution is determined by the vector \pi = (\pi_i), where

\pi_i = P(q_1 = s_i), \quad 1 \le i \le N.   (3)

A Hidden Markov Model is specified as \lambda = (A, B, \pi).

The main problem dealing with Hidden Markov Models is to compute the probability of an observation sequence O for the given model \lambda:

P(O \mid \lambda) = \sum_{q \in Q^T} \pi_{q_1} \prod_{t=1}^{T} b_{q_t}(O_t) \prod_{t=2}^{T} a_{q_{t-1} q_t}   (4)

This probability can be calculated efficiently by the Forward–Backward Algorithm.

In Ref. [12], two other algorithms are assigned to other important problems:

• Viterbi Procedure—determines the most likely state sequence of an HMM λ and its probability for a given observation sequence O.

• Baum–Welch Method—adjusts the model parameters of an HMM λ to maximize P(O | λ).

Both algorithms are closely related to the Forward–Backward Algorithm: the Viterbi Procedure is very similar to the Forward–Backward Algorithm, and the Baum–Welch Method is based on the Forward–Backward Algorithm.
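To make the relationship between these quantities concrete, the following is a minimal NumPy sketch of the Viterbi recursion in the log domain for a 1D HMM with a single Gaussian per state; the function and variable names are ours, not part of the paper, and the full mixture case of Eq. (2) is omitted for brevity.

import numpy as np
from scipy.stats import multivariate_normal

def viterbi_log(obs, log_pi, log_trans, means, covs):
    """Most likely state sequence and its log-probability for a 1D HMM.

    obs       : (T, D) observation sequence
    log_pi    : (N,)   log initial state distribution
    log_trans : (N, N) log transition matrix
    means     : (N, D) Gaussian mean per state (single mixture for brevity)
    covs      : (N, D, D) Gaussian covariance per state
    """
    T, N = obs.shape[0], log_pi.shape[0]
    # log emission probabilities b_j(O_t), one column per state
    log_b = np.stack([multivariate_normal.logpdf(obs, means[j], covs[j])
                      for j in range(N)], axis=1)          # (T, N)
    delta = log_pi + log_b[0]                               # best score ending in state j at t = 0
    psi = np.zeros((T, N), dtype=int)                       # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans                 # (N, N): from state i to state j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    # backtrack the most likely state sequence
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta.max())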

2.2. Pseudo 2D Hidden Markov Models

An extension of the HMMs to work on two-dimensional data is the pseudo 2D HMM [13,14]. The difference compared to real 2D HMMs is the fact that the state sequences in the columns are modeled independently of the state sequences of neighboring columns. Pseudo 2D HMMs are nested one-dimensional HMMs. A superior HMM models the sequence of columns in the image. Instead of a probability density function, the states of the superior model (superstates) have a one-dimensional HMM to model the rows inside the columns. Fig. 2 shows a pseudo 2D HMM with four superstates containing a three-state 1D HMM in each superstate. The probability density functions of the inferior models are omitted in this figure. The displayed pseudo 2D HMM has linear topologies for the superior and the inferior models. This means that only self transitions and transitions to the following (super)state are possible. The joint probability over all possible state sequences is

P(O \mid \lambda) = \sum_{q \in Q_X} \pi^{H}_{q_1} P_{q_1}(1) \prod_{x=2}^{X} a^{H}_{q_{x-1} q_x} P_{q_x}(x)   (5)

where

P_i(x) = \sum_{q \in Q^{i}_{Y}} \pi^{i}_{q_1} b^{i}_{q_1}(O_{x1}) \prod_{y=2}^{Y} a^{i}_{q_{y-1} q_y} b^{i}_{q_y}(O_{xy}),   (6)

and can be calculated by two nested Forward–Backward Algorithms.

After inserting column start markers into the pseudo 2D HMM, it can be transformed into an equivalent 1D HMM [8], and the two-dimensional observation can be converted into a one-dimensional sequence by scanning the columns and inserting a column start marker at each column.
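As an illustration of this conversion, here is a small sketch, under our own representation assumptions (the marker symbol and function name are hypothetical), of flattening the two-dimensional observation array into such a one-dimensional sequence:

import numpy as np

# COL_START is a hypothetical marker symbol; in the equivalent 1D HMM it would be
# emitted only by dedicated column-start states that separate the inferior models.
COL_START = None

def to_1d_sequence(obs_grid):
    """Flatten a 2D array of feature vectors O(x, y) into a 1D observation
    sequence by scanning column by column and inserting a start marker at
    the beginning of every column (sketch of the scheme described above)."""
    X, Y = obs_grid.shape[:2]
    seq = []
    for x in range(X):
        seq.append(COL_START)           # marks the transition to the next column
        for y in range(Y):
            seq.append(obs_grid[x, y])  # e.g. a 15-dimensional DCT feature vector
    return seq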


Fig. 1. One-dimensional Hidden Markov Model.

3. Overview of JPEG standard

The standard defined by the Joint Photographic Experts Group (JPEG) [15,16] is a commonly used method for lossy compression of photographic images. The JPEG compression is a transform coding: it reduces the visual information that is not important for the human eye, and it decorrelates the pixels and therefore eliminates redundancy. The distortions of the JPEG compression are blurring and blocking artifacts. For the human eye, both artifacts compensate each other somewhat at lower compression rates and are almost invisible, but they make automatic processing difficult, and the processing has to be adapted to the compression method.

The JPEG compression standard uses the block-based discrete cosine transform (DCT). The image is sampled using non-overlapping blocks of the size 8 × 8 pixels, which are transformed using the 2D DCT. The coefficients of the transformed block are quantized and then coded by a Huffman entropy encoder. Fig. 3 shows a typical block diagram of the JPEG encoder. The JPEG decoder is composed of the inverse processing steps in reverse order.
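For illustration, a minimal sketch of the first two encoder stages of Fig. 3 (block DCT and quantization) using SciPy; the flat quantization table in the usage line is a placeholder rather than the standard JPEG luminance table, and the entropy coding stage is omitted:

import numpy as np
from scipy.fftpack import dct

def jpeg_blocks(image, qtable):
    """Split an 8-bit grayscale image into non-overlapping 8x8 blocks,
    apply the 2D DCT and quantize the coefficients (entropy coding omitted)."""
    h, w = image.shape
    coeffs = []
    for r in range(0, h - h % 8, 8):
        for c in range(0, w - w % 8, 8):
            block = image[r:r + 8, c:c + 8].astype(float) - 128.0   # level shift
            # separable 2D DCT-II with orthonormal scaling
            d = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
            coeffs.append(np.round(d / qtable).astype(int))
    return coeffs

# usage sketch: quantized = jpeg_blocks(img, np.full((8, 8), 16.0))  # placeholder table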

4. System overview

The face recognition system consists of the feature extraction, which calculates the observation sequence from the input image, and the statistical classification, which is based on the Viterbi Algorithm or the Forward–Backward Algorithm. In the case of compressed domain face recognition, the scheme for the recognition is extended by the Huffman entropy decoding and the inverse quantization, which are adopted from a JPEG decoder. A block diagram of the recognition procedure is shown in Fig. 4. The block diagram is similar to a JPEG decoder, but the Viterbi Decoder with assigned models replaces the inverse DCT.


Fig. 2. Pseudo 2D Hidden Markov Model.

Fig. 3. JPEG encoder.

4.1. Feature extraction

The feature extraction is based on the DCT. The image is scanned with a sampling window (block) top to bottom and left to right. The pixels in this sampling window of the size 8 × 8 are transformed using the DCT according to the equation:

C(u, v) = \alpha(u) \alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \cos\!\left[\frac{(2x+1) u \pi}{16}\right] \cos\!\left[\frac{(2y+1) v \pi}{16}\right]   (7)

A triangle-shaped mask extracts the first 15 coefficients (u + v ≤ 4), which are arranged in a vector. The result of the feature extraction is a two-dimensional array of vectors O(x, y) with the dimensionality 15.
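A short sketch of this extraction for a single 8 × 8 sampling window, assuming SciPy's orthonormal DCT-II as the transform (which absorbs the α(u)α(v) factors of Eq. (7)); the names are ours:

import numpy as np
from scipy.fftpack import dct

# (u, v) index pairs of the triangular mask u + v <= 4: exactly 15 coefficients
MASK = [(u, v) for u in range(8) for v in range(8) if u + v <= 4]

def block_features(block):
    """15-dimensional DCT feature vector for one 8x8 sampling window."""
    c = dct(dct(block.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    return np.array([c[u, v] for u, v in MASK])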

The DCT decorrelates the subimages and allows the use of diagonal covariance matrices for the probability density functions of the HMMs. A test based on gray values of subsampled pixels as features showed that the recognition rate is on average 2% (absolute) below the recognition rate using the DCT features. Further evidence for the usefulness of the DCT is the fact that it is used for image compression in the JPEG and MPEG standards. The use of DCT coefficients allows the system to work directly on compressed image data.

The size of 8 × 8 pixels of the sampling window is used in this approach, because the JPEG image compression is based on this size. The use of 16 × 16 blocks gave similar recognition rates, but this paper will focus on a sampling window of the size 8 × 8 to reduce the number of parameters in the experiments.

An overlap between adjacent sampling windows improves the ability of the HMM to model the neighborhood relations between the windows. The effect of this overlap is somewhat comparable to the use of delta-features in speech recognition, and it includes redundant information in the features. Experiments showed that an overlap of 75% (6 pixels) in each direction gives the best results for various object recognition tasks.
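As a check on the scan geometry, the block counts reported in Table 2 follow directly from the window size and the scan step; a small sketch for the 92 × 112 ORL images used in Section 5 (function name is ours):

def n_blocks(length, window=8, step=2):    # step = window minus overlap in pixels
    return (length - window) // step + 1

# 75% overlap (step of 2 pixels): 43 x 53 blocks for a 92 x 112 image,
# 0% overlap  (step of 8 pixels): 11 x 14 blocks, matching Table 2
assert (n_blocks(92, step=2), n_blocks(112, step=2)) == (43, 53)
assert (n_blocks(92, step=8), n_blocks(112, step=8)) == (11, 14)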

The use of coefficients of the block-based DCT enables the recognition of JPEG compressed face images in the compressed domain. This has two main advantages:

• the inverse DCT and the feature extraction can be omitted;

• JPEG compression artifacts like blocking have less influence on the recognition.

The proposed feature extraction is compatible with the JPEG compression standard, except for the missing overlap of adjacent blocks in the JPEG standard, which would not be useful for achieving high compression rates. An overlap of 75% in each direction is used for the face recognition in the original domain. Two solutions for these conflicting specifications are possible: calculation of the missing blocks for the JPEG data, or use of non-overlapping blocks for recognition. The method to calculate overlapping blocks proposed in Ref. [17] is computationally more expensive than an inverse DCT and the subsequent DCT, and the blocking artifacts would affect the calculated blocks; both advantages of the compressed domain recognition would be lost. Therefore the second method must be analyzed in order to solve this problem. An evaluation of the effects of the block overlap will show if recognition without overlap is useful. In the experiment, the overlap of the blocks in the training data will be kept constant and the block overlap of the test data will be reduced. This will show that an overlap of 75% for the training data and an overlap of 0% for the test data is suitable for compressed domain face recognition.

4.2. Statistical classification

Fig. 4. JPEG face recognition and decoder.

The next step is the statistical classification based on Hidden Markov Models. A single HMM is trained for each person in the database using the Baum–Welch Algorithm and the training features. For the recognition, the Viterbi Algorithm or the Forward–Backward Algorithm is used to determine the probability of each face model for the test image. On the used database both algorithms gave the same recognition results, but the Viterbi Algorithm is faster than the Forward–Backward Algorithm and allows an automatic segmentation of the face image (Fig. 7). The image to be recognized is assigned to the person whose model has the highest production probability on the tested image.
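In code, this decision rule is simply an argmax over the per-person model scores; a minimal sketch, where score_model stands for any of the evaluation procedures above (its name and signature are our own):

def recognize(features, person_models, score_model):
    """Assign the test image to the person whose HMM yields the highest
    production (log-)probability for the observed feature array."""
    scores = {person: score_model(model, features)   # e.g. Viterbi log-likelihood
              for person, model in person_models.items()}
    return max(scores, key=scores.get)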

The Baum–Welch Algorithm, which is used for training the HMM for each person, provides the HMM parameters corresponding to a local maximum of the likelihood function, depending on the initial model parameters [12]. Therefore it is very important to use a good initial model for the training. We exploit the similarity of all faces compared to other objects and train a common initial model on all faces in the training set using the Baum–Welch Algorithm. This common model is refined on the training faces of one person to get the face model for that person. Fig. 5 shows an illustration of this training process. We consider this one of the main improvements of our recognition system compared to other pseudo 2D HMM approaches. A test of a 2D linear initialization or a flat start gave much lower recognition rates than the use of the common initial model for the training of the personal models.

The estimation of the Hidden Markov Model parameters needs as much training data as possible to estimate good models. In this approach we use the mirrored images of the training set to increase the amount of training data. In Ref. [2] it is shown that this exploitation of the symmetries of the human face can improve the face recognition.
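The overall training scheme of Fig. 5, including the mirrored images, can be sketched as follows; baum_welch, extract_features and the data layout are hypothetical placeholders, not the authors' implementation:

import numpy as np

def train_models(train_faces, init_model, baum_welch, extract_features):
    """Sketch of the training scheme in Fig. 5.

    train_faces      : dict mapping person id -> list of training face images
    init_model       : untrained pseudo 2D HMM used as the starting point
    baum_welch       : hypothetical routine refining a model on feature arrays
    extract_features : 2D DCT feature extraction of Section 4.1
    """
    # mirrored images double the training data and exploit facial symmetry [2]
    features = {p: [extract_features(img) for img in imgs] +
                   [extract_features(np.fliplr(img)) for img in imgs]
                for p, imgs in train_faces.items()}

    # 1) common initial model trained on the faces of all persons
    all_feats = [f for feats in features.values() for f in feats]
    common = baum_welch(init_model, all_feats)

    # 2) the common model is refined on the faces of each single person
    return {p: baum_welch(common, feats) for p, feats in features.items()}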

5. Experimental results

The presented recognition system is evaluated on the Olivetti Research Laboratory face database. This database contains 10 different images for each of 40 people. The images of the same person are taken at different times, under slightly varying lighting conditions and with different facial expressions. Some people are captured with and without glasses. The heads of the people in the images are slightly tilted or rotated. The images in the database are manually cropped and rescaled to a resolution of 92 × 112 pixels. The first five images of each person are used for training of the models, the remaining five images are used for testing them. This partitioning seems to be used in Ref. [8]. Other publications use a random partitioning for the training and test data, but the recognition results of these random partitionings cannot be directly compared to each other and cannot be verified, because the exact partitioning is unknown.

Fig. 5. Training of common initial model and personal models.

Table 1
Recognition results for varying numbers of states and Gaussian mixture components

States   1 Gaussian (%)   2 Gaussians (%)   3 Gaussians (%)
4 × 4    81.5             99.5              100.0
5 × 5    97.0             100.0             100.0
6 × 6    98.5             100.0             100.0
7 × 7    98.5             100.0             100.0
8 × 8    100.0            100.0             100.0

The recognition system is tested on different quadratic model sizes (4 × 4 to 8 × 8) with linear topology and one to three Gaussian mixture components for the probability density function. A model size larger than 8 × 8 is not useful because of the high computation time, which is proportional to (states_x × states_y)^2. Table 1 shows the top match recognition rates for the tested parameterizations. The maximum rate of 100% is achieved for most of the parameterizations. The recognition rates generally increase with the number of states and the number of mixtures.

In order to evaluate the effect of the overlap of adjacent blocks in the feature extraction, an experiment with different block overlaps is carried out. The overlap of 75% has emerged to give good recognition results in various applications and is kept for the extraction of the training features. The overlap of the sampling window for the extraction of the test features varies from 75% (6 pixels) to 0% (0 pixels). The size of the HMM is 7 × 7 states, which is a trade-off between recognition rate and required computation time, and one to three Gaussian mixtures are used. Table 2 shows the results of these experiments. For each overlap the resulting number of blocks is listed. The general result of this experiment is that the recognition rates decrease for a decreasing overlap. The parameterization of 7 × 7 states, three mixtures and 0% overlap is used for compressed domain recognition. Thus, we have been able to demonstrate that a high quality recognition of compressed images is possible if the training data has a sufficiently high overlap. This means that the system needs the original images in the training phase. But this is not a practical problem, because in the image acquisition procedure, the uncompressed image will always be available first. The major advantage of this system is the fact that it can be directly used on large databases that are only available in a JPEG-like format, including large compressed video and image databases.

Table 2
Recognition results for varying block overlap in the feature extraction of the test data for 7 × 7 states and 1–3 Gaussian mixtures

Overlap (%)   Blocks    1 Gaussian (%)   2 Gaussians (%)   3 Gaussians (%)
75.0          43 × 53   98.5             100.0             100.0
62.5          29 × 35   99.5             100.0             100.0
50.0          22 × 27   98.5             100.0             100.0
37.5          17 × 21   98.0             99.5              99.5
25.0          15 × 18   98.0             99.5              100.0
12.5          13 × 15   95.0             99.0              99.5
0.0 (JPEG)    11 × 14   94.5             98.5              99.5

Fig. 6. Recognition ratio versus compression ratio.

The configuration (7 × 7 states, three mixtures) determined in the previous experiment is used to evaluate the effect of the compression artifacts on the recognition rate. The compression of the face images is done with the Independent JPEG Group's (IJG) JPEG software. The Huffman decoding and the inverse quantization in Fig. 4 are adopted from this software. The recognition was performed on JPEG compressed images at all possible quality settings, from 100 for the best quality to 1 for the highest compression ratio. The result of the experiments is shown in Fig. 6, which plots the recognition rate versus the compression ratio. Additionally, the peak signal-to-noise ratio (PSNR) of the compressed image relative to the original image and the quality factor, which is a compression parameter used by the IJG encoder to control the compression ratio, are displayed. Up to a compression ratio of 7.5:1 the recognition rate is nearly constant at (99.5 ± 0.5)%. For compression ratios over 12.5:1 (quality factor < 10), the recognition rate drops below 90%.

6. Comparison with other systems

Table 3 shows a comparison of the different face recognition techniques on the ORL face database, sorted by the year of publication. The recognition rate of our system is higher than the rates of all other face recognition systems on this database. The table shows the recognition time as measured by each author. This time is measured on different computer systems and in different years, and allows only a rough comparison. The recognition time of our system was determined on a Pentium II/400 computer.

Table 3
Recognition results of different methods for the ORL face database

Method                                     Recognition rate (%)   Recognition time (s)   Reference    Year
top-down HMM + gray tone features          87.0                   –                      [7]          1994
Eigenface                                  90.5                   –                      [8]          1994
Pseudo 2D HMM + gray tone features         94.5                   240                    [8]          1994
Elastic matching                           80.0 (a)               –                      [18]         1997
Probabilistic decision-based Neural Net    96.0                   < 0.1                  [6]          1997
Convolutional Neural Network               96.2                   < 0.5                  [5]          1997
Continuous n-tuple classifier              97.3                   0.33                   [19]         1997
top-down HMM + DCT coefficients            84.0                   2.5                    [10]         1998
Point-matching and correlation             84.0                   4–6                    [20]         1998
Ergodic HMM + DCT coefficients             99.5                   3.5                    [11]         1998
Pseudo 2D HMM + DCT coefficients           100.0                  1.5                    This paper   1999

(a) Determined on a subset of the ORL database.

The first face recognition approach based on pseudo 2D HMMs is presented in Ref. [8] with a recognition rate of 94.5%. The main differences of our approach are the use of DCT features instead of gray values, the use of mirrored training images and the common initial model. Additionally, in the proposed approach the columns of the images are modeled by the superstates, while in Ref. [8] the rows are modeled by the superstates. We think that this has advantages for the recognition of a person tilting the head. The eyes, which are very important points in the human face, are not on the same level in the image in this case. This effect can be compensated by modeling the columns by the superstates. For more complicated databases this should be an advantage. Fig. 7 shows the segmentation of a tilted face. The regions of the face that were aligned to the same state are framed by white lines. The segmentation is more accurate than the segmentation in Ref. [8], but this segmentation is self-organizing and the result cannot be compared to the result of facial feature detecting methods.

Fig. 7. Self-organizing segmentation of a human face image.

The first compressed domain face recognition is presented in Ref. [21]. It was tested on the ORL database with manually cropped face images and achieved a recognition rate of 88%. The proposed method is able to work on uncropped images and has much better recognition results. The recognition rate of 99.5% for compressed domain processing is much higher than the recognition rates of most other systems in Table 3 and is as good as the second best system, which is also based on HMMs and DCT features, but operates on the original images.

7. Conclusions and future work

This paper presented a face recognition system based on pseudo 2D Hidden Markov Models and DCT features, which is capable of recognizing faces in the compressed domain. The recognition rate was 100% for the common test conditions of the face database maintained by the Olivetti Research Laboratory on original images and 99.5% for compressed domain recognition. This is the best recognition rate published on this database. A comparison with other face recognition systems evaluated on the ORL database showed the advantages over systems presented in other publications.

Our future work will focus on further improvements of our system and the use of a larger face database to test the system. A test on the more difficult Bochum database [4] with 111 different individuals showed a recognition rate of 97% (fb: 94%, 11°: 98%, 22°: 97%). We will try to improve the modeling of the faces. The use of a special parameter sharing scheme will improve the utilization of the face symmetries. In the future we will extend our system to work on MPEG compressed video data, motivated by the system presented in Ref. [22], which is currently only able to detect faces in MPEG videos. An incorporation of a compressed domain face detection algorithm will create a powerful image and video database indexing system.

References

[1] R. Chellappa, C.L. Wilson, S.A. Sirohey, Human and machine recognition of faces: a survey, Proceedings of the IEEE 83 (5) (1995) 705–740.
[2] M. Kirby, L. Sirovich, Application of the Karhunen–Loeve procedure for the characterization of human faces, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (1) (1990) 103–108.
[3] M. Turk, A. Pentland, Face recognition using eigenfaces, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 1991, pp. 586–591.
[4] L. Wiskott, J.-M. Fellous, N. Krüger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775–779.
[5] S. Lawrence, C.L. Giles, A.C. Tsoi, A.D. Back, Face recognition: a convolutional neural network approach, IEEE Transactions on Neural Networks 8 (1) (1997) 98–113.
[6] S.-H. Lin, S.-Y. Kung, L.-J. Lin, Face recognition/detection by probabilistic decision-based neural network, IEEE Transactions on Neural Networks 8 (1) (1997) 114–132.
[7] F. Samaria, A. Harter, Parameterisation of a stochastic model for human face identification, Proceedings of IEEE Workshop on Applications of Computer Vision, Sarasota, Florida, December 1994.
[8] F. Samaria, Face recognition using hidden Markov models, PhD thesis, Engineering Department, Cambridge University, October 1994.
[9] B. Achermann, H. Bunke, Combination of face classifiers for person identification, Proceedings of International Conference on Pattern Recognition, August 1996, pp. C416–C420.
[10] A.V. Nefian, M.H. Hayes III, Hidden Markov models for face recognition, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seattle, May 1998, pp. 2721–2724.
[11] V.V. Kohir, U.B. Desai, Face recognition using DCT-HMM approach, Workshop on Advances in Facial Image Analysis and Recognition Technology (AFIART), Freiburg, Germany, June 1998.
[12] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–285.
[13] O.E. Agazzi, S.-S. Kuo, Pseudo two-dimensional hidden Markov models for document recognition, AT&T Technical Journal 72 (5) (1993) 60–72.
[14] E. Levin, R. Pieraccini, Dynamic planar warping for optical character recognition, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Francisco, California, March 1992, pp. 149–152.
[15] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
[16] V. Bhaskaran, K. Konstantinides, Image and Video Compression Standards: Algorithms and Architecture, Kluwer Academic, Boston, MA, 1995.
[17] W. Kou, T. Fjallbrant, A direct computation of DCT coefficients for a signal block taken from two adjacent blocks, IEEE Transactions on Signal Processing 39 (7) (1991) 1692–1695.
[18] J. Zhang, Y. Yan, M. Lades, Face recognition: eigenface, elastic matching, and neural nets, Proceedings of the IEEE 85 (9) (1997) 1423–1435.
[19] S.M. Lucas, Face recognition with the continuous n-tuple classifier, Proceedings of British Machine Vision Conference, September 1997.
[20] K.-M. Lam, H. Yan, An analytic-to-holistic approach for face recognition based on a single frontal view, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (7) (1998) 673–686.
[21] N. Tsapatsoulis, N. Doulamis, A. Doulamis, S. Kollias, Face extraction from non-uniform background and recognition in compressed domain, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seattle, May 1998, pp. 2701–2704.
[22] H. Wang, S.-F. Chang, A highly efficient system for automatic face region detection in MPEG video, IEEE Transactions on Circuits and Systems for Video Technology 7 (4) (1997) 615–628.
