
Journal of Mobile Multimedia, Vol. 10, No. 3&4 (2014) 294–308 © Rinton Press

FUSION OF VISIBLE IMAGES AND THERMAL IMAGE SEQUENCES FOR AUTOMATED FACIAL EMOTION ESTIMATION

HUNG NGUYEN, FAN CHEN, KAZUNORI KOTANI
Japan Advanced Institute of Science and Technology

1-1 Asahidai, Nomi, Ishikawa, Japan
[email protected], [email protected], [email protected]

BAC LE
University of Science, VNU - HCMC, Vietnam

227 Nguyen Van Cu, Ho Chi Minh City, Vietnam
[email protected]

The visible image-based approach has long been considered the most powerful approach to facial emotion estimation. However, it is illumination dependent: under uncontrolled operating conditions, estimation accuracy degrades significantly. In this paper, we focus on integrating visible images with thermal image sequences for facial emotion estimation. First, to address limitations of thermal infrared (IR) images, such as occlusion by eyeglasses, which are opaque to IR, we apply thermal Regions of Interest (t-ROIs) to sequences of thermal images. Then, a wavelet transform is applied to the visible images. Second, features are selected and fused from the visible and thermal features. Third, decision fusion is applied, using the conventional methods, Principal Component Analysis (PCA) and the Eigen-space Method based on class-features (EMC), and our proposed methods, thermal Principal Component Analysis (t-PCA) and the norm Eigen-space Method based on class-features (n-EMC). Experiments on the Kotani Thermal Facial Emotion (KTFE) database show that the suggested methods yield a significant improvement, proving their effectiveness.

Keywords: facial emotions, thermal images, emotion estimation, feature fusion, decision fusion, thermal image sequences, t-PCA, n-EMC, KTFE database.

1 Introduction

The detection and estimation of human emotions is a challenging task. In the last decade, automated estimation of human emotions has attracted the interest of many researchers, because such systems have numerous applications in security, medicine, and especially human-computer interaction. Many previous works [1], [2], [3] have been directed towards facial expression estimation. Nevertheless, there is a lack of accurate and robust facial expression estimation methods that can be deployed in uncontrolled environments. When the lighting is dim or does not uniformly illuminate the face, the accuracy decreases considerably. Moreover, human emotion estimation based only on the visible spectrum has proved to be difficult in cases where emotion changes are not shown by expressions. Using thermal infrared (IR) imagery, which is not sensitive to lighting conditions, is a new and innovative way to fill this gap in the human emotion estimation field. Besides, human emotions may be manifested by changes in facial skin temperature, which can be captured by an IR camera. Consequently, thermal infrared imagery gives us more information to help us robustly estimate human emotions. Although there are many significant advantages to using IR imagery, it has several drawbacks. Firstly, thermal data are subject to change along with body temperature


caused by variable ambient temperatures. Secondly, the presence of eyeglasses may result in the loss of useful information around the eyes: glass is opaque to IR, and objects made of glass act as a temperature screen, completely occluding the parts located behind them. Hence, the sensitivity of IR imagery is decreased by facial occlusions. Thirdly, some facial regions are not responsive to emotion changes. To eliminate the effects of the challenging problems above, we propose the fusion of visible images and sequences of thermal images. To estimate seven emotions, we use the fusion of the conventional methods, PCA and EMC, and our proposed methods, t-PCA and n-EMC, over the obtained fusion features.

2 Related work

In recent years, a number of studies have demonstrated that thermal infrared imagery offers a promising alternative to visible imagery for facial emotion estimation, by better handling visible illumination changes. Jarlier et al. [4] extracted features as representative temperature maps of nine action units (AUs) and used K-nearest neighbors to classify seven expressions. Their test database has four persons, and the accuracy rate is 56.4%. Khan et al. [5] suggested using Facial Thermal Feature Points (FTFPs), defined as facial points that undergo significant thermal changes when presenting an expression, and used Linear Discriminant Analysis (LDA) to classify intentional facial expressions based on Thermal Intensity Values (TIVs) recorded at the FTFPs. Their database has sixteen persons with five expressions, and the accuracy rate ranges from 66.3% to 83.8%. Trujillo et al. [6] proposed a local and global automatic feature localization procedure to perform facial expression recognition in thermal images. They used PCA to reduce the dimensionality, interest-point clustering to estimate facial feature localization, and a Support Vector Machine (SVM) to classify three expressions. Hernandez et al. [7] used an SVM to classify the expressions “surprise”, “happy” and “neutral” from two inputs: the first input consists of a set of suitable regions where feature extraction is performed, and the second is the Gray Level Co-occurrence Matrix used to compute region descriptors of the IR images. Nhan et al. [8] extracted time, frequency and time-frequency features from thermal infrared data to classify natural responses in terms of subject-indicated levels of arousal and valence stimulated by the International Affective Picture System. Yoshitomi et al. [9] used two-dimensional detection of the temperature distribution on the face using infrared rays. Based on studies in the field of psychology, several blocks on the face are chosen for measuring the local temperature differences, and the facial expression is recognized with a Back Propagation Neural Network. The recognition accuracy reaches 90% for the “neutral”, “happy”, “surprising” and “sad” expressions; however, the testing database is obtained from only one female frontal view. Yoshitomi generated feature vectors by using a two-dimensional Discrete Cosine Transform (2D-DCT) to transform the grayscale values of each block in the facial area of an image into their frequency components, and used them to recognize five expressions, including “angry”, “happy”, “neutral”, “sad”, and “surprise”. The mean expression accuracy is 80% with four test subjects [10]. Koda et al. used the idea from [10] and added a method for efficiently updating the training data, by updating only the training data with “happy” and “neutral” facial expressions after an interval [11]; the expression accuracy increased from 80% to 87% with this new approach. All these studies with thermal infrared imagery have shown that changes in facial temperature are useful for estimating human emotions.

Recently, some attention has been paid to facial emotion estimation using fused information from visible and thermal imagery. Wang et al. [12] proposed both decision-level and feature-level fusion methods using visible and IR imagery. At the feature level, they used the


Active Appearance Model (AAM) to extract features, took three head-motion features as visible features, and calculated several statistical parameters, including the mean, standard deviation, minimum and maximum, as IR features. For feature selection, they used the F-test statistic. They also used Bayesian networks (BNs) and SVMs to perform the feature fusion. At the decision level, BNs and SVMs are used to classify three emotions: happiness, fear and disgust. Their results show an accuracy improvement of about 1.35% compared with using only visible features. Yoshitomi et al. [13] proposed decision-level fusion of voice, visible and IR imagery to recognize affective states. The DCT is used to extract the visible and IR features, and two neural networks are then trained on the obtained visible and IR features, respectively. For voice recognition, Hidden Markov Models (HMMs) are used, and the final result is decided by simple weighted voting. As this related work shows, there are only a few studies using fusion of visible and thermal imagery, and approaches that use features extracted from a single thermal infrared image may lose useful information contained in the sequence. Therefore, we consider two methods of facial emotion estimation that fuse visible images and sequences of thermal imagery, at the decision level and the feature level, respectively.

Fig. 1. t-ROIs.

3 Methods

In this section, we propose a feature fusion method that integrates visible images and sequences of thermal images through careful selection of representative features (i.e., t-ROIs) in Section 3.1, and two classification methods (t-PCA and n-EMC) together with a decision-level fusion method that automatically explores the best fusion weights of the features in Section 3.2.

3.1 Feature-level fusion

Before selecting features, we perform some preprocessing steps such as face normalization and noise reduction.

First, with sequences of thermal images, we find the regions of interest based on t-ROIs. In our definition, interest regions are regions in which the temperature increases or decreases significantly when human emotions change. We use the two regions which are the hottest and coldest regions of


Fig. 2. Wavelet decomposition at levels 1 and 2.

Fig. 3. Feature fusion of visible images and sequences of thermal images.

the face (excluding the eyeglasses), usually the forehead, eyeholes, and cheek-bone regions, as our interest regions. Before finding the t-ROIs, to avoid ambient temperature changes from frame to frame, we update the temperature of each point of each frame based on the difference between the mean ambient temperature of the current frame and the mean ambient temperature of the first $m$ frames. Let $h$ be a map from the face ($Fa \subset \mathbb{R}^2$) to the temperature space ($T \subset \mathbb{R}$):

$$h : Fa \to T, \qquad (i, j) \mapsto h(i, j).$$

We obtain the t-ROIs using the following equations:

$$\Delta T^{Fa} = T^{Fa}_{Max} - T^{Fa}_{Min}; \qquad \delta T^{Fa} = \Delta T^{Fa}/5$$

$$L^{Fa}_{k,idx} = \{(i, j) \in Fa \mid T^{Fa}_{Min} + \delta T^{Fa} \cdot (idx - 1) \le h(i, j) < T^{Fa}_{Max} - \delta T^{Fa} \cdot (5 - idx)\} \qquad (1)$$

where $T^{Fa}_{Max}$ and $T^{Fa}_{Min}$ are the maximum and minimum temperature of each human face at frame $k$, respectively, and $idx \in \{2, 5\}$.
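As a concrete illustration, the following sketch computes the two t-ROI masks of Eq. (1) for one thermal frame (a minimal sketch; the function name, the exact form of the ambient-temperature correction, and the array layout are our assumptions, not the paper's):

```python
import numpy as np

def t_roi_masks(frame, ambient_mean, first_m_ambient_mean,
                levels=(2, 5), n_levels=5):
    """Sketch of Eq. (1): split a face's temperature range into n_levels
    bands and keep the bands listed in `levels` (here 2 = the coldest band
    above the eyeglass level, 5 = the hottest band) as t-ROIs."""
    # Ambient-temperature correction: shift each pixel by the difference
    # between the current ambient mean and the mean of the first m frames.
    corrected = frame - (ambient_mean - first_m_ambient_mean)
    t_min, t_max = corrected.min(), corrected.max()
    delta = (t_max - t_min) / n_levels            # delta T^Fa in the paper
    masks = {}
    for idx in levels:
        lo = t_min + delta * (idx - 1)
        hi = t_max - delta * (n_levels - idx)
        masks[idx] = (corrected >= lo) & (corrected < hi)
    return masks
```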

After obtaining the t-ROIs for each frame, we find the dominance levels of the frames. A frame a is more dominant than a frame b if and only if the temperature change of all t-ROIs between frame a and frame a-1 is bigger than the temperature change of all t-ROIs between frame b and frame b-1. Based on the obtained dominance level of each frame, we can automatically set the weight for each frame.
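A minimal sketch of this frame-dominance weighting could look as follows (the normalization of dominance scores into weights is our assumption; the paper only states that the weights are set automatically from the dominance levels):

```python
import numpy as np

def frame_weights(frames, rois):
    """Score each frame by the total absolute t-ROI temperature change
    against the previous frame, then normalize the scores into weights.
    `rois[k]` is a boolean mask of the union of the t-ROIs of frame k."""
    changes = [0.0] + [
        float(np.abs(frames[k][rois[k]] - frames[k - 1][rois[k]]).sum())
        for k in range(1, len(frames))
    ]
    total = sum(changes) or 1.0
    return [c / total for c in changes]
```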

Second, with visible images, to eliminate the effects of noise and to omit unnecessary details, we use multiresolution analysis with the Antonini filter bank. After two levels of analysis with the 7/9 Antonini filter bank, we keep the coefficients of the LL subband of the wavelet transform [14]. Figure 2 shows the wavelet decomposition at levels 1 and 2 over our visible data.
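In a modern toolchain this step could be sketched with PyWavelets, substituting its CDF 9/7 biorthogonal wavelet ('bior4.4') for the Antonini filter bank (an assumption; the paper's exact filter implementation may differ):

```python
import numpy as np
import pywt  # PyWavelets

def ll_features(visible_img: np.ndarray) -> np.ndarray:
    """Two-level 2-D wavelet decomposition of a visible face image,
    keeping only the level-2 LL (approximation) subband as the feature."""
    coeffs = pywt.wavedec2(visible_img, wavelet='bior4.4', level=2)
    ll = coeffs[0]          # LL subband at the coarsest level (level 2)
    return ll.ravel()       # flatten into a feature vector
```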


Fig. 4. An example procedure for fusion of visible images and sequences of thermal images.

To select features from the visible and thermal features, we perform feature-level fusion of the visible and thermal images using t-ROIs and PCA.
Step 1. Find t-ROIs over the sequence of thermal images.
Step 2. Apply the wavelet transform to the visible facial images and keep the LL subband.
Step 3. Apply PCA to the accumulated weighted discriminant t-ROIs.
Step 4. Build a matrix from the feature vectors obtained in Steps 2 and 3.
Step 5. Use PCA, EMC, PCA-EMC, t-PCA and n-EMC to classify emotions.

Figure 4 shows an example procedure for the fusion of visible images and sequences of thermal images; a minimal code sketch of the same pipeline follows.
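The sketch below strings the five steps together, under the assumption that the t-ROI masks and frame weights have already been computed (all names are illustrative, not from the paper):

```python
import numpy as np
import pywt

def fuse_features(thermal_seq, roi_masks, frame_w, visible_img):
    """Hedged sketch of the five-step feature-level fusion (Fig. 4)."""
    # Step 2: wavelet transform of the visible image; keep the level-2 LL.
    ll = pywt.wavedec2(visible_img, 'bior4.4', level=2)[0].ravel()

    # Steps 1 & 3: accumulate the weighted t-ROI temperatures over the
    # sequence (t-ROIs assumed precomputed), then center the result; a PCA
    # projection fitted on the training set would be applied at this point.
    acc = sum(w * np.where(m, f, 0.0).ravel()
              for f, m, w in zip(thermal_seq, roi_masks, frame_w))
    thermal_feat = acc - acc.mean()

    # Step 4: stack visible and thermal features into one fused vector;
    # Step 5 feeds such vectors to PCA / EMC / t-PCA / n-EMC classifiers.
    return np.concatenate([ll, thermal_feat])
```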

3.2 Decision-level fusion

To estimate human emotions, we use a decision fusion method combining three conventional methods (PCA, EMC and PCA-EMC) and our proposed methods, thermal Principal Component Analysis (t-PCA) and the norm Eigen-space Method based on class features (n-EMC).

With PCA, the aim is to build a face space, consisting of basis vectors called principal components, which better describes the face images [15]. The difference between PCA and EMC is that PCA finds the eigenvectors that maximize the total variance of the projection, while EMC obtains the eigenvectors that maximize the difference between the between-class and within-class variances [16]. Figure 5 shows the advantage of EMC over PCA.
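For concreteness, the two objectives can be written as follows (a standard formulation with total, between-class and within-class scatter matrices $S_T$, $S_B$, $S_W$; the notation here is ours, not the paper's):

```latex
% PCA: maximize the total variance of the projection
u_{\mathrm{PCA}} = \arg\max_{\|u\|=1} u^{\tau} S_T \, u
% EMC: maximize between-class minus within-class variance of the projection
u_{\mathrm{EMC}} = \arg\max_{\|u\|=1} u^{\tau} (S_B - S_W)\, u
```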

Figure 6 shows the procedure of estimating human emotions using PCA. Figure 7 shows theprocedure of estimating human emotions using EMC.

To analyze human emotions using thermal infrared images, we propose two methods: thermal Principal Component Analysis (t-PCA) and the norm Eigen-space Method based on class features (n-EMC).


Fig. 5. Examples of an eigenvector of PCA and EMC [17].

Fig. 6. Estimation of emotion using PCA.

Fig. 7. Estimation of emotion using EMC.


To illustrate the feasibility of using an eigen-space to fulfill the facial emotion estimation task, thermal PCA (t-PCA) is adapted from the PCA reconstruction method and evaluated over thermal data [15]. With PCA, the aim is to build a face space, consisting of basis vectors called principal components, which better describes the face images. PCA has several advantages over other face recognition schemes in its speed and simplicity [15]. We modified PCA to estimate facial emotions from thermal data.

Let $F$ be the set of classes to be analyzed; here $F$ is the set of all emotion classes. Assume that $M^f$ thermal frames of training data are given as the facial temperature pattern for each class $f \in F$, where $F = \{$anger, disgust, fear, happiness, neutral, sadness, surprise$\}$. Let $\Gamma^f_i$ be the $i$-th facial temperature pattern, $i = 1,\dots,M^f$; the dimension of $\Gamma^f_i$, $n \times m$, is equal to the number of pixels in a thermal frame, and each element of $\Gamma^f_i$ indicates the temperature of a pixel. Compute the mean of the training data $\Psi^f = \frac{1}{M^f}\sum_{i=1}^{M^f}\Gamma^f_i$ and let the normalized vector be $\Phi^f_i = \Gamma^f_i - \Psi^f$. We seek a set of orthonormal vectors $u^f_k$ which best describes the distribution of the training data. The $k$-th vector $u^f_k$ is chosen so that

$$\lambda^f_k = \frac{1}{M^f}\sum_{i=1}^{M^f}\big((u^f_k)^{\tau}\Phi^f_i\big)^2 \qquad (2)$$

is a maximum, subject to $(u^f_l)^{\tau}u^f_k = 1$ if $l = k$, and $0$ otherwise. The eigenvectors and eigenvalues of this problem are the vectors $u^f_k$ and scalars $\lambda^f_k$ of the covariance matrix $C^f = \frac{1}{M^f}\sum_{i=1}^{M^f}\Phi^f_i(\Phi^f_i)^{\tau} = A^f(A^f)^{\tau}$, where $A^f = [\Phi^f_1, \Phi^f_2, \dots, \Phi^f_{M^f}]$. Indeed, from (2),

$$\lambda^f_k = \frac{1}{M^f}\sum_{i=1}^{M^f}\big((u^f_k)^{\tau}\Phi^f_i\big)\big((u^f_k)^{\tau}\Phi^f_i\big)^{\tau} = (u^f_k)^{\tau}\Big(\frac{1}{M^f}\sum_{i=1}^{M^f}\Phi^f_i(\Phi^f_i)^{\tau}\Big)u^f_k = (u^f_k)^{\tau}C^f u^f_k,$$

and hence

$$\lambda^f_k u^f_k = C^f u^f_k. \qquad (3)$$

The size of the covariance matrix $C^f$, $nm \times nm$, is too large to determine the $nm$ eigenvectors and eigenvalues directly. Turk et al. [18] suggested a computationally feasible method to find these eigenvectors. Let $H^f = (A^f)^{\tau}A^f$; then the size of $H^f$ is $M^f \times M^f \ll nm \times nm$. Let $\nu^f_i$ denote the eigenvectors of $H^f$. We have

$$(A^f)^{\tau}A^f\nu^f_i = \mu^f_i\nu^f_i \;\Rightarrow\; A^f(A^f)^{\tau}A^f\nu^f_i = \mu^f_i A^f\nu^f_i \;\Rightarrow\; C^f A^f\nu^f_i = \mu^f_i A^f\nu^f_i. \qquad (4)$$

From Equation (4), $A^f\nu^f_i$ is an eigenvector of $C^f = A^f(A^f)^{\tau}$ [18]. Therefore we can obtain the $\rho^f$ eigenvectors of $C^f$ by calculating the $\rho^f$ ($\rho^f \ll nm$) eigenvectors $\nu^f_i$ of the matrix $H^f$ and multiplying $A^f$ by $\nu^f_i$.
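In code, the trick amounts to an eigendecomposition of the small matrix $H^f$ followed by a mapping through $A^f$ (a minimal numpy sketch; variable names are illustrative):

```python
import numpy as np

def eigenfaces_small_trick(A: np.ndarray, rho: int) -> np.ndarray:
    """Turk-and-Pentland trick: A has shape (nm, M) with one centered
    training vector per column. Eigenvectors of the small M x M matrix
    H = A^T A, mapped through A, give eigenvectors of C = A A^T."""
    H = A.T @ A                                   # M x M instead of nm x nm
    mu, nu = np.linalg.eigh(H)                    # eigenvalues ascending
    order = np.argsort(mu)[::-1][:rho]            # keep the top-rho components
    U = A @ nu[:, order]                          # map back: u_i = A nu_i
    U /= np.linalg.norm(U, axis=0, keepdims=True) # re-normalize the columns
    return U                                      # shape (nm, rho)
```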


Fig. 8. Estimation of emotion using decision fusion.

After obtaining the eigenfaces from the training data of each emotion, we map the facial thermal training data to the feature spaces by $\omega^f_i(\mathrm{train}) = (u^f_i)^{\tau}(\Gamma^f - \Psi^f)$, $i = 1,\dots,\rho^f$. We use the idea that if the input frame is similar to the training set of some emotion, the reconstructed data will have less distortion than the data reconstructed from the eigenvectors of the other training emotions [15]. For each testing facial thermal frame $\Gamma_{\mathrm{test}}$, firstly we project it onto the eigenfaces of each class:

$$\omega^f_{\mathrm{test}} = (U^f)^{\tau}(\Gamma_{\mathrm{test}} - \Psi^f), \qquad \text{where } U^f = (u^f_i),\; i = 1,\dots,\rho^f.$$

Secondly, for each emotion, we find the training feature most similar to the testing projected vector by calculating the angle between them:

$$\kappa^f = \arg\max_i \frac{\omega^f_{\mathrm{test}} \cdot \omega^f_i(\mathrm{train})}{\|\omega^f_{\mathrm{test}}\|\,\|\omega^f_i(\mathrm{train})\|}; \qquad i = 1,\dots,\rho^f. \qquad (5)$$

Thirdly, we reconstruct the testing data from the feature obtained in each class: $\Gamma^f_{\mathrm{reconst}} = U^f\omega^f_{\kappa^f} + \Psi^f$. Finally, we choose the emotion for which the reconstructed testing data and the original data are most similar:

$$\varepsilon = \arg\max_f \frac{\Gamma_{\mathrm{test}} \cdot \Gamma^f_{\mathrm{reconst}}}{\|\Gamma_{\mathrm{test}}\|\,\|\Gamma^f_{\mathrm{reconst}}\|}; \qquad f = 1,\dots,7. \qquad (6)$$
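Putting Eqs. (5) and (6) together, t-PCA classification of one test frame could be sketched as follows (the per-class model container and all names are our assumptions):

```python
import numpy as np

def t_pca_classify(gamma_test, models):
    """models[f] = (U, psi, W): U is the (nm, rho) eigenface matrix of
    class f, psi its mean vector, and W the (rho, M) matrix of projected
    training features omega_i(train), one per column."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    best_f, best_sim = None, -np.inf
    for f, (U, psi, W) in models.items():
        omega = U.T @ (gamma_test - psi)              # project onto class f
        k = np.argmax([cos(omega, W[:, i]) for i in range(W.shape[1])])
        reconst = U @ W[:, k] + psi                   # reconstruction, Eq. (6)
        sim = cos(gamma_test, reconst)                # similarity, Eq. (6)
        if sim > best_sim:
            best_f, best_sim = f, sim
    return best_f
```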

The second method to estimate human emotion uses n-EMC over thermal data. n-EMC is modified from EMC [19]. The difference between EMC and n-EMC is the formulation used to calculate the difference between the within-class and between-class variances.

Mathematically, with n-EMC, instead of finding the eigenvectors $u^f_k$ and eigenvalues $\lambda^f_k$ of the covariance matrix $C^f = \frac{1}{M^f}\sum_{i=1}^{M^f}\Phi^f_i(\Phi^f_i)^{\tau} = A^f(A^f)^{\tau}$, where $A^f = [\Phi^f_1, \Phi^f_2, \dots, \Phi^f_{M^f}]$, we find the eigenvectors $u^f_k$ and eigenvalues $\lambda^f_k$ of the matrix $S = \|S_B - S_W\|_2$, where

$$M = \sum_{f \in F} M^f \qquad (7)$$

$$\Psi^f = \frac{1}{M^f}\sum_{i=1}^{M^f}\Gamma^f_i; \qquad \Psi = \frac{1}{M}\sum_{f \in F}\sum_{i=1}^{M^f}\Gamma^f_i \qquad (8)$$


$$S_B = \frac{1}{M}\sum_{f \in F} M^f\, \|\Psi^f - \Psi\|_2\, \|\Psi^f - \Psi\|_2^{\tau} \qquad (9)$$

$$S_W = \frac{1}{M}\sum_{f \in F}\sum_{i=1}^{M^f} \|\Phi^f_i - \Psi^f\|_2\, \|\Phi^f_i - \Psi^f\|_2^{\tau} \qquad (10)$$

For each testing facial thermal frame $\Gamma_{\mathrm{test}}$, firstly we project it onto the eigenfaces of each class: $\omega^f_{\mathrm{test}} = (U^f)^{\tau}(\Gamma_{\mathrm{test}} - \Psi^f)$, where $U^f = (u^f_i)$, $i = 1,\dots,\rho^f$. Secondly, for each emotion, we find the training feature most similar to the testing projected vector by calculating the angle between them:

$$\kappa^f = \arg\max_i \frac{\omega^f_{\mathrm{test}} \cdot \omega^f_i(\mathrm{train})}{\|\omega^f_{\mathrm{test}}\|\,\|\omega^f_i(\mathrm{train})\|}; \qquad i = 1,\dots,\rho^f. \qquad (11)$$

Finally, we choose the emotion which has the maximum $\kappa^f$:

$$\varepsilon = \arg\max_f \kappa^f; \qquad f = 1,\dots,7. \qquad (12)$$
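The n-EMC decision rule of Eqs. (11) and (12) thus reduces to a best-cosine-match per class followed by an argmax over classes, for example (a hedged sketch reusing the model container assumed in the t-PCA sketch above):

```python
import numpy as np

def n_emc_classify(gamma_test, models):
    """models[f] = (U, psi, W) as in the t-PCA sketch: per-class eigenface
    matrix, class mean, and projected training features (one per column)."""
    scores = {}
    for f, (U, psi, W) in models.items():
        omega = U.T @ (gamma_test - psi)
        sims = (W.T @ omega) / (np.linalg.norm(W, axis=0)
                                * np.linalg.norm(omega) + 1e-12)
        scores[f] = sims.max()                        # kappa^f, Eq. (11)
    return max(scores, key=scores.get)                # Eq. (12)
```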

To estimate human emotions, we use a decision fusion method over PCA, t-PCA, EMC and n-EMC. Figure 8 shows the general procedure for estimating human emotions using decision fusion.

When using decision fusion of PCA (t-PCA), we use the emotion estimation module described in Figure 6. To determine the best emotion class after applying PCA (t-PCA), weighted voting is used; the weights are assigned to the fusion data and the visible image, respectively. We determine the emotion class $f$ of an input image by choosing the class $j$ that minimizes the following expression:

$$f = \arg\min_j \left(w_1 \cdot \mathrm{MSE}^{VI}_j + w_2 \cdot \mathrm{MSE}^{FU}_j\right) \qquad (13)$$

where $\mathrm{MSE}^{VI}_j$ and $\mathrm{MSE}^{FU}_j$ are the mean square errors calculated at class $j$ for the visible image and the fusion data, respectively. We set $w_1 = \frac{4}{3}$ and $w_2 = \frac{2}{3}$ in the experiment.
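A minimal sketch of this weighted-MSE vote (Eq. (13)), assuming the per-class reconstruction errors have already been computed:

```python
def fuse_pca_decisions(mse_vi, mse_fu, w1=4/3, w2=2/3):
    """mse_vi and mse_fu map emotion class -> mean square error for the
    visible image and the fusion data; return the minimizing class."""
    return min(mse_vi, key=lambda j: w1 * mse_vi[j] + w2 * mse_fu[j])
```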

To estimate human emotions using decision fusion of EMC (n-EMC), we use the emotion estimation module described in Figure 7. To determine the best emotion class after applying EMC (n-EMC), weighted voting is used; the weights are assigned to the fusion data and the visible image, respectively. We determine the emotion class $f$ of an input image by choosing the class $j$ that maximizes the following expressions:

$$k = \max_i \frac{f^{VI}_h \cdot F^{VI}_i}{\|f^{VI}_h\| \cdot \|F^{VI}_i\|}, \qquad i = 1,\dots,n \qquad (14)$$

$$g = \max_i \frac{f^{FU}_h \cdot F^{FU}_i}{\|f^{FU}_h\| \cdot \|F^{FU}_i\|}, \qquad i = 1,\dots,n \qquad (15)$$

$$f = \arg\max_j \left(w_1 \cdot k + w_2 \cdot g\right), \qquad (16)$$

where $n$ is the number of training images of class $j$; $f^{VI}_h$ and $f^{FU}_h$ are the testing image $h$ of the visible image and the fusion data, respectively; and $F^{VI}_i$ and $F^{FU}_i$ are the $i$-th eigenface vectors of the visible image and the fusion data, respectively. We set $w_1 = \frac{2}{3}$ and $w_2 = \frac{4}{3}$ in the experiment.
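Correspondingly, the weighted similarity vote of Eqs. (14)-(16) could be sketched as (names illustrative):

```python
import numpy as np

def fuse_emc_decisions(f_vi, f_fu, eig_vi, eig_fu, w1=2/3, w2=4/3):
    """eig_vi[j] / eig_fu[j] hold the eigenface vectors of class j (one per
    column) for the visible image and the fusion data; f_vi and f_fu are
    the corresponding test feature vectors."""
    def best_cos(x, E):
        sims = (E.T @ x) / (np.linalg.norm(E, axis=0) * np.linalg.norm(x))
        return sims.max()                             # Eqs. (14) / (15)

    scores = {j: w1 * best_cos(f_vi, eig_vi[j]) + w2 * best_cos(f_fu, eig_fu[j])
              for j in eig_vi}
    return max(scores, key=scores.get)                # Eq. (16)
```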


Fig. 9. Sample thermal and visible images of seven emotions.

Fig. 10. Sample sequence of thermal images.


Fig. 11. Facial emotion estimation results using EMC.

4 Experiments and analysis

4.1 Database

The KTFE database [17] includes 152 GB of visible and thermal facial emotion videos, a visible facial expression image database and a thermal facial expression image database. It contains 30 subjects, who are Vietnamese, Japanese, Thai and Chinese, from 11 to 32 years old, with seven emotions. Examples of the visible and thermal images are shown in Figure 9.

From the raw data of the KTFE database, we manually extract visible images and sequences of thermal images based on the participants' self-reports, their expressions and the changes in facial temperature. Owing to the time-lag phenomenon, each sequence of thermal images is taken from the frame at which the visible image was extracted up to a frame after which the participant's emotion is neutral. Figure 10 shows a sample sequence of thermal images.

4.2 Experimental results

In our experiments, we separate the training and testing data as 60% and 40%, respectively, of the total visible images, thermal image sequences, and fused visible-thermal data.

Figure 11 shows the facial emotion estimation results of EMC with visible images (vi EMC), sequences of thermal images (ther EMC) and the fusion of visible images and sequences of thermal images (fu EMC). According to [20], the accuracy of estimating human emotion from thermal images is lower than from visible images, because emotions are not always clearer in thermal images than in visible images; therefore, with EMC, which is well suited to classification, the results using visible images are better than those using single thermal images. However, in our new results, the accuracy of estimating human emotion using sequences of thermal images is better than using visible images. When using the fusion information, the average accuracy increases by 2.77% compared with using only visible features, especially for happiness. In general, the average accuracy of each emotion increases


Fig. 12. Facial emotion estimation results using n-EMC.

when we use the fusion information. These results prove the necessity of fusion information.

Figure 12 shows the facial emotion estimation results of n-EMC with visible images (vi n-EMC), sequences of thermal images (ther n-EMC) and the fusion of visible images and sequences of thermal images (fu n-EMC). With n-EMC, the average accuracy using visible images, sequences of thermal images and the fusion of the two increases by 3.15%, 2.41% and 2.35%, respectively, compared with EMC. Similar to the results using EMC, the accuracy using fusion data is better than using the other data.

Fig. 13 shows the facial emotion estimation results of PCA with visible images (vi PCA), sequences of thermal images (ther PCA) and the fusion of visible images and sequences of thermal images (fu PCA). With PCA, the accuracy using thermal images is better than the accuracy using visible images. Although emotions in thermal images are not clearer than in visible images, PCA here works better than EMC, which is suited to classifying each emotion. In general, with PCA, using fusion data gives the best results compared with using thermal images, visible images and sequences of thermal images.

Fig. 14 shows the facial emotion estimation results of t-PCA with visible images (vi t-PCA), sequences of thermal images (ther t-PCA) and the fusion of visible images and sequences of thermal images (fu t-PCA). With t-PCA, the average accuracy using visible images, sequences of thermal images and the fusion of the two increases by 2.03%, 1.31% and 0.48%, respectively, compared with PCA. In particular, for disgust with visible images, the accuracy improvement is 11.97%. Our method, t-PCA, yields an average improvement of 2.33% in facial emotion estimation performance compared with using only visible features.

In conclusion, comparing the results on visible images, sequences of thermal images and fusion data, the accuracy of emotion estimation using fusion data is better than that using only visible images, thermal images or sequences of thermal images.


Fig. 13. Facial emotion estimation results using PCA.

Fig. 14. Facial emotion estimation results using t-PCA.


5 Conclusions

In this paper, we have proposed the fusion of visible features and thermal features for estimating human emotions. Specifically, two classification methods, t-PCA and n-EMC, are proposed to perform decision-level fusion. Our experiments on the spontaneous KTFE database show that our methods yield an average improvement of 2.74% in facial emotion estimation performance compared with using only visible features. Our method has several advantages. First, to the best of our knowledge, this is one of the first methods using sequences of thermal images. Emotion is a complex human process; a single image cannot capture the exact emotion, and thermal information from a single frame cannot give the right emotion either, so it is necessary to use sequences of thermal images. Second, with t-ROIs, we fill a gap of thermal imaging, the eyeglass problem. Third, using the weighted discriminant features helps our method decrease the running cost and increase the accuracy rate; because some frames are more important than others, the per-frame weights are necessary. Fourth, using the wavelet transform on visible images brings several advantages, such as removing unnecessary details. The fusion features, obtained from important visible features and necessary thermal features, are better than visible or thermal features alone. The two proposed classification methods, t-PCA and n-EMC, successfully improve the estimation accuracy. We also propose decision fusion with a weighted similarity measure for the conventional methods (PCA and EMC) and our proposed methods (t-PCA and n-EMC) to further increase the estimation accuracy. Experiments are conducted on a fusion database specially designed from the KTFE database. The results prove that the fusion of visible images and thermal image sequences performs better than either kind of data alone. As future work, we plan to improve the t-ROIs considerably, and also to investigate more sophisticated fusion techniques for visible images and thermal image sequences, and for sequences of visible and thermal images.

References

1. Z.Zeng, M.Pantic, G.T.Roisman, and T.S.Huang (2009), A survey of affect recognition methods: Audio, visual,and spontaneous expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31 (1),pp. 39-58.

2. B.Fasel and J.Luettin (2003), Automatic facial expression analysis: a survey, Pattern Recognition, Vol. 36(1),pp. 259-275.

3. M.Pantic and L.J.M.Rothkrantz (2000), Automatic analysis of facial expressions: The state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1424-1445.

4. S.Jarlier, D.Grandjean, S.Delplanque, K.N'Diaye, I.Cayeux, M.Velazco, D.Sander, P.Vuilleumier, and K.Scherer (2011), Thermal analysis of facial muscles contractions, IEEE Transactions on Affective Computing, Vol. 2 (1), pp. 2-9.

5. M.M.Khan, R.D.Ward, and M.Ingleby (2009), Classifying pretended and evoked facial expression of positiveand negative affective states using infrared measurement of skin temperature, ACM Transactions on AppliedPerception, Vol. 6 (1), pp. 1-22.

6. L.Trujillo, G.Olague, R.Hammoud, and B.Hernandez (2005), Automatic feature localization in thermal images for facial expression recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, p. 14.

7. B.Hernandez, G.Olague, R.Hammoud, L.Trujillo, and E.Romero (2007), Visual learning of texture descriptorsfor facial expression recognition in thermal imagery, Computer Vision and Image Understanding, Vol. 106,pp. 258 - 269.

8. B.R.Nhan and T.Chau (2010), Classifying affective states using thermal infrared imaging of the human face, IEEE Transactions on Biomedical Engineering, Vol. 57, pp. 979-987.

9. Y.Yoshitomi, N.Miyawaki, S.Tomita, and S.Kimura (1997), Facial expression recognition using thermal image processing and neural network, Robot and Human Communication, ROMAN '97 Proceedings, 6th IEEE International Workshop, pp. 380-385.

10. Y.Yoshitomi (2010), Facial expression recognition for speaker using thermal image processing and speechrecognition system, Proceedings of the 10th WSEAS International Conference on Applied Computer Science,pp. 182-186.

11. Y.Koda, Y.Yoshitomi, M.Nakano, and M.Tabuse (2009), A facial expression recognition for a speaker of a phoneme of vowel using thermal image processing and a speech recognition system, The 18th IEEE International Symposium on Robot and Human Interactive Communication, ROMAN 2009, pp. 955-960.

12. S.Wang, S.He, Y.Wu, M.He, and Q.Ji (2014), Fusion of visible and thermal images for facial expressionrecognition, Frontiers of Computer Science, Vol 8(2), pp. 232-242.

13. Y.Yoshitomi, S.Kim, T.Kawano, and T.Kitazoe (2000), Effect of sensor fusion for recognition of emotional states using voice, face image and thermal image of face, Proceedings of the 9th IEEE International Workshop on Robot and Human Interactive Communication, pp. 178-183.

14. M.Antonini, M.Barlaud, P.Mathieu, and I.Daubechies (1992), Image coding using wavelet transform, IEEETransactions on Image Processing, Vol. 1, pp.205 -220.

15. D.T.Lin (2006), Facial expression classification using PCA and hierarchical radial basis function network, Journal of Information Science and Engineering, Vol. 22, pp. 1033-1046.

16. T.Yabui, Y.Kenmochi, and K.Kotani (2003), Facial expression analysis from 3D range images; comparison with the analysis from 2D images and their integration, 2003 International Conference on Image Processing, Vol. 2, pp. 879-882.

17. H.Nguyen, K.Kotani, F.Chen, and B.Le (2014), A thermal facial emotion database and its analysis, Imageand Video Technology, Lecture Notes in Computer Science, Vol. 8333, pp. 397-408.

18. M. Turk and A. Pentland (1991), Eigenfaces for recognition, Journal of Cognitive Neuroscience, Vol.3, pp.71-86.

19. T.Kurozumi, Y.Shinza, Y.Kenmochi, and K.Kotani (1999), Facial individuality and expression analysis byeigenspace method based on class features or multiple discriminant analysis, 1999 International Conferenceon Image Processing, Vol.1, pp. 648-652.

20. H.Nguyen, F.Chen, K.Kotani, and B.Le (2014), Human emotion estimation using wavelet transform and t-ROIs for fusion of visible images and thermal image sequences, Computational Science and Its Applications - ICCSA 2014, Lecture Notes in Computer Science, Vol. 8584, pp. 224-235.