
A Talking Profile to Distinguish Identical Twins

Li Zhang1, Hossein Nejati1, Lewis Foo1, Keng Teck Ma1, Dong Guo2, Terence Sim1

1 School of Computing, National University of Singapore, 2 Facebook Inc., Palo Alto, CA
{lizhang,tsim}@comp.nus.edu.sg, [email protected]

Abstract— Identical twins pose a great challenge to face recognition due to the high similarity in their appearances. Motivated by the psychological findings that facial motion contains identity signatures, and by the observation that twins may look alike but behave differently, we develop a talking profile that uses the identity signatures in facial motion to distinguish between identical twins. The talking profile for a subject is defined as a collection of multiple types of usual face motions extracted from video. Given two talking profiles, we compute the similarities of all face motions of the same type in both profiles and then perform classification based on those similarities. Our approach, named the Exceptional Motion Reporting Model (EMRM), is independent of appearance and can handle realistic facial motion in human subjects, with no restrictions on speed of motion or video frame rate. Experimental results on a video database containing 39 pairs of twins demonstrate that identical twins can be distinguished by their talking profiles. Moreover, we also apply our approach to a non-twin population on a moderate-size YouTube dataset, with results verifying that the talking profile is a potential biometric.

I. INTRODUCTION

The occurrence of twins has progressively increased in the past decades, as the twin birth rate has risen to 32.2 per 1,000 births with an average 3% growth per year since 1990 [1]. With the increase of twins, identical twins are becoming more common. This, in turn, urges biometric identification systems to accurately distinguish between twin siblings. Although identical twins represent a minority (0.2% of the world's population), it is worth noting that they equal the whole population of countries like Portugal or Greece. Therefore failing to identify them is a significant hindrance to the success of biometric systems. Identical twins share the same DNA code and therefore look extremely alike. Nevertheless, some biometrics depend not only on the genetic signature but also on individual development in the womb. As a result, identical twins differ in some biometrics such as fingerprint and retina. Several researchers have taken advantage of this fact and have shown promising results in automatic recognition systems that use these discriminating traits: fingerprint [2], palmprint [3], iris [4], and combinations of some of the above biometrics [5]. However, these biometrics require the cooperation of the subject. Thus, it is still desirable to identify twins by pure facial features, since they are non-intrusive, do not require explicit cooperation of the subject, and are widely available from photos or videos captured by ordinary camcorders. Unfortunately, the high similarity between identical twins is known to be a great challenge for face recognition systems. The performance of various appearance-based approaches on recognizing identical twins has recently been questioned [5], [6]. Both studies confirmed the difficulties encountered by appearance-based face recognition systems on twin databases, and strongly suggested the need for new ways to improve performance in recognizing identical twins.

In psychology, it has been demonstrated that the human visual system utilizes both appearance and facial motion for face recognition [7], [8]. Appearance information provides the first route for face recognition, while the dynamic signatures embedded in facial motion are processed in the superior temporal sulcus and provide a secondary route. This secondary route for face recognition, also known as the supplemental information hypothesis, is supported by many works in both psychology and computer vision [7], [9].

Traditional appearance-based face recognition simulates the first route to recognize faces, but fails to effectively distinguish between identical twins. Considering both the studies verifying that facial motion contains identity information and the observation that twins may look alike but behave differently, we propose to use talking profiles, which consist of multiple types of usual face motions, to recognize twins. Our intention is to simulate the secondary route for face recognition. The flowchart of our approach, named the Exceptional Motion Reporting Model (EMRM), is illustrated in Figure 1.

1) Given a video, the talking profile is extracted. The profile consists of multiple types of usual face motions, such as pose change and 2D in-plane movement. Each type of motion in the profile is represented as a sequence of local motions between two adjacent frames.

2) Given two talking profiles, we compute the similarity of each pair of corresponding face motions of the same type from both talking profiles. Since a pair of corresponding face motions may be unsynchronized, i.e. have different frame rates or speeds, we perform a motion alignment in advance. The alignment is achieved by minimizing an abnormality function. A Gaussian mixture model is employed to estimate the abnormality of each local motion. This abnormality scheme is inspired by [10] in psychology, which shows that humans use exceptional motions to identify faces. This is also observed in real life: humans often use a person's peculiar head motion (e.g. a tilt) rather than common head movements to aid recognition. After alignment, the similarity of this pair of corresponding face motions is computed as the weighted sum of local motion similarities.

3) Given the similarities of every pair of corresponding face motions in the talking profiles, we perform an SVM classification on those similarities. Our model is set in verification mode, that is, it claims whether the faces in two videos are genuine (same person) or impostors.


Fig. 1. Flowchart of our approach, the Exceptional Motion Reporting Model (EMRM): probe and gallery video clips pass through the profile extractor to produce talking profiles, which are weighted by GMM-based abnormality, temporally aligned by dynamic programming, and classified by an SVM.

We test our algorithm by conducting several experiments on a free-talking video database with 39 pairs of identical twins1. Our experimental results show that the talking profile can be used to accurately distinguish between identical twins. We further apply the talking profile to a moderate-size free-talking video database of 20 subjects from YouTube. Results from our second set of experiments agree with the psychological findings that facial motion contains identity signatures, and demonstrate that the talking profile has potential to be used in biometrics.

The contributions of our work are fourfold: 1) we show that talking profiles can be used to distinguish between identical twins; 2) we propose a novel EMRM to analyze facial motion in video, which also provides a general framework for using abnormality for recognition; 3) our experiments on the YouTube dataset support the psychological findings that facial motion does provide identity signatures; 4) we also suggest the most discriminating face motion as well as the best gallery video sampling rates for archival. We continue by introducing existing twin recognition and motion-based face recognition works in Section II. In Section III, we describe the details of our model. We present our datasets and experiments in Section IV, concluding in Section V.

II. RELATED WORK

A. Face Recognition for Identical Twins

To the best of our knowledge, there are limited works on identical twin recognition using 2D face biometrics [5], [6], [11]. Sun et al. [5] were the first to evaluate the performance of appearance-based face recognition in distinguishing between twins. They compared it with the performance of iris, fingerprint, and a fusion of them. Their database was collected in 2007 at the fourth Annual Festival of Beijing Twins Day. The face subset used in the experiments contained 134 subjects, each having around 20 images. All images were collected during a single session over a short interval. Experiments were conducted using the FaceVACS commercial matcher and showed that identical twins are a challenge to current face recognition systems. Phillips et al. [6] thoroughly extended this analysis of face recognition performance in distinguishing between identical twins.

1 To the best of the authors' knowledge, this is the largest video database of identical twins.

They used another database, collected at the Twins Days festival in Ohio in 2009 and 2010, which consisted of images of 126 pairs of identical twins collected on the same day and 24 pairs with images collected one year apart. Facial recognition performance was tested using three of the top submissions to the Still Face Track at the Multiple Biometric Evaluation 2010. Based on their experimental results, the best performance was observed under ideal conditions (same day, studio lighting, and neutral expression), while under more realistic conditions, distinguishing between identical twins was very challenging. Klare et al. [11] analyzed the features of each facial component to distinguish identical twins on the same database as [6]. They also analyzed the possibility of using facial marks to distinguish identical twins, and likewise confirmed the challenge of recognizing identical twins merely based on appearance. All these works showed the need for new approaches to help improve performance when recognizing identical twins.

B. Motion Based Face Recognition

Psychological studies have shown that humans better recognize faces with expressive motion. Hill and Johnston [7] showed in their experiments that humans utilize rigid head movements in face recognition. Thornton and Kourtzi [12] observed that showing moving face images rather than static face images in training sessions improves human subjects' performance in face recognition. Pilz et al. [9] further claimed that moving images not only increased the recognition rate, but also reduced reaction time. These psychological findings imply that face motions contain considerable identity information that is fairly reliable for face recognition. There are some recent attempts [8], [13] in computer vision to use face motions for face recognition. They computed either a dense or sparse displacement on tracked points and used it to identify general human subjects. Tulyakov et al. [8] manually marked some landmark points and used their displacement as features for face recognition. Ye et al. [13] proposed to compute the dense optical flow from the neutral face to the apex of a smile, then used the optical flow field as a feature for recognition. Later they designed a local deformation pattern as a feature and tried to find identity signatures between different expressions. All these works achieved some breakthrough in motion-based face recognition, but all of them only considered the general population and required the cooperation of subjects.


Moreover, among all possible face motions, only facial expressions are considered in these works, while many other types of face motion remain unexplored.

One recent work tried to apply motion-based face recognition to the identical twins problem [14]. Their proposal only focused on expressions and required a fixed head position without any movement. Our work differs from [14] in three aspects. First, our model does not require any face alignment, while theirs required very accurate face alignment. Second, our work uses 6 different types of usual face motions, while they only used facial expressions. Third, our algorithm can handle the occurrence of multiple and unknown motions in a single video, while theirs could only handle one expression per video.

III. PROPOSED ALGORITHM

In this section we describe the details of our model, the Exceptional Motion Reporting Model (EMRM), which uses talking profiles for face recognition. We follow the setting of Labeled Faces in the Wild [15] and set EMRM in verification mode, that is, to claim whether the faces in two videos are genuine or impostors. There are in total five steps in the EMRM. The first step extracts the talking profile from each video. The second and third steps assign the abnormality weight and compute local motion similarity. The fourth step aligns two face motions of the same type from two talking profiles and computes their similarity. The final step uses the similarities from all the corresponding pairs in the talking profiles for classification using an SVM.

A. Extracting Talking Profile

For each video, the talking profile is comprised of multiple types of usual face motions. In our work, it includes 6 types: 2D in-plane translation, pose change, gaze change, pupil movement, and eye/mouth open-closing magnitude (i.e. the extent to which the eyes/mouth are open). Various tools have been released to extract this information from video, such as the Luxand, PittPatt, and Omron SDKs. In our implementation, we use the Omron SDK to extract the talking profile. Note that each type of face motion in the talking profile is a sequence of local motions between two adjacent frames. We perform a sampling before processing, both to test the robustness when the probe and the gallery have different frame rates and to save computation. Assume the sample rate is FPS (i.e. the local motion is computed between frames sampled FPS apart). We define the talking profile as $TP = \{\phi_{head}, \phi_{gaze}, \phi_{pose}, \phi_{pupil}, \phi_{eye}, \phi_{mouth}\}$. Let $\phi_i \in TP$ be one type of face motion; it can be expressed as follows in temporal order, where $\varsigma_i$ is a local motion:

$$\phi_i = \{\varsigma_1, \varsigma_2, \cdots, \varsigma_t\} \quad (1)$$
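To make the data layout concrete, the following is a minimal Python sketch of talking-profile extraction. The detect_attributes helper is a hypothetical stand-in for the face-SDK calls (the paper uses the Omron SDK, whose API we do not reproduce), and the vector dimensions are illustrative.

import numpy as np

MOTION_TYPES = ["head", "gaze", "pose", "pupil", "eye", "mouth"]

def detect_attributes(frame):
    # Hypothetical stand-in for a face-SDK call, returning one measurement
    # vector per motion type for this frame (dimensions are illustrative).
    return {t: np.zeros(3) for t in MOTION_TYPES}

def extract_talking_profile(frames, fps_step):
    # Sample every fps_step-th frame, then take differences between
    # adjacent sampled frames: each difference is one local motion.
    tracks = {t: [] for t in MOTION_TYPES}
    for frame in frames[::fps_step]:
        attrs = detect_attributes(frame)
        for t in MOTION_TYPES:
            tracks[t].append(np.asarray(attrs[t], dtype=float))
    # TP = {phi_head, ..., phi_mouth}; each phi is a temporally ordered
    # sequence of local motions, as in Eq. 1.
    return {t: [b - a for a, b in zip(seq, seq[1:])]
            for t, seq in tracks.items()}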

B. Encoding Local Motion Abnormality

For a talking profile, there are 6 different types of face motions, and each type of face motion is a sequence of local motions. In this section, we address how to encode the abnormality weight of each local motion. The motivation comes from psychological studies [10] showing that the human visual system uses visual abnormality to recognize faces.

Considering human variation (e.g. ethnicity, gender, etc.), we employ a Gaussian mixture model, $G = \{g_1, g_2, g_3, \ldots, g_\tau\}$, $\forall i,\; g_i \sim N(\mu_i, \sigma_i)$, to estimate the local motion distribution in each type of face motion space. $\mu_i$ and $\sigma_i$ are estimated using expectation maximization [16]. In our implementation, we run the Gaussian mixture model six times, once for each of the 6 types of face motion. Then, given a local motion $\varsigma \in \phi_i$, we use maximum likelihood to find its intrinsic Gaussian distribution $g_\kappa$, where $\varsigma \in g_\kappa \iff \kappa = \arg\max_i P(\varsigma|g_i)$. Knowing its Gaussian distribution, the probability of this local motion can be computed as $P(\varsigma|g_\kappa)$. We then approximate its abnormality $\omega$ as $\omega(\varsigma) = 1 - P(\varsigma|g_\kappa)$.
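As one plausible implementation of this weighting, the sketch below fits a per-motion-type Gaussian mixture with scikit-learn and scores a local motion by its most likely component. Note that $P(\varsigma|g_\kappa)$ is a density, so we clip it to [0, 1] to keep $\omega$ in range; the number of components and this clipping are our assumptions, not specified in the paper.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_motion_gmm(X, n_components=3):
    # Fit G = {g_1, ..., g_tau} on all local motions of one face-motion
    # type, stacked into an (N, d) array X; fitting uses EM internally.
    return GaussianMixture(n_components=n_components).fit(X)

def abnormality(gmm, s):
    # kappa = argmax_i P(s | g_i): the intrinsic Gaussian of local motion s.
    likes = np.array([
        multivariate_normal(gmm.means_[i], gmm.covariances_[i]).pdf(s)
        for i in range(gmm.n_components)
    ])
    kappa = int(np.argmax(likes))
    # omega(s) = 1 - P(s | g_kappa); the density is clipped to [0, 1]
    # (an assumption) so the abnormality stays a valid weight.
    return 1.0 - float(np.clip(likes[kappa], 0.0, 1.0)), kappa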

C. Computing Local Motion Similarity

Given two talking profiles, each with 6 types of face motions, we need to compute the similarity of corresponding face motions between the two profiles. In our work, six similarities in total are computed, one per type of face motion. Here, we take pose change as an example. Assume there are two pose change sequences from the two talking profiles, $\phi = \{\varsigma_1, \varsigma_2, \cdots, \varsigma_n\}$ and $\varphi = \{\varsigma'_1, \varsigma'_2, \cdots, \varsigma'_m\}$. We first define how to compute the local motion similarity between two local motions $\varsigma_i \in \phi$, $\varsigma'_j \in \varphi$, and then perform motion alignment.

We define the local motion similarity $W$ in Eq. 2:

$$W(\varsigma_i, \varsigma'_j) = \frac{\omega(\varsigma_i) \times \omega(\varsigma'_j) \times Sim(\varsigma_i, \varsigma'_j)}{D_{RAD}(g_s, g'_t)^2 + C_1} \quad (2)$$

$$\omega(\varsigma_i) = 1 - P(\varsigma_i|g_s); \quad \varsigma_i \in g_s \quad (3)$$

$$s = \arg\max_k P(\varsigma_i|g_k) \quad (4)$$

$$t = \arg\max_k P(\varsigma'_j|g_k) \quad (5)$$

where $\omega(\varsigma_i)$ and $\omega(\varsigma'_j)$ are the abnormalities of the local motions $\varsigma_i$ and $\varsigma'_j$, respectively. $Sim(\varsigma_i, \varsigma'_j)$ is the similarity of the motions in Euclidean space. $D_{RAD}(g_s, g'_t)$ is the difference between the two Gaussian distributions $g_s$ and $g'_t$, where $g_s$ is the intrinsic Gaussian distribution of $\varsigma_i$ and $g'_t$ is the intrinsic Gaussian distribution of $\varsigma'_j$. Note that $g_s$ and $g'_t$ can be the same or different. To estimate the difference between two Gaussian distributions, the common way is the Kullback-Leibler divergence [17], defined in Eq. 6:

$$D_{KL}(p \,\|\, q) \doteq \int p(x) \log_2\!\frac{p(x)}{q(x)}\, dx \quad (6)$$

where $p(x)$ and $q(x)$ are distributions. The KL divergence is non-negative and equal to zero iff $p(x) \equiv q(x)$; however, it is asymmetric. Thus, we use its symmetric extension, the Resistor-Average Distance (RAD), defined in Eq. 7:

$$D_{RAD}(p, q) = \left[ D_{KL}(p \,\|\, q)^{-1} + D_{KL}(q \,\|\, p)^{-1} \right]^{-1} \quad (7)$$

Like the KL divergence, RAD is non-negative and equal to zero iff $p(x) \equiv q(x)$. Moreover, it is symmetric.


From Eq. 2, several points can be concluded. First, we reward the more abnormal local motions (see $\omega(\varsigma_i)$ and $\omega(\varsigma'_j)$): the more abnormal $\varsigma_i$ and $\varsigma'_j$ are, the larger $W(\varsigma_i, \varsigma'_j)$ will be. Second, the higher the similarity of the local motions in Euclidean space, the higher the final local motion similarity (see $Sim(\varsigma_i, \varsigma'_j)$). In our implementation, we define the similarity in Euclidean space as the inverse of the Euclidean distance between $\varsigma_i$ and $\varsigma'_j$. Note that $\varsigma_i$ and $\varsigma'_j$ are local motions of the same type of face motion, so they are vectors of the same length in Euclidean space. Third, if the intrinsic distributions of $\varsigma_i$ and $\varsigma'_j$ are the same, then $D_{RAD}(g_s, g'_t)$ is equal to zero; otherwise it is larger than zero. Therefore, $W(\varsigma_i, \varsigma'_j)$ penalizes the situation where $\varsigma_i$ and $\varsigma'_j$ come from different Gaussian distributions. $C_1$ is a constant to avoid division by zero; we set $C_1 = 1$ in our implementation.
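For univariate Gaussian components, Eqs. 2-7 can be computed in closed form. The sketch below is a minimal implementation under that assumption, with the KL divergence converted to bits to match the log base 2 in Eq. 6 and a small epsilon guarding the inverse Euclidean distance (the paper does not state how a zero distance is handled); distributions are passed as (mean, variance) pairs.

import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    # Closed-form D_KL(N(mu_p, var_p) || N(mu_q, var_q)), converted to bits.
    nats = 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return nats / np.log(2)

def rad(p, q):
    # Resistor-average distance of Eq. 7: the "parallel resistance" of the
    # two directed KL divergences; zero iff the distributions coincide.
    d_pq, d_qp = kl_gauss(*p, *q), kl_gauss(*q, *p)
    if d_pq == 0.0 or d_qp == 0.0:
        return 0.0
    return 1.0 / (1.0 / d_pq + 1.0 / d_qp)

def local_motion_similarity(s_i, s_j, w_i, w_j, g_s, g_t, c1=1.0):
    # W of Eq. 2: abnormality-weighted inverse Euclidean distance,
    # penalised when the two local motions have different intrinsic
    # Gaussians (rad > 0). c1 = 1 as in the paper.
    sim = 1.0 / (np.linalg.norm(np.asarray(s_i) - np.asarray(s_j)) + 1e-9)
    return (w_i * w_j * sim) / (rad(g_s, g_t) ** 2 + c1)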

D. Aligning Motion Sequences

In the previous section, we described how to compute the similarity between two local motions of the same type. We now need to align the two motion sequences in temporal order (i.e. find the best matches between the two sequences). Mathematically, we maximize the total local motion similarity score $\Upsilon(\phi, \varphi)$ between $\phi$ and $\varphi$ as follows:

$$\max \Upsilon(\phi, \varphi) = \{\varsigma_{i_1} : \varsigma'_{j_1},\; \varsigma_{i_2} : \varsigma'_{j_2},\; \ldots,\; \varsigma_{i_k} : \varsigma'_{j_k}\}$$
$$\text{s.t. } \forall\, i_s, i_t, j_s, j_t:\; i_s > i_t \Rightarrow j_s > j_t$$

where $i_s, i_t, j_s, j_t$ represent frame numbers in temporal order; temporal consistency is imposed by the constraint. This maximization problem is similar to finding the longest common subsequence, with continuous match scores instead of binary 0/1 match scores. Based on the local motion similarity described above, we propose a dynamic programming scheme to calculate the maximum matching score given two feature sequences. The rules of this dynamic programming are described in Algorithm 1. The general idea of Algorithm 1 is to continuously update the current best match up to local motion $\varsigma_i \in \phi$ in the table $\gamma(\phi, \varphi)$. By convention, we use $|V|$ to denote the length of a vector $V$. Rule 1 is the initialization. Based on Rule 2, if there is only one local motion to match (i.e. either $|\phi| = 1$ or $|\varphi| = 1$), then the matching weight is $W(\varsigma_i, \varsigma'_j)$. Based on Rule 3 (when there are more local motions to match), at each step we either abandon previous matches to local motion $\varsigma'_j$ and match local motion $\varsigma_i$ with $\varsigma'_j$ (i.e. $W(\varsigma_i, \varsigma'_j)$), or keep the previous match to $\varsigma'_j$ (i.e. $\gamma_{i-1,j}(\phi, \varphi)$) and match $\varsigma_i$ to one of the next local motions in temporal order (i.e. $W(\varsigma_i, \varsigma'_{j'})$ with $j' = j+1, \ldots, m$). Finally, based on the outcome of Rule 3, all of the next rows of the $\gamma$ table are updated.

E. Performing Classification

Through the above steps, we align two face motion sequences of the same type from the two talking profiles into $\Upsilon(\phi, \varphi)$. We can then compute the final similarity for pose change as the summation of the matched local motion similarities in $\Upsilon(\phi, \varphi)$. We repeat these steps to compute the similarities for gaze change, 2D in-plane change, pupil movement, and eye/mouth opening-closing magnitude, respectively.

Algorithm 1 Dynamic programming to align two sequences by maximizing total local motion similarities, $\Upsilon(\phi, \varphi)$.

Table $\gamma(\phi, \varphi)$ is an $n \times m$ table, where $|\phi| = n$ and $|\varphi| = m$; $i$ and $j$ are the row and column indices of the table.
Initialize $i = 0$
Start Loop $i$
  Rule 1: if $i \le 0$ then $\gamma_{i,j}(\phi, \varphi) = 0$
  Rule 2: else if $i = 1$ then $\gamma_{i,j}(\phi, \varphi) = W(\varsigma_i, \varsigma'_j)$
  Rule 3: else if $i \le n$
    Start Loop $j$
      $\gamma_{i,j}(\phi, \varphi) = \max\big(W(\varsigma_i, \varsigma'_j),\; \max_{j' = j+1, \ldots, m}(\gamma_{i-1,j}(\phi, \varphi) + W(\varsigma_i, \varsigma'_{j'}))\big)$
      $\gamma_{i+1,j}(\phi, \varphi) = \gamma_{i+1,j}(\phi, \varphi) + \gamma_{i,j}(\phi, \varphi)$
    End Loop $j$
End Loop $i$
$\Upsilon(\phi, \varphi) = \arg\max_j(\gamma_{n,j}(\phi, \varphi))$
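Algorithm 1's update rules are terse, so the following is a minimal sketch of the alignment under the weighted longest-common-subsequence reading given above: monotone temporal order, continuous match scores, and the option to leave a local motion unmatched. The function name and the backtracking step are our assumptions, not a literal transcription of the table updates.

import numpy as np

def align_sequences(W):
    # W is an n x m matrix with W[i, j] = W(s_i, s'_j). gamma[i][j] is the
    # best total similarity aligning the first i motions of phi with the
    # first j motions of phi' (LCS-style DP with continuous scores).
    n, m = W.shape
    gamma = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gamma[i, j] = max(
                gamma[i - 1, j - 1] + W[i - 1, j - 1],  # match s_i : s'_j
                gamma[i - 1, j],                        # leave s_i unmatched
                gamma[i, j - 1],                        # leave s'_j unmatched
            )
    # Backtrack to recover Upsilon(phi, phi'): the matched index pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if gamma[i, j] == gamma[i - 1, j - 1] + W[i - 1, j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif gamma[i, j] == gamma[i - 1, j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1], gamma[n, m]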

Fig. 2. Some examples from the identical twins database.

Finally, we employ an SVM [18] to perform classification on those six similarities for verification.
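The final step thus reduces each video pair to a 6-vector of per-motion-type similarities. A minimal sketch of the verification step follows, assuming such 6-vectors and genuine/impostor labels; the RBF kernel is our assumption (the paper only states that SVM classification with LIBSVM [18] is used).

import numpy as np
from sklearn.svm import SVC

def train_verifier(pair_similarities, labels):
    # pair_similarities: (N, 6) array, one row per video pair, holding the
    # aligned similarities for face, gaze, pose, pupil, eye, and mouth.
    # labels: 1 for genuine (same person), 0 for impostor.
    clf = SVC(kernel="rbf")  # kernel choice is an assumption
    return clf.fit(pair_similarities, labels)

# Usage: verdict = train_verifier(X_train, y_train).predict(X_test)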

IV. EXPERIMENTS

A. The Identical Twins Dataset

In our experiments, we used the twins dataset from [14]. The authors collected an identical-twin free-talking database at the Sixth Mojiang International Twins Festival during the May Day holiday of 2010 in China. This database includes Chinese, Canadian, and Russian subjects. There are 39 pairs of twins (78 subjects), and each subject has at least 2 video clips of nearly 45 seconds each. A Sony HD color video camera was used to capture the video clips. Face position was not constrained while speaking, so the expression, head, and neck movement of each participant is realistic in each clip. Figure 2 shows 4 subjects (2 pairs of identical twins) from this database.

B. Experiment 1: Performance with Same Sample Rate

In this experiment, we evaluate the performance of EMRM when the probe and gallery have the same sample rate, i.e. the same FPS for both training and testing videos. For the SVM, we use 60% of the videos for training and the remaining 40% for testing. The training data and testing data are mutually exclusive, and the training and testing videos come from different recordings. Our experiment tests the performance when FPS is equal to 2, 3, 5, 7, 10, 15, 20, 30, 40, 50, and 60. For example, if FPS is 25 and the video frame rate is 50, then only 2 frames per second are sampled.

Page 5: A Talking Profile to Distinguish Identical Twins · Model (EMRM), is illustrated in Figure 1. 1) Given a video, the talking profile is extracted. The profile consists of multiple

Fig. 3. Experiment 1: accuracy when gallery and probe have the same FPS (x-axis: FPS from 2 to 50; y-axis: verification accuracy from 0.7 to 1).

The verification accuracy is shown in Figure 3.

From Figure 3, several points can be observed. First, the verification accuracies on identical twins are above 0.90. The best accuracy in [14] using facial expressions is 0.82, and the best accuracy using appearance in [5] is 0.87, so our proposal shows a great improvement. These experimental results again verify the hypothesis that twins look similar but behave differently. Second, we can see three local maxima along the entire range of FPS: the first around FPS = 6, the second around FPS = 9, and the last around 30. These three peaks suggest that identical twins can be better recognized by talking profiles at different speeds (fast, medium, and slow).

C. Experiment 2: Performance with Different Sample Rate

In this experiment, we consider the scenario where the probe and gallery have different FPS. Within the gallery, we assume the FPS of all videos is the same. This is practical in real applications because the gallery is usually pre-collected. Our motivation for using different FPS rates between probe and gallery is to test the robustness of our algorithm when the video frame rate is not fixed. For example, the gallery video may be captured by a 50fps camera, while the probe video may be captured by a 100fps camera. Given an FPS in the gallery, we compute the average accuracy over the various probe FPS values to evaluate the performance. For example, if the gallery FPS is 2, then we test the accuracy for each probe FPS except 2, and use the average accuracy to represent the performance. The performance in this setting is presented in Table I; the variance of the accuracies is also listed. In terms of accuracy, for the identical twins database, the best performance is achieved when the gallery FPS is 5 or 7. When the gallery FPS is either too large or too small, the performance degrades significantly. We believe the reason is that if the FPS is in the middle, the local motions in the gallery at least have some overlap with the local motions in the probe; otherwise the local motions in the gallery jump too much or too little.

D. Experiment 3: Performance of Single Type of Motion

It would be interesting to see the discriminating ability of each type of face motion in the talking profile for distinguishing between identical twins. We conduct this experiment with the same FPS in gallery and probe.

TABLE II
EXPERIMENT 3: ACCURACY ON THE TWINS DATABASE FOR EACH MOTION

Type of Motion    | Face  | Gaze  | Eye   | Mouth | Pose  | Pupil
Average accuracy  | 0.447 | 0.331 | 0.377 | 0.324 | 0.324 | 0.324

Various FPS values, FPS = 2, 3, 4, 5, ..., 10, are tested, and we compute the average performance to evaluate the discriminating ability of each type of facial motion under same-FPS settings. We use the average accuracy of each single type of face motion for evaluation. The final result is shown in Table II. We can see that the best performance of a single type of face motion, i.e. 2D in-plane face translation, is less than 0.50. This result shows that even though each type of facial motion has low discriminating ability on its own, together they convey enough identity-specific information for recognition, as shown in Experiment 1. The reason may be that identical twins can differ in face motions without being restricted to one single type. For example, for pair A the pose changes are different while the gaze changes are the same, whereas for pair B it is the reverse.

E. Discussion

Several points can be concluded from the experiments: 1) Twins can be distinguished by talking profiles. As shown in our first experiment, our algorithm obtains the best accuracy for recognizing identical twins reported to date. This conclusion is consistent with the observation that parents of twins prefer to use their children's motion for recognition. The proposal in this work presents a new way to recognize identical twins, as called for in previous research [5], [6], [11].

2) Though the proposed talking profile provides enough identity information for twin recognition, each type of face motion in the talking profile lacks such discriminating power on its own. This shows that differences in face motion do exist between identical twins, but those differences occur only in some types of face motion and may be subject dependent.

3) Synchronization of motions is also an important factor affecting recognition performance. Different FPS can significantly degrade the accuracy, as seen in Experiments 1 and 2. With the same FPS, the accuracy can be as high as 0.90, while with different FPS it drops to, at best, around 0.70, when the gallery FPS is 5 or 7.

F. Experiment 4: YouTube Non-twin Database

Besides the identical twins database, we further investigate the possibility of using talking profiles on a non-twin population. To verify this, we collected a moderate-size database from YouTube. It contains 20 subjects in 60 clips of 45 seconds each. These videos contain only a single person talking. The quality of the videos ranges from medium to high, owing to the variety of webcams. The person is either sitting or standing still, and the environment can be either indoor or outdoor, without controlled lighting. The videos range from speeches and technical talks to interviews. The gender and ethnicity of the speakers are diverse.


TABLE I
EXPERIMENT 2: PERFORMANCE WITH DIFFERENT SAMPLE RATES BETWEEN PROBE AND GALLERY

FPS in Gallery    | 2     | 3      | 4     | 5     | 6     | 7     | 8     | 9     | 10
Average accuracy  | 0.321 | 0.407  | 0.482 | 0.756 | 0.738 | 0.753 | 0.619 | 0.735 | 0.550
Variance          | 0.096 | 0.0951 | 0.069 | 0.031 | 0.073 | 0.022 | 0.081 | 0.071 | 0.069

Fig. 4. Performance of EMRM on the YouTube database (x-axis: FPS from 2 to 50; y-axis: verification accuracy from 0.7 to 1).

To conduct the experiment, we use 60% of the videos for training and 40% for testing; the training and testing sets are mutually exclusive. The verification accuracies are illustrated in Figure 4. These results show performance similar to that on the identical twins database. Again, three peaks can be seen for fast, medium, and slow actions, at FPS = 4, 8, and 40, respectively. Although our focus in this work is on distinguishing between identical twins, our results on this YouTube dataset seem to indicate the potential of the talking profile as a biometric for the generic (non-twin) population. We also conducted an experiment to investigate the performance of each type of face motion on the YouTube dataset, similar to Experiment 3. The accuracies for face 2D in-plane translation, gaze change, pose change, pupil movement, eye open-close, and mouth open-close magnitude are 0.45, 0.34, 0.32, 0.32, 0.45, and 0.40, respectively. This result is also consistent with the identical twins database, further suggesting that our proposed model is not specific to one population. However, due to the small size of our YouTube database, we cannot fully confirm the robustness of the talking profile for generic verification; we will experiment on larger datasets in the future.

V. CONCLUSION

Distinguishing between identical twins is a challenging problem in face recognition. In this paper, we verify that the talking profile can be used to distinguish between identical twins. To use the talking profile, we proposed a framework, EMRM, that effectively uses identity-related abnormalities in face motions, with an explicit focus on the temporal information in motion sequences. The experimental results on two databases, collected under free-talking scenarios, verify the robustness of our algorithm with both fixed and variable frame rates. We also suggest the most discriminating face motion type and the best gallery video sampling rates for archival to achieve the best performance for twin and non-twin subjects. Finally, our results on the YouTube non-twins database show the potential of the talking profile for subject recognition.

For future work, we would like to investigate more relaxed scenarios, such as face motions in the presence of camera and body motion, which are more practical in real life. Second, though the current experimental results are promising, we will investigate our model on larger datasets to explore the scalability and stability of face motion features. Third, we will test the stability of the talking profile as the time interval between recordings increases. Finally, we will explore whether the accuracy can be boosted by cascading the face motions at different sample rates.

REFERENCES

[1] J. Martin, H. Kung, T. Mathews, D. Hoyert, D. Strobino, B. Guyer, and S. Sutton, "Annual summary of vital statistics: 2006," Pediatrics, 2008.

[2] A. Jain, S. Prabhakar, and S. Pankanti, "On the similarity of identical twin fingerprints," Pattern Recognition, vol. 35, no. 11, pp. 2653–2663, 2002.

[3] A. Kong, D. Zhang, and G. Lu, "A study of identical twins' palmprints for personal verification," Pattern Recognition, vol. 39, no. 11, pp. 2149–2156, 2006.

[4] J. Daugman and C. Downing, "Epigenetic randomness, complexity and singularity of human iris patterns," Proceedings of the Royal Society of London. Series B: Biological Sciences, vol. 268, no. 1477, p. 1737, 2001.

[5] Z. Sun, A. Paulino, J. Feng, Z. Chai, T. Tan, and A. Jain, "A study of multibiometric traits of identical twins," SPIE, 2010.

[6] P. Phillips, P. Flynn, K. Bowyer, R. Bruegge, P. Grother, G. Quinn, and M. Pruitt, "Distinguishing identical twins by face recognition," in FG 2011.

[7] H. Hill and A. Johnston, "Categorizing sex and identity from the biological motion of faces," Current Biology, vol. 11, no. 11, pp. 880–885, 2001.

[8] S. Tulyakov, T. Slowe, Z. Zhang, and V. Govindaraju, "Facial expression biometrics using tracker displacement features," in Proc. CVPR, 2007.

[9] K. S. Pilz, I. M. Thornton, and H. H. Bülthoff, "A search advantage for faces learned in motion," Experimental Brain Research, vol. 171, pp. 436–447, 2006.

[10] M. Unnikrishnan, "How is the individuality of a face recognized?" Journal of Theoretical Biology, vol. 261, no. 3, pp. 469–474, 2009.

[11] B. Klare, A. Paulino, and A. Jain, "Analysis of facial features in identical twins," in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–8.

[12] I. M. Thornton and Z. Kourtzi, "A matching advantage for dynamic human faces," Perception, vol. 31, pp. 113–132, 2002.

[13] Y. Ning and T. Sim, "Smile, you're on identity camera," in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008, pp. 1–4.

[14] L. Zhang, N. Ye, E. Marroquin, D. Guo, and T. Sim, "New hope for recognizing twins by using facial motion."

[15] G. Huang, M. Mattar, T. Berg, E. Learned-Miller et al., "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," 2008.

[16] T. Moon, "The expectation-maximization algorithm," Signal Processing Magazine, IEEE, vol. 13, no. 6, pp. 47–60, 1996.

[17] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[18] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.