

Video Face Recognition: A Literature Review

Hao Zhang

Computer Science Department


Problem Statement

• Verification: given A and B, are they the same person or different persons?
• Identification: given A and candidates B, C and D, which has the same identity as A?
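The two protocols can be sketched in a few lines. This is only an illustration: it assumes each video has already been reduced to one feature vector, and the names `similarity`, `verify`, `identify` and the threshold value are hypothetical, not taken from the slides.

```python
import numpy as np

def similarity(a, b):
    """Cosine similarity between two video-level feature vectors (placeholder score)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(a, b, threshold=0.5):
    """Verification: decide same / different for one pair (threshold is an assumption)."""
    return similarity(a, b) >= threshold

def identify(probe, gallery):
    """Identification: return the gallery name whose vector best matches the probe."""
    return max(gallery, key=lambda name: similarity(probe, gallery[name]))

A = [1.0, 0.0]
gallery = {"B": [0.9, 0.1], "C": [0.0, 1.0], "D": [-1.0, 0.0]}
same_as_b = verify(A, gallery["B"])
best_match = identify(A, gallery)
```

Verification needs a threshold on a single pair, whereas identification only needs a ranking over the gallery.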


Solutions

• Extensions of still face recognition algorithms
• 3D model reconstruction
• Employing temporal information
• Set-to-set matching methods


Extensions of still face recognition algorithms

• Joint sparse representation
– Data: k-th partition of a query video
– Dictionary: a concatenation of all dictionaries of the k-th partition of training videos
– Figure labels: probe, gallery


Extensions of still face recognition algorithms

• Joint sparse representation: Conclusion
– Joint sparse representation
– Only suitable for face identification
– Cannot handle new faces
– Violates the protocol of face verification


Extensions of still face recognition algorithms

• Multiple metric learning (MML)
– Pipeline: Video → Volumes → Patches → Feature Extraction → MML

* A part of this figure is from [5]


Extensions of still face recognition algorithms

• Multiple metric learning (MML): A conclusion
– It can be easily adapted to solve both still and video problems.
– It discards additional information in the video.


3D model reconstruction

• From a single frontal image: Analysis

* The two images above are from [8]

Labelled terms:
– Reconstructed 3D shape
– Mean training 3D shape
– PCA projection matrix of training 3D shapes
– 2D mappings of …
– Input 2D shape
– Scale and translation term
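The labelled terms fit a PCA shape model fitted to 2D input. As a hedged sketch only, one common way to write such a model is given below. The symbols are assumptions chosen here for illustration and are not necessarily the notation or the exact formulation of [8].

```latex
S' = \bar{S} + P\,b,
\qquad
b = \arg\min_{b}\; \bigl\| s - c\,(\bar{S}_f + P_f\, b) - t \bigr\|^2
```

Here $S'$ is the reconstructed 3D shape, $\bar{S}$ the mean training 3D shape, $P$ the PCA projection matrix of the training 3D shapes, $\bar{S}_f$ and $P_f$ their 2D mappings, $s$ the input 2D shape, and $c$, $t$ a scale and a translation.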


3D model reconstruction

• Reconstruction from a single image: Synthesis

Synthesized variations: pose, illumination, expression

* This figure is from [8]


3D model reconstruction

• Reconstruction from a single image: Conclusion
– Handles pose and illumination variations
– Requires 2D images of good quality
– Synthesis of lighting and expression is far from perfect


Employing temporal information

• Dynamic system model, ARMA
– x(t+1) = A x(t) + v(t), y(t) = C x(t) + w(t), with noise terms v(t) and w(t)
– x(t): state vector encoding pose at time t
– y(t): face appearance at time t
– Video similarity is computed using an observability matrix formed by A and C.
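As a rough sketch of how two videos could be compared once A and C have been estimated for each, the snippet below stacks a truncated observability matrix and measures the principal angles between the resulting subspaces. The truncation length `m`, the function names and the choice of a sin-based distance are assumptions made for illustration; [1] may use a different distance.

```python
import numpy as np

def observability_basis(A, C, m=None):
    """Orthonormal basis of the truncated observability matrix [C; CA; ...; CA^(m-1)]."""
    A, C = np.asarray(A, float), np.asarray(C, float)
    m = A.shape[0] if m is None else m   # default: as many blocks as state dimensions
    blocks, block = [], C
    for _ in range(m):
        blocks.append(block)
        block = block @ A
    O = np.vstack(blocks)                # assumed to have full column rank
    Q, _ = np.linalg.qr(O)
    return Q

def subspace_distance(Q1, Q2):
    """sqrt(sum of sin^2 of the principal angles) between two orthonormal bases."""
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cosines**2)))

rng = np.random.default_rng(0)
A1, C1 = np.diag([0.9, 0.5]), rng.standard_normal((4, 2))
T = np.array([[2.0, 1.0], [0.0, 1.0]])   # change of state coordinates
Ti = np.linalg.inv(T)
A2, C2 = T @ A1 @ Ti, C1 @ Ti            # same system, different state basis
A3, C3 = np.diag([-0.9, 0.2]), rng.standard_normal((4, 2))

d_same = subspace_distance(observability_basis(A1, C1), observability_basis(A2, C2))
d_diff = subspace_distance(observability_basis(A1, C1), observability_basis(A3, C3))
```

Comparing column spaces rather than raw matrices makes the score insensitive to the arbitrary choice of state coordinates, which is why `d_same` is numerically zero here.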


Employing temporal information

• Dynamic system model: Conclusion
– Incorporate time information for recognition
– Linear assumption
– Manifold learning methods can be applied using the observability matrix


Employing temporal information

• Probabilistic model

* The figure is from [9]

– Distance from image I to the manifold of the k-th video
– Probability of image I's projection in …
– Can be adapted to handle occlusion


Employing temporal information

• Probabilistic model: Conclusion
– Incorporate time information to make decisions more robustly
– Error can propagate
– Majority voting
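One way to read the contrast between temporal accumulation and voting is the toy comparison below. It assumes per-frame likelihoods for each identity are already available; the functions `recursive_posterior` and `majority_vote` and the numbers are hypothetical and are not the exact model of [9].

```python
import numpy as np

def recursive_posterior(frame_likelihoods):
    """Multiply in each frame's likelihood and renormalise, starting from a uniform prior."""
    L = np.asarray(frame_likelihoods, float)   # shape: (frames, identities)
    posterior = np.full(L.shape[1], 1.0 / L.shape[1])
    for lik in L:
        posterior = posterior * lik
        posterior /= posterior.sum()
    return posterior

def majority_vote(frame_likelihoods):
    """Each frame votes for its most likely identity; the most frequent vote wins."""
    votes = np.argmax(np.asarray(frame_likelihoods, float), axis=1)
    return int(np.bincount(votes).argmax())

# Two weakly informative frames favour identity 0, one confident frame favours identity 1.
likelihoods = [[0.6, 0.4], [0.6, 0.4], [0.05, 0.95]]
posterior = recursive_posterior(likelihoods)
voted = majority_vote(likelihoods)
```

Voting ignores how confident each frame is, while the recursive update carries every earlier frame forward, which is also how a badly wrong frame can keep influencing later decisions.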


Set-to-set matching

• Manifold-manifold distance

Figure: distance between Manifold A and Manifold B

Clustering criteria:


Set-to-set matching

• Manifold-manifold distance: Conclusion
– Overcomes the drawbacks of voting methods
– Clustering results will be different due to random initialization


Set-to-set matching

• Affine Hull Representation

Convex hull

Affine hull

Reduced affine hull:
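A minimal sketch of the plain (unreduced) affine hull distance follows, assuming each image set is given as rows of a matrix. Each hull is written as a mean plus an orthonormal basis of directions, and the closest pair of points is found by least squares. The function names and the rank tolerance are assumptions for illustration; the reduced hull with bounded coefficients is not covered here.

```python
import numpy as np

def affine_hull(X, tol=1e-10):
    """Return (mu, U): the sample mean and an orthonormal basis of the directions X - mu."""
    X = np.asarray(X, float)                     # one sample per row
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    U = Vt[s > tol * max(s.max(), 1.0)].T        # keep directions with non-negligible spread
    return mu, U

def affine_hull_distance(X1, X2):
    """Smallest distance between any point mu1 + U1 v1 and any point mu2 + U2 v2."""
    mu1, U1 = affine_hull(X1)
    mu2, U2 = affine_hull(X2)
    U = np.hstack([U1, -U2])
    v, *_ = np.linalg.lstsq(U, mu2 - mu1, rcond=None)
    return float(np.linalg.norm(U @ v - (mu2 - mu1)))

line_y0 = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]   # samples on the line y = 0
line_y3 = [[0.0, 3.0], [5.0, 3.0]]               # samples on the parallel line y = 3
line_x0 = [[0.0, -1.0], [0.0, 1.0]]              # samples on the line x = 0

d_parallel = affine_hull_distance(line_y0, line_y3)
d_crossing = affine_hull_distance(line_y0, line_x0)
```

Because an affine hull extends without bound, two hulls can meet even when the samples themselves are far apart, which is one motivation for restricting the hull.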


Set-to-set matching

• Affine Hull Representation: Conclusion
– “Size changeable” affine hulls
– Unclear which representation is better: convex hull, affine hull or linear span?


Set-to-set matching

• Statistical methods on Grassmann manifolds

Local mapping using exponential map preserves geodesic distance

Distribution is defined on the tangent plane of Karcher mean
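To make the geometry concrete, the sketch below computes the geodesic distance between two subspaces from their principal angles, and the logarithm map that sends a subspace to the tangent plane at a base point. It assumes each subspace is given by a matrix with orthonormal columns and that no principal angle reaches 90 degrees. The function names are hypothetical, and the Karcher mean iteration itself is not shown.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between the column spaces of orthonormal bases X and Y."""
    cosines = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Arc-length distance on the Grassmann manifold: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(X, Y)))

def log_map(X, Y):
    """Tangent vector at X pointing towards Y (requires X.T @ Y to be invertible)."""
    M = (Y - X @ (X.T @ Y)) @ np.linalg.inv(X.T @ Y)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

r = 1.0 / np.sqrt(2.0)
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span{e1, e2}
Y = np.array([[1.0, 0.0], [0.0, r], [0.0, r]])       # span{e1, (e2 + e3)/sqrt(2)}

dist = geodesic_distance(X, Y)
tangent = log_map(X, Y)
```

The Frobenius norm of the tangent vector equals the geodesic distance, which is the sense in which the local mapping preserves distances from the base point.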


Set-to-set matching

• Statistical methods on Grassmann manifolds: Conclusion
– Distribution models on manifold
– A video is simply represented as a linear space
– Too few samples

• Thoughts:
– Partition the video to obtain multiple points on Grassmann manifold


A summary for each category

| Approach | Summary |
|---|---|
| Still extensions | Largely inherit properties of still algorithms |
| 3D model | Handle pose and illumination variations; 2D image of good quality; synthesis is not good |
| Temporal | Encode face dynamics; error may propagate |
| Set-to-set | Solid mathematical background; generally less computational burden |


Important Datasets

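• 2001: CMU MoBo [7]
• 2003: Honda/UCSD [9]
• 2009: MBGC [10]
• 2011: YouTube Faces (YTF) [14]
• 2013: PaSC, the point-and-shoot challenge [2]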


Comparing Results?

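Algorithm categories: still extensions (SR, MML, MBGS), temporal (ARMA, Prob), set-to-set (Affine, M2M, Stat).

| Dataset \ Algorithm | SR | MML | MBGS | ARMA | Prob | Affine | M2M | Stat |
|---|---|---|---|---|---|---|---|---|
| MoBo | x | x | x | x | x | 0.98 (1,3) | 0.94 (rand) | x |
| Honda | 0.97 (#frames) | x | x | 0.9 (15,30) | 0.92 ? | 0.92 (20,39,noise) | 0.97 (rand) | x |
| MBGC | 0.88 (s234) | x | x | x | x | x | x | 0.71 (s234) |
| YTF | x | 0.79 (cr) | 0.76 (cr) | x | x | x | x | x |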


Summary

• Current trends:
– Extensions of still face recognition algorithms
– Set-to-set matching methods

• Common issues:
– Computational burden
– Pose variations
  • Thoughts: good training data and transfer learning
– Need common protocols and datasets
  • Much better recently


References

• [1] G. Aggarwal, A. K. R. Chowdhury, and R. Chellappa. A system identification approach for video-based face recognition. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 4, pages 175–178. IEEE, 2004.
• [2] J. R. Beveridge, P. J. Phillips, D. Bolme, B. A. Draper, G. H. Givens, Y. M. Lui, M. N. Teli, H. Zhang, W. T. Scruggs, K. W. Bowyer, et al. The challenge of face recognition from digital point-and-shoot cameras. IEEE Conference on Biometrics: Theory, Applications and Systems, 2013.
• [3] H. Cevikalp and B. Triggs. Face recognition based on image sets. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2567–2573. IEEE, 2010.
• [4] Y.-C. Chen, V. Patel, S. Shekhar, R. Chellappa, and P. Phillips. Video-based face recognition via joint sparse representation. In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1–8, 2013.
• [5] Z. Cui, W. Li, D. Xu, S. Shan, and X. Chen. Fusing robust face region descriptors via multiple metric learning for face recognition in the wild. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3554–3561, 2013.
• [6] G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
• [7] R. Gross and J. Shi. The CMU Motion of Body (MoBo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Pittsburgh, PA, June 2001.
• [8] D. Jiang, Y. Hu, S. Yan, L. Zhang, H. Zhang, and W. Gao. Efficient 3D reconstruction for face recognition. Pattern Recognition, 38(6):787–798, 2005.
• [9] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 1, pages I–313. IEEE, 2003.
• [10] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, et al. Overview of the multiple biometrics grand challenge. In Advances in Biometrics, pages 705–714. Springer, 2009.
• [11] J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
• [12] P. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa. Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(11):2273–2286, 2011.
• [13] R. Wang, S. Shan, X. Chen, and W. Gao. Manifold-manifold distance with application to face recognition based on image set. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
• [14] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 529–534. IEEE, 2011.