chernodub paris deeplearning

Post on 08-Dec-2015

36 Views

Category:

Documents

4 Downloads

Preview:

Click to see full reader

DESCRIPTION

Deeplearning for Face recognition

TRANSCRIPT

Deep Learning for Face Recognition

Artem Chernodub, Yurii Paschenko

IMMSP NASU

Paris, 30 September 2015.

ZZWolf

𝑝 (𝑥|𝑦 )=𝑝 ( 𝑦|𝑥 )𝑝 (𝑥)

𝑝 (𝑦 )

Biological-inspired models

Neuroscience

Machine Learning

2 / 50

Biological Neural Networks

3 / 50

Artificial Neural Networks

Traditional (Shallow) Neural Networks

Deep Neural Networks

Deep Feedforward Neural Networks

Recurrent Neural Networks

4 / 50

Deep Neural Networks

Deep Learning = Learning of Representations (Features)

The traditional model of pattern recognition (since the late 50's):

fixed/engineered features + trainable classifierHand-

crafted Feature

Extractor

TrainableClassifier

Trainable Feature

Extractor

TrainableClassifier

End-to-end learning / Feature learning / Deep learning:

trainable features + trainable classifier

6 / 50

Deep Feedforward Neural Networks

• 2-stage training process: i) unsupervised pre-training; ii) fine tuning (vanishing gradients problem is beaten!).

• Number of hidden layers > 1 (usually 6-9).• 100K – 100M free parameters.• No (or less) feature preprocessing stage. 7 / 50

FLOPS comparison

https://ru.wikipedia.org/wiki/FLOPS

Type Name Flops Cost

Mobile Raspberry Pi 1st Gen, 700 Mhz

0,04 Gflops $35

Mobile Apple A8 1,4 Gflops $700 (in iPhone 6)

CPU Intel Core i7-4930K (Ivy Bridge), 3.7 GHz

140 Gflops $700

CPU Intel Core i7-5960X (Haswell), 3.0 GHz

350 Gflops $1300

GPU NVidia GTX 980 4612 Gflops (single precision),

144 Gflops (double precision)

$600 + cost of PC (~$1000)

GPU NVidia Tesla K80 8740 Gflops (single precision),

2910 Gflops (double precision)

$4500 + cost of PC (~1500)

8 / 50

Sparse Autoencoders

9 / 50

Dimensionality reduction

• Use a stacked RBM as deep auto-encoder

1. Train RBM with images as input & output

2. Limit one layer to few dimensions

Information has to pass through middle layer

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.10 /

50

Original

Deep RBN

PCA

Dimensionality reduction

Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 30)

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.11 /

50

How to use unsupervised pre-training stage / 1

12 / 50

How to use unsupervised pre-training stage / 2

13 / 50

How to use unsupervised pre-training stage / 3

14 / 50

How to use unsupervised pre-training stage / 4

15 / 50

Unlabeled data

Unlabeled data is readily available

Example: Images from the web

1. Download 10’000’000 images2. Train a 9-layer DNN3. Concepts are formed by DNN

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.16 /

50

Dimensionality reduction

PCA Deep RBN

804’414 Reuters news stories, reduction to 2 dimensions

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.17 /

50

Hierarchy of trained representations

Low-level feature

Middle-level

feature

Top-level feature

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

18 / 50

Face Recognition

Standard Face Recognition Scenario

Step 1. Face Detection

Step 2. Face Alignment

Step 3. Face Matching

20 / 50

Face Matching (Face Recognition)

A) Training the classifier B) Matching two face images

Classifier

Features

Answer21 /

50

Recognition of Face Attributes

• man;• white;• 45-50 years;• glasses – no;• moustache –

yes;• beard – yes.

22 / 50

Beauty Recognition

http://www.linkface.cn/face/beauty

Чернодуб А.Н., Пащенко Ю.А., Головченко К.А. Нейросетевая система определения привлекательности лица человека // «Нейроинформатика-2013», Москва, 21-25 января 2013, c. 254 — 259. 23 /

50

Face Matching

EigenFaces, Principal Component Analysis (PCA) for Face Matching, 1991

M. Turk, A. Pentland. Face Recognition using EigenFaces // Journal of cognitive neuroscience 3 (1), p. 71-86.

• Well-known method for dimensionality reduction.• Building features from the training data.• Impressive visualization of the basis vectors.

25 / 50

Local Binary Pattern Histograms (LBPH), 2006

• Fast & robust features for texture descriptions applied to face recognition.

• distance for comparison of histograms.• Matrix of importance regions (weighted .

T. Ahonen, A. Hadid. Face Description with Local Binary Patterns:Application to Face Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, p. 2037-2041.

26 / 50

Deep Face (Facebook), 2014

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.

Model # of parameters

Accuracy, %, LFW

Deep Face Net 128M 97.25

Human level N/A ~ 97.64

Training data: 4M facial images

27 / 50

Convolutional Neural Networks: Return of Jedi

Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks

Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 28 /

50

DeepFace Training Framework

Step 1. Supervised training for identification

Step 2. Use produced features for face matching

Big Face database

~1-10M images,~ 1-5K persons

ConvolutionalNeuralNetwork

Feature

1-5Klabels

Feature 2

Feature 1 - Cosine metric- Euclid metric

- metric- Siamese networks

Similarity

29 / 50

Face Recognition Algorithm

Face localization

Normalization

Feature extraction

Comparing

Verification

metricFALSE

30 / 50

Databases for Deep Face Training

• Facebook’s Social Face Classification (SCF) dataset, 2014. 4.4M labeled faces, 4030 people with 800 to 1200 faces each. Private dataset.

• Visual Geometry Group Dataset, Oxford, 2015. 2622 people with 1000 faces each. Private dataset.

• CASIA WebFace Database, 2014. 10575 people, 500K faces. Public dataset.http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html

31 / 50

CASIA

• Dataset: CASIA WebFace (10 575 subjects and 494 414 face images)

• Alignment: 2D• Architectures: single CNN + triplet

loss• Verification metric:– Cosine– Joint Bayes

D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. Technical report, arXiv:1411.7923, 2014.

32 / 50

CASIA architecture

• Input: gray image 100x100

• Feature vector size: 320

• Parameters: ~5M

33 / 50

CASIA replication

Training data Accuracy on LFW

Normalized 97,27 %

No alignment 94,15 %

Our 2D alignment 96,23 %

34 / 50

Error rate vs Database size

Identities

Faces Error rate

1.5K 150K 3.1%

9K 450K 1.35%

18K 1.2M 0.87%

J. Liu, Y. Deng, T. Bai, Zh. Wei, C. Huang. Baidu Targeting Ultimate Accuracy: Face Recognition via Deep Embedding, 2015 http://hdl.handle.net/1721.1/97999

35 / 50

Face Databases (benchmarks)

AT&T Database (Olivetti, ORL): 1992-1994

F. Samaria, A. Harter. Parameterisation of a Stochastic Model for Human Face Identification // Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, December 1994.

http://www.uk.research.att.com/

• Grayscale images, 92x112.

• 40 subjects, 10 images per subject.

• Different facial expressions, poses, glasses.

• Studio, single session.

37 / 50

FERET dataset, 1994-1996

P.J. Phillips, H. Moon, P. Rauss, S. A. Rizvi. The FERET September 1996 Database and Evaluation Procedure // First International Conference, AVBPA'97 Crans-Montana, Switzerland, March 12–14, 1997, pp. 395-402

http://www.nist.gov/srd/

• 2413 still facial images, representing 856 individuals, studio, two sessions.

• Different facial expressions, poses, light.

38 / 50

Labeled Faces in the Wild dataset, 2007

G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments // University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.http://vis-www.cs.umass.edu/lfw/

• Images with faces collected from the web.

• Pairs comparison, restricted mode.

• test: 10-fold cross-validation, 6000 face pairs.

39 / 50

Results for Face Matching, LWF (option “Labeled outside data”)

http://vis-www.cs.umass.edu/lfw/results.html

Method Accuracy

1 EigenFaces 60,20%

2 LBP-classic (3K features) 72,43%

3 High-dim LBP (100K features) 95,17%

4 DeepFace 97,25%

5 Human level 97,64%

6 DeepID3 99,53%

7 Baidu 99,77%

40 / 50

Face Detection

Viola-Jones Object Detector

• Very popular for Face Detection.• Different features except HAAR: LBP, SURF/SIFT, etc.• Available free in OpenCV library (http://opencv.org).• OpenCV implementation supports Windows, Linux,

Mac OS, iOS (not officially) and Android.• May be trained with MATLAB tool traincascade.

42 / 50

Images pyramid for Viola-Jones

43 / 50

FDDB: Face Detection Data Set and Benchmark

Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings // Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010.

http://vis-www.cs.umass.edu/fddb

• 5171 faces in a set of 2845 LWF images.

• Special software tool for testing.

44 / 50

Viola-Jones Object Detector Classifier Structure

P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple features // Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001.

45 / 50

FDDB: state-of-the-art

http://vis-www.cs.umass.edu/fddb/results.html46 /

50

Face Alignment

3D Face Alignment

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014. 48 /

50

2D Face Alignment

D. Yi, Z. Lei, S. Liao, S.Z. Li. Learning Face Representation from Scratch // CoRR abs/1411.7923 (2014). [CASIA]

49 / 50

Some Computer Vision / Machine Learning tools• OpenCV, VLFeat – popular Computer Vision

libraries.• mexopencv – MATLAB interface for

OpenCV.• pythonXY – all-in-one Machine Learning

Package.• libSVM – reliable cross-platform Support

Vector Machine library.• NetLab – Shallow Neural Networks & other

ML models, from Christopher Bishop’s book.

• Caffe, Theano, Torch, matconvnet – frameworks for training Deep Neural Networks. 50 /

50

contact: a.chernodub@gmail.com

Thanks!

top related