
Challenges in Learning the Appearance of Faces for Automated Image Analysis: part I

alessandro verri

DISI – università di genova

[email protected]

actually, i’m gonna talk about:

►brief introduction (the whole thing)
►what some people do for detecting faces
►what we are doing

the problem(s)

►geometry (position, rotation, pose, scale,…)
►facial features (beards, glasses,…)
►facial expressions
►occlusions
►imaging conditions (illumination, resolution, color contrast, camera quality,…)

where we are: face detection

we address face detection as a brute force classification problem (sometimes sophisticated, but still brute force)

the model is encoded in the training samples but not explicitly defined

face recognition and the like

explicit image models are derived from examples, separating identity and imaging parameters

motivation

we want to explore who should learn from whom…

we come back to this at the end!

some approaches

►knowledge-based (Yang & Huang, 94)
►feature invariant (Leung et al, 95; Yow & Cipolla, 97)
►template matching (Lanitis et al, 95)
►appearance based
  • eigenfaces (Turk & Pentland, 91)
  • SVM (Osuna et al, 97)
  • naive bayes (Schneiderman & Kanade, 98)
  • AdaBoost (Viola & Jones, 01)

SVM: global detector (Poggio’s group)

►some preprocessing essential (equalization and normalization)
►polynomial SVM applied to pixels
►training set:
  • about 2,500 face images (58x58 pixels)
  • about 10,000 non-face images (extended to 13,000)
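to make the recipe concrete, here is a minimal sketch of such a global detector, assuming scikit-learn and 8-bit grey patches; the patch size, kernel degree and C below are illustrative, not the group's actual settings:

```python
# minimal sketch of a global polynomial-SVM face detector (illustrative settings)
import numpy as np
from sklearn.svm import SVC

def preprocess(patch):
    """Equalize and normalize an 8-bit grey patch (zero mean, unit variance)."""
    hist, edges = np.histogram(patch.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum() / hist.sum()
    eq = np.interp(patch.ravel(), edges[:-1], 255.0 * cdf)
    return (eq - eq.mean()) / (eq.std() + 1e-8)

def train_global_detector(face_patches, nonface_patches, degree=2):
    """face_patches, nonface_patches: arrays of shape (n, 58, 58)."""
    X = np.array([preprocess(p) for p in np.concatenate([face_patches, nonface_patches])])
    y = np.r_[np.ones(len(face_patches)), np.zeros(len(nonface_patches))]
    clf = SVC(kernel="poly", degree=degree, C=1.0)  # polynomial SVM on raw pixels
    clf.fit(X, y)
    return clf
```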

SVM: component-based detector (Poggio’s group)

►some preprocessing essential (equalization and normalization)
►two-level system (always linear SVMs):
  • component classifiers (14: eyes, nose,…)
  • geometrical configuration classifier based on maximal outputs
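a rough sketch of the two-level idea (the component windows and their layout below are assumptions for illustration, not the group's implementation): slide each linear component SVM over its expected region, keep its maximal output, and classify the vector of maxima:

```python
# sketch of the two-level, component-based detector (illustrative)
import numpy as np
from sklearn.svm import LinearSVC

def component_maxima(patch, component_clfs, search_windows):
    """First level: for each component SVM, scan its search window inside the
    candidate patch and keep the maximal decision value."""
    maxima = []
    for clf, (rows, cols, h, w) in zip(component_clfs, search_windows):
        best = -np.inf
        for r in rows:
            for c in cols:
                x = patch[r:r + h, c:c + w].reshape(1, -1)
                best = max(best, clf.decision_function(x)[0])
        maxima.append(best)
    return np.array(maxima)

def train_configuration_classifier(X_maxima, y):
    """Second level: a linear SVM on the vectors of maximal component outputs,
    which implicitly checks the geometrical configuration."""
    return LinearSVC().fit(X_maxima, y)
```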

global vs component-based

►component-based performs better (more robust to pose variation and/or occlusion)
►global a little faster (though they are both pretty slow: too many patches have to be stored!)

naive bayes (Kanade’s group)

►multiple detectors for different views (size and orientation)
►for each view: statistical modeling using predefined attribute histograms (17), about 2,000 face examples
  • independence is required…

very good for out-of-plane rotation, but an involved procedure for building the histograms (bootstrap, AdaBoost…)
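very loosely, the statistical core can be sketched as per-attribute histograms combined into a log-likelihood ratio under the independence assumption (the attribute definitions and bin counts below are placeholders, not the original ones):

```python
# loose sketch of histogram-based naive Bayes scoring (placeholder attributes)
import numpy as np

def fit_histograms(face_attrs, nonface_attrs, n_bins=64):
    """face_attrs / nonface_attrs: one array of values per attribute.
    Estimates P(attribute | face) and P(attribute | non-face)."""
    models = []
    for vf, vn in zip(face_attrs, nonface_attrs):
        lo, hi = min(vf.min(), vn.min()), max(vf.max(), vn.max())
        hf, edges = np.histogram(vf, bins=n_bins, range=(lo, hi), density=True)
        hn, _ = np.histogram(vn, bins=n_bins, range=(lo, hi), density=True)
        models.append((hf + 1e-6, hn + 1e-6, edges))  # smoothing avoids log(0)
    return models

def log_likelihood_ratio(attrs, models):
    """Sum of per-attribute log ratios; independence across attributes is
    assumed, which is exactly the requirement noted above."""
    score = 0.0
    for a, (hf, hn, edges) in zip(attrs, models):
        i = np.clip(np.searchsorted(edges, a) - 1, 0, len(hf) - 1)
        score += np.log(hf[i] / hn[i])
    return score  # declare a face when the score exceeds a threshold
```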

AdaBoost (Viola & Jones)

►wavelet-like features (computed efficiently)
►features selected through AdaBoost (each weak classifier depends on a single feature)
►detection is obtained by training a cascade of classifiers
►very fast and effective on frontal faces
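a compact sketch of the two key ingredients, assuming scikit-learn for the boosting part (the original uses a hand-rolled AdaBoost and a cascade on top):

```python
# sketch of Viola-Jones ingredients: integral image + boosted stumps (illustrative)
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def integral_image(img):
    """Running sums so that any rectangle sum later costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image ii (exclusive bounds)."""
    s = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        s -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        s -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

def train_boosted_stage(X, y, n_rounds=50):
    """AdaBoost over depth-1 stumps: each weak classifier thresholds a single
    (Haar-like) feature value, so boosting doubles as feature selection.
    X: (n_samples, n_features) feature values, y: face / non-face labels."""
    return AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=n_rounds).fit(X, y)
```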

summing up

►SVM: components based on prior knowledge, simple, very good results but rather slow (optimization approaches…)
►naive bayes: robust against rotation, prior knowledge on feature selection, rather hand-crafted statistical analysis, many models need to be learned (each with many examples)
►AdaBoost: data-driven feature selection, fast, frontal faces only

what we do

►we assume we are given a fair number of positive examples only (no negatives)
►we want to explore how far one can get by combining fully data-driven techniques based on 1D data
►we look at old-fashioned hypothesis testing (false negative rate under full control)

one possible way to object detection

building models can be expensive (domain dependent)

learning from positive examples only is more difficult, but…

classical hypothesis testing controls the false negative rate naturally

testing hypotheses

►hypothesis testing with only one observation
►testing for independence with a rank test (seems appropriate for comparing different features)

CBCL database

faces (19x19 pixels): training 2,429, test 472
non-faces (19x19 pixels): training 4,548, test 23,573

training by hypothesis testing

I. we first compute a large number of features (for the moment about 16,000) on the training set images

II. a subset of good features (about 1,000) is then selected

III. of these, a subset of independent features is considered (ending up with about 100)

IV. multiple statistical tests are then constructed using the training set (one test for each feature)

image measurements

►grey values at fixed locations (19 x 19)
►tomographies (19 vertical + 19 horizontal + 37 along the 45° diagonals)
►ranklets (5184 vertical, 5184 horizontal, 5184 diagonal)
►a total of about 16,000 features
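as a sketch of the tomography features (my reading of the term: projections of the grey values along a scan direction), where the counts match the slide for a 19x19 patch:

```python
# sketch of tomography features on a 19x19 patch (projections; illustrative)
import numpy as np

def tomographies(patch):
    """Projections of a 19x19 grey-level patch: 19 column sums, 19 row sums,
    and 37 sums along the 45-degree diagonals, for 75 features in total."""
    vertical = patch.sum(axis=0)                                    # 19 values
    horizontal = patch.sum(axis=1)                                  # 19 values
    diagonal = [np.trace(patch, offset=k) for k in range(-18, 19)]  # 37 values
    return np.concatenate([vertical, horizontal, diagonal])
```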

ranklets (Smeraldi, 2002)

[figure: vertical ranklets (variance-to-natural support ratio)]
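a minimal sketch of a vertical ranklet, following Smeraldi's rank-based reading of a vertical Haar feature: a Mann-Whitney comparison of the two halves of the window, normalized to [-1, 1]:

```python
# sketch of a vertical ranklet: a rank-based analogue of a vertical Haar feature
import numpy as np
from scipy.stats import rankdata

def vertical_ranklet(window):
    """Ranks all pixels of the window, then compares the two vertical halves
    with a Mann-Whitney style statistic, normalized to [-1, 1]."""
    h, w = window.shape
    ranks = rankdata(window.ravel()).reshape(h, w)
    left = ranks[:, : w // 2]            # "treatment" half
    n = left.size                        # pixels per half (assume even width)
    u = left.sum() - n * (n + 1) / 2.0   # Mann-Whitney U of the left half
    return 2.0 * u / (n * n) - 1.0       # +1: left half uniformly brighter
```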

a salient and a non-salient feature

we discard all features for which the ratio falls below the threshold 0.15 (this leaves us with about 2,000 features)
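the exact definition of the saliency ratio isn't spelled out here, so the filter below is a guess: spread of the feature over the training faces divided by the width of its natural range, thresholded at 0.15:

```python
# sketch of the saliency filter (the ratio's exact definition is an assumption)
import numpy as np

def salient_mask(F, natural_support, threshold=0.15):
    """F: (n_samples, n_features) feature values on the positive training set;
    natural_support: per-feature width of the range the feature can take
    (e.g. 2.0 for ranklets, which live in [-1, 1])."""
    ratio = F.var(axis=0) / natural_support
    return ratio >= threshold  # keep only the salient features
```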

independent feature selection

1. we run independence tests on all possible pairs of salient features of the same category
2. we build a complete graph for each category with as many nodes as features. An edge between two features is deleted if, for the two features, Spearman’s test rejects the independence hypothesis at the chosen level


3. we then search the graph for maximally complete subgraphs (cliques), which we regard as sets of independent features

for a level of 0.5 we are left with 44 vertical, 64 horizontal, 35 diagonal ranklets, and 38 tomographies
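a sketch of steps 1–3, assuming scipy for Spearman's test and networkx for the clique search (which clique is kept is my assumption; the slide does not say):

```python
# sketch of the independence-based selection (clique choice is an assumption)
import networkx as nx
from scipy.stats import spearmanr

def independent_features(F, level=0.5):
    """F: (n_samples, n_features) values of the salient features of one category.
    Start from the complete graph, delete an edge whenever Spearman's test
    rejects independence at the given level, then return one maximal clique."""
    n = F.shape[1]
    g = nx.complete_graph(n)
    for i in range(n):
        for j in range(i + 1, n):
            _, p = spearmanr(F[:, i], F[:, j])
            if p < level:          # correlation detected: drop the edge
                g.remove_edge(i, j)
    return max(nx.find_cliques(g), key=len)  # largest "independent" feature set
```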

testing

for all image locations, all applicable scales, and a fixed number […]:

I. compute the values of the good, independent features

II. perform multiple statistical tests at a certain confidence level

III. a positive example is located if the tests are passed
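a minimal sketch of the core of this loop, with the per-feature tests calibrated on the positive examples only (quantile-based acceptance intervals are my reading of the tests):

```python
# sketch of per-feature tests calibrated on positives only (illustrative)
import numpy as np

def calibrate_tests(F_pos, confidence=0.99):
    """F_pos: (n_faces, n_features) feature values on the positive training set.
    Each test accepts the central `confidence` mass of the face distribution."""
    lo = np.quantile(F_pos, (1 - confidence) / 2, axis=0)
    hi = np.quantile(F_pos, 1 - (1 - confidence) / 2, axis=0)
    return lo, hi

def passes_all_tests(f, lo, hi):
    """A candidate patch is declared a face only if every feature falls
    inside its own acceptance interval."""
    return bool(np.all((f >= lo) & (f <= hi)))
```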

multiple tests

we run multiple tests, living with the fact that we won’t detect a certain fraction of the objects we want to find

luckily, we are in a position to decide this fraction beforehand

we gain power because each test looks at a new feature
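a back-of-the-envelope check of that "decide beforehand" claim, under the assumption that the tests are independent and each accepts a true face with probability 1 - eps:

```python
# with m independent tests at per-test level eps, the overall false negative
# rate 1 - (1 - eps)**m is fixed before seeing any test data
m, eps = 100, 0.0005            # illustrative numbers
fnr = 1 - (1 - eps) ** m
print(f"{fnr:.4f}")             # ~0.0488: under 5% of true faces are missed
```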

some results (franceschi et al, 2004)

472 positives vs 23,573 negatives

[results plot: tomographies + ranklets compared against randomly chosen and overlapping features]

once you have detected a face…

ask Thomas