


Human and computer face detection under occlusion
Samuel E. Anthony¹, Walter Scheirer², David Cox² & Ken Nakayama¹
Harvard University

Conclusions
• Deep annotation can dramatically improve algorithm performance.
• The use of psychometric data in classifier training can provide improved generalization and state-of-the-art performance.

Future Directions
• Apply deep annotation to everything
• Quantify model “humanness”

References
1. Heisele, B. et al. (2007) IJCV
2. Portilla, J. & Simoncelli, E.P. (2000) IJCV
3. Jones, M.J. & Viola, P. (2001) Workshop on Statistical and Computational Theories of Vision
4. Jain, V. & Learned-Miller, E. (2010) Tech. Rep. UM-CS-2010-009

Motivation
Occlusion is well known as a difficult challenge for face detection algorithms. Images were generated with parametric occluders.
[Figure: failed detections under occlusion in Google Street View]
[Diagram: the standard SVM maximum-margin hyperplane, weighting all training samples equally]

Deep Annotation
Standard SVM loss function: a maximum-margin hyperplane treats all training samples equally. In “deep annotated” training, TestMyBrain faces and foils are reweighted by human performance, and the new margin is farther from easy images.
[Diagram: standard vs. human-weighted loss function; the margin shifts away from easy examples]
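One way to write the two objectives (a sketch in our own notation; the symbol h_i for the per-image human weight is not defined on the poster):

```latex
% Standard soft-margin SVM: all slack variables weighted equally.
\min_{w,b,\xi}\; \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_{i} \xi_{i}
\quad \text{s.t.}\; y_{i}(w^{\top}x_{i}+b) \ge 1-\xi_{i},\; \xi_{i}\ge 0

% Human-weighted variant: the slack of image i is scaled by a
% psychometric weight h_i (e.g. mean human accuracy on that image).
% Easy-for-humans images cost more slack, pushing the margin away
% from them; hard images contribute less.
\min_{w,b,\xi}\; \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_{i} h_{i}\,\xi_{i}
```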

Deep annotation beats all previously published results on the benchmark.
[Plot: true positive rate vs. false positives on FDDB]

Heatmaps are generated from a weighted sum of face image masks. Compared to noise, the Simoncelli heatmap is less intense at face edges; subjects are worse at images without the central face visible.
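The heatmap step (a weighted sum of face image masks) can be sketched in a few lines of NumPy; the random masks and the use of mean accuracies as weights are illustrative assumptions, not the poster's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary visibility masks (1 = face pixel left visible by
# the occluder) for three stimuli, each paired with a per-image weight
# such as the mean human accuracy on that stimulus.
masks = rng.integers(0, 2, size=(3, 8, 8)).astype(float)
weights = np.array([0.93, 0.58, 0.36])

# Heatmap: accuracy-weighted frequency with which each pixel was visible.
heatmap = np.tensordot(weights, masks, axes=1) / weights.sum()
```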

Questions
Phase-scrambled noise is not a plausible real-world degradation of face images. Would the gap persist with more ecologically valid degradations? If the gap between humans and computers persists, is it possible to learn anything about why the humans are superior?

Previous Work
[Plot: accuracy (normalized) vs. coherence; N = 3897]

Behavioral Testing
Task: 3-alternative forced choice (3-AFC) “spot the face.” Subjects on the TestMyBrain website completed 200 trials. Five types of generated occluded stimuli were used: Complex, Noise, Simple, Perpendicular, and Simoncelli. All conditions besides Simoncelli were behaviorally indistinguishable.
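Phase scrambling of the kind discussed above keeps an image's amplitude spectrum but randomizes its phase; a minimal NumPy sketch (our implementation, not the authors' stimulus code):

```python
import numpy as np

def phase_scramble(img, rng):
    """Replace the phase spectrum of a real image with the (Hermitian-
    symmetric) phase of white noise, keeping the amplitude spectrum."""
    amplitude = np.abs(np.fft.fft2(img))
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    # Hermitian symmetry of both factors makes the inverse FFT real
    # up to numerical error.
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * noise_phase)))

rng = np.random.default_rng(0)
face_like = np.outer(np.hanning(32), np.hanning(32))  # stand-in "image"
scrambled = phase_scramble(face_like, rng)
```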

Test yourself at www.testmybrain.org!

Behavioral Results
All stimuli have noise backgrounds; the Simoncelli condition has second-order statistic-matched Portilla-Simoncelli textures.
[Plot: accuracy (* = normalized) vs. area visible; N = 427, 1934. Conditions: Human, complex*; Human, Simoncelli*; Google Picasa, noise; Google Picasa, Simoncelli; Face.com, noise background; Face.com, Simoncelli background]

Results
Models were tested against the FDDB benchmark:
• 2845 images with 5171 hand-annotated faces
• 10 “folds” trained and tested independently
• standard scoring function for model comparison
• challenging faces, including many with occlusion

Across-fold testing, human-weighting vs. normal: train on one fold, test on the other nine (90 train/test pairs).

[Plot: maximum percentage classification, Standard Loss Function vs. Deep Annotation]
[Plot: true positive rate vs. false positives, Standard Loss Function vs. Deep Annotation]
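One reading of the “90” in the fold protocol is the count of ordered (train, test) fold pairs; as a quick sketch:

```python
# Every ordered (train_fold, test_fold) pair over FDDB's 10 folds,
# with distinct train and test folds: 10 * 9 = 90 comparisons of
# human-weighted vs. normal training.
folds = range(10)
pairs = [(tr, te) for tr in folds for te in folds if te != tr]
print(len(pairs))  # 90
```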

¹Department of Psychology, ²Department of Molecular and Cell Biology

Compared to the normally trained SVM, the mean image of best detections for the human-weighted loss has more internal facial features. [Mean images of best detections: human-weighted vs. standard; noise condition]

Closing the Loop
A classifier trained with deep, psychometric annotations.
[ROC comparison on FDDB: Deep Annotation vs. SURF Cascade, Jain et al., Subburaman et al., and Viola-Jones]

[Plot: mean accuracy vs. percent of face visible, human vs. computer]

Sample from the images and treat per-image accuracy as the weight. [Example stimuli with mean acc. 0.93, 0.58, and 0.36]
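Per-image accuracy as a sample weight plugs directly into a weighted hinge loss. A self-contained sketch using plain NumPy subgradient descent (the toy data and the h_i weights are illustrative; the poster does not give its training code):

```python
import numpy as np

def train_weighted_svm(X, y, weights, C=1.0, lr=0.01, epochs=200):
    """Linear SVM via subgradient descent on the weighted objective
    0.5*||w||^2 + C * sum_i h_i * max(0, 1 - y_i*(w.x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                # samples violating the margin
        hw = weights * active               # psychometric weight h_i
        grad_w = w - C * (hw * y) @ X
        grad_b = -C * np.sum(hw * y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
# Two toy 2-D clusters standing in for face / non-face feature vectors.
X = np.vstack([rng.normal(1.5, 1, (50, 2)), rng.normal(-1.5, 1, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
h = rng.uniform(0.3, 1.0, 100)  # hypothetical per-image human accuracy
w, b = train_weighted_svm(X, y, h)
acc = np.mean(np.sign(X @ w + b) == y)
```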

Accuracy is only one choice of psychometric measure; we have also had success with reaction time (RT), and there are many other candidates.

The algorithm includes a preprocessing stage in which an ultra-low-threshold Viola-Jones detector delivers candidate regions, speeding up the analysis.
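OpenCV's cascade classifier is the standard Viola-Jones implementation; to keep this sketch dependency-free, here is the underlying idea of such a first stage: a single integral-image Haar-like feature with a threshold set so low that nearly any face-like window survives to the expensive analysis (illustrative only, not the authors' code):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] in O(1) via the integral image."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def candidate_regions(img, win=24, stride=8, threshold=0.0):
    """Ultra-low-threshold first stage: one Haar-like feature (lower
    half brighter than upper half, as cheeks vs. eye band) passes a
    window whenever its response exceeds a near-zero threshold."""
    ii = integral_image(img)
    H, W = img.shape
    out = []
    for r in range(0, H - win + 1, stride):
        for c in range(0, W - win + 1, stride):
            upper = box_sum(ii, r, c, win // 2, win)
            lower = box_sum(ii, r + win // 2, c, win // 2, win)
            if (lower - upper) > threshold:
                out.append((r, c, win))
    return out

# Toy image: bright lower half from row 32 down; only windows that
# straddle the boundary should fire.
img = np.zeros((64, 64))
img[32:] = 1.0
cands = candidate_regions(img, threshold=0.0)
```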

