Estimating 3D Facial Pose in Video with Just Three Points

UNIVERSITY OF MURCIA (SPAIN) ARTIFICIAL PERCEPTION AND PATTERN RECOGNITION GROUP Estimating 3D Facial Pose in Video with Just Three Points Ginés García Mateos, Alberto Ruiz García Dept. de Informática y Sistemas P.E. López-de-Teruel, A.L. Rodriguez, L. Fernández Dept. Ingeniería y Tecnología de Computadores University of Murcia - SPAIN

Page 1: Estimating 3D Facial Pose in Video with Just Three Points

UNIVERSITY OF MURCIA (SPAIN)
ARTIFICIAL PERCEPTION AND PATTERN RECOGNITION GROUP

Estimating 3D Facial Pose in Video with Just Three Points

Ginés García Mateos, Alberto Ruiz García
Dept. de Informática y Sistemas

P.E. López-de-Teruel, A.L. Rodríguez, L. Fernández
Dept. Ingeniería y Tecnología de Computadores

University of Murcia - SPAIN

Page 2: Estimating 3D Facial Pose in Video with Just Three Points

ESTIMATING 3D FACIAL POSE IN VIDEO WITH JUST THREE POINTS

G. García, A. Ruiz, P.E. López, A.L. Rodríguez, L. Fernández

3DFP’2008, ANCHORAGE, JUNE 2008

Introduction (1/3)

• Main objective: to develop a new method to estimate the 3D pose of the head of a human user:
– Estimation through a video sequence
– Working with the minimum necessary information: a 2D location of the face
– A very simple method, without training, running in real time: fast processing
– Under realistic conditions: robust to facial expressions, light, movements
– Robustness preferred to accuracy

Page 3: Estimating 3D Facial Pose in Video with Just Three Points

Introduction (2/3)

• 3D pose estimation using 3D tracking…

Approaches shown: Active Appearance Model; shape & texture models; cylindrical models; 3D morphable mesh.

Image sources:
http://www.lysator.liu.se/~eru/research/
http://www.merl.com/projects/3Dfacerec/
www.cs.bu.edu/groups/ivc/html/research_list.php
http://cvlab.epfl.ch/research/body

Page 4: Estimating 3D Facial Pose in Video with Just Three Points

Introduction (3/3)

• In short, we want to obtain something like this:

• The result is a 3D location (x, y, z) and a 3D orientation (roll, pitch, yaw): 6 D.O.F.

Page 5: Estimating 3D Facial Pose in Video with Just Three Points

Index of the presentation

• Overview of the proposed method
– 2D facial detection and location
– 2D face tracking
• 3D facial pose estimation
– 3D position
– 3D orientation
• Experimental results
• Conclusions

Page 6: Estimating 3D Facial Pose in Video with Just Three Points

Overview of the Proposed Method

• The key idea: separate the problems of 2D tracking and 3D pose estimation.

• Introducing some assumptions and simplifications, pose is extracted with very little information.

2D face detection → 2D face tracking → 3D pose estimation

The proposed 3D pose estimator could use any 2D facial tracker.

Page 7: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Detection, Location and Tracking Using I.P.

• We use a method based on integral projections (I.P.), which is simple and fast.

• Definition of I.P.: average of gray levels of an image along rows and columns.

For an image i(x, y):

$$PV_i : [y_{min}, \ldots, y_{max}] \to \mathbb{R}, \qquad PV_i(y) := \overline{i(\cdot, y)}$$

$$PH_i : [x_{min}, \ldots, x_{max}] \to \mathbb{R}, \qquad PH_i(x) := \overline{i(x, \cdot)}$$

[Figure: a sample face image i(x, y) with its vertical projection PV(y) and its horizontal projection PH(x).]
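The definition of integral projections can be sketched in a few lines. This is a minimal illustration, assuming the image is a NumPy array indexed as img[y, x] that holds gray levels; the function names are illustrative and do not come from the presentation.

```python
import numpy as np

def vertical_projection(img):
    """PV(y): mean gray level of each row y (average over x)."""
    return np.asarray(img, dtype=float).mean(axis=1)

def horizontal_projection(img):
    """PH(x): mean gray level of each column x (average over y)."""
    return np.asarray(img, dtype=float).mean(axis=0)

# Tiny 2x3 example image (2 rows, 3 columns), values chosen by hand.
img = np.array([[0, 100, 200],
                [50, 50, 50]])
pv = vertical_projection(img)    # one value per row
ph = horizontal_projection(img)  # one value per column
```

Each projection reduces a 2D region to a 1D signal, which is what keeps the later detection and alignment steps cheap.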

Page 8: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Detection with I.P.

Global view of the I.P. face detector:

Input image → Step 1. Vertical projections by strips → Step 2. Horizontal projection of the candidates → Step 3. Grouping of the candidates → Final result

[Figure labels: PVface, PHeyes]

Page 9: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Detection with I.P.

• To improve the results, we combine two face detectors: combined detector.

Face Detector 1. Look for candidates → Face Detector 2. Verify face candidates → Final detection result

Detectors combined: Haar + AdaBoost [Viola and Jones, 2001]; Integral Projections [Garcia et al, 2007]
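The two-stage idea (one detector proposes candidates, a second one verifies them) can be sketched generically. This is only an illustration of the composition: the function names, the box format and the toy detectors below are assumptions, and neither stand-in implements Haar + AdaBoost or integral projections.

```python
def combined_detector(image, find_candidates, verify_candidate):
    """Run detector 1 to propose boxes and keep those detector 2 accepts."""
    return [box for box in find_candidates(image)
            if verify_candidate(image, box)]

# Toy stand-ins (hypothetical): boxes are (x, y, width, height) tuples.
def propose(image):
    return [(10, 10, 40, 40), (200, 5, 8, 8)]

def accept(image, box):
    return box[2] >= 20  # e.g. reject very small candidates

faces = combined_detector(None, propose, accept)
```

Because the verifier only runs on the candidates, the combined cost is dominated by whichever detector is used first.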

Page 10: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Detection with I.P.

ROC curves on UMU FaceDB (737 img./853 faces) [Garcia et al, 2007]

Detectors compared: IP+Haar, Haar+IP, Haar, NeuralNet, IntProj, TemMatch, Cont

[Figure: two ROC plots, % detected faces vs. % false positives.]

Det. ratio (F.P. = 0.5): 84.2%, 91.8%, 88.6%, 39.0%, 24.8%, 88.6%, 96.1%

Time (PIV 2.6 GHz): 85 ms, 293 ms, 2338 ms, 389 ms, 120 ms, 97 ms, 296 ms

Page 11: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Location with I.P.

Global view of the 2D face locator:

Input image and face → Step 1. Orientation estimation → Step 2. Vertical alignment → Step 3. Horizontal alignment → Final result

[Figure: projections shown for the steps: PVeyes(y) and PV’eyes(y); PVface(y), PV’face(y) and MVface(y); PHeyes(x), PH’eyes(x) and MHeyes(x).]

Page 12: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Location with I.P.

Location accuracy of the 2D face locator

Av. time (PIV 2.6 GHz): IntProj 1.7 ms; NeuralNet 323.6 ms; EigenFeat 20.5 ms

Page 13: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Tracking with I.P.

[Flowchart boxes: Initial face detection & location; Prediction of new position; Face relocation; Motion model update; Frame t+1. Branch labels: Correct tracking; Lost face.]

FACE TRACKING

Step 0. Prediction
Step 1. Vertical alignment
Step 2. Horizontal alignment
Step 3. Orientation estimation

[Figure: projections PVface(y), PV’face(y), PVeyes(y), PV’eyes(y), PHeyes(x) and PH’eyes(x).]
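The tracking flowchart can be read as a simple loop. The sketch below is one possible reading, assuming that a lost face sends the system back to detection on the next frame; all four callables (detect, predict, relocate, update_model) are hypothetical placeholders rather than the authors' routines.

```python
def track_faces(frames, detect, predict, relocate, update_model):
    """Return one face location (or None) per frame."""
    model = None          # motion model; None means no face is being tracked
    locations = []
    for frame in frames:
        if model is None:
            face = detect(frame)                     # detection & location
        else:
            face = relocate(frame, predict(model))   # prediction, then relocation
        # Correct tracking updates the model; a lost face resets it.
        model = update_model(model, face) if face is not None else None
        locations.append(face)
    return locations

# Toy stand-ins: the face is lost on frame 2 and re-detected on frame 3.
result = track_faces(
    frames=[0, 1, 2, 3],
    detect=lambda frame: ("detected", frame),
    predict=lambda model: model,
    relocate=lambda frame, guess: None if frame == 2 else ("tracked", frame),
    update_model=lambda model, face: face,
)
```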

Page 14: Estimating 3D Facial Pose in Video with Just Three Points

2D Face Tracking with I.P.

• Sample result of the proposed tracker.

320x240 pixels, 312 frames at 25fps, laptop webcam

(e1x, e1y) = location of left eye; (e2x, e2y) = right eye; (mx, my) = location of the mouth

Page 15: Estimating 3D Facial Pose in Video with Just Three Points

3D Facial Pose Estimation

• In theory, 3 points should be enough to solve the 6 degrees of freedom (if focal length and face geometry are known).

• But…

• Location errors are high in the mouth for non-frontal faces.

• Some assumptions are introduced to avoid the effect of this error.

Page 16: Estimating 3D Facial Pose in Video with Just Three Points

3D Facial Pose Estimation

• Fixed body assumption: the user's body is fixed and only the head moves. Therefore, 3D position is estimated in the first frame, and 3D orientation in the following frames.

• A simple perspective projection model is used to estimate 3D position.

Page 17: Estimating 3D Facial Pose in Video with Just Three Points

3D Position Estimation

• f: focal length (known)
• (cx, cy): tracked center of the face
• p = (px, py, pz)

cx = (e1x+e2x+mx)/3 ; cy = (e1y+e2y+my)/3

[Figure: perspective projection model with the center of projection at (0,0,0), axes X, Y and Z, the image plane at distance f containing (cx, cy), and the point (px, py) at depth pz.]

Page 18: Estimating 3D Facial Pose in Video with Just Three Points

3D Position Estimation

• We have:

cx/f = px/pz ; cy/f = py/pz

• Where:

cx = (e1x+e2x+mx)/3 ; cy = (e1y+e2y+my)/3

• So:

px = (e1x+e2x+mx)/3 · pz/f

py = (e1y+e2y+my)/3 · pz/f

• The depth of the face, pz, is computed with: pz = f·t/r, where r is the apparent face size* and t is the real size.

* For more information, see the paper.
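The position formulas translate directly into code. The sketch below is a minimal illustration under stated assumptions: image coordinates are measured from the principal point, f and r are in pixels, and the numeric values (f = 500, t = 0.15, r = 100) are made up for the example. How r is measured is described in the paper and is simply taken as an input here.

```python
def face_position(e1, e2, m, f, t, r):
    """Return (px, py, pz) from the eyes e1, e2 and the mouth m.

    e1, e2, m are (x, y) image points relative to the principal point
    (an assumption); f and r are in pixels; t sets the unit of the result.
    """
    cx = (e1[0] + e2[0] + m[0]) / 3.0   # tracked center of the face
    cy = (e1[1] + e2[1] + m[1]) / 3.0
    pz = f * t / r                      # depth from apparent vs. real size
    return cx * pz / f, cy * pz / f, pz

# Hypothetical inputs: eyes and mouth in pixels, real face size t = 0.15.
px, py, pz = face_position(e1=(10, -20), e2=(70, -20), m=(40, 40),
                           f=500.0, t=0.15, r=100.0)
```

With these inputs the face center is at (40, 0) pixels, so the estimated position lies slightly to one side of the optical axis at a depth of about 0.75 in the unit of t.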

Page 19: Estimating 3D Facial Pose in Video with Just Three Points

Estimation of Roll Angle

• Roll angle can be approximately associated with the 2D rotation of the face in the image.

$$\mathrm{roll} = \arctan\frac{e_{2y} - e_{1y}}{e_{2x} - e_{1x}}$$

• This equation is valid in most practical situations, but it is not precise in all cases.

[Figure: sample frames with roll = −43.7º, roll = −2.8º, roll = 15.9º and roll = 34.6º.]
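A small sketch of the roll formula follows. It uses atan2 instead of a plain arctangent of the quotient, which is a substitution made here to avoid a division by zero when both eyes share the same x coordinate; the two agree whenever e2x > e1x. The sign of the result depends on the direction of the image y axis, which the slides do not specify.

```python
import math

def roll_degrees(e1, e2):
    """2D rotation of the line joining the eyes, in degrees.

    e1 and e2 are the (x, y) image positions of the two eyes.
    """
    return math.degrees(math.atan2(e2[1] - e1[1], e2[0] - e1[0]))

level = roll_degrees((100, 120), (160, 120))    # eyes on a horizontal line
tilted = roll_degrees((100, 120), (160, 180))   # equal x and y offsets
```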

Page 20: Estimating 3D Facial Pose in Video with Just Three Points

Estimation of Pitch and Yaw

• The head-neck system can be modeled as a robotic arm, with 3 rotational DOF.

[Figure: orthographic, top and front views of the head-neck model, with axes X, Y and Z, the rotations roll, pitch and yaw, and the model dimensions a, b and c.]

• In this model, any point of the head lies on a sphere, so its projection is related to pitch and yaw.

[Figure: image axes Xi, Yi with a circle of radius ri, the initial point (dx0, dy0) and the current point (dxt, dyt).]

Page 21: Estimating 3D Facial Pose in Video with Just Three Points

Estimation of Pitch and Yaw

• rw: radius of the sphere where the center of the eyes lies.

• ri: radius of the circle where that sphere is projected.

• (dx0, dy0): initial center of eyes.

• (dxt, dyt): current center of eyes.

Center of eyes: ((e1x+e2x)/2, (e1y+e2y)/2)

rw = sqrt(a² + c²)

ri = rw·f/pz

[Figure: three panels on image axes Xi, Yi, each showing the circle of radius ri and (dx0, dy0): the initial frame (pitch = 0, yaw = 0); instant t = 1 with (dx1, dy1); instant t = 2 with (dx2, dy2).]

Page 22: Estimating 3D Facial Pose in Video with Just Three Points

Estimation of Pitch and Yaw

• In essence, we have a problem of computing altitude and latitude for a given point in a circle.

• The center of the circle is:

(dx0, dy0 − a·f/pz)

• So we have:

$$\mathrm{pitch} = \arcsin\frac{dy_t - (dy_0 - a \cdot f/p_z)}{r_i} - \arcsin(a/c)$$

• And:

$$\mathrm{yaw} = \arcsin\frac{dx_t - dx_0}{r_i \cdot \cos(\mathrm{pitch} + \arcsin(a/c))}$$

[Figure: image axes Xi, Y with the circle of radius ri, the initial point (dx0, dy0) and the current point (dxt, dyt).]
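The pitch and yaw formulas can be evaluated as written. The sketch below follows the slide's expressions term by term, with rw = sqrt(a² + c²) and ri = rw·f/pz; the numeric values (a = 2, c = 10, pz = 50, f = 500) are hypothetical, and inputs that fall outside the projected circle are not handled (math.asin would raise ValueError).

```python
import math

def pitch_yaw(d0, dt, a, c, f, pz):
    """Pitch and yaw in radians from the initial and current eye centers.

    d0 = (dx0, dy0) and dt = (dxt, dyt) are in pixels; a and c are head-model
    dimensions in the same unit as the depth pz; f is in pixels.
    """
    rw = math.sqrt(a * a + c * c)   # radius of the sphere
    ri = rw * f / pz                # radius of the projected circle, pixels
    offset = math.asin(a / c)       # constant term in the slide's formulas
    center_y = d0[1] - a * f / pz   # y coordinate of the circle center
    pitch = math.asin((dt[1] - center_y) / ri) - offset
    yaw = math.asin((dt[0] - d0[0]) / (ri * math.cos(pitch + offset)))
    return pitch, yaw

# Initial frame (current center equals initial center), then a 10-pixel shift in x.
p0, y0 = pitch_yaw((160, 120), (160, 120), a=2.0, c=10.0, f=500.0, pz=50.0)
p1, y1 = pitch_yaw((160, 120), (170, 120), a=2.0, c=10.0, f=500.0, pz=50.0)
```

With these values the initial frame gives a yaw of exactly 0 and a pitch within a fraction of a degree of 0. The small residual appears because the first arcsin evaluates to arcsin(a/rw) at the initial frame, while the subtracted term is arcsin(a/c); the two are close when a is small relative to c.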

Page 23: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (1/7)

• Experiments carried out:
– Off-the-shelf webcams.
– Different individuals.
– Variations in facial expressions and facial elements (glasses).

• Studies of robustness, efficiency, comparison with a projection-based 3D estimation algorithm.

• In a Pentium IV at 2.6 GHz: ~5 ms file reading, ~3 ms tracking, ~0.006 ms pose estimation
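As a rough reading of these timings, assuming the three stages run one after another on every frame (the slide only lists the individual times), the per-frame cost and the implied throughput are approximately:

```latex
$$ t_{\mathrm{frame}} \approx 5 + 3 + 0.006 = 8.006\ \mathrm{ms},
\qquad
\frac{1000\ \mathrm{ms/s}}{8.006\ \mathrm{ms}} \approx 125\ \mathrm{frames/s} $$
```

Under that assumption the pose estimation step is a negligible share of the total, and the whole pipeline runs well above the 25 fps of the sample video.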

Page 24: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (2/7)

• Sample input video: bego.a.avi

320x240 pixels, 312 frames at 25fps, laptop webcam

Page 25: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (3/7)

• 3D pose estimation results

320x240 pixels, 312 frames at 25fps, laptop webcam

Page 26: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (4/7)

[Figure: pitch (axis from −20 to 30) vs. # frames (0 to 400), comparing the proposed method with the projection-based method.]

Page 27: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (5/7)

• Range of working angles…

• Approx. ±20º in pitch and ±40º in yaw.

• The 2D tracker is not explicitly prepared for profile faces!

Page 28: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (6/7)

• With glasses and without glasses

Page 29: Estimating 3D Facial Pose in Video with Just Three Points

Experimental Results (7/7)

• When the fixed-body assumption does not hold

• Body/shoulder tracking could be used to compensate body movement.

Page 30: Estimating 3D Facial Pose in Video with Just Three Points

Conclusions (1/3)

• Our purpose was to design a fast, robust, generic and approximate 3D pose estimation method:
– Separation of 2D tracking and 3D pose.
– Fixed-body assumption.
– Robotic head model.

• 3D position is computed in the first frame.

• 3D orientation is estimated in the rest of frames.

• Estimation process is very simple, and avoids inaccuracies in the 2D tracker.

Page 31: Estimating 3D Facial Pose in Video with Just Three Points

Conclusions (2/3)

• Future work: using the 3D pose estimator in a perceptual interface.

Page 32: Estimating 3D Facial Pose in Video with Just Three Points

Conclusions (3/3)

• The simplifications introduced lead to several limitations of our system, but in general…

• Human anatomy of the head/neck system could be used in 3D face trackers.

• The human head cannot move independently of the body!

• Taking advantage of these anatomical limitations could simplify and improve current trackers.

Page 33: Estimating 3D Facial Pose in Video with Just Three Points

Last

• This work has been supported by the project Consolider Ingenio-2010 CSD2006-00046, and TIN2006-15516-C04-03.

• Sample videos:

http://dis.um.es/~ginesgm/fip

• Grupo PARP web page:

http://perception.inf.um.es/

Thank you very much