3D Head Pose Tracking with Linear Depth and Brightness Constraints

M. Harville (1), A. Rahimi (2), T. Darrell (2), G. Gordon (3), J. Woodfill (3)

1: Hewlett-Packard Labs; 2: MIT AI Lab; 3: Tyzx Inc.

Part of work was done while all authors were employed by Interval Research.




The Basic Problem to be Solved

We want to know the rotation (3 DOF) and translation (3 DOF) that a rigid object undergoes from one frame in a video to the next.

In this case, the inter-frame motion can be expressed as rotation about a vertical axis, followed by rightward translation

[Figure: the object at time t and at the following frame]


The Basic Problem to be Solved (cont.)

• Add up these incremental motions to get cumulative motion since start of video

• Motion estimation is equivalent to the tracking of object “pose”: position and orientation in some reference coordinate system.

• One way to visualize pose estimate: render axes in image as if they were rigidly affixed to object.
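As a rough illustration of adding up incremental motions, the sketch below composes inter-frame rigid motions into a cumulative pose using 4x4 homogeneous matrices. The helper names (`rigid_transform`, `rotation_about_y`, `accumulate`) and the example step sizes are assumptions made for this sketch, not part of the original slides.

```python
import numpy as np

def rigid_transform(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 matrix."""
    M = np.eye(4)
    M[:3, :3] = rotation
    M[:3, 3] = translation
    return M

def rotation_about_y(angle):
    """Rotation by `angle` radians about the vertical (Y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def accumulate(increments):
    """Compose inter-frame motions, oldest first, into one cumulative pose."""
    pose = np.eye(4)
    for step in increments:
        pose = step @ pose  # the newest motion is applied last
    return pose

# Hypothetical example: ten identical steps, each a 0.02 rad rotation
# about Y followed by a rightward shift of one unit.
step = rigid_transform(rotation_about_y(0.02), [1.0, 0.0, 0.0])
pose = accumulate([step] * 10)
print(pose)
```

Because every step rotates about the same axis, the rotation part of `pose` equals a single 0.2 rad rotation about Y, while the translation part reflects the curved path taken.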

[Figure: pose axes rendered on the object at time t and at the following frame]


Applications - Lots!

• Perceptual user interface: understanding of head gaze, gestures

• Virtual reality: avatars; prosthetic input devices

• Camera ego-motion: robot or mobile vehicle self-localization; panoramic scene-reconstruction from video

• Augmented reality: make rendered object in a scene move with scene even as camera turns

• Object-tracking: pick-and-place assembly machines; surveillance; automobile collision avoidance


Example: Head pose estimation

• Approximate head as a rigid body.

• Want to know which way head is turned, and where it is in space.


The Inspiration

In most situations, all you have is color or grayscale video from a single camera, and most prior methods have focused on how to solve the problem under these conditions => very difficult!

Suppose you had a little more information: a registered, companion video of dense (per-pixel) depth.

Now what would be the best thing to do, and how good is it?


The Sales Pitch for Our Solution

Under the assumption that, in addition to intensity and/or color information, you have dense depth from some source (e.g. stereo, laser, structured light), here is a method that...

• Is designed for speed (single linear system of equations) => good for real-time applications

• Does not require approximation of shape model or prior knowledge of object shape

• Provides superior or comparable accuracy to other methods


Prior Work: Feature-Based Methods

• Common approaches
  • General feature-tracking + Structure-from-Motion
  • Eye / Nose / Mouth tracking + Rigid Head model
  • State-of-the-art: Zelinsky et al. (Australia)

• Common problems
  • Features disappear
  • Rotation appears as Translation
  • Depth change must be inferred from scale change
  • Data are noisy: need to integrate information optimally over entire observation


An Alternative: Direct Motion Estimation

• Use measurements based on change in image values rather than tracked features

-> More robust -- doesn’t discard uncertainty information

• Express constraints directly on image values

• Pool information with least squares estimate over all pixels

-> Not dependent on small set of key features

• Lots of prior work: Horn and Weldon ‘88, Bergen et al. ‘92, Black and Yacoob ‘95, Bregler and Malik ‘98, Stein and Shashua ‘98, ...


Some Variable Definitions

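3D Coordinate System and Motion Parameters

$$T = \begin{bmatrix} T_x & T_y & T_z \end{bmatrix}^{\top}, \qquad \Omega = \begin{bmatrix} \omega_x & \omega_y & \omega_z \end{bmatrix}^{\top}$$

[Figure: X, Y, Z axes with origin O at the camera center of projection]

Points in Space and Points in Image

$$P = \begin{bmatrix} X & Y & Z \end{bmatrix}^{\top}, \qquad p = \begin{bmatrix} x & y \end{bmatrix}^{\top}$$

System Input: I(x,y) and Z(x,y) at times t, t+1

System Output: inter-frame motion T and Ω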


Direct Motion Estimation Using BCCE

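• Brightness Change Constraint Equation (BCCE):

$$I(x, y, t) = I(x + v_x,\; y + v_y,\; t + 1)$$

• First-order Taylor series expansion:

$$\frac{\partial I}{\partial x} v_x + \frac{\partial I}{\partial y} v_y + \frac{\partial I}{\partial t} = 0$$

• Matrix formulation:

$$-\frac{\partial I}{\partial t} = \begin{bmatrix} \dfrac{\partial I}{\partial x} & \dfrac{\partial I}{\partial y} \end{bmatrix} \begin{bmatrix} v_x \\ v_y \end{bmatrix}$$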
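A small numerical check of the linearized BCCE, sketched under simplifying assumptions: the whole image undergoes one 2D translation, the image is a smooth synthetic pattern, and derivatives are plain finite differences. The function name `estimate_flow` and the test pattern are illustrative choices, not taken from the slides.

```python
import numpy as np

def estimate_flow(I0, I1):
    """Least-squares 2D translation (v_x, v_y) from the linearized BCCE."""
    Iy, Ix = np.gradient(I0)      # axis 0 is y (rows), axis 1 is x (columns)
    It = I1 - I0                  # temporal derivative for a unit time step
    A = np.column_stack([Ix.ravel(), Iy.ravel()])   # one row per pixel
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

def pattern(x, y):
    """Hypothetical smooth brightness pattern used only for this check."""
    return np.sin(0.1 * x) + np.cos(0.07 * y)

y, x = np.mgrid[0:64, 0:64].astype(float)
true_v = np.array([0.3, -0.2])                 # sub-pixel motion (v_x, v_y)
I0 = pattern(x, y)
# BCCE: I(x, y, t) = I(x + v_x, y + v_y, t + 1), so frame t+1 is frame t shifted by v.
I1 = pattern(x - true_v[0], y - true_v[1])

v = estimate_flow(I0, I1)
print("estimated flow:", v)
```

The estimate is only approximate because the Taylor expansion drops higher-order terms; the approximation degrades as the motion grows relative to the scale of the image structure.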


Direct Motion Estimation Using BCCE

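Relate 2D velocities to 3D velocities via a camera projection model:

Orthographic:

$$x = X, \quad y = Y, \qquad v_x = V_x, \quad v_y = V_y$$

OR

$$\begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix}$$

Perspective:

$$x = \frac{fX}{Z}, \quad y = \frac{fY}{Z}, \qquad v_x = \frac{f V_x}{Z} - \frac{f X V_z}{Z^2}, \quad v_y = \frac{f V_y}{Z} - \frac{f Y V_z}{Z^2}$$

OR

$$\begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} \dfrac{f}{Z} & 0 & -\dfrac{x}{Z} \\ 0 & \dfrac{f}{Z} & -\dfrac{y}{Z} \end{bmatrix} \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix}$$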


Direct Motion Estimation Using BCCE

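Constrain 3D velocities to be consistent with rotation and translation of a single rigid body:

For small angle rotations,

$$V = \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix} \approx T + \Omega \times P = T - \hat{P}\,\Omega = \begin{bmatrix} I & -\hat{P} \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}, \quad \text{where } \hat{P} = \begin{bmatrix} 0 & -Z & Y \\ Z & 0 & -X \\ -Y & X & 0 \end{bmatrix}$$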
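A minimal sketch of the rigid-motion constraint as code, assuming the skew-symmetric convention in which the cross product of P with any vector equals the skew matrix of P times that vector. It checks the 3x6 matrix form against the cross product, and compares the small-angle velocity with an exact rotation about the Y axis. All names and sample values are illustrative.

```python
import numpy as np

def skew(P):
    """Skew-symmetric matrix P_hat with P_hat @ w == np.cross(P, w)."""
    X, Y, Z = P
    return np.array([[0.0, -Z, Y],
                     [Z, 0.0, -X],
                     [-Y, X, 0.0]])

def rigid_velocity_matrix(P):
    """3x6 matrix [I  -P_hat], so that V = matrix @ [T; Omega]."""
    return np.hstack([np.eye(3), -skew(P)])

P = np.array([0.3, -0.2, 1.5])
T = np.array([0.01, 0.0, -0.02])
theta = 0.01                                   # small rotation about Y, radians
Omega = np.array([0.0, theta, 0.0])

V_matrix = rigid_velocity_matrix(P) @ np.concatenate([T, Omega])
V_cross = T + np.cross(Omega, P)

# Exact displacement for the same motion: rotate about Y, then translate.
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
V_exact = R @ P + T - P
print(V_matrix, V_exact)
```

The gap between `V_matrix` and `V_exact` grows with the square of the rotation angle, which is why the linear form is only appropriate for small inter-frame rotations.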


Direct Motion Estimation Using BCCE

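Chain these relations together to get one constraint equation per pixel:

• Orthographic

$$-\frac{\partial I}{\partial t} = \begin{bmatrix} \dfrac{\partial I}{\partial x} & \dfrac{\partial I}{\partial y} & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}$$

• Perspective

$$-\frac{\partial I}{\partial t} = \frac{1}{Z} \begin{bmatrix} f \dfrac{\partial I}{\partial x} & f \dfrac{\partial I}{\partial y} & -\left( x \dfrac{\partial I}{\partial x} + y \dfrac{\partial I}{\partial y} \right) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}$$

Combine across pixels into one linear system and solve for [T, Ω] via QR or SVD.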
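The chaining can be checked for a single pixel. The sketch below builds the six perspective BCCE coefficients in one step and compares them with the step-by-step route (rigid motion, then projection, then brightness gradient). The function name `bcce_row` and all numeric values are assumptions for illustration only.

```python
import numpy as np

def bcce_row(Ix, Iy, x, y, Z, f):
    """Six coefficients multiplying [T; Omega] in the perspective BCCE."""
    X, Y = x * Z / f, y * Z / f                    # back-project the pixel
    grad = np.array([f * Ix, f * Iy, -(x * Ix + y * Iy)]) / Z
    Q = np.array([[1.0, 0.0, 0.0, 0.0, Z, -Y],
                  [0.0, 1.0, 0.0, -Z, 0.0, X],
                  [0.0, 0.0, 1.0, Y, -X, 0.0]])
    return grad @ Q

# Hypothetical pixel: image gradients, image position, depth, focal length.
Ix, Iy, x, y, Z, f = 0.4, -0.7, 30.0, -12.0, 1.5, 500.0
motion = np.array([0.01, -0.02, 0.03, 0.002, -0.001, 0.003])   # [T; Omega]

# One-step form: predicted -dI/dt for this motion.
one_step = bcce_row(Ix, Iy, x, y, Z, f) @ motion

# Step-by-step form: rigid motion -> perspective projection -> BCCE.
P = np.array([x * Z / f, y * Z / f, Z])
V = motion[:3] + np.cross(motion[3:], P)
v = np.array([f * V[0] - x * V[2], f * V[1] - y * V[2]]) / Z
step_by_step = Ix * v[0] + Iy * v[1]
print(one_step, step_by_step)
```

Each pixel contributes one such row, so at least six well-spread pixels are needed before the six motion parameters can be determined; in practice many more are pooled.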


Direct Motion Estimation Using BCCE

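• Z unknown!

• Past solutions:
  • Assume approximate shape: planar (Black and Yacoob), ellipsoidal (Basu and Pentland; Bregler and Malik), polygonal (Essa et al.), hyperquadrics, etc.
  • Laser-scanned 3D model of object to be tracked
  • Estimate depth and motion successively via linear or non-linear methods, or together with non-linear optimization => “open loop” issues

Perspective BCCE, in which Z appears explicitly:

$$-\frac{\partial I}{\partial t} = \frac{1}{Z} \begin{bmatrix} f \dfrac{\partial I}{\partial x} & f \dfrac{\partial I}{\partial y} & -\left( x \dfrac{\partial I}{\partial x} + y \dfrac{\partial I}{\partial y} \right) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}$$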


“Direct Depth”: two new ideas

1. Use (independently measured) Z directly in BCCE
  • Believe it or not, this appears to be novel.
  • Frees us from shape model that is either approximate (e.g. planar, ellipsoidal, etc.) or which is known a priori.
  • Shape model can change (slowly) over time: allows for 360 degree rotations, better handles non-rigidity.
  • Related to Direct Motion Stereo of [Shieh et al.] and [Stein and Shashua], but their methods assume infinitesimal camera baselines and require coarse-to-fine solution if disparities >1 pixel are generated. Also, they compute motion before depth; we use depth directly.


“Direct Depth”: two new ideas

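2. Express a direct constraint on the depth gradient.
  • It operates on depth image very similarly to how the classic Brightness Change Constraint Equation (BCCE) applies to the intensity image.
  • We call this the “Depth Change Constraint Equation”, or “DCCE”.

DCCE:

$$Z(x, y, t) + V_z = Z(x + v_x,\; y + v_y,\; t + 1)$$

BCCE, for comparison:

$$I(x, y, t) = I(x + v_x,\; y + v_y,\; t + 1)$$

First-order expansion of the DCCE:

$$\frac{\partial Z}{\partial x} v_x + \frac{\partial Z}{\partial y} v_y + \frac{\partial Z}{\partial t} - V_z = 0$$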


The DCCE

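Add in perspective projection and constrain to a single rigid motion:

$$-\frac{\partial Z}{\partial t} = \frac{1}{Z} \begin{bmatrix} f \dfrac{\partial Z}{\partial x} & f \dfrac{\partial Z}{\partial y} & -\left( Z + x \dfrac{\partial Z}{\partial x} + y \dfrac{\partial Z}{\partial y} \right) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}$$

Very similar to our result for BCCE:

$$-\frac{\partial I}{\partial t} = \frac{1}{Z} \begin{bmatrix} f \dfrac{\partial I}{\partial x} & f \dfrac{\partial I}{\partial y} & -\left( x \dfrac{\partial I}{\partial x} + y \dfrac{\partial I}{\partial y} \right) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}$$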
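For readers who want the intermediate step, here is a short worked derivation of the perspective DCCE. It only combines relations already stated in these slides (the linearized DCCE, the perspective velocity model, and the rigid-motion constraint). The shorthand Z_x, Z_y, Z_t for the partial derivatives of depth is notation assumed here for brevity.

```latex
\begin{align*}
-Z_t &= Z_x v_x + Z_y v_y - V_z \\
     &= Z_x \,\frac{f V_x - x V_z}{Z} + Z_y \,\frac{f V_y - y V_z}{Z} - V_z \\
     &= \frac{1}{Z}
        \begin{bmatrix} f Z_x & f Z_y & -(Z + x Z_x + y Z_y) \end{bmatrix}
        \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix},
\qquad
\begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix}
  = \begin{bmatrix} I & -\hat{P} \end{bmatrix}
    \begin{bmatrix} T \\ \Omega \end{bmatrix}
\end{align*}
```

The extra Z in the third entry, which has no counterpart in the BCCE, comes from the V_z term: depth itself changes when the point moves along the optical axis, whereas brightness is assumed not to.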


DCCE vs. BCCE

• Advantages of DCCE over BCCE
  • Depth information is more robust to lighting changes in space and time.
  • The BCCE is an assumption that is true only for perfectly uniform illumination and Lambertian surfaces, whereas the DCCE is just a linearization of a generic description of motion in 3D.

• But… real-time depth data tends to be very noisy and full of holes!
  • Smoothing seems to help.


Joint Constraint on Rigid Motion

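Our proposal: combine the BCCE and DCCE constraint equations into a single linear system:

$$-Z \begin{bmatrix} \dfrac{\partial I}{\partial t} \\ \dfrac{\partial Z}{\partial t} \end{bmatrix} = \begin{bmatrix} f \dfrac{\partial I}{\partial x} & f \dfrac{\partial I}{\partial y} & -\left( x \dfrac{\partial I}{\partial x} + y \dfrac{\partial I}{\partial y} \right) \\ f \dfrac{\partial Z}{\partial x} & f \dfrac{\partial Z}{\partial y} & -\left( Z + x \dfrac{\partial Z}{\partial x} + y \dfrac{\partial Z}{\partial y} \right) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix}$$

Stacked over all pixels:

$$b = H \begin{bmatrix} T \\ \Omega \end{bmatrix}, \qquad \begin{bmatrix} T \\ \Omega \end{bmatrix} = (H^{\top} H)^{-1} H^{\top} b$$

Least squares problem, solve for six-parameter vector via QR or SVD.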
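A minimal sketch of assembling and solving the joint system, assuming per-pixel derivative arrays are already available and that both constraints are weighted equally. The function names, the array layout, and the synthetic data are assumptions for illustration; the slides do not specify an implementation. The solver used here, `numpy.linalg.lstsq`, is SVD-based.

```python
import numpy as np

def joint_system(Ix, Iy, It, Zx, Zy, Zt, x, y, Z, f):
    """Stack BCCE and DCCE rows for N pixels into H (2N x 6) and b (2N)."""
    X, Y = x * Z / f, y * Z / f                       # back-projected coordinates
    zero, one = np.zeros_like(Z), np.ones_like(Z)
    # Q has shape (N, 3, 6): one rigid-motion matrix [I  -P_hat] per pixel.
    Q = np.stack([
        np.stack([one, zero, zero, zero, Z, -Y], axis=-1),
        np.stack([zero, one, zero, -Z, zero, X], axis=-1),
        np.stack([zero, zero, one, Y, -X, zero], axis=-1),
    ], axis=1)
    gI = np.stack([f * Ix, f * Iy, -(x * Ix + y * Iy)], axis=-1)        # (N, 3)
    gZ = np.stack([f * Zx, f * Zy, -(Z + x * Zx + y * Zy)], axis=-1)    # (N, 3)
    H = np.concatenate([np.einsum('ni,nij->nj', gI, Q),
                        np.einsum('ni,nij->nj', gZ, Q)])
    b = np.concatenate([-Z * It, -Z * Zt])
    return H, b

def solve_motion(H, b):
    """Least-squares estimate of the six-vector [T; Omega]."""
    motion, *_ = np.linalg.lstsq(H, b, rcond=None)
    return motion

# Synthetic consistency check with made-up derivatives (not real image data).
rng = np.random.default_rng(0)
N, f = 200, 500.0
x, y = rng.uniform(-100, 100, N), rng.uniform(-100, 100, N)
Z = rng.uniform(1.0, 2.0, N)
Ix, Iy, Zx, Zy = rng.normal(size=(4, N))
true_motion = np.array([0.01, -0.02, 0.03, 0.002, -0.001, 0.003])

# Generate temporal derivatives that satisfy both constraints exactly.
H, _ = joint_system(Ix, Iy, np.zeros(N), Zx, Zy, np.zeros(N), x, y, Z, f)
It = -(H[:N] @ true_motion) / Z
Zt = -(H[N:] @ true_motion) / Z

H, b = joint_system(Ix, Iy, It, Zx, Zy, Zt, x, y, Z, f)
estimate = solve_motion(H, b)
print(estimate)
```

This check only exercises the linear algebra: the temporal derivatives are generated from the same model that is then inverted, so it says nothing about accuracy on real, noisy data. Brightness and depth rows also have different units, so some relative scaling between the two halves of H may be needed in practice.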


Some Important Practical Details

• Support maps
  • Only use constraint equations where depth and all depth derivatives are valid.
  • Ignore locations of very high depth gradient (due to self-occlusion/disocclusion)

• Coordinate shift
  • If center of coordinate system is far from object, it is easy to confuse translation with rotation about a distant axis, and vice versa -> numerical instability.
  • Solution: At each time step, find object centroid, compute motion in coordinate system centered there, then transform motion parameters back to world coordinate system.
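Both practical details can be sketched in a few lines. The code below assumes that depth holes are stored as NaN and that the gradient threshold is a user-chosen value; neither is specified in the slides. The coordinate shift follows from V = T_local + Ω × (P − c) = (T_local − Ω × c) + Ω × P, so only the translation changes when moving back to world coordinates. Function names are illustrative.

```python
import numpy as np

def support_map(Z, max_gradient):
    """True where depth and its derivatives are finite and not too steep."""
    Zy, Zx = np.gradient(Z)          # NaN holes also invalidate their neighbours
    valid = np.isfinite(Z) & np.isfinite(Zx) & np.isfinite(Zy)
    steep = np.hypot(Zx, Zy) > max_gradient
    return valid & ~steep

def to_world(T_local, Omega, centroid):
    """Convert motion estimated about `centroid` to the world origin."""
    return T_local - np.cross(Omega, centroid), Omega

# Hypothetical depth map: flat surface with one missing measurement.
depth = np.ones((5, 5))
depth[2, 2] = np.nan
mask = support_map(depth, max_gradient=0.5)

# Hypothetical motion estimated in centroid-centered coordinates.
centroid = np.array([0.1, 0.0, 2.0])
T_local = np.array([0.01, 0.0, -0.02])
Omega = np.array([0.002, -0.001, 0.003])
T_world, Omega_world = to_world(T_local, Omega, centroid)

# Both parameterizations must predict the same velocity at any point P.
P = np.array([0.3, -0.2, 2.1])
V_local = T_local + np.cross(Omega, P - centroid)
V_world = T_world + np.cross(Omega_world, P)
print(mask.sum(), V_local, V_world)
```

In a pipeline like the one described, only pixels where the mask is true would contribute rows to the linear system.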


Experiments

• Synthetic and real sequences of moving heads
  • Synthetic sequences provide us with ground truth for quantitative analysis
  • Real sequences show it’s not just theory.

• Hard cases: translation in Z, rotation out-of-plane

• Compare four motion estimation methods
  • BCCE only with planar depth -> representative of standard methods
  • BCCE only with measured depth
  • DCCE only
  • BCCE + DCCE


Synthetic Image Sequences

Generated color and depth image sequences by rendering a laser-scanned model of a human face with a standard graphics package.

[Figure panels: rotation sequence; Z-translation sequence]


Synthetic Results - Rotation Sequence


Synthetic Results - Z-Trans Sequence


Real Results: Still-Frame Comparison

[Figure: select frames (68, 111, 162) from the BCCE + planar depth result, compared with the same frames from the BCCE + DCCE result]


Real Results: Still-Frame Comparison

[Figure: select frames (211, 293) from the BCCE + planar depth result, compared with the same frames from the BCCE + DCCE result]


Real Results: BCCE with Planar Depth


Real Results: BCCE + DCCE


Extensions and Future Work

• Complement it with a slower, non-differential approach that helps detect and remove gross errors

• Real-time implementation!

• Experiment with some mathematical tweaks:
  • Constrained or weighted least squares
  • Use a second iteration per frame

• Add coarse-to-fine to handle large motions, if needed

• More ambitious tests: 360 degree rotation, slow non-rigidity, etc. => things few or no other methods can do


Extensions & Future Work

• Apply direct depth and brightness constraint without rigid model: 3-D direct optic flow.

• Ego-motion: use joint depth and brightness constraint to recover camera motion.

• Articulated bodies: extend to use exponential twist formalism, a la Bregler and Malik.

M. Covell, A. Rahimi, M. Harville, T. Darrell. “Articulated-pose estimation using brightness- and depth-constancy constraints.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, S.C., June 2000.


The End