Visual Recognition: Objects, Actions and Scenes
Lecture 6: Human action recognition I
Ivan Laptev
University of Trento, Trento, Italy
July 7-10, 2014
Source: laptev/teaching/trento14/trento14_lecture06.pdf
Lecture overview
Motivation: Historic review, Modern applications
Appearance-based methods: Motion history images, Active shape models, Tracking and motion priors
Motion-based methods: Generic and parametric optical flow, Motion templates
Space-time methods: Local space-time features, Action classification and detection, Weakly-supervised action learning
Motivation I: Artistic Representation
Leonardo da Vinci (1452–1519): A man going upstairs, or up a ladder.
Early studies were motivated by representations of the human body in art
Da Vinci: “it is indispensable for a painter, to become totally familiar with the
anatomy of nerves, bones, muscles, and sinews, such that he understands
for their various motions and stresses, which sinews or which muscle
causes a particular motion”
“I ask for the weight [pressure] of this man for every segment of motion
when climbing those stairs, and for the weight he places on b and on c.
Note the vertical line below the center of mass of this man.”
Motivation II: Biomechanics
Giovanni Alfonso Borelli (1608–1679)
The emergence of biomechanics
Borelli applied to biology the analytical and geometrical methods developed by Galileo Galilei.
He was the first to understand that bones serve as levers and muscles function according to mathematical principles.
His physiological studies included muscle analysis and a mathematical discussion of movements, such as running or jumping.
Motivation III: Motion perception
Etienne-Jules Marey (1830–1904) made chronophotographic experiments influential for the emerging field of cinematography.
Eadweard Muybridge (1830–1904) invented a machine for displaying the recorded series of images. He pioneered motion pictures and applied his technique to movement studies.
Motivation III: Motion perception
Gunnar Johansson [1973] pioneered studies on the use of image sequences for programmed human motion analysis.
Gunnar Johansson, Perception and Psychophysics, 1973
“Moving Light Displays” (MLD) enable identification of familiar people and of gender, and inspired many works in computer vision.
Human actions: Historic overview
15th century: studies of anatomy
17th century: emergence of biomechanics
19th century: emergence of cinematography
1973: studies of human motion perception
Modern computer vision
Modern applications: Motion capture and animation
Avatar (2009)
Leonardo da Vinci (1452–1519)
Modern applications: Video editing
Space-Time Video Completion
Y. Wexler, E. Shechtman and M. Irani, CVPR 2004
Recognizing Action at a Distance
Alexei A. Efros, Alexander C. Berg, Greg Mori, Jitendra Malik, ICCV 2003
Applications: Unusual Activity Detection, e.g. for surveillance
Detecting Irregularities in Images and in Video
Boiman & Irani, ICCV 2005
Why automatic video understanding?
Huge amount of video is available and growing:
>34K hours of video uploaded every day
TV channels recorded since the 1960s
~30M surveillance cameras in the US => ~700K video hours/day
Why video analysis?
Applications:
First appearance of N. Sarkozy on TV
Predicting crowd behavior
Counting people
Sociology research: influence of character smoking in movies
Where is my cat?
Motion capture and animation
Education: How do I make a pizza?
Why human actions?
How many person-pixels are in the video?
Movies: 35%, TV: 34%, YouTube: 40%
How many person pixels in our daily life?
Wearable camera data: Microsoft SenseCam dataset
~4%
Challenges
Large variations in appearance: occlusions, non-rigid motion, view-point changes, clothing…
Manual collection of training samples is prohibitive: many action classes, rare occurrence
Action vocabulary is not well-defined: What is action granularity?
Action Open: …
Action Hugging: …
What is action granularity?
Do we want to learn a person-throws-cat-into-trash-bin classifier?
Source: http://www.youtube.com/watch?v=eYdUZdan5i8
How is action recognition related to computer vision?
[Figure: street scene with labeled objects: Car, Road, Sky, Street sign]
We can recognize cars and roads. What’s next?
12,184,113 images, 17,624 synsets
Airplane
A plane has crashed, the cabin is broken, somebody is likely to be injured or dead.
[Figure: frame annotated with the labels woman, cat, trash bin]
How to recognize actions?
Action understanding: Key components
Image measurements: Foreground segmentation, Image gradients, Optical flow, Local space-time features
Association: Automatic inference, Learning associations from strong / weak supervision
Prior knowledge: Deformable contour models, 2D/3D body models, Motion priors, Background models, Action labels
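Foreground segmentation
Image differencing: a simple way to measure motion/change
$|I_t(x, y) - I_{\mathrm{ref}}(x, y)| > \mathrm{Const}$, where $I_t$ is the current frame and $I_{\mathrm{ref}}$ is a reference image (e.g. the previous frame or a background image)
Better background / foreground separation methods exist:
Modeling of color variation at each pixel with a Gaussian mixture
Dominant motion compensation for sequences with a moving camera
Motion layer separation for scenes with non-static backgrounds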
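As a minimal sketch of image differencing, the NumPy snippet below thresholds the absolute difference between a frame and a reference image. The function name `foreground_mask`, the threshold value and the synthetic frames are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def foreground_mask(frame, reference, threshold=25.0):
    """Mark pixels whose absolute difference from a reference image exceeds a constant."""
    # Convert to float first so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.float64) - reference.astype(np.float64))
    return diff > threshold

# Synthetic 8-bit grayscale example: a bright 2x2 "object" on a flat background.
background = np.full((6, 6), 100, dtype=np.uint8)
frame = background.copy()
frame[2:4, 2:4] = 200
mask = foreground_mask(frame, background)
print(int(mask.sum()), "foreground pixels")
```

A fixed threshold like this is sensitive to lighting changes and camera motion, which is what the more elaborate background models listed above address.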
Temporal Templates
[A.F. Bobick and J.W. Davis, PAMI 2001]
Idea: summarize motion in the video in a
Motion History Image (MHI):
Descriptor: Hu moments of different orders
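In Bobick and Davis's formulation, the MHI value at a pixel is set to a duration tau wherever motion is detected in the current frame and otherwise decays by one per frame, never dropping below zero. The sketch below implements that update rule on a toy sequence; the function name `update_mhi` and the synthetic motion masks are assumptions made for illustration, and the Hu-moment descriptor is not computed here.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One MHI step: moving pixels are set to tau, all others decay by 1 (not below 0)."""
    decayed = np.maximum(mhi - 1, 0)
    return np.where(motion_mask, tau, decayed)

# Toy sequence: a single moving pixel travels left to right along one row.
tau = 3
mhi = np.zeros((1, 5), dtype=np.int64)
for t in range(5):
    motion = np.zeros((1, 5), dtype=bool)
    motion[0, t] = True
    mhi = update_mhi(mhi, motion, tau)
print(mhi)
```

Recent motion ends up brighter than older motion, so a single image encodes both where and how recently movement occurred.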
Aerobics dataset
Nearest Neighbor classifier: 66% accuracy
Temporal Templates: Summary
Pros:
+ Simple and fast
+ Works in controlled settings
Cons:
- Prone to errors of background subtraction: variations in light, shadows, clothing… What is the background here?
- Does not capture interior motion and shape: silhouette tells little about actions
Not all shapes are valid: restrict the space of admissible silhouettes
Active Shape Models of Cootes et al.
Point Distribution Model
Represent the shape of samples by a set of corresponding points or landmarks: $x = (x_1, y_1, \ldots, x_n, y_n)^T$
Assume each shape can be represented by a linear combination of basis shapes $\Phi = (\phi_1, \ldots, \phi_k)$ such that $x = \bar{x} + \Phi b$ for mean shape $\bar{x}$ and some parameters $b$
Active Shape Models of Cootes et al.
Basis shapes can be found as the main modes of variation of $x$ in the training data.
Principal Component Analysis (PCA):
Covariance matrix: $S = \frac{1}{M} \sum_{i=1}^{M} (x_i - \bar{x})(x_i - \bar{x})^T$ for $M$ training shapes $x_i$ with mean $\bar{x}$
Eigenvectors $\phi_k$ and eigenvalues $\lambda_k$: $S \phi_k = \lambda_k \phi_k$
2D example: each point can be thought of as a shape in N-dimensional space
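A point distribution model of this kind can be sketched with plain NumPy: stack the training shapes as rows, subtract the mean, and take the leading eigenvectors of the covariance matrix. The function name `fit_shape_model`, the 1/M normalization and the synthetic square shapes are assumptions chosen for illustration.

```python
import numpy as np

def fit_shape_model(shapes, num_modes):
    """PCA on training shapes (rows of `shapes`, each a flattened landmark vector)."""
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = centered.T @ centered / shapes.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_modes]   # keep the largest modes
    return mean_shape, eigvecs[:, order], eigvals[order]

# Synthetic training set: a unit square (x1, y1, ..., x4, y4) that only widens or narrows.
rng = np.random.default_rng(0)
base = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
mode = np.array([-1.0, 0.0, 1.0, 0.0, 1.0, 0.0, -1.0, 0.0]) / 2.0  # unit-norm direction
coeffs = rng.normal(0.0, 0.3, size=50)
shapes = base + coeffs[:, None] * mode
mean_shape, basis, variances = fit_shape_model(shapes, num_modes=2)
print(variances)
```

Because the synthetic shapes vary along a single direction, only the first eigenvalue is non-negligible, a small-scale version of the observation that a few modes explain most shape variation.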
Active Shape Models of Cootes et al.
Back-project from shape space to image space
Three main modes of lips-shape variation:
Distribution of eigenvalues:
A small fraction of basis shapes (eigenvectors) accounts for most of the shape variation (=> landmarks are redundant)
Active Shape Models of Cootes et al.
The basis $\Phi$ is orthonormal, therefore $\Phi^T \Phi = I$
Given an estimate of the shape $x$ we can recover the shape parameters $b = \Phi^T (x - \bar{x})$, where $\bar{x}$ is the mean shape
Projection onto the shape space serves as a regularization: $x_{\mathrm{reg}} = \bar{x} + \Phi \Phi^T (x - \bar{x})$
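The regularizing effect of projecting onto the shape space can be seen in a few lines. In this sketch the basis is assumed to have orthonormal columns, and the names `project_to_shape_space`, `basis` and `mean_shape` are illustrative rather than taken from the lecture.

```python
import numpy as np

def project_to_shape_space(x, mean_shape, basis):
    """Return shape parameters b and the closest shape to x inside the model subspace."""
    b = basis.T @ (x - mean_shape)      # valid because the basis columns are orthonormal
    return b, mean_shape + basis @ b

# One-mode toy model: the first two coordinates move together, the rest stay fixed.
mean_shape = np.zeros(4)
basis = np.array([[1.0], [1.0], [0.0], [0.0]]) / np.sqrt(2.0)
x = np.array([2.0, 0.0, 5.0, 0.0])      # noisy estimate with an implausible third entry
b, x_reg = project_to_shape_space(x, mean_shape, basis)
print(b, x_reg)
```

The implausible third coordinate disappears after projection, since no basis shape can explain it.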
Active Shape Models of Cootes et al.
How to use Active Shape Models for shape estimation?
Given an initial guess of model points $x$, estimate new positions $x'$ using local image search, e.g. locate the closest edge point
Re-estimate shape parameters: $b' = \Phi^T (x' - \bar{x})$
Active Shape Models of Cootes et al.
To handle translation, scale and rotation, it is useful to normalize $x$ prior to shape estimation: $x_{\mathrm{norm}} = T(x)$, using a similarity transformation $T$ with translation $t$, scale $s$ and rotation $\theta$
A simple way to estimate $T$ is to assign $t$ and $s$ to the mean position and the standard deviation of points in $x$ respectively and set $\theta = 0$. For more sophisticated normalization techniques see: http://www.isbe.man.ac.uk/~bim/Models/app_model.ps.gz
Note: model parameters $\bar{x}$, $\Phi$ have to be computed using normalized image point coordinates
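The simple normalization described here (translation from the mean position, scale from the spread of the points, no rotation) might look as follows. Interpreting "standard deviation of points" as the root-mean-square distance to the centroid is an assumption, as is the function name `normalize_shape`.

```python
import numpy as np

def normalize_shape(points):
    """points: (n, 2) landmark array. Returns normalized points, translation and scale."""
    translation = points.mean(axis=0)
    centered = points - translation
    # Assumed spread measure: RMS distance of the landmarks from their centroid.
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / scale, translation, scale

square = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
moved = 5.0 * square + np.array([10.0, -3.0])   # same shape, different position and size
norm_a, _, _ = normalize_shape(square)
norm_b, translation, scale = normalize_shape(moved)
print(np.allclose(norm_a, norm_b), translation, scale)
```

Both inputs map to the same normalized landmarks, so a shape model trained on normalized coordinates is unaffected by where and how large the object appears.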
Active Shape Models of Cootes et al.
Active Shape Models: Their Training and Application
T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, CVIU 1995
Example: face alignment
Illustration of face shape space
Iterative ASM alignment algorithm
1. Initialize with a reasonable guess of the similarity transformation $T$ and the shape parameters $b$
2. Estimate new point positions $x'$ from image measurements
3. Re-estimate $T$ and $b$
4. Unless converged, repeat from step 2
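The four steps above can be sketched as a short loop. This is a simplified illustration under stated assumptions: shapes are taken to be already normalized (so the similarity transformation is omitted), and the image search is replaced by a hypothetical `local_search` callback that moves each model point halfway toward a known target. None of these names come from the lecture.

```python
import numpy as np

def fit_asm(mean_shape, basis, local_search, num_iters=50, tol=1e-8):
    """Alternate between local point search and projection onto the shape space."""
    b = np.zeros(basis.shape[1])                 # step 1: start from the mean shape
    for _ in range(num_iters):
        x = mean_shape + basis @ b               # current model points
        x_new = local_search(x)                  # step 2: positions suggested by the image
        b_new = basis.T @ (x_new - mean_shape)   # step 3: re-estimate shape parameters
        converged = np.linalg.norm(b_new - b) < tol
        b = b_new
        if converged:                            # step 4: otherwise repeat from step 2
            break
    return b, mean_shape + basis @ b

# Toy setup: one-mode model and a target shape with an off-model perturbation.
mean_shape = np.array([0.0, 0.0, 1.0, 0.0])
basis = np.array([[1.0], [0.0], [0.0], [1.0]]) / np.sqrt(2.0)
b_true = np.array([0.8])
off_model = 0.3 * np.array([1.0, 0.0, 0.0, -1.0])   # orthogonal to the basis
target = mean_shape + basis @ b_true + off_model

def local_search(x):
    # Stand-in for an edge search with limited range: move halfway to the target.
    return x + 0.5 * (target - x)

b, x_fit = fit_asm(mean_shape, basis, local_search)
print(b)
```

The fit recovers the in-model part of the target and ignores the perturbation that no basis shape can represent.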
Active Shape Model tracking
Aim: to track ASM of time-varying shapes, e.g. human silhouettes
Impose time-continuity constraint on model parameters.
For example, for shape parameters $b$: $b_t = b_{t-1} + w_t$ with Gaussian noise $w_t \sim \mathcal{N}(0, \Sigma_w)$
For the similarity transformation, an analogous constraint can be imposed; more complex dynamical models are possible
Update model parameters at each time frame using e.g. a Kalman filter
Person Tracking
Learning flexible models from image sequences
A. Baumberg and D. Hogg, ECCV 1994
Active Shape Models: Summary
Pros:
+ Shape prior helps overcome segmentation errors
+ Fast optimization
+ Can handle interior/exterior dynamics
Cons:
- Optimization gets trapped in local minima
- Re-initialization is problematic
Possible improvements:
Learn and use motion priors, possibly specific to different actions
Motion priors
Accurate motion models can be used both to help accurate tracking and to recognize actions
Goal: formulate motion models for different types of actions and use such models for action recognition
Example: Drawing with 3 action modes: line drawing, scribbling, idle
[M. Isard and A. Blake, ICCV 1998]
Incorporating motion priors
Image measurements: Foreground segmentation, Image gradient, Optical Flow
Data Association: Particle filters
Prior knowledge: Learning motion models for different actions
Bayesian Tracking
General framework: recognition by synthesis; generative models; finding best explanation of the data
Notation:
$z_t$: image data at time $t$
$x_t$: model parameters at time $t$ (e.g. shape and its dynamics)
$p(x_t)$: prior density for $x_t$
$p(z_t \mid x_t)$: likelihood of data for the given model configuration
We search for the posterior defined by Bayes’ rule: $p(x_t \mid z_t) \propto p(z_t \mid x_t)\, p(x_t)$
For tracking, the Markov assumption gives the prior $p(x_t \mid x_{t-1})$
Temporal update rule: $p(x_t \mid z_1, \ldots, z_t) \propto p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_1, \ldots, z_{t-1})\, dx_{t-1}$
Kalman Filtering
If all probability densities are uni-modal, specifically Gaussians, the posterior can be evaluated in closed form
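For linear dynamics and linear measurements with Gaussian noise, the closed-form posterior is given by the standard Kalman predict and update equations. The sketch below is a generic textbook step, not code from the lecture; the matrix names `A`, `Q`, `H`, `R` (dynamics, process noise, measurement model, measurement noise) are conventional choices made here.

```python
import numpy as np

def kalman_step(mean, cov, z, A, Q, H, R):
    """One predict/update cycle of a linear-Gaussian Kalman filter."""
    # Predict: push the previous Gaussian through the dynamics.
    mean_pred = A @ mean
    cov_pred = A @ cov @ A.T + Q
    # Update: correct the prediction with the new measurement z.
    S = H @ cov_pred @ H.T + R
    K = cov_pred @ H.T @ np.linalg.inv(S)
    mean_new = mean_pred + K @ (z - H @ mean_pred)
    cov_new = (np.eye(len(mean)) - K @ H) @ cov_pred
    return mean_new, cov_new

# Scalar example: static state, unit prior variance, unit measurement variance.
mean, cov = np.array([0.0]), np.array([[1.0]])
A = H = np.eye(1)
Q, R = np.zeros((1, 1)), np.array([[1.0]])
mean, cov = kalman_step(mean, cov, np.array([2.0]), A, Q, H, R)
print(mean, cov)
```

With equal prior and measurement variance the posterior mean lands halfway between the prior mean and the measurement, and the variance halves.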
Particle Filtering
In reality probability densities are almost always multi-modal
Approximate distributions with weighted particles
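A weighted-particle approximation can be propagated with a resample, predict, re-weight cycle. The sketch below uses a one-dimensional random-walk motion model and a Gaussian likelihood, both of which are assumptions made for illustration, as are the function name `particle_filter_step` and all numeric settings.

```python
import numpy as np

def particle_filter_step(particles, weights, z, rng, process_std, obs_std):
    """Resample by weight, diffuse with a random walk, re-weight by the likelihood of z."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)                   # resample
    particles = particles[idx] + rng.normal(0.0, process_std, size=n)  # predict
    w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)      # Gaussian likelihood
    return particles, w / w.sum()

rng = np.random.default_rng(0)
n = 2000
particles = rng.uniform(-10.0, 10.0, size=n)   # broad initial belief
weights = np.full(n, 1.0 / n)
true_state = 3.0
for _ in range(10):
    z = true_state + rng.normal(0.0, 0.5)      # noisy observation of a static state
    particles, weights = particle_filter_step(
        particles, weights, z, rng, process_std=0.2, obs_std=0.5)
estimate = float(np.sum(weights * particles))
print(estimate)
```

Because the belief is a set of samples rather than a single Gaussian, the same loop works unchanged when the likelihood has several peaks.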
Particle Filtering
Tracking examples: the state describes leaf shape; the state describes head shape
CONDENSATION - conditional density propagation for visual tracking
M. Isard and A. Blake, IJCV 1998
Learning dynamic prior
Dynamic model: 2nd order Auto-Regressive Process
State: $x_t$
Update rule: $x_t - \bar{x} = A_1 (x_{t-1} - \bar{x}) + A_2 (x_{t-2} - \bar{x}) + B w_t$, with $w_t$ Gaussian noise
Model parameters: $A_1$, $A_2$, $B$, $\bar{x}$
Learning scheme:
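One simple way to estimate such a model from a training sequence, not necessarily the scheme used in the lecture, is ordinary least squares. The sketch assumes the second-order form $x_t - \bar{x} = A_1 (x_{t-1} - \bar{x}) + A_2 (x_{t-2} - \bar{x}) + \text{noise}$; the function name `fit_ar2` and the simulated scalar sequence are illustrative.

```python
import numpy as np

def fit_ar2(x):
    """x: (T, d) sequence. Returns mean, A1, A2 and the residual covariance."""
    mean = x.mean(axis=0)
    c = x - mean
    past = np.hstack([c[1:-1], c[:-2]])          # rows hold [x_{t-1}, x_{t-2}]
    target = c[2:]                               # rows hold x_t
    coef, *_ = np.linalg.lstsq(past, target, rcond=None)
    d = x.shape[1]
    A1, A2 = coef[:d].T, coef[d:].T
    resid = target - past @ coef
    return mean, A1, A2, resid.T @ resid / len(resid)

# Simulate a damped scalar oscillation with known coefficients, then recover them.
rng = np.random.default_rng(0)
a1, a2, T = 1.6, -0.8, 5000
x = np.zeros((T, 1))
for t in range(2, T):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal(0.0, 0.1)
mean, A1, A2, noise_cov = fit_ar2(x)
print(A1, A2, noise_cov)
```

The residual covariance plays the role of the noise term, so sampling from the fitted model produces random sequences with similar dynamics to the training data.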
Learning dynamic prior
Learning point sequence
Random simulation of the learned dynamical model
Statistical models of visual shape and motion
A. Blake, B. Bascle, M. Isard and J. MacCormick, Phil.Trans.R.Soc. 1998
Learning dynamic prior
Random simulation of the learned gait dynamics
Dynamics with discrete states
Introduce “mixed” state $X_t = (x_t, y_t)$
$x_t$: continuous state space (as before)
$y_t \in \{1, \ldots, N\}$: discrete variable identifying the dynamical model
Transition probability matrix: $P(y_t = j \mid y_{t-1} = i) = T_{ij}$, or more generally $T_{ij}(x_{t-1})$
Incorporation of the mixed-state model into a particle filter is straightforward: simply use $X_t$ instead of $x_t$ and the corresponding update rules
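In a particle filter, the mixed state changes only the prediction step: each particle first samples its next discrete label from the transition matrix and then moves according to the dynamics selected by that label. The sketch below assumes a constant transition matrix and two hypothetical scalar motion models (a nearly static one and a drifting one); the function name `mixed_state_predict` and all numbers are illustrative.

```python
import numpy as np

def mixed_state_predict(x, y, transition, models, rng):
    """x: (n,) continuous states, y: (n,) integer labels, models: list of (drift, std)."""
    # Sample the next discrete label of every particle from its row of the matrix.
    y_new = np.array([rng.choice(len(transition), p=transition[i]) for i in y])
    x_new = np.empty(len(x))
    for k, (drift, std) in enumerate(models):
        sel = y_new == k
        # Apply the dynamics of model k to the particles that switched to (or stayed in) k.
        x_new[sel] = x[sel] + drift + rng.normal(0.0, std, size=int(sel.sum()))
    return x_new, y_new

rng = np.random.default_rng(0)
transition = np.array([[0.9, 0.1],    # label 0: "idle"
                       [0.2, 0.8]])   # label 1: "moving"
models = [(0.0, 0.01), (1.0, 0.1)]
n = 2000
x, y = np.zeros(n), np.zeros(n, dtype=int)   # all particles start idle at position 0
x_new, y_new = mixed_state_predict(x, y, transition, models, rng)
print(float(np.mean(y_new == 1)), float(x_new[y_new == 1].mean()))
```

After weighting by the image likelihood, the fraction of particle weight carried by each label gives an estimate of which action mode is active, which is how tracking and recognition are obtained together.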
Dynamics with discrete states
Example: Drawing
Discrete states: line drawing, scribbling, idle
Transition probability matrix over the states line, idle, scribbling
Result: simultaneously improved tracking and gesture recognition
A mixed-state Condensation tracker with automatic model-switching
M. Isard and A. Blake, ICCV 1998
![Page 62: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/62.jpg)
Dynamics with discrete states
[M.J. Black and A.D. Jepson, ECCV 1998]
A similar approach is illustrated on gesture recognition in the context of a visual blackboard interface
![Page 63: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/63.jpg)
Motion priors & Tracking: Summary
Pros:
+ More accurate tracking using specific motion models
+ Simultaneous tracking and motion recognition with discrete-state dynamical models
Cons:
- Local minima are still an issue
- Re-initialization is still an issue
![Page 64: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/64.jpg)
Class overview
Motivation
  Historic review
  Modern applications
Appearance-based methods
  Motion history images
  Active shape models
  Tracking and motion priors
Motion-based methods
  Generic and parametric Optical Flow
  Motion templates
Space-time methods
  Local space-time features
  Action classification and detection
  Weakly-supervised action learning
![Page 66: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/66.jpg)
Shape and Appearance vs. Motion
Shape and appearance in images depend on many factors:
clothing, illumination contrast, image resolution, etc.
Motion field (in theory) is invariant to shape and can be used
directly to describe human actions
[Efros et al. 2003]
![Page 67: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/67.jpg)
Motion estimation: Optical Flow
Classic problem of computer vision [Gibson 1955]
Goal: estimate the motion field
How? We only have access to image pixels
Estimate pixel-wise correspondence between frames = Optical Flow
Brightness Change assumption: corresponding pixels preserve their intensity (color)
Useful assumption in many cases
Breaks at occlusions and illumination changes
Physical and visual motion may be different
![Page 68: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/68.jpg)
Generic Optical Flow
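Brightness Change Constraint Equation (BCCE):
$$\nabla I^{\top} \mathbf{v} + I_t = I_x v_x + I_y v_y + I_t = 0$$
where $\nabla I = (I_x, I_y)^{\top}$ is the image gradient and $\mathbf{v} = (v_x, v_y)^{\top}$ is the optical flow
One equation, two unknowns => cannot be solved directly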
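Integrate several measurements in the local neighborhood and obtain a Least Squares Solution [Lucas & Kanade 1981]:
$$\begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{pmatrix} \begin{pmatrix} v_x \\ v_y \end{pmatrix} = - \begin{pmatrix} \langle I_x I_t \rangle \\ \langle I_y I_t \rangle \end{pmatrix}$$
where $I_x, I_y, I_t$ are the image derivatives, $(v_x, v_y)$ is the optical flow, and $\langle \cdot \rangle$ denotes integration over a spatial (or spatio-temporal) neighborhood of a point
The matrix on the left is the second-moment matrix, the same one used to compute Harris interest points!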
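A minimal sketch of this least-squares solution for a single patch, assuming finite-difference derivatives and one constant flow vector for the whole patch. The function name and the synthetic test pattern are illustrative assumptions, not part of the lecture.

```python
import numpy as np


def lucas_kanade(frame0, frame1):
    """Estimate one flow vector (vx, vy) for the whole patch by least squares."""
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(f0)
    It = f1 - f0
    # Second-moment matrix and right-hand side, summed over the patch.
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)


# Synthetic smooth pattern translated by a known sub-pixel amount.
ys, xs = np.mgrid[0:64, 0:64].astype(float)


def pattern(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y)


true_vx, true_vy = 0.3, -0.2
flow = lucas_kanade(pattern(xs, ys), pattern(xs - true_vx, ys - true_vy))
```

The estimate relies on the linearized brightness constraint, so it is only expected to be close to the true motion when the displacement is small relative to the image structure.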
![Page 69: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/69.jpg)
Generic Optical Flow
The least-squares solution assumes
1. Brightness change constraint holds in the neighborhood
2. Sufficient variation of image gradient in the neighborhood
3. Approximately constant motion in the neighborhood
Motion estimation becomes inaccurate if any of assumptions 1-3 is violated.
Solutions:
(2) Insufficient gradient variation, known as the aperture problem: increase the integration neighborhood
(3) Non-constant motion in the neighborhood: use a more sophisticated motion model
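The aperture problem can be seen directly in the second-moment matrix: when the gradient points in a single direction throughout the neighborhood, the matrix has a (near-)zero eigenvalue and the least-squares system cannot be solved reliably. The sketch below compares a straight edge with a blob; both test patches are synthetic assumptions made for illustration.

```python
import numpy as np


def second_moment_eigenvalues(patch):
    """Eigenvalues (ascending) of the second-moment matrix of a patch."""
    Iy, Ix = np.gradient(patch.astype(float))
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(M)


ys, xs = np.mgrid[0:32, 0:32].astype(float)
# Vertical edge: intensity varies along x only, so the gradient has one direction.
edge = np.tanh((xs - 16.0) / 3.0)
# Blob: gradients point in all directions.
blob = np.exp(-((xs - 16.0) ** 2 + (ys - 16.0) ** 2) / 50.0)

edge_min = second_moment_eigenvalues(edge)[0]
blob_min = second_moment_eigenvalues(blob)[0]
```

A small `edge_min` signals that only the flow component across the edge is constrained, which is why enlarging the neighborhood until it contains gradients in more than one direction helps.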
![Page 70: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/70.jpg)
Parameterized Optical Flow
Constant velocity model:
Upgrade to affine motion model:
Now motion depends on the position inside the neighborhood
Examples of Affine motion models for different parameters:
Can be formulated as a Least Squares approach to estimate the model parameters, as before!
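A sketch of the affine case, assuming the common six-parameter form $v_x = a_1 + a_2 x + a_3 y$, $v_y = a_4 + a_5 x + a_6 y$ with $(x, y)$ measured from the patch centre. The parameter ordering, the function name and the synthetic test pattern are assumptions made here for illustration. Substituting the model into the brightness constraint gives one linear equation per pixel, which is solved by least squares.

```python
import numpy as np


def affine_flow(frame0, frame1):
    """Least-squares estimate of six affine flow parameters for a patch."""
    f0 = frame0.astype(float)
    Iy, Ix = np.gradient(f0)
    It = frame1.astype(float) - f0
    h, w = f0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0  # coordinates relative to the patch centre
    ys -= (h - 1) / 2.0
    # One row per pixel: Ix*(a1 + a2*x + a3*y) + Iy*(a4 + a5*x + a6*y) = -It
    rows = np.stack([Ix, Ix * xs, Ix * ys, Iy, Iy * xs, Iy * ys], axis=-1)
    params, *_ = np.linalg.lstsq(rows.reshape(-1, 6), -It.ravel(), rcond=None)
    return params


# Synthetic example with a known, small affine motion.
h = w = 64
ys, xs = np.mgrid[0:h, 0:w].astype(float)
xc, yc = xs - (w - 1) / 2.0, ys - (h - 1) / 2.0


def texture(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y)


true_params = np.array([0.1, 0.005, 0.0, -0.05, 0.0, -0.005])
vx = true_params[0] + true_params[1] * xc + true_params[2] * yc
vy = true_params[3] + true_params[4] * xc + true_params[5] * yc
estimated = affine_flow(texture(xs, ys), texture(xs - vx, ys - vy))
```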
![Page 71: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/71.jpg)
Parameterized Optical Flow
Another extension of the constant motion model is to compute
PCA basis flow fields from training examples
Learning Parameterized Models of Image Motion
M.J. Black, Y. Yacoob, A.D. Jepson and D.J. Fleet, CVPR 1997
[Figure: training samples and PCA flow bases]
1. Compute standard Optical Flow for many examples
2. Put velocity components into one vector per example
3. Do PCA on these vectors and obtain the most informative PCA flow basis vectors
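Steps 2 and 3 can be sketched with an SVD of the centred data matrix, assuming the flow fields from step 1 are already available as an array. The function name and the synthetic training flows (mixtures of a translation and a divergence field) are hypothetical and only serve as a check.

```python
import numpy as np


def pca_flow_basis(flows, num_components):
    """PCA of flow fields given as an array of shape (n_examples, h, w, 2)."""
    n = flows.shape[0]
    X = flows.reshape(n, -1).astype(float)  # step 2: one vector per example
    mean = X.mean(axis=0)
    # Step 3: principal directions are the right singular vectors of X - mean.
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:num_components], singular_values


# Hypothetical training set: random mixtures of two known flow patterns.
rng = np.random.default_rng(0)
h, w = 8, 8
ys, xs = np.mgrid[0:h, 0:w].astype(float)
translation = np.stack([np.ones((h, w)), np.zeros((h, w))], axis=-1)
divergence = np.stack([xs - 3.5, ys - 3.5], axis=-1)
coeffs = rng.normal(size=(20, 2))
flows = (coeffs[:, 0, None, None, None] * translation
         + coeffs[:, 1, None, None, None] * divergence)

mean, basis, singular_values = pca_flow_basis(flows, 2)
```

Because the synthetic flows are built from two patterns, two basis vectors are enough to reconstruct every training example here; real data would need a number of components chosen from the singular-value spectrum.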
![Page 72: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/72.jpg)
Parameterized Optical Flow
Use PCA flow bases to regularize solution of motion estimation
Motion estimation for test samples can be computed without
explicit computation of optical flow
Solution formulation e.g. in terms of Least Squares
Direct flow recovery:
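A sketch of direct recovery under plain least squares: if the flow is modelled as a linear combination of basis flow fields, the brightness constraint becomes linear in the coefficients, so they can be estimated from image derivatives without computing a dense flow field first. The function name, the two basis fields and the test pattern are assumptions for illustration; a mean flow and robust estimation are omitted.

```python
import numpy as np


def estimate_basis_coefficients(frame0, frame1, basis):
    """Least-squares coefficients of flow basis fields, shape (k, h, w, 2)."""
    f0 = frame0.astype(float)
    Iy, Ix = np.gradient(f0)
    It = frame1.astype(float) - f0
    # Column j holds the brightness change predicted by basis flow j alone.
    A = np.stack([(Ix * B[..., 0] + Iy * B[..., 1]).ravel() for B in basis],
                 axis=1)
    coeffs, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return coeffs


# Hypothetical basis: horizontal translation and divergence about the centre.
h = w = 64
ys, xs = np.mgrid[0:h, 0:w].astype(float)
xc, yc = xs - (w - 1) / 2.0, ys - (h - 1) / 2.0
basis = np.stack([
    np.stack([np.ones((h, w)), np.zeros((h, w))], axis=-1),
    np.stack([xc, yc], axis=-1),
])


def texture(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y)


true_coeffs = np.array([0.1, 0.004])
flow = np.tensordot(true_coeffs, basis, axes=1)  # shape (h, w, 2)
frame0 = texture(xs, ys)
frame1 = texture(xs - flow[..., 0], ys - flow[..., 1])
estimated = estimate_basis_coefficients(frame0, frame1, basis)
```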
![Page 73: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/73.jpg)
Parameterized Optical Flow
Estimated coefficients of PCA flow bases can be used as action
descriptors
[Plots: estimated PCA coefficients versus frame number]
![Page 74: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/74.jpg)
Parameterized Optical Flow
Optical flow seems to be an interesting descriptor for
motion/action recognition
![Page 75: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/75.jpg)
Spatial Motion Descriptor
Image frame → optical flow $F_{x,y}$ → $F_x, F_y$ → $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$
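The spatial motion descriptor of Efros et al. splits the flow into horizontal and vertical components, half-wave rectifies each into non-negative channels, and blurs the result. A minimal sketch of that pipeline follows, assuming the flow field is already given; the function names, the blur width and the random test flow are illustrative choices.

```python
import numpy as np


def gaussian_blur(channel, sigma):
    """Separable Gaussian blur with zero padding at the borders."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(np.convolve, 0, channel, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, blurred, kernel, mode="same")


def spatial_motion_descriptor(flow, sigma=1.5):
    """Four blurred, half-wave rectified channels from a flow of shape (h, w, 2)."""
    fx, fy = flow[..., 0], flow[..., 1]
    # Half-wave rectification: separate positive and negative parts.
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([gaussian_blur(c, sigma) for c in channels], axis=0)


# Hypothetical flow field, used only to exercise the function.
rng = np.random.default_rng(0)
flow = rng.normal(size=(16, 16, 2))
descriptor = spatial_motion_descriptor(flow)
```

Rectifying before blurring keeps opposite motions from cancelling each other, which would happen if the signed flow were blurred directly.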
![Page 76: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/76.jpg)
Spatio-Temporal Motion Descriptor
[Figure: sequences A and B along time t with temporal extent E; panels show the frame-to-frame similarity matrix (A vs. B), the E × E identity (I) matrix, the E × E blurry I, and the motion-to-motion similarity matrix (A vs. B)]
Slide credit: A. Efros
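A sketch of how a frame-to-frame similarity matrix can be turned into a motion-to-motion similarity matrix by aggregating it over a temporal window with a blurred identity kernel. The normalized-correlation similarity, the blur weights, the odd temporal extent and the random descriptors are all assumptions made for illustration.

```python
import numpy as np


def frame_similarity(desc_a, desc_b):
    """Normalized correlation between per-frame descriptors, shape (n, d)."""
    a = desc_a / (np.linalg.norm(desc_a, axis=1, keepdims=True) + 1e-12)
    b = desc_b / (np.linalg.norm(desc_b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T


def blurry_identity(extent, weights=(0.25, 0.5, 0.25)):
    """E x E identity matrix blurred along its rows (E odd, E >= 3)."""
    eye = np.eye(extent)
    return np.apply_along_axis(np.convolve, 1, eye, np.asarray(weights),
                               mode="same")


def motion_similarity(frame_sim, kernel):
    """Sum frame similarities over an E x E window weighted by the kernel."""
    extent = kernel.shape[0]
    half = extent // 2
    padded = np.pad(frame_sim, half, mode="constant")
    out = np.zeros_like(frame_sim, dtype=float)
    for i in range(frame_sim.shape[0]):
        for j in range(frame_sim.shape[1]):
            window = padded[i:i + extent, j:j + extent]
            out[i, j] = np.sum(window * kernel)
    return out


# Hypothetical descriptors: sequence B is identical to sequence A.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(20, 200))
frame_sim = frame_similarity(desc_a, desc_a)
motion_sim = motion_similarity(frame_sim, blurry_identity(5))
```

The diagonal kernel rewards pairs of frames whose temporal neighbors also match in order, so a single accidental frame match contributes much less than a matching run of frames.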
![Page 77: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/77.jpg)
Football Actions: matching
[Figure: input sequence and matched frames]
Slide credit: A. Efros
![Page 78: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/78.jpg)
Football Actions: classification
10 actions; 4500 total frames; 13-frame motion descriptor
![Page 79: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/79.jpg)
Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
![Page 80: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/80.jpg)
Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor
Woman player used for training, man for testing.
[Alexei A. Efros, Alexander C. Berg, Greg Mori, Jitendra Malik, ICCV 2003]
![Page 81: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/81.jpg)
What have we seen so far?
Temporal templates:
+ simple, fast
- sensitive to segmentation errors
Active shape models:
+ shape regularization
- sensitive to initialization and tracking failures
Tracking with motion priors:
+ improved tracking and simultaneous action recognition
- sensitive to initialization and tracking failures
Motion-based recognition:
+ generic descriptors; less dependent on appearance
- sensitive to localization/tracking errors
![Page 82: Visual Recognition: Objects, Actions and Sceneslaptev/teaching/trento14/trento14_lecture06.pdf · Appearance-based methods Motion history images Active shape models Tracking and motion](https://reader031.vdocuments.site/reader031/viewer/2022041019/5ece00e7e612f130492ec9c7/html5/thumbnails/82.jpg)
Lecture overview
Motivation
  Historic review
  Modern applications
Appearance-based methods
  Motion history images
  Active shape models
  Tracking and motion priors
Motion-based methods
  Generic and parametric Optical Flow
  Motion templates
Space-time methods
  Local space-time features
  Action classification and detection
  Weakly-supervised action learning