Transcript
Page 1: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Video Tracking

Prof. Didier Stricker

Dr. Alain Pagani

Kaiserlautern University

http://ags.cs.uni-kl.de/

DFKI – Deutsches Forschungszentrum für Künstliche Intelligenz

http://av.dfki.de

Computer Vision

Object and People Tracking

Page 2: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

2

Video Tracking Definition Applications Bayesian filters for video tracking Motion models Appearance models Dynamic Appearance Models

Template Update Problem Machine learning based models

Occlusion handling Fusion Evaluation Redetection

Outline

Page 3: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Known as Visual Tracking

Visual Object Tracking

Vision-based Tracking

Defined as

“the process of estimating over time

the location of one or more objects using a

camera” 1

„Objects“: Person, head, hands, animal, ball,

car, plane, semi-finished products,

commercial products, cell…

Video Tracking: definitions

1E. Maggio and A. Cavallaro „Video Tracking, theory and practice“ – John Wiley & Sons, Ltd, 2011 3

Page 4: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Sensor: a video camera

Input: One (continuous) video sequence

Images: unique source of data for measurement

Visible variable (in the Hidden Markov Model)

Camera might be moving (non-static

background)

State is known in the first frame only

Online (i.e. sequential) Tracking

Video Tracking: details

4

Page 5: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Sequential (recursive, online)

+ Inexpensive → real-time

- no future information

- cannot revisit past errors

Approaches to tracking

t

t-1

t+1

t=1,…,T

Batch Processing (offline)

- Expensive → not real-time

+ considers all information

+ can correct past errors

5

Slide from 1st lecture

Page 6: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Tracking applications

6

Slide from 1st lecture

Page 7: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Same input: video sequence

Different aims: Video Tracking: track an object inside the video

sequence over time

Camera Tacking: track the pose (rotation +

translation) of the camera over time

Typical application of Camera Tracking:

Augmented Reality

Video tracking vs. Camera tracking

7

Page 8: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Tracking is an essential step in many computer vision based

applications

Video Tracking applications

Detection + Feature

Extraction Tracking

Activity Recognition

+ Event Recognition

Behavior Analysis +

Social Models

Prithwijit Guha , Amitabha Mukerjee and K. S. Venkatesh, Spatio-temporal Discovery: Appearance + Behavior = Agent,

Computer Vision, Graphics and Image Processing 4338 516-527, 2007 8

Slide from 1st lecture

Page 9: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

[1] E. Maggio and A. Cavallaro „Video Tracking, theory and practice“ – John Wiley & Sons, Ltd, 2011

(Slide from [1])

9

Page 10: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

[1] E. Maggio and A. Cavallaro „Video Tracking, theory and practice“ – John Wiley & Sons, Ltd, 2011

(Slide from [1])

10

Page 11: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

[1] E. Maggio and A. Cavallaro „Video Tracking, theory and practice“ – John Wiley & Sons, Ltd, 2011

(Slide from [1]) Sport analysis

11

Page 12: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Automatic Sports Coverage

F. Chen, C. De Vleeschouwer, "Autonomous production of basketball videos from multi-sensored data with personalized viewpoints," WIAMIS, 2009 12

Page 13: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Medical application and biological research

cell tracking and mitosis detection

Video Tracking applications

Ryoma Bise, Zhaozheng Yin, and Takeo Kanade, Reliable Cell Tracking by Global Data Association, ISBI 2011 13

Page 14: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Video Tracking applications Robotics, Industrial production and unmanned

vehicles

Adept Quattro s650

https://www.youtube.com/watch?v=8unkXTtfEBQ

14

Page 15: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Video Tracking applications Interactive video, product placement

Clickable video

15

Page 16: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

16

Components of a Bayesian Filter: State 𝑥𝑡 Measurement 𝑧𝑡 Control input 𝑢𝑡 Motion model 𝑝(𝑥𝑡|𝑥𝑡−1, 𝑢𝑡) Measurement model 𝑝(𝑧𝑡|𝑥𝑡)

State is hidden, only measurement is

observed

In visual tracking: no control input

Bayesian Filters for Video Tracking

Page 17: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

17

State 𝑥𝑡 (hidden state)

Measurement 𝑧𝑡 (observation)

Motion model (State Transition probability, dynamic model)

Measurement model (Appearance model, likelihood)

Measurement model

Motion model

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

Bayesian Filter: generic equation,

components

Page 18: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

18

State 𝑥𝑡 : 1D (position on the floor)

Measurement 𝑧𝑡 : “I see a door/ I see a wall” (binary)

Motion model : 𝑝 𝑥𝑡 𝑥𝑡−1 = 𝛿(𝑥𝑡−1 + 𝑢𝑡) (linear 1D + zero noise)

Appearance model : 𝑝 𝑧𝑡 𝑥𝑡 certain (defined by the map)zero noise

Appearance model

(Measurement model)

Motion model

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

Bayesian Filters: robot example

Page 19: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

State 𝑥𝑡 : Point in image (2D)

Bounding box

Ellipse

Shape

Articulated body

Bayesian Filters for Video Tracking: state

Appearance model

(Measurement model)

Motion model

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

19

Page 20: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

20

Bayesian Filters for Video Tracking: state

State 𝑥𝑡 :

Appearance model

(Measurement model)

Motion model

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

Page 21: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

21

Exemplary state: bounding box

𝑥𝑡 =

𝑥𝑐𝑦𝑐𝑤ℎ

(𝑥𝑐 , 𝑦𝑐): center of the box

(𝑤, ℎ): size of the box

(optional)

Page 22: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

22

Exemplary state: 3D bounding box

𝑥𝑡 =𝑥𝑟𝑦𝑟𝜃

(𝑥𝑟 , 𝑦𝑟): position on the map (road)

𝜃: angular rotation (yaw angle)

(fixed bounding box size)

𝜃

(𝑥𝑟 , 𝑦𝑟)

2D map

Video sequence

Zoomed view

Page 23: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

23

2D scene interpretation

position of centroid (fixed bounding box size): 2D

Centroid + scale (fixed bounding box aspect ratio): 3D

Free Bounding Box: 4D (x, y, w, h)

Free Bounding Box + rotation: 5D

3D scene interpretation

Position in the „real world“ on a map + orientation: 3D

3D Bounding Box: up to 6D

3D BBox + rotations: up to 9D

Video Tracking: usual states

Page 24: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

24

Motion model : 𝑝 𝑥𝑡 𝑥𝑡−1

Depends on how the scene is interpreted:

In 3D, it is possible to define constraints like Constant acceleration

Constant velocity…

In 2D tracking: the camera projection breaks the linearity 𝑝 𝑥𝑡 𝑥𝑡−1 very difficult to express

often extremely simplified:

Zero velocity + large Gaussian noise (very common)

Constant velocity + large Gaussian noise

Bayesian Filters for Video Tracking: motion model

Appearance model

(Measurement model)

Motion model

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

Page 25: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

25

Motion model : 𝑝 𝑥𝑡 𝑥𝑡−1

Zero velocity + large Gaussian noise Constant velocity + large Gaussian noise

Motion model Appearance model

(Measurement model)

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

Bayesian Filters for Video Tracking: motion model

Page 26: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

26

Measurement 𝒛𝒕 : The entire image at time 𝑡 Full HD resolution, 3 color channels: 6.3 Mo values

One measurement 𝑧𝑡is a point in a 6.3 Mo-dimensional space…

Compared with a range-finder or a radar intractable

Fortunately, we do not need to consider measurement, but only

the measurement model

Appearance model

(Measurement model)

Motion model

𝑝 𝑥𝑡 𝑧1:𝑡 = 𝜂 𝑝 𝑧𝑡 𝑥𝑡 𝑝 𝑥𝑡 𝑥𝑡−1 𝑝 𝑥𝑡−1 𝑧1:𝑡−1 𝑑𝑥𝑡−1

Bayesian Filters for Video Tracking: measurement

Page 27: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

27

Measurement model 𝑝 𝑧𝑡 𝑥𝑡 : Interpretation of the measurement model:

„Suppose the state is 𝑥𝑡, what is the probability that the

entire image looks like the one I have?“

Without statistical description of the background, we can reduce

this probability to a local neighborhood of the object position

defined by 𝑥𝑡.

Often interpreted as „how similar are the image part covered by

state 𝑥𝑡 and the object I expect to see?“

𝑝 𝑧𝑡 𝑥𝑡 = 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑠𝑢𝑏𝑖𝑚𝑎𝑔𝑒 𝐼, 𝑥𝑡 , 𝑜𝑏𝑗𝑒𝑐𝑡_𝑚𝑜𝑑𝑒𝑙) this explains the term „appearance model“

𝑝 𝑧𝑡 𝑥𝑡

Video Tracking: measurement model

Appearance model

(Measurement model)

Page 28: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Simple Example: PacMan tracking

Track the pink ghost – Frame 0

Object model: reference template

Frame 0

Simple object model based on

• Expected color (hue and saturation)

• Approximate shape

28

Page 29: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Simple Example: PacMan tracking

Track the pink ghost – Frame 1

Motion prediction

Measurement step

Object model: reference template

Frame 1

Motion prediction centered on last position

Measurement step based on object model

New belief 29

Page 30: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Simple Example: PacMan tracking

Track the pink ghost – Frame 2

Motion prediction

Measurement step

Object model: reference template

Frame 2

Motion prediction centered on last position

Measurement step based on object model

New belief

?

? ?

?

30

Page 31: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

PacMan tracking failure example:

For a human it is still easy to follow Sudden appearance changes are more likely

than teleportation (at least in that game)

In the tracking algorithm, this was not

modelled

Computed belief is correct

What about real scenes?

Importance of precise modelling

31

Page 32: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

In a real sequence, the appearance

of tracked objects changes

permanently: Object‘s out-of-plane rotations

Non-rigid objects

Face expression

Changing illumination

Camera motion

Robust appearance models are required!

Appearance changes in a real video

32

Page 33: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Two components:

Visual representation:

constructs robust objects descriptors using

different types of visual features

Statistical modeling:

constructs mathematical models for object

identification using statistical learning techniques

Choice for the components depends on the

specific context/environment of the tracking

task

Appearance Modeling

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 33

Page 34: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Visual features for object description

Uses low-level computer vision techniques

(see first lectures): edges, blobs, corners…

Can be divided into two major classes:

Global Visual representation:

the full image is used for representing the object

Local visual representation:

subparts of the image are used (keypoints,

patches…)

Visual representation

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 34

Page 35: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Raw pixel representation

Distance: SAD, SSD, NCC (lecture 4)

Global visual representation

35

1 2 3

4 5 6

7 8 9

10 11 12

1 2 3 4 5 6 7 8 9 10 11 12

Vector of dimension (w.h)

Page 36: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

• Distribution of pixel-based features in the region

• Possible features:

• Color information (RGB or Hue/Sat.)

• Gradient (magnitude + orientation)

• Laplacian

• …

• distribution of the pixels inside the bounding box in the space S

Non-parametric representation

Space S of dimension d

),,,,( bgrf

S

36

Page 37: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

• Which model for the representation of the distribution?

• d-dimensional Gaussian

• Mixture of Gaussians

• Histograms

S

37

Non-parametric representation

Page 38: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Color Histogram

LAB color histogram LAB space:

designed to

approximate

human vision

L lightness

A,B color

Distance: KL-

divergence

(lecture 10)

Measurements are obtained by

converting pixel values within

bounding boxes to LAB color

space, and concatenating to

form an AB channel histogram 38

Page 39: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Covariance representation

Global visual representation

Fatih Porikli Using covariance improves computer detection and tracking of humans, SPIE Newsroom, 2006

39

E. Erdem and A. Erdem Visual saliency estimation by nonlinearly integrating features using region covariances, Journal of

Vision, 2013

Reduced descriptor size

Distance: Förstner Distance

W. Förstner and B. Moonen A Metric for Covariance, Quo vadis geodesia, 1999

Page 40: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Other representations: Wavelet filtering based representation

Active Contour representation

Global visual representation

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 40

Page 41: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Local Template based Set of smaller local templates for object‘s parts

Segmentation based Based on meaningful segments (using

segmentation algorithms)

Local visual representation

Wang et. al., Superpixel tracking, ICCV 2011 41

Page 42: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

SIFT-, SURF- or MSER- Based (see first

lectures)

Local visual representation

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 42

Page 43: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Local Feature-Pool based Use a huge pool of simple features

HOGs

Gabor features

Haar-like features

The most discriminative features are selected from

the pool in a learning phase

Local visual representation

Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001 43

Page 44: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Build a mathematical model for object

identification using the selected visual

features

Provides a similarity measure between a

proposed object position (image part) and a

model of the expected object

Different approaches: Generative models

Discriminative models

Hybrid approaches

Statistical modeling

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 44

Page 45: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Attempts to fit the provided data to the model

Uses parameter estimation (e.g. Expectation

maximization)

Can use online update mechanisms to

incrementally learn visual representations for

the foreground object

Generative appearance model

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 45

Page 46: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Object tracking as binary classification

Maximize the separability between object and

non-object regions

Focus on discovering informative features for

visual tracking

Can incrementally learn discriminative

classification functions

Depends on training sample selection

(drawback)

Discriminative appearance model

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 46

Page 47: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

47

Algorithm Idea: Learn to distinguish the object from its direct

(background) neighborhood

Feature space Pixel-based features (color, gradient...)

Higher level information (wavelets, gradient

histograms...)

Classifiers For Tracking

From: S. Avidan. Ensemble Tracking CVPR 2005

Page 48: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

SVM-based DAM

Boosting-based DAM

Randomized learning-based DAM

Discriminant analysis-based DAM

Codebook learning-based DAM

Types of discriminative appearance models

Xi Li et. al., A Survey of Appearance Models in Visual Object Tracking, ACM Trans. Intell. Syst. Tech., Sept 2013 48

Page 49: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Modern models are based on learning from examples

Typical (online) tracking task: only the appearance in the

first frame is provided (single instance model)

For learning-based methods, examples are taken while

tracking from the new frames

Consequences:

model of expected appearance evolves with time

𝑝 𝑧𝑡 𝑥𝑡 is not constant

depending on the tracking quality, this can lead to drift

Dynamic appearance models and drift

49

Page 50: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Drift example

Yu Liu et. al., Feature-driven motion model-based particle-filter tracking method with abrupt motion handling, Opt. Eng, Apr. 2012

50

Page 51: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

51

Simple template-based tracker

using optical flow (Lukas Kanade -

see lecture 6)

Given: one (warped) bounding box

geometry 𝑞0 (quadrangle) and its

unwarped content 𝑇0 (the template)

in the first frame

Computes: the new position 𝑞𝑖 in

the following frames

From 𝑖 to 𝑖 + 1: Geometry

optimization using Taylor expansion

of the local image intensities

Classical example of drift: Lukas-Kanade

Page 52: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

52

Question: what should be the

template in the frame 𝑖 > 0?

Strategy 1: no updates

Keep first template 𝑇0

and take geometry 𝑞𝑖

Strategy 2: naive update

Take new unwarped content 𝑇𝑖 with geometry 𝑞𝑖

Choice of the new template

Page 53: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

53

The Template Update Problem Iain Matthews, Takahiro Ishikawa, and Simon Baker

Proceedings of the British Machine Vision Conference, September, 2003.

The template update problem

Strategy 1: No update

Keep the same

reference template

Template deprecates

Strategy 2: Naive Update

Update the template

at each frame

drift

Strategy 3:

Update the template

at each frame, then

correct for drift with first

template

Page 54: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

54

First assumption: object appearance does not change Does generally not hold

Template not fully representative of the object

tracking loss

Update the appearance at every frame Error accumulates

generates drift

Solution: 2-step tracking First compute a new position 𝑞𝑖 using the last template 𝑇𝑖 Then correct for drift: compute 𝑞𝑖 starting from 𝑞𝑖 with template 𝑇0

The Template Update Problem Iain Matthews, Takahiro Ishikawa, and Simon Baker

Proceedings of the British Machine Vision Conference, September, 2003.

The template update problem

Page 55: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

55

Works well for gradient trackers / template trackers

Not adapted for more complicated objects

Other strategies are necessary Use advance machine learning techniques

Create a pool of object appearances

Combine Tracking and Detection

The template update problem:

limitations

Page 56: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

56 Tracking-Learning-Detection, Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas, PAMI 2010

Example: Tracking-Learning-Detection

Page 57: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

57

Even if the appearance of the tracked object is well modelled,

the object can be occluded in the vision field of the camera

Handling Occlusion in Visual Tracking

Page 58: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

58

Two strategies for coping with occlusions

Taking partial occlusion into account: Divide the object into parts

Compute similarity of parts

Discard the worst part

Detecting loss of tracks When the object is not visible:

missing measurement

Kalman Filter propagates

uncertainty while meas.

is missing

Handling Occlusion in Visual Tracking

Page 59: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Although the image sequence is the unique

source of information, different visual features

can provide different types of measurement

A visual tracker can fuse different features

Different strategies: Fusion at tracker level

Fusion at measurement level

Fusion of multiple source for tracking

59

Page 60: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Parallel fusion of trackers Multiplication of the posterior

probabilities (indep. Trackers)

Trackers used as observables

of a Markov Network

Sequential fusion of trackers Features are considered

available at subsequent time-

instant

Uses an EKF approach

Tracker-level fusion

[E. Maggio and A. Cavallaro „Video Tracking, theory and practice“ – John Wiley & Sons, Ltd, 2011 60

Page 61: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Measurements are combined internally by the

tracking algorithm

Mechanisms for fusion: Voting procedure

Saliency maps

Bayesian networks

Mutual Information

Multi-feature particle filter: linear combination

Measurement-level fusion

[E. Maggio and A. Cavallaro „Video Tracking, theory and practice“ – John Wiley & Sons, Ltd, 2011

𝑝 𝑧𝑡|𝑥𝑡 = 𝛼𝑗𝑝𝑗

𝑁

𝑗=1

𝑧𝑡|𝑥𝑡

61

Page 62: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Requires an evaluation sequence with ground truth

Precision: distance in pixels between centers of

bounding boxes

Overlap: relative overlap between bounding boxes

Performance evaluation

precision overlap

A

B

C

𝑜 =𝐶

𝐴 + 𝐵 − 𝐶

𝑐𝐴

𝑐𝐵

𝑝 = 𝑐𝐴 − 𝑐𝐵 62

Page 63: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Visualization for a full sequence through plots: Precision plots: % of frames below a given precision threshold

Success plots: % of frames above a given overlap threshold

Performance evaluation plots

63

Page 64: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

64

How good are current trackers?

Test with 50

sequences and 29

state-of-the-art

trackers

Results are

sequence-specific

Still no generic

object tracker

C. Bailer, A. Pagani, D. Stricker A Superior Tracking Approach: Building a strong Tracker through Fusion, ECCV 2014

Page 65: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Detection and loss detection: (re-)initialize the system

Tracking: measurement processing and state update

Models: all useful prior information about objects, sensors, and environment

t It Visual

Processing Measurement Prediction Output

Tracking Update

Loss It Object Detection

Initial estimate

Models

Shape Appearance Degrees of

freedom Dynamics Sensors Context

Detection

65

Redetection – real tracker A complete tracking framework needs to know when it fails

Page 66: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

66

Visual Tracking is a very specific kind of

Bayesian tracking

Two major difficulties Difficult to establish a correct motion model

Measurement model needs an appearance model

One possible solution: track and detect object at

the same time

Conclusion

Page 67: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

67

„Video Tracking“ by E. Maggio

and A. Cavallaro

Online resources:

http://www.videotracking.org/

Literature and online sources

Page 68: Computer Vision Object and People Tracking. Alain Pagani Kaiserlautern ... Computer Vision Object and People Tracking . 2 Video Tracking ... Video tracking vs. Camera tracking 7

Thank You!

68


Top Related