Real-time Tracking of Multiple People Using Stereo
David Beymer Bob Bolles
Kurt Konolige Chris Eveland
Artificial Intelligence Center SRI International
Problem: people tracking for surveillance
• return coarse 3D locations of people
• real-time on standard hardware
• multiple people in scene
• stationary camera
Approach
• consider: template-based tracking
– maintain template of object
– correlation used to update object position
– template is recursively updated to handle changing object appearance
• limitations/problems
1) object initialization/detection
2) template drift
template: $T(x, y)$
object position: $(x_p, y_p)$
recursive update: $T(x, y) = \alpha\, T(x, y) + (1 - \alpha)\, I(x + x_p, y + y_p)$
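The recursive template update can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the name `alpha` for the blending weight, its default value, and the convention that `(x_p, y_p)` is the top-left corner of the matched patch are all hypothetical, not taken from the slides.

```python
import numpy as np

def update_template(template, image, x_p, y_p, alpha=0.9):
    """Blend the image patch at the tracked position into the template.

    alpha (assumed name and value) controls how slowly the template adapts.
    """
    h, w = template.shape
    # Patch of the current image at offset (x_p, y_p); assumed to lie inside the image.
    patch = image[y_p:y_p + h, x_p:x_p + w].astype(np.float64)
    return alpha * template + (1.0 - alpha) * patch
```

Because every update mixes in whatever pixels lie under the current position estimate, small localization errors accumulate in the template, which is the drift problem listed above.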
Goal: add modality of stereo
• segmentation: background subtraction on stereo disparities to detect foreground
• detection: person templates encoding head and torso shape
• tracking:
– person templates used to avoid drift
– stereo segmentation used to add “support” template
[Figure: left image, background, disparities, foreground]
Approach
• detection
– segment foreground into depth layers
– correlate with person templates
• tracking
– intensity and "support" templates are recursively updated
– Kalman filtering on person location in 3D
– person templates used to avoid drift
[Block diagram: stereo, background init, background subtraction, foreground, detection, tracking; inputs: left intensity, person templates]
Related Work
• Companies
– Teleos Research/Autodesk, People Tracker
– DEC/Compaq, Smart Kiosk [Rehg, et al, 1997]
– Interval, Morphin' Mirror [Darrell, et al, 1998]
– Sarnoff [IUW, 1998]
– Texas Instruments [Flinchbaugh, 1998]
– Electric Planet
• Universities
– MIT, Pfinder [Wren, et al, 1997]
– Toronto, [Fieguth and Terzopoulos, 1997]
– Maryland, W4S [Haritaoglu, et al., 1998]
– MIT, Forest of Sensors [Grimson, et al., 1998]
– CMU [Kanade, et al, 1998]
– Columbia/Lehigh [Nayar and Boult, 1998]
– Boston Univ., [Rosales and Sclaroff, 1998]
Stereo module: SRI's Small Vision System (SVS)
• Hardware
– two CMOS cameras
– low power (150mW), inexpensive ($100 components)
– adjustable baseline: 2.7'' to 6.2'' in 1'' increments
– another version with DSP processing onboard
• Software
– stereo algorithm is area correlation based
– optimized C and MMX code
– 20 Hz on 320x240 image, 24 disparities, 400 MHz Pentium II
SVS Stereo Results
[Figure: left image, right image, disparities]
Background subtraction
notation:
$d(x, y)$: current disparities
$d_0(x, y)$: background estimate
• look for disparities closer than background
$$
f(x, y) =
\begin{cases}
d(x, y) & \text{if } d(x, y) - d_0(x, y) > \text{thresh, or } d(x, y) \text{ defined and } d_0(x, y) \text{ undefined} \\
0 & \text{otherwise}
\end{cases}
$$
[Figure: left image, background $d_0(x, y)$, disparities $d(x, y)$, foreground $f(x, y)$]
• using stereo disparities versus intensities
+ less sensitive to lighting changes, shadows
+ can segment people at different depths
– more computationally expensive
– tends to blur & expand object boundaries
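A minimal sketch of the foreground rule on disparity images follows. It rests on assumptions that are not in the slides: undefined disparities are encoded as NaN, and the threshold value is arbitrary.

```python
import numpy as np

def foreground_disparities(d, d0, thresh=2.0):
    """Return f(x, y): the current disparity where foreground, else 0.

    d  : current disparities, NaN where stereo matching failed (assumed encoding)
    d0 : background disparity estimate, same encoding
    """
    d_defined = ~np.isnan(d)
    d0_defined = ~np.isnan(d0)
    # Closer than background: disparity exceeds the background by more than thresh.
    with np.errstate(invalid="ignore"):
        closer = d_defined & d0_defined & ((d - d0) > thresh)
    # Valid disparity where the background estimate had none.
    newly_defined = d_defined & ~d0_defined
    return np.where(closer | newly_defined, d, 0.0)
```

For example, with `d = [[10, 5, NaN, 7]]` and `d0 = [[5, 5, 5, NaN]]` the result is `[[10, 0, 0, 7]]`: the first pixel is closer than the background, the last one is defined only in the current frame.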
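Handling scale
• idea: range info from stereo can be used to fix scale of processing, avoiding a search over the scale parameter
– person width is proportional to disparity
– from similar triangles: $\dfrac{w'}{f} = \dfrac{w}{z}$, so $w' z = f w = \text{const}$
– stereo equation: $d = \dfrac{b f}{z}$
– combining the two: $w' = K d$
d: disparity
b: baseline
K: constant
[Figure: similar-triangles diagram showing the image plane, center of projection (COP), focal length f, depth z, object width w, and image width w']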
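Eliminating f/z between the similar-triangles relation and the stereo equation gives K = w / b, so the expected image width of a person follows directly from the measured disparity. The sketch below illustrates this; the default person width and baseline are hypothetical values chosen only for the example.

```python
def template_width_px(disparity_px, person_width_m=0.5, baseline_m=0.1):
    """Expected image width w' (pixels) of a person at the given disparity.

    From w'/f = w/z and d = b*f/z it follows that w' = (w / b) * d, i.e. K = w / b.
    The default width and baseline are illustrative assumptions.
    """
    K = person_width_m / baseline_m
    return K * disparity_px
```

With these assumed values, a person at a disparity of 10 pixels would be about 50 pixels wide in the image, so the person template can be resized once rather than searched over scale.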
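Detection
[Flowchart; the inset histogram plots count versus disparity]
• input: foreground f(x,y)
• histogram the foreground disparities: another peak?
– no: exit
– yes: threshold disparities around the peak to obtain layer(x,y)
• correlate layer(x,y) with person template: found person?
– yes: remove person from layer(x,y) and correlate again
– no: return to the histogram and look for another peak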
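The layer-extraction part of this loop can be sketched as below. It is a simplification under stated assumptions: the person-template correlation is omitted, a whole layer is removed after each peak, and the parameters `half_width`, `min_count`, and `max_disp` are hypothetical.

```python
import numpy as np

def depth_layers(f, half_width=2, min_count=50, max_disp=64):
    """Yield (peak_disparity, mask) for each depth layer in the foreground f(x, y)."""
    remaining = np.clip(np.rint(f).astype(int), 0, max_disp - 1)
    while True:
        hist = np.bincount(remaining.ravel(), minlength=max_disp)
        hist[0] = 0  # disparity 0 marks background in f(x, y)
        peak = int(np.argmax(hist))
        if hist[peak] < min_count:
            return  # no further significant peak: exit
        # Threshold disparities around the peak to obtain the layer.
        mask = (np.abs(remaining - peak) <= half_width) & (remaining > 0)
        yield peak, mask
        remaining[mask] = 0  # remove this layer before searching for the next peak
```

Each yielded mask would then be correlated with the person templates; that step is left out here because the slides do not specify the correlation measure.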
Tracking Steps
• prediction
– predict Kalman filter (X, Z)
– predict person disparity
• segmentation
– select foreground layer around predicted disparity
• localization
– correlate gray level template against left image, weighted by support template [coarse localization]
– correlate head/torso shape template against segmented foreground layer [re-centering step that addresses template drift]
• update
– Kalman filter
– recursive update of intensity and support templates
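The prediction and update steps can be illustrated with a small Kalman filter on the ground-plane position (X, Z). The slides do not give the motion model or noise settings, so the constant-velocity state `[X, Z, vX, vZ]` and the values of `q` and `r` below are assumptions made for this sketch.

```python
import numpy as np

def kalman_predict(state, P, dt=1.0, q=0.01):
    """Constant-velocity prediction for state [X, Z, vX, vZ] (assumed model)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.eye(4)  # process noise (assumed)
    return F @ state, F @ P @ F.T + Q

def kalman_update(state, P, z, r=0.05):
    """Correct the prediction with a measured position z = [X, Z]."""
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    R = r * np.eye(2)  # measurement noise (assumed)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    new_state = state + K @ (z - H @ state)
    new_P = (np.eye(4) - K @ H) @ P
    return new_state, new_P

def predicted_disparity(Z, baseline, focal_px):
    """Stereo equation d = b * f / z applied to the predicted depth."""
    return baseline * focal_px / Z
```

The predicted depth Z feeds `predicted_disparity`, which tells the segmentation step which foreground layer to select before the templates are correlated.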
Tracking Videos
• recursive template update
[Videos: walking, figure eight, running]
Tracking Videos
visualizing tracks from map view
tracking under multiple occlusions
Tracking: quantitative results
Sequence  # people  # occlusions  TR    FP     MTD
1         1         0             96%   0%     6.0
2         1         0             98%   0%     4.0
3         1         0             96%   0%     10.0
4         2         0             89%   10%    2.5
5         2         0             92%   6%     11.0
6         2         1             86%   0%     9.0
7         3         2             79%   3%     7.7
8         4         2             85%   2%     5.0
9         3         6             84%   4%     5.8
10        5         10            78%   1.3%   6.6
11        4         9             69%   5.6%   7.0
12        5         20            68%   3.2%   5.4
13        5         28            70%   6.7%   6.2
TR = tracking rate FP = false positive rate
MTD = mean time to detect
Evaluating use of stereo in tracker
• Experiment: disable stereo in tracker
– code modifications:
  • disable re-centering step
  • weighted intensity correlation → unweighted correlation
– results:
  • mean tracking rate (TR) drops 4%
  • mean false positive rate (FP) increases from 3% to 10%
  • (qualitative) template drift causes people to be lost and re-detected
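The difference between weighted and unweighted intensity correlation in this experiment can be pictured with the sketch below. The slides do not state the correlation measure, so the weighted sum of squared differences and the exhaustive search used here are assumptions; passing `support=None` corresponds to the unweighted variant.

```python
import numpy as np

def best_match(image, template, support=None):
    """Return the (x, y) offset minimizing the support-weighted SSD (assumed measure)."""
    if support is None:
        support = np.ones_like(template, dtype=float)  # unweighted correlation
    h, w = template.shape
    best, best_xy = np.inf, (0, 0)
    for y in range(image.shape[0] - h + 1):
        for x in range(image.shape[1] - w + 1):
            diff = image[y:y + h, x:x + w] - template
            # Pixels with zero support (background) do not contribute to the score.
            score = np.sum(support * diff ** 2) / max(np.sum(support), 1e-9)
            if score < best:
                best, best_xy = score, (x, y)
    return best_xy
```

With a support template, background pixels inside the template window are ignored, which is one plausible reading of why removing the weighting lets the template drift.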
Conclusion
• Stereo is an effective segmentation tool:
– detection: provides a foreground layer divided into different depth layers
– tracking: helps to avoid template drift by focusing on foreground pixels at object’s depth
• Combine segmentation with priors on person shape (i.e. head/torso templates) for person localization.