Multi-Person Multi-Camera Tracking for EasyLiving
John Krumm Steve Harris Brian Meyers
Barry Brumitt Michael Hale Steve Shafer
Vision Technology Research Group, Microsoft Research, Redmond, WA, USA
What Is EasyLiving?
EasyLiving is a prototype architecture and set of technologies for building intelligent environments that facilitate the unencumbered interaction of people with other people, with computers, and with devices.
Example Behaviors
• Adjust lights as you move around a space
• Route video to best display
• Move your Windows session as you move
• Deliver e-messages to where you are
• Monitor a young child or old person
EasyLiving Demo (7 min.)
Self-Aware Space
EasyLiving must know about people, computers, software, devices, and geometry to work right.
Who’s Where?
Person-Tracking System
5 Triclops stereo cameras
5 PCs running “Stereo Module” (and Microsoft Windows 2000)
1 PC running “Person Tracker”
(only U.S. $319)
(includes Internet Explorer)
(as part of the OS)
Triclops Color Stereo Cameras
Stereo Processing and Person Detection
Person Tracking
(for a limited time only?)
Triclops Cameras
Now superseded by “Digiclops” digital IEEE-1394 version
Typical Images
Color image from Triclops
Disparity image from Triclops
Requirements
To work in a real-life intelligent environment, our tracking system must …
1. Maintain location & identity of people
2. Run at reasonable speeds (we get 3.5 Hz)
3. Work with multiple people (we handle up to three)
4. Create and delete people instances
5. Work with multiple cameras (we’re up to five)
6. Use cameras in the room
7. Work for extended periods
8. Tolerate partial occlusions and variable postures
Other Systems
Non-Vision
• Olivetti Research (’92) & Xerox PARC (’93) – IR badges
• AT&T Laboratories (Cambridge) (’97) – Ultrasonic badges
• PinPoint, Ascension, Polhemus – commercial RF badges
Vision (for multiple people)
• Haritaoglu & Davis (’98-’99)
• Darrell et al. (’98)
• Orwell et al. (’99)
• Collins et al. (’99)
• Rosales & Sclaroff (’99)
• Kettnaker & Zabih (’99)
• Intille et al. (’95, ’97)
• Rehg et al. (’97)
• Boult et al. (’99)
• Stiefelhagen et al. (’99)
• MacCormick & Blake (’99)
• Cai & Aggarwal (’98)
• Halevi & Weinshall (’97)
• Gavrila & Davis (’96)
“I see by the current issue of ‘Lab News,’ Ridgeway, that you’ve been working for the last twenty years on the same problem I’ve been working on for the last twenty years.”
Why Use Vision?
Alternative sensors:
• Active badges
• Pressure-sensitive floors
• Motion sensors
• Localized sensors, e.g. on door, chair
But …
• Cameras are getting cheap
• Cameras are easy to install
• Cameras give location and identity
• Cameras can find other objects, e.g. video screens
• Cameras can be used to model room geometry
(active badge)
Person Detection Steps
1. Background subtraction
2. Blob clustering
3. Histogram identification
Camera calibration
Background modeling
Camera Calibration
• All tracking done in ground plane
• Record path of single person walking around room
• Compute (x, y, θ) that best aligns paths
• Requires robust alignment to deal with outliers
[Figure: two ground-plane plots, “Paths before calibration” and “Paths after calibration”; axis ticks run from 0 to 4 and from -2 to 2]
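The alignment step can be sketched as follows. This is a minimal illustration, not the method used in EasyLiving: it assumes the path samples from two cameras are paired in time, fits (x, y, θ) by closed-form least squares, and handles outliers by refitting after dropping pairs with large residuals. The slides do not say which robust method was used; the function names and the `rounds` and `k` parameters are hypothetical.

```python
import math
from statistics import median

def fit_rigid_2d(src, dst):
    """Least-squares (theta, tx, ty) that maps paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centered source point
        bx, by = qx - dx, qy - dy          # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def apply_rigid_2d(pose, p):
    """Rotate p by theta about the origin, then translate by (tx, ty)."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def fit_rigid_2d_robust(src, dst, rounds=5, k=3.0):
    """Assumed outlier handling: refit after dropping residuals > k * median."""
    pairs = list(zip(src, dst))
    pose = fit_rigid_2d(src, dst)
    for _ in range(rounds):
        res = [math.dist(apply_rigid_2d(pose, p), q) for p, q in pairs]
        cutoff = k * median(res) + 1e-9
        kept = [pq for pq, r in zip(pairs, res) if r <= cutoff]
        if len(kept) == len(pairs) or len(kept) < 2:
            break
        pairs = kept
        pose = fit_rigid_2d([p for p, _ in pairs], [q for _, q in pairs])
    return pose
```

Once a pose is known for each camera, `apply_rigid_2d` maps that camera's ground-plane reports into the shared frame.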
Background Modeling
View of space
Combined color & disparity background image
Background Subtraction
Foreground if:
• valid depth over invalid depth
- OR -
• depth difference > Td
- OR -
• any (R,G,B) difference > Tc
• Color takes over when person sinks into couch cushions
• Potential problem when person walks in front of moving video (thus turn on moving video when acquiring background)
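The three-way foreground test can be written as a per-pixel rule. The sketch below assumes each pixel is an (r, g, b, depth) tuple with depth set to None where stereo matching failed; that representation and the default values for Td and Tc are illustrative assumptions, not values from the slides.

```python
def is_foreground(pixel, background, t_depth=10.0, t_color=30):
    """Apply the three OR-ed tests to one pixel and its background model."""
    r, g, b, d = pixel
    br, bg, bb, bd = background
    if d is not None and bd is None:
        return True                       # valid depth over invalid depth
    if d is not None and bd is not None and abs(d - bd) > t_depth:
        return True                       # depth difference > Td
    # any (R,G,B) difference > Tc
    return any(abs(c - bc) > t_color
               for c, bc in zip((r, g, b), (br, bg, bb)))

def foreground_mask(image, background_image, **thresholds):
    """Row-by-row boolean mask for images stored as lists of pixel rows."""
    return [[is_foreground(p, q, **thresholds) for p, q in zip(row, brow)]
            for row, brow in zip(image, background_image)]
```

Because the color test is independent of depth, a person whose depth matches the couch cushions is still picked up if their clothing differs from the background color.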
Person Detection
Region-growing on foreground pixels gives fragmented blobs
Group blobs into people-shaped clusters
Blob Clustering
• Minimum spanning tree
• Break really long links
• Find five remaining longest links
• Break all combinations of these five:

      1  2  3  4  5
 1    0  0  0  0  0
 2    0  0  0  0  1
 3    0  0  0  1  0
 …
30    1  1  1  0  1
31    1  1  1  1  0
32    1  1  1  1  1

• Covariance matrices of 3D coordinates of linked blobs
• Eigenvalues of covariance matrices
• Compare eigenvalues to person model
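The link-breaking search can be sketched as below. This is an illustration under assumptions: blobs are represented by 3D centroids, `max_link` is a made-up length threshold, and the comparison against the person model is left to a caller-supplied `cluster_cost` function (hypothetical) that would, following the slide, be computed from the eigenvalues of each cluster's covariance matrix. Choosing the combination with the lowest summed cost is also an assumption.

```python
import math
from itertools import product

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph; returns (length, i, j) links."""
    in_tree, links = {0}, []
    while len(in_tree) < len(points):
        length, i, j = min(
            (math.dist(points[a], points[b]), a, b)
            for a in in_tree
            for b in range(len(points)) if b not in in_tree)
        links.append((length, i, j))
        in_tree.add(j)
    return links

def components(n, links):
    """Group node indices 0..n-1 that are connected by the given links."""
    label = list(range(n))
    def find(a):
        while label[a] != a:
            a = label[a]
        return a
    for _, i, j in links:
        label[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())

def cluster_blobs(points, cluster_cost, max_link=1.0, n_candidates=5):
    """Try every way of breaking the longest remaining links; keep the best."""
    links = [l for l in minimum_spanning_tree(points) if l[0] <= max_link]
    links.sort(reverse=True)                 # longest links first
    candidates, fixed = links[:n_candidates], links[n_candidates:]
    best = None
    for mask in product((0, 1), repeat=len(candidates)):   # 1 = break link
        kept = fixed + [l for l, cut in zip(candidates, mask) if not cut]
        clusters = components(len(points), kept)
        cost = sum(cluster_cost([points[i] for i in c]) for c in clusters)
        if best is None or cost < best[0]:
            best = (cost, clusters)
    return best[1]
```

With five candidate links, `product((0, 1), repeat=5)` enumerates the same 32 combinations as the table on the slide.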
Color Histograms
• Identify people with RGB color histograms, 16x16x16
• Each camera PC maintains its own histograms
• Space-variant histograms built as person moves around room
• Person tracker uses histogram to resolve ambiguities
(figure labels: window, window, bluish tint, regular color)
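A 16x16x16 RGB histogram and a simple matching rule might look like the sketch below. It assumes 8-bit color channels and a non-empty pixel list, and it uses histogram intersection as the similarity measure; the slides do not name the measure, so that choice and the function names are assumptions. The space-variant aspect (one histogram per region of the room) is not shown.

```python
from collections import Counter

BINS = 16  # bins per channel, giving 16 x 16 x 16 cells

def color_histogram(pixels):
    """Normalized RGB histogram; pixels are (r, g, b) with 8-bit channels."""
    step = 256 // BINS
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical, 0.0 for disjoint."""
    return sum(min(v, h2.get(cell, 0.0)) for cell, v in h1.items())

def identify(observed, known):
    """Return the name in `known` whose stored histogram matches best."""
    return max(known, key=lambda name: intersection(observed, known[name]))
```

A space-variant version could keep one such dictionary per person and per floor region, so that the bluish tint near a window is compared only against histograms collected near that window.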
So Far
Triclops Color Stereo Cameras
Stereo Processing and Person Detection
Person Tracking
calibration, background
• background subtraction (color & depth)
• blob clustering
• histogram maintenance
Person Tracking
• Takes reports from stereo modules
• Transforms to common coordinate frame
(common coordinate frame)
Person Tracking – Steady State
One “track” for each person
Predicted location
Resolve with color histograms
Feed back results to stereo modules for histogram updating
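One steady-state update could be sketched as follows. The slides do not specify the predictor or the matching rule, so this example assumes a constant-velocity prediction, nearest-predicted-track assignment within a hypothetical `gate` distance, and averaging of all reports that support a track. Histogram-based disambiguation is omitted.

```python
import math

class Track:
    """One tracked person in the common ground-plane frame."""
    def __init__(self, name, position):
        self.name = name
        self.position = position
        self.velocity = (0.0, 0.0)

    def predict(self, dt):
        """Constant-velocity prediction (an assumption, not from the slides)."""
        return (self.position[0] + self.velocity[0] * dt,
                self.position[1] + self.velocity[1] * dt)

    def update(self, measured, dt):
        self.velocity = ((measured[0] - self.position[0]) / dt,
                         (measured[1] - self.position[1]) / dt)
        self.position = measured

def step(tracks, reports, dt, gate=1.0):
    """Assign each report to the nearest predicted track within `gate`."""
    support = {t.name: [] for t in tracks}
    for r in reports:
        best = min(tracks, key=lambda t: math.dist(t.predict(dt), r),
                   default=None)
        if best is not None and math.dist(best.predict(dt), r) <= gate:
            support[best.name].append(r)
    for t in tracks:
        pts = support[t.name]
        if pts:  # average the reports from all cameras that saw this person
            mean = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
            t.update(mean, dt)
    return support
```

Reports that fall outside every gate are left unassigned; tracks with an empty support list keep their previous state.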
Person Tracking – Bad Data
Measurement Noise:
• Computed position based on predicted position from many reports
Occlusions:
• Multiple cameras
• Long timeout on unsupported tracks
Person Creation Zone
• Tracks begin and end here
• Initial tracks are provisional
• Makes remainder of room more robust
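The creation-zone policy can be illustrated with two small predicates. This sketch assumes an axis-aligned rectangular zone and models "tracks end here" as a short timeout inside the zone versus a long timeout elsewhere; the zone shape, both timeout values, and the function names are hypothetical, and the promotion of provisional tracks is not shown.

```python
def in_zone(position, zone):
    """True if a ground-plane position lies inside the rectangular zone."""
    (x0, y0), (x1, y1) = zone
    return x0 <= position[0] <= x1 and y0 <= position[1] <= y1

def may_create(position, zone):
    """New tracks are started only for detections inside the zone."""
    return in_zone(position, zone)

def should_delete(last_position, seconds_unsupported, zone,
                  zone_timeout=1.0, room_timeout=30.0):
    """Drop unsupported tracks quickly in the zone, slowly elsewhere."""
    timeout = zone_timeout if in_zone(last_position, zone) else room_timeout
    return seconds_unsupported > timeout
```

Restricting creation to the zone means a spurious blob in the middle of the room cannot spawn a new person, which is one reading of "makes remainder of room more robust".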
Summary
• Live demos, 20 minutes long
• Person tracker runs at 3.5 Hz
• Up to three people in room
• People can:
  • enter
  • leave
  • walk around
  • stop moving
  • sit
  • collide
Recent Efforts
• Stop breaking the vision system!
  • Moved chairs & changing lights lead to a bad background model
  • Special behavior, e.g. slow through person creation zone
  • Lots of people, e.g. around conference table
• Find other objects to enable interesting behaviors, e.g. “Where’s that book?”
• Easier method to model room geometry
Workshop on Multi-Object Tracking