Actions in video
![Page 1: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/1.jpg)
Actions in video
Monday, April 25
Kristen Grauman
UT-Austin
![Page 2: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/2.jpg)
Today
• Optical flow wrapup
• Activity in video
  – Background subtraction
  – Recognition of actions based on motion patterns
  – Example applications
![Page 3: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/3.jpg)
Using optical flow: recognizing facial expressions
Recognizing Human Facial Expression (1994), by Yaser Yacoob and Larry S. Davis
![Page 4: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/4.jpg)
Using optical flow: recognizing facial expressions
![Page 5: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/5.jpg)
Example use of optical flow: facial animation
http://www.fxguide.com/article333.html
![Page 6: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/6.jpg)
Example use of optical flow: Motion Paint
http://www.fxguide.com/article333.html
Use optical flow to track brush strokes, in order to animate them to follow underlying scene motion.
![Page 7: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/7.jpg)
Video as an “Image Stack”
• Can look at video data as a spatio-temporal volume
• If the camera is stationary, each line through time corresponds to a single ray in space
Alyosha Efros, CMU
![Page 8: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/8.jpg)
Input Video
Alyosha Efros, CMU
![Page 9: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/9.jpg)
Average Image
Alyosha Efros, CMU
![Page 10: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/10.jpg)
Slide credit: Birgi Tamersoy
![Page 11: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/11.jpg)
Background subtraction
• Simple techniques can do OK with a static camera
• …But hard to do perfectly
• Widely used:
  – Traffic monitoring (counting vehicles, detecting & tracking vehicles, pedestrians)
  – Human action recognition (run, walk, jump, squat)
  – Human-computer interaction
  – Object tracking
![Page 12: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/12.jpg)
Slide credit: Birgi Tamersoy
![Page 13: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/13.jpg)
Slide credit: Birgi Tamersoy
![Page 14: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/14.jpg)
Slide credit: Birgi Tamersoy
![Page 15: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/15.jpg)
Slide credit: Birgi Tamersoy
![Page 16: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/16.jpg)
Frame differences vs. background subtraction
• Toyama et al. 1999
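The contrast between the two approaches can be seen in a minimal numpy sketch (the 8×8 toy frames and the threshold of 25 are illustrative assumptions, not from the slides): frame differencing fires where the object left *and* where it arrived, while subtracting a background model fires only where the object currently is.

```python
import numpy as np

def frame_difference_mask(prev_frame, frame, thresh=25):
    """Foreground mask from consecutive-frame differencing."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > thresh

def background_subtraction_mask(background, frame, thresh=25):
    """Foreground mask against a fixed background model."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > thresh

# Toy example: a static background with one bright moving block.
bg = np.zeros((8, 8), dtype=np.uint8)
f1 = bg.copy(); f1[2:4, 2:4] = 200   # object position at frame t-1
f2 = bg.copy(); f2[2:4, 4:6] = 200   # object has moved at frame t

fd = frame_difference_mask(f1, f2)        # fires at old AND new positions
bs = background_subtraction_mask(bg, f2)  # fires only at the new position
```

This also illustrates the speed/frame-rate dependence noted later: if the object had not moved between f1 and f2, frame differencing would report nothing at all.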
![Page 17: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/17.jpg)
Slide credit: Birgi Tamersoy
![Page 18: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/18.jpg)
Average/Median Image
Alyosha Efros, CMU
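The average/median image of the slide can be computed directly from the image stack; a minimal numpy sketch (toy 4×4 frames are an assumption). The per-pixel temporal median is robust to transient foreground objects, which is why it makes a better background model than the mean.

```python
import numpy as np

def median_background(frames):
    """Estimate a static background as the per-pixel temporal median."""
    stack = np.stack(frames, axis=0)   # shape (T, H, W)
    return np.median(stack, axis=0)

# Toy stack: constant background (50), a bright object visiting briefly.
frames = [np.full((4, 4), 50, dtype=np.uint8) for _ in range(5)]
frames[1][1, 1] = 255   # object passes through (1,1) in one frame only
frames[3][2, 2] = 255

bg = median_background(frames)   # the transient object is voted out
```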
![Page 19: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/19.jpg)
Background Subtraction
current frame − background (median image) = foreground regions
Alyosha Efros, CMU
![Page 20: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/20.jpg)
Pros and cons
Advantages:
• Extremely easy to implement and use!
• All pretty fast.
• The corresponding background models need not be constant; they can change over time.

Disadvantages:
• Accuracy of frame differencing depends on object speed and frame rate
• Median background model: relatively high memory requirements.
• Setting the global threshold Th…
When will this basic approach fail?
Slide credit: Birgi Tamersoy
![Page 21: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/21.jpg)
Background mixture models
• Adaptive Background Mixture Models for Real-Time Tracking, Chris Stauffer & W.E.L. Grimson
Idea: model each background pixel with a mixture of Gaussians; update its parameters over time.
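A heavily simplified single-pixel sketch of that idea (scalar intensity instead of color, and the learning rate, initial variance, and replacement rule are illustrative simplifications of Stauffer & Grimson's scheme; the 2.5σ match test follows the paper):

```python
import numpy as np

def update_pixel_mog(x, mus, sigmas, weights, alpha=0.05, match_k=2.5):
    """One online update of a single pixel's Gaussian mixture.
    A new observation x either matches an existing component (within
    match_k standard deviations) or replaces the weakest one."""
    matched = None
    for i in range(len(mus)):
        if abs(x - mus[i]) < match_k * sigmas[i]:
            matched = i
            break
    weights = [(1 - alpha) * w for w in weights]   # decay all weights
    if matched is None:
        # Replace the lowest-weight component with a new, wide Gaussian.
        j = int(np.argmin(weights))
        mus[j], sigmas[j], weights[j] = x, 30.0, alpha
    else:
        weights[matched] += alpha
        rho = alpha
        mus[matched] = (1 - rho) * mus[matched] + rho * x
        sigmas[matched] = max(2.0, np.sqrt(
            (1 - rho) * sigmas[matched] ** 2
            + rho * (x - mus[matched]) ** 2))
    s = sum(weights)
    weights = [w / s for w in weights]
    return mus, sigmas, weights

# A pixel that keeps observing intensity 100: a component locks onto it
# and its weight grows, so 100 becomes "background" for this pixel.
mus, sigmas, weights = [0.0, 200.0], [10.0, 10.0], [0.5, 0.5]
for _ in range(50):
    mus, sigmas, weights = update_pixel_mog(100.0, mus, sigmas, weights)
```

In the full method this runs at every pixel, and the high-weight, low-variance components are declared background; observations matching no such component are foreground.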
![Page 22: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/22.jpg)
Background subtraction with depth
How can we select foreground pixels based on depth information?
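One simple answer, assuming a depth sensor with a known background depth map: keep pixels that are significantly closer to the camera than the background. A minimal sketch (the millimeter units, the margin value, and the 0-means-invalid convention are assumptions about the sensor, not from the slides):

```python
import numpy as np

def depth_foreground(depth, background_depth, margin=100):
    """Foreground = valid pixels significantly closer to the camera
    than the background depth model (depths in mm; 0 = invalid)."""
    valid = (depth > 0) & (background_depth > 0)
    return valid & (depth < background_depth - margin)

# Toy depth map: a wall at 3000 mm, a person-shaped blob at 1500 mm.
bg_d = np.full((6, 6), 3000, dtype=np.int32)
d = bg_d.copy(); d[1:5, 2:4] = 1500
mask = depth_foreground(d, bg_d)
```

Unlike intensity-based subtraction, this is insensitive to shadows and lighting changes, which alter appearance but not depth.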
![Page 23: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/23.jpg)
Today
• Optical flow wrapup
• Activity in video
  – Background subtraction
  – Recognition of actions based on motion patterns
  – Example applications
![Page 24: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/24.jpg)
Human activity in video
No universal terminology, but approximately:
• “Actions”: atomic motion patterns -- often gesture-like, single clear-cut trajectory, single nameable behavior (e.g., sit, wave arms)
• “Activity”: series or composition of actions (e.g., interactions between people)
• “Event”: combination of activities or actions (e.g., a football game, a traffic accident)
Adapted from Venu Govindaraju
![Page 25: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/25.jpg)
Surveillance
http://users.isr.ist.utl.pt/~etienne/mypubs/Auvinetal06PETS.pdf
![Page 26: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/26.jpg)
2011
Interfaces
![Page 27: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/27.jpg)
W. T. Freeman and C. Weissman, Television control by hand gestures, International Workshop on Automatic Face- and Gesture-Recognition, IEEE Computer Society, Zurich, Switzerland, June 1995, pp. 179–183. MERL-TR94-24
1995
Interfaces
![Page 28: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/28.jpg)
Human activity in video: basic approaches
• Model-based action/activity recognition:
  – Use human body tracking and pose estimation techniques, relate to action descriptions (or learn)
  – Major challenge: accurate tracks in spite of occlusion, ambiguity, low resolution
• Activity as motion, space-time appearance patterns:
  – Describe overall patterns, but no explicit body tracking
  – Typically learn a classifier
  – We’ll look at some specific instances…
![Page 29: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/29.jpg)
Motion and perceptual organization
• Even “impoverished” motion data can evoke a strong percept
![Page 30: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/30.jpg)
![Page 31: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/31.jpg)
Video from Davis & Bobick
![Page 32: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/32.jpg)
Using optical flow: action recognition at a distance
• Features = optical flow within a region of interest
• Classifier = nearest neighbors
[Efros, Berg, Mori, & Malik 2003]
http://graphics.cs.cmu.edu/people/efros/research/action/
The 30-Pixel Man
Challenge: with such low-resolution data, we will not be able to track each limb.
![Page 33: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/33.jpg)
Using optical flow: action recognition at a distance
Correlation-based tracking: extract a person-centered frame window
[Efros, Berg, Mori, & Malik 2003]
http://graphics.cs.cmu.edu/people/efros/research/action/
![Page 34: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/34.jpg)
Using optical flow: action recognition at a distance
Extract optical flow to describe the region’s motion.
[Efros, Berg, Mori, & Malik 2003]
http://graphics.cs.cmu.edu/people/efros/research/action/
![Page 35: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/35.jpg)
Using optical flow: action recognition at a distance
Input Sequence → Matched Frames
Use a nearest neighbor classifier to name the actions occurring in new video frames.
[Efros, Berg, Mori, & Malik 2003]
http://graphics.cs.cmu.edu/people/efros/research/action/
![Page 36: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/36.jpg)
Using optical flow: action recognition at a distance
Input Sequence → Matched NN Frame
Use a nearest neighbor classifier to name the actions occurring in new video frames.
[Efros, Berg, Mori, & Malik 2003]
http://graphics.cs.cmu.edu/people/efros/research/action/
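The nearest-neighbor step can be sketched as follows. This is a minimal simplification with synthetic flow windows and plain Euclidean distance; Efros et al. actually compare blurred, half-wave rectified flow channels with normalized correlation, which is omitted here.

```python
import numpy as np

def flow_descriptor(flow):
    """Flatten a person-centered optical-flow window (H, W, 2) into a
    feature vector (the rectified-channel step of the paper is omitted)."""
    return flow.reshape(-1).astype(np.float64)

def nearest_neighbor_label(query, database, labels):
    """Label a query frame by its nearest neighbor in the labeled database."""
    dists = [np.linalg.norm(query - d) for d in database]
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
# Hypothetical labeled flow windows: 'run' has strong rightward flow,
# 'walk' a weaker one (purely illustrative magnitudes).
run = [flow_descriptor(np.full((8, 8, 2), 3.0) + rng.normal(0, .1, (8, 8, 2)))
       for _ in range(3)]
walk = [flow_descriptor(np.full((8, 8, 2), 1.0) + rng.normal(0, .1, (8, 8, 2)))
        for _ in range(3)]
db, labels = run + walk, ['run'] * 3 + ['walk'] * 3

query = flow_descriptor(np.full((8, 8, 2), 2.8))
label = nearest_neighbor_label(query, db, labels)
```

Because the matched database frame carries both a label and an appearance, the same lookup supports the “do as I do” retargeting shown next: render the matched frame instead of (or as well as) reporting its label.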
![Page 37: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/37.jpg)
Do as I do: motion retargeting
[Efros, Berg, Mori, & Malik 2003]
http://graphics.cs.cmu.edu/people/efros/research/action/
![Page 38: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/38.jpg)
Motivation
• Even “impoverished” motion data can evoke a strong percept
![Page 39: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/39.jpg)
Motion Energy Images
D(x,y,t): Binary image sequence indicating motion locations
Davis & Bobick 1999: The Representation and Recognition of Action Using Temporal Templates
![Page 40: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/40.jpg)
Motion Energy Images
Davis & Bobick 1999: The Representation and Recognition of Action Using Temporal Templates
![Page 41: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/41.jpg)
Motion History Images
Davis & Bobick 1999: The Representation and Recognition of Action Using Temporal Templates
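Both temporal templates can be built in a few lines from the binary motion sequence D(x, y, t); a minimal numpy sketch (the toy 3×3 sequence is illustrative). Following Davis & Bobick, the MHI is set to τ wherever motion occurs and decays linearly elsewhere, so brighter pixels mean more recent motion; the MEI is the MHI thresholded above zero.

```python
import numpy as np

def motion_history_image(D, tau):
    """Temporal templates from a binary motion sequence D[t, y, x].
    MHI: recency-weighted motion (set to tau on motion, decay by 1).
    MEI: union of all motion within the last tau frames."""
    T, H, W = D.shape
    mhi = np.zeros((H, W))
    for t in range(T):
        mhi = np.where(D[t] > 0, float(tau), np.maximum(mhi - 1.0, 0.0))
    mei = (mhi > 0).astype(np.uint8)
    return mhi, mei

# Toy sequence: one pixel moved long ago, one moved in the last frame.
D = np.zeros((5, 3, 3), dtype=np.uint8)
D[0, 0, 0] = 1   # motion at t=0 (old -> dim in the MHI)
D[4, 2, 2] = 1   # motion at t=4 (recent -> bright in the MHI)
mhi, mei = motion_history_image(D, tau=5)
```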
![Page 42: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/42.jpg)
Image moments
Use to summarize shape given an image I(x, y):
  M_pq = Σ_x Σ_y x^p y^q I(x, y)
Central moments, computed about the centroid (x̄, ȳ) = (M10/M00, M01/M00), are translation invariant:
  μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q I(x, y)
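A minimal numpy sketch of both kinds of moment, checking the translation invariance on a toy blob (the blob shapes are illustrative):

```python
import numpy as np

def raw_moment(I, p, q):
    """M_pq = sum over x, y of x^p * y^q * I(x, y)."""
    H, W = I.shape
    y, x = np.mgrid[0:H, 0:W]
    return float((x**p * y**q * I).sum())

def central_moment(I, p, q):
    """mu_pq, taken about the intensity centroid: translation invariant."""
    m00 = raw_moment(I, 0, 0)
    xbar = raw_moment(I, 1, 0) / m00
    ybar = raw_moment(I, 0, 1) / m00
    H, W = I.shape
    y, x = np.mgrid[0:H, 0:W]
    return float(((x - xbar)**p * (y - ybar)**q * I).sum())

# Same 3x4 blob at two positions: raw moments differ, central ones agree.
a = np.zeros((10, 10)); a[2:5, 2:6] = 1.0
b = np.zeros((10, 10)); b[5:8, 3:7] = 1.0   # same shape, shifted
```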
![Page 43: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/43.jpg)
Hu moments
• Set of 7 moments
• Apply to the Motion History Image for a global space-time “shape” descriptor
• Translation and rotation invariant
• See handout
[h1, h2, h3, h4, h5, h6, h7]
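For brevity, here is a sketch of just the first two of the seven invariants, built from the scale-normalized central moments η_pq = μ_pq / μ_00^(1+(p+q)/2); the full set of seven is available in libraries such as OpenCV (cv2.HuMoments). The bar image used to check rotation invariance is illustrative.

```python
import numpy as np

def hu_first_two(I):
    """First two Hu invariants:
        h1 = eta20 + eta02
        h2 = (eta20 - eta02)^2 + 4 * eta11^2
    Translation, scale, and rotation invariant."""
    H, W = I.shape
    y, x = np.mgrid[0:H, 0:W]
    m00 = I.sum()
    xbar, ybar = (x * I).sum() / m00, (y * I).sum() / m00

    def eta(p, q):
        mu = ((x - xbar)**p * (y - ybar)**q * I).sum()
        return mu / m00**(1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2
    return h1, h2

# Rotation invariance check: a horizontal bar and its 90-degree rotation
# yield the same invariants, although eta20 and eta02 swap roles.
bar = np.zeros((12, 12)); bar[5:7, 2:10] = 1.0
rot = np.rot90(bar)
```

Applied to a Motion History Image, this turns the whole space-time motion pattern into a short vector that a nearest neighbor classifier can compare, which is exactly the Pset 5 pipeline on the next slide.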
![Page 44: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/44.jpg)
Pset 5
Nearest neighbor action classification with Motion History Images + Hu moments
Depth map sequence → Motion History Image
![Page 45: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/45.jpg)
Summary
• Background subtraction: – Essential low-level processing tool to segment
moving objects from static camera’s video• Action recognition:
– Increasing attention to actions as motion and appearance patterns
– For instrumented/constrained environments, relatively simple techniques allow effective gesture or action recognition
![Page 46: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/46.jpg)
![Page 47: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/47.jpg)
Hu moments (formulas for h1–h6)
![Page 48: Actions in video](https://reader035.vdocuments.site/reader035/viewer/2022062410/56816641550346895dd9afaf/html5/thumbnails/48.jpg)
(formula for h7)