using grammars for action recognition · aniket bera. video analysis with cfgs the “inverse...
TRANSCRIPT
Using Grammars for Action Recognition
Aniket Bera
Video analysis with CFGs
The “Inverse Hollywood problem”:
From video to scripts and storyboards via causal analysis.
Brand 1997
Action Recognition using Probabilistic Parsing.Bobick and Ivanov 1998
Recognizing Multitasked Activities from Video using
Stochastic Context-Free Grammar.
Moore and Essa 2001
13
CFG for human activities
enter detach leave enter detach attach touch touch detach attach leave
M. Brand. The "Inverse Hollywood Problem":From video to scripts and storyboards
via causal analysis. AAAI 1997.
14
Parse treeSCENE (Open up a PC)
IN ACTION (Open PC)
OUT IN
ADD ADD
enter detach leave enter
ACTION (unscrew) OUT
MOVE REMOVE
MOTION MOTION
detach attach touch touch detach attach leave
• Deterministic low-level primitive detection• Deterministic parsing
M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
15
Stochastic CFGs
Action Recognition using Probabilistic Parsing.Bobick and Ivanov 1998
16
Gesture analysis with CFGs
Primitive recognition with HMMs
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 17
left-right
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 18
up-down
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 19
right-left
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 20
down-up
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 21
Parse Tree
S
RH
TOP UD BOT DU
LR RL
left-right up-down right-left down-up
22
Errors
Likelihood value over time (not discrete symbols)
HMM a
HMM b
Errors are inevitable…
but the grammar acts as a top-down constraint
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 23
Dealing with uncertainty & errors
Stolcke-Early (probabilistic) parser
SKIP rules to deal with insertion errors
HMM a
HMM b
HMM c
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 24
SCFG for Blackjack
Recognizing Multitasked Activities from Video usingStochastic Context-Free Grammar.
Moore and Essa 2001
• Deals with more complex activities• Deals with more error types
25
Stochastic Grammars: Overview
• Representation: Stochastic grammar• Terminals: object interactions• Context-sensitive due to internal scene models
• Domain: Towers of Hanoi• Requires activities with
strong temporal constraints
• Contributions• Showed recognition &
decomposition with veryweak appearance models
• Demonstrated usefulnessof feedback from high tolow-level reasoning components
Expectation Grammars(CVPR 2003)
• Analyze video of a person physically solving the Towers of Hanoi task
• Recognize valid activity
• Identify each move
• Segment objects
• Detect distracters / noise
System Overview
ToH: Low-Level Vision
Raw VideoBackground
Model
ForegroundComponents
Foreground andshadow detection
Low-Level Features• Explanation-based symbols
• Blob interaction events
• merge, split, enter, exit, tracked, noise
• Future Work: hidden, revealed, blob-part, coalesce
• All possible explanations generated• Inconsistent explanations heuristically pruned
Enter
Merge
Contributions
• Showed activity recognition and decomposition without appearance models
• Demonstrated usefulness of feedback from high-level, long-term interpretations to low-level, short-term decisions