Stochastic Grammars: Overview
• Representation: Stochastic grammar
  • Terminals: object interactions
  • Context-sensitive due to internal scene models
• Domain: Towers of Hanoi
  • Requires activities with strong temporal constraints
• Contributions
  • Showed recognition & decomposition with very weak appearance models
  • Demonstrated usefulness of feedback from high- to low-level reasoning components
  • Extended SCFG: parameters and abstract scene models
Expectation Grammars (CVPR 2003)
• Analyze video of a person physically solving the Towers of Hanoi task
• Recognize valid activity
• Identify each move
• Segment objects
• Detect distracters / noise
System Overview
Low-Level Vision
• Foreground/background segmentation
• Automatic shadow removal
  • Classification based on chromaticity and brightness differences
• Background model
  • Per-pixel RGB means
  • Fixed mapping from CD and BD to foreground probability
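The classification step above can be sketched as follows. This is a minimal illustration, not the system's implementation: the thresholds and the hard three-way decision are assumptions standing in for the fixed mapping from chromaticity distortion (CD) and brightness distortion (BD) to foreground probability.

```python
# Sketch of per-pixel background/shadow/foreground classification
# against a per-pixel RGB mean background model. Thresholds are
# illustrative assumptions, not the system's actual mapping.

def classify_pixel(pixel, bg_mean, cd_thresh=0.1, bd_low=0.5, bd_high=1.2):
    """Label one pixel as 'background', 'shadow', or 'foreground'.

    pixel, bg_mean: (r, g, b) tuples; bg_mean is the per-pixel RGB
    mean learned from empty-scene frames.
    """
    # Brightness distortion (BD): scale that best fits bg_mean to pixel.
    dot = sum(p * b for p, b in zip(pixel, bg_mean))
    norm = sum(b * b for b in bg_mean) or 1.0
    bd = dot / norm

    # Chromaticity distortion (CD): residual after removing that scale.
    cd = sum((p - bd * b) ** 2 for p, b in zip(pixel, bg_mean)) ** 0.5
    cd /= (norm ** 0.5)

    if cd > cd_thresh:
        return "foreground"      # color changed, not just intensity
    if bd_low <= bd <= bd_high:
        return "background"      # same color, similar brightness
    if bd < bd_low:
        return "shadow"          # same color, darker
    return "foreground"          # same color, much brighter
```

A shadow leaves chromaticity unchanged but lowers brightness, which is why the CD test comes first and the BD test splits shadow from background.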
ToH: Low-Level Vision
[Figure: raw video, background model, foreground and shadow detection, foreground components]
Low-Level Features
• Explanation-based symbols
• Blob interaction events
  • merge, split, enter, exit, tracked, noise
  • Future work: hidden, revealed, blob-part, coalesce
• All possible explanations generated
• Inconsistent explanations heuristically pruned
[Figure: example frames labeled Enter and Merge]
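The generate-then-prune step can be illustrated with a toy consistency check. This is an assumption-laden sketch, not the system's heuristics: here an explanation is a list of the event labels above, and the only pruning rule shown is that the events must account for the observed change in blob count.

```python
# Toy sketch of pruning inconsistent blob-event explanations.
# The real system uses richer heuristics; this keeps only
# explanations whose implied change in blob count matches the
# observed frame-to-frame change.

# How each event changes the number of visible blobs.
DELTA = {"merge": -1, "split": +1, "enter": +1, "exit": -1,
         "tracked": 0, "noise": 0}

def consistent(explanation, blobs_before, blobs_after):
    """True if the events' net blob-count change matches observation."""
    delta = sum(DELTA[e] for e in explanation)
    return blobs_before + delta == blobs_after

def prune(explanations, blobs_before, blobs_after):
    """Keep only the explanations consistent with the blob counts."""
    return [e for e in explanations
            if consistent(e, blobs_before, blobs_after)]
```

For example, if three blobs become two, a single `merge` survives pruning while a `split` or an extra `enter` does not.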
Expectation Grammars
• Representation:
  • Stochastic grammar
  • Parser augmented with parameters and internal scene model
ToH -> Setup, enter(hand), Solve, exit(hand);
Setup -> TowerPlaced, exit(hand);
TowerPlaced -> enter(hand, red, green, blue),
               Put_1(red, green, blue);
Solve -> state(InitialTower), MakeMoves, state(FinalTower);
MakeMoves -> Move(block) [0.1]
           | Move(block), MakeMoves [0.9];
Move -> Move_1-2 | Move_1-3 | Move_2-1 | Move_2-3
      | Move_3-1 | Move_3-2;
Move_1-2 -> Grab_1, Put_2;
Move_1-3 -> Grab_1, Put_3;
Move_2-1 -> Grab_2, Put_1;
Move_2-3 -> Grab_2, Put_3;
Move_3-1 -> Grab_3, Put_1;
Move_3-2 -> Grab_3, Put_2;
Grab_1 -> touch_1, remove_1(hand,~) | touch_1(~), remove_last_1(~);
Grab_2 -> touch_2, remove_2(hand,~) | touch_2(~), remove_last_2(~);
Grab_3 -> touch_3, remove_3(hand,~) | touch_3(~), remove_last_3(~);
Put_1 -> release_1(~) | touch_1, release_1;
Put_2 -> release_2(~) | touch_2, release_2;
Put_3 -> release_3(~) | touch_3, release_3;
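One concrete consequence of the stochastic MakeMoves rules is worth spelling out: with stop probability 0.1 and continue probability 0.9, the number of moves in a derivation is geometrically distributed, so the two rules alone contribute 0.9^(n-1) · 0.1 to a parse with n moves. A small sketch:

```python
# The MakeMoves rules (probabilities 0.1 / 0.9 from the grammar)
# induce a geometric distribution over the number of moves.

def makemoves_prob(n_moves):
    """Probability the MakeMoves rules alone assign to a derivation
    with exactly n_moves moves (n_moves >= 1): continue n-1 times
    with probability 0.9, then stop with probability 0.1."""
    return (0.9 ** (n_moves - 1)) * 0.1

# The minimal 3-disc Towers of Hanoi solution uses 7 moves:
p_minimal = makemoves_prob(7)
```

Longer move sequences are thus increasingly penalized, which biases the parser toward interpretations without spurious moves.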
Forming the Symbol Stream
• Domain-independent blob interactions converted to terminals of the grammar via heuristic domain knowledge
  • Examples: merge + (x ≈ 0.33) → touch_1
              split + (x ≈ 0.50) → remove_2
• Grammar rule can only fire if internal scene model is consistent with terminal
  • Example: can’t remove_2 if no discs on peg 2 (B)
  • Can’t move a disc onto a smaller disc (C)
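The mapping and the scene-model check can be sketched together. This is illustrative only: peg 1 at x ≈ 0.33 and peg 2 at x ≈ 0.50 follow the examples above, while peg 3 at x ≈ 0.67, the tolerance, and the helper names are assumptions.

```python
# Sketch of forming the symbol stream: (blob event, x-position) is
# mapped to a grammar terminal, which then must be consistent with
# the internal scene model. Peg 3's position and the tolerance are
# assumed for illustration.

PEG_X = {1: 0.33, 2: 0.50, 3: 0.67}
EVENT_TO_TERM = {"merge": "touch", "split": "remove"}

def nearest_peg(x, tol=0.08):
    """Peg whose x-position is closest to x, or None if none is near."""
    peg = min(PEG_X, key=lambda p: abs(PEG_X[p] - x))
    return peg if abs(PEG_X[peg] - x) <= tol else None

def to_terminal(event, x):
    """E.g. merge + (x ~ 0.33) -> 'touch_1'; split + (x ~ 0.50) -> 'remove_2'."""
    peg = nearest_peg(x)
    if peg is None or event not in EVENT_TO_TERM:
        return None
    return f"{EVENT_TO_TERM[event]}_{peg}"

def consistent_with_scene(terminal, pegs):
    """Scene-model check: remove_N only fires if peg N is non-empty.
    pegs maps peg number -> list of disc sizes (top of stack last)."""
    kind, peg = terminal.rsplit("_", 1)
    if kind == "remove":
        return len(pegs[int(peg)]) > 0
    return True
```

A terminal that fails the scene-model check is rejected, which is exactly the high-to-low feedback the slides emphasize: the parser's model of the scene vetoes implausible low-level interpretations.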
ToH: Example Frames
[Figure: explicit noise detection; objects recognized by behavior, not appearance]
ToH: Example Frames
[Figure: grammar can fill in for occluded observations; detection of distracter objects]
Finding the Most Likely Parse
• Terminals and rules are probabilistic
• Each parse has a total probability
  • Computed by the Earley-Stolcke algorithm
• Probabilistic penalty for insertion and deletion errors
• Highest-probability parse chosen as best interpretation of the video
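The scoring idea can be sketched in a few lines. This is not the Earley-Stolcke parser itself, just the bookkeeping it implies: a parse's probability is the product of its rule and terminal probabilities, with a penalty factor (the values here are assumptions) multiplied in per insertion or deletion error.

```python
# Sketch of comparing candidate parses by total probability, with
# assumed fixed penalties for insertion and deletion errors. Log
# space avoids underflow for long videos.
import math

INS_PENALTY = 1e-3  # assumed penalty per spurious (inserted) symbol
DEL_PENALTY = 1e-3  # assumed penalty per missing (deleted) symbol

def parse_log_prob(rule_probs, n_insertions=0, n_deletions=0):
    """Log probability of a parse: product of rule/terminal
    probabilities times one penalty factor per error."""
    lp = sum(math.log(p) for p in rule_probs)
    lp += n_insertions * math.log(INS_PENALTY)
    lp += n_deletions * math.log(DEL_PENALTY)
    return lp

def best_parse(candidates):
    """candidates: (name, rule_probs, n_ins, n_del) tuples; returns
    the name of the highest-probability interpretation."""
    return max(candidates,
               key=lambda c: parse_log_prob(c[1], c[2], c[3]))[0]
```

Because each error multiplies in a small factor, a parse that explains the symbol stream with fewer insertions and deletions dominates one that must discard symbols as noise.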
Contributions
• Showed activity recognition and decomposition without appearance models
• Demonstrated usefulness of feedback from high-level, long-term interpretations to low-level, short-term decisions
• Extended SCFG representational power with parameters and abstract scene models
Lessons
• Efficient error recovery is important for realistic domains
• All sources of information should be included (e.g., appearance models)
• Concurrency and partial ordering are common, and thus should be easy to represent
• Temporal constraints are not the only kind of action relationship (e.g., causal, statistical)
Representational Issues
• Extend temporal relations
  • Concurrency
  • Partial ordering
  • Quantitative relationships
• Causal (not just temporal) relationships
• Parameterized activities