Seeing Action: Part 2 Statistics and/or Structure Aaron Bobick [email protected] School of Interactive Computing College of Computing Georgia Tech

Post on 02-Jan-2016


TRANSCRIPT

Page 1: Seeing Action: Part 2 Statistics and/or Structure

Seeing Action: Part 2

Statistics and/or Structure

Aaron Bobick [email protected]

School of Interactive Computing College of Computing

Georgia Tech

Page 2: Seeing Action: Part 2 Statistics and/or Structure

Context

Continuing from the lower middle???

Three levels of understanding motion or behavior:

Movement – atomic behaviors defined by motion
– "Bending down", "(door) rising up", "Swinging a hammer"

Action – a single, semantically meaningful "event"
– "Opening a door", "Lifting a package"
– Typically short in time
– Might be definable in terms of motion; especially so in a particular context.

Activity – a behavior or collection of actions with a purpose/intention.
– "Delivering packages"
– Typically has causal underpinnings
– Can be thought of as statistically structured events

Maybe Actions are movements in context??

Page 3: Seeing Action: Part 2 Statistics and/or Structure

Structure and Statistics (Old and new)

Grammar-based representation and parsing
– Highly expressive for activity description
– Easy to build higher level activity from reused low level vocabulary.

P-Net (Propagation nets) – really stochastic Petri nets
– Specify the structure – with some annotation can learn detectors and triggering probabilities

Statistics of events
– Low level events are statistically sequenced – too hard to learn full model.
– N-grams or suffix trees

Page 4: Seeing Action: Part 2 Statistics and/or Structure

"Higher-level" Activities: Known structure, uncertain elements

Many activities are comprised of a priori defined sequences of primitive elements.
– Dancing, conducting, pitching, stealing a car from a parking lot.
– The states are not hidden.

The activities can be described by a set of grammar-like rules; often ad hoc approaches taken.

But, the sequences are uncertain:
– Uncertain performance of elements
– Uncertain observation of elements

Page 5: Seeing Action: Part 2 Statistics and/or Structure

The basic idea and approach

Idea: split the problem into:
– Low-level primitives with uncertain feature detection (individual elements might be HMMs)
– High-level description found by parsing input stream of uncertain primitives.

Approach:
– Extend Stochastic Context Free Grammars to handle perceptually relevant uncertainty.

Page 6: Seeing Action: Part 2 Statistics and/or Structure

Stochastic CFGs

Traditional SCFGs have probabilities associated with the production rules. Traditional parsing yields most likely parse given a known set of input symbols.

    PIECE -> BAR PIECE [0.5]
           | BAR [0.5]
    BAR   -> TWO [0.5]
           | THREE [0.5]
    THREE -> down3 right3 up3 [1.0]
    TWO   -> down2 up2 [1.0]

Thanks to Andreas Stolcke’s prior work on parsing SCFGs using an efficient Earley parser.
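To make the rule probabilities concrete, here is a minimal Python sketch that scores the most likely parse of a known symbol string under the PIECE grammar above. It is an assumption-laden illustration: it uses a small memoized span search rather than the Earley parser credited on the slide, and the names `GRAMMAR` and `best_parse_prob` are hypothetical.

```python
from functools import lru_cache

# The PIECE grammar from the slide: nonterminal -> list of (right-hand side, probability).
GRAMMAR = {
    "PIECE": [(("BAR", "PIECE"), 0.5), (("BAR",), 0.5)],
    "BAR": [(("TWO",), 0.5), (("THREE",), 0.5)],
    "THREE": [(("down3", "right3", "up3"), 1.0)],
    "TWO": [(("down2", "up2"), 1.0)],
}


def best_parse_prob(tokens, start="PIECE"):
    """Probability of the most likely parse of `tokens` (0.0 if no parse exists)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def best(symbol, i, j):
        # A terminal matches exactly one token.
        if symbol not in GRAMMAR:
            return 1.0 if j == i + 1 and tokens[i] == symbol else 0.0
        # A nonterminal takes its best-scoring production over tokens[i:j].
        return max((p * seq(rhs, i, j) for rhs, p in GRAMMAR[symbol]), default=0.0)

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Best way to split tokens[i:j] among the symbols of a right-hand side.
        head, rest = rhs[0], rhs[1:]
        if not rest:
            return best(head, i, j) if j > i else 0.0
        # Every symbol must cover at least one token (the grammar has no empty rules).
        return max(
            (best(head, i, k) * seq(rest, k, j) for k in range(i + 1, j - len(rest) + 1)),
            default=0.0,
        )

    return best(start, 0, len(tokens))


probability = best_parse_prob("down2 up2 down3 right3 up3".split())
```

For the string `down2 up2 down3 right3 up3` the only parse uses PIECE -> BAR PIECE, BAR -> TWO, PIECE -> BAR and BAR -> THREE, so the score is 0.5 × 0.5 × 0.5 × 0.5 = 0.0625.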

Page 7: Seeing Action: Part 2 Statistics and/or Structure

Extending SCFGs (Ivanov and Bobick, PAMI)

Within the parser we handle:
– Uncertainty about input symbols
    Input is multi-valued string (vector of likelihoods)
– Deletion, substitution, and insertion errors
    Introduce error rules
– Individually recognized primitives typically temporally inconsistent
    Introduce penalty for overlap.
    Spatial and temporal consistency enforced.

Need to define when a symbol has been generated.

How do we learn production probabilities? (Not many examples.) Make sure not too sensitive to them.
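One way to picture the overlap penalty is the Python sketch below, which scores a sequence of detected primitives by multiplying their likelihoods and discounting any temporal overlap between neighbors. The exponential penalty, the `overlap_penalty` parameter and the `(symbol, start, end, likelihood)` tuple layout are assumptions for illustration; the slide does not give the actual penalty used by the parser.

```python
import math


def score_sequence(detections, overlap_penalty=1.0):
    """Score a list of (symbol, start, end, likelihood) detections.

    Likelihoods are multiplied; each pair of consecutive detections that
    overlaps in time is discounted by exp(-overlap_penalty * overlap).
    """
    score = 1.0
    for index, (_symbol, start, _end, likelihood) in enumerate(detections):
        score *= likelihood
        if index > 0:
            previous_end = detections[index - 1][2]
            overlap = max(0.0, previous_end - start)
            score *= math.exp(-overlap_penalty * overlap)
    return score


consistent = [("down2", 0.0, 1.0, 0.5), ("up2", 1.0, 2.0, 0.5)]
overlapping = [("down2", 0.0, 1.5, 0.5), ("up2", 1.0, 2.0, 0.5)]
```

With these inputs the temporally consistent pair keeps its full score of 0.25, while the overlapping pair is scored lower, so a parser maximizing this score would prefer the consistent interpretation.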

Page 8: Seeing Action: Part 2 Statistics and/or Structure

Enforcing temporal consistency

[Figure: output of one HMM parsing backwards. Horizontal axis: Time. Vertical axis: P(primitive). Marked point: output event.]

Page 9: Seeing Action: Part 2 Statistics and/or Structure

Video Sample

Page 10: Seeing Action: Part 2 Statistics and/or Structure

Event Grammar and Parsing

Tracker generates events: ENTER, LOST, FOUND, EXIT, STOP. Tracks have properties (e.g. size) and trajectories.

Tracker assigns class to each event, though only probabilistically.

Parser parses single stream that contains interleaved events: (CAR-ENTER, CAR-STOP, PERSON-FOUND, CAR-EXIT, PERSON-EXIT)

Parser enforces spatial and temporal consistency for each object class and interactions (e.g. to be a PICK-UP, the PERSON-FOUND event must be close to CAR-STOP)

Spatial and temporal consistency eliminates symbolic ambiguity.
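The PICK-UP constraint above can be sketched as a simple predicate on two events. This is only an illustration: the `Event` fields, the `consistent` helper and both thresholds are hypothetical, since the slides do not say how "close" is measured.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    label: str
    time: float  # seconds
    x: float     # position in the ground plane (assumed units: meters)
    y: float


def consistent(first, second, max_distance=5.0, max_delay=10.0):
    """True if `second` follows `first` soon enough and close enough in space."""
    distance = math.hypot(first.x - second.x, first.y - second.y)
    delay = second.time - first.time
    return distance <= max_distance and 0.0 <= delay <= max_delay


car_stop = Event("CAR-STOP", 10.0, 0.0, 0.0)
person_near = Event("PERSON-FOUND", 12.0, 3.0, 4.0)
person_far = Event("PERSON-FOUND", 12.0, 30.0, 40.0)
```

Here `consistent(car_stop, person_near)` holds and could support a PICK-UP parse, while `person_far` fails the spatial test and would be left to another interpretation.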

Page 11: Seeing Action: Part 2 Statistics and/or Structure

Advantages of SCFGs

What grammar can do (simplified):

    CAR_PASS   -> CAR_ENTER CAR_EXIT | CAR_ENTER CAR_HIDDEN CAR_EXIT
    CAR_HIDDEN -> CAR_LOST CAR_FOUND | CAR_LOST CAR_FOUND CAR_HIDDEN

Skip allows concurrency (and junk):

    PERSON_LOST -> person_lost | SKIP person_lost

Concurrent parse:

    Events: ce pe cl cf cs px pl cx
    PICKUP -> ce pe cl cf cs px pl cx
    P_PASS -> ce pe cl cf cs px pl cx
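As a rough illustration of the CAR_PASS rules and of what SKIP buys, the sketch below checks an interleaved event stream in Python. It is a simplification under stated assumptions: probabilities are ignored, SKIP is modeled by discarding every non-car event, and the lowercase event names and `is_car_pass` are hypothetical.

```python
import re

# CAR_PASS expands to: car_enter, zero or more (car_lost car_found) pairs, car_exit.
CAR_PASS_PATTERN = re.compile(r"car_enter( car_lost car_found)* car_exit")


def is_car_pass(events):
    """Check whether the car events in an interleaved stream form a CAR_PASS."""
    # Modeling SKIP: events of other object classes are ignored by this rule.
    car_events = [event for event in events if event.startswith("car_")]
    return CAR_PASS_PATTERN.fullmatch(" ".join(car_events)) is not None


stream = ["car_enter", "person_enter", "car_lost", "car_found", "person_exit", "car_exit"]
```

The interleaved `stream` is accepted because its car events read enter, lost, found, exit; a stream where the car is lost and never found again is rejected.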

Page 12: Seeing Action: Part 2 Statistics and/or Structure

Parsing System

Page 13: Seeing Action: Part 2 Statistics and/or Structure

Parse 1: Person-pass-through

Page 14: Seeing Action: Part 2 Statistics and/or Structure

Parse 2: Drive-in

Page 15: Seeing Action: Part 2 Statistics and/or Structure

Parse 3: Car-pass-through

Page 16: Seeing Action: Part 2 Statistics and/or Structure

Parse 4: Drop-off

Page 17: Seeing Action: Part 2 Statistics and/or Structure

Advantages of STCFG approach

Structure and components of activities defined a priori and are the right levels of annotation to recover (compare to HMMs).

FSM vs CFG is not the point. Rather explicit representation of structural elements and uncertainties.

Often many (enough) examples of each primitive to support training, but not of higher level activity.

Allows for integration of heterogeneous primitive detectors; only assumes likelihood generation.

More robust than ad-hoc rule based techniques: handles errors through probability.

No notion of causality, or anything other than (multi-stream) sequencing.


Page 19: Seeing Action: Part 2 Statistics and/or Structure

Some Q's about Representations…

Scope and Range:
– thoughts???

"Language" of the representation
– Grammar of explicit symbols

Computability of an instance:
– Quite easy. Given the input string the parsing is both the computation and the matching

Learnability of the "class":
– Inside-outside algorithm for learning CFGs but let's be serious…

Stability in face of perceptual uncertainty
– Explicitly designed to handle this uncertainty

Inference-support
– Depends on what you mean by inference. No notion of real semantics or explicit time.

Page 20: Seeing Action: Part 2 Statistics and/or Structure

P-Nets (Propagation Networks) (Shi and Bobick, ’04 and ’06)

Nodes represent activation intervals

– Active vs. inactive: Token propagation

More than one node can be active at a time!

Links represent partial order as well as logical constraints

Duration model on each link and node:
– Explicit model on length of activation
– Explicit model on length between successive intervals

Observation model on each node

Page 21: Seeing Action: Part 2 Statistics and/or Structure

Conceptual Schema

Logical relation
– Autonomous assumption: logic constraints only exist at start/end points of any intervals
– Conditional probability function can represent any logical function

Examples of logic constraint

Page 22: Seeing Action: Part 2 Statistics and/or Structure

Propagation Net – Computing

Computational Schema

A DBN-style rollout to compute the corresponding conceptual schema

Page 23: Seeing Action: Part 2 Statistics and/or Structure

Experiment: Glucose Project

Task: monitor a user calibrating a glucose meter and point out operating errors as feedback.

Constructed 16-node P-Net as representation

3 subjects with a total of 21 perfect sequences, 10 missing_1_step sequences and 10 missing_6_steps sequences

Page 24: Seeing Action: Part 2 Statistics and/or Structure

D-Condensation

    Initiate 1 particle at dummy starting node
    Repeat
        For each particle
            generate all possible consequent states
            calculate the probability for each state
        End
        Select n particles to survive
    Until the final time step is reached
    Output the path represented by the particle with highest probability
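The loop above can be sketched in Python as a deterministic beam search. This is a minimal sketch under assumptions: survivors are chosen as the `n` most probable candidates (the slide only says "select n particles"), and `successors` is a hypothetical callback standing in for the P-Net's state expansion and probability model.

```python
def d_condensation(start, successors, num_steps, n):
    """Deterministic particle propagation (a sketch of the loop on the slide).

    `successors(state, t)` yields (next_state, probability) pairs.
    Returns (probability, path) for the best surviving particle.
    """
    particles = [(1.0, [start])]  # one particle at the dummy starting node
    for t in range(num_steps):
        candidates = []
        for probability, path in particles:
            # Generate every consequent state and its probability.
            for state, p in successors(path[-1], t):
                candidates.append((probability * p, path + [state]))
        if not candidates:
            break
        # Assumed selection rule: keep the n most probable particles.
        candidates.sort(key=lambda candidate: candidate[0], reverse=True)
        particles = candidates[:n]
    return max(particles, key=lambda particle: particle[0])


# Toy transition table (hypothetical), independent of the time step.
TRANSITIONS = {
    "start": [("A", 0.75), ("B", 0.25)],
    "A": [("A", 0.25), ("B", 0.75)],
    "B": [("B", 1.0)],
}

best_probability, best_path = d_condensation(
    "start", lambda state, t: TRANSITIONS[state], num_steps=2, n=2
)
```

On the toy table the best two-step path is start, A, B with probability 0.75 × 0.75 = 0.5625. Unlike random sampling, every successor is enumerated before pruning, so the result is repeatable for a given `n`.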

Page 25: Seeing Action: Part 2 Statistics and/or Structure

Experiment: Glucose Meter Calibration

Page 26: Seeing Action: Part 2 Statistics and/or Structure

Experiment: Classification Performance

Page 27: Seeing Action: Part 2 Statistics and/or Structure

Experiment: Label individual frames

Labeling individual nodes Labels on Node J: Insert

Page 28: Seeing Action: Part 2 Statistics and/or Structure

And now some statistics…

Problem: the higher level world of activity is not usually a P-Net or an FSM or an HMM or …

Two possible solutions:
1. Understand what's really going on… …another time.
2. Lose the structure

Page 29: Seeing Action: Part 2 Statistics and/or Structure

Stochastic sequencing (Hamid and Bobick)

A priori define some low-level "actions"/events that can be stochastically detected in context – e.g. Door opening

Collect training data (streams of events) of activities – making a delivery, UPS pick-up, trash collection

Collect histograms of N-tuples and do both activity discovery and recognition

–Later can focus on anomalies

Advantages: cheat where easy, learn the hard stuff, exploit the context
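Collecting histograms of N-tuples amounts to counting every run of n consecutive events in a stream. A minimal Python sketch follows; the event names and the helper `ngram_histogram` are hypothetical, chosen only to resemble a loading-dock stream.

```python
from collections import Counter


def ngram_histogram(events, n=3):
    """Count every run of n consecutive events in the stream."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


delivery = [
    "truck_enters",
    "door_opens",
    "package_unloaded",
    "door_opens",
    "package_unloaded",
]
bigrams = ngram_histogram(delivery, n=2)
```

In this stream the bigram (`door_opens`, `package_unloaded`) occurs twice and every other bigram once; a stream shorter than `n` simply yields an empty histogram.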

Page 30: Seeing Action: Part 2 Statistics and/or Structure

Barnes & Noble Loading Dock

Page 31: Seeing Action: Part 2 Statistics and/or Structure

Barnes & Noble Loading Dock

Page 32: Seeing Action: Part 2 Statistics and/or Structure

Barnes & Noble Loading Dock

Page 33: Seeing Action: Part 2 Statistics and/or Structure

Barnes & Noble Loading Dock

Page 34: Seeing Action: Part 2 Statistics and/or Structure

Two levels in the representation

Low Level: Events (computer vision problem)
– Background subtraction and foreground extraction (better “modeling”)
– Classifying (per frame) each foreground object as either
    • Person
    • Vehicle (what type if possible)
    • Package
    • Tool used to move packages
    • Miscellaneous object
– Tracking people, vehicles, packages, tools, and miscellaneous objects over multiple frames

Page 35: Seeing Action: Part 2 Statistics and/or Structure

Two levels in the representation

Higher Level: Statistical characterization of subsequences
– Instances of same activity class have certain common subsequences.
– But partial ordering will typically rearrange subsequences within the sequence.
– Find a “soft” characterization of the statistics of the subsequences
– Define similarity measure for such characterization.
    • Allows discovery of activity classes
    • Allows for detection of anomalous examples

Caveats:
– We provide the events – whether it’s manually or specifying the detector doesn’t really matter (except for publication)
– Training needs pre-segmentation

Page 36: Seeing Action: Part 2 Statistics and/or Structure

Stochastic sequencing: n-grams

Page 37: Seeing Action: Part 2 Statistics and/or Structure

Experimental Setup – Loading Dock

Barnes & Noble Loading Dock Area

One month's worth of data:
– 5 days a week
– 9 a.m. till 5 p.m.

Event Vocabulary – 61 events
– Hand-labeled for testing activity labeling, noise sensitivity.
– Training detectors for these events

Bird’s Eye View of Experimental Setup

Page 38: Seeing Action: Part 2 Statistics and/or Structure

B&N Processing Video

Page 39: Seeing Action: Part 2 Statistics and/or Structure

Activity-Class Discovery

Treating activities as individual instances

Activity-class discovery – finding maximal cliques in edge weighted graphs

Need to come up with:
– Activity Similarity Metric
– Procedure to group similar activities

Page 40: Seeing Action: Part 2 Statistics and/or Structure

Activity Similarity

Two types of differences
– structural differences
– frequency differences

Sim(A,B) = 1 – normalized difference between the counts of non-zero event n-grams

Properties
– additive identity
– is commutative
– does not follow the triangle inequality
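One plausible reading of the similarity formula above is sketched below in Python: for every n-gram seen in either activity, take the absolute count difference divided by the count sum, then normalize by the number of non-zero n-grams in the two histograms. The exact normalization is an assumption (the slide only says "normalized difference"), and the function name `similarity` is hypothetical.

```python
from collections import Counter


def similarity(hist_a, hist_b):
    """1 minus the normalized difference between non-zero n-gram counts."""
    nonzero_a = {ngram for ngram, count in hist_a.items() if count > 0}
    nonzero_b = {ngram for ngram, count in hist_b.items() if count > 0}
    support = len(nonzero_a) + len(nonzero_b)  # assumed normalizer
    if support == 0:
        return 1.0  # two empty activities are treated as identical
    difference = 0.0
    for ngram in nonzero_a | nonzero_b:
        a, b = hist_a.get(ngram, 0), hist_b.get(ngram, 0)
        # Structural difference contributes 1; frequency difference less than 1.
        difference += abs(a - b) / (a + b)
    return 1.0 - difference / support


same = similarity(Counter({("x", "y"): 2}), Counter({("x", "y"): 2}))
disjoint = similarity(Counter({("x", "y"): 2}), Counter({("y", "z"): 1}))
partial = similarity(Counter({("x", "y"): 3}), Counter({("x", "y"): 1}))
```

Under this reading identical histograms score 1, histograms with no n-gram in common score 0, and the measure is symmetric in its arguments. The `partial` case, 3 versus 1 occurrences of one shared n-gram, gives 1 − (2/4)/2 = 0.75.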

Page 41: Seeing Action: Part 2 Statistics and/or Structure

Activity-Class Discovery

A graph-theoretic problem of finding maximal cliques in edge-weighted graphs [Pavan, Pelillo ‘03]

Sequentially find maximal cliques in edge weighted graph of activities

Activities different enough from all the regular activities are anomalies
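Dominant sets in the sense of Pavan and Pelillo are commonly extracted with replicator dynamics, and the Python sketch below shows one extraction step on a small similarity matrix. The slides do not say which solver was used, so treat the iteration count, the `cutoff` and the function name `dominant_set` as assumptions. Peeling off the returned nodes and repeating on the remainder would give the sequential discovery described above.

```python
def dominant_set(similarity, iterations=200, cutoff=1e-6):
    """Indices of one dominant set of a symmetric, zero-diagonal similarity matrix."""
    n = len(similarity)
    x = [1.0 / n] * n  # start from the barycenter of the simplex
    for _ in range(iterations):
        # Replicator update: x_i <- x_i * (A x)_i / (x^T A x)
        ax = [sum(similarity[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * ax[i] for i in range(n))
        if total == 0.0:
            break
        x = [x[i] * ax[i] / total for i in range(n)]
    # Nodes that keep non-negligible weight form the cluster.
    return [i for i in range(n) if x[i] > cutoff]


# Three mutually similar activities and one that resembles none of them.
SIMILARITY = [
    [0.0, 0.9, 0.9, 0.1],
    [0.9, 0.0, 0.9, 0.1],
    [0.9, 0.9, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
cluster = dominant_set(SIMILARITY)
```

On this matrix the weight of the fourth activity shrinks at every iteration, leaving activities 0, 1 and 2 as the discovered class and the fourth as a candidate anomaly.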

Page 42: Seeing Action: Part 2 Statistics and/or Structure

Activity-Class Discovery – Dominant Sets

Page 43: Seeing Action: Part 2 Statistics and/or Structure

Anomaly Detection

Compute the within-Class similarity of the test activity w.r.t. previous class members

Learn the detection threshold from training data – can be done using an R.O.C.
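A minimal Python sketch of learning the threshold from labeled training scores follows. It sweeps candidate thresholds along the ROC and keeps the one that maximizes true-positive rate minus false-positive rate; that selection criterion is an assumption, since the slide only says an R.O.C. can be used, and `pick_threshold` is a hypothetical name.

```python
def pick_threshold(regular_scores, anomalous_scores):
    """Choose a within-class similarity threshold; scores below it are flagged.

    Assumed criterion: maximize (true-positive rate - false-positive rate).
    """
    best_threshold, best_margin = None, -1.0
    for threshold in sorted(set(regular_scores) | set(anomalous_scores)):
        # True positives: anomalies correctly flagged (similarity below threshold).
        tpr = sum(s < threshold for s in anomalous_scores) / len(anomalous_scores)
        # False positives: regular activities wrongly flagged.
        fpr = sum(s < threshold for s in regular_scores) / len(regular_scores)
        if tpr - fpr > best_margin:
            best_threshold, best_margin = threshold, tpr - fpr
    return best_threshold


threshold = pick_threshold([0.8, 0.9, 0.7], [0.2, 0.3, 0.4])
```

With these toy scores the two groups separate cleanly, and the chosen threshold is the lowest regular score, 0.7, which flags all three anomalies and no regular activity.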

Page 44: Seeing Action: Part 2 Statistics and/or Structure

Anomaly "Explanation"

Explanatory features – their frequency has high mean and low variance

Explanation based on features that were:

– Missing from an anomaly but were frequently and consistently present in regular members

– Extraneous in an anomaly but consistently absent from the regular members
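The two kinds of explanation can be sketched in Python as a comparison between an anomaly's n-gram counts and those of the regular class members. The thresholds `min_mean` and `max_variance`, the feature names and the function `explain` are hypothetical; the slide states only the qualitative criteria.

```python
from statistics import mean, pvariance


def explain(anomaly, regular_members, min_mean=1.0, max_variance=0.5):
    """Return (missing, extraneous) features for an anomalous activity.

    Each activity is a dict mapping a feature (for example an n-gram) to its count.
    """
    features = set(anomaly) | {f for member in regular_members for f in member}
    missing, extraneous = [], []
    for feature in sorted(features):
        counts = [member.get(feature, 0) for member in regular_members]
        present = anomaly.get(feature, 0) > 0
        # Frequently and consistently present in regular members, absent here.
        if not present and mean(counts) >= min_mean and pvariance(counts) <= max_variance:
            missing.append(feature)
        # Present here, but never seen in the regular members.
        elif present and all(count == 0 for count in counts):
            extraneous.append(feature)
    return missing, extraneous


regular = [{"door_closed": 1, "unload": 2}, {"door_closed": 1, "unload": 3}]
anomaly = {"unload": 2, "extra_person": 1}
missing, extraneous = explain(anomaly, regular)
```

For this toy data the explanation is that `door_closed` is missing and `extra_person` is extraneous, while `unload` is unremarkable because it appears in both the anomaly and the regular members.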

Page 45: Seeing Action: Part 2 Statistics and/or Structure

Results

General Characteristics of Discovered Activity Classes
– UPS delivery vehicles
– FedEx delivery vehicles
– Delivery trucks – multiple packages delivered
– Cars and vans, only 1 or 2 packages delivered
– Motorized cart used to pick and drop packages
– Van deliveries – no use of motorized cart
– Delivery trucks – multiple people

A few of the detected anomalies
(a) Back door of delivery vehicle not closed
(b) More than usual number of people involved in unloading
(c) Very few vocabulary events performed

Page 46: Seeing Action: Part 2 Statistics and/or Structure

Results

Are the detected anomalous activities ‘interesting’ from human view-point?

Anecdotal Validation:
– Studied 7 users
– Showed each user 8 regular activities selected randomly
– Showed each user 10 test activities, 5 regular and 5 detected anomalous activities
– 8 out of 10 activity-labels of the users matched the labels of our system
– Probability of this match happening by chance is 4.4%
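The 4.4% figure can be reproduced with a binomial calculation, sketched below in Python. The null model is an assumption not stated on the slide: each of the 10 labels matches independently with probability 1/2, and "8 out of 10" is read as exactly 8 matches.

```python
from math import comb


def prob_matches(k, n, p=0.5):
    """Binomial probability of exactly k matches in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


exactly_8 = prob_matches(8, 10)
at_least_8 = sum(prob_matches(k, 10) for k in range(8, 11))
```

Exactly 8 matches has probability 45/1024, about 4.4%, which agrees with the slide. Under the same assumption, 8 or more matches has probability 56/1024, about 5.5%, so the quoted figure appears to be the point probability rather than the tail.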

Page 47: Seeing Action: Part 2 Statistics and/or Structure

Some Q's about Representations… (more discussion)

Scope and Range:
– A monitored scene with pre-designed detectors

"Language" of the representation
– Histograms and other statistics of feature n-gram occurrences

Computability of an instance:
– Given detectors, easy to compute

Learnability of the "class":
– Full power of statistical learning. Even allowing notion of outlier detector.

Stability in face of perceptual uncertainty
– Fair. Needs to be better.

Inference-support
– Distance-in-feature space reasoning only.