Petri-Nets for Video Event Understanding

Gal Lavee and Ehud Rivlin

Technion – Israel Institute of Technology

Motivation

• Huge volume of ‘interesting’ video data to process

• Growing visual surveillance demands

– Control of security-sensitive areas such as stores, airports, parking lots, banks and other public places

• Video database analysis demands

– Offline surveillance data analysis (e.g. abnormal behaviour detection, specific event querying)

• Well-understood supporting work

– Feature extraction

– Object detection

– Object tracking

– Object recognition

– “Low-level” event recognition

– Event domain knowledge specification using ontology languages

It's an Important Problem!!

Video Events

Activities - Bobick

Composite Events - Bremond

Known Formalisms

• FSMs (Sequence)

• Bayesian Networks (Uncertainty, Factorize State Space)

• HMMs (Sequence, Uncertainty)

• CRFs (Sequence, Uncertainty, Relax Independence Assumptions)

• Grammars (Hierarchy, Partial Ordering)

• What About?

– Long-Term Temporal Dependencies?

– Concurrency?

– Temporal, spatial, logical relations?

– “Incomplete” Events?

What About Semantics?

Petri Net Solution for Event Understanding

• Petri-Net formalism (defined shortly)

• Represents the dynamic evolution of the video sequence

• Encodes semantic knowledge of the domain

• Formalism naturally captures inherent properties of video events

– Logical, Temporal and Spatial composition

– Concurrency and Partial Ordering

– Long-term temporal dependence

• Recursive Reuse of Fragments (Hierarchy)

• Recognizes events and…..

• Snapshot of the state of the system at any time before or after an event has occurred

• How far away is event of interest?

Petri Nets - Defined

• A graphical tool for formal description of system dynamics.

• What graph components does a Petri net comprise?

– Place nodes - describe possible local system states.

– Transition nodes - describe events that may modify the state.

– Arcs - specify the relation between system states and events.

– Tokens - markers that reside in place nodes and are used to specify the PN state

Shared Resource Example:

(Figure: Petri net with places Puser_active, Prequesting, Presource_idle, Presource_busy, Puser_access and transitions Tuser_request, Tstart, Tend.)

Petri Nets ctd.

• The instantaneous location of the tokens (called the marking) defines the current state of the model

• Enabling Rule

– A transition is enabled if all of its input places have tokens.

– Additional requirements may be added (conditional transitions)

• When transition fires:

– Tokens from input places deleted

– Tokens placed in output places
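
The enabling and firing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `PetriNet` is hypothetical, and the arcs of the shared resource net are an assumption reconstructed from the place and transition names, since the figure itself is not available in the text.

```python
from collections import Counter


class PetriNet:
    """Minimal place/transition net. A marking maps place name -> token count."""

    def __init__(self, transitions):
        # transitions: name -> (list of input places, list of output places)
        self.transitions = transitions

    def enabled(self, marking, name):
        """A transition is enabled if every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        need = Counter(inputs)
        return all(marking.get(place, 0) >= n for place, n in need.items())

    def fire(self, marking, name):
        """Delete tokens from the input places and put tokens in the output places."""
        if not self.enabled(marking, name):
            raise ValueError(f"{name} is not enabled")
        inputs, outputs = self.transitions[name]
        new = Counter(marking)
        new.subtract(inputs)
        new.update(outputs)
        return {place: n for place, n in new.items() if n > 0}


# Assumed arc structure for the shared resource example (not given in the text).
net = PetriNet({
    "Tuser_request": (["Puser_active"], ["Prequesting"]),
    "Tstart": (["Prequesting", "Presource_idle"], ["Puser_access", "Presource_busy"]),
    "Tend": (["Puser_access", "Presource_busy"], ["Puser_active", "Presource_idle"]),
})

m0 = {"Puser_active": 1, "Presource_idle": 1}
m1 = net.fire(m0, "Tuser_request")
```

Conditional transitions could be added by passing an extra predicate to `enabled`; that is left out here to keep the sketch short.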

Nothing to do with Petri Dish

Petri Net Properties

• Petri Net formalism is useful for modeling

• Logical Relations



• Temporal Relations



• Concurrency and Partial Ordering

Reachability set and Reachability graph

• Given a PN model and an initial marking

– Compute the reachability set (the set of reachable markings)

– The reachability graph visualizes the path taken to each particular state

Reachability Set:

Marking           M0   M1   M2
Puser_active      1    0    0
Prequesting       0    1    0
Puser_access      0    0    1
Presource_busy    0    0    1
Presource_idle    1    1    0

Reachability Graph:

(Figure: graph over the markings M0, M1, M2.)
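
As a rough sketch of how the reachability set and graph can be computed, the following breadth-first search enumerates every marking reachable from an initial marking. The arc structure of the shared resource net is an assumption (the figure is not part of the text), the function names are hypothetical, and the search only terminates for bounded nets with single-weight arcs.

```python
from collections import deque

PLACES = ["Puser_active", "Prequesting", "Puser_access", "Presource_busy", "Presource_idle"]

# Assumed arcs: transition name -> (input places, output places).
TRANSITIONS = {
    "Tuser_request": (["Puser_active"], ["Prequesting"]),
    "Tstart": (["Prequesting", "Presource_idle"], ["Puser_access", "Presource_busy"]),
    "Tend": (["Puser_access", "Presource_busy"], ["Puser_active", "Presource_idle"]),
}


def successors(marking):
    """Yield (transition, next_marking) for every transition enabled in marking."""
    tokens = dict(zip(PLACES, marking))
    for name, (inputs, outputs) in TRANSITIONS.items():
        if all(tokens[p] >= 1 for p in inputs):
            nxt = dict(tokens)
            for p in inputs:
                nxt[p] -= 1
            for p in outputs:
                nxt[p] += 1
            yield name, tuple(nxt[p] for p in PLACES)


def reachability_graph(initial):
    """Breadth-first exploration; returns (set of markings, list of labeled edges)."""
    seen = {initial}
    edges = []
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for name, nxt in successors(marking):
            edges.append((marking, name, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, edges


M0 = (1, 0, 0, 0, 1)  # token counts in the order given by PLACES
markings, edges = reachability_graph(M0)
```

Under the assumed arcs, the search finds three markings connected in a cycle, one edge per transition.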

Petri Net Extensions

• Priority Transitions

– Each transition is associated with a natural number

– Allows resolution of conflicts (e.g. multiple transitions which share an input)

• Timed Transitions

– Extends priority concept

– Associates a real interval of time with each transition

– This time must elapse between enabling and firing

– Models real-world phenomena

Petri Net Extensions

• Stochastic Petri Nets

– Stochastic timed transition delays (discrete-state stochastic process)

– Exponential distribution for transition delays

• Negative exponential probability density function (PDF)

• Generalized Stochastic Petri Net is a PN where:

– Immediate transitions coexist with stochastic timed transitions

– Delay PDF of timed transition n, with mean delay \mu_n: D_n(t) = \frac{1}{\mu_n} e^{-t/\mu_n}
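
To make the stochastic timing concrete, here is a small sketch of a "race" between enabled timed transitions, each with an exponentially distributed delay. The transition names and mean delays are hypothetical. The sketch relies on a standard property of exponential delays: the transition with rate r_i wins the race with probability r_i divided by the sum of all rates.

```python
import random


def sample_firing(mean_delays, rng):
    """Sample a delay for each enabled timed transition; the smallest delay fires first."""
    # random.expovariate takes the rate, i.e. 1 / mean delay.
    delays = {name: rng.expovariate(1.0 / mean) for name, mean in mean_delays.items()}
    winner = min(delays, key=delays.get)
    return winner, delays[winner]


rng = random.Random(0)
mean_delays = {"T_fast": 1.0, "T_slow": 3.0}  # hypothetical mean delays

trials = 20000
wins = sum(sample_firing(mean_delays, rng)[0] == "T_fast" for _ in range(trials))
fraction_fast = wins / trials  # expected to be close to 1.0 / (1.0 + 1/3.0) = 0.75
```

Immediate transitions of a GSPN would simply bypass the sampling step and fire before any timed transition.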

PN Video Event Models

• PNs in the literature have not been built in an agreed upon fashion

• However two distinct classes of building PN event models have emerged

– Plan Petri-Nets

– Object Petri-Nets

PN Video Event Models

                  Object PN                             Plan PN
Tokens            Objects                               Plan progress
Places            Object states                         Plan states
Transitions       Object state change (events)          Plan advancement
Enabling rules    Conditioned on object properties      Conditioned on scene properties
Event             Firing of transition                  End of plan
                  (not all events are interesting)      (firing of “sink” transitions)
                  One PN for multiple objects/events    One PN per event

Plan PNs

• Plan Petri-Nets

– Castel (1996)

– Natural extension of PN plans in other domains

– In this work a number of “plans” are kept track of

– At each observation (knowledge received from the “numerical layer”) the plans in progress are checked to see if they are consistent with the observation.

Parking Lot Example

Plan Prototypes:

– Vehicle Departure

– Vehicle Arrival

– Arsonist Action

– Pedestrian Movement

Explain Observation Using Plans

Observation: Cars Parked

– Consistent with Vehicle Departure: new plans created

– Not consistent with Vehicle Arrival, Pedestrian Movement, Arsonist Action

Current Plans:

1. Vehicle Departure

Observation: Pedestrian Appears

– Consistent with Vehicle Departure, Pedestrian Movement, Arsonist Action: new plans created

– Not consistent with Vehicle Arrival

Current Plans:

1. Vehicle Departure

2. Pedestrian Movement

3. Arsonist Action

Observation: Pedestrian Disappears

– Consistent with existing plans Vehicle Departure, Pedestrian Movement: plans maintained

– Pedestrian Movement reaches end of plan: the Pedestrian Movement event can be said to have occurred

– Not consistent with Arsonist Action: plan rejected

Current Plans:

1. Vehicle Departure

2. Pedestrian Movement (Terminated)

Observation: Vehicle Starts Moving

– Consistent with existing plan Vehicle Departure

Current Plans:

1. Vehicle Departure

2. Pedestrian Movement (Terminated)

Parking Lot Example

• After the observation sequence…

• The only consistent event is vehicle departure, even though it hasn’t concluded

• Using this system we can make assertions such as a certain event is possible/ not possible before having seen the complete event.

• Joined with information on the duration of the various sub-events we can conjecture on when a possible event might occur in terms of an offset from the current state

• Feedback loop with observation is possible
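
The plan-tracking idea in the parking lot example can be sketched as follows. This is a strong simplification and not Castel's method: each plan prototype is reduced to a fixed sequence of observations instead of a full Petri net, and all observation labels and step sequences below are hypothetical, chosen only so that the walk-through reproduces the outcome described in the example.

```python
# Hypothetical prototypes: each plan is simplified to an ordered list of observations.
PROTOTYPES = {
    "Vehicle Departure": ["cars_parked", "pedestrian_appears", "pedestrian_disappears",
                          "vehicle_starts_moving", "vehicle_leaves_lot"],
    "Vehicle Arrival": ["vehicle_enters_lot", "vehicle_parks",
                        "pedestrian_appears", "pedestrian_disappears"],
    "Pedestrian Movement": ["pedestrian_appears", "pedestrian_disappears"],
    "Arsonist Action": ["pedestrian_appears", "pedestrian_lingers_at_car", "fire_detected"],
}


def observe(active, completed, observation):
    """Update the plans in progress for one observation.

    active maps plan name -> index of the next expected step.
    Plans that do not expect the observation are rejected (dropped).
    """
    updated = {}
    for name, position in active.items():
        steps = PROTOTYPES[name]
        if steps[position] != observation:
            continue  # inconsistent with the observation: plan rejected
        if position + 1 == len(steps):
            completed.append(name)  # end of plan: the event has occurred
        else:
            updated[name] = position + 1
    # Create new plans whose first step explains the observation
    # (assumes every prototype has at least two steps).
    for name, steps in PROTOTYPES.items():
        if name not in active and steps[0] == observation:
            updated[name] = 1
    return updated


active, completed = {}, []
for obs in ["cars_parked", "pedestrian_appears",
            "pedestrian_disappears", "vehicle_starts_moving"]:
    active = observe(active, completed, obs)

# Steps still missing before each plan in progress would conclude.
remaining = {name: len(PROTOTYPES[name]) - pos for name, pos in active.items()}
```

The `remaining` count is a crude stand-in for the "offset from the current state" idea; with sub-event durations attached to each step it could be turned into a time estimate.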

Object PNs

• Ghanem 2004 & 2007

• Tokens are objects

• Multiple objects can be represented within the same net

• Snapshot of the state of the system at any time before or after an event has occurred

• Feedback Loop

Traffic Monitoring Domain Example

Negative Event- Security Guard has Not Returned to post after 15 minutes

Video event modeling with GSPN

• Object PNs

• Borzin et al. 2007

• Capture multiple events of interest in a domain within a single net

• Stochastic Timed Transitions (GSPN)

• The parameters of these distributions may be learned from training data

• Also allows estimates of the reachability of certain events (using Marking Analysis, discussed shortly)

Video event modeling with GSPN

• Basic representation concepts

– Each token represents a detected object in a specific state

– A place represents a possible state of one or more objects

– Transitions represent an event or a satisfied relation

• Logical relations and Temporal Relations enforced by appropriate PN fragments

• Spatial relations

– Topological, directional or distance

– Defined by the enabling rules attached to transitions

• Enabling Rules

– Define conditions on tokens (objects) that must be met for the associated transition to become enabled and fire
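
A minimal sketch of an enabling rule with a distance-based spatial relation is shown below. Tokens carry object attributes, and the transition fires only for a pair of tokens that satisfies the rule. The `Token` fields, the distance threshold, and the place names `Visitor_In_Hall` and `Meeting` are assumptions made for illustration; only `Guard_Waiting` and `Guard_Met_Visitor` appear as names elsewhere in this deck.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A detected object; the fields are illustrative."""
    object_id: int
    kind: str
    x: float
    y: float


def near(a, b, max_dist=1.5):
    """Distance-based spatial relation; the threshold is an arbitrary example value."""
    return math.hypot(a.x - b.x, a.y - b.y) <= max_dist


def try_fire_guard_met_visitor(marking):
    """Fire 'Guard_Met_Visitor' for the first guard/visitor pair satisfying the rule.

    marking maps place name -> list of tokens. Returns the ids of the objects
    the transition fired on, or None if the transition is not enabled.
    """
    for guard in marking["Guard_Waiting"]:
        for visitor in marking["Visitor_In_Hall"]:
            if near(guard, visitor):
                # Move both tokens from the input places to the output place.
                marking["Guard_Waiting"].remove(guard)
                marking["Visitor_In_Hall"].remove(visitor)
                marking["Meeting"].extend([guard, visitor])
                return guard.object_id, visitor.object_id
    return None


marking = {
    "Guard_Waiting": [Token(0, "guard", 0.0, 0.0)],
    "Visitor_In_Hall": [Token(2, "visitor", 1.0, 0.0), Token(6, "visitor", 5.0, 5.0)],
    "Meeting": [],
}
fired_on = try_fire_guard_met_visitor(marking)
```

Topological or directional relations would follow the same pattern, with `near` swapped for a different predicate over token attributes.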

Surveillance System

• Intermediate video processing unit

– Motion detection

– Object detection and classification

– Object tracking

– Can be replaced with format-compliant datasets

– Supports CAVIAR ground truth format

(Figure: system block diagram with components Video Input, User Interface, Behavior Modeling Unit, Intermediate Video Processing Unit, Video Event Recognition Unit, Abstracted Video Representation, GSPN Based Behavior Model, Synthesized Dataset (optional), and Results.)

Surveillance System

• Behavior modeling unit

– Provides a graphical interface for creating GSPN models

– Allows splitting an entire graph into small fragments

– Supports various templates that can be edited or extended by the user.


Surveillance System


• Video event recognition unit

– Receives the intermediate video processing results and the model

– Provides textual description of the detected events

Surveillance System Interface: Behavior Modeling Interface

Surveillance System Interface: Video Event Interpretation Interface

The Data

• Assume good tracking, detection and recognition

• CAVIAR annotation format

• Synthetic Video

• Corresponds to real video

• Generation of similar video for training

Synthetic Video Animator

• User interface

• Single scene editor

• Scene series editor

Video Event Analysis

• Scenario 1: ‘Security check in a public place’

– Every visitor must pass a security check before entering the place

– The following cases are considered abnormal and must be reported:

• A visitor enters the place without being checked

• The security check is abnormally long

– Example of event of interest:


• GSPN model for ‘Security check in a public place’:


• Interpretation results for ‘Security check in a public place’:

Frame   Message
1       'Object_Appeared' fired on objects: 0
20      'Object_Appeared' fired on objects: 2
25      'Object_Appeared' fired on objects: 6
56      'Visitor_Entered_the_Hall' fired on objects: 2
61      'Guard_Met_Visitor' fired on objects: 0, 2
61      'Visitor_Entered_the_Hall' fired on objects: 6
68      'Visitor_Was_Not_Checked' fired on objects: 6
71      'Security_Check_Is_Too_Long_Detected' fired on objects: 0, 2
86      'Meeting_Is_Over' fired on objects: 0, 2

Marking Analysis

• PN structure can be translated into a reachability graph

• A training set provides statistics about transitions between adjacent markings

• Using these, the probability of future transitions can be calculated

• Example:

– The probability to move to marking M_k from marking M_n is:

\lambda_{n,k} = \frac{N_{n,k}}{N_n}

(Figure: reachability graph over the markings M0 to M5 with edges labeled λ0,1, λ0,2, λ1,3, λ1,4, λ1,5, λ2,5, λ3,4, λ4,0, λ5,0.)
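
One plausible reading of the marking statistics, sketched below, is a simple frequency estimate: the probability of moving from one marking to another is the number of times that move was observed in training divided by the number of observed departures from the source marking. The training sequences here are invented for illustration and only use edges that appear in the example graph; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transition_probabilities(sequences):
    """Estimate P(next marking | current marking) from observed marking sequences."""
    pair_counts = Counter()  # observed moves (source, destination)
    out_counts = Counter()   # observed departures from each source marking
    for sequence in sequences:
        for src, dst in zip(sequence, sequence[1:]):
            pair_counts[(src, dst)] += 1
            out_counts[src] += 1
    probabilities = defaultdict(dict)
    for (src, dst), count in pair_counts.items():
        probabilities[src][dst] = count / out_counts[src]
    return dict(probabilities)


# Invented training data: each list is the marking sequence of one training clip.
training = [
    ["M0", "M1", "M3", "M4", "M0"],
    ["M0", "M1", "M4", "M0"],
    ["M0", "M2", "M5", "M0"],
    ["M0", "M1", "M5", "M0"],
]
probs = estimate_transition_probabilities(training)
most_probable_next = max(probs["M0"], key=probs["M0"].get)
```

The resulting table is the transition matrix of a discrete-time Markov chain over markings, which is what allows the most probable next state to be predicted.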

Video Event Analysis

• Marking analysis for ‘Security check in a public place’:

– The marking data was collected during the training process

– The reachability graph was constructed from the possible states observed during the training process

– Statistical information was used to predict the most probable next system state

– The marking graph is used for a Discrete Time Markov Chain description

• Example:

– The probability to get to ‘Guard_Checked_One_Visitor’ from ‘Visitor_Walking_Towards_Guard’ is (the red path):

P = 0.72 \cdot 1 \cdot (0.63 + 0.37 \cdot 0.78) \cdot 0.66 \approx 0.44

(Figure: marking graph with edge probabilities over the states Empty, Guard_In_Hall, Guard_Waiting, One_Visitor_Appeared, Visitor_Walking_Towards_Guard, One_Visitor_Stopped_Near_Guard, One_Visitor_Passed_Near_Guard, Guard_Met_One_Visitor, Guard_Met_Two_Visitors, Guard_Checked_One_Visitor, One_Visitor_has_Evaded_the_Check.)


• Scenario 2: ‘Traffic junction control’

– Assume a junction without any traffic lights or traffic signs

– The Law: a car may enter the junction unless there is a car on its right side

– Example event of interest:


• GSPN model for ‘Traffic junction control’:


• Interpretation results for ‘Traffic junction control’:

Frame   Message
0       'Car_Appeared' fired on the objects: 2
1       'Car_Appeared' fired on the objects: 0
18      'Car_Entered_Z1' fired on the objects: 0
23      'Car_Entered_Z2' fired on the objects: 2
34      'Car_Entered_Z1' fired on the objects: 1
52      'Car_In_Z1_Breaks_the_Law' fired on the objects: 0
77      'Car_Entered_Z3' fired on the objects: 0

Building PNs from Ontologies

• We have seen that Petri Nets are a formalism for describing video events in a particular event domain.

• Other formalisms exist (e.g., Hidden Markov Models, Grammar Models, etc.)

• A knowledge engineer can describe the semantic content of a particular event domain in a standard way.

• To build an event model, however, such an expert must also have expert knowledge of the modeling formalism.

• To bridge this gap we propose a process for translating domain knowledge into an event modeling formalism.

Ontology Languages

• To formalize our knowledge of an event domain we require a standard method of knowledge specification – an ontology

• Competing ontology specification standards for video events exist

– VERL (Nevatia et al 2004)

– CASEE (Hakeem et al 2004)

Ontology Languages - VERL

• Defines constructs such as Sequence, Change

• Allows definition of predicates and entities

• Captures temporal and logical relationships

SINGLE-THREAD(tailgate(ent x, ent y, facility f),
    AND (portal-of(door, f))
    Sequence(
        approach(y, door),
        unlock(y, door),
        open(y, door),
        AND(enter(y, f), near(x, y)),
        NOT(unlock(x, door)),
        enter(x, f)))

Ontology Languages- CASEE

• Extends the concept of case frames hierarchically

[ PRED: Moves, AG: Train, D: Signals, LOC: Zone1, FAC: Towards, AFTER:
    [ PRED: Switches, AG: Signals, FAC: On, AFTER:
        [ PRED: Moves, AG: Gate, FAC: Down, AFTER: Switches, SUB:
            [ PRED: Stops, AG: Vehicle, LOC: Zone2, FAC: Outside, AFTER: Moves ] ] ] ]

• Methodology for constructing PN from ontology descriptions (not automatic or optimal)

• Similar models for similar events

• Each Sub-event is represented by a PN fragment

• Simple Sub-Event Fragment (no Temporal Relations)

• Temporal Sub-Event Fragment

Translating the Ontology to PN

• Next we connect the fragments representing the various sub-events in a manner that corresponds to their relationship in the ontology event description.

• Logical Relations (AND)

• Temporal Relations (OVERLAPS)


Temporal Relations


• It may be possible to represent a temporal relation in more than one way.

• Having a consistent representation avoids having arbitrary model structures

• Nodes shared between fragments may be fused.

Building a Plan PN

• In a plan PN transitions can represent “primitive” events as given by the ontology descriptions

• These primitive events can be projected into our event structure fragment

• Example: “Safe Crossing Event”

– An approaching train causes the signal to change, the gate to lower, and the approaching car to stop.

– Specified using CASEE

– Create temporal sub-event fragments for each of the sub-events

– Connect fragments according to temporal relations

Building a Plan PN

CASEE Ontology Representation:

[ PRED: Moves, AG: Train, D: Signals, LOC: Zone1, FAC: Towards, AFTER:
    [ PRED: Switches, AG: Signals, FAC: On, AFTER:
        [ PRED: Moves, AG: Gate, FAC: Down, AFTER: Switches, SUB:
            [ PRED: Stops, AG: Vehicle, LOC: Zone2, FAC: Outside, AFTER: Moves ] ] ] ]

Resulting Plan PN:

Simplifying the structure

• Redundant nodes are merged to give a simpler structure

• The Label SE indicates each of the sub-events in the ontology specification

• Each transition is now associated with the appropriate “primitive” sub-event
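
As a rough illustration of the simplified structure, the sketch below turns the nested “safe crossing” frame into a linear Plan PN with one transition per primitive sub-event. It is not the construction method of the talk: the dictionary encoding of the CASEE frame, the function names, and above all the decision to treat both nested AFTER and SUB frames as plain sequencing are assumptions made to keep the example small.

```python
# Dictionary encoding of the CASEE frame (an assumed representation).
safe_crossing = {
    "PRED": "Moves", "AG": "Train", "D": "Signals", "LOC": "Zone1", "FAC": "Towards",
    "AFTER": {
        "PRED": "Switches", "AG": "Signals", "FAC": "On",
        "AFTER": {
            "PRED": "Moves", "AG": "Gate", "FAC": "Down", "AFTER": "Switches",
            "SUB": {"PRED": "Stops", "AG": "Vehicle", "LOC": "Zone2",
                    "FAC": "Outside", "AFTER": "Moves"},
        },
    },
}


def flatten(frame):
    """List sub-events outermost first, following nested AFTER or SUB frames."""
    events = []
    while isinstance(frame, dict):
        events.append(f'{frame["AG"]}_{frame["PRED"]}')
        nested = frame.get("AFTER")
        if not isinstance(nested, dict):  # AFTER may be a plain reference, not a frame
            nested = frame.get("SUB")
        frame = nested
    return events


def build_plan_pn(events):
    """Chain the sub-events: place P(i) --event i--> place P(i+1)."""
    transitions = {
        event: ([f"P{i}"], [f"P{i + 1}"]) for i, event in enumerate(events)
    }
    initial_marking = {"P0": 1}  # a single token tracks plan progress
    return transitions, initial_marking


events = flatten(safe_crossing)
plan_pn, initial_marking = build_plan_pn(events)
```

Relations such as OVERLAPS would need the dedicated temporal fragments described earlier instead of a simple chain.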

Building an Object PN

• In an Object PN model tokens represent objects in the system

• We define PN fragments that include places for each possible state of an object

• These fragments are then connected to the special fragments representing the structure of the event (made up of sub-event fragments) as appropriate.

• Our objects of interest for the “safe crossing” event are the train, car, signal, and gate

• Object states are not given (they may be implied by the event specification or made explicit with a small extension to the ontology)


• The possible states of the car can be “inscene”, “stopped”, “inzone2”, “stoppedinzone2”.

• Each of these states is represented as a place in the car state transition fragment.

• Conditional transition nodes allow changes of state based on token properties.

• Similar fragments are constructed for the train, gate, and signal

• States are connected appropriately to the event structure fragment

Petri Nets- Summary

• Inference is the propagation of tokens through the PN

• Relies on semantic structure

• ‘Snapshot’ of the system

• ‘How far Away?’, quantified by probabilities or absolute time.

• Naturally capture inherent video event properties such as concurrency and partial ordering

• May be specified subjectively

• Consistent process for specification needs to be implemented

What About Uncertainty??

• Video Events ARE uncertain

• PN model presented is deterministic

• Uncertainty resides at lower levels

• Higher level becomes more qualitative than quantitative

• Propagation Nets (Shi and Bobick 2004) assign a probability to each state transition

• Other work assigns a probability to tokens as they propagate through network

• It is not critical to assign a probability to each event

– Prof. Bobick’s .7 vs .00000000000001 example

• If two events are feasible they should both be considered

• We have seen that multiple event explanations can exist

Thank You

Acknowledgements to :

Michael Rudzsky

Artyom Borzin

Questions?

Publications

• Surveillance Event Interpretation Using Generalized Stochastic Petri Nets. Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. WIAMIS 2007, The 8th International Workshop on Image Analysis for Multimedia Interactive Services, 6-8 June 2007, Santorini, Greece.

• Representation and Recognition of Multiagent Interactions by Generalized Stochastic Petri Nets. Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. CBMI-2007, Fifth International Workshop on Content-Based Multimedia Indexing, June 25-27, 2007, Bordeaux, France.

• Building Petri Nets from Video Event Ontologies. Gal Lavee, Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. Advances in Visual Computing, Third International Symposium, ISVC 2007, Lake Tahoe, NV, USA, November 26-28, 2007. LNCS 4841, pp. 442-451.

• Recognition of Human Behavior from Surveillance Video using Marking Analysis in Generalized Stochastic Petri Nets. Artyom Borzin, MSc Thesis. Link:

ftp://ftp.cs.technion.ac.il/pub/misc/gip/incoming/promo/Rudzsky/Gal/Recognition%20of%20Human%20Behavior%20from%20Surveillance%20Video%20using%20Marking%20Analysis%20in%20Generalized%20Stochastic%20Petri%20Nets%20(updated).zip

References

• Nagia M. Ghanem, Petri Net Models for Event Recognition in Surveillance Videos, PhD Thesis, 2007. Link:

https://drum.umd.edu/dspace/bitstream/1903/6829/1/umi-umd-4317.pdf

• Nagia Ghanem, Daniel DeMenthon, David Doermann, Larry Davis, "Representation and Recognition of Events in Surveillance Video Using Petri Nets," in 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 7, p. 112, 2004.

• C. Castel, L. Chaudron, and C. Tessier, "What is going on? A high level interpretation of sequences of images," in Proceedings of the Workshop on Conceptual Descriptions from Images, ECCV, 1996, pp. 13-27.
