Petri-Nets for Video Event Understanding
Gal Lavee and Ehud Rivlin
Technion – Israel Institute of Technology
Motivation
• Huge volume of ‘interesting’ video data to process
• Growing visual surveillance demands
– Control of security-sensitive areas such as stores, airports, parking lots, banks and other public places
• Video database analysis demands
– Offline surveillance data analysis (e.g. abnormal behaviour detection, specific event querying)
• Well-understood supporting work
– Feature extraction
– Object detection
– Object tracking
– Object recognition
– “Low-level” event recognition
– Event domain knowledge specification using ontology languages
It's an Important Problem!
Video Events
Activities - Bobick
Composite Events - Bremond
Known Formalisms
• FSMs (Sequence)
• Bayesian Networks (Uncertainty, Factorize State Space)
• HMMs (Sequence, Uncertainty)
• CRFs (Sequence, Uncertainty, Relax Independence Assumptions)
• Grammars (Hierarchy, Partial Ordering)
• What About?
– Long-Term Temporal Dependencies?
– Concurrency?
– Temporal, spatial, logical relations?
– “Incomplete” Events?
What About Semantics?
Petri Net Solution for Event Understanding
• Petri-Net formalism (defined shortly)
• Represents the dynamic evolution of the video sequence
• Encodes semantic knowledge of the domain
• Formalism naturally captures inherent properties of video events
– Logical, Temporal and Spatial composition
– Concurrency and Partial Ordering
– Long-term temporal dependence
• Recursive Reuse of Fragments (Hierarchy)
• Recognizes events and…..
• Snapshot of the state of the system at any time before or after an event has occurred
• How far away is the event of interest?
Petri Nets - Defined
• A graphical tool for formal description of system dynamics.
• What graph components does a Petri net comprise?
– Place nodes - describe possible local system states.
– Transition nodes - describe events that may modify the state.
– Arcs - specify the relation between system states and events.
– Tokens - markers that reside in place nodes and are used to specify the PN state
Shared Resource Example:
[Figure: Petri net with places P_user_active, P_requesting, P_user_access, P_resource_idle, P_resource_busy and transitions T_user_request, T_start, T_end]
Petri Nets ctd.
• The instantaneous location of the tokens (called the marking) defines the current state of the model
• Enabling Rule
– A transition is enabled if all of its input places have tokens.
– Additional requirements may be added (conditional transitions)
• When a transition fires:
– Tokens are deleted from its input places
– Tokens are placed in its output places
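The enabling and firing rules can be illustrated with a minimal Python sketch of the shared-resource net. The place and transition names follow the slide, but the arc structure in `TRANSITIONS` is an assumption, since the figure is not reproduced in the transcript.

```python
from collections import Counter

# Assumed arcs for the shared-resource example: name -> (input places, output places).
TRANSITIONS = {
    "T_user_request": ({"P_user_active"}, {"P_requesting"}),
    "T_start": ({"P_requesting", "P_resource_idle"},
                {"P_user_access", "P_resource_busy"}),
    "T_end": ({"P_user_access", "P_resource_busy"},
              {"P_user_active", "P_resource_idle"}),
}

def enabled(marking, name):
    """A transition is enabled if every input place holds at least one token."""
    inputs, _ = TRANSITIONS[name]
    return all(marking[place] > 0 for place in inputs)

def fire(marking, name):
    """Return the marking obtained by firing an enabled transition."""
    if not enabled(marking, name):
        raise ValueError(f"{name} is not enabled")
    inputs, outputs = TRANSITIONS[name]
    new_marking = Counter(marking)
    for place in inputs:       # tokens are deleted from input places
        new_marking[place] -= 1
    for place in outputs:      # tokens are placed in output places
        new_marking[place] += 1
    return +new_marking        # unary plus drops places with zero tokens

m0 = Counter({"P_user_active": 1, "P_resource_idle": 1})
m1 = fire(m0, "T_user_request")
m2 = fire(m1, "T_start")
```

Firing `T_end` from `m2` returns the net to `m0` under these assumed arcs.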
Nothing to do with Petri Dish
Petri Net Properties
• Petri Net formalism is useful for modeling
– Logical Relations
– Temporal Relations
– Concurrency and Partial Ordering
Reachability set and Reachability graph
• Given a PN model and an initial marking
– Compute the reachability set (set of reachable markings)
– The reachability graph visualizes the path taken to each particular state
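Reachability Set: [Table: markings M0, M1, M2 against places P_user_active, P_requesting, P_user_access, P_resource_busy, P_resource_idle; P_resource_idle holds a token in two of the markings, every other place in one]
Reachability Graph: [Figure: graph over markings M0, M1, M2]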
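A breadth-first search over markings is one simple way to compute the reachability set and graph. The sketch below assumes a net in which each place holds at most one token, so a marking can be stored as the set of marked places. The arcs of the shared-resource net are again an assumption, not taken from the slide.

```python
from collections import deque

# Assumed arcs: name -> (input places, output places).
TRANSITIONS = {
    "T_user_request": (frozenset({"P_user_active"}), frozenset({"P_requesting"})),
    "T_start": (frozenset({"P_requesting", "P_resource_idle"}),
                frozenset({"P_user_access", "P_resource_busy"})),
    "T_end": (frozenset({"P_user_access", "P_resource_busy"}),
              frozenset({"P_user_active", "P_resource_idle"})),
}

def reachability_graph(initial):
    """Map every reachable marking to {fired transition: successor marking}."""
    graph = {}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        if marking in graph:
            continue
        graph[marking] = {}
        for name, (inputs, outputs) in TRANSITIONS.items():
            if inputs <= marking:                      # enabling rule
                successor = (marking - inputs) | outputs
                graph[marking][name] = successor
                queue.append(successor)
    return graph

M0 = frozenset({"P_user_active", "P_resource_idle"})
graph = reachability_graph(M0)
reachability_set = set(graph)
```

Under these assumptions the search finds three markings, which matches the M0, M1, M2 of the example.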
Petri Net Extensions
• Priority Transitions
– Each transition is associated with a natural number
– Allows resolution of conflicts (e.g. multiple transitions which share an input)
• Timed Transitions
– Extends priority concept
– Associates a real interval of time with each transition
– Must elapse between enabling and firing
– Model real-world phenomena
Petri Net Extensions
• Stochastic Petri Nets
– Stochastic timed transition delays (discrete-state stochastic process)
– Exponential distribution for transition delays
• Negative exponential probability density function (PDF)
• Generalized Stochastic Petri Net is a PN where:
– Immediate transitions coexist with stochastic timed transitions
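Negative exponential PDF for the delay of timed transition n, with $\mu_n$ the mean delay:
$$D_n(t) = \frac{1}{\mu_n} e^{-t/\mu_n}$$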
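As a rough illustration of how immediate and stochastic timed transitions can coexist, the sketch below samples exponentially distributed delays and lets the shortest delay win. The "race" policy, the function names and the transition names are assumptions made for illustration, not details taken from the slides.

```python
import random

def sample_delay(mean_delay, rng):
    """Draw an exponentially distributed delay with the given mean."""
    return rng.expovariate(1.0 / mean_delay)

def next_to_fire(enabled, rng):
    """Pick the next transition among the enabled ones.

    `enabled` maps a transition name to its mean delay, or to None for an
    immediate transition. Immediate transitions fire first with zero delay;
    otherwise the timed transition with the smallest sampled delay fires.
    """
    immediate = [name for name, mean in enabled.items() if mean is None]
    if immediate:
        return immediate[0], 0.0
    delays = {name: sample_delay(mean, rng) for name, mean in enabled.items()}
    winner = min(delays, key=delays.get)
    return winner, delays[winner]

rng = random.Random(0)
# Hypothetical transitions: one immediate, two timed with mean delays in seconds.
first = next_to_fire({"T_timed_a": 2.0, "T_immediate": None}, rng)
second = next_to_fire({"T_timed_a": 2.0, "T_timed_b": 5.0}, rng)
```

The mean delays would be the parameters learned from training data when a GSPN model is fitted.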
PN Video Event Models
• PNs in the literature have not been built in an agreed-upon fashion
• However, two distinct classes of PN event models have emerged: Plan Petri-Nets and Object Petri-Nets
PN Video Event Models
               | Object PN                                             | Plan PN
---------------|-------------------------------------------------------|-------------------------------------------
Tokens         | Objects                                               | Plan Progress
Places         | Object States                                         | Plan States
Transitions    | Object State Change (Events)                          | Plan Advancement
Enabling Rules | Conditioned on Object Properties                      | Conditioned on Scene Properties
Event          | Firing of Transition (not all events are interesting) | End of Plan (firing of “sink” transitions)
               | One PN for multiple objects/events                    | One PN per event
Plan PNs
• Plan Petri-Nets
– Castel (1996)
– Natural extension of PN plans in other domains
– In this work a number of “plans” are kept track of
– At each observation (knowledge received from the “numerical layer”) the plans in progress are checked to see if they are consistent with the observation.
Parking Lot Example
Plan Prototypes: Vehicle Departure, Vehicle Arrival, Arsonist Action, Pedestrian Movement
Explain Observation Using Plans
• Observation: Cars Parked
– Consistent with Vehicle Departure (new plans created)
– Not consistent with Vehicle Arrival, Pedestrian Movement, Arsonist Action
– Current Plans: 1. Vehicle Departure
• Observation: Pedestrian Appears
– Consistent with Vehicle Departure, Pedestrian Movement, Arsonist Action (new plans created)
– Not consistent with Vehicle Arrival
– Current Plans: 1. Vehicle Departure, 2. Pedestrian Movement, 3. Arsonist Action
• Observation: Pedestrian Disappears
– Consistent with existing plans Vehicle Departure, Pedestrian Movement (plans maintained)
– Pedestrian Movement reaches end of plan: the Pedestrian Movement event can be said to have occurred
– Not consistent with Arsonist Action (plan rejected)
– Current Plans: 1. Vehicle Departure, 2. Pedestrian Movement (Terminated)
• Observation: Vehicle Starts Moving
– Consistent with existing plan Vehicle Departure
– Current Plans: 1. Vehicle Departure, 2. Pedestrian Movement (Terminated)
Parking Lot Example
• After the observation sequence…
• The only consistent event is vehicle departure…. Even though it hasn’t concluded
• Using this system we can make assertions such as a certain event is possible/ not possible before having seen the complete event.
• Joined with information on the duration of the various sub-events we can conjecture on when a possible event might occur in terms of an offset from the current state
• Feedback loop with observation is possible
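The bookkeeping in the parking lot walkthrough can be sketched in Python by treating each plan prototype as a linear sequence of expected observations. This is a strong simplification of a plan Petri net, and the step lists in `PLAN_PROTOTYPES` are hypothetical, chosen only so that the walkthrough is reproduced.

```python
# Hypothetical plan prototypes: each plan is an ordered list of expected observations.
PLAN_PROTOTYPES = {
    "Vehicle Departure": ["cars_parked", "pedestrian_appears",
                          "pedestrian_disappears", "vehicle_starts_moving",
                          "vehicle_exits"],
    "Vehicle Arrival": ["vehicle_enters", "vehicle_parks", "pedestrian_appears"],
    "Arsonist Action": ["pedestrian_appears", "pedestrian_lingers", "fire_detected"],
    "Pedestrian Movement": ["pedestrian_appears", "pedestrian_disappears"],
}

def track_plans(observations):
    """Return (active, completed, rejected) after explaining the observations."""
    active = {}                # plan name -> index of the next expected step
    completed, rejected = [], []
    for obs in observations:
        # Plans in progress either advance or are rejected as inconsistent.
        for name in list(active):
            steps = PLAN_PROTOTYPES[name]
            if steps[active[name]] == obs:
                active[name] += 1
                if active[name] == len(steps):   # end of plan: event occurred
                    completed.append(name)
                    del active[name]
            else:
                rejected.append(name)
                del active[name]
        # Plans whose first step matches the observation are newly created.
        for name, steps in PLAN_PROTOTYPES.items():
            seen = name in active or name in completed or name in rejected
            if not seen and steps[0] == obs:
                active[name] = 1
    return active, completed, rejected

active, completed, rejected = track_plans(
    ["cars_parked", "pedestrian_appears",
     "pedestrian_disappears", "vehicle_starts_moving"])
steps_remaining = len(PLAN_PROTOTYPES["Vehicle Departure"]) - active["Vehicle Departure"]
```

`steps_remaining` is a crude version of the "offset from the current state": it counts the plan steps still missing before the event of interest concludes.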
Object PNs
• Ghanem 2004 & 2007
• Tokens are objects
• Multiple objects can be represented within the same net
• Snapshot of the state of the system at any time before or after an event has occurred
• Feedback Loop
Traffic Monitoring Domain Example
Negative Event- Security Guard has Not Returned to post after 15 minutes
Video event modeling with GSPN
• Object PNs
• Borzin et al 2007
• Capture multiple events of interest in a domain within a single net
• Stochastic Timed Transitions (GSPN)
• The parameters of these distributions may be learned from training data
• Also allows estimates of the reachability of certain events (using Marking Analysis, discussed shortly)
Video event modeling with GSPN
• Basic representation concepts
– Each token represents a detected object in a specific state
– A place represents a possible state of one or more objects
– Transitions represent an event or a satisfied relation
• Logical relations and Temporal Relations enforced by appropriate PN fragments
• Spatial relations
– Topological, directional or distance
– Defined by the enabling rules attached to transitions
• Enabling Rules
– Define conditions on tokens (objects) that must be met for the associated transition to become enabled and fire
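An enabling rule that encodes a spatial (distance) relation on tokens might look like the following sketch. The `Token` fields, the distance threshold and the transition name used in the function are illustrative assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Token:
    """A detected object residing in a place (fields are hypothetical)."""
    obj_id: int
    kind: str
    x: float
    y: float

def near(a, b, max_dist=1.5):
    """Distance relation between two tokens; the threshold is an assumption."""
    return math.hypot(a.x - b.x, a.y - b.y) <= max_dist

def guard_met_visitor_bindings(guard_place, visitor_place):
    """Token pairs for which a 'Guard_Met_Visitor' style transition is enabled."""
    return [(g.obj_id, v.obj_id)
            for g in guard_place
            for v in visitor_place
            if g.kind == "guard" and v.kind == "visitor" and near(g, v)]

guards = [Token(0, "guard", 0.0, 0.0)]
visitors = [Token(2, "visitor", 1.0, 0.0), Token(6, "visitor", 5.0, 5.0)]
bindings = guard_met_visitor_bindings(guards, visitors)
```

Here only the guard and visitor 2 satisfy the rule, so the transition would fire on objects 0 and 2 while visitor 6 stays in its place.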
Surveillance System
• Intermediate video processing unit
– Motion detection
– Object detection and classification
– Object tracking
– Can be replaced with format-compliant datasets
– Supports CAVIAR ground truth format
[Figure: surveillance system block diagram with components Video Input, User Interface (two instances), Behavior Modeling Unit, Intermediate Video Processing Unit, Video Event Recognition Unit, Abstracted Video Representation, GSPN Based Behavior Model, Synthesized Dataset (optional), Results]
Surveillance System
• Behavior modeling unit
– Provides a graphical interface for creating GSPN models
– Allows splitting an entire graph into small fragments
– Supports various templates that can be edited or extended by the user.
Surveillance System
• Video event recognition unit
– Receives the intermediate video processing results and the model
– Provides textual description of the detected events
Surveillance System Interface: Behavior Modeling Interface
Surveillance System Interface: Video Event Interpretation Interface
The Data
• Assume good tracking, detection and recognition
• CAVIAR annotation format
• Synthetic Video
• Corresponds to real video
• Generation of similar video for training
Synthetic Video Animator
• User interface
• Single scene editor
• Scene series editor
Video Event Analysis
• Scenario 1: ‘Security check in a public place’
– Every visitor must pass a security check before he enters the place
– The following cases are considered to be abnormal and must be reported:
  • A visitor enters the place without being checked
  • The security check is abnormally long
– Example of event of interest:
Video Event Analysis
• GSPN model for ‘Security check in a public place’:
Video Event Analysis
• Interpretation results for ‘Security check in a public place’:
Frame | Message
------|--------------------------------------------------------------
1     | 'Object_Appeared' fired on objects: 0
20    | 'Object_Appeared' fired on objects: 2
25    | 'Object_Appeared' fired on objects: 6
56    | 'Visitor_Entered_the_Hall' fired on objects: 2
61    | 'Guard_Met_Visitor' fired on objects: 0, 2
61    | 'Visitor_Entered_the_Hall' fired on objects: 6
68    | 'Visitor_Was_Not_Checked' fired on objects: 6
71    | 'Security_Check_Is_Too_Long_Detected' fired on objects: 0, 2
86    | 'Meeting_Is_Over' fired on objects: 0, 2
Marking Analysis
• PN structure can be translated into a reachability graph
• A training set provides statistics about transitions between adjacent markings
• Using these, the probability of future transitions can be calculated
• Example:
[Figure: reachability graph over markings M0 to M5 with edges labeled λ0,1, λ0,2, λ1,3, λ1,4, λ1,5, λ2,5, λ3,4, λ4,0, λ5,0]
– Where the probability to move to marking Mk from marking Mn is:
$$\lambda_{n,k} = \frac{N_{n,k}}{N_n}$$
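One way to obtain the transition probabilities between adjacent markings from a training set is simple frequency counting. The sketch below assumes that the numerator counts observed moves from Mn to Mk and that the denominator counts all observed moves out of Mn. The training sequences are made up for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(marking_sequences):
    """Estimate P(Mk | Mn) as count(Mn -> Mk) / count(moves out of Mn)."""
    pair_counts = defaultdict(Counter)
    for sequence in marking_sequences:
        for source, target in zip(sequence, sequence[1:]):
            pair_counts[source][target] += 1
    probabilities = {}
    for source, counts in pair_counts.items():
        total = sum(counts.values())
        probabilities[source] = {t: c / total for t, c in counts.items()}
    return probabilities

# Hypothetical training data: each list is the marking sequence of one clip.
training = [
    ["M0", "M1", "M3"],
    ["M0", "M1", "M4"],
    ["M0", "M2", "M5"],
    ["M0", "M1", "M3"],
]
probs = transition_probabilities(training)
```

With this data three of the four moves out of M0 go to M1, so the estimate for M0 to M1 is 0.75.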
Video Event Analysis
• Marking analysis for ‘Security check in a public place’:
– The marking data was collected during the training process
– The reachability graph was constructed from the possible states observed during the training process
– Statistical information was used to predict the most probable next system state
– The marking graph is used as a Discrete Time Markov Chain description
• Example:
– The probability to get to ‘Guard_Checked_One_Visitor’ from ‘Visitor_Walking_Towards_Guard’ is (the red path):
[Figure: marking graph with edge probabilities over the markings Empty, Guard_In_Hall, Guard_Waiting, One_Visitor_Appeared, Visitor_Walking_Towards_Guard, One_Visitor_Stopped_Near_Guard, One_Visitor_Passed_Near_Guard, One_Visitor_has_Evaded_the_Check, Guard_Met_One_Visitor, Guard_Met_Two_Visitors, Guard_Checked_One_Visitor]
$$P = 0.72 \cdot 1 \cdot (0.63 + 0.37 \cdot 0.78) \cdot 0.66$$
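The probability of reaching a marking of interest can be computed by summing, over all paths to it, the product of the edge probabilities along each path. The sketch below uses the numbers from the red-path example. The intermediate state names S1 to S4 and the assignment of probabilities to edges are assumptions, since the figure itself is not available.

```python
# Assumed fragment of the marking graph: state -> {next state: probability}.
# Outgoing edges that leave the path of interest are omitted.
CHAIN = {
    "Visitor_Walking_Towards_Guard": {"S1": 0.72},
    "S1": {"S2": 1.0},
    "S2": {"S4": 0.63, "S3": 0.37},
    "S3": {"S4": 0.78},
    "S4": {"Guard_Checked_One_Visitor": 0.66},
}

def reach_probability(chain, source, target, visited=frozenset()):
    """Sum the probabilities of all cycle-free paths from source to target."""
    if source == target:
        return 1.0
    visited = visited | {source}
    return sum(p * reach_probability(chain, nxt, target, visited)
               for nxt, p in chain.get(source, {}).items()
               if nxt not in visited)

p = reach_probability(CHAIN, "Visitor_Walking_Towards_Guard",
                      "Guard_Checked_One_Visitor")
```

Under these assumptions `p` evaluates to roughly 0.44.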
Video Event Analysis
• Scenario 2: ‘Traffic junction control’
– Assume a junction without any traffic lights or traffic signs
– The Law: a car may enter the junction unless there is a car on its right side
– Example event of interest:
Video Event Analysis
• GSPN model for ‘Traffic junction control’:
Video Event Analysis
• Interpretation results for ‘Traffic junction control’:
Frame | Message
------|------------------------------------------------------
0     | 'Car_Appeared' fired on the objects: 2
1     | 'Car_Appeared' fired on the objects: 0
10    | 'Car_Appeared' fired on the objects: 1
18    | 'Car_Entered_Z1' fired on the objects: 0
23    | 'Car_Entered_Z2' fired on the objects: 2
34    | 'Car_Entered_Z1' fired on the objects: 1
52    | 'Car_In_Z1_Breaks_the_Law' fired on the objects: 0
77    | 'Car_Entered_Z3' fired on the objects: 0
80    | 'Car_Appeared' fired on the objects: 3
Building PNs from Ontologies
• We have seen that Petri Nets are a formalism for describing video events that enables description of a particular event domain.
• Other formalisms exist (e.g. Hidden Markov Models, Grammar Models, etc.)
• A knowledge engineer can describe the semantic content of a particular event domain in a standard way.
• To build an event model, such an expert must also have expert knowledge of the modeling formalism.
• To bridge this gap we propose a process for translating domain knowledge into an event modeling formalism.
Ontology Languages
• To formalize our knowledge of an event domain we require a standard method of knowledge specification – an ontology
• Competing ontology specification standards for video events exist
– VERL (Nevatia et al 2004)
– CASEE (Hakeem et al 2004)
Ontology Languages - VERL
• Defines constructs such as Sequence, Change
• Allows definition of predicates and entities
• Captures temporal and logical relationships
SINGLE-THREAD(tailgate(ent x, ent y, facility f),
AND (portal-of(door, f))
Sequence(
approach(y, door),
unlock(y, door),
open(y, door),
AND(enter(y, f), near(x, y)),
NOT(unlock(x, door)),
enter(x, f)))
Ontology Languages- CASEE
• Extends the concept of case frames hierarchically
[ PRED: Moves, AG: Train, D: Signals, LOC: Zone1, FAC: Towards, AFTER:
[ PRED: Switches, AG: Signals, FAC: On, AFTER:
[ PRED: Moves, AG: Gate, FAC: Down, AFTER: Switches, SUB:
[ PRED: Stops, AG: Vehicle, LOC: Zone2, FAC: Outside, AFTER: Moves ] ] ] ]
• Methodology for constructing PN from ontology descriptions (not automatic or optimal)
• Similar models for similar events
• Each Sub-event is represented by a PN fragment
• Simple Sub-Event Fragment (no Temporal Relations)
• Temporal Sub-Event Fragment
Translating the Ontology to PN
• Next we connect the fragments representing the various sub-events in a manner that corresponds to their relationship in the ontology event description.
• Logical Relations (AND)
• Temporal Relations (OVERLAPS)
Translating the Ontology to PN
Temporal Relations
Translating the Ontology to PN
• It may be possible to represent a temporal relation in more than one way.
• Having a consistent representation avoids having arbitrary model structures
• Nodes shared between fragments may be fused.
Building a Plan PN
• In a plan PN transitions can represent “primitive” events as given by the ontology descriptions
• These primitive events can be projected into our event structure fragment
• Example: “Safe Crossing Event”
– An approaching train causes the signal to change, the gate to lower, and the approaching car to stop.
– Specified using CASEE
– Create temporal sub-event fragments for each of the sub-events
– Connect fragments according to temporal relations
Building a Plan PN
CASEE Ontology Representation:
[ PRED: Moves, AG: Train, D: Signals, LOC: Zone1, FAC: Towards, AFTER:
[ PRED: Switches, AG: Signals, FAC: On, AFTER:
[ PRED: Moves, AG: Gate, FAC: Down, AFTER: Switches, SUB:
[ PRED: Stops, AG: Vehicle, LOC: Zone2, FAC: Outside, AFTER: Moves ] ] ] ]
Resulting Plan PN:
Simplifying the structure
• Redundant nodes are merged to give a simpler structure
• The Label SE indicates each of the sub-events in the ontology specification
• Each transition is now associated with the appropriate “primitive” sub-event
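The fragment-chaining idea can be sketched in Python for the “safe crossing” event. The sub-event labels and their strictly sequential ordering are assumptions read off the prose description (train approaches, signal switches, gate lowers, vehicle stops), not a parse of the CASEE frames. One fragment of the form place, transition, place is created per sub-event, and shared places between consecutive fragments are fused.

```python
# Hypothetical sub-event labels in their assumed order of occurrence.
SUB_EVENTS = ["Train_Approaches", "Signal_Switches_On",
              "Gate_Moves_Down", "Vehicle_Stops"]

def build_plan_pn(labels):
    """Chain one fragment per sub-event, fusing shared places.

    The output place of fragment i is the input place of fragment i + 1,
    so n sub-events need n + 1 places. Returns (places, transitions),
    where transitions maps a label to (input place, output place).
    """
    places = [f"p{i}" for i in range(len(labels) + 1)]
    transitions = {label: (places[i], places[i + 1])
                   for i, label in enumerate(labels)}
    return places, transitions

def run(transitions, start, observations):
    """Advance the single plan token for every observation whose transition is enabled."""
    token = start
    for obs in observations:
        source, target = transitions.get(obs, (None, None))
        if source == token:
            token = target
    return token

places, transitions = build_plan_pn(SUB_EVENTS)
token = run(transitions, "p0", ["Train_Approaches", "Signal_Switches_On"])
remaining = len(places) - 1 - places.index(token)
```

After two sub-events the token sits in `p2` and two transitions remain before the sink place is reached, which is the "how far away" reading of the plan state.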
Building an Object PN
• In an Object PN model tokens represent objects in the system
• We define PN fragments that include places for each possible state of an object
• These fragments are then connected to the special fragments representing the structure of the event (made up of sub-event fragments) as appropriate.
• Our objects of interest for the “safe crossing” event are train, car, signal and gate
• Object states are not known (they may be implicit in the event specification, or made explicit with a small extension to the ontology)
Building an Object PN
• The possible states of the car can be “inscene”, “stopped”, “inzone2”, “stoppedinzone2”.
• Each of these states is represented as a place in the car state transition fragment.
• Conditional transition nodes allow changes of state based on token properties.
• Similar fragments are constructed for train, gate and signal
• States are connected appropriately to the event structure fragment
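A car state fragment with conditional transitions could be sketched as follows. The four place names come from the slide. The transition names, the token properties `speed` and `in_zone2`, and the conditions are hypothetical choices for illustration.

```python
# (transition name, source place, target place, condition on the car token)
CAR_FRAGMENT = [
    ("stops", "inscene", "stopped",
     lambda car: car["speed"] == 0 and not car["in_zone2"]),
    ("resumes", "stopped", "inscene",
     lambda car: car["speed"] > 0 and not car["in_zone2"]),
    ("enters_zone2", "inscene", "inzone2",
     lambda car: car["speed"] > 0 and car["in_zone2"]),
    ("stops_in_zone2", "inzone2", "stoppedinzone2",
     lambda car: car["speed"] == 0 and car["in_zone2"]),
]

def step(place, car):
    """Fire the first conditional transition enabled for the token.

    Returns (new place, fired transition name), or (place, None) if no
    transition leaving the current place has its condition satisfied.
    """
    for name, source, target, condition in CAR_FRAGMENT:
        if source == place and condition(car):
            return target, name
    return place, None

place = "inscene"
trace = []
for car in [{"speed": 5, "in_zone2": False},
            {"speed": 0, "in_zone2": False},
            {"speed": 3, "in_zone2": False},
            {"speed": 3, "in_zone2": True},
            {"speed": 0, "in_zone2": True}]:
    place, fired = step(place, car)
    trace.append(fired)
```

Each fired transition is an object state change. Only some of them would be connected to the event structure fragment as events of interest.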
Building an Object PN
Petri Nets- Summary
• Inference is the propagation of tokens through the PN
• Relies on semantic structure
• ‘Snapshot’ of the system
• ‘How far Away?’, quantified by probabilities or absolute time.
• Naturally capture inherent video event properties such as concurrency and partial ordering
• May be specified subjectively
• Consistent process for specification needs to be implemented
What About Uncertainty??
• Video Events ARE uncertain
• PN model presented is deterministic
• Uncertainty resides at lower levels
• Higher level becomes more qualitative than quantitative
• Propagation Nets (Shi and Bobick 2004) assign a probability to each state transition
• Other work assigns a probability to tokens as they propagate through network
• It is not critical to assign a probability to each event
– Prof. Bobick’s .7 vs .00000000000001 example
• If two events are feasible they should both be considered
• We have seen that multiple event explanations can exist
Thank You
Acknowledgements to :
Michael Rudzsky
Artyom Borzin
Questions?
Publications
• Surveillance Event Interpretation Using Generalized Stochastic Petri Nets. Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. WIAMIS 2007, The 8th International Workshop on Image Analysis for Multimedia Interactive Services, 6-8 June 2007, Santorini, Greece.
• Representation and Recognition of Multiagent Interactions by Generalized Stochastic Petri Nets. Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. CBMI-2007, Fifth International Workshop on Content-Based Multimedia Indexing, June 25-27, 2007, Bordeaux, France.
• Building Petri Nets from Video Event Ontologies. Gal Lavee, Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. Advances in Visual Computing, Third International Symposium, ISVC 2007, Lake Tahoe, NV, USA, November 26-28, 2007. LNCS 4841, pp. 442-451.
• Recognition of Human Behavior from Surveillance Video using Marking Analysis in Generalized Stochastic Petri Nets. Artyom Borzin, MSc Thesis.
References
• Nagia M. Ghanem, "Petri Net Models for Event Recognition in Surveillance Videos," PhD Thesis, 2007.
• Nagia Ghanem, Daniel DeMenthon, David Doermann, Larry Davis, "Representation and Recognition of Events in Surveillance Video Using Petri Nets," 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 7, p. 112, 2004.
• C. Castel, L. Chaudron, and C. Tessier, "What is going on? A high level interpretation of sequences of images," in Proceedings of the Workshop on Conceptual Descriptions from Images, ECCV, 1996, pp. 13-27.