Post on 22-Dec-2015
Activity Analysis in Video
Spring 2005 Computational Intelligence Seminar Series
Partial Review of the Paper "Discovery and Segmentation of Activities in Video"
By Matthew Brand (MERL)
Presented by Derek Anderson
Topics
1. TigerPlace Project
2. Monitoring Silhouette Activity
3. Monitoring Object Activity
4. Monitoring both (separate or combined)
5. Hidden Markov Models (Brief Introduction)
6. Evolutionary Computing for Structure Discovery
7. Matthew Brand's Approach to Activity Recognition
Context for this Presentation
TigerPlace Project
  One component of our system will involve analyzing video (in real time) and recognizing an important set of "short term" activities
[Figure: system architecture. A sensor network (bed sensors, sensor mat, motion sensor, stove temp. sensor, gait monitor) and a video sensor network feed the Data Manager. Components: Activity Analysis, Alarm Filter, Behavior Reasoning. Data flows: sensor event, anonymized video, video activity descriptor, physical activity descriptor. Output: Alerts to Caregivers and Residents. Panels (a), (b), (c).]
Sensor and Video Networks
We are doing the research for the video sensor network
  iPAQ hx4700 series PDA with HP PhotoSmart Digital Cameras
The results from the video network can be combined with other sources of information from the sensor network (gait monitor, bed sensors, …) to reduce false alarm rates and help increase the overall confidence that the activities occurred
  Is this going to be handled inside the behavior reasoning component of the system … (fuzzy rules)?
  Fuzzy Integrals?
    Fuzzy Integral: use each of the sources of information in the sensor and video networks, taking into account how reliable each one individually is (possibly for different kinds of tasks), and assess our confidence in a particular hypothesis, which is an individual activity?
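To make the fuzzy integral idea concrete, here is a minimal sketch, assuming a Sugeno λ-measure built from per-source reliability "densities". The slides do not say which measure or which integral (Sugeno or Choquet) would be used, so both are computed; the function names and the example numbers are hypothetical.

```python
import numpy as np

def sugeno_lambda(densities):
    """Solve prod(1 + lam*g_i) = 1 + lam for the Sugeno lambda-measure.

    Assumes at least two sources and every density strictly between 0 and 1.
    """
    g = np.asarray(densities, dtype=float)
    s = g.sum()
    if abs(s - 1.0) < 1e-9:
        return 0.0                            # densities already additive
    f = lambda lam: np.prod(1.0 + lam * g) - (1.0 + lam)
    if s > 1.0:
        lo, hi, sign = -1.0, 0.0, 1.0         # f > 0 near -1, f < 0 just below 0
    else:
        lo, hi, sign = 0.0, 1.0, -1.0         # f < 0 just above 0
        while f(hi) <= 0.0:                   # grow the bracket until f > 0
            hi *= 2.0
    for _ in range(200):                      # plain bisection
        mid = 0.5 * (lo + hi)
        if sign * f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fuzzy_integrals(support, densities):
    """Fuse per-source support for one hypothesis; returns (sugeno, choquet)."""
    h = np.asarray(support, dtype=float)
    g = np.asarray(densities, dtype=float)
    lam = sugeno_lambda(g)
    order = np.argsort(-h)                    # strongest support first
    measure, prev = [], 0.0
    for i in order:                           # measure of the growing source set
        prev = g[i] + prev + lam * g[i] * prev
        measure.append(prev)
    measure = np.array(measure)
    hs = h[order]
    sugeno = float(np.max(np.minimum(hs, measure)))
    choquet = float(np.sum(hs * np.diff(measure, prepend=0.0)))
    return sugeno, choquet
```

For example, `fuzzy_integrals([0.9, 0.6, 0.3], [0.5, 0.3, 0.2])` would fuse the support of three sources (say video, bed sensor, and gait monitor) for a single activity hypothesis; when the densities sum to one, the Choquet result reduces to a weighted average.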
Important Elderly Activities
What kind of activities to recognize?
  Presently, we are deciding on an initial set to study
A few possibilities include
  Total body motion
    Falling down (and not being able to get up)
    Someone entering and leaving their bed
    Sitting and getting up from a chair
  Partial body motion
    Taking their medicine
    Drinking
Monitoring while Ensuring Privacy
What features for the video system?
  Common approach: silhouettes
A silhouette is an image-based representation of an individual with nearly all personal and distinguishing information removed
Features from silhouettes will be used to monitor an individual's activity
These silhouettes will be initially extracted through image subtraction against a known and stationary background (cleaned up with binary morphology, reconstruction operator)
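The extraction step described above can be sketched as follows. This is only an illustration under assumptions: grayscale frames, a fixed difference threshold, and a 3x3 structuring element, none of which are specified in the slides. The reconstruction operator is implemented as repeated dilation of an eroded marker, constrained to the raw foreground mask.

```python
import numpy as np

def _shifted_views(mask):
    """Yield the nine 3x3-neighbourhood shifts of a boolean mask."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    for dy in range(3):
        for dx in range(3):
            yield p[dy:dy + h, dx:dx + w]

def dilate(mask):
    out = np.zeros_like(mask)
    for view in _shifted_views(mask):
        out |= view
    return out

def erode(mask):
    out = np.ones_like(mask)
    for view in _shifted_views(mask):
        out &= view
    return out

def extract_silhouette(frame, background, thresh=30, n_erode=1):
    """Background subtraction followed by morphological reconstruction."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    raw = diff > thresh                       # image subtraction + threshold
    marker = raw.copy()
    for _ in range(n_erode):                  # erosion removes small noise blobs
        marker = erode(marker)
    while True:                               # grow marker back, never leaving raw
        grown = dilate(marker) & raw
        if np.array_equal(grown, marker):
            return marker
        marker = grown
```

Unlike a plain opening, the reconstruction restores the exact outline of every blob that survives the erosion, which is presumably why it gives the cleaner silhouette.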
What the Silhouettes really look like
(still a very ideal setting)
Conventional Morphological Opening of Extracted Silhouette (Left); Morphological Reconstruction Operation on Extracted Silhouette (Right)
Silhouette motion over time (identification of activity regions)
Consecutive Silhouette Subtraction (left) and after additional Erosion Operation (right)
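A minimal sketch of the consecutive silhouette subtraction and the follow-up erosion, assuming boolean silhouette masks and a 3x3 structuring element (both assumptions; the slides only show the resulting images):

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 square structuring element."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def activity_region(prev_silhouette, curr_silhouette, n_erode=1):
    """Pixels that changed between two consecutive silhouettes, thinned by erosion."""
    changed = prev_silhouette ^ curr_silhouette   # consecutive subtraction
    for _ in range(n_erode):
        changed = erode(changed)                  # drop thin edge jitter
    return changed
```

The erosion removes thin slivers along the silhouette boundary, leaving only regions where a substantial part of the body moved.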
New Application?
Do not necessarily focus on the silhouettes, but rather the objects in the environment (or the co-interaction of the two)
Object or interesting landmark identification
  SIFT (Scale Invariant Feature Transform)
    Interesting enough texture on everything?
    Where are the cameras placed?
    Too complex to apply at first?
    Will it run in real time? (present equation, Bob = NO)
  Low level simple image processing techniques
    Have to see what the resolution and quality of the images are
    Use simpler image processing techniques to recognize particular objects
How to deal with some occlusion (why co-interaction might be helpful)
  Used the YUV color space to help identify skin regions, which helped in dealing with occlusion for objects the individual would interact with (tracked the hands)
NLM Short-Term Fellowship (Summer 2004)
  At the end of the summer, I used Bob's SIFT implementation to identify key points from a pill bottle (used the minimum spanning tree and density measure)
  Helped reduce some of the false alarms (in the pill taking activity)
Activity Recognition
I don't think that we have decided on the exact approach to use yet?
Looks like some form of HMMs might be as good a place as any to start?
  Simple
    DOHMMs, COHMMs, or MDCOHMMs
  HHMMs (Hierarchical)
    Learning Hierarchical Hidden Markov Models for Video Structure Discovery
  Entropic HMMs (Structure discovery)
    Discovery and Segmentation of Activities in Video
Temporal Pattern Recognition
Hidden Markov Models (HMMs) are statistical methods (stochastic networks) that model sequential patterns that arise from a set of observation sequences which are believed to have come from the process of interest.
HMMs are known for their application in areas such as natural speech recognition, word and symbol recognition, etc.
HMMs are a doubly embedded stochastic process with an underlying process that is not observable (hidden), but can only be observed through another set of stochastic processes that produce the sequence of observations.
[Figure: HMM state trellis with states 1, 2, …, K at each time step and observations x1, x2, x3, …, xK]
Mixture Density Continuous Observation HMM
HMM Problems
1) Given the observation sequence $O = O_1 O_2 O_3 \ldots O_t$ and a model $m = (A, B, \pi)$, how do we efficiently compute $P(O \mid m)$?
2) Given the observation sequence $O$ and a model $m$, how do we choose a corresponding state sequence $Q = q_1 q_2 q_3 \ldots q_t$ which is optimal in some meaningful sense?
3) How do we adjust the model parameters to maximize $P(O \mid m)$?
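The first two problems have standard dynamic-programming answers, the forward algorithm and the Viterbi algorithm. The sketch below shows both for a discrete observation HMM; the array layout (`A[i, j]` as the transition probability from state i to state j, `B[i, k]` as the probability of symbol k in state i) is an assumption for illustration. Problem 3 is usually handled with Baum-Welch (EM), which is not shown here.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Problem 1: P(O | m) via the forward recursion (no scaling, short sequences only)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def viterbi(A, B, pi, obs):
    """Problem 2: most likely state sequence, computed in the log domain."""
    log_A, log_B = np.log(A), np.log(B)
    delta = np.log(pi) + log_B[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + log_A       # scores[i, j]: best path into i, then i -> j
        back.append(scores.argmax(axis=0))    # best predecessor of each state j
        delta = scores.max(axis=0) + log_B[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):                 # trace the backpointers
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

For long sequences the forward recursion should be rescaled at each step (or run in the log domain) to avoid underflow.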
Structure Discovery
A serious problem related to the deployment of HMMs involves how to specify or learn the HMM model structure
Matthew Brand has proposed a method based on entropy to learn an "optimal" model structure
We might look at identifying a general way to learn the model structure in a simpler fashion, independent of the HMM type, since this will be used in not just a "lab" setting
I am presently looking into using Evolutionary Computing (EC) techniques to evolve and learn the HMM structure automatically
The difference would be related to the "compression" aspect and the small number of observation samples that Brand claims is sufficient
EP Overview
[Figure: evolutionary programming loop over HMMs. Parents in Generation t (HMMs with different state sets, e.g. S1 S2 S3, S1 S4, S2 S3, S1 S2) are scored with F(Pi); Mutation produces offspring scored with F(Oi); Selection over {P1, P2, P3, O1, O2, O3} yields Generation t+1.]
Walk before we start running
Initially
  Test how well the procedure works on a fully connected DOHMM when we only mutate the states (add and remove operators)
  Test a few different measures of complexity (the different fitness functions)
  Each chromosome in a generation acts like a seed to the next iteration's Baum-Welch algorithm
Later
  Consider a more complicated MDCOHMM model
  Try to derive a series of equations and mutation operators that can take an initial population estimated by Baum-Welch and evolve what was found (I believe that this would be a completely new technique)
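The initial experiment can be sketched roughly as below. This is a sketch under stated assumptions, not the planned implementation: the fitness (log-likelihood minus a per-parameter penalty) is just one hypothetical complexity measure, the population sizes are made up, and the Baum-Welch refinement that each chromosome would seed is left out, so the parameters stay random.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hmm(k, m):
    """Fully connected DOHMM with k states and m symbols: (A, B, pi)."""
    k = int(k)
    return (rng.dirichlet(np.ones(k), size=k),
            rng.dirichlet(np.ones(m), size=k),
            rng.dirichlet(np.ones(k)))

def log_likelihood(hmm, obs):
    """log P(O | m) with the scaled forward algorithm."""
    A, B, pi = hmm
    alpha = pi * B[:, obs[0]]
    ll = 0.0
    for o in obs[1:]:
        c = alpha.sum()
        ll += np.log(c)
        alpha = ((alpha / c) @ A) * B[:, o]
    return ll + np.log(alpha.sum())

def fitness(hmm, obs, penalty=0.5):
    """Hypothetical fitness: data fit minus a complexity penalty."""
    A, B, pi = hmm
    return log_likelihood(hmm, obs) - penalty * (A.size + B.size + pi.size)

def mutate(hmm):
    """Add or remove one state, then renormalize."""
    A, B, pi = hmm
    k, m = B.shape
    if k > 1 and rng.random() < 0.5:              # remove a random state
        keep = np.delete(np.arange(k), rng.integers(k))
        A, B, pi = A[np.ix_(keep, keep)], B[keep], pi[keep]
    else:                                         # add a state with random parameters
        A = np.pad(A, ((0, 1), (0, 1)))
        A[-1, :] = rng.dirichlet(np.ones(k + 1))
        A[:-1, -1] = rng.uniform(0.05, 0.3, size=k)
        B = np.vstack([B, rng.dirichlet(np.ones(m))])
        pi = np.append(pi, 1.0 / (k + 1))
    return A / A.sum(axis=1, keepdims=True), B, pi / pi.sum()

def evolve(obs, n_symbols, n_parents=3, generations=20):
    parents = [random_hmm(k, n_symbols) for k in rng.integers(2, 6, size=n_parents)]
    for _ in range(generations):
        offspring = [mutate(p) for p in parents]  # one offspring per parent
        pool = parents + offspring                # {P1, P2, P3, O1, O2, O3}
        pool.sort(key=lambda h: fitness(h, obs), reverse=True)
        parents = pool[:n_parents]                # selection
    return parents[0]
```

In the planned procedure, each offspring would first be re-estimated with Baum-Welch before its fitness is measured, so the selection would compare trained structures rather than random ones.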
Matthew Brand's Approach
The principle of maximum likelihood is not valid for small data sets; the training data is rarely enough to wash out the sampling artifacts (i.e. noise)
  He also leaves out the obvious question of whether we have enough observations to estimate all the different parameters in the network (the degrees of freedom)
  We may only have a small number of observations with a few "reflective" sub-observation sequences
He advocates replacing the Baum-Welch formulae with parameter estimators that minimize entropy
  Claim is that this exploits the duality between learning and compression
Entropy Minimization
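The equations from this slide did not survive, so what follows is a sketch of the idea as I understand it from Brand's entropic estimation work, not a transcription of the slide. The symbols are my own notation: $\theta$ is a multinomial parameter vector (for example one row of the transition matrix) and $\omega_i$ is the evidence (expected count) for $\theta_i$.

```latex
% Entropic prior: low-entropy (sparse) parameter vectors are favored
H(\theta) = -\sum_i \theta_i \log \theta_i ,
\qquad
P_e(\theta) \propto e^{-H(\theta)} = \prod_i \theta_i^{\theta_i}

% Posterior given evidence \omega
P(\theta \mid \omega) \propto \prod_i \theta_i^{\omega_i} \cdot \prod_i \theta_i^{\theta_i}
                      = \prod_i \theta_i^{\omega_i + \theta_i}

% Taking logs: the MAP estimate trades data fit against parameter entropy
\log P(\theta \mid \omega) = \sum_i \omega_i \log \theta_i - H(\theta) + \text{const}

% Compare with the maximum likelihood estimate used by Baum-Welch
\hat{\theta}_i^{\mathrm{ML}} = \frac{\omega_i}{\sum_j \omega_j}
```

With plenty of evidence the $\omega_i$ terms dominate and the estimate approaches maximum likelihood; with little evidence the prior pushes weakly supported parameters toward zero, which is what allows structure to be pruned.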
First Setup
Variety of activities, from picking up the phone (a few seconds) to activities such as writing (could take up to hours)
Used a "blob" representation consisting of ellipse parameters fitting the single largest connected set of active pixels
Background subtraction through identifying a statistical model of the background and an adaptive Gaussian color/location model (pixels that have changed and others due to motion)
Cleaned up the "blob" through dilation (he makes reference to using a seed from the previous frame)
Observation vector uses high level geometric features, calculated from the mean and eigenvectors of a 2D Gaussian fitted to the foreground pixels
30 minutes of data taken at random
  removed frames when no one is in the video
  roughly 21 minutes after this
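The ellipse parameters can be obtained from the mean and covariance of the foreground pixel coordinates, as in the sketch below. The exact feature set in Brand's observation vector is not listed in the slides, so the returned values (center, axis lengths of one standard deviation, orientation) are an assumed, illustrative choice.

```python
import numpy as np

def blob_ellipse(mask):
    """Fit a 2D Gaussian to the foreground pixels of a boolean mask.

    Returns (center_xy, axis_std, angle): the mean position, the standard
    deviations along the major and minor axes, and the major-axis angle in
    radians (defined only up to a flip of 180 degrees).
    Needs at least two foreground pixels.
    """
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)           # 2x2 covariance of (x, y)
    evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    major = evecs[:, 1]                       # eigenvector of the largest eigenvalue
    angle = float(np.arctan2(major[1], major[0]))
    axis_std = np.sqrt(evals[::-1])           # major axis first
    return center, axis_std, angle
```

Frame-to-frame differences of these values would give simple motion features as well.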
Training
Only three sequences used for training
Varied from 100 to 1,900 frames in length
Number of states = {12, 16, 20, 25, and 30}
Procedure 1: Model Activity
Procedure 2: Monitoring Traffic
Monitoring Simultaneous Processes
HMMs traditionally are used to model a single hidden process
Brand modified (don't know if he is the first, he claims this is novel) HMMs to take a varying number of observations per time step
The new image representation is a variable length list of flow vectors between two subsequent images
Flow vectors that are smaller than some predefined threshold are disregarded
The model learns the typical locations and directions of the moving pixels, and the dynamic changes of these patterns
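Building the variable-length flow list from a dense flow field could look like the sketch below. How the flow itself is computed is not covered in the slides, so the input is assumed to be an already estimated `(H, W, 2)` array of per-pixel `(dx, dy)`, and the threshold value is hypothetical.

```python
import numpy as np

def flow_list(flow, min_mag=1.0):
    """Turn a dense flow field into a variable-length list of flow vectors.

    flow: array of shape (H, W, 2) holding (dx, dy) for every pixel.
    Returns an (N, 4) array with one row (x, y, dx, dy) per retained vector;
    N changes from frame to frame.
    """
    mag = np.hypot(flow[..., 0], flow[..., 1])
    ys, xs = np.nonzero(mag >= min_mag)       # disregard small flow vectors
    return np.column_stack([xs, ys, flow[ys, xs, 0], flow[ys, xs, 1]])
```

A frame with no motion above the threshold gives an empty list, which is one reason the model needs a separate term for the number of observations.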
Internals
Brand uses a modified version of a multivariate Gaussian mixture model
He deals with multiple observations per time step by treating each frame's flow-list as an observation sequence for a mixture model at one time step
multi-observation-mixture+counter (MOMC) HMM
  First term is a distribution on the observation count
  The mixture Gaussians are 4D, observing flow vectors in (x, y, dx, dy) space
  The mixture components model motion in particular directions and locations
  The counter variable essentially models the combined surface area of the moving objects
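One plausible reading of the MOMC output distribution for a single hidden state is a count term multiplied by the mixture likelihood of every flow vector in the frame. The sketch below follows that reading. The Gaussian form of the count distribution and the diagonal covariances are my assumptions (the slide's equation did not survive), and all parameter names are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def momc_log_emission(flows, weights, means, variances, count_mean, count_var):
    """log P(frame | state) for one state of a MOMC-style HMM.

    flows:     (N, 4) array of (x, y, dx, dy) flow vectors for this frame
    weights:   (C,)   mixture weights summing to 1
    means:     (C, 4) component means
    variances: (C, 4) component variances (diagonal covariance, an assumption)
    count_mean, count_var: Gaussian on the number of flow vectors (an assumption)
    """
    n = len(flows)
    log_count = log_gauss_diag(np.array([float(n)]),
                               np.array([count_mean]), np.array([count_var]))
    if n == 0:
        return float(log_count)
    # comp[i, c] = log( w_c * N(flow_i; mean_c, var_c) )
    comp = np.log(weights) + log_gauss_diag(flows[:, None, :], means[None], variances[None])
    mx = comp.max(axis=1, keepdims=True)      # log-sum-exp over the components
    log_mix = mx[:, 0] + np.log(np.exp(comp - mx).sum(axis=1))
    return float(log_count + log_mix.sum())
```

This value would take the place of the usual per-state emission probability inside the forward and Viterbi recursions.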
Any Questions?
HMM Links
Hidden Markov Models (General Introductions)
  http://uirvli.ai.uiuc.edu/dugad/hmm_tut.html
  http://www.cse.ucsc.edu/research/compbio/html_format_papers/hughkrogh96/cabios.html
Baum-Welch algorithm and the EM (Simpler math derivation) (Bilmes)
  http://citeseer.ist.psu.edu/bilmes98gentle.html
Entropic Hidden Markov Models (Matthew Brand)
  Discovery and Segmentation of Activities in Video (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 22, No. 8, Aug 2000)
Fuzzy Hidden Markov Models (Gader and Mohammed)
  Generalized Hidden Markov Models – Part I: Theoretical Frameworks (IEEE Transactions on Fuzzy Systems, Vol 8, No 1, Feb 2000)