Hindawi Publishing Corporation, ISRN Machine Vision, Volume 2013, Article ID 215195, 7 pages, http://dx.doi.org/10.1155/2013/215195

Research Article

Affine-Invariant Feature Extraction for Activity Recognition
Samy Sadek,1 Ayoub Al-Hamadi,2 Gerald Krell,2 and Bernd Michaelis2

1 Department of Mathematics and Computer Science, Faculty of Science, Sohag University, 82524 Sohag, Egypt
2 Institute for Information Technology and Communications (IIKT), Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany

Correspondence should be addressed to Samy Sadek; samytechnik@gmail.com

Received 28 April 2013; Accepted 4 June 2013

Academic Editors: A. Gasteratos, D. P. Mukherjee, and A. Torsello

Copyright © 2013 Samy Sadek et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We propose an approach to human activity recognition based on affine-invariant shape representation and SVM-based feature classification. In this approach, a compact, computationally efficient, affine-invariant representation of action shapes is developed using affine moment invariants. Dynamic affine invariants are derived from the 3D spatio-temporal action volume and from the average image created from that volume, and are classified by an SVM classifier. On two standard benchmark action datasets (the KTH and Weizmann datasets), the approach yields promising results that compare favorably with those previously reported in the literature, while maintaining real-time performance.
1. Introduction

Visual recognition and interpretation of human-induced actions and events are among the most active research areas in the computer vision, pattern recognition, and image understanding communities [1]. Although a great deal of progress has been made in the automatic recognition of human actions during the last two decades, the approaches proposed in the literature remain limited in their ability, so much further research is needed to address the ongoing challenges and to develop more efficient approaches. Good algorithms for human action recognition would enable a large number of potential applications, for example, the search and structuring of large video archives, human-computer interaction, video surveillance, gesture recognition, and robot learning and control. In fact, the nonrigid nature of the human body and clothes in video sequences, drastic illumination changes, changes in pose, and erratic motion patterns together present a grand challenge to human detection and action recognition. In addition, while real-time performance is a major concern in computer vision, especially for embedded computer vision systems, most state-of-the-art human action recognition systems employ sophisticated feature extraction and learning techniques, creating a barrier to the real-time performance of these systems. This suggests a trade-off between accuracy and real-time performance.

The remainder of this paper commences by briefly reviewing the most relevant literature on human action recognition in Section 2. Section 3 then describes the details of the proposed method for action recognition. The experimental results corroborating the effectiveness of the proposed method are presented and analyzed in Section 4. Finally, Section 5 concludes and mentions possible future work.
2. The Literature Overview
The last few years have witnessed a resurgence of interest in the analysis and interpretation of human motion, motivated by the rise of security concerns and the increased ubiquity and affordability of digital media production equipment. Human action can generally be recognized using various visual cues such as motion [2, 3] and shape [4, 5]. Scanning the literature, one notices that a significant body of work in human action recognition focuses on spatio-temporal key points and local feature descriptors [6]. Local features are extracted from the region around each key point detected by the key point detection process. These features are then quantized to provide a discrete set of visual words before they are fed into the classification module. Another thread of research is concerned with analyzing patterns of motion to recognize human actions. For instance, in [7] periodic motions are detected and classified to recognize actions. Alternatively, some researchers have opted to use both motion and shape cues. In [8], the authors detect the similarity between video segments using a space-time correlation model. In [9], Rodriguez et al. present a template-based approach using a Maximum Average Correlation Height (MACH) filter to capture intraclass variabilities. Likewise, a significant amount of work is targeted at modelling and understanding human motions by constructing elaborate temporal dynamic models [10]. There is also an attractive area of research which focuses on using generative topic models for visual recognition based on the so-called Bag-of-Words (BoW) model [11]. The underlying concept of a BoW is that each video sequence is represented by counting the number of occurrences of descriptor prototypes, so-called visual words. Topic models are built and then applied to the BoW representation. Three examples of commonly used topic models are Correlated Topic Models (CTMs) [11], Latent Dirichlet Allocation (LDA) [12], and probabilistic Latent Semantic Analysis (pLSA) [13].

Figure 1: GMM background subtraction. The first and third rows display two sequences of walking and running actions from the KTH and Weizmann action datasets, respectively, while the second and fourth rows show the results of background subtraction, where foreground objects are shown in cyan color.
3. Proposed Methodology

In this section, the proposed method for action recognition is described. The main steps of the framework are explained in detail in the following subsections.
3.1. Background Subtraction. In this paper, we use a Gaussian Mixture Model (GMM) as a basis to model the background distribution. Formally speaking, let $X_t$ be a pixel in the current frame $I_t$, where $t$ is the frame index. Then each pixel can be modeled separately by a mixture of $K$ Gaussians:

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta\!\left(X_t;\, \mu_{i,t}, \Sigma_{i,t}\right), \tag{1}$$

where $\eta$ is a Gaussian probability density function; $\mu_{i,t}$, $\Sigma_{i,t}$, and $\omega_{i,t}$ are the mean, the covariance, and an estimate of the weight of the $i$th Gaussian in the mixture at time $t$, respectively; and $K$ is the number of distributions, which is set to 5 in our experiments. Before the foreground is detected, the background is updated (see [14] for details about the updating procedure). After the updates are done, the weights $\omega_{i,t}$ are normalized. By applying a threshold $T$ (set to 0.6 in our experiments), the background distributions remain on top with the lowest variance, where

$$B = \arg\min_b \left( \frac{\sum_{i=1}^{b} \omega_{i,t}}{\sum_{i=1}^{K} \omega_{i,t}} > T \right). \tag{2}$$

Finally, all pixels $X_t$ that match none of the components are good candidates to be marked as foreground. An example of GMM background subtraction can be seen in Figure 1.
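As an illustrative sketch (not the authors' implementation), the background-model selection of (2) and the matching test for a single pixel's mixture can be written with NumPy. The ranking of components by fitness $\omega/\sigma$ and the 2.5-sigma matching rule are common choices assumed here, not stated in the paper:

```python
import numpy as np

def background_components(weights, variances, T=0.6):
    """Select the Gaussians modeling the background, following Eq. (2):
    rank components by omega/sigma (most stable first) and keep the smallest
    prefix whose cumulative normalized weight exceeds the threshold T."""
    order = np.argsort(-weights / np.sqrt(variances))   # descending fitness
    w = weights[order] / weights.sum()                  # normalized weights
    B = int(np.searchsorted(np.cumsum(w), T, side="right") + 1)
    return order[:B]

def is_foreground(x, means, variances, bg_idx, k=2.5):
    """A pixel value x is foreground if it matches none of the background
    components (match = within k standard deviations of a component mean)."""
    d = np.abs(x - means[bg_idx]) / np.sqrt(variances[bg_idx])
    return bool(np.all(d > k))
```

In a full implementation, these tests would run per pixel per frame, with the means, variances, and weights updated online as in [14].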
3.2. Average Images from 3D Action Volumes. The 3D volume in the spatio-temporal ($XYT$) domain is formed by piling up the target region in the image sequences of one action cycle, which is used to partition the sequences for the spatio-temporal volume. An action cycle is a fundamental unit describing the action. In this work, we assume that the spatio-temporal volume consists of a number of small voxels. The average image $I_{av}(x, y)$ is defined as

$$I_{av}(x, y) = \frac{1}{\tau} \sum_{t=0}^{\tau - 1} I(x, y, t), \tag{3}$$

where $\tau$ is the number of frames in an action cycle (we use $\tau = 25$ in our experiments) and $I(x, y, t)$ represents the density of the voxels at time $t$. An example of an average image created from the 3D spatio-temporal volume of a running sequence is shown in Figure 2. For characterizing these 2D average images, the 2D affine moment invariants are considered as features [26].

Figure 2: 2D average image created from the 3D spatio-temporal volume of a walking sequence.
3.3. Feature Extraction. As is well known, moments describe the shape properties of an object as it appears. Affine moment invariants are moment-based descriptors which are invariant under a general affine transform. Six affine moment invariants can be conventionally derived from the central moments [27] as follows:

$$I_1 = \frac{1}{\eta_{00}^{4}} \left[ \eta_{20}\eta_{02} - \eta_{11}^{2} \right],$$

$$I_2 = \frac{1}{\eta_{00}^{10}} \left[ \eta_{30}^{2}\eta_{03}^{2} - 6\eta_{30}\eta_{21}\eta_{12}\eta_{03} + 4\eta_{30}\eta_{12}^{3} + 4\eta_{03}\eta_{21}^{3} - 3\eta_{21}^{2}\eta_{12}^{2} \right],$$

$$I_3 = \frac{1}{\eta_{00}^{7}} \left[ \eta_{20}\left(\eta_{21}\eta_{03} - \eta_{12}^{2}\right) - \eta_{11}\left(\eta_{30}\eta_{03} - \eta_{21}\eta_{12}\right) + \eta_{02}\left(\eta_{30}\eta_{12} - \eta_{21}^{2}\right) \right],$$

$$\begin{aligned}
I_4 = \frac{1}{\eta_{00}^{11}} \big[ &\eta_{20}^{3}\eta_{03}^{2} - 6\eta_{20}^{2}\eta_{11}\eta_{12}\eta_{03} - 6\eta_{20}^{2}\eta_{02}\eta_{21}\eta_{03} + 9\eta_{20}^{2}\eta_{02}\eta_{12}^{2} \\
&+ 12\eta_{20}\eta_{11}^{2}\eta_{21}\eta_{03} + 6\eta_{20}\eta_{11}\eta_{02}\eta_{30}\eta_{03} - 18\eta_{20}\eta_{11}\eta_{02}\eta_{21}\eta_{12} \\
&- 8\eta_{11}^{3}\eta_{30}\eta_{03} - 6\eta_{20}\eta_{02}^{2}\eta_{30}\eta_{12} + 9\eta_{20}\eta_{02}^{2}\eta_{21}^{2} \\
&+ 12\eta_{11}^{2}\eta_{02}\eta_{30}\eta_{12} - 6\eta_{11}\eta_{02}^{2}\eta_{30}\eta_{21} + \eta_{02}^{3}\eta_{30}^{2} \big],
\end{aligned}$$

$$I_5 = \frac{1}{\eta_{00}^{6}} \left[ \eta_{40}\eta_{04} - 4\eta_{31}\eta_{13} + 3\eta_{22}^{2} \right],$$

$$I_6 = \frac{1}{\eta_{00}^{9}} \left[ \eta_{40}\eta_{04}\eta_{22} + 2\eta_{31}\eta_{13}\eta_{22} - \eta_{40}\eta_{13}^{2} - \eta_{04}\eta_{31}^{2} - \eta_{22}^{3} \right], \tag{4}$$
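As a hedged sketch of how such features can be computed (the helper names are ours, not the paper's), the first invariant $I_1$ follows directly from the central moments. $I_1$ is, up to normalization, the determinant of the second-moment matrix, so it is exactly preserved under a 90-degree rotation:

```python
import numpy as np

def central_moment(img, p, q):
    """2D central moment eta_pq of a grayscale (or binary silhouette) image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xg, yg = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xg) ** p * (y - yg) ** q * img).sum()

def affine_invariant_I1(img):
    """First affine moment invariant: I1 = (eta20*eta02 - eta11^2) / eta00^4."""
    e = lambda p, q: central_moment(img, p, q)
    return (e(2, 0) * e(0, 2) - e(1, 1) ** 2) / e(0, 0) ** 4
```

The remaining five invariants in (4) are computed analogously from the higher-order central moments.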
where $\eta_{pq}$ is the central moment of order $p + q$.

For a spatio-temporal ($XYT$) space, the 3D moment of order $(p + q + r)$ of a 3D object $O$ is derived using the same procedure as the 2D centralized moment:

$$\eta_{pqr} = \sum_{(x, y, t) \in O} (x - x_g)^{p} (y - y_g)^{q} (t - t_g)^{r}\, I(x, y, t), \tag{5}$$

where $(x_g, y_g, t_g)$ is the centroid of the object in the spatio-temporal space. Based on the definition of the 3D moment in (5), six 3D affine moment invariants can be defined. The first two of these moment invariants are given by

$$J_1 = \frac{1}{\eta_{000}^{5}} \left[ \eta_{200}\eta_{020}\eta_{002} + 2\eta_{110}\eta_{101}\eta_{011} - \eta_{200}\eta_{011}^{2} - \eta_{020}\eta_{101}^{2} - \eta_{002}\eta_{110}^{2} \right],$$

$$\begin{aligned}
J_2 = \frac{1}{\eta_{000}^{7}} \big[ &\eta_{400}\left(\eta_{040}\eta_{004} + 3\eta_{022}^{2} - 4\eta_{013}\eta_{031}\right) \\
&+ 3\eta_{202}\left(\eta_{040}\eta_{202} - 4\eta_{112}\eta_{130} + 4\eta_{121}^{2}\right) \\
&+ 12\eta_{211}\left(\eta_{022}\eta_{211} + \eta_{103}\eta_{130} - \eta_{031}\eta_{202} - \eta_{121}\eta_{112}\right) \\
&+ 4\eta_{310}\left(\eta_{031}\eta_{103} - \eta_{004}\eta_{220} + 3\eta_{013}\eta_{121} - 3\eta_{022}\eta_{112}\right) \\
&+ 3\eta_{220}\left(\eta_{004}\eta_{220} + 2\eta_{022}\eta_{202} + 4\eta_{112}^{2} - 4\eta_{013}\eta_{311} - 4\eta_{121}\eta_{103}\right) \\
&+ 4\eta_{301}\left(\eta_{013}\eta_{130} - \eta_{040}\eta_{103} + 3\eta_{031}\eta_{112} - 3\eta_{022}\eta_{121}\right) \big]. \tag{6}
\end{aligned}$$
Due to their long formulae, the remaining four moment invariants are not displayed here (refer to [28]). Figure 3 shows a series of plots of the 2D dynamic affine invariants for the different action classes, computed on the average images of the action sequences.

Figure 3: Plots of 2D affine moment invariants ($I_i$, $i = 1, \dots, 6$) computed on the average images of walking, jogging, running, boxing, waving, and clapping sequences.
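A direct NumPy transcription of the 3D central moment (5) and the first invariant $J_1$ of (6) can be sketched as follows (helper names are ours; the volume is assumed to be an array of voxel densities):

```python
import numpy as np

def central_moment_3d(volume, p, q, r):
    """3D central moment eta_pqr of a spatio-temporal volume, Eq. (5).
    volume has shape (T, H, W); entries are the voxel densities I(x, y, t)."""
    t, y, x = np.mgrid[:volume.shape[0], :volume.shape[1], :volume.shape[2]]
    m = volume.sum()
    xg = (x * volume).sum() / m
    yg = (y * volume).sum() / m
    tg = (t * volume).sum() / m
    return ((x - xg) ** p * (y - yg) ** q * (t - tg) ** r * volume).sum()

def J1(volume):
    """First 3D affine moment invariant of Eq. (6)."""
    e = lambda p, q, r: central_moment_3d(volume, p, q, r)
    num = (e(2, 0, 0) * e(0, 2, 0) * e(0, 0, 2)
           + 2 * e(1, 1, 0) * e(1, 0, 1) * e(0, 1, 1)
           - e(2, 0, 0) * e(0, 1, 1) ** 2
           - e(0, 2, 0) * e(1, 0, 1) ** 2
           - e(0, 0, 2) * e(1, 1, 0) ** 2)
    return num / e(0, 0, 0) ** 5
```

Note that $J_1$ is symmetric under permutations of the $x$, $y$, and $t$ axes, which is a useful sanity check for an implementation.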
3.4. Action Classification Using SVM. In this section, we formulate action recognition as a multiclass learning problem, where there is one class for each action and the goal is to assign an action to the individual in each video sequence [1, 29]. There are various supervised learning algorithms by which an action recognizer can be trained; Support Vector Machines (SVMs) are used in this work due to their outstanding generalization capability and their reputation as a highly accurate paradigm [30]. SVMs, which provide a principled remedy for the overfitting seen in neural networks, are based on the structural risk minimization principle from computational learning theory. Originally, SVMs were designed to handle dichotomic classes: in a higher-dimensional space, a maximal separating hyperplane is created, and on each side of this hyperplane two parallel hyperplanes are constructed. The SVM then attempts to find the separating hyperplane that maximizes the distance between the two parallel hyperplanes (see Figure 4). Intuitively, a good separation is achieved by the hyperplane having the largest distance; hence, the larger the margin, the lower the generalization error of the classifier. Formally, let $\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\}$ be a training dataset.

Figure 4: Generalized optimal separating hyperplane.

Vapnik [30] shows that the problem is best addressed by allowing some examples to violate the margin constraints. These potential violations are formulated with positive slack variables $\xi_i$ and a penalty parameter $C \ge 0$ that penalizes the margin violations. Thus, the generalized optimal separating hyperplane is determined by solving the following quadratic programming problem:

$$\min_{\beta,\, \beta_0}\ \frac{1}{2} \left\| \beta \right\|^{2} + C \sum_i \xi_i \tag{7}$$

subject to $y_i\left(\langle \mathbf{x}_i, \beta \rangle + \beta_0\right) \ge 1 - \xi_i\ \forall i$ and $\xi_i \ge 0\ \forall i$.
Geometrically, $\beta \in \mathbb{R}^d$ is a vector perpendicular to the separating hyperplane. The offset parameter $\beta_0$ is added so that the margin can increase without forcing the hyperplane to pass through the origin, which would restrict the solution. For computational purposes, it is more convenient to solve the SVM in its dual formulation. This can be accomplished by forming the Lagrangian and then optimizing over the Lagrange multipliers $\alpha$. The resulting decision function has weight vector $\beta = \sum_i \alpha_i y_i \mathbf{x}_i$, with $0 \le \alpha_i \le C$. The instances $\mathbf{x}_i$ with $\alpha_i > 0$ are called support vectors, as they uniquely define the maximum-margin hyperplane.

In the current approach, several classes of actions are created, and several one-versus-all SVM classifiers are trained using affine moment features extracted from the action sequences in the training dataset. For each action sequence, a set of six 2D affine moment invariants is extracted from the average image, and another set of six 3D affine moment invariants is extracted from the spatio-temporal silhouette sequence. SVM classifiers are then trained on these features to learn the various categories of actions.
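The soft-margin objective (7) can be illustrated with a toy sub-gradient solver for the linear, primal case. This is only a sketch of the optimization problem, not the paper's training procedure (which uses RBF-kernel SVMs via a standard solver):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Sub-gradient descent on the primal objective of Eq. (7):
    (1/2)||beta||^2 + C * sum_i max(0, 1 - y_i * (<x_i, beta> + beta_0))."""
    rng = np.random.default_rng(seed)
    beta, beta0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ beta + beta0) < 1:    # margin violated: xi_i > 0
                beta = (1 - lr) * beta + lr * C * y[i] * X[i]
                beta0 += lr * C * y[i]
            else:                                   # only the regularizer acts
                beta = (1 - lr) * beta
    return beta, beta0

def predict(X, beta, beta0):
    return np.sign(X @ beta + beta0)
```

A one-versus-all multiclass recognizer, as used here, trains one such classifier per action class and assigns a test sequence to the class with the largest decision value.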
4. Experiments and Results

To evaluate the proposed approach, two main experiments were carried out, and the results we achieved were compared with those reported by other state-of-the-art methods.
4.1. Experiment 1. We conducted this experiment using the KTH action dataset [31]. To illustrate the effectiveness of the method, the obtained results are compared with those of other similar state-of-the-art methods. The KTH dataset contains action sequences comprising six types of human actions (i.e., walking, jogging, running, boxing, hand waving, and hand clapping). These actions are performed by a total of 25 individuals in four different settings (i.e., outdoors, outdoors with scale variation, outdoors with different clothes, and indoors). All sequences were acquired by a static camera at 25 fps and a spatial resolution of 160 × 120 pixels over homogeneous backgrounds. To the best of our knowledge, no other similar dataset of sequences acquired in such varied environments is available in the literature. In order to prepare the experiments and to provide an unbiased estimate of the generalization abilities of the classification process, a set of sequences (75% of all sequences) performed by 18 subjects was used for training, and the other sequences (the remaining 25%) performed by the other 7 subjects were set aside as a test set. SVMs with a Gaussian radial basis function (RBF) kernel are trained on the training set, while the evaluation of recognition performance is performed on the test set.
The confusion matrix showing the recognition results achieved on the KTH action dataset is given in Table 1, while a comparison of the obtained results with those obtained by other methods available in the literature is shown in Table 3. As follows from the figures tabulated in Table 1, most actions are correctly classified. Furthermore, there is a high distinction between arm actions and leg actions. Most of the mistakes, where confusions occur, are between the "jogging" and "running" actions and between the "boxing" and "clapping" actions. This is intuitively plausible, owing to the high similarity within each of these pairs of actions. From the comparison given in Table 3, it turns out that our method performs competitively with other state-of-the-art methods. It is pertinent to mention here that the state-of-the-art methods with which we compare our method used the same dataset and the same experimental conditions; therefore, the comparison seems to be quite fair.

Table 1: Confusion matrix for the KTH dataset.

Action   | Walking | Running | Jogging | Boxing | Waving | Clapping
Walking  |  0.94   |  0.00   |  0.04   |  0.00  |  0.00  |  0.00
Running  |  0.01   |  0.96   |  0.08   |  0.00  |  0.00  |  0.00
Jogging  |  0.05   |  0.04   |  0.88   |  0.00  |  0.00  |  0.00
Boxing   |  0.00   |  0.00   |  0.00   |  0.94  |  0.02  |  0.01
Waving   |  0.00   |  0.00   |  0.00   |  0.02  |  0.93  |  0.03
Clapping |  0.00   |  0.00   |  0.00   |  0.04  |  0.05  |  0.96
4.2. Experiment 2. This second experiment was conducted using the Weizmann action dataset provided by Blank et al. [32] in 2005, which contains a total of 90 video clips (i.e., 5098 frames) performed by 9 individuals. Each video clip contains one person performing an action. There are 10 categories of action in the dataset, namely walking, running, jumping, jumping in place, bending, jacking, skipping, galloping sideways, one-hand waving, and two-hand waving. All the clips in the dataset are sampled at 25 Hz and last about 2 seconds, with an image frame size of 180 × 144. In order to provide an unbiased estimate of the generalization abilities of the proposed method, we have used the leave-one-out cross-validation (LOOCV) technique in the validation process. As the name suggests, this involves using the group of sequences from a single subject in the original dataset as the testing data and the remaining sequences as the training data. This is repeated such that each group of sequences in the dataset is used once for validation. Again, as with the first experiment, SVMs with a Gaussian RBF kernel are trained on the training set, while the evaluation of recognition performance is performed on the test set.
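The leave-one-subject-out protocol described above can be sketched generically; `train_fn` and `predict_fn` are hypothetical stand-ins for the SVM training and prediction routines:

```python
import numpy as np

def leave_one_subject_out(features, labels, subjects, train_fn, predict_fn):
    """Leave-one-subject-out cross-validation: all sequences of one subject
    form the test fold; the classifier is retrained on the rest each round.
    Returns the mean per-fold accuracy."""
    accuracies = []
    for s in np.unique(subjects):
        test = subjects == s
        model = train_fn(features[~test], labels[~test])
        predictions = predict_fn(model, features[test])
        accuracies.append(np.mean(predictions == labels[test]))
    return float(np.mean(accuracies))
```

Splitting by subject rather than by clip keeps sequences from the same person out of both folds, which is what makes the accuracy estimate unbiased.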
The confusion matrix in Table 2 provides the recognition results obtained by the proposed method, where correct responses define the main diagonal. A number of points can be drawn from the figures in the matrix. The majority of actions are correctly classified, and an average recognition rate of 97.8% is achieved with the proposed method. What is more, there is a clear distinction between arm actions and leg actions. The only confusions occur between the skip and jump actions and between the jump and run actions, which intuitively seems reasonable given the high similarity within each of these pairs of actions. In order to quantify the effectiveness of the method, the obtained results are compared with those obtained previously by other investigators. The outcome of this comparison is presented in Table 3. In the light of this comparison, one can see that the proposed method is competitive with the state-of-the-art methods.
Table 2: Confusion matrix for the Weizmann dataset.

Action | Bend | Jump | Pjump | Walk | Run  | Side | Jack | Skip | Wave 1 | Wave 2
Bend   | 1.00 | 0.00 | 0.00  | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00   | 0.00
Jump   | 0.00 | 1.00 | 0.00  | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00   | 0.00
Pjump  | 0.00 | 0.00 | 1.00  | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00   | 0.00
Walk   | 0.00 | 0.00 | 0.00  | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00   | 0.00
Run    | 0.00 | 0.00 | 0.00  | 0.00 | 0.90 | 0.00 | 0.00 | 0.10 | 0.00   | 0.00
Side   | 0.00 | 0.00 | 0.00  | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00   | 0.00
Jack   | 0.00 | 0.00 | 0.00  | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00   | 0.00
Skip   | 0.00 | 0.00 | 0.00  | 0.00 | 0.10 | 0.00 | 0.00 | 0.90 | 0.00   | 0.00
Wave 1 | 0.00 | 0.00 | 0.00  | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00   | 0.00
Wave 2 | 0.00 | 0.00 | 0.00  | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00   | 1.00
Table 3: Comparison with the state of the art on the KTH and Weizmann datasets.

Method                  | KTH (%) | Weizmann (%)
Our method              |  93.5   |  98.0
Liu and Shah [15]       |  92.8   |  —
Wang and Mori [16]      |  92.5   |  —
Jhuang et al. [17]      |  91.7   |  —
Rodriguez et al. [9]    |  88.6   |  —
Rapantzikos et al. [18] |  88.3   |  —
Dollar et al. [19]      |  81.2   |  —
Ke et al. [20]          |  63.0   |  —
Fathi and Mori [21]     |   —     |  100
Bregonzio et al. [22]   |   —     |  96.6
Zhang et al. [23]       |   —     |  92.8
Niebles et al. [24]     |   —     |  90.0
Dollar et al. [19]      |   —     |  85.2
Klaser et al. [25]      |   —     |  84.3
It is worthwhile to mention that all the methods with which we compared our method, except the method proposed in [21], used similar experimental setups; thus the comparison seems to be meaningful and fair. A final remark concerns the real-time performance of our approach: the proposed action recognizer runs at 18 fps on average (using a 2.8 GHz Intel dual-core machine with 4 GB of RAM running 32-bit Windows 7 Professional).
5. Conclusion and Future Work

In this paper, we have introduced an approach to activity recognition based on affine moment invariants for activity representation and SVMs for feature classification. On two benchmark action datasets, the results obtained by the proposed approach compare favorably with those published in the literature. The primary focus of our future work will be to empirically validate the approach on more realistic datasets presenting many technical challenges in data handling, such as object articulation, occlusion, and significant background clutter.
References
[1] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Recognizing human actions: a fuzzy approach via chord-length shape features," ISRN Machine Vision, vol. 1, pp. 1–9, 2012.
[2] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), vol. 2, pp. 726–733, October 2003.
[3] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Towards robust human action retrieval in video," in Proceedings of the British Machine Vision Conference (BMVC '10), Aberystwyth, UK, September 2010.
[4] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human activity recognition: a scheme using multiple cues," in Proceedings of the International Symposium on Visual Computing (ISVC '10), vol. 1, pp. 574–583, Las Vegas, Nev, USA, November 2010.
[5] S. Sadek, A. Al-Hamadi, M. Elmezain, B. Michaelis, and U. Sayed, "Human activity recognition via temporal moment invariants," in Proceedings of the 10th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT '10), pp. 79–84, Luxor, Egypt, December 2010.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity," EURASIP Journal on Advances in Signal Processing, vol. 2011, Article ID 540375, 2011.
[7] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781–796, 2000.
[8] E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 405–412, June 2005.
[9] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[10] N. Ikizler and D. Forsyth, "Searching video for complex activities with finite state models," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[11] D. M. Blei and J. D. Lafferty, "Correlated topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 147–154, 2006.
[12] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[13] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 50–57, 1999.
[14] S. J. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing, vol. 17, no. 3-4, pp. 225–231, 1999.
[15] J. Liu and M. Shah, "Learning human actions via information maximization," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[16] Y. Wang and G. Mori, "Max-margin hidden conditional random fields for human action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 872–879, June 2009.
[17] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 257–267, October 2007.
[18] K. Rapantzikos, Y. Avrithis, and S. Kollias, "Dense saliency-based spatiotemporal feature points for action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1454–1461, June 2009.
[19] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS '05), pp. 65–72, October 2005.
[20] Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 166–173, October 2005.
[21] A. Fathi and G. Mori, "Action recognition by learning mid-level motion features," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[22] M. Bregonzio, S. Gong, and T. Xiang, "Recognising action as clouds of space-time interest points," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1948–1955, June 2009.
[23] Z. Zhang, Y. Hu, S. Chan, and L.-T. Chia, "Motion context: a new representation for human action recognition," in Proceedings of the European Conference on Computer Vision (ECCV '08), vol. 4, pp. 817–829, 2008.
[24] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299–318, 2008.
[25] A. Klaser, M. Marszaek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the British Machine Vision Conference (BMVC '08), 2008.
[26] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human action recognition via affine moment invariants," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 218–221, Tsukuba Science City, Japan, November 2012.
[27] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, no. 1, pp. 167–174, 1993.
[28] D. Xu and H. Li, "3-D affine moment invariants generated by geometric primitives," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 544–547, August 2006.
[29] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An SVM approach for activity recognition based on chord-length-function shape features," in Proceedings of the IEEE International Conference on Image Processing (ICIP '12), pp. 767–770, Orlando, Fla, USA, October 2012.
[30] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[31] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 32–36, 2004.
[32] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1395–1402, October 2005.
2 ISRNMachine Vision
Figure 1 GMM background subtraction the first and third rows display two sequences of walking and running actions from KTH andWeizmann action datasets respectively while the second and fourth rows show the results of background subtraction where foregroundobjects are shown in cyan color
into the classification module Another thread of research isconcerned with analyzing patterns of motion to recognizehuman actions For instance in [7] periodic motions aredetected and classified to recognize actions Alternativelysome researchers have opted to use both motion and shapecues In [8] the authors detect the similarity between videosegments using a space-time correlation model In [9]Rodriguez et al present a template-based approach usinga Maximum Average Correlation Height (MACH) filter tocapture intraclass variabilities Likewise a significant amountof work is targeted at modelling and understanding humanmotions by constructing elaborated temporal dynamic mod-els [10] There is also an attractive area of research whichfocuses on using generative topic models for visual recogni-tion based on the so-called Bag-of-Words (BoW) model [11]The underlying concept of a BoW is that each video sequenceis represented by counting the number of occurrences ofdescriptor prototypes so-called visual words Topic modelsare built and then applied to the BoW representation Threeexamples of commonly used topic models include Corre-lated Topic Models (CTMs) [11] Latent Dirichlet Alloca-tion (LDA) [12] and probabilistic Latent Semantic Analysis(pLSA) [13]
3. Proposed Methodology

In this section, the proposed method for action recognition is described. The main steps of the framework are explained in detail in the following subsections.
3.1. Background Subtraction. In this paper, we use a Gaussian Mixture Model (GMM) as a basis to model the background distribution. Formally speaking, let X_t be a pixel in the current frame I_t, where t is the frame index. Then each pixel can be modeled separately by a mixture of K Gaussians:

P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \, \eta(X_t, \mu_{i,t}, \Sigma_{i,t}),    (1)

where \eta is a Gaussian probability density function; \mu_{i,t}, \Sigma_{i,t}, and \omega_{i,t} are the mean, covariance, and an estimate of the weight of the i-th Gaussian in the mixture at time t, respectively; and K is the number of distributions, which is set to 5 in our experiments. Before the foreground is detected, the background is updated (see [14] for details about the updating procedure). After the updates are done, the weights \omega_{i,t} are normalized. By applying a threshold T (set to 0.6 in our experiments), the first B distributions, ordered by decreasing weight and increasing variance, are retained as the background model, where

B = \arg\min_{b} \left( \frac{\sum_{i=1}^{b} \omega_{i,t}}{\sum_{i=1}^{K} \omega_{i,t}} > T \right).    (2)

Finally, all pixels X_t that match none of these components are good candidates to be marked as foreground. An example of GMM background subtraction can be seen in Figure 1.
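As an illustrative sketch of the selection rule in (2) and the foreground test, the per-pixel decision can be written as follows. The matching criterion (a pixel matches a component if it lies within k standard deviations of its mean, with k = 2.5), the scalar grayscale variances, and all names are our assumptions, not the authors' implementation:

```python
import numpy as np

def background_components(weights, sigmas, T=0.6):
    """Rank mixture components by weight/sigma (most stable first) and keep
    the first B whose cumulative normalized weight exceeds T, as in Eq. (2)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize the weights
    order = np.argsort(-(w / np.asarray(sigmas)))     # descending omega/sigma
    cum = np.cumsum(w[order])
    B = int(np.searchsorted(cum, T) + 1)              # smallest b with cumulative > T
    return order[:B]

def is_foreground(x, means, sigmas, bg, k=2.5):
    """A pixel is foreground if it matches none of the background components,
    i.e. lies more than k standard deviations from each background mean."""
    return all(abs(x - means[i]) > k * sigmas[i] for i in bg)
```

For example, with weights (0.5, 0.3, 0.2) and standard deviations (2, 3, 10), the first two components form the background model, and a pixel value near one of their means is classified as background.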
3.2. Average Images from 3D Action Volumes. The 3D volume in the spatio-temporal (XYT) domain is formed by piling up the target region in the image sequences over one action cycle, which is used to partition the sequences for the spatiotemporal volume. An action cycle is a fundamental unit to describe the action. In this work, we assume that the spatio-temporal volume consists of a number of small voxels. The average image I_av(x, y) is defined as

I_{av}(x, y) = \frac{1}{\tau} \sum_{t=0}^{\tau-1} I(x, y, t),    (3)

where \tau is the number of frames in an action cycle (we use \tau = 25 in our experiments) and I(x, y, t) represents the density of the voxels at time t. An example of the average image created from the 3D spatio-temporal volume of a running sequence is shown in Figure 2. For characterizing these 2D average images, the 2D affine moment invariants are considered as features [26].

Figure 2: 2D average image created from the 3D spatio-temporal volume of a walking sequence.
3.3. Feature Extraction. As is well known, moments describe shape properties of an object as it appears. Affine moment invariants are moment-based descriptors which are invariant under a general affine transform. Six affine moment invariants can be conventionally derived from the central moments [27] as follows:

I_1 = \frac{1}{\eta_{00}^{4}} \left[ \eta_{20}\eta_{02} - \eta_{11}^{2} \right],

I_2 = \frac{1}{\eta_{00}^{10}} \left[ \eta_{03}^{2}\eta_{30}^{2} - 6\eta_{30}\eta_{21}\eta_{12}\eta_{03} + 4\eta_{30}\eta_{12}^{3} + 4\eta_{03}\eta_{21}^{3} - 3\eta_{21}^{2}\eta_{12}^{2} \right],

I_3 = \frac{1}{\eta_{00}^{7}} \left[ \eta_{20}(\eta_{21}\eta_{03} - \eta_{12}^{2}) - \eta_{11}(\eta_{30}\eta_{03} - \eta_{21}\eta_{12}) + \eta_{02}(\eta_{30}\eta_{12} - \eta_{21}^{2}) \right],

I_4 = \frac{1}{\eta_{00}^{11}} \left[ \eta_{20}^{3}\eta_{03}^{2} - 6\eta_{20}^{2}\eta_{11}\eta_{12}\eta_{03} - 6\eta_{20}^{2}\eta_{02}\eta_{21}\eta_{03} + 9\eta_{20}^{2}\eta_{02}\eta_{12}^{2} + 12\eta_{20}\eta_{11}^{2}\eta_{21}\eta_{03} + 6\eta_{20}\eta_{11}\eta_{02}\eta_{30}\eta_{03} - 18\eta_{20}\eta_{11}\eta_{02}\eta_{21}\eta_{12} - 8\eta_{11}^{3}\eta_{30}\eta_{03} - 6\eta_{20}\eta_{02}^{2}\eta_{30}\eta_{12} + 9\eta_{20}\eta_{02}^{2}\eta_{21}^{2} + 12\eta_{11}^{2}\eta_{02}\eta_{30}\eta_{12} - 6\eta_{11}\eta_{02}^{2}\eta_{30}\eta_{21} + \eta_{02}^{3}\eta_{30}^{2} \right],

I_5 = \frac{1}{\eta_{00}^{6}} \left[ \eta_{40}\eta_{04} - 4\eta_{31}\eta_{13} + 3\eta_{22}^{2} \right],

I_6 = \frac{1}{\eta_{00}^{9}} \left[ \eta_{40}\eta_{04}\eta_{22} + 2\eta_{31}\eta_{13}\eta_{22} - \eta_{40}\eta_{13}^{2} - \eta_{04}\eta_{31}^{2} - \eta_{22}^{3} \right],    (4)
where \eta_{pq} is the central moment of order p + q.

For a spatio-temporal (XYT) space, the 3D moment of order (p + q + r) of a 3D object O is derived using the same procedure as the 2D centralized moment:
\eta_{pqr} = \sum_{(x, y, t) \in O} (x - x_g)^{p} (y - y_g)^{q} (t - t_g)^{r} \, I(x, y, t),    (5)

where (x_g, y_g, t_g) is the centroid of the object in the spatio-temporal space. Based on the definition of the 3D moment in (5), six 3D affine moment invariants can be defined. The first two of these moment invariants are given by
J_1 = \frac{1}{\eta_{000}^{5}} \left[ \eta_{200}\eta_{020}\eta_{002} + 2\eta_{110}\eta_{101}\eta_{011} - \eta_{200}\eta_{011}^{2} - \eta_{020}\eta_{101}^{2} - \eta_{002}\eta_{110}^{2} \right],

J_2 = \frac{1}{\eta_{000}^{7}} \big[ \eta_{400}(\eta_{040}\eta_{004} + 3\eta_{022}^{2} - 4\eta_{013}\eta_{031}) + 3\eta_{202}(\eta_{040}\eta_{202} - 4\eta_{112}\eta_{130} + 4\eta_{121}^{2}) + 12\eta_{211}(\eta_{022}\eta_{211} + \eta_{103}\eta_{130} - \eta_{031}\eta_{202} - \eta_{121}\eta_{112}) + 4\eta_{310}(\eta_{031}\eta_{103} - \eta_{004}\eta_{220} + 3\eta_{013}\eta_{121} - 3\eta_{022}\eta_{112}) + 3\eta_{220}(\eta_{004}\eta_{220} + 2\eta_{022}\eta_{202} + 4\eta_{112}^{2} - 4\eta_{013}\eta_{211} - 4\eta_{121}\eta_{103}) + 4\eta_{301}(\eta_{013}\eta_{130} - \eta_{040}\eta_{103} + 3\eta_{031}\eta_{112} - 3\eta_{022}\eta_{121}) \big].    (6)
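The 3D central moments of (5) and the first invariant J_1 can be sketched directly in code; the voxel volume below is a toy array standing in for a real silhouette stack, and the function names are ours:

```python
import numpy as np

def central_moment_3d(vol, p, q, r):
    """3D central moment eta_pqr of a spatio-temporal volume (Eq. 5);
    vol is indexed as vol[t, y, x] and treated as a voxel density."""
    t, y, x = np.mgrid[0:vol.shape[0], 0:vol.shape[1], 0:vol.shape[2]].astype(float)
    m000 = vol.sum()
    xg, yg, tg = (x * vol).sum() / m000, (y * vol).sum() / m000, (t * vol).sum() / m000
    return ((x - xg) ** p * (y - yg) ** q * (t - tg) ** r * vol).sum()

def j1(vol):
    """First 3D affine moment invariant J1 of Eq. (6)."""
    e = lambda p, q, r: central_moment_3d(vol, p, q, r)
    return (e(2, 0, 0) * e(0, 2, 0) * e(0, 0, 2)
            + 2 * e(1, 1, 0) * e(1, 0, 1) * e(0, 1, 1)
            - e(2, 0, 0) * e(0, 1, 1) ** 2
            - e(0, 2, 0) * e(1, 0, 1) ** 2
            - e(0, 0, 2) * e(1, 1, 0) ** 2) / e(0, 0, 0) ** 5
```

Note that the numerator of J_1 is exactly the determinant of the 3 × 3 second-order central moment matrix, so it is positive for any nondegenerate volume and unchanged when the volume is translated (e.g. zero-padded).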
Due to their long formulae, the remaining four moment invariants are not displayed here (refer to [28]). Figure 3 shows a series of plots of 2D dynamic affine invariants for different action classes, computed on the average images of action sequences.

Figure 3: Plots of the 2D affine moment invariants (I_i, i = 1, ..., 6) computed on the average images of walking, jogging, running, boxing, waving, and clapping sequences.
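For illustration, the first three invariants of (4) can be computed directly from an average image. The helper names are ours, and only I_1 through I_3 are shown for brevity; this is a sketch, not the authors' implementation:

```python
import numpy as np

def central_moment(img, p, q):
    """2D central moment eta_pq of an image treated as a mass density."""
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    m00 = img.sum()
    xg, yg = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xg) ** p * (y - yg) ** q * img).sum()

def affine_invariants(img):
    """First three affine moment invariants I1-I3 of Eq. (4)."""
    e = lambda p, q: central_moment(img, p, q)
    n00 = e(0, 0)
    i1 = (e(2, 0) * e(0, 2) - e(1, 1) ** 2) / n00 ** 4
    i2 = (e(0, 3) ** 2 * e(3, 0) ** 2
          - 6 * e(3, 0) * e(2, 1) * e(1, 2) * e(0, 3)
          + 4 * e(3, 0) * e(1, 2) ** 3
          + 4 * e(0, 3) * e(2, 1) ** 3
          - 3 * e(2, 1) ** 2 * e(1, 2) ** 2) / n00 ** 10
    i3 = (e(2, 0) * (e(2, 1) * e(0, 3) - e(1, 2) ** 2)
          - e(1, 1) * (e(3, 0) * e(0, 3) - e(2, 1) * e(1, 2))
          + e(0, 2) * (e(3, 0) * e(1, 2) - e(2, 1) ** 2)) / n00 ** 7
    return np.array([i1, i2, i3])
```

A quick sanity check: the invariants are unchanged when the image is translated (zero-padded) or transposed, since central moments ignore position and the formulas are symmetric in p and q.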
3.4. Action Classification Using SVM. In this section, we formulate the action recognition task as a multiclass learning problem, where there is one class for each action and the goal is to assign an action to an individual in each video sequence [1, 29]. There are various supervised learning algorithms by which an action recognizer can be trained; Support Vector Machines (SVMs) are used in this work due to their outstanding generalization capability and reputation as a highly accurate paradigm [30]. SVMs, which mitigate the data overfitting seen in neural networks, are based on the structural risk minimization principle from computational learning theory. Originally, SVMs were designed to handle dichotomic classes in a higher-dimensional space, where a maximal separating hyperplane is created. On each side of this hyperplane, two parallel hyperplanes are constructed, and the SVM attempts to find the separating hyperplane that maximizes the distance between these two parallel hyperplanes (see Figure 4). Intuitively, a good separation is achieved by the hyperplane having the largest distance; hence, the larger the margin, the lower the generalization error of the classifier. Formally, let D = {(x_i, y_i) | x_i \in R^d, y_i \in {-1, +1}} be a training dataset. Vapnik [30] shows that the problem is best addressed by allowing some examples to violate the margin constraints. These potential violations are formulated with some positive slack variables \xi_i and a penalty parameter C \ge 0 that penalizes the margin violations. Thus, the generalized optimal separating hyperplane is determined by solving the following quadratic programming problem:

\min_{\beta, \beta_0} \; \frac{1}{2}\|\beta\|^{2} + C \sum_{i} \xi_i    (7)

subject to y_i(\langle x_i, \beta \rangle + \beta_0) \ge 1 - \xi_i and \xi_i \ge 0 for all i.

Figure 4: Generalized optimal separating hyperplane (margin boundaries \beta x + \beta_0 = \pm 1, separating hyperplane \beta x + \beta_0 = 0, slack variables \xi_i, \xi_j).
Geometrically, \beta \in R^d is a vector perpendicular to the separating hyperplane. The offset parameter \beta_0 is added to allow the margin to increase and not to force the hyperplane to pass through the origin, which would restrict the solution. For computational purposes, it is more convenient to solve the SVM in its dual formulation. This can be accomplished by forming the Lagrangian and then optimizing over the Lagrange multipliers \alpha. The resulting decision function has weight vector \beta = \sum_i \alpha_i x_i y_i, with 0 \le \alpha_i \le C. The instances x_i with \alpha_i > 0 are called support vectors, as they uniquely define the maximum-margin hyperplane.

In the current approach, several classes of actions are created. Several one-versus-all SVM classifiers are trained using affine moment features extracted from action sequences in the training dataset. For each action sequence, a set of six 2D affine moment invariants is extracted from the average image. Also, another set of six 3D affine moment invariants is extracted from the spatio-temporal silhouette sequence. Then, SVM classifiers are trained on these features to learn the various categories of actions.
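As an illustrative stand-in for the soft-margin problem (7), a minimal primal subgradient solver on toy 2D data might look like the sketch below. This is not the authors' setup (they train RBF-kernel SVMs, which in practice would come from an off-the-shelf solver); the data and hyperparameters are ours:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on the primal soft-margin objective of Eq. (7):
    0.5 * ||beta||^2 + C * sum_i max(0, 1 - y_i * (<x_i, beta> + beta0))."""
    beta, beta0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in range(len(X)):
            if y[i] * (X[i] @ beta + beta0) < 1:   # margin violated: hinge term active
                beta -= lr * (beta - C * y[i] * X[i])
                beta0 += lr * C * y[i]
            else:                                   # only the regularizer contributes
                beta -= lr * beta
    return beta, beta0

# toy two-class data standing in for the 2D/3D moment-invariant features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
beta, beta0 = train_linear_svm(X, y)
```

A multiclass recognizer then follows the one-versus-all scheme described above: one such binary classifier per action, with the predicted class taken as the one whose decision value is largest.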
4. Experiments and Results

To evaluate the proposed approach, two main experiments were carried out, and the achieved results were compared with those reported for other state-of-the-art methods.

4.1. Experiment 1. We conducted this experiment using the KTH action dataset [31]. To illustrate the effectiveness of the method, the obtained results are compared with those of other similar state-of-the-art methods. The KTH dataset contains action sequences comprising six types of human actions (i.e., walking, jogging, running, boxing, hand waving, and hand clapping). These actions are performed by a total of 25 individuals in four different settings (i.e., outdoors, outdoors with scale variation, outdoors with different clothes, and indoors). All sequences were acquired by a static camera at 25 fps and a spatial resolution of 160 x 120 pixels over homogeneous backgrounds. To the best of our knowledge, there is no other similar dataset already available in the literature with sequences acquired in different environments. In order to prepare the experiments and to provide an unbiased estimation of the generalization abilities of the classification process, a set of sequences (75% of all sequences) performed by 18 subjects was used for training, and the other sequences (the remaining 25%) performed by the other 7 subjects were set aside as a test set. SVMs with a Gaussian radial basis function (RBF) kernel are trained on the training set, while the evaluation of the recognition performance is performed on the test set.

The confusion matrix that shows the recognition results achieved on the KTH action dataset is given in Table 1, while the comparison of the obtained results with those obtained by other methods available in the literature is shown in Table 3. As follows from the figures tabulated in Table 1, most actions are correctly classified. Furthermore, there is a high distinction between arm actions and leg actions. Most of the mistakes where confusions occur are between "jogging" and "running" actions and between "boxing" and "clapping" actions. This is intuitively plausible given the high similarity within each of these pairs of actions. From the comparison given in Table 3, it turns out that our method performs competitively with other state-of-the-art methods. It is pertinent to mention here that the state-of-the-art methods with which we compare our method have used the same dataset and the same experimental conditions; therefore, the comparison seems to be quite fair.

Table 1: Confusion matrix for the KTH dataset.

Action   | Walking  Running  Jogging  Boxing  Waving  Clapping
Walking  |  0.94     0.00     0.04    0.00    0.00    0.00
Running  |  0.01     0.96     0.08    0.00    0.00    0.00
Jogging  |  0.05     0.04     0.88    0.00    0.00    0.00
Boxing   |  0.00     0.00     0.00    0.94    0.02    0.01
Waving   |  0.00     0.00     0.00    0.02    0.93    0.03
Clapping |  0.00     0.00     0.00    0.04    0.05    0.96
4.2. Experiment 2. This second experiment was conducted using the Weizmann action dataset provided by Blank et al. [32] in 2005, which contains a total of 90 video clips (i.e., 5098 frames) performed by 9 individuals. Each video clip contains one person performing an action. There are 10 categories of action in the dataset, namely walking, running, jumping, jumping in place, bending, jacking, skipping, galloping sideways, one-hand waving, and two-hand waving. Typically, all the clips in the dataset are sampled at 25 Hz and last about 2 seconds, with an image frame size of 180 x 144. In order to provide an unbiased estimate of the generalization abilities of the proposed method, we have used the leave-one-out cross-validation (LOOCV) technique in the validation process. As the name suggests, this involves using the group of sequences from a single subject in the original dataset as the testing data and the remaining sequences as the training data. This is repeated such that each group of sequences in the dataset is used once for validation. Again, as with the first experiment, SVMs with a Gaussian RBF kernel are trained on the training set, while the evaluation of the recognition performance is performed on the test set.

The confusion matrix in Table 2 provides the recognition results obtained by the proposed method, where correct responses define the main diagonal. From the figures in the matrix, a number of points can be drawn. The majority of actions are correctly classified; an average recognition rate of 97.8% is achieved with our proposed method. What is more, there is a clear distinction between arm actions and leg actions. The mistakes where confusions occur are only between skip and jump actions and between jump and run actions. This intuitively seems reasonable due to the high closeness or similarity between the actions in each pair. In order to quantify the effectiveness of the method, the obtained results are compared qualitatively with those obtained previously by other investigators. The outcome of this comparison is presented in Table 3. In the light of this comparison, one can see that the proposed method is competitive with the state-of-the-art methods.
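The leave-one-subject-out protocol described above can be sketched as follows; the subject IDs and clip names below are placeholders, not dataset identifiers:

```python
from collections import defaultdict

def leave_one_subject_out(samples):
    """Yield (train, test) splits in which each subject's sequences are
    held out exactly once, as in the Weizmann evaluation protocol above.
    `samples` is a list of (subject_id, clip) pairs."""
    by_subject = defaultdict(list)
    for subject, clip in samples:
        by_subject[subject].append((subject, clip))
    for held_out in by_subject:
        test = by_subject[held_out]
        train = [pair for subject, pairs in by_subject.items()
                 if subject != held_out for pair in pairs]
        yield train, test

# 9 subjects x 10 actions, mirroring the 90 Weizmann clips (names are placeholders)
samples = [(s, f"subject{s}_action{a}") for s in range(9) for a in range(10)]
splits = list(leave_one_subject_out(samples))
```

Each of the 9 splits trains on 80 clips and tests on the 10 clips of the held-out subject, so no subject ever appears in both sets.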
Table 2: Confusion matrix for the Weizmann dataset.

Action | Bend  Jump  Pjump Walk  Run   Side  Jack  Skip  Wave1 Wave2
Bend   | 1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
Jump   | 0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
Pjump  | 0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
Walk   | 0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00
Run    | 0.00  0.00  0.00  0.00  0.90  0.00  0.00  0.10  0.00  0.00
Side   | 0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00
Jack   | 0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
Skip   | 0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.90  0.00  0.00
Wave1  | 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00
Wave2  | 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00
Table 3: Comparison with the state of the art on the KTH and Weizmann datasets.

Method                  | KTH  | Weizmann
Our method              | 93.5 | 98.0
Liu and Shah [15]       | 92.8 | —
Wang and Mori [16]      | 92.5 | —
Jhuang et al. [17]      | 91.7 | —
Rodriguez et al. [9]    | 88.6 | —
Rapantzikos et al. [18] | 88.3 | —
Dollar et al. [19]      | 81.2 | —
Ke et al. [20]          | 63.0 | —
Fathi and Mori [21]     | —    | 100
Bregonzio et al. [22]   | —    | 96.6
Zhang et al. [23]       | —    | 92.8
Niebles et al. [24]     | —    | 90.0
Dollar et al. [19]      | —    | 85.2
Klaser et al. [25]      | —    | 84.3
It is worthwhile to mention that all the methods with which we compared our method, except the method proposed in [21], have used similar experimental setups; thus the comparison seems to be meaningful and fair. A final remark concerns the real-time performance of our approach: the proposed action recognizer runs at 18 fps on average (using a 2.8 GHz Intel dual-core machine with 4 GB of RAM running 32-bit Windows 7 Professional).
5. Conclusion and Future Work

In this paper, we have introduced an approach for activity recognition based on affine moment invariants for activity representation and SVMs for feature classification. On two benchmark action datasets, the results obtained by the proposed approach compare favorably with those published in the literature. The primary focus of our future work will be to investigate the empirical validation of the approach on more realistic datasets presenting many technical challenges in data handling, such as object articulation, occlusion, and significant background clutter.
References

[1] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Recognizing human actions: a fuzzy approach via chord-length shape features," ISRN Machine Vision, vol. 1, pp. 1–9, 2012.
[2] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), vol. 2, pp. 726–733, October 2003.
[3] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Towards robust human action retrieval in video," in Proceedings of the British Machine Vision Conference (BMVC '10), Aberystwyth, UK, September 2010.
[4] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human activity recognition: a scheme using multiple cues," in Proceedings of the International Symposium on Visual Computing (ISVC '10), vol. 1, pp. 574–583, Las Vegas, Nev, USA, November 2010.
[5] S. Sadek, A. Al-Hamadi, M. Elmezain, B. Michaelis, and U. Sayed, "Human activity recognition via temporal moment invariants," in Proceedings of the 10th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT '10), pp. 79–84, Luxor, Egypt, December 2010.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity," EURASIP Journal on Advances in Signal Processing, vol. 2011, Article ID 540375, 2011.
[7] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781–796, 2000.
[8] E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 405–412, June 2005.
[9] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[10] N. Ikizler and D. Forsyth, "Searching video for complex activities with finite state models," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[11] D. M. Blei and J. D. Lafferty, "Correlated topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 147–154, 2006.
[12] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[13] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 50–57, 1999.
[14] S. J. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing, vol. 17, no. 3-4, pp. 225–231, 1999.
[15] J. Liu and M. Shah, "Learning human actions via information maximization," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[16] Y. Wang and G. Mori, "Max-margin hidden conditional random fields for human action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 872–879, June 2009.
[17] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 257–267, October 2007.
[18] K. Rapantzikos, Y. Avrithis, and S. Kollias, "Dense saliency-based spatiotemporal feature points for action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1454–1461, June 2009.
[19] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS '05), pp. 65–72, October 2005.
[20] Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 166–173, October 2005.
[21] A. Fathi and G. Mori, "Action recognition by learning mid-level motion features," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[22] M. Bregonzio, S. Gong, and T. Xiang, "Recognising action as clouds of space-time interest points," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1948–1955, June 2009.
[23] Z. Zhang, Y. Hu, S. Chan, and L.-T. Chia, "Motion context: a new representation for human action recognition," in Proceedings of the European Conference on Computer Vision (ECCV '08), vol. 4, pp. 817–829, 2008.
[24] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299–318, 2008.
[25] A. Klaser, M. Marszalek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the British Machine Vision Conference (BMVC '08), 2008.
[26] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human action recognition via affine moment invariants," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 218–221, Tsukuba Science City, Japan, November 2012.
[27] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, no. 1, pp. 167–174, 1993.
[28] D. Xu and H. Li, "3-D affine moment invariants generated by geometric primitives," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 544–547, August 2006.
[29] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An SVM approach for activity recognition based on chord-length-function shape features," in Proceedings of the IEEE International Conference on Image Processing (ICIP '12), pp. 767–770, Orlando, Fla, USA, October 2012.
[30] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[31] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 32–36, 2004.
[32] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1395–1402, October 2005.
The confusion matrix that shows the recognition resultsachieved on the KTH action dataset is given in Table 1 whilethe comparison of the obtained results with those obtainedby other methods available in the literature is shown inTable 3 As follows from the figures tabulated in Table 1most actions are correctly classified Furthermore there isa high distinction between arm actions and leg actionsMost of the mistakes where confusions occur are betweenldquojoggingrdquo and ldquorunningrdquo actions and between ldquoboxingrdquo and
Table 1 Confusion matrix for the KTH dataset
WalkingRunningJoggingBoxingWavingClapping
Walking
094000004000000000
Running
001096008000000000
Jogging
005004088000000000
Boxing
000000000094002001
Waving
000000000002093003
Clapping
000000000004005096
Action
ldquoclappingrdquo actions This is intuitively plausible due to thefact of high similarity between each pair of these actionsFrom the comparison given by Table 3 it turns out that ourmethod performs competitively with other state-of-the-artmethods It is pertinent to mention here that the state-of-the-art methods with which we compare our method haveused the same dataset and the same experimental conditionstherefore the comparison seems to be quite fair
42 Experiment 2 This second experiment was conductedusing the Weizmann action dataset provided by Blank etal [32] in 2005 which contains a total of 90 video clips(ie 5098 frames) performed by 9 individuals Each videoclip contains one person performing an action There are 10categories of action involved in the dataset namely walkingrunning jumping jumping in place bending jacking skippinggalloping sideways one-hand waving and two-hand wavingTypically all the clips in the dataset are sampled at 25Hz andlast about 2 seconds with image frame size of 180 times 144 Inorder to provide an unbiased estimate of the generalizationabilities of the proposedmethod we have used the leave-one-out cross-validation (LOOCV) technique in the validationprocess As the name suggests this involves using a groupof sequences from a single subject in the original dataset asthe testing data and the remaining sequences as the trainingdata This is repeated such that each group of sequences inthe dataset is used once as the validation Again as with thefirst experiment SVMs with Gaussian RBF kernel are trainedon the training set while the evaluation of the recognitionperformance is performed on the test set
The confusion matrix in Table 2 provides the recognitionresults obtained by the proposed method where correctresponses define the main diagonal From the figures in thematrix a number of points can be drawn The majority ofactions are correctly classified An average recognition rateof 978 is achieved with our proposed method What ismore there is a clear distinction between arm actions andleg actions The mistakes where confusions occur are onlybetween skip and jump actions and between jump and runactions This intuitively seems to be reasonable due to thefact of high closeness or similarity among the actions in eachpair of these actions In order to quantify the effectiveness ofthe method the obtained results are compared qualitativelywith those obtained previously by other investigators Theoutcome of this comparison is presented in Table 3 In thelight of this comparison one can see that the proposedmethod is competitive with the state-of-the-art methods
6 ISRNMachine Vision
Table 2 Confusion matrix for the Weizmann dataset
Action
Bend
Bend
Jump
Jump
Pjump
Pjump
Walk
Walk
Run
Run
Side
Side
Jack
Jack
Skip
Skip
Wave 1
Wave 1
Wave 2
Wave 2
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
090
000
000
010
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
010
000
000
090
000
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
100
000
Table 3 Comparison with the state of the art on the KTH andWeizmann datasets
Method KTH WeizmannOur method 935 980Liu and Shah [15] 928 mdashWang and Mori [16] 925 mdashJhuang et al [17] 917 mdashRodriguez et al [9] 886 mdashRapantzikos et al [18] 883 mdashDollar et al [19] 812 mdashKe et al [20] 630 mdashFathi and Mori [21] mdash 100Bregonzio et al [22] mdash 966Zhang et al [23] mdash 928Niebles et al [24] mdash 900Dollar et al [19] mdash 852Klaser et al [25] mdash 843
It is worthwhile to mention that all the methods that wecompared our method with except the method proposedin [21] have used similar experimental setups thus thecomparison seems to be meaningful and fair A final remarkconcerns the real-time performance of our approach Theproposed action recognizer runs at 18fps on average (using a28GHz Intel dual core machine with 4GB of RAM running32-bit Windows 7 Professional)
5 Conclusion and Future Work
In this paper we have introduced an approach for activityrecognition based on affine moment invariants for activityrepresentation and SVMs for feature classification On two
benchmark action datasets the results obtained by theproposed approach were compared favorably with thosepublished in the literature The primary focus of our futurework will be to investigate the empirical validation of theapproach on more realistic datasets presenting many techni-cal challenges in data handling such as object articulationocclusion and significant background clutter
References
[1] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoRecog-nizing human actions a fuzzy approach via chord-length shapefeaturesrdquo ISRN Machine Vision vol 1 pp 1ndash9 2012
[2] A A Efros A C Berg G Mori and J Malik ldquoRecognizingaction at a distancerdquo in Proceedings of the 9th IEEE InternationalConference on Computer Vision (ICCV rsquo03) vol 2 pp 726ndash733October 2003
[3] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoTowardsrobust human action retrieval in videordquo in Proceedings of theBritish Machine Vision Conference (BMVC rsquo10) AberystwythUK September 2010
[4] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoHumanactivity recognition a scheme using multiple cuesrdquo in Proceed-ings of the International Symposium on Visual Computing (ISVCrsquo10) vol 1 pp 574ndash583 Las Vegas Nev USA November 2010
[5] S Sadek A AI-Hamadi M Elmezain B Michaelis and USayed ldquoHuman activity recognition via temporal momentinvariantsrdquo in Proceedings of the 10th IEEE International Sym-posiumon Signal Processing and Information Technology (ISSPITrsquo10) pp 79ndash84 Luxor Egypt December 2010
[6] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoAn actionrecognition scheme using fuzzy log-polar histogram and tem-poral self-similarityrdquo EURASIP Journal on Advances in SignalProcessing vol 2011 Article ID 540375 2011
[7] R Cutler and L S Davis ldquoRobust real-time periodic motiondetection analysis and applicationsrdquo IEEE Transactions on
ISRNMachine Vision 7
Pattern Analysis andMachine Intelligence vol 22 no 8 pp 781ndash796 2000
[8] E Shechtman and M Irani ldquoSpace-time behavior based corre-lationrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo05) vol 1pp 405ndash412 June 2005
[9] M D Rodriguez J Ahmed and M Shah ldquoAction MACH aspatio-temporal maximum average correlation height filter foraction recognitionrdquo in Proceedings of the 26th IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) June2008
[10] N Ikizler and D Forsyth ldquoSearching video for complex activ-ities with finite state modelsrdquo in Proceedings of the IEEE Com-puter Society Conference on Computer Vision and Pattern Recog-nition (CVPR rsquo07) June 2007
[11] D M Blei and J D Lafferty ldquoCorrelated topic modelsrdquo inAdvances in Neural Information Processing Systems (NIPS) vol18 pp 147ndash154 2006
[12] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichlet alloca-tionrdquo Journal of Machine Learning Research vol 3 no 4-5 pp993ndash1022 2003
[13] T Hofmann ldquoProbabilistic latent semantic indexingrdquo in Pro-ceedings of the 22nd Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval(SIGIR rsquo99) pp 50ndash57 1999
[14] S J McKenna Y Raja and S Gong ldquoTracking colour objectsusing adaptive mixture modelsrdquo Image and Vision Computingvol 17 no 3-4 pp 225ndash231 1999
[15] J Liu and M Shah ldquoLearning human actions via informationmaximizationrdquo in Proceedings of the 26th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo08) June2008
[16] YWang andGMori ldquoMax-Margin hidden conditional randomfields for human action recognitionrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition Workshops (CVPR rsquo09) pp 872ndash879 June 2009
[17] H Jhuang T Serre L Wolf and T Poggio ldquoA biologicallyinspired system for action recognitionrdquo in Proceedings of the 11thIEEE International Conference on Computer Vision (ICCV rsquo07)pp 257ndash267 October 2007
[18] K Rapantzikos Y Avrithis and S Kollias ldquoDense saliency-based spatiotemporal feature points for action recognitionrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition Workshops (CVPRrsquo09) pp 1454ndash1461 June 2009
[19] P Dollar V Rabaud G Cottrell and S Belongie ldquoBehaviorrecognition via sparse spatio-temporal featuresrdquo in Proceedingsof the 2nd Joint IEEE International Workshop on Visual Surveil-lance and Performance Evaluation of Tracking and Surveillance(VS-PETS rsquo05) pp 65ndash72 October 2005
[20] Y Ke R Sukthankar and M Hebert ldquoEfficient visual eventdetection using volumetric featuresrdquo in Proceedings of the 10thIEEE International Conference on Computer Vision (ICCV rsquo05)pp 166ndash173 October 2005
[21] A Fathi and GMori ldquoAction recognition by learning mid-levelmotion featuresrdquo in Proceedings of the 26th IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) June2008
[22] M Bregonzio S Gong and T Xiang ldquoRecognising action asclouds of space-time interest pointsrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition Workshops (CVPR rsquo09) pp 1948ndash1955 June 2009
[23] Z Zhang YHu S Chan and L-T Chia ldquoMotion context a newrepresentation for human action recognitionrdquo in Proceeding ofthe European Conference on Computer Vision (ECCV rsquo08) vol4 pp 817ndash829 2008
[24] J C Niebles H Wang and L Fei-Fei ldquoUnsupervised learningof human action categories using spatial-temporalwordsrdquo Inter-national Journal of Computer Vision vol 79 no 3 pp 299ndash3182008
[25] A Klaser M Marszaek and C Schmid ldquoA spatiotemporaldescriptor based on 3D-gradientsrdquo in Proceedings of the BritishMachine Vision Conference (BMVC rsquo08) 2008
[26] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoHumanaction recognition via affinemoment invariantsrdquo in Proceedingsof the 21st International Conference on Pattern Recognition(ICPR rsquo12) pp 218ndash221 Tsukuba Science City Japan November2012
[27] J Flusser and T Suk ldquoPattern recognition by affine momentinvariantsrdquo Pattern Recognition vol 26 no 1 pp 167ndash174 1993
[28] D Xu and H Li ldquo3-D affine moment invariants generated bygeometric primitivesrdquo in Proceedings of the 18th InternationalConference on Pattern Recognition (ICPR rsquo06) pp 544ndash547August 2006
[29] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoAn SVMapproach for activity recognition based on chord-length-function shape featuresrdquo inProceedings of the IEEE InternationalConference on Image Processing (ICIP rsquo12) pp 767ndash770OrlandoFla USA October 2012
[30] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995
[31] C Schuldt I Laptev and B Caputo ldquoRecognizing humanactions a local SVM approachrdquo in Proceedings of the 17thInternational Conference on Pattern Recognition (ICPR rsquo04) pp32ndash36 2004
[32] M Blank L Gorelick E Shechtman M Irani and R BasrildquoActions as space-time shapesrdquo in Proceedings of the 10th IEEEInternational Conference on Computer Vision (ICCV rsquo05) vol 2pp 1395ndash1402 October 2005
4 ISRNMachine Vision
[Figure 3: Plots of the 2D affine moment invariants (I_i, i = 1, ..., 6; one panel per invariant) computed on the average images of walking, jogging, running, boxing, waving, and clapping sequences.]
Figure 3 shows a series of plots of the 2D dynamic affine invariants for the different action classes, computed on the average images of the action sequences.
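The average image on which the 2D invariants are computed is simply the pixel-wise temporal mean of the silhouette sequence. A minimal sketch (assuming binary silhouette frames have already been extracted; not the authors' implementation):

```python
import numpy as np

def average_image(frames):
    """Pixel-wise temporal mean of a silhouette sequence: (T, H, W) -> (H, W)."""
    return np.asarray(frames, dtype=float).mean(axis=0)

# Two toy 2x2 binary silhouettes
f1 = [[1, 0], [0, 0]]
f2 = [[1, 1], [0, 0]]
avg = average_image([f1, f2])
print(avg.tolist())  # [[1.0, 0.5], [0.0, 0.0]]
```

Pixels that are foreground in every frame stay at 1.0, while pixels touched only part of the time take fractional values, so the average image encodes the motion envelope of the action.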
3.4. Action Classification Using SVM. In this section we formulate the action recognition task as a multiclass learning problem: there is one class for each action, and the goal is to assign an action to the individual in each video sequence [1, 29]. There are various supervised learning algorithms by which an action recognizer can be trained; Support Vector Machines (SVMs) are used in this work because of their outstanding generalization capability and their reputation as a highly accurate paradigm [30]. SVMs, which largely avoid the overfitting problems observed in neural networks, are based on the structural risk minimization principle from computational learning theory. Originally, SVMs were designed to handle dichotomic classes in a higher-dimensional space, where a maximal separating hyperplane is created. On each side of this hyperplane, two parallel hyperplanes are constructed, and the SVM seeks the separating hyperplane that maximizes the distance between these two parallel hyperplanes (see Figure 4). Intuitively, a good separation is achieved by the hyperplane with the largest such distance: the larger the margin, the lower the generalization error of the classifier. Formally, let D = {(x_i, y_i) | x_i ∈ R^d, y_i ∈ {−1, +1}} be a training dataset. Vapnik [30] shows that the problem is best addressed by allowing some examples to violate the margin constraints. These potential violations are formulated with positive slack variables ξ_i and a penalty parameter C ≥ 0 that penalizes the margin violations. Thus the generalized optimal separating hyperplane is determined by solving the following quadratic programming problem:

    min_{β, β_0}  (1/2) ‖β‖² + C Σ_i ξ_i,    (7)

subject to y_i(⟨x_i, β⟩ + β_0) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i.

[Figure 4: Generalized optimal separating hyperplane. The margin hyperplanes satisfy β·x + β_0 = +1 and β·x + β_0 = −1, the separating hyperplane satisfies β·x + β_0 = 0, and ξ_i, ξ_j are the slack variables of margin-violating examples x_i, x_j.]
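Problem (7) is equivalent to the unconstrained hinge-loss minimization min_{β,β_0} (1/2)‖β‖² + C Σ_i max(0, 1 − y_i(⟨x_i, β⟩ + β_0)), since at the optimum each slack takes the value ξ_i = max(0, 1 − y_i(⟨x_i, β⟩ + β_0)). A minimal subgradient-descent sketch on toy data (illustrative only, not the paper's solver):

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on the hinge-loss form of (7):
    minimize (1/2)*||beta||^2 + C * sum_i max(0, 1 - y_i*(<x_i, beta> + beta_0))."""
    beta = np.zeros(X.shape[1])
    beta0 = 0.0
    for _ in range(epochs):
        margins = y * (X @ beta + beta0)
        viol = margins < 1                    # examples whose slack xi_i would be > 0
        grad_beta = beta - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_beta0 = -C * y[viol].sum()
        beta -= lr * grad_beta
        beta0 -= lr * grad_beta0
    return beta, beta0

# Toy, linearly separable data: class +1 near (2, 2), class -1 near (-2, -2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
beta, beta0 = train_soft_margin_svm(X, y)
preds = np.sign(X @ beta + beta0)
print((preds == y).mean())  # 1.0 on this well-separated toy set
```

Larger C penalizes margin violations more strongly, trading a wider margin for fewer training errors; production systems solve the same objective with dedicated QP or SMO solvers rather than plain subgradient descent.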
Geometrically, β ∈ R^d is a vector perpendicular to the separating hyperplane. The offset parameter β_0 is added to allow the margin to increase and to avoid forcing the hyperplane to pass through the origin, which would restrict the solution. For computational purposes it is more convenient to solve the SVM in its dual formulation, which is obtained by forming the Lagrangian and then optimizing over the Lagrange multipliers α. The resulting decision function has weight vector β = Σ_i α_i y_i x_i with 0 ≤ α_i ≤ C. The instances x_i with α_i > 0 are called support vectors, as they uniquely define the maximum-margin hyperplane.

In the current approach, several classes of actions are created, and several one-versus-all SVM classifiers are trained using affine moment features extracted from the action sequences in the training dataset. For each action sequence, a set of six 2D affine moment invariants is extracted from the average image, and another set of six 3D affine moment invariants is extracted from the spatio-temporal silhouette sequence. SVM classifiers are then trained on these features to learn the various categories of actions.
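As an illustration of the 2D features just described, the first of the Flusser–Suk affine moment invariants [27] can be computed from an average image as follows (a minimal NumPy sketch, not the authors' implementation; the remaining invariants follow the same pattern with higher-order central moments):

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a grayscale/binary image viewed as a 2D density."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xc = (xs * img).sum() / m00   # centroid x
    yc = (ys * img).sum() / m00   # centroid y
    return (((xs - xc) ** p) * ((ys - yc) ** q) * img).sum()

def I1(img):
    """First affine moment invariant of Flusser and Suk [27]:
    I1 = (mu20 * mu02 - mu11**2) / mu00**4."""
    return (central_moment(img, 2, 0) * central_moment(img, 0, 2)
            - central_moment(img, 1, 1) ** 2) / central_moment(img, 0, 0) ** 4

# Sanity check: transposing the image is an affine map, so I1 is unchanged
img = np.zeros((32, 32))
img[8:20, 10:26] = 1.0
assert abs(I1(img) - I1(img.T)) < 1e-12
```

Because I1 is invariant under affine transformations of the image plane, the feature is insensitive to the viewpoint-induced skew and scaling of the action shape, which is the property the representation relies on.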
4. Experiments and Results

To evaluate the proposed approach, two main experiments were carried out, and the achieved results were compared with those reported for other state-of-the-art methods.
4.1. Experiment 1. This experiment was conducted on the KTH action dataset [31]. To illustrate the effectiveness of the method, the obtained results are compared with those of other similar state-of-the-art methods. The KTH dataset contains action sequences of six types of human actions (walking, jogging, running, boxing, hand waving, and hand clapping), performed by 25 individuals in four different settings (outdoors, outdoors with scale variation, outdoors with different clothes, and indoors). All sequences were acquired by a static camera at 25 fps and a spatial resolution of 160 × 120 pixels over homogeneous backgrounds. To the best of our knowledge, no other comparable dataset of sequences acquired in such varied environments is available in the literature. To prepare the experiments and provide an unbiased estimate of the generalization ability of the classification process, the sequences performed by 18 subjects (75% of all sequences) were used for training, and the remaining sequences (25%), performed by the other 7 subjects, were set aside as a test set. SVMs with a Gaussian radial basis function (RBF) kernel are trained on the training set, while recognition performance is evaluated on the test set.
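The subject-wise split described above can be sketched as follows (the sequence names are placeholders; the point is that no subject contributes sequences to both sets, which is what makes the estimate unbiased):

```python
def split_by_subject(sequences, train_subjects):
    """Hold out whole subjects so the test set shares no performers with training."""
    train = [(sid, name) for sid, name in sequences if sid in train_subjects]
    test = [(sid, name) for sid, name in sequences if sid not in train_subjects]
    return train, test

# 25 KTH subjects: the first 18 for training, the remaining 7 held out
sequences = [(sid, f"person{sid:02d}_walking_d1") for sid in range(1, 26)]
train, test = split_by_subject(sequences, set(range(1, 19)))
print(len(train), len(test))  # 18 7
```

Splitting by subject rather than by clip prevents the classifier from being tested on the gait or build of a performer it has already seen.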
The confusion matrix showing the recognition results achieved on the KTH action dataset is given in Table 1, while a comparison of the obtained results with those of other methods in the literature is shown in Table 3. As the figures in Table 1 indicate, most actions are correctly classified, and there is a high distinction between arm actions and leg actions. Most of the confusions occur between "jogging" and "running" and between "boxing" and
Table 1: Confusion matrix for the KTH dataset.

Action     Walking  Running  Jogging  Boxing  Waving  Clapping
Walking     0.94     0.00     0.04     0.00    0.00    0.00
Running     0.01     0.96     0.08     0.00    0.00    0.00
Jogging     0.05     0.04     0.88     0.00    0.00    0.00
Boxing      0.00     0.00     0.00     0.94    0.02    0.01
Waving      0.00     0.00     0.00     0.02    0.93    0.03
Clapping    0.00     0.00     0.00     0.04    0.05    0.96
"clapping" actions. This is intuitively plausible given the high similarity between the actions in each of these pairs. The comparison in Table 3 shows that our method performs competitively with other state-of-the-art methods. It is pertinent to mention that the state-of-the-art methods with which we compare our method used the same dataset and the same experimental conditions; the comparison is therefore fair.
4.2. Experiment 2. This second experiment was conducted on the Weizmann action dataset provided by Blank et al. [32] in 2005, which contains a total of 90 video clips (5,098 frames) performed by 9 individuals. Each clip shows one person performing a single action. The dataset covers 10 action categories: walking, running, jumping, jumping in place, bending, jacking, skipping, galloping sideways, one-hand waving, and two-hand waving. All clips are sampled at 25 Hz, last about 2 seconds, and have a frame size of 180 × 144 pixels. To provide an unbiased estimate of the generalization ability of the proposed method, we used the leave-one-out cross-validation (LOOCV) technique: the group of sequences from a single subject is used as the test data and the remaining sequences as the training data, and this is repeated so that each subject's group of sequences is used once for validation. As in the first experiment, SVMs with a Gaussian RBF kernel are trained on the training set, while recognition performance is evaluated on the test set.
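The leave-one-subject-out protocol can be sketched as follows (one fold per Weizmann subject; subject and action identifiers are placeholders):

```python
def leave_one_subject_out(sequences):
    """Yield (train, test) folds where each subject's sequences are held out once."""
    subjects = sorted({sid for sid, _ in sequences})
    for held_out in subjects:
        test = [s for s in sequences if s[0] == held_out]
        train = [s for s in sequences if s[0] != held_out]
        yield train, test

# 9 Weizmann subjects x 10 actions -> 9 folds of 80 training / 10 test sequences
seqs = [(sid, act) for sid in range(9) for act in range(10)]
folds = list(leave_one_subject_out(seqs))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 9 80 10
```

Averaging the recognition rate over the 9 folds gives the overall figure reported for the dataset, with every subject serving exactly once as unseen test data.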
The confusion matrix in Table 2 shows the recognition results obtained by the proposed method, with correct responses on the main diagonal. Several observations can be drawn from the matrix. The majority of actions are correctly classified, and an average recognition rate of 97.8% is achieved with our proposed method. What is more, there is a clear distinction between arm actions and leg actions. The only confusions occur between the skip and jump actions and between the jump and run actions, which seems reasonable given the high similarity within each of these pairs. To quantify the effectiveness of the method, the obtained results are compared with those obtained previously by other investigators; the outcome of this comparison is presented in Table 3. In light of this comparison, one can see that the proposed method is competitive with the state-of-the-art methods.
Table 2: Confusion matrix for the Weizmann dataset.

Action   Bend  Jump  Pjump  Walk  Run   Side  Jack  Skip  Wave1  Wave2
Bend     1.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00   0.00
Jump     0.00  1.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00   0.00
Pjump    0.00  0.00  1.00   0.00  0.00  0.00  0.00  0.00  0.00   0.00
Walk     0.00  0.00  0.00   1.00  0.00  0.00  0.00  0.00  0.00   0.00
Run      0.00  0.00  0.00   0.00  0.90  0.00  0.00  0.10  0.00   0.00
Side     0.00  0.00  0.00   0.00  0.00  1.00  0.00  0.00  0.00   0.00
Jack     0.00  0.00  0.00   0.00  0.00  0.00  1.00  0.00  0.00   0.00
Skip     0.00  0.00  0.00   0.00  0.10  0.00  0.00  0.90  0.00   0.00
Wave 1   0.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  1.00   0.00
Wave 2   0.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00   1.00
Table 3: Comparison with the state of the art on the KTH and Weizmann datasets (recognition rate, %).

Method                     KTH    Weizmann
Our method                 93.5   98.0
Liu and Shah [15]          92.8   —
Wang and Mori [16]         92.5   —
Jhuang et al. [17]         91.7   —
Rodriguez et al. [9]       88.6   —
Rapantzikos et al. [18]    88.3   —
Dollar et al. [19]         81.2   85.2
Ke et al. [20]             63.0   —
Fathi and Mori [21]        —      100.0
Bregonzio et al. [22]      —      96.6
Zhang et al. [23]          —      92.8
Niebles et al. [24]        —      90.0
Klaser et al. [25]         —      84.3
It is worthwhile to mention that all the methods we compare our method with, except the one proposed in [21], used similar experimental setups, so the comparison is meaningful and fair. A final remark concerns the real-time performance of our approach: the proposed action recognizer runs at 18 fps on average (on a 2.8 GHz Intel dual-core machine with 4 GB of RAM running 32-bit Windows 7 Professional).
5. Conclusion and Future Work

In this paper we have introduced an approach for activity recognition based on affine moment invariants for activity representation and SVMs for feature classification. On two benchmark action datasets, the results obtained by the proposed approach compare favorably with those published in the literature. The primary focus of our future work will be to validate the approach empirically on more realistic datasets that present additional technical challenges, such as object articulation, occlusion, and significant background clutter.
References
[1] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Recognizing human actions: a fuzzy approach via chord-length shape features," ISRN Machine Vision, vol. 1, pp. 1–9, 2012.
[2] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), vol. 2, pp. 726–733, October 2003.
[3] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Towards robust human action retrieval in video," in Proceedings of the British Machine Vision Conference (BMVC '10), Aberystwyth, UK, September 2010.
[4] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human activity recognition: a scheme using multiple cues," in Proceedings of the International Symposium on Visual Computing (ISVC '10), vol. 1, pp. 574–583, Las Vegas, Nev, USA, November 2010.
[5] S. Sadek, A. Al-Hamadi, M. Elmezain, B. Michaelis, and U. Sayed, "Human activity recognition via temporal moment invariants," in Proceedings of the 10th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT '10), pp. 79–84, Luxor, Egypt, December 2010.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity," EURASIP Journal on Advances in Signal Processing, vol. 2011, Article ID 540375, 2011.
[7] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781–796, 2000.
[8] E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 405–412, June 2005.
[9] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[10] N. Ikizler and D. Forsyth, "Searching video for complex activities with finite state models," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[11] D. M. Blei and J. D. Lafferty, "Correlated topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 147–154, 2006.
[12] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[13] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 50–57, 1999.
[14] S. J. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing, vol. 17, no. 3-4, pp. 225–231, 1999.
[15] J. Liu and M. Shah, "Learning human actions via information maximization," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[16] Y. Wang and G. Mori, "Max-margin hidden conditional random fields for human action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 872–879, June 2009.
[17] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 257–267, October 2007.
[18] K. Rapantzikos, Y. Avrithis, and S. Kollias, "Dense saliency-based spatiotemporal feature points for action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1454–1461, June 2009.
[19] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS '05), pp. 65–72, October 2005.
[20] Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 166–173, October 2005.
[21] A. Fathi and G. Mori, "Action recognition by learning mid-level motion features," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[22] M. Bregonzio, S. Gong, and T. Xiang, "Recognising action as clouds of space-time interest points," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1948–1955, June 2009.
[23] Z. Zhang, Y. Hu, S. Chan, and L.-T. Chia, "Motion context: a new representation for human action recognition," in Proceedings of the European Conference on Computer Vision (ECCV '08), vol. 4, pp. 817–829, 2008.
[24] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299–318, 2008.
[25] A. Klaser, M. Marszalek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the British Machine Vision Conference (BMVC '08), 2008.
[26] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human action recognition via affine moment invariants," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 218–221, Tsukuba Science City, Japan, November 2012.
[27] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, no. 1, pp. 167–174, 1993.
[28] D. Xu and H. Li, "3-D affine moment invariants generated by geometric primitives," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 544–547, August 2006.
[29] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An SVM approach for activity recognition based on chord-length-function shape features," in Proceedings of the IEEE International Conference on Image Processing (ICIP '12), pp. 767–770, Orlando, Fla, USA, October 2012.
[30] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[31] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 32–36, 2004.
[32] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1395–1402, October 2005.
ISRN Machine Vision
Geometrically, β ∈ R^d is a vector through the center and perpendicular to the separating hyperplane. The offset parameter β0 is added so that the margin can increase and the hyperplane is not forced to pass through the origin, which would restrict the solution. For computational purposes, it is more convenient to solve the SVM in its dual formulation. This is accomplished by forming the Lagrangian and then optimizing over the Lagrange multipliers α. The resulting decision function has weight vector β = Σ_i α_i y_i x_i, with 0 ≤ α_i ≤ C. The instances x_i with α_i > 0 are called support vectors, as they uniquely define the maximum-margin hyperplane.

In the current approach, several classes of actions are created, and several one-versus-all SVM classifiers are trained using affine moment features extracted from the action sequences in the training dataset. For each action sequence, a set of six 2D affine moment invariants is extracted from the average image, and another set of six 3D affine moment invariants is extracted from the spatiotemporal silhouette sequence. The SVM classifiers are then trained on these features to learn the various categories of actions.
4 Experiments and Results
To evaluate the proposed approach, two main experiments were carried out, and the achieved results were compared with those reported for other state-of-the-art methods.
4.1. Experiment 1. We conducted this experiment on the KTH action dataset [31]. To illustrate the effectiveness of the method, the obtained results are compared with those of other state-of-the-art methods. The KTH dataset contains sequences of six types of human actions (i.e., walking, jogging, running, boxing, hand waving, and hand clapping), performed by a total of 25 individuals in four different settings (i.e., outdoors, outdoors with scale variation, outdoors with different clothes, and indoors). All sequences were acquired by a static camera at 25 fps and a spatial resolution of 160 × 120 pixels over homogeneous backgrounds. To the best of our knowledge, no other comparable dataset of sequences acquired in such varied environments is available in the literature. To prepare the experiments and provide an unbiased estimate of the generalization ability of the classification process, a set of sequences (75% of all sequences) performed by 18 subjects was used for training, and the remaining sequences (25%) performed by the other 7 subjects were set aside as a test set. SVMs with a Gaussian radial basis function (RBF) kernel are trained on the training set, while the recognition performance is evaluated on the test set.
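The subject-wise 18/7 split described above can be sketched as follows; the sequence records and field names are hypothetical stand-ins for however the dataset is actually indexed:

```python
def split_by_subject(sequences, train_subjects):
    # Partition sequences by performer so that no subject appears
    # in both the training and the test set.
    train = [s for s in sequences if s["subject"] in train_subjects]
    test = [s for s in sequences if s["subject"] not in train_subjects]
    return train, test

# Toy example: 25 subjects, one sequence per action each.
actions = ["walking", "jogging", "running",
           "boxing", "handwaving", "handclapping"]
sequences = [{"subject": s, "action": a} for s in range(25) for a in actions]
train_ids = set(range(18))  # 18 subjects for training (~75%)
train, test = split_by_subject(sequences, train_ids)
```

Splitting by subject rather than by clip is what makes the estimate unbiased: the classifier is never evaluated on a performer it has seen during training.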
The confusion matrix showing the recognition results achieved on the KTH action dataset is given in Table 1, while the comparison of the obtained results with those reported in the literature is shown in Table 3. As the figures in Table 1 indicate, most actions are correctly classified, and there is a high distinction between arm actions and leg actions. Most of the confusions occur between the "jogging" and "running" actions and between the "boxing" and "clapping" actions, which is intuitively plausible given the high similarity within each of these pairs. The comparison in Table 3 shows that our method performs competitively with other state-of-the-art methods. It is pertinent to mention that the state-of-the-art methods with which we compare ours used the same dataset and the same experimental conditions, so the comparison is fair.

Table 1: Confusion matrix for the KTH dataset.

Action     Walking  Running  Jogging  Boxing  Waving  Clapping
Walking    0.94     0.00     0.04     0.00    0.00    0.00
Running    0.01     0.96     0.08     0.00    0.00    0.00
Jogging    0.05     0.04     0.88     0.00    0.00    0.00
Boxing     0.00     0.00     0.00     0.94    0.02    0.01
Waving     0.00     0.00     0.00     0.02    0.93    0.03
Clapping   0.00     0.00     0.00     0.04    0.05    0.96
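For reference, the overall rate follows directly from the per-class rates on the diagonal of Table 1; its mean matches the 93.5% reported for our method in Table 3:

```python
# Per-class recognition rates from the diagonal of Table 1 (KTH),
# in the order Walking, Running, Jogging, Boxing, Waving, Clapping.
diagonal = [0.94, 0.96, 0.88, 0.94, 0.93, 0.96]
average_rate = sum(diagonal) / len(diagonal)
print(round(100 * average_rate, 1))  # 93.5
```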
4.2. Experiment 2. This second experiment was conducted on the Weizmann action dataset provided by Blank et al. [32] in 2005, which contains a total of 90 video clips (i.e., 5,098 frames) performed by 9 individuals. Each video clip shows one person performing a single action. Ten action categories are involved, namely walking, running, jumping, jumping in place, bending, jacking, skipping, galloping sideways, one-hand waving, and two-hand waving. All clips in the dataset are sampled at 25 Hz, last about 2 seconds, and have an image frame size of 180 × 144 pixels. To provide an unbiased estimate of the generalization ability of the proposed method, we used the leave-one-out cross-validation (LOOCV) technique in the validation process: the group of sequences from a single subject is used as the test data and the remaining sequences as the training data, and this is repeated so that each group of sequences is used exactly once for validation. Again, as in the first experiment, SVMs with a Gaussian RBF kernel are trained on the training set, while the recognition performance is evaluated on the test set.
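The per-subject LOOCV protocol just described can be sketched as a fold generator; again, the record layout is a hypothetical stand-in:

```python
def leave_one_subject_out(sequences):
    # Yield (train, test) pairs in which each subject's sequences
    # serve exactly once as the test set.
    subjects = sorted({s["subject"] for s in sequences})
    for held_out in subjects:
        train = [s for s in sequences if s["subject"] != held_out]
        test = [s for s in sequences if s["subject"] == held_out]
        yield train, test

# Toy Weizmann-like setup: 9 subjects x 10 actions = 90 clips.
seqs = [{"subject": p, "action": a} for p in range(9) for a in range(10)]
folds = list(leave_one_subject_out(seqs))  # 9 folds
```

The reported rate is then the average over the nine folds, so every clip contributes once to the evaluation.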
The confusion matrix in Table 2 presents the recognition results obtained by the proposed method, with the correct responses on the main diagonal. Several points can be drawn from the figures in the matrix. The majority of actions are correctly classified, and an average recognition rate of 97.8% is achieved with the proposed method. What is more, there is a clear distinction between arm actions and leg actions. The only confusions occur between the skip and jump actions and between the jump and run actions, which intuitively seems reasonable given the high similarity within each of these pairs. To quantify the effectiveness of the method, the obtained results are compared with those reported previously by other investigators; the outcome of this comparison is presented in Table 3. In light of this comparison, one can see that the proposed method is competitive with the state-of-the-art methods.
Table 2: Confusion matrix for the Weizmann dataset.

Action  Bend  Jump  Pjump  Walk  Run   Side  Jack  Skip  Wave1  Wave2
Bend    1.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00   0.00
Jump    0.00  1.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00   0.00
Pjump   0.00  0.00  1.00   0.00  0.00  0.00  0.00  0.00  0.00   0.00
Walk    0.00  0.00  0.00   1.00  0.00  0.00  0.00  0.00  0.00   0.00
Run     0.00  0.00  0.00   0.00  0.90  0.00  0.00  0.10  0.00   0.00
Side    0.00  0.00  0.00   0.00  0.00  1.00  0.00  0.00  0.00   0.00
Jack    0.00  0.00  0.00   0.00  0.00  0.00  1.00  0.00  0.00   0.00
Skip    0.00  0.00  0.00   0.00  0.10  0.00  0.00  0.90  0.00   0.00
Wave1   0.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  1.00   0.00
Wave2   0.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00   1.00
Table 3: Comparison with the state of the art on the KTH and Weizmann datasets.

Method                   KTH    Weizmann
Our method               93.5   98.0
Liu and Shah [15]        92.8   —
Wang and Mori [16]       92.5   —
Jhuang et al. [17]       91.7   —
Rodriguez et al. [9]     88.6   —
Rapantzikos et al. [18]  88.3   —
Dollar et al. [19]       81.2   —
Ke et al. [20]           63.0   —
Fathi and Mori [21]      —      100
Bregonzio et al. [22]    —      96.6
Zhang et al. [23]        —      92.8
Niebles et al. [24]      —      90.0
Dollar et al. [19]       —      85.2
Klaser et al. [25]       —      84.3
It is worthwhile to mention that all the methods with which we compared ours, except the method proposed in [21], used similar experimental setups, so the comparison is meaningful and fair. A final remark concerns the real-time performance of our approach: the proposed action recognizer runs at 18 fps on average (on a 2.8 GHz Intel dual-core machine with 4 GB of RAM running 32-bit Windows 7 Professional).
5 Conclusion and Future Work
In this paper, we have introduced an approach for activity recognition based on affine moment invariants for activity representation and SVMs for feature classification. On two benchmark action datasets, the results obtained by the proposed approach compare favorably with those published in the literature. The primary focus of our future work will be the empirical validation of the approach on more realistic datasets that present many technical challenges in data handling, such as object articulation, occlusion, and significant background clutter.
References
[1] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Recognizing human actions: a fuzzy approach via chord-length shape features," ISRN Machine Vision, vol. 1, pp. 1–9, 2012.
[2] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), vol. 2, pp. 726–733, October 2003.
[3] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Towards robust human action retrieval in video," in Proceedings of the British Machine Vision Conference (BMVC '10), Aberystwyth, UK, September 2010.
[4] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human activity recognition: a scheme using multiple cues," in Proceedings of the International Symposium on Visual Computing (ISVC '10), vol. 1, pp. 574–583, Las Vegas, Nev, USA, November 2010.
[5] S. Sadek, A. Al-Hamadi, M. Elmezain, B. Michaelis, and U. Sayed, "Human activity recognition via temporal moment invariants," in Proceedings of the 10th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT '10), pp. 79–84, Luxor, Egypt, December 2010.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity," EURASIP Journal on Advances in Signal Processing, vol. 2011, Article ID 540375, 2011.
[7] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781–796, 2000.
[8] E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 405–412, June 2005.
[9] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[10] N. Ikizler and D. Forsyth, "Searching video for complex activities with finite state models," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[11] D. M. Blei and J. D. Lafferty, "Correlated topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 147–154, 2006.
[12] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[13] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 50–57, 1999.
[14] S. J. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing, vol. 17, no. 3-4, pp. 225–231, 1999.
[15] J. Liu and M. Shah, "Learning human actions via information maximization," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[16] Y. Wang and G. Mori, "Max-margin hidden conditional random fields for human action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 872–879, June 2009.
[17] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 257–267, October 2007.
[18] K. Rapantzikos, Y. Avrithis, and S. Kollias, "Dense saliency-based spatiotemporal feature points for action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1454–1461, June 2009.
[19] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS '05), pp. 65–72, October 2005.
[20] Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 166–173, October 2005.
[21] A. Fathi and G. Mori, "Action recognition by learning mid-level motion features," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[22] M. Bregonzio, S. Gong, and T. Xiang, "Recognising action as clouds of space-time interest points," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1948–1955, June 2009.
[23] Z. Zhang, Y. Hu, S. Chan, and L.-T. Chia, "Motion context: a new representation for human action recognition," in Proceedings of the European Conference on Computer Vision (ECCV '08), vol. 4, pp. 817–829, 2008.
[24] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299–318, 2008.
[25] A. Klaser, M. Marszalek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the British Machine Vision Conference (BMVC '08), 2008.
[26] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human action recognition via affine moment invariants," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 218–221, Tsukuba Science City, Japan, November 2012.
[27] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, no. 1, pp. 167–174, 1993.
[28] D. Xu and H. Li, "3-D affine moment invariants generated by geometric primitives," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 544–547, August 2006.
[29] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An SVM approach for activity recognition based on chord-length-function shape features," in Proceedings of the IEEE International Conference on Image Processing (ICIP '12), pp. 767–770, Orlando, Fla, USA, October 2012.
[30] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[31] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 32–36, 2004.
[32] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1395–1402, October 2005.
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
6 ISRNMachine Vision
Table 2 Confusion matrix for the Weizmann dataset
Action
Bend
Bend
Jump
Jump
Pjump
Pjump
Walk
Walk
Run
Run
Side
Side
Jack
Jack
Skip
Skip
Wave 1
Wave 1
Wave 2
Wave 2
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
090
000
000
010
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
010
000
000
090
000
000
000
000
000
000
000
000
000
000
000
100
000
000
000
000
000
000
000
000
100
000
Table 3 Comparison with the state of the art on the KTH andWeizmann datasets
Method KTH WeizmannOur method 935 980Liu and Shah [15] 928 mdashWang and Mori [16] 925 mdashJhuang et al [17] 917 mdashRodriguez et al [9] 886 mdashRapantzikos et al [18] 883 mdashDollar et al [19] 812 mdashKe et al [20] 630 mdashFathi and Mori [21] mdash 100Bregonzio et al [22] mdash 966Zhang et al [23] mdash 928Niebles et al [24] mdash 900Dollar et al [19] mdash 852Klaser et al [25] mdash 843
It is worthwhile to mention that all the methods that wecompared our method with except the method proposedin [21] have used similar experimental setups thus thecomparison seems to be meaningful and fair A final remarkconcerns the real-time performance of our approach Theproposed action recognizer runs at 18fps on average (using a28GHz Intel dual core machine with 4GB of RAM running32-bit Windows 7 Professional)
5 Conclusion and Future Work
In this paper we have introduced an approach for activityrecognition based on affine moment invariants for activityrepresentation and SVMs for feature classification On two
benchmark action datasets the results obtained by theproposed approach were compared favorably with thosepublished in the literature The primary focus of our futurework will be to investigate the empirical validation of theapproach on more realistic datasets presenting many techni-cal challenges in data handling such as object articulationocclusion and significant background clutter
References
[1] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoRecog-nizing human actions a fuzzy approach via chord-length shapefeaturesrdquo ISRN Machine Vision vol 1 pp 1ndash9 2012
[2] A A Efros A C Berg G Mori and J Malik ldquoRecognizingaction at a distancerdquo in Proceedings of the 9th IEEE InternationalConference on Computer Vision (ICCV rsquo03) vol 2 pp 726ndash733October 2003
[3] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoTowardsrobust human action retrieval in videordquo in Proceedings of theBritish Machine Vision Conference (BMVC rsquo10) AberystwythUK September 2010
[4] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoHumanactivity recognition a scheme using multiple cuesrdquo in Proceed-ings of the International Symposium on Visual Computing (ISVCrsquo10) vol 1 pp 574ndash583 Las Vegas Nev USA November 2010
[5] S Sadek A AI-Hamadi M Elmezain B Michaelis and USayed ldquoHuman activity recognition via temporal momentinvariantsrdquo in Proceedings of the 10th IEEE International Sym-posiumon Signal Processing and Information Technology (ISSPITrsquo10) pp 79ndash84 Luxor Egypt December 2010
[6] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoAn actionrecognition scheme using fuzzy log-polar histogram and tem-poral self-similarityrdquo EURASIP Journal on Advances in SignalProcessing vol 2011 Article ID 540375 2011
[7] R Cutler and L S Davis ldquoRobust real-time periodic motiondetection analysis and applicationsrdquo IEEE Transactions on
ISRNMachine Vision 7
Pattern Analysis andMachine Intelligence vol 22 no 8 pp 781ndash796 2000
[8] E Shechtman and M Irani ldquoSpace-time behavior based corre-lationrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo05) vol 1pp 405ndash412 June 2005
[9] M D Rodriguez J Ahmed and M Shah ldquoAction MACH aspatio-temporal maximum average correlation height filter foraction recognitionrdquo in Proceedings of the 26th IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) June2008
[10] N Ikizler and D Forsyth ldquoSearching video for complex activ-ities with finite state modelsrdquo in Proceedings of the IEEE Com-puter Society Conference on Computer Vision and Pattern Recog-nition (CVPR rsquo07) June 2007
[11] D M Blei and J D Lafferty ldquoCorrelated topic modelsrdquo inAdvances in Neural Information Processing Systems (NIPS) vol18 pp 147ndash154 2006
[12] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichlet alloca-tionrdquo Journal of Machine Learning Research vol 3 no 4-5 pp993ndash1022 2003
[13] T Hofmann ldquoProbabilistic latent semantic indexingrdquo in Pro-ceedings of the 22nd Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval(SIGIR rsquo99) pp 50ndash57 1999
[14] S J McKenna Y Raja and S Gong ldquoTracking colour objectsusing adaptive mixture modelsrdquo Image and Vision Computingvol 17 no 3-4 pp 225ndash231 1999
[15] J Liu and M Shah ldquoLearning human actions via informationmaximizationrdquo in Proceedings of the 26th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo08) June2008
[16] YWang andGMori ldquoMax-Margin hidden conditional randomfields for human action recognitionrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition Workshops (CVPR rsquo09) pp 872ndash879 June 2009
[17] H Jhuang T Serre L Wolf and T Poggio ldquoA biologicallyinspired system for action recognitionrdquo in Proceedings of the 11thIEEE International Conference on Computer Vision (ICCV rsquo07)pp 257ndash267 October 2007
[18] K Rapantzikos Y Avrithis and S Kollias ldquoDense saliency-based spatiotemporal feature points for action recognitionrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition Workshops (CVPRrsquo09) pp 1454ndash1461 June 2009
[19] P Dollar V Rabaud G Cottrell and S Belongie ldquoBehaviorrecognition via sparse spatio-temporal featuresrdquo in Proceedingsof the 2nd Joint IEEE International Workshop on Visual Surveil-lance and Performance Evaluation of Tracking and Surveillance(VS-PETS rsquo05) pp 65ndash72 October 2005
[20] Y Ke R Sukthankar and M Hebert ldquoEfficient visual eventdetection using volumetric featuresrdquo in Proceedings of the 10thIEEE International Conference on Computer Vision (ICCV rsquo05)pp 166ndash173 October 2005
[21] A Fathi and GMori ldquoAction recognition by learning mid-levelmotion featuresrdquo in Proceedings of the 26th IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) June2008
[22] M Bregonzio S Gong and T Xiang ldquoRecognising action asclouds of space-time interest pointsrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition Workshops (CVPR rsquo09) pp 1948ndash1955 June 2009
[23] Z Zhang YHu S Chan and L-T Chia ldquoMotion context a newrepresentation for human action recognitionrdquo in Proceeding ofthe European Conference on Computer Vision (ECCV rsquo08) vol4 pp 817ndash829 2008
[24] J C Niebles H Wang and L Fei-Fei ldquoUnsupervised learningof human action categories using spatial-temporalwordsrdquo Inter-national Journal of Computer Vision vol 79 no 3 pp 299ndash3182008
[25] A Klaser M Marszaek and C Schmid ldquoA spatiotemporaldescriptor based on 3D-gradientsrdquo in Proceedings of the BritishMachine Vision Conference (BMVC rsquo08) 2008
[26] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoHumanaction recognition via affinemoment invariantsrdquo in Proceedingsof the 21st International Conference on Pattern Recognition(ICPR rsquo12) pp 218ndash221 Tsukuba Science City Japan November2012
[27] J Flusser and T Suk ldquoPattern recognition by affine momentinvariantsrdquo Pattern Recognition vol 26 no 1 pp 167ndash174 1993
[28] D Xu and H Li ldquo3-D affine moment invariants generated bygeometric primitivesrdquo in Proceedings of the 18th InternationalConference on Pattern Recognition (ICPR rsquo06) pp 544ndash547August 2006
[29] S Sadek A Al-Hamadi B Michaelis and U Sayed ldquoAn SVMapproach for activity recognition based on chord-length-function shape featuresrdquo inProceedings of the IEEE InternationalConference on Image Processing (ICIP rsquo12) pp 767ndash770OrlandoFla USA October 2012
[30] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995
[31] C Schuldt I Laptev and B Caputo ldquoRecognizing humanactions a local SVM approachrdquo in Proceedings of the 17thInternational Conference on Pattern Recognition (ICPR rsquo04) pp32ndash36 2004
[32] M Blank L Gorelick E Shechtman M Irani and R BasrildquoActions as space-time shapesrdquo in Proceedings of the 10th IEEEInternational Conference on Computer Vision (ICCV rsquo05) vol 2pp 1395ndash1402 October 2005
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
ISRNMachine Vision 7
Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781–796, 2000.
[8] E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 405–412, June 2005.
[9] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[10] N. Ikizler and D. Forsyth, "Searching video for complex activities with finite state models," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[11] D. M. Blei and J. D. Lafferty, "Correlated topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 147–154, 2006.
[12] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[13] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 50–57, 1999.
[14] S. J. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing, vol. 17, no. 3-4, pp. 225–231, 1999.
[15] J. Liu and M. Shah, "Learning human actions via information maximization," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[16] Y. Wang and G. Mori, "Max-margin hidden conditional random fields for human action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 872–879, June 2009.
[17] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 257–267, October 2007.
[18] K. Rapantzikos, Y. Avrithis, and S. Kollias, "Dense saliency-based spatiotemporal feature points for action recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1454–1461, June 2009.
[19] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS '05), pp. 65–72, October 2005.
[20] Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 166–173, October 2005.
[21] A. Fathi and G. Mori, "Action recognition by learning mid-level motion features," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[22] M. Bregonzio, S. Gong, and T. Xiang, "Recognising action as clouds of space-time interest points," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1948–1955, June 2009.
[23] Z. Zhang, Y. Hu, S. Chan, and L.-T. Chia, "Motion context: a new representation for human action recognition," in Proceedings of the European Conference on Computer Vision (ECCV '08), vol. 4, pp. 817–829, 2008.
[24] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299–318, 2008.
[25] A. Kläser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the British Machine Vision Conference (BMVC '08), 2008.
[26] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Human action recognition via affine moment invariants," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 218–221, Tsukuba Science City, Japan, November 2012.
[27] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, no. 1, pp. 167–174, 1993.
[28] D. Xu and H. Li, "3-D affine moment invariants generated by geometric primitives," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 544–547, August 2006.
[29] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "An SVM approach for activity recognition based on chord-length-function shape features," in Proceedings of the IEEE International Conference on Image Processing (ICIP '12), pp. 767–770, Orlando, Fla, USA, October 2012.
[30] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[31] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 32–36, 2004.
[32] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1395–1402, October 2005.