The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features
Yu-Gang Jiang*, Qi Dai*, Chun Chet Tan**, Xiangyang Xue*, Chong-Wah Ngo**
*School of Computer Science, Fudan University, Shanghai
**Department of Computer Science, City University of Hong Kong, HK
MediaEval 2012 Workshop, Oct 4-5, Pisa, Italy
Outline
• Introduction
• Framework
• Feature Extraction
• Classifiers
• Temporal Smoothing
• Results
• Discussions
• First 20 clips retrieved
Introduction
• Violent Scene Detection task [1]: a practical challenge with great potential in applications.
• Focus on novel features.
• Top performance in mAP@20, runner-up in mAP@100.
[1] C.-H. Demarty, C. Penet, G. Gravier, and M. Soleymani. The MediaEval 2012 Affect Task: Violent Scenes Detection. In MediaEval 2012 Workshop, Pisa, Italy, 2012.
Framework
The circled numbers indicate the 5 submitted runs.
[Figure: framework diagram. Video shots → Feature extraction (trajectory-based (7 features), SIFT, spatial-temporal interest point, MFCC audio feature, concept-based) → Classifiers (χ2 kernel SVM). Other boxes: "All features except concept-based" feeding a χ2 kernel SVM, "Temporal feature smoothing" (run 2), and "Detection score-level temporal smoothing" (run 1). Circled run numbers 3, 4, and 5 mark the remaining outputs.]
Feature Extraction
• Trajectory-based features [2]:
- dense trajectory, HOG, HOF, MBH [5]
- TrajMF (relative locations and motions between trajectory pairs)
- Trajectory shape feature
• Advantages: robust to camera movement, rich in information, and implicitly capture object-object and object-background relationships.
[2] Y.-G. Jiang, Q. Dai, X. Xue, W. Liu, and C.-W. Ngo. Trajectory-based modeling of human actions with motion reference points. In ECCV, 2012.
[5] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
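The pairwise idea behind TrajMF (relative locations and motions between trajectory pairs) can be illustrated with a small sketch. This is not the descriptor from [2]; it is a simplified illustration in which each trajectory is a list of (x, y) points, "location" is assumed to be the mean point, and "motion" is assumed to be the end-to-start displacement. All function names are hypothetical.

```python
from itertools import combinations

def mean_point(points):
    """Average (x, y) position of a trajectory given as a list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def displacement(points):
    """Overall motion of a trajectory: last point minus first point."""
    return (points[-1][0] - points[0][0], points[-1][1] - points[0][1])

def pairwise_relations(trajectories):
    """For every trajectory pair, return (relative location, relative motion)."""
    relations = []
    for a, b in combinations(trajectories, 2):
        loc_a, loc_b = mean_point(a), mean_point(b)
        mot_a, mot_b = displacement(a), displacement(b)
        rel_loc = (loc_b[0] - loc_a[0], loc_b[1] - loc_a[1])
        rel_mot = (mot_b[0] - mot_a[0], mot_b[1] - mot_a[1])
        relations.append((rel_loc, rel_mot))
    return relations

def add_camera_pan(points, step):
    """Simulate a camera pan (assumed model): shift frame t by t * step."""
    return [(x + t * step[0], y + t * step[1]) for t, (x, y) in enumerate(points)]

# Toy data (made up): a static background point and an object moving right.
background = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
moving = [(10.0, 5.0), (12.0, 5.0), (14.0, 5.0)]

relations = pairwise_relations([background, moving])

# The same scene seen through a panning camera: both trajectories shift together.
pan = (3.0, 0.0)
panned_relations = pairwise_relations(
    [add_camera_pan(background, pan), add_camera_pan(moving, pan)]
)
```

In this toy case the pairwise values are identical with and without the simulated pan, because a shift shared by both trajectories cancels in the differences. That is one way to read the "robust to camera movement" claim, under the simplified pan model assumed here.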
Feature Extraction
• SIFT [4]
• STIP [3]
• MFCC
• Concept-based Features (10 concepts: blood, carchase, coldarms, fights, fire, firearms, gore, explosions, gunshots, screams)
[3] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64:107-123, 2005.
[4] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91-110, 2004.
Classifiers
• BoW representation
• Chi-squared kernel SVMs
• Kernel-level early fusion is used to combine multiple features
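A minimal sketch of a chi-squared kernel over BoW histograms and of kernel-level fusion follows. The slides do not give the exact kernel form, its bandwidth, or the fusion rule, so the exponential form exp(-gamma * d), the `gamma` value, and plain averaging of the per-feature kernels are all assumptions, as are the function names and the toy histograms.

```python
import math

def chi2_distance(h1, h2):
    """Chi-squared distance between two non-negative histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip empty bins to avoid division by zero
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def chi2_kernel_matrix(histograms, gamma=1.0):
    """Kernel matrix K[i][j] = exp(-gamma * chi2_distance(h_i, h_j))."""
    return [
        [math.exp(-gamma * chi2_distance(hi, hj)) for hj in histograms]
        for hi in histograms
    ]

def fuse_kernels(kernels):
    """Kernel-level fusion (assumed rule): element-wise average of the matrices."""
    n = len(kernels[0])
    return [
        [sum(k[i][j] for k in kernels) / len(kernels) for j in range(n)]
        for i in range(n)
    ]

# Toy BoW histograms (made up) for three shots and two feature types.
trajectory_bow = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
audio_bow = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]

fused = fuse_kernels(
    [chi2_kernel_matrix(trajectory_bow), chi2_kernel_matrix(audio_bow)]
)
```

The fused matrix can then be handed to any SVM implementation that accepts a precomputed kernel, so a single classifier is trained on all features at once rather than one classifier per feature.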
Temporal Smoothing
• Feature Smoothing: averaged features over a three-shot window.
• Score Smoothing: averaged prediction scores over a three-shot window.
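Both smoothing variants reduce to a moving average over neighbouring shots. The sketch below assumes the three-shot window is centered on the current shot and is truncated at the start and end of a video; the slides do not state either detail. Function names and the toy scores are hypothetical.

```python
def smooth_scores(scores, window=3):
    """Average each shot's score with its neighbours inside a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)                 # truncate at the first shot
        hi = min(len(scores), i + half + 1)   # truncate at the last shot
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def smooth_features(features, window=3):
    """Apply the same averaging to every feature dimension across shots."""
    columns = [smooth_scores(list(col), window) for col in zip(*features)]
    return [list(row) for row in zip(*columns)]

# Toy per-shot detection scores (made up), in temporal order.
scores = [0.0, 0.9, 0.0, 0.0, 0.6, 0.6]
smoothed = smooth_scores(scores)

# Toy per-shot feature vectors (made up): two shots, two dimensions.
smoothed_features = smooth_features([[0.0, 2.0], [3.0, 4.0]])
```

Score smoothing is applied after classification, so the classifier still sees each shot's original features. Feature smoothing runs before classification and mixes neighbouring shots into every input vector.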
Results (mAP@20)
[Figure: bar chart of mean average precision at 20 for the five runs, shown in the order r3, r2, r5, r4, r1; y-axis from 0 to 0.8.]
• Run 5: 7 dense trajectory features
• Run 4: Run 5 + SIFT + STIP + MFCC
• Run 3: Run 4 + concept scores
• Run 2: Run 4 + feature smoothing
• Run 1: Run 4 + score smoothing
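For readers unfamiliar with the metric, the sketch below computes average precision over the top k ranked shots and averages it across videos. It uses one common definition (precision averaged at the ranks of the relevant shots found in the top k); the official task definition in [1] may differ in normalization, so treat this as an illustration. The rankings are made-up toy data, not results from the runs.

```python
def average_precision_at_k(ranked_labels, k):
    """AP over the top-k shots; ranked_labels[i] is True if shot i is violent."""
    hits = 0
    precision_sum = 0.0
    for rank, is_violent in enumerate(ranked_labels[:k], start=1):
        if is_violent:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at_k(rankings, k):
    """Mean of AP@k over several ranked lists (for example, one per video)."""
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)

# Toy rankings (made up): shots sorted by decreasing detection score.
video_a = [True, False, True, False]
video_b = [False, True]

ap_a = average_precision_at_k(video_a, k=4)
ap_b = average_precision_at_k(video_b, k=4)
map_at_4 = mean_average_precision_at_k([video_a, video_b], k=4)
```

Because only the top k shots count, the metric rewards putting violent shots at the very top of the ranking, which is why mAP@20 and mAP@100 can order the same runs differently.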
Results (mAP@100)
• Run 5: 7 dense trajectory features
• Run 4: Run 5 + SIFT + STIP + MFCC
• Run 3: Run 4 + concept scores
• Run 2: Run 4 + feature smoothing
• Run 1: Run 4 + score smoothing
[Figure: bar chart of mean average precision at 100 for the five runs, shown in the order r3, r4, r5, r2, r1; y-axis from 0 to 0.7.]
Discussions
• Adding SIFT + STIP + MFCC yields only insignificant improvement. TrajMF already encodes the rich information of SIFT and STIP.
• Concept-based scores do not improve performance, which we attribute to SVMs overfitting on insufficient training data. Even so, using mid-level concept detectors remains a promising direction.
• Score smoothing boosts performance. Feature smoothing, which “blurs” the features across shots, might not be a good option.
First 20 clips retrieved
Thank You