Visual Instance Mining of News Videos Using a Graph-Based Approach
DESCRIPTION
Full details: https://imatge.upc.edu/web/publications/visual-instance-mining-news-videos-using-graph-based-approach
Author: David Almendros-Gutiérrez
Advisors: Xavier Giró-i-Nieto (UPC) and Horst Eidenberger (TU Wien)
Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)

The aim of this thesis is to design a tool that performs visual instance mining for news video summarization, that is, a tool that extracts the relevant content of a video so that the storyline of the news can be recognized. First, the video is sampled to obtain frames at a desired rate. Then, relevant content is detected in each frame, focusing on faces, text and several objects that the user can select. Next, a graph-based clustering method is used to recognize the detected instances with high accuracy and to select the most representative ones for the visual summary. In addition, a graphical user interface was developed in Wt to provide an online demo of the application.

During development, the tool was tested with the CCMA dataset. We prepared a web-based survey based on four results from this dataset to collect the opinion of the users. We also validated our visual instance mining results by comparing them with the results of a video summarization algorithm developed at Columbia University. We ran the algorithm on a dataset of videos on two events: the 'Boston bombings' and the 'search for the Malaysia Airlines flight'. We carried out another web-based survey in which users could compare our approach with this related work. With these surveys we analyze whether our tool fulfills the requirements we set. We conclude that our system extracts visual instances that show the most relevant content of news videos and can be used to summarize these videos effectively.

TRANSCRIPT
By David Almendros Gutiérrez
Directed by Horst Eidenberger and Xavier Giró-i-Nieto
2013-2014
VISUAL INSTANCE MINING OF NEWS VIDEOS USING A GRAPH-BASED APPROACH
CONTENTS
Introduction
State of the art
Requirements analysis
Developed solution
Evaluation
Future work
CONTENTS
Introduction
    Motivation
State of the art
Requirements analysis
Developed solution
Evaluation
Future work
INTRODUCTION
Motivation
Manel Martos's thesis (2013): “Content-based video summarization oriented to movie trailers”
News domain
• Websites
• News bulletins
• Newspaper
CONTENTS
Introduction
State of the art
    Visual instance mining
    News summarization
Requirements analysis
Developed solution
Evaluation
Future work
STATE OF THE ART
Visual instance mining
From a video
From a large collection of images *
* Wei Zhang et al., "Scalable Visual Instance Mining with Threads of Features" (ACM Multimedia 2014)
News summarization
News Rover *
Developed at Columbia University
* H. Li et al., "News Rover: exploring topical structures and serendipity in heterogeneous multimedia news" (ACM Multimedia 2013)
CONTENTS
Introduction
State of the art
Requirements analysis
    Content requirements
    Structural requirements
Developed solution
Evaluation
Future work
REQUIREMENTS ANALYSIS
Content requirements
Barack Obama, President of the USA
Núria Solé, anchorwoman of TV3 news
Flag
Fire truck
Structural requirements
CONTENTS
Introduction
State of the art
Requirements analysis
Developed solution
    Environment
    System architecture overview
    Temporal sampling
    Instance detection
    Graph-based selection
    Presentation
Evaluation
Future work
DEVELOPED SOLUTION
Environment
System architecture overview
Temporal sampling
From the user's desired frame rate
Uniform sampling
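The uniform sampling step can be illustrated with a minimal sketch. The slides do not show the thesis implementation, so the function name `uniform_sample_indices` and its parameters are assumptions made for illustration: given the native frame rate of the video and the rate requested by the user, it returns the indices of the frames that would be kept.

```python
def uniform_sample_indices(total_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """Return the indices of the frames kept when sampling uniformly at target_fps.

    Hypothetical helper: names and signature are not taken from the thesis.
    """
    if total_frames < 0 or video_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count must be non-negative and rates must be positive")

    # Distance (in frames) between two consecutive samples.
    # A requested rate above the native rate simply keeps every frame.
    step = max(video_fps / target_fps, 1.0)

    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# A 2-second clip at 25 fps sampled at 2 fps keeps four frames.
print(uniform_sample_indices(50, 25.0, 2.0))  # [0, 12, 25, 37]
```

Computing each index as `k * step` rather than accumulating the step avoids drift from repeated floating-point additions on long videos.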
Instance detection
Face detection
Viola & Jones algorithm
detectMultiScale method
Object detection
SURF descriptors and matching
Training images
1. Keypoints & SURF descriptors of training images
2. Keypoints & SURF descriptors of frames
3. Matching
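The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the slides do not say which matching criterion was used or what quantity the detection threshold is compared against. The sketch swaps in Lowe's ratio test for the matching step and assumes the threshold applies to the fraction of training descriptors that find a good match. Short toy vectors stand in for real SURF descriptors, and the names `count_good_matches`, `object_detected`, `ratio` and `detection_threshold` are hypothetical.

```python
import math


def count_good_matches(train_desc, frame_desc, ratio=0.75):
    """Count training descriptors whose nearest frame descriptor passes the ratio test.

    Assumption: Lowe's ratio test (nearest distance < ratio * second nearest).
    """
    good = 0
    for d in train_desc:
        distances = sorted(math.dist(d, f) for f in frame_desc)
        # At least two candidates are needed to compare nearest and second nearest.
        if len(distances) >= 2 and distances[0] < ratio * distances[1]:
            good += 1
    return good


def object_detected(train_desc, frame_desc, detection_threshold=0.5, ratio=0.75):
    """Declare a detection when enough training descriptors are matched.

    Assumption: the threshold is compared with the matched fraction.
    """
    if not train_desc:
        return False
    score = count_good_matches(train_desc, frame_desc, ratio) / len(train_desc)
    return score >= detection_threshold


# Toy 2-D "descriptors": the frame contains close counterparts of both training points.
training = [(0.0, 0.0), (10.0, 10.0)]
frame = [(0.0, 0.1), (5.0, 5.0), (10.0, 10.1)]
print(object_detected(training, frame))
```

With a criterion of this kind, raising `detection_threshold` trades missed detections for fewer false positives, which is the trade-off a threshold sweep would explore.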
Heuristic decision
[Plots: "Test with ambulances" and "Test with police cars"; x-axis: detection threshold; y-axis: % correct detections]
Text detection
Stroke-width-based algorithm
Edge detection
The stroke width of all pixels is computed
Results
Graph-based selection of representative instances
Pipeline: Pre-processing, Feature extraction, Similarity graph, Clustering, Selection
Pre-processing
Increases the accuracy
Original, Grayscale, Cropped, Resized, Equalized
Feature extraction
LBPH
Histogram comparison
Histogram intersection
Chi-square distance
Similarity value
With 𝛼 = 1
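The feature extraction and histogram comparison steps can be sketched as follows. This is a simplified illustration, not the thesis code: it computes a basic 8-neighbour LBP histogram over the whole image (LBPH implementations usually split the image into a grid of cells and concatenate the per-cell histograms), and it uses the symmetric form of the chi-square distance, which is one of several common definitions. The formula that combines both measures into the similarity value with 𝛼 is not recoverable from the transcript, so it is left out. All function names are hypothetical.

```python
def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    `image` is a list of rows of grayscale values; border pixels are skipped.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    count = 0
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [v / count for v in hist] if count else hist


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def chi_square_distance(h1, h2):
    """Symmetric chi-square distance; 0.0 for identical histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


flat = [[7, 7, 7], [7, 7, 7], [7, 7, 7]]    # uniform patch
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]    # bright centre pixel
h_flat, h_peak = lbp_histogram(flat), lbp_histogram(peak)
print(histogram_intersection(h_flat, h_flat), chi_square_distance(h_flat, h_peak))
```

Note that the two measures point in opposite directions: intersection grows with similarity while the chi-square distance shrinks, so any combination of them has to invert one of the two.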
Similarity graph (Full connectivity)
Node = Visual instance
Edge = Visual similarity
Clustering by edge filtering
Similarity value > Threshold (heuristic decision)
Subgraphs
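Clustering by edge filtering can be read as: drop every edge whose similarity does not exceed the threshold, then treat each remaining connected subgraph as one cluster. The sketch below follows that reading. The function name `cluster_by_edge_filtering`, the matrix representation of the graph and the example values are assumptions for illustration.

```python
def cluster_by_edge_filtering(similarity, threshold):
    """Return clusters as sorted lists of node indices.

    `similarity` is a symmetric matrix (list of lists) for a fully connected graph.
    Edges with similarity <= threshold are discarded; the connected components
    of what remains are the clusters.
    """
    n = len(similarity)
    neighbours = {
        i: [j for j in range(n) if j != i and similarity[i][j] > threshold]
        for i in range(n)
    }

    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected subgraph.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters


# Four visual instances: 0 and 1 look alike, as do 2 and 3.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(cluster_by_edge_filtering(similarity, 0.5))  # [[0, 1], [2, 3]]
```

A higher threshold splits the graph into more, smaller subgraphs, which is why the value has to be tuned heuristically.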
Selection of the representative visual instances
Mutual reinforcement
Scores
Number of nodes > Threshold
Time of appearance
Heuristic decision
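The slides name the ingredients of the selection step but not the formulas, so the following is only one plausible reading. It assumes that mutual reinforcement means each instance is scored by how similar it is to the other well-scored instances of its cluster (a power-iteration style update), and that clusters whose number of nodes does not exceed a threshold are discarded. The time-of-appearance criterion is not modelled. The names `mutual_reinforcement_scores`, `select_representatives` and `min_nodes` are hypothetical.

```python
def mutual_reinforcement_scores(similarity, members, iterations=50):
    """Iteratively score the members of one cluster.

    Assumed update rule: a member's new score is the similarity-weighted sum of
    the other members' scores, renormalized so that the scores sum to 1.
    """
    scores = {m: 1.0 / len(members) for m in members}
    for _ in range(iterations):
        updated = {
            m: sum(similarity[m][o] * scores[o] for o in members if o != m)
            for m in members
        }
        total = sum(updated.values())
        if total == 0:
            break  # isolated members: keep the previous scores
        scores = {m: v / total for m, v in updated.items()}
    return scores


def select_representatives(similarity, clusters, min_nodes):
    """Pick the best-scored member of every cluster with more than min_nodes nodes."""
    representatives = []
    for members in clusters:
        if len(members) <= min_nodes:
            continue  # cluster too small to be considered relevant
        scores = mutual_reinforcement_scores(similarity, members)
        representatives.append(max(members, key=lambda m: scores[m]))
    return representatives


# Node 0 is strongly similar to both 1 and 2; node 3 forms a cluster on its own.
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.8, 0.3, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
print(select_representatives(similarity, [[0, 1, 2], [3]], min_nodes=1))  # [0]
```

Under this reading, the representative is the instance most central to its cluster rather than, for example, the first one to appear in the video.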
Presentation
Online graphical user interface (GUI)
Developed with Wt
Initial design
Final result of the GUI
CONTENTS
Introduction
State of the art
Requirements analysis
System architecture overview
Developed solution
Evaluation
    User study 1
    User study 2
    Conclusions
Future work
EVALUATION
User study 1
2 complementary web-based surveys
4 videos from the CCMA dataset
40 participants
Evaluation
Redundancy
Understanding
Quality (Mean Opinion Score (MOS))
1. Unacceptable
2. Poor
3. Fair
4. Good
5. Excellent
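The Mean Opinion Score is the arithmetic mean of the ratings given on the 1 to 5 scale above. A minimal sketch, in which the function name `mean_opinion_score` and the vote counts are made up for illustration and are not the survey data:

```python
def mean_opinion_score(votes_per_score):
    """Mean of the ratings, given a mapping from score (1..5) to number of votes."""
    total_votes = sum(votes_per_score.values())
    if total_votes == 0:
        raise ValueError("at least one vote is required")
    weighted_sum = sum(score * votes for score, votes in votes_per_score.items())
    return weighted_sum / total_votes


# Hypothetical distribution of 40 answers (not taken from the study).
example_votes = {1: 0, 2: 0, 3: 10, 4: 20, 5: 10}
print(mean_opinion_score(example_votes))  # 4.0
```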
Visual summary 1
Visual summary 4
Redundancy
[Pie charts, one per visual summary; legend: YES, NO]
Visual summary 1: 70%, 30%
Visual summary 2: 70%, 30%
Visual summary 3: 70%, 30%
Visual summary 4: 48%, 53%
Understanding

Ranking | Keywords before watching the video | Keywords after watching the video
1 | Puerto Rico | Independence
2 | Independence | Puerto Rico
3 | Political party | Future
4 | Election | Voting
5 | Opinion | Political party

Ranking | Keywords before watching the video | Keywords after watching the video
1 | Music | New schedule
2 | Catalunya Radio | Novelty
3 | Programming | Catalunya Radio
4 | Office | Culture
5 | Schedule | Information
Quality
[Bar chart: participants (y-axis) per score rate (x-axis, 1 to 5); series: Visual summary 1, Visual summary 2]
MOS1 = 3.8, MOS2 = 3.57, MOS3 = 3.6, MOS4 = 3.72
User study 2
Web-based survey
2 well-known news events
356 videos of “Boston Marathon bombings”
406 videos of “Disappearance of the Malaysia Airlines flight”
55 participants
Evaluation
Comparison with W. Zhang (ACM MM 2014)
Quality (Mean Opinion Score (MOS))
• 1. Unacceptable
• 2. Poor
• 3. Fair
• 4. Good
• 5. Excellent
Boston Marathon bombings
W. Zhang (ACM MM 2014)
Our visual summary
[Bar chart: participants (y-axis, 0 to 30) per score rate (x-axis, 1 to 5); series: W. Zhang (ACM MM 2014), Our visual summary]
MOS = 2.2, MOS = 4.15
Disappearance of the Malaysia Airlines flight
W. Zhang (ACM MM 2014)
Our visual summary
[Bar chart: participants (y-axis, 0 to 30) per score rate (x-axis, 1 to 5); series: W. Zhang (ACM MM 2014), Our visual summary]
MOS = 2.56, MOS = 3.62
Conclusions
Pros
Extracts relevant content
Summarizes the news video
Seems to be competitive with the state of the art
Cons
Some redundancy remains
Low accuracy of the object detection
CONTENTS
Introduction
State of the art
Requirements analysis
System architecture overview
Developed solution
Evaluation
Future work
CONCLUSION
Future work
Improve the detection
Audio transcription
Content presentation
Interactive prototype
THANK YOU VERY MUCH FOR YOUR ATTENTION