vireo-tno @ trecvid 2014 zero-shot event detection and … · 2014. 11. 25. · vireo-tno @ trecvid...
VIREO-TNO @ TRECVID 2014: Zero-Shot Event Detection and Recounting
Speaker: Maaike de Boer (TNO)
Yi-Jie Lu (1), Hao Zhang (1), Chong-Wah Ngo (1)
Maaike de Boer (2), John Schavemaker (2), Klamer Schutte (2), Wessel Kraaij (2)
(1) VIREO Group, City University of Hong Kong, Hong Kong
(2) Netherlands Organization for Applied Scientific Research (TNO), Netherlands
Outline
0-Shot System
– System Overview
– Findings
MER System
– System Workflow
– Results
Semantic Query Generation (SQG)
– Given an event query, SQG translates the query description into a representation of semantic concepts
Event Query (Attempting a Bike Trick) → SQG → Semantic Query
Semantic Query: relevant concepts from the Concept Bank, each with a relevance score
< Objects >
• Bike 0.60
• Motorcycle 0.60
• Mountain bike 0.60
< Actions >
• Bike trick 1.00
• Riding bike 0.62
• Flipping bike 0.61
< Scenes >
• Parking lot 0.01
Concept Bank sources: TRECVID SIN, Research Collection, HMDB51, UCF101, ImageNet
Concept Bank
– Research collection (497 concepts)
– ImageNet ILSVRC’12 (1000 concepts)
– SIN’14 (346 concepts)
Event Search
– Ranking according to the SQ and concept responses
– Each video $i$ receives the score $s_i = q \cdot c_i$, where $q$ is the semantic query (the concept relevance scores shown above for “Attempting a Bike Trick”) and $c_i$ is the concept response of video $i$
– Video ranking: videos are sorted by $s_i$
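A minimal sketch of this ranking step, assuming the semantic query and each video's concept responses are stored as dictionaries keyed by concept name. The function name `rank_videos`, the video IDs, and the detector responses are hypothetical; only the relevance scores are taken from the slide.

```python
def rank_videos(semantic_query, concept_responses):
    """Rank videos by the weighted sum of their concept responses.

    semantic_query: dict mapping concept -> relevance score (from SQG)
    concept_responses: dict mapping video id -> {concept: detector response}
    Returns the video ids sorted from most to least relevant.
    """
    scores = {}
    for video_id, responses in concept_responses.items():
        # A concept with no response for this video contributes 0.
        scores[video_id] = sum(
            weight * responses.get(concept, 0.0)
            for concept, weight in semantic_query.items()
        )
    return sorted(scores, key=scores.get, reverse=True)


# Relevance scores from the "Attempting a Bike Trick" example.
query = {"bike trick": 1.00, "riding bike": 0.62, "bike": 0.60}

# Made-up detector responses for two hypothetical videos.
responses = {
    "video_a": {"bike trick": 0.1, "riding bike": 0.2, "bike": 0.3},
    "video_b": {"bike trick": 0.8, "riding bike": 0.5, "bike": 0.9},
}

ranking = rank_videos(query, responses)
```

Here `video_b` ranks first because its responses are higher on every concept in the query.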
SQG Experiments
– Exact matching vs. WordNet/ConceptNet matching
– How many concepts are used to represent an event?
– To further improve the weighting:
TF-IDF
Term specificity
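The exact-matching option listed above can be sketched as a verbatim lookup of concept names in the event description. This is an illustration under assumptions, not the system's implementation: the function name `exact_match_concepts`, the query description, and the tiny concept bank are all hypothetical.

```python
import re


def exact_match_concepts(query_text, concept_bank):
    """Return the concepts whose names occur verbatim in the query text."""
    text = query_text.lower()
    matched = []
    for concept in concept_bank:
        # Match the full concept name on word boundaries, so that
        # "bike" does not match inside "biker".
        pattern = r"\b" + re.escape(concept.lower()) + r"\b"
        if re.search(pattern, text):
            matched.append(concept)
    return matched


# Hypothetical query description and a tiny hypothetical concept bank.
description = "A person attempts a bike trick, such as flipping the bike, in a parking lot."
bank = ["bike", "bike trick", "parking lot", "motorcycle", "dog show"]

matched = exact_match_concepts(description, bank)
```

No ontology is consulted here, so a concept is selected only when the description names it directly.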
Exact matching vs. WordNet matching
[Figure: Average Precision (0 to 0.5) per Event ID for three methods: WordNet, Exact Matching, and EM-TOP (exact matching that retains only the top few concepts). Annotation in the chart: 7%.]
Number of concepts used to represent an event
[Figure: Mean Average Precision (0 to 0.08) against Top k Concepts (k = 1 to 26); series: MAP(all).]
The best MAP is reached by retaining only the top 8 concepts.
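Retaining only the top k concepts amounts to truncating the semantic query by relevance score. A minimal sketch, assuming the query is a dictionary of concept to score; `keep_top_k` is a hypothetical helper name, and the default of 8 reflects the best value reported on this slide.

```python
def keep_top_k(semantic_query, k=8):
    """Keep only the k concepts with the highest relevance scores."""
    ranked = sorted(semantic_query.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Relevance scores from the "Attempting a Bike Trick" example.
query = {
    "bike trick": 1.00,
    "riding bike": 0.62,
    "flipping bike": 0.61,
    "bike": 0.60,
    "parking lot": 0.01,
}

top3 = keep_top_k(query, k=3)
```

With `k=3`, the low-scoring scene concept and the generic object concept are dropped before search.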
Insights
Event 21: Attempting a bike trick
[Figure: Average Precision (0 to 0.5) against Top k Concepts (k = 1 to 26) for Event 21. Concepts labelled in the plot: Trick, Wheel, Paddle wheel, Car wheel, Potter wheel, Person riding, Jumping.]
Insights
Event 31: Beekeeping
[Figure: Average Precision (0 to 0.5) against Top k Concepts (k = 1 to 26) for Event 31. Concepts labelled in the plot: Honeycomb (ImageNet), Bee (ImageNet), Bee house (ImageNet), Cutting (research collection), Cutting down tree (research collection).]
Insights
Event 23: Dog show
[Figure: Average Precision (0 to 0.5) against Top k Concepts (k = 1 to 26) for Event 23. Concepts labelled in the plot: Brush dog (research collection), Dog show (research collection).]
Improvements by TF-IDF and word specificity

Method                                       MAP (on MED14-Test)
Exact Matching Only                          0.0306
Exact Matching + TF                          0.0420
Exact Matching + TFIDF                       0.0495
Exact Matching + TFIDF + Word Specificity    0.0502

[Figure: bar chart of the same MAP values for EM Only, EM + TF, EM + TFIDF, and EM + TFIDF + Spec.]
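The slides do not say how TF and IDF were computed, so the following is only a sketch under stated assumptions: TF is the number of times a concept name occurs in the event description, and IDF is a smoothed log ratio over a small reference collection of texts. The function name `tf_idf_weights`, the corpus, and the query text are hypothetical, and word specificity is not covered.

```python
import math


def tf_idf_weights(matched_concepts, query_text, corpus):
    """Weight each matched concept by TF * IDF.

    Assumed definitions (not from the slides):
      TF  = occurrences of the concept name in the query text
      IDF = log((1 + N) / (1 + df)) + 1, with N reference texts and
            df the number of those texts that mention the concept
    """
    text = query_text.lower()
    n_docs = len(corpus)
    weights = {}
    for concept in matched_concepts:
        name = concept.lower()
        # Simple substring counting; "bike" is also counted inside "bike trick".
        tf = text.count(name)
        df = sum(1 for doc in corpus if name in doc.lower())
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[concept] = tf * idf
    return weights


# Hypothetical reference collection and event description.
corpus = [
    "a person rides a bike on the road",
    "a person repairs a bike tire",
    "a person attempts a bike trick in a parking lot",
    "a dog show with many people",
]
query_text = "bike trick: a person attempts a bike trick on a bike"

weights = tf_idf_weights(["bike", "bike trick"], query_text, corpus)
```

In this toy setup "bike trick" ends up weighted above "bike" even though it occurs less often in the description, because it is rarer in the reference collection.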
Findings
1. Exact matching performs better than matching with WordNet and/or ConceptNet
2. Performance improves further when only the top few exactly matched concepts are retained
3. Adding both TF-IDF and Word Specificity increases performance
Why would ontology-based mapping not work?
[Figure: a sample query in TRECVID 2009]
Dog Show
Concept “dog”
cat
horse
mammal
carnivore
animal
kit fox
red wolf
SIN
ImageNet
Why would ConceptNet mapping not work?
Tailgating
car
food
helmet
team uniform
portable shelter
parking lot
driver
engine
tailgating desires
bus
Findings
It is difficult to
– harness the ontology-based mapping while constraining the mapping by event context
In the Ad-Hoc event “Extinguishing a Fire”
– Key concepts are missing:
Fire extinguisher
Firefighter
Findings
It is reasonable to
– Scale up the number of concepts, thus increasing the chance of exact matching
MED14-Eval-Full Results
PS 000Ex
– Automatic semantic query generation and search
– Fusion of 0-Shot and OCR system
– Achieves a MAP of 5.2
AH 000Ex
– System is the same as in PS 000Ex
– Achieves a MAP of 2.6
– Performance drops due to the lack of key concepts
MER System
In algorithm design, we aim to optimize
– Concept-to-event relevancy
  First, we require that candidate shots are relevant to the event;
  Second, we do concept-to-shot alignment.
– Evidence diversity
  In concept-to-shot alignment, we recount each shot with a unique concept different from other shots.
– Viewing time of evidential shots
  Select only the three most confident shots as key evidence.
  Each shot is about 5 seconds long.
Key Evidence Localization
1. Extract keyframes uniformly.
2. Apply concept detectors from the Concept Bank (TRECVID SIN, Research Collection, HMDB51, UCF101, ImageNet) to obtain concept responses.
3. Choose keyframes that are most relevant to this event.
   • All concepts in the semantic query are taken into account by calculating the weighted sum $s_i = w \cdot r_i$, with semantic query weights $w$ and concept responses $r_i$ of keyframe $i$
4. Expand keyframes to shots.
5. The top 3 shots are selected as key evidence.
6. The rest are non-key evidence.
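The localization steps above can be sketched as follows. The slides do not specify how a keyframe is expanded to a shot, so this sketch assumes a fixed window of about 5 seconds centred on the keyframe and clipped to the video boundaries; it also ignores possible overlap between shots. The function name `localize_key_evidence` and all example values are hypothetical.

```python
def localize_key_evidence(keyframes, semantic_query, video_length,
                          top_n=3, shot_length=5.0):
    """Pick the most event-relevant keyframes and expand them to shots.

    keyframes: list of (timestamp_seconds, {concept: detector response})
    semantic_query: dict mapping concept -> weight
    Returns a list of (start, end, score), best shot first.
    """
    scored = []
    for timestamp, responses in keyframes:
        # Weighted sum over all concepts in the semantic query.
        score = sum(w * responses.get(c, 0.0) for c, w in semantic_query.items())
        scored.append((score, timestamp))
    scored.sort(reverse=True)

    shots = []
    for score, timestamp in scored[:top_n]:
        # Assumed expansion rule: a window centred on the keyframe,
        # clipped to the start and end of the video.
        start = max(0.0, timestamp - shot_length / 2)
        end = min(video_length, timestamp + shot_length / 2)
        shots.append((start, end, score))
    return shots


# Hypothetical keyframes sampled every 10 seconds from a 32-second video.
query = {"bike trick": 1.0, "bike": 0.6}
keyframes = [
    (0.0, {"bike": 0.1}),
    (10.0, {"bike trick": 0.9, "bike": 0.8}),
    (20.0, {"bike trick": 0.2}),
    (30.0, {"bike": 0.5}),
]

key_shots = localize_key_evidence(keyframes, query, video_length=32.0)
```

Keyframes outside the returned top three would be treated as non-key evidence.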
Concept-to-Shot Alignment
The top concept in the key evidence is selected as the representative concept
* We choose a unique concept for each shot
Semantic Query
< Objects >
• Bike
• Motorcycle
• Mountain bike
< Actions >
• Bike trick
• Riding bike
• Flipping bike
< Scenes >
• Parking lot
[Diagram: shots marked Key or Non-Key, each with its ranked concepts, for example “Riding bike, Bike trick, Bike” and “Bike trick, Bike, Riding bike”]
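One way to give each shot a unique representative concept is a greedy pass over the shots, sketched below. The slide only states that concepts are unique per shot; the greedy strategy, the function name `align_concepts_to_shots`, and the third ranked list are assumptions for illustration.

```python
def align_concepts_to_shots(shot_concepts):
    """Assign each shot its best concept that no earlier shot has taken.

    shot_concepts: list of per-shot concept lists, best concept first,
    with shots ordered from most to least confident.
    Returns one concept per shot (None if every concept is already used).
    """
    used = set()
    assignment = []
    for ranked in shot_concepts:
        # Take the best-ranked concept that is still unclaimed.
        choice = next((c for c in ranked if c not in used), None)
        if choice is not None:
            used.add(choice)
        assignment.append(choice)
    return assignment


# The first two ranked lists follow the slide's diagram; the third is made up.
shots = [
    ["riding bike", "bike trick", "bike"],
    ["bike trick", "bike", "riding bike"],
    ["bike trick", "riding bike", "bike"],
]

labels = align_concepts_to_shots(shots)
```

The third shot falls back to "bike" because its two higher-ranked concepts were already assigned, which keeps the recounted evidence diverse.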
MER14 Results
The percentage of strongly agree
[Figure: two bar charts, (a) Evidence quality and (b) Event query quality, y-axis 0% to 30%. Team order on the x-axis: VIREO, Team1, Team2, Team3, Team4, Team6, Team5 in one chart; Team2, VIREO, Team4, Team3, Team6, Team1, Team5 in the other.]
MER14 Results
The percentage of both agree and strongly agree
[Figure: two bar charts, (a) Evidence quality and (b) Event query quality, y-axes 0% to 90% and 0% to 70%. Team order on the x-axis: Team1, Team2, Team3, VIREO, Team4, Team5, Team6 in one chart; Team2, VIREO, Team4, Team1, Team6, Team5, Team3 in the other.]
Summary
0-Shot System
– The simple exact matching performs the best
– The quality of concepts selected to represent an event is more important than quantity
– How to harness ontology-based mapping remains an open problem
Summary
MER System
– In key evidence localization, we emphasize the event relevancy first, then the hot concepts
– We recommend three shots as key evidence, each about 5 seconds long
Thanks!