much ado about time - princeton university · much ado about time: exhaustive annotation of...
TRANSCRIPT
Much Ado About Time:Exhaustive Annotation
of Temporal Data
Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Datasets drive computer vision progress
Caltech 101 [Fei-Fei ‘04]
Algorithms:[Berg ’05], [Grauman ’05],[Zhang ’06], [Lazebnik ’06],
[Jain ’08], [Boiman ’08], [Yang ’09], [Maji ’09]
[Wang ’10], [Zhou ’10], [Feng ’11], [Jiang ’11], …
PASCAL VOC [Everingham ’07]
Algorithms: [Chum ’07], [Felzenszwalb ’08],
[Wang ’09], [Harzallah ’09], [Bourdev ’09], [Vedaldi ’09],
[Lin ’09], [Lampert ’09], [Carreira ’10], [Wang ’10],
[Song ’11], [vanDeSande ’11], …
Algorithms:[Deng ’10], [Sanchez ’11], [Lin ’11], [Krizhevsky ’12], [Zeiler ’13], [Wang ’13],
[Sermanet ’13], [Simonyan ’14], [Lin ’14],[Girshick ’14],
[Szegedy ’14], [He ’15], …
ImageNet [Deng ’09]
Need: (1) Dense, detailed,
multi-label annotations
(2) Large-scale annotated video datasets
Dataset scale and complexity
Com
pute
r vis
ion
capa
bilit
ies
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Multi-label video annotationopens book
puts book
on shelf walksturns on
stove eats sits down sneezes
- - - + - - -
- + + - - - +
- - + - - + -
- - - - + - -
+ - + - - + -
100-200labels
10,000 videos
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Multi-label video annotationopens book
puts book on
shelf walksturns on
stove eats sits down sneezes
? - - + - - -
? + + - - - +
? - + - - + -
? - - - + - -
? - + - - + -
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Multi-label video annotationopens book
putsbook on
shelf walksturns on
stove eats sits down sneezes
? ? ? ? ? ? ?
- + + - - - +
- - + - - + -
- - - - + - -
+ - + - - + -
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
One-label All-labels
☐ Opens book ☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove ☐ Eats ☐ Sits down …
Repeat N times for N labels
vs
Expect better annotation accuracy
Expect better annotation time
Which interface is better?
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Data: 140 videos, each ~30 secs long Labels: 52 human actions Charades dataset of [Sigurdsson ECCV 2016] Experiment on Amazon Mechanical Turk
Which interface is better?
Many-labels is betterTime Accuracy
Few-labels is better
One-label
☐ Opens book
Repeat N times for N labels
All-labels☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove …
[Miller PsychologyReview 1956]
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Play video at 2x speed [Lasecki UIST 2014]
Improving annotation timeConsistency in the few-labels setting
Ask same worker about the same actions for multiple videos => 13.6% reduction in annotation time
Semantic hierarchy of labels [Deng CHI 2014]
☐ Opens book ☐ Opens book ☐ Opens book
☐ Opens book ☐ Walks ☐ Sits down
vs
Many-labels is better
Worker 1:
Worker 1:
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
Improving recallVideo summary
Request a 20-word description of the video => no effect on recall, 40% slower
Forced responseRequest a yes/no response for every label
=> actually drops recall! (annoys workers?)
Consensus annotationRely on multiple rounds of annotation with different workers
=> recall improves from 58.0% to 83.3% with 3 rounds
Many-labels
☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove ☐ Eats ☐ Sits down ☐ Sneezes ☐ Picks up a cup ☐ Holds a dish …
[Krishna CHI 2016]
Few-labels is better
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
0 5 10 15 20 25
Average time to annotate a video [min]
50
60
70
80
90
100
Rec
all
0 5 10 15 20 25
Average time to annotate a video [min]
70
75
80
85
90
95
100
Pre
cisi
on
Cumulative time [min] Cumulative time [min]
Rec
all
Prec
isio
nMany-labelinterface (26)
Few-labelinterface (5)
Data: 1,815 videos, each ~30 secs long, 2x speed Labels: 157 human actions, organized into a hierarchy with 52 high-level actions Charades dataset of [Sigurdsson ECCV 2016]
Experiments on Amazon Mechanical Turk Label is positive if >= 1 worker marks it as positive
3 rounds
7 rounds
1st round
1stround
3 rounds
7 rounds
Bringing it all together
MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA HTTP://ALLENAI.ORG/PLATO/CHARADES/
• Quantitative analysis of multi-label video annotation • Many-labels interface is better than the few-labels interface • Annotated of 157 human actions on 9,848 videos (incl. temporal extent)
Conclusions
Actions Video (3x speed)
Download dataset at http://allenai.org/plato/charades