TRANSCRIPT
Lessons Learned from Large‑Scale Crowdsourced Data Collection for ILSVRC
Jonathan Krause
Overview
• Classification
• Localization
• Detection
Classification Overview
• 1.4M images
• 1,000 classes
By hand:
• 5 sec/image
• 50% of images correct
• 12 hours worked/day
= 324 days!
Crowdsourcing
Let the crowd do the work for you!
Classification Pipeline
1. Collect candidate images for each category
2. Put candidate images on Amazon Mechanical Turk (AMT)
3. AMT workers click on images containing each class
4. Aggregate worker responses into labels
Collecting Images
Category: “Whippet”
Google Image Search: [example image results]
Problem: Limited Images
• Web searches are limited
• Solution: Query Expansion
• WordNet: Whippet: “a small slender dog of greyhound type developed in England”
→ “whippet dog”, “whippet greyhound”
→ translate into other languages
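The query-expansion steps above can be sketched as a small helper. The synonyms and translations here are illustrative assumptions, not the actual ILSVRC query lists:

```python
# Hypothetical sketch of query expansion; the synonym and translation
# lists are illustrative, not the real ILSVRC pipeline.
def expand_queries(term, synonyms, translations):
    """Build an expanded set of image-search queries for one category."""
    queries = {term}
    for syn in synonyms:        # e.g. related words pulled from WordNet
        queries.add(f"{term} {syn}")
    for t in translations:      # the term rendered in other languages
        queries.add(t)
    return sorted(queries)

print(expand_queries("whippet", ["dog", "greyhound"], ["lévrier whippet"]))
```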
Deploying on AMT
Annotate many images at once!
Make sure workers understand the classes!
Understanding Classes
Wikipedia and Google links
Understanding Classes
Give them a definition
delta: a low triangular area of alluvial deposits where a river divides before entering a larger body of water: “the Mississippi River delta”; “the Nile delta”
Understanding Classes
Test them on the definition
Understanding Classes
Give example images (if you have them)
Hard: definition only (“a small slender dog of greyhound type developed in England”)
Easy: the same definition + example images
Quality Control
Workers on AMT are:
• Fast
• Inexpensive
• Plentiful
But they are not:
• Highly trained
Solution: Multiple responses, merge results
Quality Control
Given:
• Set of (worker, image, response) triples
Want:
• P(image has label) for each image
• (Optionally) worker quality estimates
A Simple Method
• Majority vote
Q: Is this a whippet?
Responses: Yes, No, Yes, Yes, No, No, Yes
Majority vote: Yes
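As a sketch, majority vote over one image’s responses is a one-liner:

```python
from collections import Counter

def majority_vote(responses):
    """Aggregate one image's worker responses by simple plurality."""
    return Counter(responses).most_common(1)[0][0]

print(majority_vote(["Yes", "No", "Yes", "Yes", "No", "No", "Yes"]))  # Yes
```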
Majority Vote
Problems:
• Doesn’t give confidence
• Hard to measure worker quality
Responses: Yes, No, Yes, Yes, No, No, Yes
How sure are we it’s positive?
How good are these workers?
One Approach
• Annotate a subset of images with many annotations
• Majority vote to determine ground truth
• Determine confidence given fewer annotations
Deng et al. 2009
Pro & Con
Pro:
• Simple
• Gives image confidence
Con:
• Treats all workers the same
• Relies on initial majority vote
Another Approach
Model:
• Prior of label being correct
• Worker confusion matrix
Max-likelihood with EM
Dawid, Skene. 1979
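The Dawid–Skene model above can be sketched as a small EM loop. This is a minimal binary-label toy implementation for illustration, not the pipeline actually used for ILSVRC:

```python
import numpy as np

def dawid_skene(responses, n_workers, n_items, n_iter=50):
    """Binary Dawid-Skene: jointly estimate per-item label posteriors
    and per-worker confusion matrices via EM.
    responses: iterable of (worker, item, label) with label in {0, 1}."""
    # Initialize label posteriors from raw vote fractions (majority-ish)
    votes = np.zeros((n_items, 2))
    for w, i, l in responses:
        votes[i, l] += 1
    post = votes / votes.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and confusion matrices conf[w, true, said]
        prior = post.mean(axis=0)
        conf = np.full((n_workers, 2, 2), 1e-6)  # small init avoids log(0)
        for w, i, l in responses:
            conf[w, :, l] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: recompute label posteriors under the current model
        logp = np.tile(np.log(prior), (n_items, 1))
        for w, i, l in responses:
            logp[i] += np.log(conf[w, :, l])
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
    return post, conf

# Toy data: workers 0 and 1 agree; worker 2 always disagrees with them
resp = [(0, 0, 1), (1, 0, 1), (2, 0, 0),
        (0, 1, 0), (1, 1, 0), (2, 1, 1)]
post, conf = dawid_skene(resp, n_workers=3, n_items=2)
```

With this data, EM concentrates the posterior on the majority labels and learns that worker 2’s confusion matrix is inverted.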
Another Approach
Worker Quality:
• Compute soft label: distribution over labels given worker response
• Calculate expected cost of soft label q
Ipeirotis, Provost, Wang. 2012
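One way to score a soft label is an expected misclassification cost. The quadratic form and cost matrix below are a sketch in the spirit of this approach, not necessarily the exact formulation in the paper:

```python
import numpy as np

def expected_cost(q, c):
    """Expected misclassification cost of a soft label q.
    q: distribution over classes; c[i, j] = cost of deciding j when
    truth is i. Computed as sum_ij q_i q_j c[i, j] -- an assumed
    quadratic-form sketch of the soft-label cost idea."""
    q = np.asarray(q)
    return q @ np.asarray(c) @ q

# A confident worker (q concentrated on one class) has zero expected
# cost under 0/1 loss; a random worker (uniform q) has high cost.
zero_one = np.array([[0.0, 1.0], [1.0, 0.0]])
print(expected_cost([1.0, 0.0], zero_one))  # 0.0
print(expected_cost([0.5, 0.5], zero_one))  # 0.5
```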
Pro & Con
Pro:
• Gives image confidence
• Gives worker quality
Con:
• More complex
• Need to run optimization
Localization Overview
• Classification images
• 1,000 classes
• 600k training bounding boxes
Main Challenge: Collecting and verifying bounding boxes
Bounding Boxes
Requirements:
• Tight around object
• Around all object instances
• Not around other objects
[Figure: bounding boxes for “bottle”]
Su, Deng, Fei-Fei. 2012
Tasks
1. Draw a bounding box around a single instance
2. Quality verification of bounding box
3. Coverage verification
Drawing
Intuitively simple… but the devil is in the details
Drawing
Things vision researchers take for granted:
• Include all visible parts
• Include only visible parts
• Make the bounding box tight
• Only include a single instance
• Don’t draw over any instances that already have bounding boxes
• What if there are no unannotated objects?
→ Provide instructions and use a qualification task!
Drawing
Include all visible parts
Drawing
Include only visible parts
• Don’t try to “complete” the object
Drawing
Make the bounding box tight
• Even though loose is much faster
Drawing
Only include a single instance
Drawing
Don’t draw over instances that already have bounding boxes
• Can enforce this in the UI
Drawing
What if there are no unannotated objects?
• Give the option to annotate no bounding boxes (a “No more objects” option in the UI)
Quality Verification
Simpler than bounding box drawing, but still has some details.
Example question: “Is this bounding box good?” → YES
Quality Verification
Details:
• Workers still need to know what a good bounding box is
• Quality control
Quality Verification
Quality control:
• Embed “gold standard” images: positives via majority vote, negatives by perturbing the positives
• Reject annotations with bad answers on the gold standard
• Can be used for almost any type of task!
• (Optionally) require agreement of more than one annotator
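A gold-standard check can be sketched as below; the function name and error threshold are assumptions for illustration, not the actual ILSVRC tooling:

```python
# Hypothetical sketch: reject a worker's batch if they miss too many
# embedded gold-standard questions. Threshold is an assumption.
def accept_batch(answers, gold, max_errors=1):
    """answers: dict question_id -> response for the whole batch
    gold:    dict question_id -> known-correct response (hidden subset)"""
    errors = sum(1 for qid, truth in gold.items() if answers.get(qid) != truth)
    return errors <= max_errors

gold = {"g1": True, "g2": False, "g3": True}
print(accept_batch({"g1": True, "g2": False, "g3": True, "q7": True}, gold))   # True
print(accept_batch({"g1": False, "g2": True, "g3": False, "q7": True}, gold))  # False
```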
Coverage Verification
Similar in style to quality verification:
• Just a different question
• Still need instructions, quality control
Example question: “Any unannotated raccoons?” → Nope!
Bounding Boxes: Misc.
Provide definitions and example images!
• Especially for uncommon objects
• But also helps with common objects
• Annotators come from different cultures
Make sure objects being annotated are actually in your images
• Do the classification task first
Bounding Boxes: Misc.
• Make qualification tasks
• Verification tasks are much faster than drawing
• Corner cases: each task needs a plan for when the previous task goes wrong
Detection Overview
• 456k training images
• 61k fully-annotated val+test images
• 200 classes
Main Challenge: Annotating all 200 classes in every image.
Detection Pipeline
1. Collect images
2. Class presence annotation
3. Bounding box annotation (same as in the localization section)
Collecting Images
Need images that aren’t single-object-centric.
Additional queries:
• Compound object queries (“tiger lion”, “skunk and cat”)
• Complex scene queries (“kitchenette”, “dining table”, “orchestra”)
Class Presence Annotation
Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014
Naive approach: ask for each object
The machine asks the crowd about each class in turn:
“Is there a table?” → Yes
“Is there a chair?” → Yes
“Is there a horse?” → No
“Is there a dog?” → No
“Is there a cat?” → No
“Is there a bird?” → No

Table Chair Horse Dog Cat Bird
  +     +     −    −    −    −
  +     −     −    −    +    −
  +     +     −    −    −    −

Cost: O(NK) for N images and K objects
Sparsity: most objects are absent from any given image (rows of the label matrix are mostly −).
Correlation: labels co-occur in predictable ways.
Hierarchy: organize the labels into a tree:
Animal → Mammal → {Horse, Dog, Cat}
Animal → Bird
Furniture → {Table, Chair}
Better approach: exploit label structure
Starting with all six labels unknown:
“Is there an animal?” → No ⇒ Horse, Dog, Cat, Bird all −
“Is there furniture?” → Yes
“Is there a table?” → Yes ⇒ Table +
“Is there a chair?” → Yes ⇒ Chair +

Table Chair Horse Dog Cat Bird
  +     +     −    −    −    −

A single “No” high in the hierarchy resolves many leaf labels at once.
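The propagation step can be sketched as follows; the toy hierarchy and function names are assumptions based on the walkthrough above, not the actual CHI 2014 implementation:

```python
# Hypothetical sketch of hierarchy-based label propagation: a "No" at
# an internal node resolves every leaf class beneath it at once.
hierarchy = {                      # assumed toy hierarchy from the example
    "animal": ["mammal", "bird"],
    "mammal": ["horse", "dog", "cat"],
    "furniture": ["table", "chair"],
}

def leaves(node):
    """All leaf classes under a node (a leaf is its own only leaf)."""
    kids = hierarchy.get(node)
    if not kids:
        return [node]
    return [leaf for k in kids for leaf in leaves(k)]

def answer(labels, node, present):
    """Record a worker's yes/no answer for `node` into leaf `labels`."""
    if not present:                # "No" marks all descendant leaves negative
        for leaf in leaves(node):
            labels[leaf] = False
    elif node not in hierarchy:    # "Yes" only resolves an actual leaf class
        labels[node] = True

labels = {}
answer(labels, "animal", False)    # one question resolves 4 leaf classes
answer(labels, "table", True)
answer(labels, "chair", True)
print(labels)
```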
Selecting the Right Question
Goal: Get as much utility (new labels) as possible, for as little cost (worker time) as possible, given a desired level of accuracy.
Accuracy constraint
• User-specified accuracy threshold, e.g., 95%
• Might require only one worker, might require several, depending on the task
Cost: worker time (time = money)
Question (is there …)                                             Cost (seconds)
a thing used to open cans/bottles                                 14.4
an item that runs on electricity (plugged in or using batteries)  12.6
a stringed instrument                                              3.4
a canine                                                           2.0
(cost = expected human time to get an answer with 95% accuracy)
Utility: expected # of new labels
Example 1: “Is there a table?” resolves exactly one label either way:
Yes → Table +; No → Table −
utility = 1

Example 2: “Is there an animal?”, with Pr(Yes) = 0.5 and Pr(No) = 0.5:
Yes → 0 labels resolved; No → 4 labels resolved (Horse, Dog, Cat, Bird all −)
utility = 0.5 × 0 + 0.5 × 4 = 2
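The utility computation above is just an expectation over the two possible answers; as a sketch (the 0.5/0.5 probabilities are the illustrative values from the example):

```python
# Sketch of the expected-utility calculation for one question.
def expected_utility(p_yes, labels_if_yes, labels_if_no):
    """Expected number of newly resolved labels from asking one question."""
    return p_yes * labels_if_yes + (1 - p_yes) * labels_if_no

print(expected_utility(0.5, 1, 1))   # "Is there a table?": 1 label either way
print(expected_utility(0.5, 0, 4))   # "Is there an animal?": 0.5*0 + 0.5*4
```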
Pick the question with the most labels per second
Query: Is there a…              Utility (num labels)   Cost (secs)   Utility-Cost Ratio (labels/sec)
mammal with claws or fingers            12.0               3.0              4.0
living organism                         24.8               7.9              3.1
mammal                                  17.6               7.4              2.4
creature without legs                    5.9               2.6              2.3
land or avian creature                  20.8               9.5              2.2
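Question selection then reduces to ranking candidates by this ratio; a sketch using the utility and cost figures from the table above:

```python
# Rank candidate queries by expected labels per second of worker time.
candidates = [
    ("mammal with claws or fingers", 12.0, 3.0),
    ("living organism",              24.8, 7.9),
    ("mammal",                       17.6, 7.4),
    ("creature without legs",         5.9, 2.6),
    ("land or avian creature",       20.8, 9.5),
]
best = max(candidates, key=lambda q: q[1] / q[2])   # utility / cost
print(best[0])  # mammal with claws or fingers (4.0 labels/sec)
```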
• Dataset: 20K images from ImageNet Challenge 2013
• Labels: 200 basic categories (dog, cat, table…)
• 64 internal nodes in the hierarchy
Results
Results: accuracy
Accuracy threshold per question (parameter)   Naive approach: accuracy (F1)   Our approach: accuracy (F1)
0.95                                          99.64 (75.67)                   99.75 (76.97)
0.90                                          99.29 (60.17)                   99.62 (60.69)
Annotating 10K images with 200 objects
Results: cost
Accuracy threshold per question (parameter)   Cost saving (our approach vs. naive)
0.95                                          3.93×
0.90                                          6.18×
Annotating 10K images with 200 objects
6 times more labels per second
Final Thoughts
• Provide good instructions
• Do quality control
• Visualize results
• Listen to your workers
Questions?