
Lessons Learned from Large‑Scale Crowdsourced Data Collection for ILSVRC

Jonathan Krause

Overview
• Classification
• Localization
• Detection

Classification Overview
• 1.4M images
• 1,000 classes

By hand:
• 5 sec/image
• 50% of candidate images correct
• 12 hours worked/day
= 324 days!
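The arithmetic behind that estimate, reconstructed on the assumption that "50% correct" means roughly twice as many candidates must be reviewed as are kept:

$$\frac{1{,}400{,}000 \times 2 \times 5\,\text{s}}{12\,\text{h/day} \times 3{,}600\,\text{s/h}} = \frac{14{,}000{,}000\,\text{s}}{43{,}200\,\text{s/day}} \approx 324\ \text{days}$$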

Crowdsourcing
Let the crowd do the work for you!

Classification Pipeline
1. Collect candidate images for each category

2. Put candidate images on Amazon Mechanical Turk (AMT)

3. AMT workers click on images containing each class

4. Aggregate worker responses into labels

Collecting Images
Category: "Whippet"
Google Image Search: [example results]

Problem: Limited Images
• Web searches return only a limited number of results per query
• Solution: query expansion
• WordNet: Whippet: "a small slender dog of greyhound type developed in England"
→ "whippet dog", "whippet greyhound"
→ translate queries into other languages
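A minimal sketch of this kind of query expansion using WordNet through NLTK (assumes the `wordnet` corpus has been downloaded; the exact queries used for ILSVRC may have differed):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def expand_queries(category: str) -> list[str]:
    queries = {category}
    for synset in wn.synsets(category):
        # Synonyms within the synset become extra queries.
        for lemma in synset.lemma_names():
            queries.add(lemma.replace("_", " "))
        # Appending a hypernym disambiguates, e.g. "whippet dog".
        for hyper in synset.hypernyms():
            head = hyper.lemma_names()[0].replace("_", " ")
            queries.add(f"{category} {head}")
    return sorted(queries)

print(expand_queries("whippet"))
# Translations into other languages would be added with a translation API.
```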

Deploying on AMT
• Annotate many images at once!
• Make sure workers understand the classes!

Understanding Classes
• Provide Wikipedia and Google links

Understanding Classes
Give them a definition:
delta: a low triangular area of alluvial deposits where a river divides before entering a larger body of water: "the Mississippi River delta"; "the Nile delta"

Understanding Classes
• Test them on the definition

Understanding Classes
• Give example images (if you have them)

Easy vs. hard examples of "a small slender dog of greyhound type developed in England"
[example images, ranging from easy to hard]

Quality Control
Workers on AMT are:
• Fast
• Inexpensive
• Plentiful

But they are not:
• Highly trained

Solution: collect multiple responses and merge the results

Quality Control
Given:
• A set of (worker, image, response) triples

Want:
• P(image has label) for each image
• (Optionally) worker quality estimates

A Simple Method: Majority Vote
Q: Is this a whippet?
Responses: Yes, No, Yes, Yes, No, No, Yes
Majority answer: Yes (4 of 7)
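As a sketch, majority vote over a list of responses is a one-liner with `collections.Counter`:

```python
from collections import Counter

def majority_vote(responses: list[str]) -> str:
    """Return the most common response; ties are broken arbitrarily."""
    return Counter(responses).most_common(1)[0][0]

print(majority_vote(["Yes", "No", "Yes", "Yes", "No", "No", "Yes"]))  # Yes
```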

Majority Vote
Problems:
• Doesn't give a confidence estimate
• Hard to measure worker quality

Responses: Yes, No, Yes, Yes, No, No, Yes
• How sure are we it's positive?
• How good are these workers?

One Approach (Deng et al. 2009)
• Annotate a subset of images with many annotations
• Majority vote on that subset to determine ground truth
• Use it to determine confidence given fewer annotations (see the sketch below)
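A rough sketch of that calibration idea (my reconstruction, not the exact procedure from Deng et al. 2009): take the majority over many votes as truth, then measure how reliable a cheap sample of n votes is as a function of its yes-count.

```python
from collections import defaultdict

def calibrate(dense_votes: list[list[bool]], n: int) -> dict[int, float]:
    """dense_votes: per-image lists of many yes/no votes.
    Returns P(true positive | k yes-votes among the first n)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for votes in dense_votes:
        truth = sum(votes) > len(votes) / 2   # majority over ALL votes = ground truth
        k = sum(votes[:n])                    # yes-count in a cheap n-vote sample
        totals[k] += 1
        hits[k] += truth
    return {k: hits[k] / totals[k] for k in totals}

# E.g. with n=3, the table might show 3/3 yes-votes => ~0.98 confidence,
# while 2/3 yes-votes => only ~0.80.
```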

Pro & Con
Pro:
• Simple
• Gives image confidence

Con:
• Treats all workers the same
• Relies on the initial majority vote

Another Approach (Dawid & Skene, 1979)
Model:
• Prior probability that a label is correct
• A per-worker confusion matrix

Fit by maximum likelihood with EM
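A compact sketch of Dawid-Skene-style EM for the binary case (simplified; the original handles multi-class labels and other initializations):

```python
import numpy as np

def dawid_skene(votes: list[dict[int, int]], n_workers: int, iters: int = 50):
    """votes[i] maps worker id -> response (0/1) for image i.
    Returns per-image P(label=1) and per-worker confusion matrices."""
    p = np.array([sum(v.values()) / len(v) for v in votes])  # init: yes-vote fraction
    conf = np.full((n_workers, 2, 2), 0.5)  # conf[w, t, r] = P(response r | truth t)
    prior = 0.5
    for _ in range(iters):
        # M-step: re-estimate prior and confusion matrices from posteriors.
        prior = p.mean()
        counts = np.full((n_workers, 2, 2), 1e-2)  # light smoothing
        for i, resp in enumerate(votes):
            for w, r in resp.items():
                counts[w, 1, r] += p[i]
                counts[w, 0, r] += 1 - p[i]
        conf = counts / counts.sum(axis=2, keepdims=True)
        # E-step: posterior over each image's true label.
        for i, resp in enumerate(votes):
            like1, like0 = prior, 1.0 - prior
            for w, r in resp.items():
                like1 *= conf[w, 1, r]
                like0 *= conf[w, 0, r]
            p[i] = like1 / (like1 + like0)
    return p, conf
```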

Another Approach (Ipeirotis, Provost, Wang, 2012)
Worker Quality:
• Compute a soft label: a distribution over true labels given the worker's response
• Calculate the expected cost of soft label q
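A sketch of that expected-cost score: with soft label q (the posterior over true classes given the worker's response, obtained via Bayes' rule from the worker's confusion matrix) and misclassification costs c, the score is ExpCost(q) = Σᵢ Σⱼ qᵢ qⱼ c(i, j). A perfectly informative worker yields one-hot soft labels (cost 0); a spammer's soft labels stay near the prior (high cost).

```python
import numpy as np

def expected_cost(q: np.ndarray, c: np.ndarray) -> float:
    """ExpCost(q) = sum_ij q_i * q_j * c[i, j]."""
    return float(q @ c @ q)

c = np.array([[0.0, 1.0],
              [1.0, 0.0]])                      # 0/1 misclassification costs
print(expected_cost(np.array([1.0, 0.0]), c))   # 0.0 -> perfect worker
print(expected_cost(np.array([0.5, 0.5]), c))   # 0.5 -> uninformative worker
```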

Pro & Con
Pro:
• Gives image confidence
• Gives worker quality

Con:
• More complex
• Need to run an optimization

Overview
• Classification
• Localization
• Detection

Localization Overview
• The classification images
• 1,000 classes
• 600k training bounding boxes

Main Challenge: Collecting and verifying bounding boxes

Bounding Boxes (Su, Deng, Fei-Fei, 2012)
Requirements:
• Tight around the object
• Around all object instances
• Not around other objects
[example: bounding boxes for "bottle"]

Tasks
1. Draw a bounding box around a single instance
2. Quality verification of the bounding box
3. Coverage verification

Drawing
Intuitively simple... but the devil is in the details.

Drawing
Things vision researchers take for granted:
• Include all visible parts
• Include only visible parts
• Make the bounding box tight
• Only include a single instance
• Don't draw over any instances that already have bounding boxes
• What if there are no unannotated objects?

→ Provide instructions and use a qualification task!

Drawing: Include all visible parts
[good vs. bad examples]

Drawing: Include only visible parts
• Don't try to "complete" the object
[good vs. bad examples]

Drawing: Make the bounding box tight
• Even though loose is much faster
[good vs. bad examples]

Drawing: Only include a single instance
[good vs. bad examples]

Drawing: Don't draw over instances that already have bounding boxes
• Can enforce this in the UI
[good vs. bad examples]

Drawing: What if there are no unannotated objects?
• Give workers an explicit "No more objects" option alongside the drawing tools
[good vs. bad examples]

Quality Verification
Simpler than bounding box drawing, but still has some details.
[UI: "Is this bounding box good?" YES / NO]

Quality Verification
Details:
• Workers still need to know what a good bounding box is
• Quality control

Quality Verification: Quality control
• Embed "gold standard" images:
  • Positives: taken by majority vote
  • Negatives: perturb the positives
• Reject annotations from workers who answer the gold standard badly
• Can be used for almost any type of task! (a sketch follows this list)
• (Optionally) require agreement of more than one annotator
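A minimal sketch of gold-standard quality control, under assumed interfaces (`perturb_box` and the answer/gold dictionaries are hypothetical illustrations):

```python
import random

def perturb_box(box: tuple) -> tuple:
    """Make a known-bad negative by shifting and inflating a verified-good box."""
    x, y, w, h = box
    return (x + w * random.uniform(0.3, 0.6), y, w * 1.5, h * 1.5)

def accept_batch(answers: dict, gold: dict, min_acc: float = 0.8) -> bool:
    """Reject a worker's batch if they score badly on the gold items
    hidden among the real tasks (the keys of `gold`)."""
    scored = [answers[k] == gold[k] for k in gold if k in answers]
    return bool(scored) and sum(scored) / len(scored) >= min_acc
```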

Coverage Verification
["Any unannotated raccoons?" → "Nope!"]
Similar in style to quality verification:
• Just a different question
• Still needs instructions and quality control

Bounding Boxes: Misc.
Provide definitions and example images!
• Especially for uncommon objects
• But it also helps with common objects
• Annotators come from many different cultures

Make sure the objects being annotated are actually in your images:
• Do the classification task first

Bounding Boxes: Misc.
• Make qualification tasks
• Verification tasks are much faster than drawing
• Corner cases: each task needs a plan for when the previous task goes wrong

Detection Overview
• 456k training images
• 61k fully-annotated val+test images
• 200 classes

Main Challenge: Annotating all 200 classes in every image.

Detection Pipeline
1. Collect images
2. Class presence annotation
3. Bounding box annotation (same as in the localization task)

Collecting Images
Need images that aren't single-object-centric.

Additional queries:
• Compound object queries ("tiger lion", "skunk and cat")
• Complex scene queries ("kitchenette", "dining table", "orchestra")

Class Presence Annotation
Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

Naive approach: ask for each object

The machine poses one question per class; the crowd answers:
"Is there a table?" → Yes
"Is there a chair?" → Yes
"Is there a horse?" → No
"Is there a dog?"   → No
"Is there a cat?"   → No
"Is there a bird?"  → No

After all questions, every image has a full row of labels:

Image | Table | Chair | Horse | Dog | Cat | Bird
------|-------|-------|-------|-----|-----|-----
  1   |   +   |   +   |   −   |  −  |  −  |  −
  2   |   +   |   −   |   −   |  −  |  +  |  −
  3   |   +   |   +   |   −   |  −  |  −  |  −

Cost: O(NK) for N images and K objects
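As a sketch, the naive scheme is just a double loop over images and classes, with one paid crowd question per pair (`ask_crowd` is a hypothetical stand-in for posting an AMT task):

```python
def naive_annotate(images: list, classes: list, ask_crowd) -> dict:
    """ask_crowd(image, cls) -> bool. Issues len(images) * len(classes)
    questions: O(NK) worker time."""
    return {
        (img, cls): ask_crowd(img, cls)  # one paid question per pair
        for img in images
        for cls in classes
    }
```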

Three properties of the label matrix make this wasteful:
• Hierarchy: classes nest (Table, Chair ⊂ Furniture; Horse, Dog, Cat ⊂ Mammal ⊂ Animal)
• Sparsity: most objects are absent from most images
• Correlation: labels within an image are correlated

Better approach: exploit label structure

Questions walk down the hierarchy, and a "No" at an internal node rules out its entire subtree:
"Is there an animal?" → No  ⇒ Horse, Dog, Cat, Bird all marked −
"Is there furniture?" → Yes ⇒ keep asking within Furniture
"Is there a table?"   → Yes ⇒ Table +
"Is there a chair?"   → Yes ⇒ Chair +

Final row: Table +, Chair +, Horse −, Dog −, Cat −, Bird −
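A simplified sketch of that pruning logic (my reconstruction of the idea, not the CHI 2014 system, which also weighs utility against cost as described next):

```python
def annotate(image, node: dict, ask_crowd, labels: dict) -> None:
    """node = {'name': str, 'children': list of nodes}; leaves are basic classes.
    ask_crowd(image, name) -> bool is a stand-in for a crowd question."""
    if not node["children"]:                 # leaf: record the answer directly
        labels[node["name"]] = ask_crowd(image, node["name"])
    elif ask_crowd(image, node["name"]):     # "Is there a <node>?" -> Yes
        for child in node["children"]:
            annotate(image, child, ask_crowd, labels)
    else:                                    # No: prune the whole subtree
        for leaf in leaves(node):
            labels[leaf] = False

def leaves(node: dict) -> list[str]:
    if not node["children"]:
        return [node["name"]]
    return [name for c in node["children"] for name in leaves(c)]
```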

Selecting the Right Question

Goal: Get as much utility (new labels) as possible, for as little cost (worker time) as possible, given a desired level of accuracy.


Accuracy constraint
• User-specified accuracy threshold, e.g., 95%
• Might require only one worker, or several, depending on the task


Cost: worker time (time = money)

Question (is there ...)                                          | Cost (seconds)
------------------------------------------------------------------|---------------
a thing used to open cans/bottles                                 | 14.4
an item that runs on electricity (plugged in or using batteries)  | 12.6
a stringed instrument                                             |  3.4
a canine                                                          |  2.0

(Expected human time to get an answer with 95% accuracy.)


Utility: expected number of new labels

"Is there a table?" resolves exactly one label either way:
Yes → Table +;  No → Table −
utility = 1

"Is there an animal?" with Pr(Yes) = 0.5, Pr(No) = 0.5:
Yes → 0 labels resolved (must keep asking within Animal)
No  → 4 labels resolved (Horse, Dog, Cat, Bird all −)
utility = 0.5 × 0 + 0.5 × 4 = 2


Selecting the Right Question
Pick the question with the most labels per second:

Query: Is there a ...        | Utility (num labels) | Cost (worker secs) | Utility-Cost Ratio (labels/sec)
------------------------------|----------------------|--------------------|--------------------------------
mammal with claws or fingers  | 12.0                 | 3.0                | 4.0
living organism               | 24.8                 | 7.9                | 3.1
mammal                        | 17.6                 | 7.4                | 2.4
creature without legs         |  5.9                 | 2.6                | 2.3
land or avian creature        | 20.8                 | 9.5                | 2.2
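A simplified sketch of this greedy rule (the real system estimates Pr(Yes) and label counts from a probabilistic model; the numbers below just mirror the earlier example):

```python
def expected_utility(p_yes: float, labels_if_yes: int, labels_if_no: int) -> float:
    """Expected number of labels resolved by asking the question."""
    return p_yes * labels_if_yes + (1 - p_yes) * labels_if_no

def best_question(questions: list[dict]) -> dict:
    """Pick the question maximizing expected utility per second of worker time."""
    return max(
        questions,
        key=lambda q: expected_utility(q["p_yes"], q["if_yes"], q["if_no"]) / q["cost"],
    )

qs = [
    {"text": "Is there a table?",   "p_yes": 0.5, "if_yes": 1, "if_no": 1, "cost": 2.0},
    {"text": "Is there an animal?", "p_yes": 0.5, "if_yes": 0, "if_no": 4, "cost": 2.0},
]
print(best_question(qs)["text"])  # "Is there an animal?" (utility 2 vs 1)
```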

Results
• Dataset: 20K images from the ImageNet Challenge 2013
• Labels: 200 basic categories (dog, cat, table, ...)
• 64 internal nodes in the hierarchy

Results: accuracy (annotating 10K images with 200 objects)

Accuracy threshold per question (parameter) | Naive approach: Accuracy % (F1) | Our approach: Accuracy % (F1)
---------------------------------------------|--------------------------------|------------------------------
0.95                                         | 99.64 (75.67)                  | 99.75 (76.97)
0.90                                         | 99.29 (60.17)                  | 99.62 (60.69)


Results: cost (annotating 10K images with 200 objects)

Accuracy threshold per question (parameter) | Cost saving (our approach vs. naive)
---------------------------------------------|-------------------------------------
0.95                                         | 3.93×
0.90                                         | 6.18×

Up to 6 times more labels per second.

Overview
• Classification
• Localization
• Detection

Final Thoughts
• Provide good instructions
• Do quality control
• Visualize results
• Listen to your workers

Questions?
