Get Another Label? Improving Data Quality and Machine Learning Using Multiple, Noisy Labelers
Panos Ipeirotis, New York University
Joint work with Jing Wang, Foster Provost, and Victor Sheng
Outsourcing machine learning preprocessing
Traditionally, modeling teams have invested substantial internal resources in data formulation, information extraction, cleaning, and other preprocessing
Now, we can outsource preprocessing tasks such as labeling, feature extraction, and verifying information extraction
– using Mechanical Turk, oDesk, etc.
– quality may be lower (perhaps much lower) than expert labeling
– but low costs can allow massive scale
Example: Build an “Adult Web Site” Classifier
Need a large number of hand-labeled sites
Get people to look at sites and classify them as:
– G (general audience)
– PG (parental guidance)
– R (restricted)
– X (porn)
Cost/speed statistics:
– Undergrad intern: 200 websites/hr, cost: $15/hr
– MTurk: 2500 websites/hr, cost: $12/hr
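A quick back-of-the-envelope check of the cost/speed statistics above, using only the slide's own numbers, converts each hourly rate into a price per labeled site. The function name is ours:

```python
# Cost/speed figures from the slide: (sites labeled per hour, dollars per hour).
LABELERS = {
    "undergrad intern": (200, 15.0),
    "MTurk": (2500, 12.0),
}


def cost_per_site(sites_per_hour: float, dollars_per_hour: float) -> float:
    """Dollars spent per labeled site at the given throughput and hourly rate."""
    return dollars_per_hour / sites_per_hour


for name, (rate, wage) in LABELERS.items():
    print(f"{name}: ${cost_per_site(rate, wage):.4f} per site")
```

At these rates one intern label costs about as much as 15 MTurk labels (0.075 / 0.0048 ≈ 15.6), which is why buying several noisy labels per site can still be cheaper than one careful label.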
Noisy labels can be problematic
Many tasks rely on high-quality labels for objects:
– webpage classification for safe advertising
– learning predictive models
– searching for relevant information
– finding duplicate database records
– image recognition/labeling
– song categorization
– sentiment analysis
Noisy labels can lead to degraded task performance
Quality and Classification Performance
[Figure: AUC vs. number of examples ("Mushroom" data set), one curve each for P = 50%, 60%, 80%, 100%]
As labeling quality increases, classification quality increases
P: single-labeler quality (probability of assigning a binary label correctly)
Solutions
Get better labelers
– Often beyond our control or too expensive
Get more labelers per item
– Our focus
Majority Voting and Label Quality
[Figure: integrated quality vs. number of labelers (1 to 13), one curve each for P = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Ask multiple labelers, keep majority label as “true” label
Quality is probability of being correct
P is probability of individual labeler being correct
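Curves like these can be computed analytically under a simplifying assumption: labelers are independent, all share the same quality P, labels are binary, and the number of labelers is odd (so there are no ties). A minimal Python sketch under those assumptions:

```python
from math import comb


def majority_quality(p: float, n: int) -> float:
    """Probability that the majority of n independent labelers, each correct
    with probability p, yields the correct binary label (n odd, no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of labelers to avoid ties")
    # Sum the binomial probabilities of getting more than n/2 correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for p in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(p, [round(majority_quality(p, n), 3) for n in (1, 3, 5, 7, 9, 11, 13)])
```

For P = 0.7, three labelers raise integrated quality from 0.7 to 0.784; at P = 0.5 majority voting stays at 0.5 no matter how many labelers are added, and for P < 0.5 it gets worse.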
Tradeoffs for Modeling
Get more examples → improve classification
Get more labels → improve label quality → improve classification
[Figure: accuracy vs. number of examples (Mushroom), one curve each for P = 0.5, 0.6, 0.8, 1.0]
Basic Labeling Strategies
Single labeling
– Get as many data points as possible
– One label each
Round-robin repeated labeling
– Repeatedly label data points, giving the next label to the one with the fewest labels so far
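Round-robin repeated labeling only needs a count of labels per example. A minimal Python sketch, assuming a fixed pool of examples and ties broken by lowest index (the tie-breaking rule and function name are our choices):

```python
import heapq


def round_robin_schedule(num_examples: int, budget: int) -> list[int]:
    """Return the order in which examples receive labels when each new label
    goes to the example with the fewest labels so far (ties: lowest index)."""
    # Heap entries are (labels_so_far, example_index).
    heap = [(0, i) for i in range(num_examples)]
    heapq.heapify(heap)
    order = []
    for _ in range(budget):
        count, idx = heapq.heappop(heap)
        order.append(idx)  # request one more label for this example
        heapq.heappush(heap, (count + 1, idx))
    return order


print(round_robin_schedule(3, 7))  # [0, 1, 2, 0, 1, 2, 0]
```

Single labeling is the special case where the budget equals the number of examples, so every example receives exactly one label.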
Round Robin vs. Single: Low Noise
p = 0.8 (labeling quality), #examples = 50
[Figure: FRR (50 examples) vs. single labeling]
With low noise/few examples, more (single-labeled) examples are better
Round Robin vs. Single: High Noise
p = 0.6 (labeling quality), #examples = 100
[Figure: FRR (100 examples) vs. SL (single labeling)]
With high noise/many examples, repeated labeling is better
Selective Repeated-Labeling
We have seen so far:
– With enough examples and noisy labels, getting multiple labels is better than single-labeling
– When we also account for the cost of acquiring the unlabeled examples, the benefit is magnified
Can we do better than the basic strategies?
Key observation: we have additional information to guide the selection of data for repeated labeling, namely the current multiset of labels
Example: {+,-,+,-,-,+} vs. {+,+,+,+,+,+}
Natural Candidate: Entropy
Entropy is a natural measure of label uncertainty:
E({+,+,+,+,+,+}) = 0, E({+,-,+,-,-,+}) = 1
Strategy: Get more labels for high-entropy label multisets
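$$E(S) = -\frac{|S^{+}|}{|S|}\log_2\frac{|S^{+}|}{|S|} - \frac{|S^{-}|}{|S|}\log_2\frac{|S^{-}|}{|S|}$$
where $S^{+}$ and $S^{-}$ are the positive and negative labels in the multiset $S$.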
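The entropy of a label multiset follows directly from the counts $|S^{+}|$ and $|S^{-}|$. A small Python sketch (the function name is ours) that reproduces the values above and also shows the scale invariance criticized on the “Why not entropy” slide:

```python
from math import log2


def label_entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a multiset with pos '+' and neg '-' labels."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # the 0 * log 0 term is taken as 0
            frac = count / total
            h -= frac * log2(frac)
    return h


print(label_entropy(6, 0))  # 0.0  {+,+,+,+,+,+}
print(label_entropy(3, 3))  # 1.0  {+,-,+,-,-,+}
print(round(label_entropy(3, 2), 2), round(label_entropy(600, 400), 2))  # 0.97 0.97
```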
What Not to Do: Use Entropy
[Figure: labeling quality vs. number of labels (mushroom, p=0.6), GRR vs. ENTROPY]
Entropy: improves at first, hurts in the long run
Why not entropy?
In the presence of noise, entropy will be high even with many labels
Entropy is scale invariant: {3+, 2-} has the same entropy as {600+, 400-}, i.e., 0.97
Estimating Label Uncertainty (LU)
Observe +’s and –’s and compute Pr{+|obs} and Pr{-|obs}
Use Bayesian estimate of Bernoulli: Beta prior + update
Uncertainty = tail of Beta distribution
[Figure: Beta probability density function on [0.0, 1.0], with the tail area S_LU marked relative to 0.5]
Label Uncertainty
p = 0.7; 5 labels (3+, 2-); entropy ≈ 0.97; CDF_β = 0.34
Label Uncertainty
p = 0.7; 10 labels (7+, 3-); entropy ≈ 0.88; CDF_β = 0.11
Label Uncertainty
p = 0.7; 20 labels (14+, 6-); entropy ≈ 0.88; CDF_β = 0.04
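The CDF_β values on these slides can be reproduced with the standard library alone. A minimal sketch, assuming a uniform Beta(1, 1) prior (so the posterior after pos '+' and neg '-' labels is Beta(pos+1, neg+1)) and taking S_LU as the smaller of the two tail masses at 0.5; the helper names are hypothetical:

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a, b, via the identity
    I_x(a, b) = P[Binomial(a + b - 1, x) >= a]."""
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(a, n + 1))


def label_uncertainty(pos: int, neg: int) -> float:
    """S_LU: Beta tail mass on the far side of 0.5, assuming a uniform
    Beta(1, 1) prior updated with the observed labels."""
    a, b = pos + 1, neg + 1
    cdf = beta_cdf_int(0.5, a, b)
    return min(cdf, 1.0 - cdf)


for pos, neg in [(3, 2), (7, 3), (14, 6)]:
    print(pos, neg, round(label_uncertainty(pos, neg), 2))
```

Unlike entropy, which stays at 0.88 for both 7+/3- and 14+/6-, this score keeps shrinking as labels accumulate (0.34, 0.11, 0.04), so examples that already have many consistent labels stop being selected.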
Label Uncertainty vs. Round Robin
[Figure: labeling quality vs. number of labels (mushroom, p=0.6), GRR vs. LU]
similar results across a dozen data sets
Remember: GRR already better than single labeling
Why is LU a hack?
More sophisticated label uncertainty
Observe +’s and –’s and compute Pr{+|obs} and Pr{-|obs}
Estimated Beta distribution = quality of labelers for the example
Estimate the prior Pr(+) by iterating
More sophisticated LU improves labeling quality under class imbalance and fixes some pesky LU learning curve glitches
Both techniques perform essentially optimally with balanced classes
Another strategy: Model Uncertainty (MU)
Learning models of the data provides an alternative source of information about label certainty
– (a random forest for the results to come)
Model uncertainty: get more labels for instances that cause model uncertainty
Intuition?
– for modeling: why improve training data quality if the model is already certain there?
– for data quality: low-certainty “regions” may be due to incorrect labeling of corresponding instances
[Figure: models vs. examples, “Self-healing process” [Brodley et al, 99]; scatter plot of + and - examples]
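A minimal Python sketch of one way to score model uncertainty, assuming S_MU = 0.5 - |Pr(+|x) - 0.5|, with Pr(+|x) taken as the fraction of ensemble members (for example, random-forest trees) voting '+'. The formula choice, the function names, and the vote data are our assumptions for illustration:

```python
def ensemble_prob(tree_votes: list[int]) -> float:
    """Fraction of ensemble members voting '+' (1 = '+', 0 = '-')."""
    return sum(tree_votes) / len(tree_votes)


def model_uncertainty(prob_positive: float) -> float:
    """Assumed S_MU: 0.5 when the model is maximally unsure, 0 when certain."""
    return 0.5 - abs(prob_positive - 0.5)


# Hypothetical per-tree votes for three examples.
examples = {
    "a": [1] * 10,           # all trees agree: model certain
    "b": [1] * 6 + [0] * 4,  # mild disagreement
    "c": [1] * 5 + [0] * 5,  # split vote: model maximally unsure
}
scores = {name: model_uncertainty(ensemble_prob(v)) for name, v in examples.items()}
print(max(scores, key=scores.get))  # c
```

The next label would go to example c; example a, where every tree agrees, is the kind of instance the slide argues is not worth relabeling for modeling purposes.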
Yet another strategy: Label & Model Uncertainty (LMU)
Label and model uncertainty (LMU): avoid examples where either strategy is certain
$$S_{LMU} = \sqrt{S_{LU} \cdot S_{MU}}$$
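Because a geometric mean is small whenever either factor is small, combining the scores this way avoids examples where either strategy is already certain. A short Python illustration with made-up (S_LU, S_MU) pairs; the function names are hypothetical:

```python
from math import sqrt


def lmu_score(s_lu: float, s_mu: float) -> float:
    """Geometric mean of label and model uncertainty: low if either is low."""
    return sqrt(s_lu * s_mu)


def pick_next(candidates: dict[str, tuple[float, float]]) -> str:
    """Return the example whose (S_LU, S_MU) pair has the highest LMU score."""
    return max(candidates, key=lambda k: lmu_score(*candidates[k]))


# Hypothetical (S_LU, S_MU) pairs.
candidates = {
    "labels unsure, model sure": (0.40, 0.01),
    "labels sure, model unsure": (0.01, 0.45),
    "both moderately unsure": (0.20, 0.20),
}
print(pick_next(candidates))  # both moderately unsure
```

With an arithmetic mean instead, the second candidate (0.23) would win even though its labels are already nearly certain.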
Label Quality
[Figure: labeling quality vs. number of labels (waveform, p=0.6); curves: UNF (uniform, round robin), MU, LU (label uncertainty), LMU (label + model uncertainty)]
Model Uncertainty alone also improves quality
Model Quality
Label & Model Uncertainty
Across 12 domains, LMU is always better than GRR. LMU is statistically significantly better than LU and MU.
[Figure: accuracy vs. number of labels (sick, p=0.6); curves: GRR, MU, LU, LMU]
Adult content classification
Bad news: Spammers!
Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
Redundant votes, infer quality
Look at our spammer friend ATAMRO447HWJQ together with 9 other workers
Using redundancy, we can compute error rates for each worker
Repeated Labeling, EM, and Confusion Matrices (Dawid & Skene, 1979)
Iterative process to estimate worker error rates
1. Initialize the “correct” label for each object (e.g., use majority vote)
2. Estimate error rates for workers (using the “correct” labels)
3. Estimate the “correct” labels (using error rates; weight worker votes according to quality)
4. Go to Step 2 and iterate until convergence
Our friend ATAMRO447HWJQ marked almost all sites as G.
Seems like a spammer…
Error rates for ATAMRO447HWJQ:
P[G → G] = 99.947%   P[G → X] = 0.053%
P[X → G] = 99.153%   P[X → X] = 0.847%
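The four Dawid & Skene steps map directly onto an EM loop. The following is a minimal Python sketch of that procedure as described on the slide, not the get-another-label code: it assumes hard input labels, a fixed iteration count in place of a convergence test, and a tiny additive smoothing constant to avoid zero probabilities. All names and the toy data are hypothetical:

```python
from collections import Counter, defaultdict


def dawid_skene(labels, classes, iterations=50, smoothing=1e-6):
    """labels: list of (worker, item, assigned_label) triples.
    Returns (soft labels per item, confusion per worker, class priors), where
    conf[w][t][a] estimates P[t -> a] for worker w."""
    if iterations < 1:
        raise ValueError("need at least one iteration")
    items = sorted({i for _, i, _ in labels})
    workers = sorted({w for w, _, _ in labels})
    votes_by_item = defaultdict(list)
    for w, i, a in labels:
        votes_by_item[i].append((w, a))

    # Step 1: initialize the "correct" labels with a (soft) majority vote.
    soft = {}
    for i in items:
        counts = Counter(a for _, a in votes_by_item[i])
        n = sum(counts.values())
        soft[i] = {c: counts[c] / n for c in classes}

    for _ in range(iterations):
        # Step 2: estimate priors and worker confusion matrices from soft labels.
        prior = {c: sum(soft[i][c] for i in items) / len(items) for c in classes}
        conf = {w: {t: {a: smoothing for a in classes} for t in classes} for w in workers}
        for w, i, a in labels:
            for t in classes:
                conf[w][t][a] += soft[i][t]
        for w in workers:
            for t in classes:
                row_total = sum(conf[w][t].values())
                for a in classes:
                    conf[w][t][a] /= row_total
        # Step 3: re-estimate soft labels, weighting each vote by how likely
        # that worker is to give it under each candidate true class.
        for i in items:
            score = dict(prior)
            for w, a in votes_by_item[i]:
                for t in classes:
                    score[t] *= conf[w][t][a]
            z = sum(score.values())
            soft[i] = {t: score[t] / z for t in classes}
    # Step 4: the fixed loop count stands in for a convergence check.
    return soft, conf, prior


# Hypothetical data: 7 G sites, 3 X sites, three accurate workers,
# and one worker who labels everything G.
truth = {i: ("G" if i < 7 else "X") for i in range(10)}
labels = [(w, i, t) for i, t in truth.items() for w in ("w1", "w2", "w3")]
labels += [("spammer", i, "G") for i in truth]

soft, conf, prior = dawid_skene(labels, ["G", "X"])
print({i: max(p, key=p.get) for i, p in soft.items()})
print("spammer P[X -> G] =", round(conf["spammer"]["X"]["G"], 3))
```

On this toy data the recovered labels match the truth, and the always-G worker's estimated P[X → G] comes out close to 100%, the same pattern as the ATAMRO447HWJQ numbers above.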
Challenge: From Confusion Matrices to Quality Scores
The algorithm generates “confusion matrices” for workers
How can we check if a worker is a spammer using the confusion matrix? (Hint: error rate is not enough)
Error rates for ATAMRO447HWJQ:
P[X → X] = 0.847%   P[X → G] = 99.153%
P[G → X] = 0.053%   P[G → G] = 99.947%
Challenge 1: Spammers are lazy and smart!
Confusion matrix for spammer:
P[X → X] = 0%   P[X → G] = 100%
P[G → X] = 0%   P[G → G] = 100%
Confusion matrix for good worker:
P[X → X] = 80%   P[X → G] = 20%
P[G → X] = 20%   P[G → G] = 80%
Spammers figure out how to fly under the radar…
In reality, we have 85% G sites and 15% X sites
Error rate of spammer = 0% * 85% + 100% * 15% = 15%
Error rate of good worker = 20% * 85% + 20% * 15% = 20%
False negatives: Spam workers pass as legitimate
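The error-rate arithmetic above is the prior-weighted off-diagonal mass of a confusion matrix. A short Python check, assuming confusion matrices stored as nested dicts keyed true class → assigned class (a layout chosen here for illustration):

```python
def error_rate(confusion: dict, prior: dict) -> float:
    """Expected fraction of wrong labels: sum over true class t of
    prior[t] * P[t -> a] for every assigned class a != t."""
    return sum(
        prior[t] * p
        for t, row in confusion.items()
        for a, p in row.items()
        if a != t
    )


prior = {"G": 0.85, "X": 0.15}
spammer = {"G": {"G": 1.0, "X": 0.0}, "X": {"G": 1.0, "X": 0.0}}
good = {"G": {"G": 0.8, "X": 0.2}, "X": {"G": 0.2, "X": 0.8}}
print(round(error_rate(spammer, prior), 2), round(error_rate(good, prior), 2))  # 0.15 0.2
```

The lazy spammer gets a lower error rate (15%) than the honest, 80%-accurate worker (20%), which is exactly why raw error rate cannot separate them.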
Challenge 2: Humans are biased!
Error rates for CEO of AdSafe
P[G → G] = 20.0%   P[G → P] = 80.0%   P[G → R] = 0.0%     P[G → X] = 0.0%
P[P → G] = 0.0%    P[P → P] = 0.0%    P[P → R] = 100.0%   P[P → X] = 0.0%
P[R → G] = 0.0%    P[R → P] = 0.0%    P[R → R] = 100.0%   P[R → X] = 0.0%
P[X → G] = 0.0%    P[X → P] = 0.0%    P[X → R] = 0.0%     P[X → X] = 100.0%
In reality, we have 85% G sites, 5% P sites, 5% R sites, 5% X sites
Error rate of spammer (all in G) = 0% * 85% + 100% * 15% = 15%
Error rate of biased worker = 80% * 85% + 100% * 5% = 73%
False positives: Legitimate workers appear to be spammers
Solution: Reverse errors first, compute error rate afterwards
When biased worker says G, it is 100% G
When biased worker says P, it is 100% G
When biased worker says R, it is 50% P, 50% R
When biased worker says X, it is 100% X
Small ambiguity for “R-rated” votes but other than that, fine!
Error rates for biased worker:
P[G → G] = 20.0%   P[G → P] = 80.0%   P[G → R] = 0.0%     P[G → X] = 0.0%
P[P → G] = 0.0%    P[P → P] = 0.0%    P[P → R] = 100.0%   P[P → X] = 0.0%
P[R → G] = 0.0%    P[R → P] = 0.0%    P[R → R] = 100.0%   P[R → X] = 0.0%
P[X → G] = 0.0%    P[X → P] = 0.0%    P[X → R] = 0.0%     P[X → X] = 100.0%
When spammer says G, it is 25% G, 25% P, 25% R, 25% X
When spammer says P, it is 25% G, 25% P, 25% R, 25% X
When spammer says R, it is 25% G, 25% P, 25% R, 25% X
When spammer says X, it is 25% G, 25% P, 25% R, 25% X
[Note: assume equal priors]
The results are highly ambiguous. No information provided!
Error rates for spammer ATAMRO447HWJQ:
P[G → G] = 100.0%   P[G → P] = 0.0%   P[G → R] = 0.0%   P[G → X] = 0.0%
P[P → G] = 100.0%   P[P → P] = 0.0%   P[P → R] = 0.0%   P[P → X] = 0.0%
P[R → G] = 100.0%   P[R → P] = 0.0%   P[R → R] = 0.0%   P[R → X] = 0.0%
P[X → G] = 100.0%   P[X → P] = 0.0%   P[X → R] = 0.0%   P[X → X] = 0.0%
Computing Quality Scores
Cost of a “soft” label <p_1, p_2, …, p_L>, with cost c_ij for labeling as j an object of class i
– “Soft” labels with probability mass in a single class are good
– “Soft” labels with probability mass spread across classes are bad
Cost(spammer): replace p_i with Prior_i
Quality = 1 – Cost(worker) / Cost(spammer)
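The slide's cost formula did not survive extraction, so the sketch below assumes Cost(p) = Σ_i Σ_j p_i p_j c_ij (the expected cost when both the true class and the assigned class are drawn from the soft label). It scores a worker by “reversing” each label they might assign into a soft label over true classes, then averaging those costs weighted by how often the worker assigns each label. The function names and dict layout are hypothetical:

```python
def soft_label_cost(p: dict, cost: dict) -> float:
    """Assumed expected cost of soft label p: sum_i sum_j p_i * p_j * c_ij."""
    return sum(p[i] * p[j] * cost[i][j] for i in p for j in p)


def quality_score(confusion: dict, prior: dict, cost: dict) -> float:
    """1 - Cost(worker) / Cost(spammer); confusion[t][a] = P[t -> a]."""
    classes = list(prior)
    expected = 0.0
    for a in classes:  # each label the worker might assign
        joint = {t: prior[t] * confusion[t].get(a, 0.0) for t in classes}
        pr_a = sum(joint.values())  # Pr(worker assigns a)
        if pr_a == 0:
            continue
        reversed_soft = {t: joint[t] / pr_a for t in classes}  # Pr(true t | a)
        expected += pr_a * soft_label_cost(reversed_soft, cost)
    spammer_cost = soft_label_cost(prior, cost)  # replace p_i with prior_i
    return 1.0 - expected / spammer_cost


classes = ["G", "P", "R", "X"]
prior = {c: 0.25 for c in classes}  # equal priors, as on the spammer slide
zero_one = {i: {j: 0.0 if i == j else 1.0 for j in classes} for i in classes}
biased = {"G": {"G": 0.2, "P": 0.8}, "P": {"R": 1.0}, "R": {"R": 1.0}, "X": {"X": 1.0}}
spammer = {t: {"G": 1.0} for t in classes}
print(round(quality_score(biased, prior, zero_one), 3))   # 0.667
print(round(quality_score(spammer, prior, zero_one), 3))  # 0.0
```

With equal priors and 0/1 costs the spammer scores 0, while the biased worker scores about 0.67: only its R votes remain ambiguous (50% P, 50% R) once its errors are reversed.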
Experimental Results
– 500 web pages in G, P, R, X
– 100 workers per page (just to evaluate the effect of more labels)
– 339 workers
– Lots of noise! 95% accuracy with majority vote (only!)
Dropped all workers with quality score 50% or below:
– Error rate: 1% of labels dropped
– Quality scores: 30% of labels dropped, accuracy 99.8%
Note the massive amount of redundancy and very conservative spam rejection
Too much theory?
Open source implementation available at: http://code.google.com/p/get-another-label/
Input:
– Labels from Mechanical Turk
– Cost of incorrect labelings (e.g., X→G costlier than G→X)
Output:
– Corrected labels
– Worker error rates
– Ranking of workers according to their quality
Beta version, more improvements to come! Suggestions and collaborations welcomed!
Workers reacting to quality scores
Score-based feedback leads to strange interactions:
The angry, has-been-burnt-too-many-times worker: “F*** YOU! I am doing everything correctly and you know it! Stop trying to reject me with your stupid ‘scores’!”
The overachiever: “What am I doing wrong?? My score is 92% and I want to have 100%”
An unexpected connection at the NAS “Frontiers of Science” conf.
Your spammers behave like my mice!
Eh?
Your spammers want to engage their brain only for motor skills and not for cognitive skills
Yeah, makes sense…
And here is how I train my mice to behave…
I should try this the moment that I get back to my room
Implicit Feedback with Frustration
Punish bad answers with frustration (e.g., delays between tasks)
– “Loading image, please wait…”
– “Image did not load, press here to reload”
– “Network error. Return the HIT and accept again”
Reward good answers by giving the next task immediately, with no problems
→ Make this probabilistic to keep feedback implicit
First result
Spammer workers quickly abandon; good workers keep labeling
Bad: spammer bots are unaffected
How to frustrate a bot?
– Give it a CAPTCHA
Second result (more impressive)
Remember, Don was training the mice…
15% of the spammers start submitting good work! Putting in cognitive effort is more beneficial…
Key trick: learn to test workers on the fly
– Model performance using a Dirichlet compound multinomial distribution (details offline)
Thanks!
Q & A?
Why does Model Uncertainty (MU) work?
MU score distributions for correctly labeled (blue) and incorrectly labeled (purple) cases
[Figure: models vs. examples, “self-healing process”; scatter plot of + and - examples]
Self-healing MU
“Active learning” MU
What if different labelers have different qualities?
(Sometimes) the quality of multiple noisy labelers is better than the quality of the best labeler in the set
Here, 3 labelers: p-d, p, p+d
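To make “(Sometimes)” concrete: assuming independent labelers and plain majority voting over binary labels, the integrated quality can be computed by enumerating which labelers are correct. The (p, d) pairs below are our own illustrative choices:

```python
from itertools import product


def majority_accuracy(qualities: list[float]) -> float:
    """P(majority vote is correct) for independent binary labelers with the
    given individual accuracies (use an odd number of labelers)."""
    total = 0.0
    for outcome in product((True, False), repeat=len(qualities)):
        prob = 1.0
        for correct, q in zip(outcome, qualities):
            prob *= q if correct else 1 - q
        if sum(outcome) > len(qualities) / 2:  # a majority got it right
            total += prob
    return total


for p, d in [(0.8, 0.05), (0.7, 0.2)]:
    qs = [p - d, p, p + d]
    print([round(q, 2) for q in qs], round(majority_accuracy(qs), 4), "best single:", round(max(qs), 2))
```

With p = 0.8, d = 0.05 the majority reaches about 0.8975, better than the best labeler's 0.85; with p = 0.7, d = 0.2 it reaches about 0.80, worse than the best labeler's 0.9.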
Soft Labeling vs. Majority Voting
MV: majority voting
ME: soft labeling
[Figure: accuracy vs. number of examples (bmg, p=0.6), MV vs. ME]
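One way to realize soft labeling, assumed here, is to replace each example with one weighted copy per observed label, weighted by that label's share of the votes; majority voting instead keeps a single copy carrying the winning label. A minimal Python sketch with hypothetical names and data:

```python
from collections import Counter


def majority_vote_examples(features, labels):
    """One training example carrying the most frequent label (ties: first seen)."""
    label, _ = Counter(labels).most_common(1)[0]
    return [(features, label, 1.0)]


def soft_label_examples(features, labels):
    """One weighted copy of the example per distinct label, weighted by vote share."""
    counts = Counter(labels)
    total = len(labels)
    return [(features, label, n / total) for label, n in counts.items()]


x = {"site": "example-site"}  # hypothetical feature record
votes = ["+", "+", "-", "+", "-"]
print(majority_vote_examples(x, votes))  # [({'site': 'example-site'}, '+', 1.0)]
print(soft_label_examples(x, votes))     # [({'site': 'example-site'}, '+', 0.6), ({'site': 'example-site'}, '-', 0.4)]
```

The weighted copies can then be passed to any learner that accepts per-example weights; many scikit-learn estimators, for instance, take a `sample_weight` argument in `fit`.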
Related topic
Estimating (and using) the labeler quality
– for multilabeled data: Dawid & Skene 1979; Raykar et al. JMLR 2010; Donmez et al. KDD09
– for single-labeled data with variable-noise labelers: Donmez & Carbonell 2008; Dekel & Shamir 2009a,b
– to eliminate/down-weight poor labelers: Dekel & Shamir, Donmez et al.; Raykar et al. (implicitly)
– and correct labeler biases: Ipeirotis et al. HCOMP-10
– Example-conditional labeler performance: Yan et al. 2010a,b
Using learned model to find bad labelers/labels: Brodley & Friedl 1999; Dekel & Shamir, Us (I’ll discuss)