
Page 1: On Symmetric Losses for Learning from Corrupted Labels

Nontawat Charoenphakdee 1,2, Jongyeong Lee 1,2, and Masashi Sugiyama 2,1

1 The University of Tokyo
2 RIKEN Center for Advanced Intelligence Project (AIP)

Page 2: Supervised learning

Machine learning:
• Data collection: features (input) and labels (output)
• Learn a prediction function from input-output pairs
• Predict the output of unseen inputs accurately

[Diagram: data collection feeds input-output pairs to a learner, which produces a prediction function. No noise robustness: the labels are assumed to be clean.]

Page 3: Learning from corrupted labels

Data collection consists of feature collection and a labeling process, and the labeling process can introduce errors. Examples:
• Expert labelers (human error)
• Crowdsourcing (non-expert error)

Our goal: noise-robust machine learning, i.e., learning a reliable prediction function from data with corrupted labels.

Page 4: Contents

• Background and related work

• The importance of symmetric losses

• Theoretical properties of symmetric losses

• Barrier hinge loss

• Experiments


Page 5: Warmup: binary classification

Notation: $x \in \mathbb{R}^d$: feature vector; $y \in \{-1, +1\}$: label; $g: \mathbb{R}^d \to \mathbb{R}$: prediction function; $\ell$: margin loss function.

• Given: input-output pairs $\{(x_i, y_i)\}_{i=1}^{n}$ drawn from $p(x, y)$.
• Goal: minimize the expected error $R(g) = \mathbb{E}_{(x,y) \sim p}[\ell_{0\text{-}1}(y g(x))]$, where the 0-1 loss of the margin $z = y g(x)$ is $0$ if $g(x)$ and $y$ have the same sign and $1$ if they have different signs.

With no access to the distribution, we minimize the empirical error instead (Vapnik, 1998):
$\hat{R}(g) = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i g(x_i))$

Page 6: Surrogate losses

Minimizing the 0-1 loss directly is difficult:
• It is discontinuous and non-differentiable (Ben-David+, 2003; Feldman+, 2012).

In practice, we minimize a surrogate loss $\ell(z)$ of the margin $z = y g(x)$ instead (Zhang, 2004; Bartlett+, 2006).
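As a concrete illustration, here is a minimal numpy sketch of the 0-1 loss and some common surrogate losses as functions of the margin (the function names, the ramp parameterization, and the toy data are our choices, not from the paper):

```python
import numpy as np

# Margin losses l(z), z = y * g(x).
def zero_one(z): return (z <= 0).astype(float)              # 0-1 loss
def hinge(z):    return np.maximum(0.0, 1.0 - z)            # convex surrogate
def logistic(z): return np.log1p(np.exp(-z))                # convex surrogate
def squared(z):  return (1.0 - z) ** 2                      # convex surrogate
def sigmoid(z):  return 1.0 / (1.0 + np.exp(z))             # non-convex
def ramp(z):     return np.clip((1.0 - z) / 2.0, 0.0, 1.0)  # non-convex

# Empirical risk of a linear scorer g(x) = <w, x> on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))
z = y * (X @ np.array([1.0, 0.0]))  # margins
print({f.__name__: round(f(z).mean(), 3)
       for f in (zero_one, hinge, logistic, squared, sigmoid, ramp)})
```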

Page 7: Learning from corrupted labels

Given: two sets of corrupted data (Scott+, 2013; Menon+, 2015; Lu+, 2019):

Corrupted positive set: $x \sim \tilde{p}_{+}(x) = \pi_{+} p_{+}(x) + (1 - \pi_{+}) p_{-}(x)$
Corrupted negative set: $x \sim \tilde{p}_{-}(x) = \pi_{-} p_{+}(x) + (1 - \pi_{-}) p_{-}(x)$

$\pi_{+}$ and $\pi_{-}$ are the class priors of the two corrupted sets. Clean data corresponds to $\pi_{+} = 1$ and $\pi_{-} = 0$; positive-unlabeled learning (du Plessis+, 2014) corresponds to $\pi_{+} = 1$ with $\pi_{-}$ equal to the class prior of the unlabeled data.

This setting covers many weakly-supervised settings (Lu+, 2019).
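To make the setting concrete, here is a toy numpy sketch that draws the two corrupted sets as mixtures of clean class-conditional distributions (the Gaussian class-conditionals and the prior values are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_corrupted(n, pi):
    """Draw n points from pi * p_+ + (1 - pi) * p_-,
    with p_+ = N(+1, 1) and p_- = N(-1, 1) as toy class-conditionals."""
    from_pos = rng.random(n) < pi
    return np.where(from_pos, rng.normal(+1.0, 1.0, n), rng.normal(-1.0, 1.0, n))

pi_p, pi_n = 0.8, 0.3   # pi_p > pi_n: the "positive" set is only mostly positive
X_pos = sample_corrupted(1000, pi_p)   # corrupted positive set
X_neg = sample_corrupted(1000, pi_n)   # corrupted negative set
```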

Page 8: The issue with class priors

Given: two sets of corrupted data:

Corrupted positive set: $x \sim \tilde{p}_{+} = \pi_{+} p_{+} + (1 - \pi_{+}) p_{-}$
Corrupted negative set: $x \sim \tilde{p}_{-} = \pi_{-} p_{+} + (1 - \pi_{-}) p_{-}$

Assumption: $\pi_{+} > \pi_{-}$.

Problem: $\pi_{+}$ and $\pi_{-}$ are unidentifiable from samples (Scott+, 2013).

How can we learn without estimating $\pi_{+}$ and $\pi_{-}$?

Page 9: Related work

Classification error ($\pi = p(y = +1)$: clean class prior):
$R(g) = \pi \, \mathbb{E}_{p_{+}}[\ell_{0\text{-}1}(g(x))] + (1 - \pi) \, \mathbb{E}_{p_{-}}[\ell_{0\text{-}1}(-g(x))]$
→ Class priors are needed! (Lu+, 2019)

Balanced error rate (BER):
$\mathrm{BER}(g) = \frac{1}{2}\left(\mathbb{E}_{p_{+}}[\ell_{0\text{-}1}(g(x))] + \mathbb{E}_{p_{-}}[\ell_{0\text{-}1}(-g(x))]\right)$

Area under the receiver operating characteristic curve (AUC) risk:
$R_{\mathrm{AUC}}(g) = \mathbb{E}_{x \sim p_{+}} \mathbb{E}_{x' \sim p_{-}}[\ell_{0\text{-}1}(g(x) - g(x'))]$

→ For BER and AUC, class priors are not needed! (Menon+, 2015)
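For reference, a short numpy sketch of the empirical versions of these two criteria, given scores for positive and negative samples (the variable names and toy scores are ours):

```python
import numpy as np

def ber(scores_pos, scores_neg):
    """Empirical balanced error rate with the 0-1 loss."""
    fnr = np.mean(scores_pos <= 0)   # positives scored non-positive
    fpr = np.mean(scores_neg > 0)    # negatives scored positive
    return 0.5 * (fnr + fpr)

def auc_risk(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked incorrectly;
    the AUC itself is 1 - auc_risk (ignoring ties)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(diff <= 0)

rng = np.random.default_rng(0)
s_pos = rng.normal(+1.0, 1.0, 500)
s_neg = rng.normal(-1.0, 1.0, 500)
print(ber(s_pos, s_neg), 1.0 - auc_risk(s_pos, s_neg))
```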

Page 10: Related work: BER and AUC optimization

Menon+, 2015: for BER and AUC, we can treat corrupted data as if they were clean.
• The proof relies on a property of the 0-1 loss; the squared loss was used in the experiments.

van Rooyen+, 2015: symmetric losses are also useful for BER minimization (no experiments).

Ours: using a symmetric loss is preferable for both BER and AUC, theoretically and experimentally!

Page 11: Contents

• Background and related work

• The importance of symmetric losses

• Theoretical properties of symmetric losses

• Barrier hinge loss

• Experiments


Page 12: Symmetric losses

A margin loss $\ell$ is symmetric if $\ell(z) + \ell(-z) = K$ for a constant $K$ and all $z$.

Applications:
• Robustness under symmetric noise, i.e., labels flipped with a fixed probability (Ghosh+, 2015; van Rooyen+, 2015)
• Risk estimator simplification in weakly-supervised learning (du Plessis+, 2014; Kiryo+, 2017; Lu+, 2018)
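The condition is easy to check numerically; a small sketch (our own, with an arbitrary grid) confirming that the sigmoid and ramp losses are symmetric while the hinge loss is not:

```python
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(z))
def ramp(z):    return np.clip((1.0 - z) / 2.0, 0.0, 1.0)
def hinge(z):   return np.maximum(0.0, 1.0 - z)

z = np.linspace(-5.0, 5.0, 1001)
for loss in (sigmoid, ramp, hinge):
    s = loss(z) + loss(-z)                    # should be a constant K
    print(loss.__name__, np.allclose(s, s[0]))
# sigmoid: True (K = 1), ramp: True (K = 1), hinge: False
```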

Page 13: Symmetric losses: AUC maximization

AUC risk on the corrupted data:
$\tilde{R}_{\mathrm{AUC}}(g) = \mathbb{E}_{x \sim \tilde{p}_{+}} \mathbb{E}_{x' \sim \tilde{p}_{-}}[\ell(g(x) - g(x'))]$

Expanding the mixtures expresses the corrupted risk in terms of the clean risk plus excessive terms. With a symmetric loss, the excessive terms become constant:
$\tilde{R}_{\mathrm{AUC}}(g) = (\pi_{+} - \pi_{-}) \, R_{\mathrm{AUC}}(g) + \mathrm{const.}$

Excessive terms can be safely ignored with symmetric losses: optimizing the corrupted AUC risk optimizes the clean AUC risk, without estimating class priors.
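A Monte-Carlo sanity check of this affine relation (a toy sketch with our own Gaussian class-conditionals and priors, not the paper's experiment): since the slope $(\pi_{+} - \pi_{-})$ does not depend on $g$, the corrupted risk gap between any two scorers should shrink by exactly that factor.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(z))
pi_p, pi_n, n = 0.8, 0.3, 2000

x_pos = rng.normal(+1.0, 1.0, n)   # clean p_+
x_neg = rng.normal(-1.0, 1.0, n)   # clean p_-
mix = lambda pi: np.where(rng.random(n) < pi,
                          rng.normal(+1.0, 1.0, n), rng.normal(-1.0, 1.0, n))
xt_pos, xt_neg = mix(pi_p), mix(pi_n)   # corrupted sets

def auc_risk(g, pos, neg):
    return sigmoid(g(pos)[:, None] - g(neg)[None, :]).mean()

g1, g2 = (lambda x: x), (lambda x: 0.2 * x + 0.5)   # two arbitrary scorers
clean = [auc_risk(g, x_pos, x_neg) for g in (g1, g2)]
corr  = [auc_risk(g, xt_pos, xt_neg) for g in (g1, g2)]
print((corr[0] - corr[1]) / (clean[0] - clean[1]))  # approx. pi_p - pi_n = 0.5
```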

Page 14: Symmetric losses: BER minimization

BER on the corrupted data:
$\widetilde{\mathrm{BER}}(g) = \frac{1}{2}\left(\mathbb{E}_{\tilde{p}_{+}}[\ell(g(x))] + \mathbb{E}_{\tilde{p}_{-}}[\ell(-g(x))]\right)$

With a symmetric loss, the excessive term becomes constant:
$\widetilde{\mathrm{BER}}(g) = (\pi_{+} - \pi_{-}) \, \mathrm{BER}(g) + \mathrm{const.}$

Excessive terms can be safely ignored with symmetric losses. This coincides with van Rooyen+, 2015.
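The same sanity check works for BER (again a toy sketch, under the same illustrative assumptions as the AUC check above):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(z))
pi_p, pi_n, n = 0.8, 0.3, 500_000

x_pos = rng.normal(+1.0, 1.0, n)
x_neg = rng.normal(-1.0, 1.0, n)
mix = lambda pi: np.where(rng.random(n) < pi,
                          rng.normal(+1.0, 1.0, n), rng.normal(-1.0, 1.0, n))
xt_pos, xt_neg = mix(pi_p), mix(pi_n)

def ber(g, pos, neg):
    return 0.5 * (sigmoid(g(pos)).mean() + sigmoid(-g(neg)).mean())

g1, g2 = (lambda x: x), (lambda x: 0.3 * x - 0.2)
clean = [ber(g, x_pos, x_neg) for g in (g1, g2)]
corr  = [ber(g, xt_pos, xt_neg) for g in (g1, g2)]
print((corr[0] - corr[1]) / (clean[0] - clean[1]))  # approx. pi_p - pi_n = 0.5
```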

Page 15: Contents

• Background and related work

• The importance of symmetric losses

• Theoretical properties of symmetric losses

• Barrier hinge loss

• Experiments


Page 16: Theoretical properties of symmetric losses

Nonnegative symmetric losses are non-convex (du Plessis+, 2014; Ghosh+, 2015):
• The theory of convex losses cannot be applied.

We provide a better understanding of symmetric losses:
• A necessary and sufficient condition for classification-calibration
• An excess risk bound in binary classification
• The inability to estimate the class posterior probability
• A sufficient condition for AUC-consistency that covers many symmetric losses, e.g., sigmoid and ramp

Well-known symmetric losses, e.g., sigmoid and ramp, are classification-calibrated and AUC-consistent!

Page 17: Contents

• Background and related work

• The importance of symmetric losses

• Theoretical properties of symmetric losses

• Barrier hinge loss

• Experiments


Page 18: Convex symmetric losses?

By sacrificing nonnegativity, a convex symmetric loss becomes possible, but only one: the unhinged loss $\ell(z) = 1 - z$ (van Rooyen+, 2015).

This loss had been considered before, although its noise robustness was not discussed (Devroye+, 1996; Schoelkopf+, 2002; Shawe-Taylor+, 2004; Sriperumbudur+, 2009; Reid+, 2011).

Page 19: Barrier hinge loss

We propose the barrier hinge loss, with two parameters:
• the slope of the non-symmetric region, and
• the width of the symmetric region.

It imposes a high penalty when the prediction is misclassified or the output falls outside the symmetric region.
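A numpy sketch of the loss is given below. The closed form is our reconstruction of the paper's definition (with $b$ the slope and $r$ the half-width of the symmetric region) and should be checked against the paper:

```python
import numpy as np

def barrier_hinge(z, b=200.0, r=50.0):
    """Barrier hinge loss (our reconstruction; b = 200, r = 50 matches the
    'Barrier [s=200, w=50]' setting used in the experiments).
    On [-r, r] the loss is r - z, so l(z) + l(-z) = 2r there (symmetric on
    the interval); outside [-r, r] it rises with slope b, the 'barrier'."""
    return np.maximum(-b * (r + z) + r, np.maximum(b * (z - r), r - z))

print(barrier_hinge(np.array([-60.0, -50.0, 0.0, 50.0, 60.0])))
```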

Page 20: Symmetricity of the barrier hinge loss

The barrier hinge loss satisfies the symmetric property on an interval, not everywhere.

If the output range of $g$ is restricted to the symmetric region, the unhinged, hinge, and barrier losses are all equivalent.
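A quick numerical check of both claims, using the reconstruction of the loss above with $r = 1$ so that the symmetric region is $[-1, 1]$:

```python
import numpy as np

def barrier_hinge(z, b=200.0, r=1.0):
    return np.maximum(-b * (r + z) + r, np.maximum(b * (z - r), r - z))

z = np.linspace(-1.0, 1.0, 201)   # outputs restricted to the symmetric region
print(np.allclose(barrier_hinge(z) + barrier_hinge(-z), 2.0))  # l(z) + l(-z) = 2r
print(np.allclose(barrier_hinge(z), 1.0 - z))                  # equals unhinged
print(np.allclose(barrier_hinge(z), np.maximum(0.0, 1.0 - z))) # equals hinge
```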

Page 21: Contents

• Background and related work

• The importance of symmetric losses

• Theoretical properties of symmetric losses

• Barrier hinge loss

• Experiments


Page 22: Experiments: BER/AUC optimization from corrupted labels

We empirically answer the following questions:
1. Does the symmetric condition significantly help?
2. Does a loss need to be symmetric everywhere?
3. Does negative unboundedness degrade practical performance?

Setup: fix the models and vary the loss functions.
Losses: Barrier [s=200, w=50], Unhinged, Sigmoid, Logistic, Hinge, Squared, Savage

Experiment 1: MLPs on UCI/LIBSVM datasets.
Experiment 2: CNNs on more difficult datasets (MNIST, CIFAR-10).

Page 23: Experiments: BER/AUC optimization from corrupted labels

UCI datasets:
• Multilayer perceptrons (MLPs) with one hidden layer: [d-500-1]
• Activation function: rectified linear units (ReLU) (Nair+, 2010)

MNIST and CIFAR-10:
• Convolutional neural networks (CNNs): [d-Conv[18,5,1,0]-Max[2,2]-Conv[48,5,1,0]-Max[2,2]-800-400-1]
  (Conv[18,5,1,0]: 18 channels, 5x5 convolutions, stride 1, padding 0; Max[2,2]: max pooling with kernel size 2 and stride 2)
• ReLU after each fully connected layer, followed by a dropout layer (Srivastava+, 2014)
• MNIST: odd numbers vs. even numbers
• CIFAR-10: one class vs. airplane (following Ishida+, 2017)
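For concreteness, a hedged PyTorch sketch of this CNN (our reading of the layer string; the 1-channel 28x28 MNIST input and the activation placement after the convolutions are assumptions):

```python
import torch.nn as nn

class SlideCNN(nn.Module):
    """[d-Conv[18,5,1,0]-Max[2,2]-Conv[48,5,1,0]-Max[2,2]-800-400-1]"""
    def __init__(self, in_channels=1):   # 1 channel for MNIST; 3 for CIFAR-10
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 18, kernel_size=5, stride=1, padding=0),
            nn.ReLU(),  # activation after conv layers is our assumption
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(18, 48, kernel_size=5, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(48 * 4 * 4, 800),  # 28x28 input -> 48 x 4 x 4 after pooling
            nn.ReLU(), nn.Dropout(),     # ReLU then dropout after each FC layer
            nn.Linear(800, 400),
            nn.ReLU(), nn.Dropout(),
            nn.Linear(400, 1),           # scalar score g(x)
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```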

Page 24: Experiment 1: MLPs on UCI/LIBSVM datasets

[Results table omitted; the higher the better.]

Dataset information and more experiments can be found in our paper.

Page 25: Experiment 1: MLPs on UCI/LIBSVM datasets

[Results table omitted; the higher the better.]

Symmetric losses and the barrier hinge loss are preferable!

Page 26: Experiment 2: CNNs on MNIST/CIFAR-10

[Results table omitted.]

Page 27: Conclusion

We showed that symmetric losses are preferable under corrupted labels for:
• Area under the receiver operating characteristic curve (AUC) maximization
• Balanced error rate (BER) minimization

We provided general theoretical properties of symmetric losses:
• Classification-calibration, an excess risk bound, and AUC-consistency
• The inability to estimate the class posterior probability

We proposed the barrier hinge loss:
• A proof of concept of the importance of the symmetric condition
• Symmetric only on an interval, yet it benefits greatly from the symmetric condition
• It significantly outperformed all other losses in BER/AUC optimization using CNNs

Poster #135: 6:30-9:00 PM