AP-Perf: Incorporating Generic Performance Metrics in Differentiable Learning
RIZAL FATHONY AND ZICO KOLTER (1),(2)
(1) Carnegie Mellon University  (2) Bosch Center for AI
Highlight of our paper
A generic framework and programming tools that enable practitioners to easily align the training objective with the evaluation metric
Integration with ML Pipelines: easily incorporate custom performance metrics into a machine learning pipeline
Learning using binary cross entropy
Learning using the AP formulation for the F2 metric
Fβ score definition
*) Code in PyTorch
Motivation
Evaluation Metrics
Example: digit recognition
Performance metric: Accuracy = (# correct predictions) / (# samples)
Loss metric: Zero-one loss = (# incorrect predictions) / (# samples)
The most widely used metric!
The accuracy metric is not always desirable
Example: disease prediction (imbalanced dataset)
98% of the samples are healthy (negative samples); 2% have the disease (positive samples).
A classifier that predicts all samples as negative still achieves 98% accuracy.
Confusion Matrix
Precision = (# true positives) / (# predicted positives)
Recall = (# true positives) / (# actual positives)
Specificity = (# true negatives) / (# actual negatives)
Sensitivity = (# true positives) / (# actual positives)
F1-score = (2 · precision · recall) / (precision + recall)
Fβ-score = ((1 + β²) · precision · recall) / (β² · precision + recall)
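The definitions above can be checked on the slide's own imbalanced disease-prediction example. A minimal sketch in plain Python (no ML library; the all-negative classifier is the illustration from this slide, and the helper names are ours):

```python
# Metrics from a binary confusion matrix, applied to the 98%-healthy
# disease-prediction example: accuracy looks great, F1 exposes the failure.

def confusion(y_true, y_pred):
    """Count tp, fp, tn, fn for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f_beta(tp, fp, fn, beta=1.0):
    """Fβ = (1 + β²)·prec·rec / (β²·prec + rec); 0 when undefined."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * prec + rec
    return (1 + beta ** 2) * prec * rec / denom if denom else 0.0

# 98 healthy (0), 2 diseased (1); predict everything negative.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
tp, fp, tn, fn = confusion(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)   # 0.98 — looks great
f1 = f_beta(tp, fp, fn)              # 0.0 — reveals the failure
```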
Learning Tasks & Evaluation Metrics
| Machine Learning Tasks | Popular Evaluation Metrics |
| --- | --- |
| Imbalanced datasets | F1-score; area under ROC curve (AUC); precision vs. recall |
| Medical classification tasks | Specificity; sensitivity; bookmaker informedness |
| Information retrieval tasks | Precision@k; mean average precision (MAP); discounted cumulative gain (DCG) |
| Weighted classification tasks | Cost-sensitive loss metrics |
| Rating tasks | Cohen's kappa score; Fleiss' kappa score |
| Computational biology tasks | Precision-recall curve; Matthews correlation coefficient (MCC) |
Evaluation Metric vs. Training Objective Mismatch
Evaluation metric vs. ML model / training objective
Example: disease prediction, where we want to optimize specificity & sensitivity.
Most ML models:
- have no support for the specificity & sensitivity metrics, and
- optimize the cross-entropy objective instead (a proxy for the accuracy metric).
This discrepancy between the evaluation metric and the training objective yields inferior performance (Cortes & Mohri, 2004; Eban et al., 2016).
Our paper
A generic framework and programming tools that enable practitioners to easily align the training objective with the evaluation metric
Related Work
Evaluation Metrics
Decomposable metrics: can be decomposed into a sample-wise sum.
Examples: accuracy, ordinal regression, and cost-sensitive metrics.
Non-decomposable metrics: cannot be decomposed into a sample-wise sum; common in many applications.
Examples: F1-score, GPR, informedness, MCC, kappa score.
Learning Algorithm Design
Empirical Risk Minimization framework: approximate the (discrete, non-continuous) evaluation metric with a convex surrogate loss.
Binary classification with the accuracy metric — classical convex surrogate losses:
- Hinge loss :: support vector machine
- Log loss :: logistic regression
- Exponential loss :: AdaBoost
Non-decomposable metrics:
- SVM-based model: SVM-perf (Joachims, 2005). Works on many complex metrics, but has no statistical consistency guarantee and does not provide an easy tool for extending the method to custom metrics.
- Most other models (e.g., Koyejo et al., 2014; Narasimhan et al., 2014) are hard to extend to custom metrics.
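The three classical surrogates above can each be written as a convex function of the margin z = y·f(x), with y ∈ {−1, +1}. A minimal sketch (function names are ours; log loss is taken base 2 so that all three upper-bound the 0-1 loss):

```python
import math

def zero_one(z):
    """The discrete target: 1 on a mistake (non-positive margin), else 0."""
    return 0.0 if z > 0 else 1.0

def hinge(z):
    """Hinge loss (SVM)."""
    return max(0.0, 1.0 - z)

def log_loss(z):
    """Log loss (logistic regression), base 2."""
    return math.log2(1.0 + math.exp(-z))

def exp_loss(z):
    """Exponential loss (AdaBoost)."""
    return math.exp(-z)

# Each surrogate is a convex upper bound on the 0-1 loss, so driving the
# surrogate down also drives the 0-1 loss down.
for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert hinge(z) >= zero_one(z)
    assert log_loss(z) >= zero_one(z)
    assert exp_loss(z) >= zero_one(z)
```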
Neural Network Learning
Neural networks are currently the most popular machine learning models. For classification with the accuracy metric, they use a classical surrogate loss as the last layer (objective) — e.g., the cross-entropy objective, i.e., logistic regression's log-loss surrogate.
Non-decomposable metrics:
- Most 'classical' models are not applicable to NN learning.
- NN-targeted models (Eban et al., 2016; Song et al., 2016; Sanyal et al., 2018) support only a few metrics and have no support for custom metrics.
Practitioners' perspective:
- They aim to optimize an evaluation metric tailored specifically to their problem (e.g., specificity, sensitivity, kappa score).
- No learning model can easily optimize their specific evaluation metric, so they choose the standard cross entropy instead.
- Result: a mismatch between the evaluation metric and the training model.
Approach
Adversarial Prediction (Asif et al., 2016; Fathony et al., 2018)
Optimizing the original evaluation metric directly is discrete and intractable. Two convex, tractable alternatives:
- Empirical Risk Minimization with a convex surrogate loss: approximates the metric, keeps the exact training data. The more complex the metric, the harder it is to construct a good surrogate loss.
- Adversarial Prediction (Asif et al. '16; Fathony et al. '18): keeps the exact evaluation metric, approximates the training data. No need to independently construct a surrogate loss for every metric.
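The adversarial prediction idea can be sketched as a minimax game. The LaTeX below uses simplified notation and is a rough sketch, not the paper's exact statement:

```latex
% The predictor chooses a distribution P over label assignments \hat{Y};
% an adversary chooses Q over approximations \check{Y} of the training
% labels, constrained to match the data's feature statistics \phi;
% the exact metric stays inside the expectation.
\max_{P(\hat{Y})} \; \min_{Q(\check{Y})} \;
  \mathbb{E}_{\hat{Y} \sim P,\, \check{Y} \sim Q}
  \bigl[ \mathrm{metric}(\hat{Y}, \check{Y}) \bigr]
\quad \text{s.t.} \quad
  \mathbb{E}_{Q} \bigl[ \phi(X, \check{Y}) \bigr]
  = \tilde{\mathbb{E}} \bigl[ \phi(X, Y) \bigr]
```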
Our Method
OPTIMIZING GENERIC NON-DECOMPOSABLE METRICS
Non-Decomposable Metric
AP with a decomposable metric (Asif et al., 2015; Fathony et al., 2016, 2017, 2018): reduces to an optimization over sample-wise conditional probability distributions (size: 2 binary values × # samples).
AP with a non-decomposable metric (example: binary classification with the F1-score metric): requires an optimization over the conditional probability distribution of the full training set (size: 2^n, exponential). Intractable!
Marginalization technique: optimize over the marginal distribution instead.
- Original: size 2^n — intractable!
- Marginalization: size n² — tractable!
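The gap between the two sizes is the whole point of the marginalization step. A trivial sketch (the formulas 2^n and n² are the sizes quoted on this slide):

```python
# A distribution over all joint labelings of n binary variables has 2**n
# entries, while the marginal quantities the AP formulation optimizes over
# grow only quadratically in n.

def joint_size(n):
    return 2 ** n      # intractable: exponential in the number of samples

def marginal_size(n):
    return n ** 2      # tractable: quadratic in the number of samples

# Even at a modest 100 training samples the gap is astronomical.
assert joint_size(10) == 1024 and marginal_size(10) == 100
assert joint_size(100) > 10 ** 30
assert marginal_size(100) == 10 ** 4
```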
Generic Non-Decomposable Performance Metrics
Our formulation handles more complex performance metrics. It covers a vast range of performance metric families, including the most common non-decomposable metrics: precision, recall, Fβ-score, balanced accuracy, specificity, sensitivity, informedness, markedness, MCC, kappa score, etc.
Practitioners can also define novel custom metrics, specifically targeted to their novel problems.
Marginalization technique:
- Original: size 2^n — intractable!
- Marginalization: size 2n² — tractable!
Optimization: gradient descent + an ADMM-based solver for the inner optimization.
Integration with ML Pipelines: easily incorporate custom performance metrics into a machine learning pipeline
Learning using binary cross entropy
Learning using the AP formulation for the F2 metric
Fβ score definition
*) Code in PyTorch
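The slide's code (shown as an image) defines the metric once, as an expression over confusion-matrix quantities, and reuses it as a training objective. The sketch below mimics that style in plain Python; `ConfusionMatrix` and `FBeta` here are illustrative stand-ins, not the ap_perf package's exact API:

```python
from dataclasses import dataclass

# A metric is written once over confusion-matrix quantities (tp, fp, fn, tn,
# plus derived counts), then evaluated on any confusion matrix.

@dataclass
class ConfusionMatrix:
    tp: float
    fp: float
    fn: float
    tn: float

    @property
    def ap(self):
        """Actual positives."""
        return self.tp + self.fn

    @property
    def pp(self):
        """Predicted positives."""
        return self.tp + self.fp

class FBeta:
    """Fβ expressed over confusion-matrix counts:
    (1 + β²)·tp / (β²·(actual positives) + (predicted positives))."""

    def __init__(self, beta):
        self.beta = beta

    def define(self, C):
        b2 = self.beta ** 2
        return (1 + b2) * C.tp / (b2 * C.ap + C.pp)

# F2 on a toy confusion matrix: tp=8, fp=4, fn=2, tn=86.
f2 = FBeta(beta=2).define(ConfusionMatrix(tp=8, fp=4, fn=2, tn=86))
```

In the real package, an equivalent metric definition is compiled into a differentiable training objective rather than just evaluated on counts.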
AP-Perf supports a wide variety of evaluation metrics
Code examples for other performance metrics: geometric mean of precision and recall (GPR); Cohen's kappa score
Novel Custom Metrics
Write your own novel metrics.
Example: a weighted modification of Cohen's kappa score and the Matthews correlation coefficient (MCC).
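As a concrete illustration of the write-your-own idea, here is the standard MCC from confusion-matrix counts, plus a hypothetical weighted variant (the weighting scheme is our illustration, not the paper's exact definition):

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0 when the denominator vanishes."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def weighted_mcc(tp, fp, fn, tn, w=2.0):
    """Hypothetical custom variant: up-weight false negatives by w
    before computing MCC, penalizing missed positives more heavily."""
    return mcc(tp, fp, w * fn, tn)

# A perfect predictor scores 1; a fully inverted one scores -1.
assert mcc(tp=50, fp=0, fn=0, tn=50) == 1.0
assert mcc(tp=0, fp=50, fn=50, tn=0) == -1.0
```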
Empirical Results
Datasets: 20 UCI datasets, MNIST, Fashion-MNIST
Neural networks: multilayer perceptron, convolutional NN
Performance metrics: 1) accuracy, 2) F1 score, 3) F2 score, 4) geometric mean of precision & recall (GPR), 5) Matthews correlation coefficient (MCC), 6) Cohen's kappa score
Summary
Comparison criteria: support for neural network learning; statistical consistency; support for custom metrics; easy interface for practitioners (to optimize custom metrics).
Methods compared:
- SVM-Perf (Joachims, 2005)
- Plug-in based classifiers (Koyejo et al., 2014; Narasimhan et al., 2014)
- Global objectives (Eban et al., 2016)
- DAME & DUPLE (Sanyal et al., 2018)
- AP-Perf (Fathony & Kolter, our method), designed to cover all four criteria
Download
Python (ap_perf):
- install: pip install ap_perf
- github: https://github.com/rizalzaf/ap_perf
Julia (AdversarialPrediction.jl):
- install: ] add AdversarialPrediction
- github: https://github.com/rizalzaf/AdversarialPrediction.jl
Paper: http://proceedings.mlr.press/v108/fathony20a.html
Thank You