
Page 1:

Debugging the Machine Learning Pipeline

Jerry Zhu

University of Wisconsin-Madison

joint work with Xuezhou Zhang, Stephen Wright

Interpretable ML Symposium, NIPS 2017

Page 2:

Debugging provides an opportunity for machine learning interpretability.

Page 3:

Harry Potter toy example

Page 4:

Hired by the Ministry of Magic?

[Figure: scatter plot of applicants, + = yes (hired), o = no (not hired); x-axis: magical heritage (0 to 1), y-axis: education (0 to 1)]

Page 5:

Data contain historical biases

Learned vs. ideal decision boundary

[Figure: the scatter plot with the learned vs. ideal decision boundary; x-axis: magical heritage, y-axis: education]

(RBF kernel logistic regression)

Page 6:

Trusted items

- obtained by expensive vetting
- insufficient to learn from

[Figure: the scatter plot with the trusted items marked; x-axis: magical heritage, y-axis: education]

Page 7:

Debugging using trusted items

- propose training label bugs
- flipping them makes the re-trained model agree with trusted items

[Figure: the scatter plot with the proposed label flips; x-axis: magical heritage, y-axis: education]

Page 8:

Proposed bugs

- given to experts to interpret

[Figure: the scatter plot with the proposed bugs highlighted; x-axis: magical heritage, y-axis: education]

Page 9:

The ML pipeline

data (X, Y) → learner ℓ → parameters λ → model θ

θ = argmin_{θ∈Θ} ℓ(X, Y, θ) + λ‖θ‖
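As a concrete instance of this pipeline, here is a minimal sketch of a ridge-regularized logistic regression learner trained by gradient descent. This is my own illustration, not code from the talk; the function names, the squared-norm regularizer, and the hyperparameter defaults are assumptions.

    import numpy as np

    def logistic_loss(theta, X, Y):
        # average logistic loss l(X, Y, theta); labels Y in {0, 1}
        z = X @ theta
        return np.mean(np.log1p(np.exp(-z)) + (1 - Y) * z)

    def train(X, Y, lam, lr=0.1, steps=2000):
        # theta = argmin_theta  l(X, Y, theta) + lam * ||theta||^2
        n, d = X.shape
        theta = np.zeros(d)
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(X @ theta)))        # P(y = 1 | x, theta)
            grad = X.T @ (p - Y) / n + 2.0 * lam * theta  # loss gradient plus ridge term
            theta -= lr * grad
        return theta

The debugger introduced later takes such a learner (its loss ℓ and regularizer) as one of its inputs.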

Page 10:

Postconditions

Ψ(θ)

Examples:

- “the learned model must correctly predict an important item (x, y)”

θ(x) = y

- “the learned model must satisfy individual fairness”

∀x, x′, |p(y = 1 | x, θ)− p(y = 1 | x′, θ)| ≤ L‖x− x′‖
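A hedged sketch of how these two postconditions Ψ(θ) could be checked in code. This is my own illustration; in particular, checking the Lipschitz condition over all pairs of a finite sample rather than over all x, x′ is an assumption.

    import numpy as np

    def correct_prediction(predict, x, y):
        # Psi(theta): the learned model must correctly predict the important item (x, y)
        return predict(x) == y

    def individual_fairness(prob_y1, X_sample, L):
        # Psi(theta): |p(y=1|x,theta) - p(y=1|x',theta)| <= L * ||x - x'||,
        # verified on every pair drawn from the finite sample X_sample
        for i in range(len(X_sample)):
            for j in range(i + 1, len(X_sample)):
                gap = abs(prob_y1(X_sample[i]) - prob_y1(X_sample[j]))
                if gap > L * np.linalg.norm(X_sample[i] - X_sample[j]):
                    return False
        return True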

Page 11:

Bug Assumptions

- Ψ satisfied if we were to train through the “clean pipeline”
- bugs are changes to the clean pipeline
- Ψ violated on the dirty pipeline

Page 12:

This is not our goal

Just to learn a better model:

min_{θ∈Θ} ℓ(X, Y, θ) + λ‖θ‖

s.t. Ψ(θ) = true

Page 13:

This is our goal

To identify bugs and fix them (and learn a better model):

min_{Y′, θ} ‖Y − Y′‖

s.t. Ψ(θ) = true

θ = argmin_{θ∈Θ} ℓ(X, Y′, θ) + λ‖θ‖

Page 14:

Special case: bugs in training labels

- Ψ satisfied if we were to train on “clean data” (X, Y′)
- bugs are changes to clean labels: (X, Y) = (X, Y′ + ∆)
- not just about outliers
- may contain systematic biases

Page 15:

Input / output to our debugger

Input:

1. dirty training set (X, Y)

2. trusted items (X̃, Ỹ)

3. the learner

Output:

1. Y′

2. confidence
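In code, the debugger's contract might look roughly like this. The names are hypothetical and the body is a deliberate placeholder; the point is only the input/output shape, since the actual solver is the relaxed bilevel optimization on the following slides.

    def debug_training_set(X, Y, X_trusted, Y_trusted, learner):
        # Input:  dirty training set (X, Y), trusted items (X~, Y~), and the learner.
        # Output: proposed clean labels Y_prime and a per-item confidence score.
        Y_prime = Y.copy()            # placeholder: no label fixes proposed yet
        confidence = [0.0] * len(Y)   # placeholder: zero confidence everywhere
        return Y_prime, confidence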

Page 16:

Formulation equivalent to machine teaching

min_{Y′} ‖Y′ − Y‖

s.t. θ(X̃) = Ỹ

θ = argmin_{θ∈Θ} (1/n) ∑_{i=1}^n ℓ(x_i, y′_i, θ) + λ‖θ‖²

Difficult!

- combinatorial
- bilevel optimization (Stackelberg game)

[Dec. 9 Workshop on Teaching Machines, Robots, and Humans]

Page 17:

Combinatorial to continuous relaxation

step 1. label to probability simplex

y′_i → δ_i ∈ ∆

step 2. counting to probability mass

‖Y′ − Y‖ → (1/n) ∑_{i=1}^n (1 − δ_{i,y_i})

step 3. soften the postcondition

θ(X̃) = Ỹ → (1/m) ∑_{i=1}^m ℓ(x̃_i, ỹ_i, θ)

Page 18:

Continuous now, but still bilevel

argmin_{δ∈∆^n, θ} (1/m) ∑_{i=1}^m ℓ(x̃_i, ỹ_i, θ) + γ (1/n) ∑_{i=1}^n (1 − δ_{i,y_i})

s.t. θ = argmin_θ (1/n) ∑_{i=1}^n ∑_{j=1}^k δ_{ij} ℓ(x_i, j, θ) + λ‖θ‖²
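A small sketch of these two objectives for a k-class logistic regression, assuming ℓ is the cross-entropy loss and θ is a d-by-k weight matrix. This is my own illustration; the variable names and the choice of loss are assumptions.

    import numpy as np

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def lower_objective(theta, X, delta, lam):
        # (1/n) sum_i sum_j delta_ij * l(x_i, j, theta) + lam * ||theta||^2
        P = softmax(X @ theta)                    # n x k class probabilities
        return -np.mean(np.sum(delta * np.log(P), axis=1)) + lam * np.sum(theta ** 2)

    def upper_objective(theta, X_tr, Y_tr, delta, Y, gamma):
        # (1/m) sum_i l(x~_i, y~_i, theta) + gamma * (1/n) sum_i (1 - delta_{i, y_i})
        P_tr = softmax(X_tr @ theta)
        trusted_loss = -np.mean(np.log(P_tr[np.arange(len(Y_tr)), Y_tr]))
        moved_mass = np.mean(1.0 - delta[np.arange(len(Y)), Y])
        return trusted_loss + gamma * moved_mass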

Page 19:

Removing the lower level problem

θ = argmin_θ (1/n) ∑_{i=1}^n ∑_{j=1}^k δ_{ij} ℓ(x_i, j, θ) + λ‖θ‖²

step 1. the KKT condition

(1/n) ∑_{i=1}^n ∑_{j=1}^k δ_{ij} ∇_θ ℓ(x_i, j, θ) + 2λθ = 0

step 2. plug the implicit function θ(δ) into the upper-level problem

argmin_δ (1/m) ∑_{i=1}^m ℓ(x̃_i, ỹ_i, θ(δ)) + γ (1/n) ∑_{i=1}^n (1 − δ_{i,y_i})

step 3. compute the gradient ∇_δ via the implicit function theorem
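Here is a sketch of step 3 with JAX autodiff: differentiating the KKT condition gives ∂θ/∂δ = −H⁻¹B, where H is the Hessian of the lower objective in θ and B its mixed second derivative in (θ, δ); the chain rule then yields ∇_δ of the upper objective. This is my own illustration of the technique, not the authors' released code, and it assumes θ and δ are passed as flat vectors.

    import jax
    import jax.numpy as jnp

    def grad_delta(lower_obj, upper_obj, theta_star, delta):
        # lower_obj(theta, delta) and upper_obj(theta, delta) are scalar functions;
        # theta_star is the lower-level minimizer for the current delta.
        # At the KKT point grad_theta lower_obj(theta(delta), delta) = 0, the implicit
        # function theorem gives  d theta / d delta = -H^{-1} B.
        H = jax.hessian(lower_obj, argnums=0)(theta_star, delta)
        B = jax.jacfwd(jax.grad(lower_obj, argnums=0), argnums=1)(theta_star, delta)
        dtheta_ddelta = -jnp.linalg.solve(H, B)             # (dim_theta, dim_delta)
        # chain rule on the upper objective F(theta(delta), delta)
        g_theta = jax.grad(upper_obj, argnums=0)(theta_star, delta)
        g_delta = jax.grad(upper_obj, argnums=1)(theta_star, delta)
        return g_delta + dtheta_ddelta.T @ g_theta

With this gradient, δ can be updated by projected gradient steps on the simplex ∆^n, re-solving the lower-level problem for θ(δ) after each update.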

Page 20:

Software available.

Page 21:

Harry Potter Toy Example

[Figures: scatter plots of the toy data, the bugs flagged by our debugger, and the bugs flagged by the influence function (x-axis: magical heritage, y-axis: education); an average precision-recall plot (x-axis: recall, y-axis: precision) comparing DUTI, INF (influence function), NN (nearest neighbor), and LND (label noise detection, oracle)]

Page 22:

Another special case: bug in regularization weight

[Figure: pass/fail label (y-axis, 0 to 1) vs. score (x-axis, 0 to 100)]

(logistic regression)

Page 23:

Postcondition violated

Ψ(θ): Individual fairness (Lipschitz condition)

∀x, x′, |p(y = 1 | x, θ)− p(y = 1 | x′, θ)| ≤ L‖x− x′‖

Page 24:

Bug assumption

Learner’s regularization weight λ = 0.001 was inappropriate

θ = argmin_{θ∈Θ} ℓ(X, Y, θ) + λ‖θ‖²

Page 25:

Debugging formulation

min_{λ′, θ} (λ′ − λ)²

s.t. Ψ(θ) = true

θ = argmin_{θ∈Θ} ℓ(X, Y, θ) + λ′‖θ‖²
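Because λ′ is a single scalar, one simple way to approximate this problem is a grid search: re-train for candidate values of λ′ and keep the feasible value closest to λ. This is my own sketch, not necessarily how the talk's software solves it; train and fairness_ok stand in for the learner and the postcondition check Ψ.

    import numpy as np

    def debug_regularization(train, fairness_ok, X, Y, lam=0.001):
        # minimize (lam' - lam)^2 over a grid, subject to Psi(theta) = true
        # for theta re-trained with regularization weight lam'
        candidates = np.logspace(-4, 4, 50)                 # hypothetical search grid
        feasible = [l for l in candidates if fairness_ok(train(X, Y, l))]
        if not feasible:
            return None                                     # no lam' on the grid satisfies Psi
        return min(feasible, key=lambda l: (l - lam) ** 2)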

Page 26:

Suggested bug

λ = 0.001 → λ′ = 121

[Figure: pass/fail label (y-axis, 0 to 1) vs. score (x-axis, 0 to 100)]

Page 27:

Call for ML bug repository

- like software bug repositories in software engineering
- need data provenance:
  - which training items (or other things) were wrong
  - what they should be

Page 28:

References

- Xuezhou Zhang, Xiaojin Zhu, and Stephen Wright. Training set debugging using trusted items. AAAI 2018.

- Gabriel Cadamuro, Ran Gilad-Bachrach, and Xiaojin Zhu. Debugging machine learning models. ICML Workshop on Reliable Machine Learning in the Wild, 2016.

- Shalini Ghosh, Patrick Lincoln, Ashish Tiwari, and Xiaojin Zhu. Trusted machine learning for probabilistic models.

- http://www.cs.wisc.edu/~jerryzhu/machineteaching/