TRANSCRIPT
The Mythos of Model Interpretability
Zachary C. Lipton
https://arxiv.org/abs/1606.03490
Outline
• What is interpretability?
• What are its desiderata?
• What model properties confer interpretability?
• Caveats, pitfalls, and takeaways
What is Interpretability?
• Many papers make axiomatic claims that some model is interpretable and therefore preferable
• But what interpretability is and precisely what desiderata it serves are seldom defined
• Does interpretability hold consistent meaning across papers?
Inconsistent Definitions
• Papers use the words interpretable, explainable, intelligible, transparent, and understandable, both interchangeably (within papers) and inconsistently (across papers)
• One common thread, however, is that interpretability is something other than performance
We want good models
(Figure: models ranked by an evaluation metric)
We also want interpretable models
(Figure: evaluation metric alongside an interpretation)
The Human Wants Something the Metric Doesn’t
(Figure: evaluation metric vs. interpretation)
What Gives?
• So either the metric captures everything and people seeking interpretable models are crazy, or…
• The metrics / loss functions we optimize are fundamentally mismatched with real-life objectives
• We hope to refine the discourse on interpretability, introducing more specific language
• Through the lens of the literature, we create a taxonomy of both objectives & methods
Outline
• What is interpretability?
• What are its desiderata?
• What model properties confer interpretability?
• Caveats, pitfalls, and takeaways
Trust
• Does the model know when it’s uncertain?
• Does the model make the same mistakes as a human?
• Are we comfortable with the model?
Causality
• We may want models to tell us something about the natural world
• Supervised models are trained simply to make predictions, but often used to take actions
• Caruana (2015) shows a mortality predictor (for use in triage) that assigns lower risk to asthma patients, an artifact of asthmatics historically receiving aggressive care that lowered their observed mortality
Transferability
• The idealized training setup often differs from the real world
• The real problem may be non-stationary, noisier, etc.
• We want sanity checks that the model doesn’t depend on quirks of the training setup
Informativeness
• We may train a model to make a decision
• But its real purpose is to aid a person in making a decision
• Thus an interpretation may simply be valuable for the extra bits it carries
Outline
• What is interpretability?
• What are its desiderata?
• What model properties confer interpretability?
• Caveats, pitfalls, and takeaways
Transparency
• Proposed solutions conferring interpretability tend to fall into two categories
• Transparency addresses understanding how the model works
• Explainability concerns the model’s ability to offer some (potentially post-hoc) explanation
Simulatability
• One notion of transparency is simplicity
• This accords with papers advocating small decision trees
• A model is transparent if a person can step through the algorithm in reasonable time
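As a minimal sketch of this idea (with made-up features and thresholds, purely for illustration), a depth-2 decision tree is simulatable because a person can trace any prediction in two comparisons:

```python
def predict_risk(age, num_prior_admissions):
    """A depth-2 decision tree: small enough to simulate by hand.
    Features and thresholds are hypothetical, for illustration only."""
    if num_prior_admissions > 3:
        if age > 65:
            return "high risk"
        return "medium risk"
    return "low risk"

# A person can step through the whole algorithm in reasonable time:
print(predict_risk(age=70, num_prior_admissions=5))  # high risk
```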
Decomposability
• A relaxed notion requires understanding individual components of a model
• Such as: weights of a linear model or the nodes of a decision tree
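For a linear model, decomposability means each weight admits its own reading: weight times feature value is that feature's additive contribution to the score. A sketch with invented features and weights:

```python
import numpy as np

# Hypothetical linear risk model: each weight is individually meaningful.
feature_names = ["age", "blood_pressure", "num_prior_visits"]
weights = np.array([0.02, 0.01, 0.30])
bias = -1.5
x = np.array([70.0, 120.0, 4.0])  # one example

score = weights @ x + bias
# Per-feature contributions: the score decomposes as their sum plus the bias.
contributions = dict(zip(feature_names, weights * x))
```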
Transparent Algorithms
• A yet weaker notion would require only that we understand the behavior of the algorithm
• E.g. convergence of convex optimizations, generalization bounds
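Algorithmic transparency of this kind can be illustrated with gradient descent on a convex function: we can prove, not merely observe, that it reaches the global minimum for a small enough step size. A toy sketch:

```python
# Gradient descent on f(w) = (w - 3)^2, a convex function whose unique
# minimum is w = 3. With step size 0.1, the error shrinks by a factor
# of 0.8 per step, so convergence is guaranteed, not just empirical.
w = 0.0
lr = 0.1
for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad
```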
Post-Hoc Interpretability
Ah yes, something cool is happening in node 750,345,167… maybe it sees a cat?
Maybe we’ll see something awesome if we jiggle the inputs?
Verbal Explanations
• Just as people generate explanations (absent transparency), we might train a (possibly separate) model to generate explanations
• We might consider image captions as interpretations of object predictions
(Image: Karpathy et al 2015)
Saliency Maps
• While the full relationship between input and output might be impossible to describe succinctly, local explanations are potentially useful.
(Image: Wang et al 2016)
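One common recipe for such local explanations is the gradient of the output with respect to the input: dimensions where a small perturbation moves the prediction most are marked as salient. A self-contained sketch using finite differences and a stand-in scoring function (the weights are arbitrary, not from the cited work):

```python
import numpy as np

def model(x):
    # Stand-in for a trained network: a fixed nonlinear scoring function.
    return np.tanh(x @ np.array([0.5, -1.0, 2.0]))

def saliency(x, eps=1e-5):
    # Finite-difference gradient of the output w.r.t. each input dimension:
    # large magnitude means a small change there moves the prediction a lot.
    grads = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grads[i] = (model(xp) - model(xm)) / (2 * eps)
    return grads
```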
Case-Based Explanations
• Another way to generate a post-hoc explanation might be to retrieve labeled items that are deemed similar by the model
• For some models, we can retrieve histories from similar patients
(Image: Mikolov et al 2014)
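A minimal sketch of this idea, assuming each case has been embedded in the model's learned representation space (the vectors here are made up): return the nearest labeled neighbors by cosine similarity.

```python
import numpy as np

def most_similar(query_vec, bank_vecs, k=2):
    # Cosine similarity between the query's representation and those of
    # previously labeled cases; the top-k serve as the "explanation".
    q = query_vec / np.linalg.norm(query_vec)
    b = bank_vecs / np.linalg.norm(bank_vecs, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

# Hypothetical 2-D representations of three past patients.
bank = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
neighbors = most_similar(np.array([1.0, 0.05]), bank)
```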
Outline
• What is interpretability?
• What are its desiderata?
• What model properties confer interpretability?
• Caveats, pitfalls, and takeaways
Discussion Points
• Linear models are not strictly more interpretable than deep learning
• Claims about interpretability must be qualified
• Transparency may be at odds with the goals of AI
• Post-hoc interpretations may potentially mislead
Thanks!
Acknowledgments: Zachary C. Lipton was supported by the Division of Biomedical Informatics at UCSD, via training grant (T15LM011271) from the NIH/NLM. Thanks to Charles Elkan, Julian McAuley, David Kale, Maggie Makar, Been Kim, Lihong Li, Rich Caruana, Daniel Fried, Jack Berkowitz, & Sepp Hochreiter
References:
The Mythos of Model Interpretability (ICML Workshop on Human Interpretability 2016) - ZC Lipton. http://arxiv.org/abs/1606.03490
Directly Modeling Missing Data with RNNs (MLHC 2016) - ZC Lipton, DC Kale, R Wetzel. http://arxiv.org/abs/1606.04130
Learning to Diagnose (ICLR 2016) - ZC Lipton, DC Kale, Charles Elkan, R Wetzel. http://arxiv.org/abs/1511.03677
Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission (2015) - R Caruana et al. http://dl.acm.org/citation.cfm?id=2788613