Abstract
As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanations for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite this interest, there is little consensus on what interpretable machine learning is and how it should be measured and evaluated. In this paper, we discuss a definition of interpretability and describe when interpretability is needed (and when it is not). We then propose a taxonomy for rigorous evaluation and offer recommendations for researchers. We close with open questions and concrete problems for new researchers.
Acknowledgements
This piece would not have been possible without the dozens of deep conversations about interpretability with machine learning researchers and domain experts. To our friends and colleagues: we appreciate your support. We particularly want to thank Ian Goodfellow, Kush Varshney, Hanna Wallach, Solon Barocas, Stefan Rüping and Jesse Johnson for their feedback.