Understanding Black-box Predictions via Influence Functions
How can we explain the predictions of a black-box model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
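The central quantity behind these explanations is the influence of a training point z on the loss at a test point z_test, I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), which needs only gradients and Hessian-vector products. Below is a minimal sketch of that computation in JAX; it is not the speakers' released code, and the loss function, the conjugate-gradient solve, and all variable names are illustrative assumptions.

```python
# Minimal influence-function sketch (illustrative, not the authors' implementation).
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Placeholder convex loss: L2-regularized logistic regression with labels y in {-1, +1}.
    logits = x @ params
    return jnp.mean(jax.nn.softplus(-y * logits)) + 1e-3 * jnp.sum(params ** 2)

def hvp(params, X, Y, v):
    # Hessian-vector product of the training loss via forward-over-reverse autodiff.
    return jax.jvp(jax.grad(lambda p: loss_fn(p, X, Y)), (params,), (v,))[1]

def influence(params, X_train, Y_train, x_test, y_test, z_idx, cg_steps=100):
    # I(z, z_test) = -grad L(z_test)^T H^{-1} grad L(z),
    # with H^{-1} grad L(z_test) approximated by matrix-free conjugate gradient.
    g_test = jax.grad(loss_fn)(params, x_test, y_test)
    s, _ = jax.scipy.sparse.linalg.cg(
        lambda v: hvp(params, X_train, Y_train, v), g_test, maxiter=cg_steps)
    g_train = jax.grad(loss_fn)(params, X_train[z_idx], Y_train[z_idx])
    return -jnp.dot(s, g_train)
```

A large negative value means upweighting that training point would lower the test loss (a helpful point); a large positive value marks a harmful one. For non-convex models, where the Hessian need not be positive definite, a small damping term is typically added to the HVP so the solve stays well posed.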
- Date:
- Speakers:
  - Pang Wei Koh, Stanford University
  - Ryota Tomioka, Principal Research Manager