How can we explain a machine learning model so that the explanation is truthful to the model and yet interpretable to people?
The main objective of DiCE is to explain the predictions of ML-based systems that are used to inform decisions in societally critical domains such as finance, healthcare, education, and criminal justice. In these domains, it is important to provide explanations to all key stakeholders who interact with the ML model: model designers, decision-makers, decision-subjects, and decision-evaluators. Most explanation techniques, however, face an inherent tradeoff between fidelity and interpretability: a high-fidelity explanation for an ML model tends to be complex and hard to interpret, while an interpretable explanation is often inconsistent with the ML model it was meant to explain.
Counterfactual (CF) explanations offer a promising alternative. Rather than approximating an ML model or ranking features by their predictive importance, a CF explanation “interrogates” a model to find the changes required to flip the model’s decision. Specifically, DiCE provides this information by showing feature-perturbed versions of the same input that would have received a different outcome. For example, consider a person who applied for a loan and was rejected by the loan distribution algorithm of a financial company. DiCE would show the person a diverse set of feature-perturbed versions of themselves that would have received the loan from the same ML model, e.g., “You would have received the loan if your income were higher by $10k”. In other words, a counterfactual explanation helps a decision-subject decide what to do next to obtain a desired outcome, rather than only listing the important features that contributed to the prediction. CF explanations from DiCE are also useful to the decision-maker, who can use them to evaluate the trustworthiness of a particular prediction from the ML model. Similarly, CF explanations over multiple inputs can help decision-evaluators assess criteria such as fairness, and help model developers debug their models and prevent errors on new data.
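To make the loan example concrete, here is a minimal sketch of the idea in Python. It is not DiCE's algorithm or API: the scoring rule in approve_loan, the feature names, and the brute-force search in find_counterfactuals are all hypothetical, chosen only to show what "feature perturbations that flip the decision" means for a toy model.

```python
from itertools import product


def approve_loan(applicant):
    """Hypothetical stand-in for a trained model: True means the loan is approved."""
    score = (5 * applicant["income_k"]
             + 40 * applicant["years_employed"]
             - 2 * applicant["debt_k"])
    return score >= 355


def find_counterfactuals(applicant, steps):
    """Try every combination of candidate changes and keep those that flip the decision.

    `steps` maps each feature to the list of changes to try (0 means "leave as is").
    Each result is a dict holding only the features that were actually changed.
    """
    original = approve_loan(applicant)
    features = list(steps)
    found = []
    for deltas in product(*(steps[f] for f in features)):
        candidate = dict(applicant)
        for feature, delta in zip(features, deltas):
            candidate[feature] += delta
        if approve_loan(candidate) != original:
            found.append({f: d for f, d in zip(features, deltas) if d != 0})
    # Prefer explanations that change as few features as possible.
    found.sort(key=len)
    return found


# A rejected applicant (income and debt in thousands of dollars).
applicant = {"income_k": 50, "years_employed": 2, "debt_k": 10}
steps = {
    "income_k": [0, 5, 10, 15],
    "years_employed": [0, 1, 2],
    "debt_k": [0, -5, -10],
}
counterfactuals = find_counterfactuals(applicant, steps)
for changes in counterfactuals[:3]:
    print(changes)
```

Under these assumed numbers, an income increase of $10k on its own flips the toy model's decision, while an increase of $5k does not. Exhaustive search like this only works for a handful of features and candidate values; it is meant as an illustration of the concept, not as a practical method.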
Two key challenges in generating CF explanations are diversity and feasibility. The DiCE project aims to construct a universal engine that can be used to explain any machine learning model in terms of feature perturbations. Current research focuses on ensuring that the generated CF explanations are highly diverse, and that they are also feasible with respect to an underlying causal model that generates the observed data.
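The two challenges can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: hand-written rules (an immutable feature, a feature that cannot decrease) stand in for a real causal model, and a greedy farthest-point selection stands in for whatever diversity objective DiCE actually optimizes. The candidate change sets are made up.

```python
def is_feasible(changes, immutable, non_decreasing):
    """Reject change sets that alter an immutable feature or lower a non-decreasing one."""
    for feature, delta in changes.items():
        if feature in immutable:
            return False
        if feature in non_decreasing and delta < 0:
            return False
    return True


def distance(a, b):
    """Number of features on which two change sets differ."""
    return sum(1 for f in set(a) | set(b) if a.get(f, 0) != b.get(f, 0))


def select_diverse(candidates, k):
    """Greedily pick up to k candidates, each as far as possible from those already picked."""
    if not candidates or k <= 0:
        return []
    selected = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: min(distance(c, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidate counterfactuals, each a dict of feature changes.
candidates = [
    {"income_k": 10},
    {"income_k": 15},
    {"years_employed": 2},
    {"age": -5, "income_k": 5},              # infeasible: age cannot be changed
    {"income_k": 5, "debt_k": -10},
    {"years_employed": -1, "income_k": 15},  # infeasible: experience cannot shrink
]

feasible = [
    c for c in candidates
    if is_feasible(c, immutable={"age"}, non_decreasing={"years_employed"})
]
diverse = select_diverse(feasible, k=3)
for changes in diverse:
    print(changes)
```

With these inputs, the two infeasible candidates are dropped, and the selection skips the near-duplicate "income higher by $15k" in favor of suggestions that touch different features. A real system would need feasibility constraints derived from domain knowledge or a causal model rather than fixed rules like these.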
DiCE is available as an open-source project on GitHub.