This tutorial offers a unified introduction to the modern theory of causality based on counterfactuals (aka potential outcomes), directed acyclic graphs (DAGs) and non-parametric structural equation models (NPSEMs). There are very large literatures associated with each of these frameworks, but the connections, which will be highlighted in this tutorial, are often obscure.
In the first part of the tutorial we will introduce key concepts and distinctions by considering models that involve only a few variables. This will provide the foundation for the more complex graph-based models considered in the second part.
We begin by introducing counterfactuals in the simplest causal model involving a single binary ‘treatment’ and binary ‘outcome’. This then leads naturally to the instrumental variable model, which applies in contexts where, for ethical or practical reasons, we are unable to ‘compel’ subjects to take a given treatment level (e.g. to take part in an exercise program, or to click on a link). This ‘non-compliance’ presents a challenge for causal inference, since subjects who do participate are often not representative of the population of interest. However, if it is possible to find another variable (e.g. payments for participation) that can be randomized and that increases the probability that a subject will take treatment, then it is possible to make informative (non-parametric) inferences regarding causal effects. We will see that if the instrument does not directly affect the final outcome then, for a discrete instrument with finitely many levels, the resulting set of distributions corresponds to a particular convex polytope, more specifically a rhombic dodecahedron.
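The non-compliance problem described above can be made concrete with a small exact calculation. The sketch below is illustrative only: the three compliance types, their population shares and their outcome probabilities are hypothetical numbers chosen for this example, and the names (`TYPES`, `takes_treatment`, `observed_mean_outcome`, `uptake`) are not taken from the tutorial. It assumes a randomized binary instrument Z that has no direct effect on the outcome, and compares the true average causal effect with the naive contrast between subjects who did and did not take treatment.

```python
# Hypothetical population of compliance types (all numbers are made up).
# p_y0 / p_y1 are P(Y(0) = 1) and P(Y(1) = 1) within each type; they do not
# depend on Z, which encodes the assumption of no direct effect of Z on Y.
TYPES = {
    "never_taker":  {"share": 0.3, "p_y0": 0.2, "p_y1": 0.3},
    "complier":     {"share": 0.5, "p_y0": 0.4, "p_y1": 0.6},
    "always_taker": {"share": 0.2, "p_y0": 0.7, "p_y1": 0.9},
}
P_Z1 = 0.5  # the instrument Z is randomized with P(Z = 1) = 0.5


def takes_treatment(kind, z):
    """Treatment X actually taken by a subject of this type when Z = z."""
    if kind == "always_taker":
        return 1
    if kind == "never_taker":
        return 0
    return z  # compliers follow the instrument


def true_ace():
    """Average causal effect E[Y(1)] - E[Y(0)] over the whole population."""
    return sum(t["share"] * (t["p_y1"] - t["p_y0"]) for t in TYPES.values())


def observed_mean_outcome(x):
    """E[Y | X = x] in the observed data, averaging over the randomized Z."""
    mass = 0.0
    outcome = 0.0
    for kind, t in TYPES.items():
        for z, p_z in ((0, 1.0 - P_Z1), (1, P_Z1)):
            if takes_treatment(kind, z) == x:
                weight = t["share"] * p_z
                mass += weight
                # Consistency: the observed Y equals Y(x) for the x taken.
                outcome += weight * (t["p_y1"] if x == 1 else t["p_y0"])
    return outcome / mass


def uptake(z):
    """P(X = 1 | Z = z): the probability of taking treatment when Z = z."""
    return sum(t["share"] * takes_treatment(kind, z) for kind, t in TYPES.items())


ace = true_ace()
naive = observed_mean_outcome(1) - observed_mean_outcome(0)
print("true ACE:", round(ace, 4))
print("naive contrast E[Y|X=1] - E[Y|X=0]:", round(naive, 4))
print("uptake under Z=0 and Z=1:", uptake(0), uptake(1))
```

In this made-up population the naive contrast overstates the average causal effect, because subjects who take treatment are disproportionately always-takers with better outcomes under either treatment level, while the instrument raises uptake without being related to a subject's type.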
In the second part we will describe Single-World Intervention Graphs (SWIGs), which provide a simple bridge between potential outcomes and causal graphs. The application of Pearl’s d-separation to the SWIG allows us to determine how and when simple adjustment for variables in an observational study may provide a basis for drawing causal conclusions by controlling for confounding. We will then present a simple reformulation of the ID algorithm (due to Tian) that provides a complete answer to the question of whether certain causal queries are non-parametrically identified in the context of a given DAG (that may include hidden variables). We will then discuss the problem of structure identification (aka causal discovery), describing the state of the art in (non-parametric) structure learning algorithms that allow for the presence of hidden variables. Lastly, we will give an overview of current research on nested Markov models that encode non-parametric constraints that generalize conditional independence (aka Verma constraints).
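As a minimal sketch of what ‘simple adjustment’ means in the discrete case, suppose a single binary covariate L suffices to control confounding of a binary treatment A and binary outcome Y, i.e. Y(a) is independent of A given L. This is an assumption of the example, of the kind that would be checked graphically, and the probabilities and names below (`P_L`, `P_A1_GIVEN_L`, `P_Y1_GIVEN_AL`, `adjusted_mean`, `naive_mean`) are hypothetical. Under that assumption E[Y(a)] equals the sum over l of P(Y = 1 | A = a, L = l) P(L = l), which generally differs from the unadjusted P(Y = 1 | A = a).

```python
# Hypothetical observational distribution (all numbers are made up).
P_L = {0: 0.6, 1: 0.4}              # P(L = l)
P_A1_GIVEN_L = {0: 0.2, 1: 0.8}     # P(A = 1 | L = l)
P_Y1_GIVEN_AL = {                   # P(Y = 1 | A = a, L = l), keyed by (a, l)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_a_given_l(a, l):
    """P(A = a | L = l)."""
    return P_A1_GIVEN_L[l] if a == 1 else 1.0 - P_A1_GIVEN_L[l]


def adjusted_mean(a):
    """Adjustment formula: sum_l P(Y=1 | A=a, L=l) P(L=l).

    Equals E[Y(a)] only under the assumed conditional independence
    of Y(a) and A given L.
    """
    return sum(P_Y1_GIVEN_AL[(a, l)] * P_L[l] for l in P_L)


def naive_mean(a):
    """Unadjusted P(Y = 1 | A = a), which weights L by P(L = l | A = a)."""
    p_a = sum(p_a_given_l(a, l) * P_L[l] for l in P_L)
    joint = sum(P_Y1_GIVEN_AL[(a, l)] * p_a_given_l(a, l) * P_L[l] for l in P_L)
    return joint / p_a


adjusted_effect = adjusted_mean(1) - adjusted_mean(0)
naive_effect = naive_mean(1) - naive_mean(0)
print("adjusted effect:", round(adjusted_effect, 4))
print("naive effect:", round(naive_effect, 4))
```

Here the naive contrast is larger than the adjusted one because, in these made-up numbers, L raises both the probability of treatment and the probability of the outcome.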