Exploiting structured data for learning contagious diseases under incomplete testing

  • Maggie Makar
  • Lauren R West
  • David C Hooper
  • ,
  • Erica Shenoy
  • John Guttag

2021 International Conference on Machine Learning

One way machine learning algorithms can help control the spread of an infectious disease is by building models that predict who is likely to become infected, whether or not they display symptoms; such individuals are good candidates for preemptive isolation. In this work we ask: can we build reliable infection prediction models when the observed data are collected under limited and biased testing that prioritizes symptomatic individuals? Our analysis suggests that under favorable conditions, incomplete testing may be sufficient to achieve relatively low out-of-sample prediction error. Favorable conditions occur when untested-infected individuals have sufficiently different characteristics from untested-healthy individuals, and when infected individuals are “potent”, meaning they infect a large majority of their neighbors. We develop an algorithm that predicts infections and show that it outperforms benchmarks on simulated data. We apply our model to data from a large hospital to predict Clostridioides difficile infection, a communicable disease characterized by asymptomatic (i.e., untested) carriers. Using a proxy for the unobserved untested-infected state, we show that our model outperforms benchmarks in predicting infections.
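To make the observation problem concrete, here is a minimal toy simulation. It is not the authors' algorithm, and every parameter is an assumption: a hypothetical ring-shaped contact network, one round of spread in which each seed infects each contact with probability `potency`, and testing that reaches symptomatic people far more often than asymptomatic ones. The names `ring_contacts`, `simulate`, `observed_positive` and `neighbour_score`, along with all default values, are illustrative. The sketch shows why untested-infected people end up looking like negatives in the recorded data, and how a structured feature (here, the count of observed-positive contacts) can still point toward them when carriers are potent.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    infected: bool = False     # true (latent) infection state
    symptomatic: bool = False  # toy assumption: only infected people show symptoms
    tested: bool = False       # whether a test result is observed at all


def ring_contacts(n: int, k: int) -> dict[int, list[int]]:
    """Hypothetical contact network: node i touches its k nearest neighbours on each side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def simulate(n=200, k=2, n_seeds=5, potency=0.8, p_symptomatic=0.3,
             p_test_symptomatic=0.9, p_test_asymptomatic=0.05, seed=0):
    """Toy data-generating process with biased testing (all parameters assumed)."""
    rng = random.Random(seed)
    contacts = ring_contacts(n, k)
    people = [Person() for _ in range(n)]
    seeds = rng.sample(range(n), n_seeds)
    for i in seeds:
        people[i].infected = True
    # One round of spread: a "potent" seed infects each contact with probability `potency`.
    for i in seeds:
        for j in contacts[i]:
            if rng.random() < potency:
                people[j].infected = True
    # Biased testing: symptomatic people are tested far more often than everyone else.
    for p in people:
        p.symptomatic = p.infected and rng.random() < p_symptomatic
        rate = p_test_symptomatic if p.symptomatic else p_test_asymptomatic
        p.tested = rng.random() < rate
    return contacts, people


def observed_positive(p: Person) -> bool:
    """What the records show: a positive label only if the person was tested and infected."""
    return p.tested and p.infected


def neighbour_score(contacts, people) -> list[int]:
    """Structured feature: number of observed-positive contacts of each person."""
    return [sum(observed_positive(people[j]) for j in contacts[i]) for i in range(len(people))]


if __name__ == "__main__":
    contacts, people = simulate()
    scores = neighbour_score(contacts, people)
    hidden = [i for i, p in enumerate(people) if p.infected and not p.tested]
    print("true infections:", sum(p.infected for p in people))
    print("observed positives:", sum(observed_positive(p) for p in people))
    print("untested-infected with a positive contact:", sum(scores[i] > 0 for i in hidden))
```

Raising `potency` in this toy model makes untested-infected people more likely to sit next to an observed positive, which loosely mirrors the abstract's point that potent transmission is one of the favorable conditions; the real model and analysis are described in the paper itself.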