Ebola data from the Internet: An opportunity for syndromic surveillance or a news event?

Published by ACM - Association for Computing Machinery

Syndromic surveillance refers to the analysis of medical information for the purpose of detecting outbreaks of disease earlier than would otherwise have been possible and estimating the prevalence of the disease in a population. Internet data, especially search engine queries and social media postings, have shown promise in contributing to syndromic surveillance for influenza and dengue fever. Here we focus on the recent outbreak of Ebola Virus Disease and ask whether three major sources of Internet data could have been used for early detection of the outbreak and for its ongoing monitoring. We analyze queries submitted to the Bing search engine, postings made by people using Twitter, and news articles in mainstream media, all collected both from the most affected countries in Africa and from across the world between November 2013 and October 2014. Our results indicate that it is unlikely that any of the three sources would have provided an alert more than a week before the official announcement by the World Health Organization. Furthermore, over time, the numbers of Twitter messages and Bing queries related to Ebola are better correlated with the number of news articles than with the number of cases of the disease, even in the most affected countries. Information sought by users came predominantly from news sites and Wikipedia, and exhibited temporal patterns similar to those typical of news events. Thus, it is likely that the majority of Internet data about Ebola stems from news-like interest, not from the information needs of people with Ebola. We discuss the differences between the current Ebola outbreak and seasonal influenza with respect to syndromic surveillance, and suggest that further research is needed to understand where Internet data can assist in surveillance and where it cannot.
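
The comparison described above, in which query or tweet volume is correlated against news volume on the one hand and case counts on the other, can be illustrated with a minimal sketch. It assumes Pearson correlation over weekly counts; the abstract does not state which correlation measure was used. The three series are made-up toy values chosen only to show the mechanics, not data from the study, and the names `pearson`, `news`, `queries`, and `cases` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Toy weekly counts (illustrative only, NOT data from the study).
news = [0, 1, 10, 3, 2, 8, 4]      # news articles: bursty
queries = [1, 2, 20, 6, 5, 15, 9]  # search queries: follow the news bursts
cases = [1, 2, 3, 4, 5, 6, 7]      # reported cases: steady growth

r_news = pearson(queries, news)
r_cases = pearson(queries, cases)

print(f"queries vs news:  r = {r_news:.2f}")
print(f"queries vs cases: r = {r_cases:.2f}")
```

In this toy setup the query series tracks the news series far more closely than the case series, which is the pattern the abstract reports for the real data. A fuller analysis would also need to consider time lags between the series and per-country differences, which this sketch ignores.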