Background: Mass gatherings, such as music festivals and religious events, pose a public health challenge because of the risk of transmission of communicable diseases. This risk is exacerbated because participants disperse soon after the gathering, potentially spreading disease within their communities. The dispersion of participants also poses a challenge for traditional surveillance methods. The ubiquitous use of the Internet may enable the detection of disease outbreaks through analysis of data generated by users during events and shortly thereafter.
Objective: To develop algorithms that can alert to possible outbreaks of communicable diseases from Internet data, specifically Twitter postings and search engine queries.
Methods: For each of six major music festivals held in the United Kingdom during 2012, we extracted all Twitter postings and Bing search engine queries made by users who repeatedly mentioned that festival, covering the period from 30 days before until 30 days after the festival. We analyzed these data using three methods: two of which compare the use of words associated with disease symptoms before and after the festival, and one of which compares the frequency of these words with that of other users in the UK in the days following the festival.
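The abstract does not state which test statistic each method uses. As a minimal sketch, assuming each comparison reduces to "how many messages mention a symptom term, out of how many messages", both the pre/post comparison and the comparison with other UK users can be framed as a 2x2 Pearson chi-squared test. The helper names `chi2_2x2` and `symptom_increased`, the significance level, and the example counts are all hypothetical, not the authors' implementation.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction) on the
    contingency table [[a, b], [c, d]]. Returns (statistic, p-value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero row or column margin carries no evidence of a difference.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def symptom_increased(target_hits: int, target_total: int,
                      baseline_hits: int, baseline_total: int,
                      alpha: float = 0.05) -> bool:
    """Hypothetical detector: flag a symptom term when its mention rate in the
    target period/group is higher than in the baseline and the difference is
    significant. 'Baseline' can be the pre-festival period of the same users
    or, for the third method, other UK users after the festival."""
    stat, p = chi2_2x2(target_hits, target_total - target_hits,
                       baseline_hits, baseline_total - baseline_hits)
    higher = target_hits / target_total > baseline_hits / baseline_total
    return higher and p < alpha


# Illustrative (made-up) counts: 90 of 10,000 post-festival messages mention
# "cough", versus 40 of 10,000 pre-festival messages.
print(chi2_2x2(90, 9910, 40, 9960))
print(symptom_increased(90, 10_000, 40, 10_000))
```

The direction check matters because the chi-squared statistic is symmetric: without it, a significant drop in symptom mentions after the festival would also be flagged.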
Results: The data comprised, on average per festival, 10.02 million tweets made by 18,134 users and 32,143 queries made by 1756 users. Our methods indicated the statistically significant appearance of a disease symptom in two of the six festivals. For example, cough was detected at higher than expected levels following the Wakestock festival. Statistically significant agreement (χ² test, p<0.05) between methods and across data sources was found for most pairwise comparisons in the festivals where a statistically significant symptom was detected. In the other festivals, such agreement was not significant, except between the two methods that compare pre-event activity with post-event activity. Anecdotal evidence suggests that the symptoms detected following the two festivals are indeed indicative of a disease that some users attributed to attending the festival.
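How agreement between two methods was tabulated is not described here. One plausible reading, sketched below as an assumption, is to classify every term in a symptom vocabulary by whether each method flagged it, giving a 2x2 table (both, first only, second only, neither) that is then tested with a χ² test. The function `method_agreement`, the vocabulary, and the flagged sets are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


def method_agreement(vocabulary: list[str], flagged_a: set[str],
                     flagged_b: set[str]):
    """Hypothetical agreement test: cross-tabulate which vocabulary terms
    each method flagged, then test the 2x2 table for association."""
    both = only_a = only_b = neither = 0
    for term in vocabulary:
        in_a, in_b = term in flagged_a, term in flagged_b
        if in_a and in_b:
            both += 1
        elif in_a:
            only_a += 1
        elif in_b:
            only_b += 1
        else:
            neither += 1
    table = (both, only_a, only_b, neither)
    return table, chi2_2x2(*table)


# Made-up example: 20 symptom terms, of which method A flags three and
# method B flags two (both of which A also flags).
vocab = ["cough", "fever", "sore throat"] + [f"term_{i}" for i in range(17)]
print(method_agreement(vocab, {"cough", "fever", "sore throat"},
                       {"cough", "fever"}))
```

With counts this small, the chi-squared approximation is unreliable; Fisher's exact test is the usual alternative when expected cell counts fall below about 5.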
Conclusions: Our work shows the feasibility of creating a public health surveillance system for mass gatherings based on Internet data. Using multiple data sources and analysis methods was advantageous for rejecting false positives. Further studies are required to validate our findings with data from public health authorities.