The first United Nations Sustainable Development Goal is to end poverty in all its forms, leaving no one behind. Researchers in the United Kingdom are harnessing the large-scale data-processing power of Microsoft Azure to map the location of every person on Earth, providing the accurate population statistics needed to achieve this international humanitarian goal.
“There are about 2 billion people in the world today who are so poor that they earn less every day than the price of a cappuccino,” explains Claire Melamed, Executive Director of The Global Partnership for Sustainable Development Data.
For decades, poverty has been the leading predictor of the general health and economic development needs of the world’s most vulnerable populations. Yet conventional ways of assessing poverty, such as national censuses and household surveys, cannot deliver sufficiently detailed, up-to-date views quickly enough for aid providers to be fully effective. In 2012, for example, one of the world’s largest foundations funded the distribution of polio vaccines in northern Nigeria using census data from 2006. Without more detailed population information, workers either ran out of vaccines or returned with unused supplies.
The WorldPop research team at the University of Southampton, U.K., provides critical data for tracking the UN Sustainable Development Goals by counting every person on Earth, mapping where they are and who they are. The team uses novel data science techniques and cloud computing to combine large datasets drawn from censuses, surveys, satellite imagery, GIS and other sources, giving governments and NGOs highly detailed spatial and temporal maps, some with grid cells as small as 100 meters on a side. “The datasets can be so large and complex that it’s impractical or impossible to build them on a single workstation,” says Andy Tatem, a professor of geography and environment at the University of Southampton and the director of the WorldPop initiative. “But now our researchers are able to cut them down to size with the compute clusters and parallel computing that Microsoft Azure provides.”
Consider, for example, WorldPop researcher Maksym Bondarenko. He is working on an ambitious global analysis project that draws a range of predictions and calculations from geospatial data for every country in the world at a resolution of 100 meters, which involves processing 800 million cells of data. Bondarenko built a high-performance computing (HPC) cluster on Azure using A8 and A9 virtual machines, which support ultrafast InfiniBand network connections. InfiniBand is the kind of high-performance interconnect used in many supercomputers, and it is what allows tightly coupled workloads to scale beyond a few machines.
“Azure was the only cloud that gave us true supercomputing performance,” says Tatem.
The team does much of its analysis in the R programming language, which Microsoft’s open-source R tooling supports well. “With Azure HDInsight, we also used open source R programming with Microsoft R Server for hosting and managing parallel and distributed workloads of R processes on the VMs,” Bondarenko says. “We then output our results to a Random Forest tree-based machine learning model. This approach can enable predictive models and map nonlinear relationships quite well.”
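To make the idea of splitting one large gridded dataset into parallel workloads more concrete, here is a minimal Python sketch of a split-process-reassemble pattern. It is not WorldPop’s R code and does not use Microsoft R Server or HDInsight; the tile size, worker count, and the placeholder per-cell function `predict_cell` (a simple average standing in for a real model such as a Random Forest) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Cell = Tuple[float, ...]  # hypothetical: one grid cell's covariate values


def predict_cell(covariates: Cell) -> float:
    """Placeholder model (an assumption): averages the cell's covariates.

    A real pipeline would call a fitted statistical or machine learning
    model here instead.
    """
    return sum(covariates) / len(covariates)


def process_tile(tile: Sequence[Cell]) -> List[float]:
    """Apply the placeholder model to every cell in one tile."""
    return [predict_cell(cell) for cell in tile]


def split_into_tiles(cells: Sequence[Cell], tile_size: int) -> List[Sequence[Cell]]:
    """Cut the full grid into contiguous tiles of at most tile_size cells."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    return [cells[i:i + tile_size] for i in range(0, len(cells), tile_size)]


def run_tiled(cells: Sequence[Cell], tile_size: int = 10_000, workers: int = 4) -> List[float]:
    """Process tiles concurrently and reassemble results in the original cell order."""
    tiles = split_into_tiles(cells, tile_size)
    results: List[float] = []
    # ThreadPoolExecutor keeps the sketch portable; for CPU-bound pure-Python
    # work, a ProcessPoolExecutor (or separate machines) is needed for real speed-up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so output aligns with input cells.
        for tile_result in pool.map(process_tile, tiles):
            results.extend(tile_result)
    return results


if __name__ == "__main__":
    grid = [(float(i), float(i + 2)) for i in range(100)]
    predictions = run_tiled(grid, tile_size=16, workers=2)
    print(len(predictions), predictions[:3])  # 100 [1.0, 2.0, 3.0]
```

Because each tile is independent, the same pattern can in principle move from threads on one machine to processes spread across a cluster; only the executor changes, while the tiling and order-preserving reassembly stay the same.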
Using Azure, WorldPop Research Fellow Jessica Steele has the computing power she needs to analyze how poverty and gender inequality are related to how people live and move. “Poverty is absolutely gendered,” she says. “We know women are more likely to be poor and more susceptible to falling into poverty.”
Azure is helping Steele achieve more when analyzing large population datasets. “Running statistical models of poverty is a very iterative process. Being able to parallelize and speed up the process using Azure makes that iteration process shorter. This lets us get results back faster, talk to team members more quickly, and make decisions about how to move forward,” she says.
WorldPop Research Fellow Dr. Chigozie Edson Utazi offers another example of how Azure supports the organization’s research: he uses the cloud computing infrastructure to seamlessly scale his R analysis, which provides data for measles vaccination programs in Africa.
“I grew up in Nigeria, and it’s amazing that I can sit here and help improve the lives of people in my country with this research,” he says. “I use R for all my statistical analysis. These datasets cover entire countries sometimes. And you could have millions of grid cells in just one of them. The data is too big, and my computer doesn’t have enough memory to handle it. So I would require a bigger computer, a high-performance computer, to be able to do that. It would be nightmarish to do this without Azure. So it’s refreshing to use Azure because I don’t have to wait in a queue, and I get my code run very quickly, and I have some results back in a very timely manner.”
Easy access from anywhere to the Microsoft Azure cloud computing platform, combined with its power, versatility and scalability, has changed how the WorldPop research team carries out its spatial and temporal mapping research.
“At WorldPop, we’re shaving as much as 90 percent off our calculation run-times using Microsoft Azure,” Tatem says. “This frees us to focus more on data science, to improve the quality of our population mapping and ultimately to help governments and aid providers target poverty issues more efficiently and effectively.”