May 26, 2021

Zooniverse’s move to Azure enhances citizen science

Zooniverse is the world's largest platform for people-powered research. In 2020, based on a desire to do more with Microsoft from a research perspective, the development team at Zooniverse moved its platform from Amazon Web Services (AWS) to Microsoft Azure. The Zooniverse platform is now more cohesive, with more consistent performance and less downtime. Less time is required for day-to-day operational tasks, and new code can be deployed much more quickly, leaving the development team with more time to support new feature development and additional platform improvements.

Zooniverse

“Today we can focus more of our time on feature development instead of delivering code to production. Deployments that used to take an hour now take 3 to 10 minutes, with a much smaller chance of something going wrong.”

Zach Wolfenbarger, Software Developer, Zooniverse

A platform for people-powered research

A collaboration led by Chicago's Adler Planetarium, the University of Oxford, and the University of Minnesota, Zooniverse is the world's largest people-powered research platform. Through the aid of more than 2 million public participants, Zooniverse helps researchers deal with the flood of information they're able to collect about the world around us. It does this by taking advantage of the human mind's unmatched ability for complex pattern recognition to help classify images, audio, and video files more quickly and accurately, and ultimately advance the ability of computers to perform those same tasks.

Zooniverse is being used to fuel new discoveries across a multitude of disciplines, from addressing climate change and protecting endangered species to analyzing ancient manuscripts and identifying new frontiers in space. Here's how it works: The Zooniverse Project Builder portal lets researchers quickly and easily set up new projects, and then upload their images, videos, or audio files—such as camera trap images of wild animals or satellite imagery of a star. Zooniverse volunteers can then help tag, annotate, or transcribe those files, without the need for any specialized training or expertise. The classifications performed by those volunteers are ultimately combined and, after each data point has been consistently classified by enough volunteers, it's considered reliable and accurate enough to be used for further scientific analysis.

For all this to work smoothly, the Zooniverse platform needs to deliver on several requirements. First and foremost, to engage users, especially its volunteers, the platform needs to be reliable and performant—as needed to deliver a frictionless and enjoyable experience. Equally important, because all Zooniverse projects run on the same infrastructure, the platform needs to meet these criteria even when a single project experiences huge traffic spikes.

"We typically have about 100 active projects at any given time, with 10,000 to 15,000 active daily users and an average of 1,500 to 4,000 calls per minute to our main API," says Cam Allen, a software developer at Zooniverse. "However, traffic volumes can spike to tens of thousands of API calls per minute for short periods of time when Zooniverse receives coverage on public radio or BBC's Stargazing Live."

What's more, because Zooniverse depends entirely on grants for its funding, the platform needs to deliver all this cost-effectively. Budget constraints also mandate high developer productivity, especially at the platform level. "Of the 20 or so employees at Zooniverse, about half are software developers," explains Zach Wolfenbarger, who works alongside Allen. "We're the only two developers who work on API services and platform infrastructure, however, with everyone else who writes code working on front-end development, analytics, or data science. Today, the two of us handle all new API service development and keep everything up and running."

Platform evolution

Today, Zooniverse runs on Microsoft Azure. However, up through 2019, it ran on Amazon Web Services (AWS). Some 60 or so individual virtual machines (VMs) ran containerized React, Ruby, Node.js, and Python code on EC2, with the team managing its own EC2 and Docker environments. All content and media were stored in S3, with the single-page application relying on multiple managed PostgreSQL databases, a managed MySQL database, a self-hosted MongoDB database, an AWS Kinesis queue, and Redis running in a VM.

Of all the moving parts, the self-managed Docker environment required the most care and feeding. "We adopted containers fairly early on, but embedded them into our pre-existing VMs," explains Allen. "As such, we had to implement some basic scripting to run the containers on the VMs, which relied heavily on Docker and Docker Compose."

While the team had everything working fairly well, maintaining the environment required a fair amount of work. "Deployments required the scripts to build the VMs, apply security updates and other patches, download the container images, and test the services to make sure they worked," explains Allen. "All of this could take up to an hour, and each step could break the process."

Database scalability was another issue. The Zooniverse database workload is somewhat unusual, consisting of many small inserts as volunteers respond to questions and help with classifications. As such, the Zooniverse platform needs the ability to scale its database to keep up with the volume of changes coming from volunteers. To avoid SQL reads on this changing dataset, the team deployed Redis caches and used a Kinesis data stream to move data in batches out of the main API to “downstream” services for statistical counts, custom aggregations, and so on.
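The batching pattern described above can be sketched in a few lines. The following Python is purely illustrative—the class and field names are invented, and a real deployment would publish each batch to a managed queue or stream (such as the Kinesis stream the team used) rather than collect it in memory:

```python
from dataclasses import dataclass, field

@dataclass
class ClassificationBatcher:
    """Illustrative sketch: buffer many small classification inserts and
    hand them downstream in batches instead of one write at a time."""
    batch_size: int = 100
    _buffer: list = field(default_factory=list)
    _flushed: list = field(default_factory=list)  # stands in for the stream

    def add(self, event: dict) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            # In production this would publish to a managed stream for the
            # downstream stats/aggregation services; here we just record it.
            self._flushed.append(list(self._buffer))
            self._buffer.clear()

batcher = ClassificationBatcher(batch_size=3)
for i in range(7):
    batcher.add({"subject_id": i, "annotation": "cat"})
batcher.flush()
print([len(b) for b in batcher._flushed])  # → [3, 3, 1]
```

The point of the pattern is that downstream consumers see a few large batches instead of thousands of tiny inserts, which keeps read pressure off the hot write path.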

Reimagining Zooniverse on Azure

Although Zooniverse ran on AWS, the rest of the organization had a long history of working with Microsoft, including Microsoft Research and Microsoft AI for Earth—an initiative to put the Microsoft cloud and AI tools in the hands of those working to solve global environmental challenges. And the leadership at Zooniverse wanted to accelerate those efforts.

"We realized we could do even more with Microsoft—and thus accelerate scientific research—if Zooniverse also ran on Azure," says Wolfenbarger. "Access to labeled data hosted by AI for Earth is one example, and access to the Computer Vision API developed by Microsoft Research is another. Being able to collaborate more closely and effectively with Microsoft was a powerful motivator."

The Zooniverse team also saw several advantages in moving to Azure, the largest one being an opportunity to reimagine and modernize its infrastructure. "We saw how we could benefit greatly by migrating to Azure, with the move to a fully managed Kubernetes environment being the low-hanging fruit," says Allen. "We also saw an opportunity to decommission some legacy apps, bring more clarity to our CI/CD strategy, and ultimately build a more sophisticated, robust, and automated DevOps pipeline."

Adds Wolfenbarger, "We were already heavily containerized, so our code was fairly mobile. Azure also offered comprehensive support for open source, and an equivalent to nearly all the services we were using on AWS. We knew that migrating Zooniverse from AWS to Azure would take some work, but a lot of that work was for improvements we already wanted to make. All things considered, there were a lot more pros than cons."

Planning, implementation, and migration

The effort began in September 2019. At the start, the team spent most of its time planning—determining how to make the move in the safest way possible. "After we had sufficient confidence in our plan, we basically rolled up our sleeves and went hands-on, learning the technical details as we went about the work," says Allen. "The support and guidance provided by Microsoft was invaluable, as was the wealth of online documentation provided for Azure."

Throughout the implementation and migration process, the team employed a multi-cloud approach. "It was a gradual migration, not a hard cut-over," says Wolfenbarger. "Piece by piece, we migrated our infrastructure to Azure. As each part was completed, we used Azure DNS combined with Azure Front Door to move that workload component from AWS to Azure. We were extremely careful about any downtime, always keeping the ability to move a component back to AWS temporarily if needed. There were a few hiccups along the way, but nothing too impactful."

A modern, cloud-native architecture

Figure 1 shows the current architecture for Zooniverse. Azure DNS and Azure Front Door were two of the first services to go live, with the latter providing the means to route traffic between Zooniverse components running on AWS and those running on Azure as the migration progressed. "Azure Front Door serves as a giant, automatic, worldwide CDN—connecting users around the world to the closest 'Microsoft fiber' to deliver worldwide performance optimization," explains Allen. "Azure Front Door played a key role in our migration strategy, providing the performance of the optimized Microsoft global network along with the flexibility to route traffic to existing AWS-based services and those newly migrated to Azure."
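Conceptually, this migration-time routing works like a weighted backend pool: each component sends a configurable share of traffic to the new cloud, with the old one kept as an instant fallback. The Python below is only a sketch of that idea—component names and weights are hypothetical—whereas Azure Front Door expresses the same thing declaratively through backend pools and routing rules:

```python
import random

# Hypothetical per-component backend weights during a gradual cut-over.
BACKENDS = {
    "api": [("azure", 90), ("aws", 10)],   # mostly migrated
    "media": [("aws", 100)],               # not migrated yet
}

def pick_backend(component: str, rng=random.random) -> str:
    """Weighted random choice over a component's backend pool."""
    pool = BACKENDS[component]
    total = sum(weight for _, weight in pool)
    r = rng() * total
    for name, weight in pool:
        if r < weight:
            return name
        r -= weight
    return pool[-1][0]

print(pick_backend("media"))  # always "aws" while that piece is unmigrated
```

Rolling a component back, as the team kept the option to do, is then just a weight change rather than a redeployment.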

Moving to a fully managed Kubernetes environment

The team then moved the application instances into Azure Kubernetes Service (AKS), a fully managed Kubernetes environment, using Azure Front Door to isolate the services running in AKS from those in AWS. Within each Kubernetes cluster, dynamic scaling automatically adjusts both the number of nodes within the cluster and the number of pods (or replicas) on each node.
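Pod-level scaling of this kind is typically expressed in Kubernetes as a HorizontalPodAutoscaler. The manifest below is a minimal sketch of what such a policy looks like—the deployment name, replica counts, and CPU threshold are assumptions for illustration, not Zooniverse's actual configuration—while node-level scaling is handled separately by the AKS cluster autoscaler:

```yaml
# Illustrative HorizontalPodAutoscaler for an API deployment (names and
# thresholds are placeholders, not Zooniverse's real values).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zooniverse-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zooniverse-api
  minReplicas: 3
  maxReplicas: 20          # headroom for sudden traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```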

Because code running within AKS was already containerized, it didn’t need to change much. As before, it’s based on Ruby, Rails, and Node.js. The AKS environment also supports some Python-based Django and Flask apps, Memcached instances that run as sidecars, and self-hosted Redis instances that are backed by Azure Files and provide caching and persisted data storage. Zooniverse also uses Azure Cache for Redis, which provides it with a high-performance, fully managed Redis service for shared state management across Kubernetes pods and nodes.
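A shared cache such as Azure Cache for Redis matters here because state kept in a single pod's memory disappears when that pod is rescheduled, and is invisible to its siblings. The toy class below illustrates the time-to-live semantics such a cache provides; it is an in-memory stand-in, not Redis (in redis-py the equivalent would be `set(key, value, ex=ttl)`):

```python
import time

class TTLCache:
    """Toy stand-in for a shared cache: values expire after a TTL so
    pods never serve stale state for long."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._store[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        item = self._store.get(key)
        if item is None:
            return None
        value, expires = item
        if now >= expires:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = TTLCache(ttl_seconds=30)
cache.set("project:42:stats", {"classifications": 1000}, now=0)
print(cache.get("project:42:stats", now=10))  # → {'classifications': 1000}
print(cache.get("project:42:stats", now=60))  # → None (expired)
```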

While the move from a self-managed Docker environment to AKS took the most effort in terms of planning and implementation, it also delivered the largest benefits. Today, the team no longer spends nearly as much time on operations (such as patching), can deploy new code faster and more confidently, and has more time to work on other things. Today we can focus more of our time on feature development instead of delivering code to production,” says Wolfenbarger. Deployments that used to take an hour now take 3 to 10 minutes, with a much smaller chance of something going wrong.”

Migrating all of the data

After migrating the application instances to Azure, the main databases were still in the AWS RDS ecosystem. The team didn’t want any service downtime, so it used the Azure Database Migration Service to move the primary data store from PostgreSQL on Amazon RDS to Azure Database for PostgreSQL. The team also moved semi-structured data from MongoDB into Azure Database for PostgreSQL, taking advantage of its comprehensive support for JSON data types.

According to Allen, Azure Database for PostgreSQL "has been rock-solid," and has given his team some powerful new capabilities in terms of tooling and insights. The intelligent performance optimization features in Azure Database for PostgreSQL are a prime example, providing a means of easily persisting and visualizing query performance data along with determining unused or missing indexes that might improve query performance. This helps keep the PostgreSQL databases performant, resulting in more resource headroom to handle any traffic spikes.

Altogether, the team maintains six instances of Azure Database for PostgreSQL. The largest production instance, which contains about 650 gigabytes of data, is based on the 8-vCore, memory-optimized pricing tier for Azure Database for PostgreSQL. The primary database is supported by a read replica, which helps offload traffic from the application's main read-write instance.
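The value of a read replica comes from routing: writes must go to the primary, while read-only queries can be steered to the replica. The sketch below shows that routing decision in hypothetical form (class and connection names are invented; frameworks like Rails express the same idea through role-based connection switching):

```python
class ReplicaRouter:
    """Illustrative read/write splitting: mutations hit the primary,
    everything else is offloaded to the read replica."""
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "ALTER", "CREATE", "DROP"}

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def connection_for(self, sql: str):
        verb = sql.lstrip().split(None, 1)[0].upper()
        return self.primary if verb in self.WRITE_VERBS else self.replica

router = ReplicaRouter(primary="pg-primary", replica="pg-replica")
print(router.connection_for("SELECT * FROM classifications"))   # → pg-replica
print(router.connection_for("INSERT INTO classifications ..."))  # → pg-primary
```

One caveat with any such scheme is replication lag: a read issued immediately after a write may not yet see that write on the replica, so read-your-own-writes paths are usually pinned to the primary.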

In December 2020, after confirming it was on the right pricing tier, the team took advantage of reserved capacity pricing to reduce its costs. "Reserved capacity pricing saved us about 40 percent, enabling us to afford larger database instances for the same price," explains Wolfenbarger. "Now we no longer need to scale up our database when traffic surges due to a mention on public radio or the BBC, and that's on top of the 30 percent increase in average traffic we've seen due to COVID-19 and people staying at home."

For the final step of the migration, the team used AzCopy to move some 80 terabytes of project data from Amazon S3 to Azure Blob Storage—including images, audio, video, and web content for the site itself, such as HTML, JavaScript, and CSS.
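For reference, AzCopy can perform this kind of S3-to-Blob transfer directly, with the service copying objects server-side. The invocation below is a hedged sketch—the bucket, storage account, and container names are placeholders, and a real run would need S3 credentials (via environment variables) and a SAS token or login for the destination:

```shell
# Illustrative server-side copy from an S3 bucket to Azure Blob Storage.
# All names and the SAS token are placeholders.
azcopy copy \
  'https://s3.amazonaws.com/example-media-bucket' \
  'https://exampleaccount.blob.core.windows.net/media?<SAS-token>' \
  --recursive
```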

"A win on all levels …"

Zooniverse completed its move to Azure in December 2020. Throughout that same year, as the team worked to make it all happen, the COVID-19 pandemic forced developers to work from home, collaborating via GitHub. In 2020, Zooniverse experienced a 30 percent increase in average daily traffic, launched 72 new projects, and grew its number of registered volunteers by 260,000, to 2.2 million. And those volunteers performed 100 million classifications, an increase of 71 percent over the 58.5 million performed in 2019. Even with all those headwinds, the development team completed its move to Azure while keeping total downtime—both planned and unplanned—to just 17 hours.

To date, as of March 2021, Zooniverse's 2.2 million registered volunteers have generated some 576 million classifications to aid researchers in their discoveries. And both of those groups are benefiting substantially thanks to the development team's efforts throughout 2020. "Our infrastructure is a lot more cohesive after our move to Azure," says Wolfenbarger. "The last unexpected traffic spike we experienced was absorbed by our platform infrastructure with no issues."

The Zooniverse development team is also benefiting from its move to Azure, along with the rest of the organization. With less time required for day-to-day operational tasks, the development team now has more time to support new feature development—and can deploy those new features into production more quickly and efficiently.

"Things just work better and run more smoothly today," says Allen. "With managed services on Azure, we're able to spend a lot less time on system administration and management, leaving 60 to 70 percent of our time to deliver new value. All in all, I'd estimate that our move to fully managed services on Azure has reduced our operational workload by one FTE."

They're spending some of that newfound bandwidth exploring additional Azure features and services that might further improve the Zooniverse platform, including GitHub Actions and Azure Application Insights. They also have more time to support the organization's primary reason for moving to Azure—the ability to "do more with Microsoft"—through projects such as direct syncing between Zooniverse and AI for Earth tools for processing wildlife imagery.

When this syncing is finished, project owners working on camera trap data will be able to use Microsoft AI to help classify the animals captured in some of their images, making even more efficient use of citizen science by saving the efforts of volunteers for those images that are more difficult to classify. "We have a working system, which we're now showing to project teams—and they're all asking when they can get it," says Wolfenbarger.

Last but not least, costs have also improved, leading to more efficient use of the grants and other funding upon which Zooniverse relies. "On top of all the other ways we're benefiting, we've been able to save about 10 to 20 percent on our monthly bill compared to when we were on AWS," says Allen. "Clearly, our move to Azure has been a win on all levels."

Wolfenbarger agrees with Allen's assessment. "Fully buying into Azure has opened a lot of doors for us—not just in terms of platform maturity, but also in terms of deeper connection and collaboration with AI for Earth and Microsoft Research," he says. "We value the time that volunteers spend on Zooniverse, and Azure is helping us make the most of their efforts."

“On top of all the other ways we’re benefiting, we’ve been able to save about 10 to 20 percent on our monthly bill compared to when we were on AWS. Clearly, our move to Azure has been a win on all levels.”

Cam Allen, Software Developer, Zooniverse
