Up through 2018, Hi-Rez Studios ran its core game platform on-premises and at co-location facilities. To achieve the scalability needed for Rogue Company, its latest game, Hi-Rez Studios moved its core platform to Azure Kubernetes Service and Azure SQL, using Kafka for messaging, HashiCorp Terraform for deployment, and GitHub for source control and collaboration. Since then, Hi-Rez Studios has adopted several additional Azure services, open-source solutions, and HashiCorp tools—giving the company a scalable DevOps environment that is benefiting both the company’s centralized back-end services team and its portfolio of game studios.
“Azure gave us the freedom to bring in virtually any open-source tool, helping to make everything that we’ve done with our Gen-2 architecture possible.”
Jon Kenkel, Technical Lead, Back-end Services Team, Hi-Rez Studios
Located just outside of Atlanta, GA, Hi-Rez Studios develops free-to-play action games for all major gaming platforms, including Xbox, PlayStation, Switch, PC, and mobile. SMITE, the studio’s first big hit, is a third-person Multiplayer Online Battle Arena (MOBA) in which teams of five players try to take down the other team's base. Rogue Company, the studio’s latest game, is a cross-platform, arena-style action shooter that has been enjoyed by more than 20 million people since its introduction in 2020.
Since launching SMITE in early 2014, Hi-Rez Studios has grown from 75 employees in one office to more than 450 employees in the United States, the United Kingdom, and China. The company is organized into a number of individual game studios, which are supported by a centralized technology organization that maintains the build systems and online services for all Hi-Rez games—including backend services for logins, matchmaking, in-game inventory, and so on.
Preparing for continued success
Up through 2018, when Rogue Company was still in development, Hi-Rez Studios ran its core game platform on-premises and at co-location facilities. To prepare for the release of Rogue Company, which would drive game traffic to an unprecedented level, the company had to scale beyond what it could achieve by manually deploying and managing its own servers. Looking back, the company now calls this on-premises/co-located environment its Gen-1 architecture.
For its Gen-2 architecture, Hi-Rez Studios knew that it needed to harness the scalability of the cloud—and started testing Microsoft Azure. At first, it explored using Azure VMs but quickly determined that self-managed VMs would still require too much hands-on effort. A short time later, after testing Azure Kubernetes Service as a means of scaling containerized applications automatically and on-demand, the company knew it had found the right approach.
“We explored using virtual machines hosted on Azure, which proved to be almost as burdensome as running on-premises because we still had to manually manage everything,” says Jon Kenkel, Technical Lead on the Back-end Services Team at Hi-Rez Studios. “But with Azure Kubernetes Service, all of that work is automatically handled for us—enabling us to scale up or down on-demand, without worrying about all the details. It helped us escape the trap we had fallen into, where we would just manually spin up new servers when more players showed up.”
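Kenkel's point about scaling "up or down on-demand" maps onto how Kubernetes autoscaling typically works. The article doesn't say which autoscaling mechanisms Hi-Rez Studios configures, so the following is only a simplified sketch of the ratio that the Kubernetes Horizontal Pod Autoscaler documents for choosing a replica count. The function name and default bounds are assumptions, and the real controller also applies stabilization windows and other rules omitted here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Simplified model of the replica count a Kubernetes HorizontalPodAutoscaler aims for.

    Uses the documented HPA ratio:
        desired = ceil(current_replicas * current_metric / target_metric)
    Skips the change when the ratio is within `tolerance` of 1.0, then clamps
    the result to [min_replicas, max_replicas]. Defaults here are illustrative.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: leave it alone
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, four pods averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60) == 6`, while ten pods at 30% against the same target would scale in to 5.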
Containerization on Azure Kubernetes Service
Justin Driggers, an Advanced Software Engineer on Kenkel’s team, joined the company shortly before the transition to Gen-2 and spearheaded the move to Azure Kubernetes Service. He started by integrating Apache Kafka into the company’s existing on-premises systems, which at the time relied on direct service-to-service communication. “Kafka helped us prepare to scale out in a way that our Gen-1 architecture couldn’t support,” he explains. “After that, we moved our services into Kubernetes and eventually into Azure Kubernetes Service. It’s all been working great; the scalability we've gained with Azure Kubernetes Service is beyond anything we could have achieved before, and the way it automates everything for us is like having a free ops team.”
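The shift Driggers describes, from direct service-to-service calls to messaging through Kafka, is what lets producers and consumers scale independently. The toy model below is not the Kafka client API; it is a hypothetical in-memory stand-in (the `Topic` and `ConsumerGroup` classes and the topic name are invented for illustration) showing two properties that matter here: records with the same key land in the same partition, and each consumer group keeps its own offsets, so a new service can read the same stream without the producer knowing it exists.

```python
from collections import defaultdict
from zlib import crc32


class Topic:
    """Hypothetical in-memory stand-in for a Kafka topic: one append-only log per partition."""

    def __init__(self, name: str, partitions: int = 3):
        self.name = name
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key: str, value: dict) -> int:
        # Same key -> same partition, so ordering is preserved per key.
        # (Kafka's default partitioner hashes keys with murmur2; crc32 is a stand-in.)
        p = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class ConsumerGroup:
    """Each group tracks its own read position, so groups consume independently."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self) -> list:
        records = []
        for p, log in enumerate(self.topic.partitions):
            start = self.offsets[p]
            records.extend(log[start:])
            self.offsets[p] = len(log)  # "commit" everything returned
        return records
```

Because the producer only appends to the topic, adding another consumer group (say, an analytics service) requires no change to the service that publishes events.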
As Hi-Rez Studios moved its on-premises applications into Azure Kubernetes Service, it also migrated the data stores needed to support those applications to Azure. Azure SQL Managed Instance provided a simple migration path for older databases, while the Hyperscale tier of Azure SQL Database gave Hi-Rez Studios the scalability needed to handle the massive database load that Rogue Company would eventually generate.
Initial Gen-2 efforts also marked the company’s adoption of Azure Container Registry, which stores container images before they’re deployed into Azure Kubernetes Service. “Azure Container Registry is a fantastic tool—and it's incredibly easy to work with,” says Kenkel.
Adoption of additional Azure services
While Azure Kubernetes Service and Azure SQL were primary enablers for supporting more concurrent gamers, they aren’t the only Azure services that Hi-Rez Studios has adopted to meet its scalability needs. The company’s homegrown ETL pipeline that feeds its data warehouse—developed soon after the company’s move to Azure—is a good example.
In the past, the ETL pipeline used long-running Python scripts that ran on game server instances for data collection. Other Python scripts running as scheduled jobs would then periodically collect that data, transform it, and load it into the data warehouse. The problems with this approach were manifold. “Scalability was very bad—limited to processing about 5,000 files per day,” says Ying Xie, a DevOps Software Engineer at Hi-Rez Studios. “Our old ETL pipeline was also hard to deploy, hard to manage, and hard to monitor.”
Xie solved all these problems by rebuilding the ETL pipeline using Azure services. Today, the game servers send HTTP messages containing file content to an Azure Function, which stores that data in Azure Blob Storage. The function also sends an event to Azure Event Hubs, which triggers another Azure Function that loads the information into the data warehouse. “We have a receiver function that’s HTTP-triggered and a consumer function that’s event-hub triggered,” explains Xie. “With Azure Functions, we only need to concentrate on the core ETL logic, with Azure Functions and the Azure platform automatically handling all aspects of scalability.”
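The receiver/consumer split Xie describes can be sketched with plain Python stand-ins. This is not Azure Functions code: in a real deployment the HTTP trigger, the Event Hubs trigger, and the Blob Storage output would be declared as Functions bindings. The `FakeAzure` container, the payload fields (`server_id`, `file_name`, `content`), the blob naming scheme, and the CSV row layout are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeAzure:
    """Hypothetical in-memory stand-ins for Blob Storage, an event hub, and the warehouse."""
    blobs: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    warehouse: list = field(default_factory=list)


def receiver(azure: FakeAzure, http_body: str) -> int:
    """HTTP-triggered step: persist the raw file, then announce it with an event."""
    payload = json.loads(http_body)
    blob_name = f"raw/{payload['server_id']}/{payload['file_name']}"
    azure.blobs[blob_name] = payload["content"]
    azure.events.append({"blob": blob_name})
    return 202  # accepted; loading happens asynchronously


def consumer(azure: FakeAzure, event: dict) -> None:
    """Event-triggered step: transform the stored file and load rows into the warehouse."""
    content = azure.blobs[event["blob"]]
    for line in content.splitlines():
        match_id, player, score = line.split(",")  # assumed CSV layout
        azure.warehouse.append(
            {"match_id": match_id, "player": player, "score": int(score)}
        )
```

In this sketch, persisting the raw file before emitting the event means a failed load can be retried from Blob Storage without asking game servers to resend anything.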
The new ETL pipeline is now loading about 200,000 files per day—some 40 times what it could handle before. Equally important, Xie no longer constantly worries about it. “In the past, as server load increased, I never knew what might break next,” she says. “Today, with Azure handling everything for me, I don’t need to babysit our ETL pipeline anymore.”
On the off chance a problem does arise, Xie is immediately notified by Azure Monitor, which the company now uses to monitor its entire Azure infrastructure. Application Insights, a feature of Azure Monitor, continually collects log, performance, and error data from Azure Functions, providing Xie with visibility into the workings of the ETL pipeline at all times. “With Azure Monitor Application Insights, I can see how often a function is being invoked, monitor its execution times, view traces written by function code, and more,” she says. “And if Azure Monitor detects any errors, it automatically notifies us of the issue—so that we can immediately address it.”
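Application Insights gathers these measurements automatically, so nothing like the following has to be written by hand. The sketch only illustrates, with a hypothetical `Telemetry` class, the three signals Xie mentions: how often a function is invoked, how long each invocation takes, and an alert whenever an invocation fails.

```python
import time
from collections import defaultdict
from functools import wraps


class Telemetry:
    """Hypothetical toy collector: per-function invocation counts, durations, and failure alerts."""

    def __init__(self, alert):
        self.alert = alert                  # callback invoked on every failure
        self.invocations = defaultdict(int)
        self.durations = defaultdict(list)  # seconds per invocation
        self.failures = defaultdict(int)

    def track(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            name = func.__name__
            self.invocations[name] += 1
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                self.failures[name] += 1
                self.alert(f"{name} failed: {exc!r}")
                raise  # still surface the error to the caller
            finally:
                # Record duration for successful and failed invocations alike.
                self.durations[name].append(time.perf_counter() - start)
        return wrapper
```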
Expanded use of open source
As the company’s use of Azure services has expanded, so has its use of open source. Today, in addition to Kafka, Hi-Rez Studios also uses Apache ZooKeeper for configuration management, Linkerd as a service mesh, FastAPI to publish APIs written in Python, and Helm for deployment of containerized applications into Azure Kubernetes Service.
“Azure gave us the freedom to bring in virtually any open-source tool, helping to make everything that we’ve done with our Gen-2 architecture possible,” says Kenkel. “We used to rely on a lot of home-grown, proprietary tools, which meant that they didn't get bug fixes or security patches unless someone in-house did that work. With open source, it's easy to track versions over time, review change histories, and find others who may be having similar questions or issues.”
For Kenkel’s team, the move to open source has been a journey of discovery. “We got to the mix of open source we have today through experimentation,” he says. “We saw the home-grown tools we were maintaining or new use cases that weren’t covered by what we had, and there were open source initiatives that not only provided the answers to our current problems but also answers to problems we suddenly realized we would eventually be facing. It was easy to talk about switching over to Kafka or FastAPI because of the flexibility that they provided.”
Kenkel also appreciates how open source lets the company keep its options open. “If we find an open-source library that’s falling behind or not meeting our needs, there's always two or three other open-source projects that are doing something similar,” he says. “And we can always fork an open source project to tailor it to our needs.”
For the Back-end Services Team, adopting open source has been something of a cultural shift, albeit a good one. “Many of the legacy applications we developed in-house were difficult to maintain, creating a burden that we had to carry around with us wherever we went,” says Kenkel. “Open source has given us a sense of freedom—of being able to say, ‘out with the old, in with the new’—and that’s really made our lives a lot easier.”
Kenkel expects his team’s use of open source to continue expanding as it begins looking toward a Gen-3 architecture. “We weren’t able to move to a full-on microservices architecture for Gen-2 because of the timelines we faced—but that’s the direction we’re headed,” he says. “I’m not sure exactly which technologies we’ll be using, but there are many fantastic open-source tools and I’m sure we’ll be relying on them heavily as we move forward.”
GitHub for source control, collaboration, and automation
Historically, Hi-Rez Studios used Perforce for all its source control needs, as do most leading game studios. However, since the company’s move to Azure, the Back-end Services Team has moved almost all of its code to GitHub. “Perforce is well-suited to game development because of the large file storage required for things like textures, 3D models, and so on,” explains Driggers. “However, our team doesn’t use those types of artifacts; we write code. Alongside our initial move to Azure Kubernetes Service, we slowly transitioned our code base into GitHub—and then integrated our GitHub repositories with our CI/CD pipelines to perform builds that get promoted into our Kubernetes environment.”
Driggers recalls advocating for the move to GitHub when he joined the company in 2020. “I had used GitHub in the past and preferred it over other solutions,” he says. “It promotes cross-team collaboration and lets everyone participate in code reviews in a very intuitive manner—making it easy to comment on code and discuss code changes. GitHub really fosters a sense of community among team members, enabling each of us to focus on our own code while also maintaining visibility into what other team members are doing.”
The Back-end Services Team also uses GitHub for things like change requests from other parts of the business. “We’ve moved to GitHub for almost all of our service management,” says Kenkel. “And we use it for pull requests and other coordination tasks. The Helm charts for our Kubernetes environment are also stored in GitHub, which we use to automate a CI/CD process for each of our environments. And we run many environments for each game, so there’s a lot to manage in that area. Being able to stay hands-off via automation has been a huge win.”
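Running many environments per game from one set of Helm charts usually relies on layered values files: Helm applies later `-f`/`--values` files over earlier ones, merging maps key by key while replacing lists and scalars outright. The article doesn't describe how Hi-Rez Studios lays out its charts, so the three layers below (chart defaults, per-game, per-environment) and every key are hypothetical; the function only mimics Helm's merge rule for maps, lists, and scalars.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Layer one values dict over another the way Helm layers -f files:
    nested maps merge key by key; scalars and lists in `override` replace those in `base`."""
    result = copy.deepcopy(base)  # never mutate the caller's defaults
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def values_for(chart_defaults: dict, game_values: dict, env_values: dict) -> dict:
    """Hypothetical layering: chart defaults, then per-game, then per-environment overrides."""
    return merge_values(merge_values(chart_defaults, game_values), env_values)
```

The equivalent Helm invocation would pass the files in the same order, for example `helm upgrade --install matchmaking ./chart -f values-game.yaml -f values-prod.yaml`, where the release, chart, and file names are again hypothetical.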
HashiCorp tools for DevOps
Alongside Azure, open source, and GitHub, tools from HashiCorp form the fourth major pillar of the company’s Gen-2 architecture. Use of the HashiCorp suite began alongside the move to Kubernetes, when the company started using HashiCorp Terraform to describe the deployment of Azure Kubernetes Service clusters using code stored in GitHub.
“Before our move to Azure, a lot of our infrastructure was manually allocated,” says Driggers. “Someone would go into a UI, click some buttons, and bring up a new server—and nobody else knew exactly what was there. Terraform lets us codify what we want in a text file, and then it creates that infrastructure for us, in a fully documented and repeatable manner. With Terraform, any time we deploy a Kubernetes cluster or other resource, we know that it was set up correctly.”
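The declarative workflow Driggers describes comes down to comparing what the code says should exist with what actually exists. The function below is a deliberately simplified model of that comparison, not Terraform's implementation: real Terraform also consults its state file, orders changes along a dependency graph, and may replace a resource rather than update it in place. The resource addresses and attributes are illustrative.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Toy model of the idea behind `terraform plan`: diff desired resources
    (from code) against existing ones and list the actions needed."""
    actions = {"create": [], "update": [], "delete": []}
    for address, config in desired.items():
        if address not in actual:
            actions["create"].append(address)   # declared but missing
        elif actual[address] != config:
            actions["update"].append(address)   # exists but has drifted
    for address in actual:
        if address not in desired:
            actions["delete"].append(address)   # exists but no longer declared
    return actions
```

In this model, planning against infrastructure that already matches the code produces no actions, which is what makes repeated deployments safe.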
Terraform also makes provisioning new infrastructure faster and easier. The new ETL pipeline Xie developed is a good example. “In the past, to set up an instance of our ETL pipeline, it took a full day to manually deploy and configure all the Azure resources, connection strings, secrets involved, and so on—and then validate I hadn’t made any errors,” Xie recalls. “With Terraform, anyone can deploy a new instance of our ETL pipeline with the click of a button.”
Adds Matthew Smith, Lead DevOps Engineer at Hi-Rez Studios, “Another great thing about Terraform is that we can use it to deploy our Azure Monitor definitions. So when we stand up an instance of our ETL pipeline, or anything on Azure for that matter, we can use Terraform to simultaneously attach a monitoring environment.”
Alongside its initial use of Terraform, the Back-end Services Team adopted HashiCorp Vault for secrets management—such as providing controlled access to the cryptographic keys needed to set up a new Azure Kubernetes Service cluster. “Historically, we’ve had multiple different secrets management mechanisms—including instances where secrets literally lived on somebody's desktop and were copied into new infrastructure for it to work correctly,” explains Driggers. “Vault gives us a single place to store all secrets, along with a robust set of access controls.”
According to Kenkel, integrating the movement of secrets into the deployment process was a trivial task. “Traditionally, one of the difficult parts of application development has been getting your secrets into your applications in a secure way,” he says. “Vault and Terraform make it really easy to store secrets securely and then propagate them to the appropriate places when needed—all while ensuring that people don't have access to things they're not supposed to.”
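Vault's access controls are expressed as policies that grant capabilities on secret paths, with everything else denied by default. The check below is a simplified, hypothetical model of that idea: exact paths or trailing-`*` prefixes, the most specific matching rule wins, and a `deny` on that rule refuses access. Vault's actual matching rules have more cases (such as `+` segment wildcards), and the secret paths shown are invented.

```python
def allowed(policy: dict, path: str, capability: str) -> bool:
    """Simplified path-based check in the spirit of Vault policies.

    `policy` maps rule paths (exact, or a prefix ending in '*') to capability sets.
    The most specific matching rule wins; no match means default deny.
    """
    best_caps, best_len = None, -1
    for rule_path, caps in policy.items():
        if rule_path.endswith("*"):
            prefix = rule_path[:-1]
            matches = path.startswith(prefix)
            length = len(prefix)
        else:
            matches = path == rule_path
            length = len(rule_path) + 1  # an exact match beats a glob with the same prefix
        if matches and length > best_len:
            best_caps, best_len = caps, length
    if best_caps is None or "deny" in best_caps:
        return False
    return capability in best_caps
```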
Following Terraform and Vault, Hi-Rez Studios deployed HashiCorp Consul for service discovery and networking. “If our Terraform definition of an environment needs to shift, rather than having to rewrite our Terraform file, we can just change the variables in Consul that are being fed into Terraform,” explains Smith.
Consul also stores information on the Azure resources created using Terraform, giving different applications across the company a single place to query for the information they need. And it stores other data, including parameters for build pipelines and schema mappings for the company’s ETL pipeline.
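One way to feed Consul values into Terraform is to read a key prefix through Consul's HTTP API and write the result out as a variables file. The article doesn't say which mechanism Hi-Rez Studios uses (Terraform can also read Consul keys directly through its Consul provider), so treat this as an assumption-laden sketch with a hypothetical key layout. It relies on the documented shape of a `GET /v1/kv/<prefix>?recurse` response, where each entry carries a `Key` and a base64-encoded `Value` (null for folder-style keys).

```python
import base64
import json


def kv_to_tfvars(consul_response: str, prefix: str) -> dict:
    """Turn the JSON body of a Consul `GET /v1/kv/<prefix>?recurse` call into a
    flat variables dict suitable for writing out as terraform.tfvars.json."""
    variables = {}
    for entry in json.loads(consul_response):
        if entry["Value"] is None:  # folder-style keys carry no value
            continue
        # Strip the shared prefix and flatten nested keys: "aks/vm_size" -> "aks_vm_size".
        name = entry["Key"][len(prefix):].strip("/").replace("/", "_")
        variables[name] = base64.b64decode(entry["Value"]).decode("utf-8")
    return variables
```

Writing the returned dict with `json.dump` to `terraform.tfvars.json` lets Terraform load it automatically on the next run; note that every value arrives as a string.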
Hi-Rez Studios also adopted HashiCorp Packer, which enables the creation of identical machine images for multiple platforms from a single source configuration. “We use Packer to generate base images for a lot of our build systems,” explains Smith. “From there, using build servers created with Terraform, we construct container images for our game teams and then push those container images to Azure Container Registry. Finally, using Helm, we deploy the container images into Azure Kubernetes Service clusters that were also provisioned using Terraform. And there you have it—a new core gaming platform build that’s ready to go.”
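The build chain Smith walks through is a dependency graph: some stages must wait for others, while provisioning the target cluster can proceed independently of the image builds. A small sketch using Python's standard `graphlib` module (Python 3.9 or later) can compute a valid run order. The stage names paraphrase Smith's description, and the exact dependencies between them are an assumption.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (names paraphrase the text above).
PIPELINE = {
    "packer: build base image": set(),
    "terraform: provision build servers": {"packer: build base image"},
    "terraform: provision AKS cluster": set(),
    "build: container images": {"terraform: provision build servers"},
    "push: images to Azure Container Registry": {"build: container images"},
    "helm: deploy to AKS": {"push: images to Azure Container Registry",
                            "terraform: provision AKS cluster"},
}


def run_order(pipeline: dict) -> list:
    """Return one valid execution order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(pipeline).static_order())
```

`TopologicalSorter` also offers `prepare()`, `get_ready()`, and `done()` for running independent stages, such as the AKS provisioning step, in parallel with the image builds.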
While each tool in the HashiCorp stack delivers strong value on its own, it’s how they build on each other that has made Smith such a strong HashiCorp advocate. “Packer lets us define images that are usable in multiple locations with simple command lines,” explains Smith. “Packer plus Terraform solves that same problem in a way that means we don't need to care about how those images are deployed. Packer plus Terraform plus Vault means that we also don't have to worry about a bunch of credentials stored in our Terraform files. Now add Consul, which lets us plumb it all together and get service discovery on top of it. It all comes together beautifully.”
Looking back over his use of HashiCorp tools, Smith appreciates the ease with which he’s been able to integrate them to deliver new value. “We initially implemented Vault without using Terraform to deploy it,” he says. “That's no longer the case. We adopted Consul because we wanted a configuration management system—but why use that without also using the HA functionality in Vault that leverages Consul? So we hooked all that together. HashiCorp tools work extremely well together because HashiCorp builds and maintains them as such.”
Smith also appreciates how HashiCorp always seems to anticipate his needs. “There are several times I’ve set out to build something, performed a few searches, and then said to myself ‘Oh… HashiCorp already built that.’ And they’ve thought about everything I’ve ever needed to care about. If you’re in DevOps and you're looking at building something, go see if HashiCorp built it first. Their stack is reliable, it’s secure, and I trust it to meet my needs.”
A successful launch
Hi-Rez Studios released Rogue Company into open beta in October 2020 across the Windows, Xbox One, PlayStation 4, and Nintendo Switch platforms, with an Xbox Series X/S release following in November 2020. A PlayStation 5 release followed in March 2021, and the game came out of beta in May 2022.
The company’s Gen-2 architecture based on Azure, GitHub, open source, and the HashiCorp suite helped pave the way for the success of Rogue Company, delivering the scalability needed to support 15 million players within the first two months—followed by millions more since then. “Our current architecture is much more scalable and elastic,” says Kenkel. “And troubleshooting is improved too; it’s a lot easier to jump in and take a look at what’s going on in Azure Kubernetes Service than when we were running a bunch of bare-metal servers. Today we can deploy and manage applications at a scale that we couldn’t even talk about before.”
Patching is also easier now, with players of Rogue Company never knowing it happened. “Shortly after we went into open beta, we had to do some emergency patching to address a performance issue,” recalls Kenkel. “And we did it with zero downtime. Everything went smoothly, and users quickly stopped seeing the performance issues they were experiencing.”
What’s more, with the DevOps environment that Hi-Rez Studios now has in place, developers across the company are also more agile—able to deliver new functionality faster and with fewer distractions. “Most of what I do is inside of that Kubernetes cluster,” says Kenkel. “I use Terraform to create the cluster, and then from there on out it’s agnostic. I don't need to care which flavor of Kubernetes we're using because our services are written to be able to lift and shift. Using Azure Kubernetes Service has given us the opportunity to focus heavily on our application development without worrying about where it's deployed or how it's deployed.”
For the company’s game teams, its Gen-2 architecture helps them keep their respective offerings fresh because any new environments they need are readily available—and as scalable as needed. “The time it takes to spin up new environments has decreased dramatically since we started leveraging HashiCorp tools,” says Smith. “Today, when a game team says ‘Hey… Can we get a new environment?’, instead of two days going back-and-forth with them about how they want it set up, my response is, ‘Yeah, sure. What's the problem? Just push the button.’ They don’t get slowed down developing their games, and I don’t get distracted from delivering new DevOps capabilities. It’s a huge win for everyone.”
Looking forward, as Hi-Rez Studios continues to delight gamers with new experiences, the company plans to continue relying on the technology partners that helped it get to where it is today. “We have some great new games coming this year,” says Stewart Chisam, CEO at Hi-Rez Studios. “We’re developing all of them with Visual Studio and will use Azure to scale them to millions of players. Microsoft has a deep understanding of gaming through its Xbox franchise, making our partnership with them a natural one that will continue to grow.”