Changing the Game: Halo 4 Team Gets New User Insights from Big Data in the Cloud

In late 2012, Halo 4 gamers took to their Xbox 360 consoles en masse for a five-week online battle. They all had the same goal: to see whose Spartan could climb to the top of the global leaderboards in the largest free-to-play Halo online tournament in history.
Using the game’s multiplayer modes, players participating in the tournament—the Halo 4 “Infinity Challenge”—earned powerful new weapons and armor for their Spartan-IV and fought their way from one level to the next. And with 2,800 available prizes, there was plenty of incentive to play.
Behind the scenes, a powerful new Microsoft technology platform called HDInsight was capturing data from the cloud and feeding daily game statistics to the tournament’s operator, Virgin Gaming. Virgin not only used the data to update online leaderboards each day; it also relied on the data to detect cheaters, removing them from the boards to ensure that the right gamers got the chance to win.
But this new technology didn’t just support the Infinity Challenge. From day one, the Xbox 360 game has been using the Hadoop open source framework to gain deep insights into players. The Halo 4 development team at 343 Industries is taking these insights and updating the game almost weekly, using direct player feedback to tweak the game. In the process, the game’s multiplayer ecosystem continues to evolve with the community as the title matures in the marketplace.
Tapping into the Power of the Cloud
Using the latest technology has always been important to the Halo 4 development team. Since the award-winning game launched in November 2012, the team has used the Windows Azure cloud development platform to power the game’s back-end supporting services. These services run the game’s key multiplayer features, including leaderboards and avatar rendering. Hosting the multiplayer parts of the game in Windows Azure also gives the Halo 4 team a way to quickly and inexpensively increase or decrease server capacity as load changes.
As Microsoft prepared to officially release the game, 343 Industries wanted to find a solution to mine user data with the hope of gaining insight into player behavior and gauging the overall health of the game after its release. Additionally, the Halo 4 development team was tasked with feeding daily data about the five-week online Infinity Challenge tournament to Virgin Gaming, a Halo 4 partner.
To meet these business requirements, the Halo 4 team knew it needed to find business intelligence (BI) technology that would work well with Azure. “One of the great things about the Halo team is how they use cutting-edge technology like Azure,” says Alex Gregorio, a program manager for Microsoft Studios, which developed Halo 4. “So we wanted to find the best BI environment out there, and we needed to make sure it integrated with Azure.”
Because all game data is housed in Azure, the team wanted a BI solution that could work directly and efficiently with that data. The team also needed to process this data in the same data center, minimizing storage costs and avoiding charges for data transfers across two data centers. In addition, the team wanted full control over job priorities, so that the performance and delivery of analytical queries would not be affected by other processing jobs run at the same time. “We had to have a flexible solution that was not on-premises,” states Gregorio.
Microsoft HDInsight: Big Data Analytics in Azure
Although it considered building its own custom BI solution, the Halo 4 team decided to use the Windows Azure HDInsight Service, which is based on Apache Hadoop, an open-source software framework originally created by Doug Cutting and Mike Cafarella and then developed largely at Yahoo! in its early years. Hadoop can analyze huge amounts of unstructured data in a distributed manner. Designed for large groups of machines that do not share memory, Hadoop can operate on commodity servers and is well suited to running complex analytics.
HDInsight empowers users to gain new insights from unstructured data, while connecting that data to familiar BI tools. “Even though we knew we would be one of the earliest customers of HDInsight, it met all our requirements,” says Tamir Melamed, a development manager on the Halo 4 team. “It can run any possible queries, and it is the best format for integration with Azure. And because we owned the services that produce the data and the BI system, we knew we would be using resources in the best, most cost-effective way.”
The Halo 4 team wrote Azure-based services that convert raw game data collected in Azure into the Avro format, which is supported by Hadoop. The services then push the Avro data into Windows Azure binary large object (BLOB) storage, which HDInsight can read through the ASV protocol. The data can then be accessed from Windows Azure by anyone with the right permissions.
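To make the conversion step more concrete, here is a minimal sketch of how one game event might be serialized with Avro's binary encoding. It is not the Halo 4 team's code. The schema, the field names (player_id, game_mode, duration_seconds), and the helper functions are hypothetical, and the sketch hand-rolls the encoding for strings and longs instead of using an Avro library. A production service would normally rely on an Avro library and write a container file that embeds the schema.

```python
# Hypothetical schema for one game event; the real Halo 4 schema is not public.
GAME_EVENT_SCHEMA = {
    "type": "record",
    "name": "GameEvent",
    "fields": [
        {"name": "player_id", "type": "string"},
        {"name": "game_mode", "type": "string"},
        {"name": "duration_seconds", "type": "long"},
    ],
}


def encode_long(value):
    """Encode an integer as an Avro long: zigzag, then a base-128 varint."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while True:
        byte = zigzag & 0x7F
        zigzag >>= 7
        if zigzag:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(value):
    """Encode a string as its UTF-8 byte length (an Avro long) plus the bytes."""
    data = value.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(record, schema):
    """Concatenate the encoded fields in the order the schema declares them."""
    encoders = {"string": encode_string, "long": encode_long}
    return b"".join(
        encoders[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )


# Illustrative event with made-up values.
event = {"player_id": "p1", "game_mode": "slayer", "duration_seconds": 600}
payload = encode_record(event, GAME_EVENT_SCHEMA)
print(payload.hex())
```

The resulting bytes carry no field names, which is why a reader needs the schema to interpret them. That compactness is one reason a format like Avro suits high-volume event data.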
Every day, Hadoop handles millions of data-rich objects related to Halo 4, including preferred game modes, game length, and many other items. With Microsoft SQL Server PowerPivot for SharePoint as a front-end presentation layer, Azure BLOBs are created based on queries from the Halo 4 team.
PowerPivot for Excel loads data from HDInsight through the Hive ODBC driver, a software library that connects to Hive, the data warehouse framework in Hadoop. A PowerPivot workbook is then uploaded to PowerPivot for SharePoint, where it is refreshed nightly using the connection string stored in the workbook, which reaches HDInsight through the same Hive ODBC driver. The Halo 4 team uses the workbooks to generate reports and view interactive data dashboards.
Using the Flexibility and Agility of Hadoop on Azure
For the Halo 4 team, a key benefit of using HDInsight was its flexibility: the volume of raw data stored could grow independently of the processing capacity needed to consume that data. “With previous systems, we never had the separation between production and raw data, so there was always the question of how running analytics would affect production,” says Mark Vayman, lead program manager for the Halo services group. “Hadoop running on Azure BLOBs solved that problem.”
With Hadoop, the team was able to build a configuration system that can be used to turn various Azure data feeds on or off as needed. “That really helps us get optimal performance, and it’s a big advantage because we can use the same Azure data source to run compute for HDInsight on multiple clusters,” says Vayman. “It made it easy for us to drive business requests for analysis through an ad-hoc Hadoop cluster without affecting the jobs being run. So developers outside the immediate BI team can actually go in and run their own queries without being hindered by the development load our team has. Ultimately, the unique way in which Hadoop is implemented on Azure gives us these capabilities.”
Halo 4 developers have also benefited from the agility of Hadoop on Azure. “If we get a business request for analytics on Azure data, it’s very easy for us to find a specific data point in Azure and get analytics on that data with HDInsight,” says Melamed. “We can easily launch a new Hadoop cluster in minutes, run a query, and get back to the business in a few hours or less. Azure is very agile by nature, and Hadoop on Azure is more powerful as a result.”
Shifting the Focus from Storage to Analysis
HDInsight was also instrumental in changing the Halo 4 team’s focus from data storage to useful data analysis. That’s because Hadoop applies structure to data when it’s consumed, as opposed to traditional data warehouse applications that structure data before it’s placed into a BI system. “In Windows Azure, Hadoop is essentially a place where all the raw data can be dumped,” says Brad Sarsfield, a Microsoft SQL Server developer. “Then we can decide to apply structure to that data at the point where it’s consumed.”
Once the Halo 4 team became aware of this capability, it shifted its mindset. “That realization had a subtle but profound effect on the team,” Sarsfield says. “At a certain point, they flipped from worrying about how to store and structure the data to concentrating on the types of questions they could ask from the data—for example, what game modes users were playing in, or how many players were playing at a given time. The team saw that it could much more readily respond to the initial requests for business insight about the game itself.”
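The idea of applying structure only when data is consumed can be illustrated with a small sketch. It uses plain Python rather than Hadoop or Hive, and the field names and sample events are made up. The raw lines are stored exactly as they arrive, and each question projects them onto whichever fields it needs at read time.

```python
import json
from collections import Counter

# Raw events are kept as they arrived; nothing is normalized up front.
# Field names and values are invented for illustration.
RAW_LINES = [
    '{"player": "a", "mode": "slayer", "hour": 20}',
    '{"player": "b", "mode": "ctf", "hour": 20}',
    '{"player": "a", "mode": "slayer", "hour": 21, "map": "haven"}',
]


def read_with_schema(lines, fields):
    """Project each raw line onto the requested fields at read time."""
    for line in lines:
        raw = json.loads(line)
        yield tuple(raw.get(name) for name in fields)


# Question 1: which game modes are being played?
games_per_mode = Counter(mode for (mode,) in read_with_schema(RAW_LINES, ["mode"]))

# Question 2: how many distinct players were active in each hour?
players_by_hour = {}
for hour, player in read_with_schema(RAW_LINES, ["hour", "player"]):
    players_by_hour.setdefault(hour, set()).add(player)
active_players = {hour: len(players) for hour, players in players_by_hour.items()}

print(dict(games_per_mode), active_players)
```

Note that the third event carries an extra "map" field that neither question uses. Under schema-on-read it can simply sit in storage until some later question needs it.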
Gaining New Insights from the Halo 4 “Infinity Challenge”
With an ability to focus more tightly on analysis, the Halo 4 team turned its attention to the Infinity Challenge. “Using Microsoft HDInsight, we were able to analyze the data during the five weeks of the Infinity Challenge,” says Vayman. “With the fast performance we got from the solution, we could feed that data to Virgin Gaming so they could update the leaderboards on the tournament website every day.”
In addition, because of the way the team set up Hadoop to work within Azure, the Halo team was able to perform analysis during the Infinity Challenge to detect cheaters and other abnormal player behavior. “HDInsight gives us the ability to easily read the data,” says Vayman. “In this case, there are many ways in which players try to gain extra points in games, and we were able to look back at previous data stored in Azure and identify user patterns that fit certain cheating characteristics, which was unexpected.”
After receiving this data from the Halo 4 team, Virgin Gaming sent out a notification that any player found cheating or suspected of cheating would be immediately removed from the leaderboards and from the tournament as a whole. “That was a great example of Hadoop on Azure giving us powerful analytical capabilities,” says Vayman.
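The case study does not say which patterns the team treated as signs of cheating, so the following is only a sketch of one plausible approach: flagging players whose scoring rate is far above the typical rate. The rule, the threshold, and the sample totals are all assumptions made for illustration, and the function name flag_suspects is hypothetical.

```python
from statistics import median

# Made-up per-player totals: (points earned, minutes played).
PLAYER_TOTALS = {
    "p1": (1200, 60),
    "p2": (1500, 75),
    "p3": (900, 50),
    "p4": (9000, 30),  # implausibly high scoring rate
}


def flag_suspects(totals, ratio_threshold=5.0):
    """Return players whose points per minute exceed the median rate
    by more than ratio_threshold times (an assumed, illustrative rule)."""
    rates = {
        player: points / minutes
        for player, (points, minutes) in totals.items()
        if minutes > 0
    }
    if not rates:
        return []
    typical = median(rates.values())
    return sorted(
        player for player, rate in rates.items()
        if rate > ratio_threshold * typical
    )


suspects = flag_suspects(PLAYER_TOTALS)
print(suspects)
```

A rule like this only produces candidates for review. Because the historical data stays in storage, the same check can be rerun over earlier days whenever the rule changes.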
Making Weekly Updates Based on User Trends
HDInsight gives the Halo 4 team daily updated BI data pulled from the game, which provides visibility into user trends. For example, the team can view how many users play every day, as well as the average length of a game and the specific game features that players use the most. Vayman says, “Having this kind of insight helps us gauge the overall health of the game and allows us to correlate the game’s sales numbers with the number of people that actually end up playing.”
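As a rough sketch of the kind of daily summary described here, the snippet below computes the number of distinct players and the average game length per day. It is plain Python over made-up session records, not the team's Hive queries, and the record layout is an assumption.

```python
from collections import defaultdict

# Made-up game sessions: (date, player, game length in minutes).
SESSIONS = [
    ("2012-11-06", "a", 12.0),
    ("2012-11-06", "b", 8.0),
    ("2012-11-06", "a", 10.0),
    ("2012-11-07", "a", 15.0),
]


def daily_summary(sessions):
    """Return {date: (distinct players, average game length in minutes)}."""
    players = defaultdict(set)
    lengths = defaultdict(list)
    for day, player, minutes in sessions:
        players[day].add(player)
        lengths[day].append(minutes)
    return {
        day: (len(players[day]), sum(lengths[day]) / len(lengths[day]))
        for day in sorted(players)
    }


summary = daily_summary(SESSIONS)
print(summary)
```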
Getting insights from Hadoop, in addition to Halo 4 user forums, also helps the Halo 4 team make frequent updates to the game. “Based on the user preference data we’re getting from Hadoop, we’re able to update game maps and game modes on a week-to-week basis,” says Vayman. “And the suggestions we get in the forums often find their way into the next week’s update. We can actually use this feedback to make changes and see if we attract new players. Hadoop and the forums are great tuning mechanisms for us.”
The team is also passing user feedback to the game’s designers, who can take it into consideration when planning future editions of Halo.
Targeting Players Through Email
The flexibility of the HDInsight BI solution also gives the Halo 4 team a way to reach out to players through customized campaigns, such as the series of email blasts the team sent to gamers in the initial weeks after the launch. During that campaign, the team set up Hadoop queries to identify users who started playing on a certain date. The team then wrote a file and placed it into a storage account on Windows Azure, where it was sent through SQL Server 2008 R2 Integration Services into a database owned by the Xbox marketing team.
The marketing team then used this data to send new players two emails: a generic “Welcome to Halo 4” email the day after a player began playing, and another custom email seven days later. This second email was actually one of five different emails, tailored to each user. Based on player preferences demonstrated during the week of play, this email suggested different game modes to players. The choice of which email each player received was determined by the HDInsight system. “That gave marketing a new way to possibly retain users and keep them interested in trying new aspects of the game,” Gregorio says. The Halo 4 marketing team plans to run similar email campaigns for the game until a new edition is released. “Basing an email campaign on HDInsight and Hadoop was a big win for the marketing team, and also for us,” adds Vayman. “It showed us that we were able to use data from HDInsight to customize emails, and to actually use BI to improve the player experience and affect game sales.”
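How the HDInsight system chose among the five emails is not described, so the sketch below shows one simple possibility: pick the follow-up email from the mode a player used most during the first week. The mode names, the email variant names, and the function plan_emails are hypothetical. The sketch also assumes that the custom email goes out seven days after the welcome email, which is one reading of the schedule above.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical mapping from a player's most-played mode to an email variant.
# Four mode-specific variants plus the default make five in total.
EMAIL_BY_TOP_MODE = {
    "slayer": "try_capture_the_flag",
    "capture_the_flag": "try_spartan_ops",
    "spartan_ops": "try_slayer",
    "campaign": "try_multiplayer",
}
DEFAULT_FOLLOW_UP = "explore_all_modes"


def plan_emails(first_play, modes_played):
    """Schedule the welcome email and one tailored follow-up email."""
    if modes_played:
        top_mode = Counter(modes_played).most_common(1)[0][0]
    else:
        top_mode = None
    follow_up = EMAIL_BY_TOP_MODE.get(top_mode, DEFAULT_FOLLOW_UP)
    welcome_day = first_play + timedelta(days=1)
    follow_up_day = welcome_day + timedelta(days=7)  # assumed offset
    return [(welcome_day, "welcome"), (follow_up_day, follow_up)]


plan = plan_emails(date(2012, 11, 6), ["slayer", "slayer", "campaign"])
print(plan)
```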
Expanding the Use of Hadoop
Based on the success of HDInsight as a powerful BI tool, Microsoft has started to expand the solution to other internal groups. One group, Microsoft IT, is using HDInsight to improve its customer-facing website. “Microsoft IT is using some of the internal Azure service logs in Hadoop to mine data for use in identifying error patterns, in addition to creating reports on the site’s availability,” says Vayman. Another internal team that processes very large data volumes is also using Hadoop on Azure for analytics. “Halo 4 really helped lead the way for both projects,” Vayman says.
One reason Hadoop is becoming more widely used is that the technology continues to evolve into an increasingly powerful BI tool. “The traditional role of BI within Hadoop is expanding because of the raw capabilities of the platform,” says Sarsfield. “In addition to just BI reporting, we’ve been able to add predictive analytics, semantic indexing, and pattern classification, which can all be leveraged by the teams using Hadoop.”
Adoption is also growing because users do not have to be Hadoop experts to take advantage of the technology’s data insights. “By hooking Hadoop into a set of tools that are already familiar, such as Microsoft Excel or Microsoft SharePoint, people can take advantage of the power of Hadoop without needing to know the technical ins and outs. It’s really geared to the masses,” says Vayman. “A good example of that is the data about Infinity Challenge cheaters that we gave to Virgin Gaming. The people receiving that data are not Hadoop experts, but they can still easily use the data to make business decisions.”
No matter what new capabilities are added to it, there’s no doubt that HDInsight will continue to affect business. “With Hadoop on Windows Azure, we can mine data and understand our audience in a way we never could before,” says Vayman. “It’s really the BI solution for the future.”