Migrating a critical high-performance platform to Azure with zero downtime
Technical Case Study
Cloud and Enterprise
Published: Mar 30, 2017

Microsoft IT runs a high-performance platform for managing customer data, including privacy preferences for services like Windows Live, Xbox, and Office. We knew it would soon be time to retire the aging on-premises hardware infrastructure and migrate the platform to a high-performance environment in Azure. We evaluated the infrastructure, devised a plan, and completed the migration with zero downtime for our services. The result is a scalable, cost-effective cloud solution that exceeded our performance requirements.

At Microsoft IT, our strategy is to move workloads from on-premises datacenters to Azure. So, when the Microsoft Honoring Customer Permissions (HCP) platform was running on end-of-life hardware, our direction was clear—migrate to Azure. HCP is a critical Microsoft business platform that our customers interface with, so we created and executed a migration strategy that brought HCP to Azure—with zero downtime. Now we have a new Azure-based platform that’s more scalable, more resilient, and more cost-effective.

The Honoring Customer Permissions platform

We created the HCP platform to store and manage customers’ privacy data, including their contact information and consent to receive communication from Microsoft. HCP represents a significant part of the Microsoft effort to protect our customers’ information and to help ensure that their data privacy choices are consistent across all Microsoft services. More than 50 services use HCP, including key services like Windows Live, Xbox, Office, and OneDrive. At the time of the migration, the platform covered:

  • 185 organizations
  • 4 billion customer identities
  • 5 billion emails

Identifying HCP infrastructure issues

Several datacenters, containing 175 physical servers, ran the three application tiers that provided overall HCP functionality: data storage, middle, and web. Figure 1 shows the hardware allocated to each tier in our on-premises solution.

The graphic depicts a list of components under the heading On-premises. There are four tiers, from top to bottom. The web tier contains 6 web servers with 8 procs and 16GB of RAM. The middle tier contains 21 pipeline servers and 2 file servers with 16 procs and 28GB of RAM. The data storage tier contains two sections: a Persona section, with 5 clusters, 20 SQL servers, and 160 partitions with 16 procs and 48GB of RAM, and an Email and SMS section, with 2 clusters, 16 SQL servers, and 80 physical partitions. Finally, the operational data store tier shows the Email and Persona data store and the disaster recovery components stored in the BAY datacenter.
Figure 1. HCP on-premises infrastructure

The web tier provided the essential customer interfaces and functionality through several web servers running Windows Server 2012 R2 and Internet Information Services (IIS). The middle tier contained 21 pipeline servers that handled data exchange and two file servers for file storage. The data storage layer was based on SQL Server 2012 R2 running in Windows Server 2012 R2 failover clusters. We maintained both primary and secondary copies of the data storage layer in our primary on-premises datacenter—and for disaster recovery purposes, we also maintained another secondary copy in an offsite datacenter.

The on-premises infrastructure served its purpose, but we ran into several issues that required us to move HCP to a new infrastructure:

  • The server hardware that supported HCP was nearing end-of-life and required replacement.
  • One on-premises datacenter that hosted HCP hardware was slated for shutdown.
  • The overall goal of moving Microsoft IT infrastructure to the cloud meant that an on-premises solution like HCP would eventually be cloud-based.

Migrating HCP to Azure

We knew that the datacenter end-of-life scenario meant that we needed to move HCP onto a new infrastructure. And the new infrastructure had to continually protect customer data. Following our organizational mission to move infrastructure to the cloud, our teams initiated a migration planning process to move HCP from our on-premises datacenters into Azure.

Planning zero downtime migration

HCP is a widely used platform at Microsoft. One of our first goals for the migration process was to have zero downtime for the applications that used HCP and, by association, for any of our customers who were indirectly using HCP. Our high-level goals for the migration process were:

  • Zero downtime for HCP services. We wanted to prevent any essential HCP components from being unavailable at any time during the migration.
  • No privacy-related issues throughout the migration. HCP handles private customer data. Microsoft customer privacy is a top priority.
  • No impact to the customer. The migration had to be performed in a way that didn’t impact Microsoft customers—this meant no downtime, and a seamless migration on the front end.
  • Maintain a rollback strategy for each component of the platform. As part of the zero-downtime strategy, we wanted to ensure that we could roll back changes in any phase of the migration in case of an error or issue with the migration process.
  • Maintain performance on par with the current platform. We wanted to ensure that the user experience for HCP in Azure was as good as or better than in the on-premises datacenters. We maintained specific service level agreement (SLA) response times for HCP transactions throughout the migration and after the migration was complete.

The migration process

To meet all our SLAs with zero downtime, we migrated the entire platform to Azure on infrastructure as a service (IaaS) virtual machines and virtual networks. We migrated the on-premises servers to Azure using a process we call lift and shift. With lift and shift, the data and applications on existing servers were moved to Azure IaaS virtual machines and storage, which provide the same functionality as the on-premises server. Operating system, applications, and disk contents were all preserved.

Planning Azure infrastructure

Using lift and shift made infrastructure planning for Azure relatively simple. For each platform tier, we created Azure IaaS virtual machines that correspond to the on-premises servers. Table 1 shows the breakdown of Azure IaaS virtual machines.

Table 1. Azure IaaS virtual machine resources

Application layer | Qty | Server role     | Virtual machine size                      | Datacenter
Web               | 6   | Web server      | D11 (4 cores, 14GB RAM, Standard storage) | Central US
Middle            | 21  | Pipeline server | D12 (8 cores, 28GB RAM, Standard storage) | Central US
Middle            | 2   | File server     | D12 (8 cores, 28GB RAM, Standard storage) | Central US
Data storage      | 42  | SQL Server      | DS13 (8 cores, 56GB RAM, Premium storage) | Central US (primary), South Central US (secondary)

When planning the migration, we used current system performance to determine computing and storage needs in Azure. To help ensure stability and maintain the ability to roll back as needed, we didn’t change the application architecture or topology during the migration. We chose the Azure virtual machine sizes based on the current performance and utilization of on-premises workloads.
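
As a rough illustration of sizing from observed utilization, the sketch below picks the smallest virtual machine size that covers a workload's peak usage plus some headroom. The catalog entries come from Table 1. The `pick_size` helper, the 20 percent headroom, and the utilization figures are assumptions made for this example, not values from the HCP migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    name: str
    cores: int
    ram_gb: int
    premium_storage: bool


# Sizes and specifications as listed in Table 1.
CATALOG = [
    VmSize("D11", cores=4, ram_gb=14, premium_storage=False),
    VmSize("D12", cores=8, ram_gb=28, premium_storage=False),
    VmSize("DS13", cores=8, ram_gb=56, premium_storage=True),
]


def pick_size(peak_cores_used, peak_ram_gb, needs_premium_storage, headroom=1.2):
    """Return the smallest catalog size that covers observed peaks plus headroom.

    Hypothetical helper: the headroom factor is an assumption for this sketch.
    """
    need_cores = peak_cores_used * headroom
    need_ram = peak_ram_gb * headroom
    candidates = [
        size for size in CATALOG
        if size.cores >= need_cores
        and size.ram_gb >= need_ram
        and (size.premium_storage or not needs_premium_storage)
    ]
    if not candidates:
        raise ValueError("no catalog size satisfies the observed load")
    # "Smallest" here means fewest cores, then least RAM.
    return min(candidates, key=lambda size: (size.cores, size.ram_gb))


# Hypothetical utilization figures, for illustration only.
web = pick_size(peak_cores_used=3.0, peak_ram_gb=10.0, needs_premium_storage=False)
sql = pick_size(peak_cores_used=6.0, peak_ram_gb=40.0, needs_premium_storage=True)
print(web.name, sql.name)
```

Keeping the catalog as data makes it easy to rerun the selection if measured utilization changes before the cutover.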

Gaining efficiencies in Azure

We ended up with fewer Azure virtual machines than on-premises servers because we realized efficiencies in the data storage layer. In our on-premises environment, we maintained three copies of the data storage layer: two at the local datacenter and one in an off-site datacenter. Azure has built-in resiliency for storage and virtual machines, so we eliminated the local secondary copy of the data storage layer in our Azure implementation. We keep one secondary copy in another Azure datacenter, which satisfies our failover and disaster recovery requirements. This standard Azure functionality allowed us to remove 20 virtual machines from the Azure implementation compared to the on-premises version.

The graphic depicts two lists of components. On the left, under the heading On-premises, are four tiers, from top to bottom. The web tier contains 6 web servers with 8 procs and 16GB of RAM. The middle tier contains 21 pipeline servers and 2 file servers with 16 procs and 28GB of RAM. The data storage tier contains two sections: a Persona section, with 5 clusters, 20 SQL servers, and 160 partitions with 16 procs and 48GB of RAM, and an Email and SMS section, with 2 clusters, 16 SQL servers, and 80 physical partitions. Finally, the operational data store tier shows the Email and Persona data store and the disaster recovery components stored in the BAY datacenter. On the right, under the heading IaaS (Central US), are four tiers, from top to bottom. The web tier contains 6 web servers as D11 VMs with 4 procs and 14GB of RAM. The middle tier contains 21 pipeline servers and 2 file servers as D12 VMs with 8 procs and 28GB of RAM. The data storage tier contains two sections: a Persona section, with 5 clusters, 20 SQL servers, and 160 partitions as DS13 VMs with 8 procs and 56GB of RAM, and an Email and SMS section, with 2 clusters, 16 SQL servers, and 80 physical partitions as DS13 VMs with 8 procs and 56GB of RAM. Finally, the operational data store tier shows the Email and Persona data store and the disaster recovery components stored in the South Central Azure region.
Figure 2. Comparison of on-premises and Azure architecture

Migration with zero downtime using a phased approach

An important first step was to shift our entire HCP infrastructure into Azure and keep the on-premises version running. But we also needed to adopt a very specific migration strategy to move our production environment over to Azure. This process involved the functional transfer of each tier of the application to the new Azure infrastructure, in the following order:

  1. Data storage
  2. Middle tier
  3. Web tier
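
The tier order above, combined with the goal of keeping a rollback path for every component, can be sketched as a simple ordered cutover. This is a minimal sketch and not the tooling we used: the `migrate` and `simulate` functions and the callbacks are hypothetical, and undoing completed tiers in reverse order is an assumption made for the example.

```python
# Tier order as listed above.
TIER_ORDER = ["data storage", "middle tier", "web tier"]


def migrate(tiers, cut_over, health_check, roll_back):
    """Cut tiers over in order; if a health check fails, undo in reverse order.

    Returns the list of tiers now running in Azure (empty after a rollback).
    """
    done = []
    for tier in tiers:
        cut_over(tier)
        done.append(tier)
        if not health_check(tier):
            # Assumption: roll back the most recently migrated tier first.
            for finished in reversed(done):
                roll_back(finished)
            return []
    return done


def simulate(failing_tier=None):
    """Run the cutover against stand-in callbacks and record what happened."""
    log = []
    migrated = migrate(
        TIER_ORDER,
        cut_over=lambda tier: log.append(("cut over", tier)),
        health_check=lambda tier: tier != failing_tier,
        roll_back=lambda tier: log.append(("roll back", tier)),
    )
    return migrated, log


migrated, log = simulate(failing_tier="middle tier")
for action, tier in log:
    print(action, tier)
```

In the simulated failure, the middle tier and then the data storage tier are rolled back, and the web tier is never touched.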

The migration of each tier, at a high level, consisted of creating the Azure virtual machine infrastructure and redirecting the platform to point to the new production instance in Azure. The data storage layer proved to be the most challenging aspect of the migration and required a five-phase approach, as depicted in Figure 3.

The graphic depicts a 5-step process for migrating the data layer of HCP. Step 1 shows the on-premises primary database replicating to the secondary database and the DR secondary database. Step 2 shows the on-premises primary database replicating to two copies of the secondary database in Azure IaaS. Step 3 shows the transfer of the primary database from on-premises to Azure IaaS. Step 4 shows the Azure IaaS primary database maintaining two secondary copies on-premises and one secondary copy in Azure IaaS. Step 5 shows the removal of on-premises components, leaving only the IaaS primary database, which replicates with a DR secondary database in the IaaS DR region.
Figure 3. The data layer migration process for HCP from on-premises to Azure

The data layer migration process included the following:

  1. Prepare the HCP databases for replication.
  2. Create two copies of the primary on-premises data store in Azure.
  3. Switch the primary copy of the data store to one of the secondary copies hosted on Azure.
  4. Switch to Azure for production data store layer functionality, keeping the on-premises servers online as disaster recovery secondary copies.
  5. Replicate the primary data store to a secondary copy hosted in the South Central US datacenter. Eliminate the on-premises secondary copies, leaving Azure as the only infrastructure. Only one copy of the database exists as a secondary, which is stored in another Azure region and acts as an active instance for disaster recovery.
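
One way to reason about why this sequence supports zero downtime and rollback is to write the topology of each step down as data and check it. The sketch below does that for the steps whose topology the Figure 3 description states (step 3 is the switch itself, so it is not modeled). The `Phase` type, the site labels, and the two checks are assumptions made for this illustration, not a description of our replication tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    step: int
    primary: str        # site that hosts the primary copy
    secondaries: tuple  # site of each secondary copy


# Site labels are illustrative names chosen for this sketch.
ON_PREM, ON_PREM_DR = "on-premises", "on-premises DR"
AZURE, AZURE_DR = "Azure", "Azure DR region"

# Topology per step, transcribed from the Figure 3 description.
PHASES = [
    Phase(1, ON_PREM, (ON_PREM, ON_PREM_DR)),
    Phase(2, ON_PREM, (AZURE, AZURE)),
    Phase(4, AZURE, (ON_PREM, ON_PREM, AZURE)),
    Phase(5, AZURE, (AZURE_DR,)),
]


def sites(phase):
    """All sites that hold a copy of the data in this phase."""
    return {phase.primary, *phase.secondaries}


def check(phases):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    for phase in phases:
        if not phase.secondaries:
            problems.append(f"step {phase.step}: no secondary copy")
    for prev, cur in zip(phases, phases[1:]):
        # The primary should only move to a site that already holds a copy.
        if cur.primary not in sites(prev):
            problems.append(f"step {cur.step}: primary moves to a site with no copy")
        # After a move, the old primary site should keep a copy for rollback.
        if cur.primary != prev.primary and prev.primary not in cur.secondaries:
            problems.append(f"step {cur.step}: no copy left at the old primary site")
    return problems


print(check(PHASES))
```

A plan that moved the primary straight to a site with no existing replica, or dropped the on-premises copies at the moment of the switch, would fail one of the two checks.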

Migration results and benefits

The migration process, simplified here, took place over six months with zero downtime—while accomplishing all our migration goals. Here are details about our results:

  • We had zero downtime for all 54 onboarded applications, and for our customers, during and after the migration. The effort involved collaboration with more than 20 different teams.
  • We migrated more than 120 TB of data, reducing our server footprint from 175 to 110.
  • We realized a cost avoidance of approximately 30 percent compared to on-premises charges.
  • We have a more robust disaster recovery solution. Our data storage layer now operates in an active/active model, which makes disaster recovery failover a more efficient process.
  • We reduced the infrastructure footprint of our data storage layer because of built-in Azure resiliency and reliability. This resulted in a 20 percent savings in storage operational costs.
  • We have natively supported geo-replication at 99.9 percent availability with zero maintenance and configuration.
  • We increased performance across several SLA metrics.
  • We have a natively scalable and elastic infrastructure. We can scale up virtual machine instance size to accommodate increased demand. And to save operational costs, virtual machine instances can be snoozed when they’re not required.

By migrating HCP to Azure IaaS, we quickly and quietly positioned HCP in an environment where it consistently and cost-effectively protects our customers’ data well into the future.

For more information

Microsoft IT

microsoft.com/ITShowcase

 

© 2017 Microsoft Corporation. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
