4-page Case Study
Posted: 6/5/2012

Microsoft Corporation
Microsoft Moves Websites to Windows Azure, Maximizes Resource Utilization, Reduces Costs

Microsoft wanted to migrate its Microsoft TechNet and Microsoft Developer Network websites to a cloud environment without necessitating any code or architectural changes. It wanted the solution to provide performance that was equivalent to or better than its on-premises solution and that would allow ease of operations. The decision, implemented by the Enabling Platform Experience (EPX) group, was to move the websites to the Windows Azure cloud services environment. In 2011, the EPX group began the migration of Microsoft TechNet. Microsoft now enjoys a solution for the website that scales dynamically to meet demand, provides fast performance, and minimizes on-premises infrastructure and costs. By using Windows Azure, the EPX group will be able to reduce its forecasted server acquisitions by 20 percent.

Situation
In June 2011, the Enabling Platform Experience (EPX) group, part of the Microsoft Developer Division, began a journey that would see two of its largest developer and IT professional websites migrated from an entirely on-premises infrastructure to take advantage of the reliability, scalability, and availability of the cloud environment.

The team working on the migration had several requirements that the migration process had to meet:

  • No code or architecture changes. The migration should be accomplished with minimal changes to configuration and with no changes at all to the architecture or code of the applications. By eliminating the need to re-architect the application to run in the cloud, this approach would enable a much faster and less expensive migration.

  • Equivalent or better performance. Performance of the migrated application must be equivalent to or better than the on-premises solution. In particular, it must be able to scale dynamically to meet demand, while minimizing running costs.

  • Ease of operation. The migrated application must be easy to operate, monitor, manage, and maintain.

  • Reduced on-premises requirements. The result must provide opportunities to minimize on-premises infrastructure requirements and costs.

The EPX group at Microsoft is responsible for managing a number of Microsoft Developer online and offline experiences, including the Microsoft Developer Network (MSDN) and Microsoft TechNet. Usage of both of these sites is highly variable, with significant spikes when new products launch or during training or conferences. “To support this kind of variability along with data center redundancy, we had to provision three times the capacity required to support peak volumes, resulting in an aggregate server utilization of 20 percent over the course of a year,” says Jay Jensen, Sr. Systems Engineer, EPX group at Microsoft.
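
The arithmetic behind this quote can be checked with a short calculation. The sketch below assumes that "utilization" means average load divided by provisioned capacity, which the case study does not define; the function name is illustrative only.

```python
def average_to_peak_ratio(utilization: float, overprovision_factor: float) -> float:
    """Return average load as a fraction of peak load.

    Assumes utilization = average_load / provisioned_capacity and
    provisioned_capacity = overprovision_factor * peak_load.
    """
    return utilization * overprovision_factor


# Figures from the quote: 3x peak capacity, 20 percent aggregate utilization.
ratio = average_to_peak_ratio(0.20, 3)
print(f"Average load is about {ratio:.0%} of peak load")
```

Under these assumptions, average traffic was roughly 60 percent of peak, yet 80 percent of the provisioned capacity went unused on average. That gap is what elastic capacity is meant to close.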

Solution
Microsoft decided that the first phase of the migration would be to move Microsoft TechNet to Windows Azure, the Microsoft cloud services development, hosting, and management environment. Windows Azure provides on-demand compute, storage, networking, and content delivery capabilities through Microsoft data centers around the world. The MSDN and TechNet websites have a similar architecture and are hosted on the same hardware. Because TechNet receives less traffic overall than MSDN, the decision was made to perform the migration for TechNet first, and then apply the lessons learned to the MSDN website. The midterm target for the EPX group is to move all applicable websites and applications to Windows Azure by the end of 2014.

Architecture of TechNet Before Migration
TechNet is designed for high web traffic. It experiences a high number of reads with no caching on the front-end web servers. TechNet has an average traffic level of more than 30 million unique users a month, in addition to other requests such as indexing by search engines.

Figure 1. TechNet on-premises architecture

TechNet has a typical two-tier architecture, with web servers already in a virtualized environment. Visitors to TechNet access web front ends that, in turn, read content from a farm of database servers. This database layer is primarily hosted on high-end, four-terabyte servers with the content replicated in four different data centers. (Figure 1 illustrates the original on-premises architecture of TechNet.)

Content for the site is pushed to the databases from a content publishing system. A complicating factor in the migration was that TechNet was in the midst of a transition from one content management system to another, which meant that two code bases were sourcing the content for end users.

Migration Process
Microsoft engaged Accenture, a member of the Microsoft Partner Network, for the initial evaluation and feasibility assessment, utilizing its Azure Migration Accelerator assets and the Premier Field Engineering (PFE) teams, who have vast experience in debugging and troubleshooting performance issues. Then, in June 2011, a team of just three service engineers began the migration on an aggressive schedule.

The final design for the overall migration of all sites is a hybrid application in which a portion of the application runs on Windows Azure, while the data layer, the monitoring and management functions, and the content publishing system remain on-premises. The result is a future-state architecture to which the team can migrate applications over time through gradual re-architecting.

To move the TechNet infrastructure to the cloud environment, the team made design decisions at each layer of the infrastructure. For traffic routing, the team utilized TechNet’s current global load balancing capabilities using Akamai networking services to direct traffic to Windows Azure. This enabled the team to pilot the approach with live traffic and divert incremental amounts of traffic from on-premises data centers.
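
The idea of diverting incremental amounts of live traffic can be illustrated with a generic weighted split. This is a minimal sketch, not a description of how Akamai's load balancing works: the function `choose_origin`, the `azure_share` parameter, and the client identifiers are all assumptions made for the example.

```python
import hashlib


def choose_origin(client_id: str, azure_share: float) -> str:
    """Map a client to an origin so that roughly azure_share of clients reach Azure.

    Hashing the client ID keeps each client on the same origin between requests.
    """
    if not 0.0 <= azure_share <= 1.0:
        raise ValueError("azure_share must be between 0 and 1")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "azure" if bucket < azure_share else "on-premises"


# Ramp up in stages, as in a pilot with live traffic.
clients = [f"client-{i}" for i in range(10_000)]
for share in (0.05, 0.20, 0.40):
    hits = sum(choose_origin(c, share) == "azure" for c in clients)
    print(f"target {share:.0%}: observed {hits / len(clients):.1%}")
```

Raising the share in small steps lets a team watch error rates and latency at each stage and set the share back to zero if a problem appears.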

Figure 2. Current TechNet hybrid application architecture

For the web front ends, the team used the Windows Azure Virtual Machine (VM) role, which enabled them to use an existing VM image to seed the cloud migration. This reduced the probability that engineering changes would be required and provided the team with more control over the migration and configuration. Because of the two code bases, the team packaged each one into its own VM role and used a custom content-switching solution to drive traffic to the appropriate role.

To achieve elasticity and the consequent minimization of runtime costs, the team chose the Enterprise Library Autoscaling Application Block from the Microsoft patterns & practices group. This enabled the application capacity to automatically mirror demand by starting and stopping role instances in response to a range of factors, such as server load and resource usage. Incorporating the Autoscaling Application Block and configuring the autoscaling rules took the team just a few hours. "The Autoscaling Application Block allows MSDN and TechNet to automatically handle changes in the load levels over time. It helps minimize operational costs, while still providing excellent performance and availability to our users," says Dr. Grigori Melnik, Sr. Program Manager, Microsoft patterns & practices group.
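
The behavior described above, starting and stopping role instances in response to load, can be sketched as a single reactive rule bounded by minimum and maximum instance counts. The Autoscaling Application Block is a .NET library configured through its own rules; the Python below is not its API, and the thresholds, class name, and function name are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical reactive rule; thresholds are illustrative, not from the case study."""
    scale_out_cpu: float = 75.0   # add an instance above this average CPU %
    scale_in_cpu: float = 30.0    # remove an instance below this average CPU %
    min_instances: int = 2
    max_instances: int = 20


def target_instance_count(current: int, avg_cpu: float, rule: ScalingRule) -> int:
    """Return the instance count after applying the rule once."""
    if avg_cpu > rule.scale_out_cpu:
        desired = current + 1
    elif avg_cpu < rule.scale_in_cpu:
        desired = current - 1
    else:
        desired = current
    # The instance bounds always win over reactive adjustments.
    return max(rule.min_instances, min(rule.max_instances, desired))


rule = ScalingRule()
count = 4
for cpu in (82.0, 88.0, 55.0, 20.0, 15.0):
    count = target_instance_count(count, cpu, rule)
    print(f"avg CPU {cpu:.0f}% -> {count} instances")
```

Keeping a floor on the instance count preserves redundancy during quiet periods, while the ceiling caps running costs during a spike.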

Databases were another key consideration during the migration. Because the TechNet content database is almost four terabytes, SQL Azure was not a viable option in the short term. Instead, the team created a hybrid cloud solution in which the web front end resides on Windows Azure and the data tier remains on-premises.

For content switching, the on-premises platform used a hardware-based solution. For the migrated solution, the team created an Application Request Routing (ARR) role to achieve the same results. (Figure 2 illustrates the hybrid application architecture at the time this case study was written.)
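
Content switching of this kind amounts to choosing a back end from properties of the request. The sketch below shows the general technique with a longest-prefix match on the URL path; it is not ARR configuration, and the path prefixes and role names are invented for the example, because the case study does not say how requests were divided between the two code bases.

```python
# Hypothetical mapping of URL path prefixes to the role that serves them.
ROUTES = {
    "/library/": "legacy-cms-role",
    "/": "new-cms-role",  # fallback for everything else
}


def select_role(path: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the backend role for a request path using longest-prefix match."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    # Check longer prefixes first so specific routes beat the fallback.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    raise LookupError(f"no route for {path!r}")


print(select_role("/library/cc123.aspx"))
print(select_role("/en-us/windows"))
```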

To achieve network connectivity and authentication/authorization between the on-premises databases and cloud-hosted web front ends, the team chose to adopt a solution provided by Microsoft Global Foundation Services (GFS). GFS is the engine that powers the infrastructure and many services for Microsoft global data centers. The GFS Services for Azure Applications (GSAA) team provides a solution that uses a Windows Azure plug-in or a base VM role image to manage traffic heuristics. It provides a framework that meets all of the medium business impact (MBI) requirements for Windows Azure data and GFS, which allows data to reside in and pass through the cloud environment.

GSAA also provides a new domain named azr.gbl (the Windows Azure domain) that has trust relationships with the Microsoft internal on-premises domain that hosts many online services. This enables the use of integrated authentication. With all of these controls in place, the GFS network team allows connectivity among Windows Azure hosted services and between internal systems and hosted services. The GSAA solution enables Windows Azure hosted services to interact seamlessly with Microsoft internal hosted services over GFS’s world-class network, without making intrusive security or network changes.

Figure 3. Comparison of TechNet on-premises and Windows Azure–hosted performance

Performance Comparisons
A major goal for the move to Windows Azure was to maintain or improve performance for the migrated applications. The EPX Performance and Reliability team compared the performance of the original on-premises sites and the TechNet sites hosted by Windows Azure.

The charts in Figure 3 illustrate the page load time for the initial user experience (PLT1) for pages served from four regional data centers. Page load times are measured above fold time (AFT). Fold time is the point at which the browser clears the current page and starts to load the new page. Measuring performance from this point ensures that results are not skewed by factors such as DNS resolution time and proxy server negotiation.

For both the first-time user experience and the returning user experience, the difference in page load times was less than 200 milliseconds for the majority of pages, which is within the margin of acceptable performance. The difference is mainly due to latency of the Content Delivery Network (CDN) and advertisement delivery.

A few pages exhibited differences of up to 400 milliseconds in some regional data centers; the team is investigating individual performance improvements for these cases. However, extensive performance and reliability testing has shown that overall performance after migration to Windows Azure is equivalent to that of the on-premises applications, and better for certain pages.
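
A comparison like this can be expressed as a per-page check against the 200-millisecond margin named above. The following sketch uses made-up timings; they are not the measurements behind Figure 3, and the page names and labels are assumptions.

```python
ACCEPTABLE_DELTA_MS = 200  # margin of acceptable performance named in the text


def classify_pages(on_prem_ms: dict[str, float], azure_ms: dict[str, float]) -> dict[str, str]:
    """Label each page by how its Azure load time compares with on-premises."""
    labels = {}
    for page, baseline in on_prem_ms.items():
        delta = azure_ms[page] - baseline
        if abs(delta) < ACCEPTABLE_DELTA_MS:
            labels[page] = "within margin"
        elif delta > 0:
            labels[page] = "investigate"
        else:
            labels[page] = "faster"
    return labels


# Made-up timings for illustration only; these are not measurements from Figure 3.
on_prem = {"home": 1800.0, "library": 2400.0, "forums": 2100.0}
azure = {"home": 1750.0, "library": 2780.0, "forums": 2250.0}
print(classify_pages(on_prem, azure))
```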

Proposed Near-Term Implementation
The EPX team is now pursuing initiatives aimed at providing a future architecture for TechNet and other websites and applications hosted on Windows Azure. The proposed architectural changes include:

  • Migration of the web front ends from VM roles to web roles in order to reduce support requirements, remove the need to manage the operating system, and simplify deployment.

  • Migration of the databases to Windows Azure using the new Infrastructure as a Service (IaaS) capabilities.

  • Use of an on-premises virtual private cloud implemented with the Windows Server 8 operating system to allow content to be published from on-premises servers to Windows Azure.

(Figure 4 illustrates the proposed future architecture.)

Lessons Learned
The experience gained from the initial TechNet migration will be used as a template for the migration of other EPX sites. The initial migration of the TechNet website to Windows Azure provided the EPX group and other teams at Microsoft with many useful pointers for the future. For example, the assessment showed that numerous operational processes would need to change as EPX transformed to support cloud-based solutions. These processes include:

  • Logging and monitoring. Windows Azure Diagnostics transfers Windows logs and other trace information to Blob Storage as scheduled jobs. Data must be downloaded from Blob Storage, and the team chose Microsoft System Center Operations Manager and the Virtual IP (VIP) monitoring tool (an internal HTTP monitoring solution) for this task. Local instance health checks are also performed, while third-party providers monitor application pages (Keynote) and perform network traffic management.

  • Business continuity and disaster recovery (BCDR). Existing traffic management capabilities plus local instance health checks of pages enable a failover to or from Windows Azure at the cluster level. The health check pages incorporate functionality to test for issues such as loss of data layer connectivity.

  • Backup and restore. Existing systems manage backup and restore for on-premises data. Specific backup and restore facilities are not required in the cloud-hosted portion of the application because Windows Azure automatically replicates data, such as the log information persisted in Blob Storage.

  • Operating system updates. A service engineer connects to a “golden master” VM and applies operating system and security updates using msnpatch.exe, and then uses an automated deployment process to publish the VMs in Windows Azure.

  • Deployment. Operating system and Internet Information Services (IIS) updates are applied to a differencing disk. This is deployed to Windows Azure staging and a VIP Swap occurs to move it into live production. When minor changes are required, additional scripts can be used to push content deployments independently to each running Windows Azure role instance.
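
The health check pages mentioned under business continuity above can be pictured as a page that probes each dependency and reports an overall status that a traffic manager can act on. This is a minimal sketch under assumed names; the case study does not describe how the EPX health checks are implemented, and the probe functions here are stand-ins.

```python
from typing import Callable


def health_status(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, bool]]:
    """Run each dependency check and return an HTTP-style status with details.

    A check that raises is treated as failed, so a broken dependency
    marks the instance unhealthy instead of crashing the health page.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


def database_unreachable() -> bool:
    # Stand-in for a real connectivity probe to the on-premises data tier.
    raise ConnectionError("data layer not reachable")


status, details = health_status({"iis": lambda: True, "data_layer": database_unreachable})
print(status, details)
```

Returning a non-success status when the data layer is unreachable gives the traffic management layer a clear signal to fail over to another cluster.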

Guidelines for the Future
The ongoing migration has also revealed some useful general guidance for future migrations, which will benefit all designers and developers considering migration of their applications and websites to Windows Azure.

  • No need to reinvent the wheel: explore and apply known good practices.

  • Consider application and data security. Remember that Windows Azure is a public space.

  • Understand the capabilities and limitations of Windows Azure, outsourcing migration if required. Use the resources available on the Windows Azure portal, its forums and user groups.

  • Use available tools to evaluate code against known Platform as a Service (PaaS) migration challenges.

  • Understand the application and its potential risk areas. These may include server-specific configurations, special networking requirements, such as content switching or affinity, support for multiple sites, and connectivity to supporting systems or business layers.

  • Gain operational flexibility by allowing configuration and content to be modified independently from the package or VM that is deployed.

  • Take full advantage of Windows Azure services, such as Windows Azure Service Bus and SQL Azure Data Sync, and tools or frameworks, such as the Microsoft patterns & practices Enterprise Library Extensions for Windows Azure.

With any migration project, issues may occur that will only be discovered when something doesn’t work quite as anticipated. Some issues that the team came across were:

  • Note the time used. Windows Azure is always in Coordinated Universal Time (UTC), while on-premises services are likely to use local time.

  • Consider whether it is necessary to change the page size for web and worker roles based on the size of the role instance and application.

  • Always use the latest software development kit (SDK) version when developing applications and consult the Known Issues pages when upgrading the SDK version.
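
The time zone issue in the first point above is easy to reproduce. The sketch below converts a timestamp recorded in local time to UTC so that on-premises and cloud logs can be compared; the fixed UTC-7 offset is an assumption for illustration and ignores daylight saving transitions, which a time zone database (for example, Python's `zoneinfo` module) would handle.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the on-premises server runs on Pacific Daylight Time (UTC-7).
ON_PREM_TZ = timezone(timedelta(hours=-7), "PDT")


def to_utc(local_naive: datetime, local_tz: timezone = ON_PREM_TZ) -> datetime:
    """Attach the server's time zone to a naive timestamp and convert it to UTC."""
    if local_naive.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return local_naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


on_prem_log_time = datetime(2012, 6, 5, 18, 30)
print(to_utc(on_prem_log_time).isoformat())
```

Note that the converted timestamp falls on the following calendar day in UTC, which matters for anything keyed by date, such as daily log files or scheduled jobs.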

Figure 4. Proposed future TechNet hybrid application architecture

Currently, the EPX development team has migrated 40 percent of TechNet and MSDN traffic to Windows Azure, utilizing the design configuration described above. “By approaching the migration as an infrastructure migration, with no core application code or architecture changes, we reduced the effort and testing required and completed the initial migration within three months of completing the feasibility assessment,” says Jay Arvin, Systems Engineer, EPX group at Microsoft. Since then, TechNet has been running for more than 60 days on Windows Azure and three more sites are lined up for migration. The migration of TechNet traffic will enable the EPX group to reduce its forecasted server acquisitions by 20 percent.

Benefits
By proceeding with the migration of two of its largest websites to Windows Azure, Microsoft has achieved a number of significant benefits. In terms of reliability, scalability, availability, and minimizing costs, the migration proves that Windows Azure works—and works exceedingly well.

“Since migrating Microsoft TechNet to Windows Azure, we have achieved more scalability, maximized resources, and maintained high performance,” says Purush Vankireddy, SE Director, EPX group at Microsoft.

Maximized Resources and Improved Scalability
The requirement to meet highly variable traffic patterns meant that overall server utilization used to fall to 20 percent. However, the over-provisioning was necessary to meet demand during busy periods. With the cloud-hosted solution, the EPX group now has elasticity through the easy addition and removal of servers to meet demand.

“With the migration of TechNet and MSDN to Windows Azure, we are able to bring down server acquisitions by 20 percent,” says Saravanan Vinayagam, SE Manager, EPX group at Microsoft.

Maintained High Performance
The benefits have been achieved without adverse effects on performance and service availability to website visitors. Early reviews of performance at the client for the solution hosted on Windows Azure, compared to the original on-premises deployment, show that the two are statistically equivalent for all pages when using local resources. In other words, the Windows Azure solution may be slightly faster or only slightly slower than the on-premises solution, depending upon the normal variations in Internet traffic.

“We are very pleased with the comparison numbers between the performance of the on-premises solution and the hybrid solution hosted by Windows Azure,” says Roopa Venkatasubramanya, Performance Lead at Microsoft.

Reduced Infrastructure and Maintenance Costs
Every organization is looking for ways to reduce energy usage and cost and to minimize investment in infrastructure. The use of hosted virtual servers can minimize initial and ongoing hardware, infrastructure, and maintenance costs and achieve significant savings in day-to-day running costs.

Ultimately, through the full migration of all on-premises MSDN and TechNet web front ends to Windows Azure, estimates suggest that Microsoft could save between 18 percent and 25 percent on hosting costs. The benefits become even more compelling when considering the reduction in cost and management associated with spikes in capacity. Dynamic scaling capabilities and simple configuration changes can change the number of role instances deployed in Windows Azure.

Reduction in costs has another important aspect. A significant environmental (and often regulatory) focus for all companies today is to minimize their carbon footprint. Microsoft has found that hosted solutions that support dynamic resource scaling help achieve a significant reduction in energy usage and help it to meet emissions targets.

Utilized Migration for Learning
Microsoft will move many of its websites and applications to Windows Azure over time. The knowledge and experience gained during the initial phase of migrating Microsoft TechNet will be invaluable for the migration of other applications.

“Initial migration of MSDN/TechNet not only helped us to improve the scale and reliability, we learned many things that will help our business leverage cloud infrastructure in the near future,” says Lori Brownell, General Manager, EPX group at Microsoft. “We recommend using all the technical resources that you have available, including using tools to evaluate code and taking advantage of Windows Azure services,” she says.

Windows Azure
Windows Azure provides developers the functionality to build applications that span from consumer to enterprise scenarios. The key components of the Windows Azure platform are:

Windows Azure. Windows Azure is the development, service hosting, and service management environment for the Windows Azure platform. It provides developers with on-demand compute, storage, bandwidth, content delivery, middleware, and marketplace capabilities to build, host, and scale web applications through Microsoft data centers.

SQL Azure. SQL Azure is a self-managed, multitenant relational cloud database service built on Microsoft SQL Server technologies. It provides built-in high availability, fault tolerance, and scale-out database capabilities, as well as cloud-based data synchronization and reporting, to build custom enterprise and web applications and extend the reach of data assets.

To learn more, visit:
www.windowsazure.com
www.sqlazure.com

For More Information
For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers in the United States and Canada who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to:
www.microsoft.com

For more information about Accenture products and services, visit the website at:
www.accenture.com

Solution Overview



Organization Size: 99000 employees

Organization Profile

Based in Redmond, Washington, Microsoft is a worldwide leader in software, services, and Internet technologies. It employs roughly 90,000 people and operates 112 country subsidiaries.


Business Situation

The Enabling Platform Experience group at Microsoft wanted to start migrating its larger websites to take advantage of the benefits of a cloud services environment, such as reliability and scalability.


Solution

Microsoft decided to move two of its largest websites, Microsoft TechNet and Microsoft Developer Network, from an entirely on-premises infrastructure to the Windows Azure cloud services environment.


Benefits

  • Maximized resources
  • Gained greater scalability
  • Maintained high performance
  • Reduced infrastructure and maintenance costs


Software and Services
  • Microsoft Azure
  • Microsoft SQL Azure
  • Microsoft System Center Operations Manager

Vertical Industries
Architecture, Engineering & Construction

Country/Region
United States

Languages
English

Partner(s)
Accenture
