How Microsoft is modernizing its internal network using automation

Dec 11, 2019   |  

After Microsoft moved its workload of 60,000 on-premises servers to Microsoft Azure, employees could set up systems and virtual machines (VMs) with a push of a few buttons.

Although network hardware servers have changed over time, the way that network engineers work isn’t nearly as modern.

“With computers, we have modernized our processes to follow DevOps processes,” says Bart Dworak, a software engineering manager on the Network Automation Delivery Team in Core Services Engineering and Operations (CSEO). “For the most part, those processes did not exist with networking.”

Two years ago, Dworak says, network engineers still created and ran command-line-based scripts and created configuration change reports.

“We would sign into network devices and submit changes using the command line,” Dworak says. “In other, more modern systems, the cloud provides desired-state configurations. We should be able to do the same thing with networks.”

It became clear that Microsoft needed modern technology for configuring and managing the network, especially as the number of managed network devices increased on Microsoft’s corporate network. This increase occurred because of higher network utilization by users, applications, and devices as well as more complex configurations.

“When I started at Microsoft in 2015, our network supported 13,000 managed devices,” Dworak says. “Now, we surpassed 17,000. We’re adding more devices because our users want more bandwidth as they move to the cloud so they can do more things on the network.”

[To learn more, read about how Microsoft is using a software-defined strategy to secure its network, and how it is using Azure ExpressRoute hybrid technology to secure the company.]

Dworak and the Network Automation Delivery Team saw an opportunity to fill a gap in the company’s legacy network-management toolkit. They decided to apply the concept of infrastructure as code to the domain of networking.

“Network as code provides a means to automate network device configuration and transform our culture,” says Steve Kern, a CSEO senior program manager and leader of the Network Automation Delivery Team.

The members of the Network Automation Delivery Team knew that implementing the concept of network as code would take time, but they had a clear vision.

“If you’ve worked in a networking organization, change can seem like your enemy,” Kern says. “We wanted to make sure changes were controlled and we had a routine, peer-reviewed rhythm of business that accounted for the changes that were pushed out to devices.”

The team has applied the concept of network as code to automate processes like changing the credentials on more than 17,000 devices at Microsoft, which now occurs in days rather than weeks. The team is also looking into regular telemetry data streaming, which would inform asset and configuration management.

“We want network devices to stream data to us, rather than us collecting data from them,” Dworak says. “That way, we can gain a better understanding of our network with a higher granularity than what is available today.”

The Network Automation Delivery Team has been working on the automation process since 2017. To do this, the team members built a Git repository and started with simple automation to gain momentum. Then, they identified other opportunities to apply the concept of GitOps—a set of practices for deployment, management, and monitoring—to deliver network services to Microsoft employees.

Implementing network as code has led to an estimated savings of 15 years of labor and vendor spending on deployments and network devices changes. As network technology shifts, so does the role of network engineers.

“We’re freeing up network engineers so they can build better, faster, and more reliable networks,” Kern says. “Our aspiration is that network engineers will become network developers who write the code. Many of them are doing that already.”

Additionally, the team is automating how it troubleshoots and responds to outages. If the company’s network event system detects that a wireless access point (AP) is down, it will automatically conduct diagnostics and attempt to address the AP network outage.

“The building AP is restored to service in less time than it would take to wake up a network engineer in the middle of the night, sign in, and troubleshoot and remediate the problem,” Kern says.

Network as code also applies a DevOps mentality to network domain by applying software development and business operations practices to iterate quickly.

“We wanted to bring DevOps principles from the industry and ensure that development and operations teams were one and the same,” Kern says. “If you build something, you own it.”

In the future, the network team hopes to create interfaces for each piece of network gear and have application developers interact with the API during the build process. This would enable the team to run consistent deployments and configurations by restoring a network device entirely from a source-code repository.

Dworak believes that network as code will enable transformation to occur across the company.

“Digital transformation is like remodeling a house. You can remodel your kitchen, living room, and other parts of your house, but first you have to have a solid foundation,” he says. “Your network is part of the foundation—transforming networking will allow others to transform faster.”

To learn more, read about how Microsoft is using a software-defined strategy to secure its network, and how it is using Azure ExpressRoute hybrid technology to secure the company.

Tags: , , , , , , , ,