As a leader in semiconductors, computer processors, and related technologies, AMD has a responsibility to serve its customers, keep pace with the industry, and help set the standard for how servers, computers, and embedded systems run. To maintain its execution track record, the IT team at AMD used Microsoft Azure high-performance computing (HPC), including HBv3 virtual machines, and other Azure resources to build scalable capacity and optimize the company’s cloud capabilities, accelerating time to market and eliminating weeks or even months of delay.
Customer zero for innovation
AMD has a never-ending appetite for more computing resources. Whether it’s developing a new motherboard chipset, an advanced microprocessor, or a more powerful graphics processor, the company—one of the global leaders in semiconductor technology—stretches the limits of its cloud and on-premises infrastructures in its quest for continuous innovation.
Unlike many traditional IT departments, AMD IT has the unique role of acting as customer zero for new AMD products. It deploys them into production environments and works closely with AMD engineering teams to test and refine each product, helping ensure a successful release to business and consumer markets.
“Our roadmap continues to have new products added to it,” says Philip Steinke, Fellow, CAD Infrastructure and Physical Design at AMD. “The complexity of the products also keeps increasing to deliver all the features customers are looking for.”
AMD IT uses cloud technology to gain flexibility and shorten ramp-up times when it needs to feed its hunger for computing resources and scale up to complete jobs. “For each new product generation, we need more and more compute power to implement the designs, make sure they behave as expected, and get them out the door to manufacturing,” says Steinke.
Peaks in demand and time-sensitive opportunities
To keep product design and verification running at maximum efficiency and to take on new projects, AMD IT recognized it would need more computing power and faster job turnaround. The team wanted the capacity to scale up HPC-configured virtual machines (VMs) to meet bursts of demand and then scale back down when the machines aren’t needed.
“Whatever the number of jobs we’re running, we have about 20 to 30 percent of that same number of jobs just waiting to be run,” says Rajiv Malhotra, Senior Director, IT at AMD. “For all practical purposes, we’re 100 percent utilized and don’t have a lot of flex on-premises. The impact of this lack of flexibility is a lost opportunity cost.”
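Malhotra’s 20 to 30 percent backlog translates directly into how much burst capacity would be needed. The following is a minimal, hypothetical sizing rule, not AMD’s actual scheduler: it assumes a fixed number of jobs per cloud VM (`jobs_per_vm`), a cap on the pool size (`max_vms`), and an illustrative count of 1,000 running jobs. None of these values come from AMD.

```python
import math


def burst_vm_count(queued_jobs: int, jobs_per_vm: int, max_vms: int) -> int:
    """Hypothetical burst-sizing rule: enough cloud VMs to drain the queue,
    capped at max_vms. Returns 0 when nothing is waiting, so the burst pool
    scales back down once on-premises capacity catches up."""
    if queued_jobs <= 0:
        return 0
    return min(math.ceil(queued_jobs / jobs_per_vm), max_vms)


running_jobs = 1000  # hypothetical: jobs currently filling on-premises capacity
for backlog_ratio in (0.20, 0.30):  # the 20 to 30 percent quoted above
    queued = round(running_jobs * backlog_ratio)
    vms = burst_vm_count(queued, jobs_per_vm=4, max_vms=500)
    print(f"backlog {backlog_ratio:.0%}: {queued} queued jobs -> {vms} burst VMs")
```

Returning zero for an empty queue captures the “scale back down” half of the requirement: capacity exists only while there is work waiting for it.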
To broaden its capabilities, bring flexibility to its legacy on-premises infrastructure, increase access to computing capacity, and eliminate time spent on procurement, AMD IT decided to run its electronic design automation (EDA) workloads with Azure HPC resources on Azure HBv3 VMs.
Capitalizing on capacity with Azure HBv3 VMs configured for HPC
With HBv3 VMs optimally configured for Azure HPC and powered by AMD’s own EPYC™ processors, AMD IT now has reliable node-to-node interconnectivity and can scale quickly to high core counts with excellent performance for EDA scenarios. Because HPC workloads can run on both on-premises and cloud infrastructure, AMD IT can quickly create burst capacity across its hybrid environment as the need arises. The company’s IT leaders consider this adaptability key to preserving capacity and reliability.
“When there was a demand surge prior to using Azure, we had to decide which project we wanted to disrupt least,” says Malhotra. “With Azure HPC, we quickly address variabilities in compute demand, which has a significant impact on our customers because we can show them how we will accommodate needs if they change.”
In a standard year-long project cycle, unanticipated tasks always come up and multiple projects are likely to overlap. With Azure, AMD IT has the flexibility to plan strategically for the machines and processes it will need at any given time, which helps the IT practice make a positive impact on the company’s bottom line.
“We use Azure HPC for a range of workload types, including workloads that run on very big machines and need big systems with lots of RAM—but they might only run for eight hours and only once every 24 hours,” Steinke says. “Now we can get resources when we need them and only pay for the capacity we use, without having to let machines sit idle.”
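The workload Steinke describes, a large-memory job that runs for eight hours once every 24 hours, shows why paying only for used capacity matters. As a simple illustration, assuming one machine dedicated to that single job and ignoring VM startup time and any pricing differences, the arithmetic looks like this (the function name is hypothetical):

```python
from typing import Tuple


def idle_profile(run_hours: float, period_hours: float = 24.0) -> Tuple[float, float]:
    """Return (utilization, idle_hours) for a machine dedicated to one
    periodic job that runs run_hours out of every period_hours."""
    if not 0 <= run_hours <= period_hours:
        raise ValueError("run_hours must be between 0 and period_hours")
    utilization = run_hours / period_hours
    idle_hours = period_hours - run_hours
    return utilization, idle_hours


utilization, idle_hours = idle_profile(8)
print(f"utilization {utilization:.0%}, idle {idle_hours:g} h per day")
# prints "utilization 33%, idle 16 h per day"
```

A dedicated machine would sit idle for roughly two-thirds of each day, whereas on-demand capacity only needs to be provisioned for the eight busy hours.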
Speeding up design cycle times and reducing time to market
Seizing the opportunity to customize its technology stack and, in turn, improve time to solution, AMD IT also built a robust engineering Unix environment around the AMD EPYC CPU-powered HBv3 VMs, using Azure HPC Cache and Azure NetApp Files to bring its workloads to the cloud. HPC Cache helps keep EDA jobs moving by caching data from AMD’s on-premises storage close to the compute in Azure, giving users faster response times. Similarly, the IT team uses Azure NetApp Files to provide high-performance storage for data that is generated in Azure and shared by multiple jobs needing fast read-write access. This helps keep CPU cores from sitting idle while they wait for data.
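The benefit of a file cache comes from a familiar pattern: repeated reads are served from storage near the compute, and only misses travel back to the original file servers. The class here is a generic, hypothetical read-through LRU cache written to illustrate that pattern; it is not the HPC Cache implementation or API, and the file paths in the usage lines are made up.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Generic LRU read-through cache: serves repeated reads locally and
    calls the slower backing store (fetch) only on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int):
        self.fetch = fetch          # hypothetical slow path, e.g. an on-premises file server
        self.capacity = capacity    # maximum number of cached files
        self.entries = OrderedDict()  # path -> bytes, ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self.entries[path]
        self.misses += 1
        data = self.fetch(path)  # round trip to the backing store
        self.entries[path] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return data


store = {"/proj/top.v": b"module top; endmodule", "/proj/lib.v": b"// library"}
cache = ReadThroughCache(fetch=store.__getitem__, capacity=1)
cache.read("/proj/top.v")  # miss: fetched from the backing store
cache.read("/proj/top.v")  # hit: served from the cache
print(cache.hits, cache.misses)  # prints "1 1"
```

When many jobs read the same design files, most reads become hits, which is what keeps cores busy instead of waiting on remote storage.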
“Our real benchmark here was that the work we wanted to do in the cloud had to be able to at least match what we achieve using our high-performance EPYC processor cores and file servers in our on-premises datacenter,” says Steinke. “We were able to meet and exceed that baseline performance using Azure.”
By combining these Azure resources, AMD IT gained the VM access and elastic computing capacity it needed to get through its product design cycles faster and accelerate time to market.
Flexibility and visibility enhanced by AI and machine learning
Since incorporating Azure into its production mix, AMD IT has benefited greatly from the added flexibility and wider range of resources that teams now use for planning and execution. Running reports can require up to 80 full servers used concurrently each day, a task that can slow dramatically when already limited resources come under added pressure.
“Before we brought in Azure HPC, teams may have had to stagger their report runs, doing 40 runs one day and 40 another day, and they only had visibility into all the details every two days,” Steinke explains. “Making use of the additional resources that Azure HPC provides, teams can run all their reports and get to view them on a daily basis.”
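The staggering Steinke describes follows from simple capacity arithmetic. This sketch assumes, hypothetically, that each report run occupies one full server for a day; the function name is illustrative.

```python
import math


def days_to_full_visibility(report_runs: int, servers_available: int) -> int:
    """Days needed to finish every report run, assuming (hypothetically)
    that each run occupies one full server for one day."""
    if servers_available <= 0:
        raise ValueError("servers_available must be positive")
    return math.ceil(report_runs / servers_available)


print(days_to_full_visibility(80, 40))  # staggered on 40 servers: prints 2
print(days_to_full_visibility(80, 80))  # with burst capacity for all 80: prints 1
```

Under this assumption, doubling the available servers for the reporting window halves the time until teams see the complete picture.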
AMD IT wants to focus on getting even more value out of the cloud by using advancements in metrics and analytics to help guarantee consistently strong project execution. “We want to use AI and machine learning deployed in Azure HPC to give us unique insights into how workflows consume the compute resources, how they run, and potentially how we can converge faster by getting deeper knowledge and forecasts,” says Malhotra.
AMD IT continues to hit and exceed its performance targets while further developing its relationship with Azure. “I think that our successful use of Azure HPC helps show that it’s a fully tested and proven solution,” says Steinke.