My excellent cloud adventure: surviving the murky depths of change

Apr 5, 2017

When I’m not doing IT, I like to go SCUBA diving. And for the last four years I’ve been diving a closed-circuit rebreather. These are special systems that recirculate the air you breathe during a dive, removing the carbon dioxide, adding oxygen, and using helium and nitrogen mixtures to balance out inert gases.

Rebreather diving is transforming the SCUBA experience. It’s far more efficient, and it opens up new frontiers in deep diving. Done wrong, of course, it can be catastrophic. It is, in other words, a nice parallel to my experience helping Microsoft IT move Microsoft to the cloud.

The cloud transition, like the move to rebreathers, comes with a certain set of pressures that cannot be ignored. In my case, the pressure arrived just over a year ago. That’s when my CIO, Jim DuBois, called me in and told me that my team was not making the kind of progress with the cloud he wanted. And then he handed me our Finance portfolio.

To frame this: The Finance portfolio drives the company’s big financial engines. It’s our biggest portfolio, with the most entrenched legacy apps, and the most risk. Jim’s thinking was, “I’ll give the hardest portfolio to the cloud guy because if he can do it, no one else can claim that theirs is too hard.”

Then, as that was sinking in, Jim asked us, “So, when will we be all-up 90 percent in the cloud?” We did some thumbnail calculations and told him it would be midway through fiscal year 2019. Jim nodded, smiled, and said, “So we’ll be there by the end of FY 2017, right?” And we took a breath and said, “Yes we will!”

Adding to the fun, Jim had also handed down an annual cloud budget that was significantly less than what we thought we needed. Jim not only made it clear we’d have to meet our deadline with a smaller budget, he decentralized the budget so that every engineering leader, including me, was responsible for our piece of it.

So. Take on migration of the Finance portfolio, cut a year and a half off our projections, and restrict our budget, just like that. No pressure.

An interesting thing about SCUBA rebreathers is that you need to recast your view of diving. Rebreathers are more complex to use than open-circuit SCUBA systems, and they have more potential points of failure, but you get to do incredible new things. Along with that, you also have more ways to self-rescue. Being safe means paying great attention to the systems and “overlearning” the practical skills of operation and fault recovery. Likewise, the move to the cloud changes what you watch on a day-to-day basis, how you build and operate, and how you think about your work day, and it gives you more knobs to turn to make things different and better.

I start my morning with a series of dashboards, which give me a real-time view of what’s happening across our environment. The first one I look at is my cloud movement: how many on-premises virtual machines (VMs) I still have left, how fast we are moving them to Azure, how many have missed their targets, and what the rate is for the next month. I then look at my subscription count to see if we’re paring it down to our targets, and I check whether the subscriptions are being kept ‘healthy’, i.e. in compliance. I also look at the number of VMs moving to the new ARM-based subscriptions, which are more effective for automation. And then I look at how we’re doing on patch compliance for our IaaS footprint.
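For readers who want to picture the arithmetic behind a cloud movement view, here is a minimal sketch in Python. It is not our actual dashboard: the function names and the VM counts in the example are hypothetical, and it assumes a constant monthly migration rate, which real migrations rarely have.

```python
import math


def percent_migrated(total_vms: int, remaining_on_prem: int) -> float:
    """Share of VMs already moved to the cloud, as a percentage."""
    if total_vms <= 0:
        raise ValueError("total_vms must be positive")
    return 100.0 * (total_vms - remaining_on_prem) / total_vms


def months_to_finish(remaining_on_prem: int, vms_moved_per_month: int) -> int:
    """Whole months needed to move the remaining VMs at a constant rate (an assumption)."""
    if vms_moved_per_month <= 0:
        raise ValueError("vms_moved_per_month must be positive")
    return math.ceil(remaining_on_prem / vms_moved_per_month)


# Hypothetical numbers, for illustration only.
print(percent_migrated(total_vms=1000, remaining_on_prem=300))
print(months_to_finish(remaining_on_prem=250, vms_moved_per_month=100))
```

Rounding up with `math.ceil` is a deliberate choice: a partial month of work still occupies a month on the calendar.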

Finally, I look at my Azure resource optimization dashboard, which shows me what my current Azure bill is and what might be over-provisioned in my space. Recently, I saw that we were over-provisioned in SQL, and it only took a few minutes to realize that if my engineers just picked smaller SKUs, we could save $30,000 a month. That’s just one click into the portal for my engineers: “Resize.” Done.
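The kind of check behind that over-provisioning finding can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic of the dashboard itself: the `Database` record, the 40 percent utilization threshold, and the assumption that the next SKU down costs half as much are all hypothetical, as are the sample costs.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    monthly_cost: float      # current monthly cost in dollars
    peak_utilization: float  # peak load as a fraction of provisioned capacity (0.0 to 1.0)


def resize_savings(db: Database,
                   threshold: float = 0.4,
                   smaller_sku_cost_ratio: float = 0.5) -> float:
    """Estimated monthly savings from moving one SKU down.

    Assumes (hypothetically) that a database peaking below `threshold`
    fits the next smaller SKU, which costs `smaller_sku_cost_ratio`
    times the current one. Returns 0.0 when no resize is suggested.
    """
    if db.peak_utilization < threshold:
        return db.monthly_cost * (1.0 - smaller_sku_cost_ratio)
    return 0.0


def total_resize_savings(dbs: list[Database]) -> float:
    """Sum the estimated savings across all resize candidates."""
    return sum(resize_savings(db) for db in dbs)


# Made-up inventory, for illustration only.
inventory = [
    Database("ledger", monthly_cost=1000.0, peak_utilization=0.20),
    Database("reporting", monthly_cost=2000.0, peak_utilization=0.80),
    Database("archive", monthly_cost=500.0, peak_utilization=0.39),
]
print(total_resize_savings(inventory))
```

In practice the threshold and the price ratio would come from real utilization history and the published price list, not from constants like these.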

New practices and disciplines like these are enabling us to make real progress. In the last 13 months on the Finance portfolio, we’ve been able to move 70 percent of it to Azure. And by the time we hit June we’ll be at over 90 percent. We’re decomposing some major properties, including one that is currently sitting on the biggest hardware known to humanity (two 72-core, two-terabyte machines), onto Azure PaaS, using various technologies including Scala, Spark, and HDInsight for a lot of the work.

When you make a big change, what you value changes, and so does how you value it. We used to value the efficiency of data centers. Now we care about running our business as effectively as possible with the lowest cloud bill possible. Every quarter, my team is accountable for a set of metrics that define how efficiently we’re running. We’re trying to target less than 3 percent waste.
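One simple way to express a waste target like that, assuming waste is defined as the share of the cloud bill spent on idle or over-provisioned resources (my working definition here, not an official formula), is sketched below. The function names and dollar figures are hypothetical.

```python
WASTE_TARGET_PERCENT = 3.0  # the "less than 3 percent" goal from the text


def waste_percent(total_bill: float, wasted_spend: float) -> float:
    """Wasted spend as a percentage of the total cloud bill."""
    if total_bill <= 0:
        raise ValueError("total_bill must be positive")
    return 100.0 * wasted_spend / total_bill


def meets_target(total_bill: float, wasted_spend: float) -> bool:
    """True when waste is strictly below the target percentage."""
    return waste_percent(total_bill, wasted_spend) < WASTE_TARGET_PERCENT


# Made-up quarterly figures, for illustration only.
print(waste_percent(total_bill=100_000.0, wasted_spend=2_500.0))
print(meets_target(total_bill=100_000.0, wasted_spend=2_500.0))
```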

The beauty of this is that by learning what it takes for a real enterprise to transform to the cloud, we’re able to teach Microsoft product teams and, in many cases, directly influence the course of product development.

The Azure team has embraced this way of thinking so much that we have a standing monthly meeting with Jason Zander, their engineering vice president. He listens when we call out gaps, and his leaders work with us on fixes, improvements, and new services for customers. Since the start of the year, they’ve created two new services based on our needs, and we’re working with them to add those to the product. This is a tremendous shift, and it’s smart. Jason has told us that his group listens to us because we tell him what big customers would be telling him nine months later. He says that Microsoft IT gives him the advantage of being able to solve those challenges before customers have to tell him about them.

Hard as it is to move stuff to the cloud, it’s even harder to change the bad behaviors that linger once you’ve decided to make the journey. Changing how people operate and how they think requires a bit of magic, and that’s what I’ll cover in my next blog. Till then, dive safely!

