Techniques for Minimizing the Number of Roles

When architecting a cloud service, one of the main benefits we try to exploit is the ability to scale more or less without limit. To do this, we may decompose existing EXEs and DLLs into parts that scale independently (as separate web, worker, or VM roles), thus capturing the economies of scale that the platform offers.

While this practice produces optimum cost at scale, it can increase costs dramatically when getting started, because every additional role means at least one more running instance to pay for. During an ADS (Architecture Design Session) where I know that a company’s service is just getting started and has few customers, I will point out these opportunities to decompose into separate roles downstream, but recommend that the parts be kept together until scale is required.
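To make the startup-cost argument concrete, here is a rough sketch of the arithmetic in Python. The helper `monthly_cost_cents` is hypothetical, and the rate of 10 cents per instance-hour and the 730-hour month are placeholder assumptions, not actual platform prices.

```python
def monthly_cost_cents(role_count, instances_per_role, cents_per_instance_hour,
                       hours_per_month=730):
    """Rough monthly bill, in cents, for a service split into `role_count` roles.

    All inputs are illustrative assumptions; plug in real prices to use this.
    """
    return role_count * instances_per_role * cents_per_instance_hour * hours_per_month


# A service decomposed into four roles, one instance each (placeholder rate).
decomposed = monthly_cost_cents(role_count=4, instances_per_role=1,
                                cents_per_instance_hour=10)

# The same service with everything combined into a single role.
combined = monthly_cost_cents(role_count=1, instances_per_role=1,
                              cents_per_instance_hour=10)

# With few customers, the bill scales with the number of roles, not with load.
cost_ratio = decomposed / combined
```

Whatever the real rate is, the ratio between the two bills is simply the number of roles while each role runs at its minimum instance count.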

Similarly, if an overall service is composed of modules that are normally deployed separately, they can be combined into roles to contain costs, but this must be done carefully. For example, suppose a Windows service is to be combined with web services on a web role. Installing the Windows service is no problem, using a startup command file. However, you must ensure that the health of the Windows service is monitored and that health information is passed to the fabric controller if the service fails, so that remedial action can be taken. When a web role or worker role is created, an agent is installed as part of the role that is responsible for communicating health to the fabric controller, but your Windows service has no such agent.
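One way to close that gap is a small watchdog that runs inside the role and checks on the Windows service. The sketch below is a minimal illustration in Python, not the platform's API: `probe` is a hypothetical callable that returns True when the service looks healthy, and `report_unhealthy` is a hypothetical hook standing in for whatever mechanism your role uses to surface the failure so the fabric controller can act on it.

```python
import time
from typing import Callable, Optional


def watch_service(probe: Callable[[], bool],
                  report_unhealthy: Callable[[str], None],
                  max_failures: int = 3,
                  interval_seconds: float = 5.0,
                  max_checks: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it fails `max_failures` times in a row.

    Returns False after reporting a failure, or True if `max_checks`
    checks complete without tripping the threshold.
    """
    consecutive_failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        try:
            healthy = probe()
        except Exception:
            # A probe that throws is treated the same as an unhealthy result.
            healthy = False
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= max_failures:
            report_unhealthy(
                f"service failed {consecutive_failures} consecutive health checks")
            return False
        sleep(interval_seconds)
    return True
```

Requiring several consecutive failures before reporting is a design choice that avoids reacting to a single transient hiccup; tune the threshold and interval to how quickly you need remediation.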

Similar comments apply to combining other types of processes (command line, DLL, etc.) into single roles. For example, instead of building a worker role, build a command line program and include the EXE in the project files for your web role. Use a startup command file to launch the command line program. It can poll a Windows Azure Storage Queue or similar mechanism to get tasks to work on. Be aware that combining code like this will likely reduce overall system throughput, since the web code and the background work now compete for the same instance. Later, when it’s time to scale up, move the code into a worker role so it can scale independently of the web role. But remember to build some kind of “health check” mechanism between your command line program (or other combined process) and the web/worker role’s main code.
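The polling loop and the health check can be sketched together. The Python below is only an illustration of the shape: an in-memory `queue.Queue` stands in for the Windows Azure Storage Queue, and `Heartbeat`, `poll_once`, and `run_worker` are hypothetical names. The idea is that the background program records a heartbeat on every pass, and the role's main code treats a stale heartbeat as a failure.

```python
import queue
import time


class Heartbeat:
    """Tracks when the background loop last proved it was alive."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.last_beat = clock()

    def beat(self):
        self.last_beat = self._clock()

    def is_stale(self, max_age_seconds):
        # The role's main code would call this to decide whether to raise an alarm.
        return self._clock() - self.last_beat > max_age_seconds


def poll_once(tasks, handle, heartbeat):
    """Record a heartbeat, then process at most one task.

    Returns True if a task was handled, False if the queue was empty.
    """
    heartbeat.beat()
    try:
        task = tasks.get_nowait()
    except queue.Empty:
        return False
    handle(task)
    return True


def run_worker(tasks, handle, heartbeat, idle_sleep_seconds=1.0,
               should_stop=lambda: False, sleep=time.sleep):
    """Poll until `should_stop` returns True, backing off when idle."""
    while not should_stop():
        if not poll_once(tasks, handle, heartbeat):
            sleep(idle_sleep_seconds)
```

Because the task source and the handler are passed in, the same loop body can move into a worker role later with little change when it is time to scale the two parts independently.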

When combining modules into a role, and if you have a choice in the matter, combine modules that depend on different resources. For example, combine one module that relies mainly on CPU with another that is more storage or network bound. If you combine modules that compete for the same resource, you’ll probably need to scale that role more frequently than if the two modules were kept separate, thus eliminating the savings you were hoping for.
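As a rough illustration of this pairing rule, the Python sketch below groups modules by their dominant resource and greedily pairs modules from different groups. The function `pair_complementary` and the module names are hypothetical, and it assumes you have already profiled each module well enough to name its main bottleneck.

```python
def pair_complementary(modules):
    """Pair modules whose dominant resources differ.

    `modules` maps a module name to its dominant resource, e.g. "cpu".
    Returns (pairs, leftovers), where leftovers could not be paired
    without putting two modules with the same bottleneck together.
    """
    groups = {}
    for name, resource in modules.items():
        groups.setdefault(resource, []).append(name)

    pairs = []
    while True:
        # Draw from the two largest groups so no single resource piles up.
        nonempty = sorted((g for g in groups.values() if g), key=len, reverse=True)
        if len(nonempty) < 2:
            break
        pairs.append((nonempty[0].pop(0), nonempty[1].pop(0)))

    leftovers = [name for group in groups.values() for name in group]
    return pairs, leftovers


# Hypothetical modules and their main bottlenecks.
pairs, leftovers = pair_complementary({
    "thumbnailer": "cpu",
    "uploader": "network",
    "indexer": "cpu",
})
```

Here the CPU-bound thumbnailer shares a role with the network-bound uploader, while the second CPU-bound module is left to run on its own rather than doubling up on the same resource.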

Besides combining different functions into a single role instance, you can minimize instances by simply reducing the instance count of a role to 1 and accepting the fact that your instance will be taken offline and patched once per month. You won’t be notified when this will happen, and it’s just as likely it’ll happen during business hours as not.