Abstract

Our cloud services are losing too many battles to faults like software bugs, resource interference, and hardware failures. Many tools can help us win these battles: model checking to verify, fault injection to find bugs, replay to debug, and many more. Unfortunately, tools are currently afterthoughts in cloud service design: they must either be tediously tangled into service implementations or integrated transparently in ways that fail to effectively capture the service’s problematic non-deterministic (concurrent, asynchronous, and resource access) behavior.

This paper makes tooling a first-class concern by encoding services with tasks whose interactions reliably capture all of the non-deterministic behavior that tools need. Task interactions are then exposed through aspects, which are well suited to encoding cross-cutting behavior. Combined, the two allow tools encoded as task aspects to integrate with services both effectively and transparently. We show how task aspects ease the development of an online production data service that runs on a hundred machines.