Distributed actor systems are widely used for developing interactive scalable cloud services, such as social networks and on-line games. By modeling an application as a dynamic set of lightweight communicating “actors”, developers can easily build complex distributed applications, while the underlying runtime system deals with low-level complexities of a distributed environment.
We present ActOp — a data-driven, application-independent runtime mechanism for optimizing end-to-end service latency of actor-based distributed applications. ActOp targets the two dominant factors affecting latency: the overhead of remote inter-actor communications across servers, and the intra-server queuing delay. ActOp automatically identifies frequently communicating actors and migrates them to the same server transparently to the running application. The migration decisions are driven by a novel scalable distributed graph partitioning algorithm which does not rely on a single server to store the whole communication graph, thereby enabling efficient actor placement even for applications with rapidly changing graphs (e.g., chat services). Further, each server autonomously reduces the queuing delay by learning an internal queuing model and configuring threads according to instantaneous request rate and application demands.
We prototype ActOp by integrating it with Orleans – a popular open-source actor system [4, 13]. Experiments with realistic workloads show latency improvements of up to 75% for the 99th percentile, up to 63% for the mean, with up to 2x increase in peak system throughput.