The Trill Incremental Analytics Engine

MSR-TR-2014-54 |

This technical report introduces Trill – a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill’s throughput is high across the latency spectrum. For streaming data, Trill’s throughput is 2-4 orders of magnitude higher than today’s comparable streaming engines. For offline relational queries, Trill’s throughput is comparable to a major modern commercial columnar DBMS.

Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this technical report, we describe Trill’s new design and architecture, and report experimental results that demonstrate Trill’s high performance across diverse analytics scenarios. We also describe how Trill’s ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.