Trill Moves Big Data Faster, by Orders of Magnitude

Posted by George Thomas Jr.

Figure: Trill's throughput compared to SPE-X

In today’s high-productivity computing environments that process dizzying amounts of data each millisecond, a research project named for “a trillion events per day” may seem relatively ordinary.

But when you understand that Trill, a new high-performance streaming analytics engine developed by Microsoft researchers, can process data two to four orders of magnitude faster than today’s streaming engines, well, now you’re getting into “wow” territory, especially considering Trill is just a .NET library:

  • Because Trill is a single-node engine library, any .NET application, service, or platform can easily include it and start processing queries;
  • A temporal query language allows users to express complex queries over real-time and/or offline data sets; and,
  • Trill’s high performance across intended usage scenarios means users can get results significantly faster than before.
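
To make the second point concrete, here is a minimal sketch of one query definition running unchanged over stored data and over a live-style feed. It is written in Python for illustration only and is not Trill's API or query language; the event shape, the `tumbling_counts` function, and the `live_feed` stand-in are all hypothetical names invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in milliseconds, key).
Event = Tuple[int, str]


def tumbling_counts(
    events: Iterable[Event], window_ms: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts per key) for each non-empty window.

    Assumes events arrive in non-decreasing timestamp order.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_ms  # start of the window this event falls in
        if current_start is not None and start != current_start:
            # The previous window is complete, so emit it and start a new one.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)


# Offline data: a list that already sits in memory.
offline = [(0, "ad1"), (400, "ad2"), (900, "ad1"), (1200, "ad1"), (2500, "ad2")]


def live_feed() -> Iterator[Event]:
    """Stand-in for a real-time source; here it simply replays the same events."""
    yield from offline


offline_result = list(tumbling_counts(offline, 1000))
live_result = list(tumbling_counts(live_feed(), 1000))
```

Because the query consumes any ordered sequence of events, the same definition serves both cases, and the two results are identical here.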

“Prior systems have only achieved subsets of these benefits, but Trill provides all of these advantages in one package, so to speak,” says Badrish Chandramouli, one of the Microsoft researchers who developed Trill.

Its secret? Trill incorporates new techniques and algorithms that process events in batches, with the data within those batches organized in new ways that enable queries to execute much more efficiently than before. To users, though, it is the same as working with any .NET library, with no need to leave the .NET environment.
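
The general idea of batched processing can be sketched as follows. This is an illustration in Python, not Trill's implementation or API. The article does not say how data inside a batch is organized, so the column-by-column layout below is an assumption, and `Batch`, `to_batches`, and `total_clicks_for_user` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Batch:
    """A batch of events stored column by column (an assumed layout)."""
    timestamps: List[int]
    user_ids: List[int]
    clicks: List[int]


def to_batches(rows: Iterable[Tuple[int, int, int]], batch_size: int) -> Iterator[Batch]:
    """Group (timestamp, user_id, clicks) rows into columnar batches."""
    ts: List[int] = []
    users: List[int] = []
    clicks: List[int] = []
    for t, u, c in rows:
        ts.append(t)
        users.append(u)
        clicks.append(c)
        if len(ts) == batch_size:
            yield Batch(ts, users, clicks)
            ts, users, clicks = [], [], []
    if ts:
        # Emit the final, possibly smaller, batch.
        yield Batch(ts, users, clicks)


def total_clicks_for_user(batches: Iterable[Batch], user_id: int) -> int:
    """Sum clicks for one user, reading only the two columns the query needs."""
    total = 0
    for batch in batches:
        total += sum(c for u, c in zip(batch.user_ids, batch.clicks) if u == user_id)
    return total


rows = [(1, 7, 2), (2, 8, 1), (3, 7, 5), (4, 9, 4), (5, 7, 1)]
batches = list(to_batches(rows, batch_size=2))
result = total_clicks_for_user(batches, user_id=7)
```

In this sketch the query never touches the `timestamps` column, and the per-event overhead of handing work from one operator to the next is paid once per batch rather than once per event.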

Bing Ads customers, in fact, are already enjoying the paradigm shift, seeing results within an hour of launching Bing Ads campaigns.

And it doesn’t end there.

“While it can be integrated into today’s distribution fabrics such as SCOPE (in Bing Ads) and Orleans (in Halo) to achieve scale-out, we are currently looking at developing new techniques to achieve even better performance in distributed computing and Internet-of-Things scenarios,” Chandramouli says.

Trill was started in early 2012 by Chandramouli and fellow researcher Jonathan Goldstein and is detailed in “Trill: A High-Performance Incremental Query Processor for Diverse Analytics” (1.5 MB .pdf). Its roots can be traced to earlier research on the Complex Event Detection and Response (CEDR) algebra, dating back to 2007 and published in “Consistent Streaming Through Time: A Vision for Event Stream Processing” (660 KB .pdf). In the interim, a subsequent paper that introduced the idea of using a single language and engine to handle real-time and offline datasets, “Temporal Analytics on Big Data for Web Advertising,” won the Best Paper award at ICDE 2012.

“From CEDR to Trill to multiple Microsoft products: This body of work is a great example of how within Microsoft Research we evolve from science to technology to business impact,” says Jeannette Wing, Corporate Vice President, Microsoft Research. “It also shows the nature and value of long-term research, where patience and persistence really pay off.”

While not directly available to the public, Trill is also being used elsewhere at Microsoft as a query processor within the Azure Stream Analytics service, currently in public preview. Additional collaborators on Trill include Mike Barnett, Rob DeLine, Danyel Fisher, John Platt, James Terwilliger, and John Wernsing.
