
Jonathan Goldstein

Principal Researcher


I am a principal researcher in the Database group at Microsoft, where I have worked for 15+ years, pursuing technology and impact interests in both research and product roles.



Established: September 19, 2013

Trill is a high-performance in-memory incremental analytics engine. It can handle both real-time and offline data, and is based on a temporal data and query model. Trill can be used as a streaming engine, a lightweight in-memory relational engine, and a progressive query processor (for early query results on partial data). You can learn more about Trill from the publications below, or from our slides (PDF | PPTX).


Established: September 12, 2013

Tempe is a web service for exploratory data analysis. Below are images of the notebook pages mentioned in our submission to ICSE 2014.

User Experience with Big Data

Established: May 24, 2012

Big data analytics requires new workflows: high latency queries, massively-parallel code, and cloud computing infrastructures all make handling a big dataset different (and harder) than working on a local machine. We are exploring user experiences for analysts, and thinking about new ways to deal with big datasets. BigDataUX: building a better user experience for Big Data. Lots of different definitions can be found for "big data," but they all have one aspect in common: big…


Established: November 21, 2011

In the streams research project, we propose novel architectures, efficient processing techniques, models, and applications to support time-oriented queries over real-time and offline data streams. Our current focus in the project centers on Trill, a high-performance streaming analytics engine that is now used across Microsoft. Focus areas include efficient query processing, scale-out, resiliency, streaming state management, and unstructured data support.











Microsoft CEP Server and Online Behavioral Targeting
Mohamed Ali, Ciprian Gerea, Balan S. Raman, Beysim Sezgin, Tiho Tarnavski, Tomer Verona, Ping Wang, Peter Zabback, Asvin Ananthanarayan, Anton Kirilov, Ming Lu, Alex Raizman, Ramkumar Krishnan, Roman Schindlauer, Torsten Grabs, Sharon Bjeletich, Badrish Chandramouli, Jonathan Goldstein, Sudin Bhat, Ying Li, Vincenzo Di Nicola, Xianfang Wang, David Maier, Stephan Grell, Olivier Nano, Ivo Santos, in International Conference on Very Large Data Bases (VLDB), Lyon, France, August 1, 2009






Leading Edge of the Cloud


May 1, 2014


David Gauthier, Victor Bahl, Albert Greenberg, and Jonathan Goldstein





For the past 8 years, those interests have centered on the confluence of near real-time data processing and big data. I’ve pursued them through the CEDR streaming research project, which defined the basic algebra and query processing algorithms for streaming queries; the StreamInsight data processing product, which made this technology available to Microsoft customers; and, most recently, Tempe, which focuses on the visualization of ad-hoc streaming queries, and Trill, which processes streaming and temporal queries at unprecedented levels of performance, resulting in a one-size-fits-all engine for tempo-relational analytics. Trill is widely used within Microsoft, including in the Azure Streaming Analytics Service and in Bing advertising.

This work has taken me on a fascinating journey, and has resulted in some unanticipated discoveries, such as Ping-Pong Patience Sort, and deterministic progressive analytics.

Before working on near real-time data processing and big data, I pursued interests related to SQL materialized views, SQL query optimization, high-speed data compression, and high-dimensional indexing.