Microsoft Research Blog

Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Optimizing imperative functions in relational databases with Froid

August 27, 2018 | By Karthik Ramachandra, Senior Applied Scientist

For decades, databases have supported declarative SQL as well as imperative functions and procedures as ways for users to express data processing tasks. While the evaluation of declarative SQL has received a lot of attention resulting in highly sophisticated techniques, improvements in the efficient evaluation of imperative programs have remained elusive.

Imperative User-Defined Functions (UDFs) and procedures offer several benefits over SQL, including code modularity, reusability and readability. Because of these benefits, imperative programs often are preferred by users and are widely used. Unfortunately, their poor performance discourages and even prohibits their use in many situations. Recognizing the importance of this problem and how little attention it has received, a team of computer scientists at Microsoft’s Gray Systems Lab resolved to rethink the way imperative functions are evaluated in databases.

The outcome of their efforts is Froid, an extensible framework for optimizing imperative programs in relational databases. This work first appeared in a paper at PVLDB titled, “Froid: Optimization of Imperative Programs in a Relational Database” and is going to be presented at VLDB 2018. The purpose of Froid is to enable developers to use the abstraction of UDFs without compromising on performance. Froid introduces a radically different approach to evaluating imperative functions that results in performance improvements of up to multiple orders or magnitude over the existing state of the art.

The root cause of poor performance of UDFs can be attributed to what is known as the impedance mismatch between two distinct programming paradigms at play – the declarative paradigm of SQL and the imperative paradigm of procedural code. Reconciling this mismatch is crucial in order to address this problem and forms the crux of this work.

At the core of Froid is a novel technique that automatically can transform an entire multi-statement UDF into an equivalent relational algebraic expression. This form is now amenable to cost-based optimization and results in highly efficient, set-oriented, parallelizable execution strategies as opposed to inefficient, iterative, serial execution of UDFs.

Today’s database systems are very good at optimizing SQL queries – essentially relational algebraic expressions. One of the key advantages of Froid’s approach is that it leverages existing query optimization rules and techniques by transforming the imperative program into a form that the database query optimizer already understands. In other words, Froid reconciles the declarative-imperative impedance mismatch by transforming imperative programs into a declarative form.

Imperative programs typically go through a compiler that performs several transformations and optimizations. It can be argued that we could use well-known compiler optimization techniques to optimize UDFs as well. However, UDFs executing in a relational database present a different set of challenges and opportunities. For instance, relational operations such as JOINs and GROUP BYs could be implicit in a UDF, and a traditional compiler will not be able to infer such operations.

Froid’s approach not only infers such implicit relational operations but it also automatically brings the benefits of many of these compiler optimizations to UDFs. The relational algebraic transformations built into Froid can be used to arrive at the same result as that of applying compiler optimizations (such as dead code elimination, program slicing and constant folding) to imperative code.

Users prefer to write imperative functions and procedures in a language of their choice, the popular ones being T-SQL, C#, Python, R, Java, and so on. Although Froid’s current focus is on T-SQL UDFs, the underlying technique is language-agnostic; extending it to other imperative languages is quite straightforward.

Based on an experimental evaluation on several real workloads, we demonstrate that Froid can optimize a fairly large subset of UDFs in the wild and it can result in significant benefits in both improving performance and reducing resource (CPU and IO) utilization. Figure 1 shows the factor of improvement observed thanks to Froid on one of the customer workloads. This workload had 93 UDFs and Froid was able to optimize 86 of them. UDFs are plotted along the x-axis, ordered by the observed improvement with Froid (in descending order). The y-axis shows the factor of improvement (in log scale). You can see that the UDFs run about 5x to 1000x faster due to Froid. More details and evaluations can be found in our paper.

Figure 1 –An impressive optimization of UDFs thanks to Froid.

We believe this work will enable and encourage wider use of UDFs to build modular, reusable and maintainable applications without compromising on performance and we’re looking forward to discussing the possibilities with you in person in Rio de Janeiro at VLDB 2018!

Up Next

Data management, analysis and visualization

Microsoft Unveils FASTER – a key-value store for large state management

At SIGMOD 2018, a team from Microsoft Research will be presenting a new embedded key-value store called FASTER, described in their paper “FASTER: A Concurrent Key-Value Store with In-Place Updates”. As its name suggests, FASTER makes a major leap forward in terms of supporting fast and frequent lookups and updates of large amounts of state […]

Microsoft blog editor

Data Science education at UC Berkeley

Data management, analysis and visualization

A new understanding of the world through grassroots Data Science education at UC Berkeley

By Vani Mandava, Director, Data Science, Microsoft Research While some may regard data science as an easy passport to a job for the tech savvy, Luis Macias has different ideas. The fourth-year undergraduate student, who is majoring in American Studies at University of California, Berkeley (UC Berkeley), wants to turn the hype of data science […]

Microsoft blog editor

Data management, analysis and visualization

Project PrivTree: Blurring your “where” for location privacy

By Winnie Cui, Senior Research Manager, Microsoft Research Asia Data scientist, Anthony Tockar, used publicly available location data to show how celebrities can be tracked throughout New York City, while working on his Master’s Degree at Northwestern University. By cross-referencing public news and photos about celebrities hailing cabs in NYC, Tockar found out exactly where […]

Microsoft blog editor