Microsoft Research Blog

Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Optimizing imperative functions in relational databases with Froid

August 27, 2018 | By Karthik Ramachandra, Senior Applied Scientist

For decades, databases have supported declarative SQL as well as imperative functions and procedures as ways for users to express data processing tasks. While the evaluation of declarative SQL has received a lot of attention resulting in highly sophisticated techniques, improvements in the efficient evaluation of imperative programs have remained elusive.

Imperative User-Defined Functions (UDFs) and procedures offer several benefits over SQL, including code modularity, reusability and readability. Because of these benefits, imperative programs often are preferred by users and are widely used. Unfortunately, their poor performance discourages and even prohibits their use in many situations. Recognizing the importance of this problem and how little attention it has received, a team of computer scientists at Microsoft’s Gray Systems Lab resolved to rethink the way imperative functions are evaluated in databases.

The outcome of their efforts is Froid, an extensible framework for optimizing imperative programs in relational databases. This work first appeared in a paper at PVLDB titled, “Froid: Optimization of Imperative Programs in a Relational Database” and is going to be presented at VLDB 2018. The purpose of Froid is to enable developers to use the abstraction of UDFs without compromising on performance. Froid introduces a radically different approach to evaluating imperative functions that results in performance improvements of up to multiple orders or magnitude over the existing state of the art.

The root cause of poor performance of UDFs can be attributed to what is known as the impedance mismatch between two distinct programming paradigms at play – the declarative paradigm of SQL and the imperative paradigm of procedural code. Reconciling this mismatch is crucial in order to address this problem and forms the crux of this work.

At the core of Froid is a novel technique that automatically can transform an entire multi-statement UDF into an equivalent relational algebraic expression. This form is now amenable to cost-based optimization and results in highly efficient, set-oriented, parallelizable execution strategies as opposed to inefficient, iterative, serial execution of UDFs.

Today’s database systems are very good at optimizing SQL queries – essentially relational algebraic expressions. One of the key advantages of Froid’s approach is that it leverages existing query optimization rules and techniques by transforming the imperative program into a form that the database query optimizer already understands. In other words, Froid reconciles the declarative-imperative impedance mismatch by transforming imperative programs into a declarative form.

Imperative programs typically go through a compiler that performs several transformations and optimizations. It can be argued that we could use well-known compiler optimization techniques to optimize UDFs as well. However, UDFs executing in a relational database present a different set of challenges and opportunities. For instance, relational operations such as JOINs and GROUP BYs could be implicit in a UDF, and a traditional compiler will not be able to infer such operations.

Froid’s approach not only infers such implicit relational operations but it also automatically brings the benefits of many of these compiler optimizations to UDFs. The relational algebraic transformations built into Froid can be used to arrive at the same result as that of applying compiler optimizations (such as dead code elimination, program slicing and constant folding) to imperative code.

Users prefer to write imperative functions and procedures in a language of their choice, the popular ones being T-SQL, C#, Python, R, Java, and so on. Although Froid’s current focus is on T-SQL UDFs, the underlying technique is language-agnostic; extending it to other imperative languages is quite straightforward.

Based on an experimental evaluation on several real workloads, we demonstrate that Froid can optimize a fairly large subset of UDFs in the wild and it can result in significant benefits in both improving performance and reducing resource (CPU and IO) utilization. Figure 1 shows the factor of improvement observed thanks to Froid on one of the customer workloads. This workload had 93 UDFs and Froid was able to optimize 86 of them. UDFs are plotted along the x-axis, ordered by the observed improvement with Froid (in descending order). The y-axis shows the factor of improvement (in log scale). You can see that the UDFs run about 5x to 1000x faster due to Froid. More details and evaluations can be found in our paper.

Figure 1 –An impressive optimization of UDFs thanks to Froid.

We believe this work will enable and encourage wider use of UDFs to build modular, reusable and maintainable applications without compromising on performance and we’re looking forward to discussing the possibilities with you in person in Rio de Janeiro at VLDB 2018!

Up Next

Black and white photo of Karthik Ramachandra

Data platforms and analytics

Froid and the relational database query quandary with Dr. Karthik Ramachandra

Episode 73, April 24, 2019 - In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. But many users prefer a mix of imperative programming, along with declarative SQL, because its user-defined functions (or UDFs) allow for good software engineering practices like modularity, readability and re-usability. Sadly, these benefits have traditionally come with a huge performance penalty, rendering them impractical in most situations. That bothered Dr. Karthik Ramachandra, a Senior Applied Scientist at Microsoft Research India, so he’s spent a great deal of his career working on improving an imperative complement to SQL in database systems. Today, Dr. Ramachandra gives us an overview of the historic trade-offs between declarative and imperative programming paradigms, tells us some fantastic stories, including The Tale of Two Engineers and The UDF Story, Parts 1 and 2, and introduces us to Froid – that’s F-R-O-I-D, not the Austrian psychoanalyst – which is an extensible, language-agnostic framework for optimizing imperative functions in databases, offering the benefits of UDFs without sacrificing performance.

Microsoft blog editor

Paul Bennett

Artificial intelligence, Data platforms and analytics, Systems and networking

Building contextually intelligent assistants with Dr. Paul Bennett

Episode 59, January 16, 2019 - Dr. Bennett brings us up to speed on the science of contextually intelligent assistants, explains how what we think our machines can do actually shapes what we expect them to do, and shares how current research in machine learning and data science is helping machines reason on our behalf in the quest to help us find the right information effortlessly.

Microsoft blog editor

Data platforms and analytics, Search and information retrieval

Getting LinkedIn to Data Science with Dr. Igor Perisic

Episode 11, February 7, 2018 - Big data is a big deal, and if you follow the popular technical press, you’ll have heard all the metaphors: data is the new oil, the new bacon, the new currency, the new electricity. It’s even been called the new black. While data may not actually be any of these things, we can say this: in today’s networked world, data is increasingly valuable, and it is essential to research, both basic and applied.

Microsoft blog editor