Microsoft Research Blog

Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Deep Learning for Program Synthesis

April 21, 2017 | By Microsoft blog editor

By Rishabh Singh, Jacob Devlin, Abdelrahman Mohamed, and Pushmeet Kohli, Microsoft Research

Despite the many advances in computing over the past decades, the actual process of writing computer software has not fundamentally changed — a programmer must manually code the exact algorithmic logic of a program in a step-by-step manner using a specialized programming language. Although programming languages have become much more user-friendly over the years, learning how to program is still a major endeavor that most computer users have not undertaken.

In a recent paper, we report our latest work in deep learning for program synthesis, where deep neural networks learn how to generate computer programs based on a user’s intent. The user simply provides a few input/output (I/O) examples to specify the desired program behavior, and the system uses these to generate a corresponding program.

For example, imagine that a user has a list of names that she wishes to format in a specific way, as shown below. She provides a few input-output examples, and the system auto-fills the remaining outputs (shown in light gray). In cases where users have hundreds or thousands of input strings, this can be a major time saver.

input/output string for Program Synthesis

The system performs this task by generating a program in a domain specific language (DSL). The user does not have to understand the details of the DSL, and in fact she generally does not see the program at all. In our DSL, the correct program corresponding to the above example is:

Concat(
   ToCase(
       GetToken(
           input,
           Type=Word,
           Index=-1),
       Type=Proper),
   Const(", "),
   ToCase(
       SubString(
         GetToken(
             input,
             Type=Word,
             Index=1),
         Start=0,
         End=1),
       Type=Proper),
   Const("."))

There are two key challenges in program synthesis. First, there are trillions of possible programs in our expressive DSL, and the correct program has likely never been seen before by the system. Second, because the I/O examples are hand-generated by a human, they often contain noise (e.g., typos), as in the example above (Notice Useato instead of Uesato in the second example output).

Previous approaches to solving this problem — most notably the FlashFill system in Microsoft Excel — have relied on hand-crafted rules and heuristics to guide the search for the desired program. However, this approach makes it very difficult to extend the capabilities of the DSL, requires years of manual effort, and is also extremely sensitive to any noise in the I/O examples.

Our system, called RobustFill, leverages recent advances in deep learning to take a data-driven approach to program synthesis without the need for any hand-crafted rules. Instead, it uses an attentional sequence-to-sequence neural network, first pioneered for use in language translation, to generate the program based on the I/O examples. A sketch of our network is shown below, and the details are provided.

RobustFill for Program Synthesis

We trained our system on millions of randomly generated I/O + program pairs, but determined it could perform well on real-world data, because it could learn the semantics of the DSL. Overall, our system achieved 92 percent accuracy on a set of real-world benchmarks.  What’s especially encouraging is the system is able to maintain its high accuracy even when the I/O examples contain significant noise.

Implications for Programming

The ability to successfully train neural architectures for learning to program in a rich functional language (FlashFill DSL) not only marks an exciting breakthrough in neural program synthesis, but also is a small but noteworthy step towards achieving more general artificial intelligence. It addresses the key challenges of incorporating interpretability and also touches on the important topic of connecting distributed representations with symbolic representations of knowledge.

We are now working on extensions of these architectures for learning programs in DSLs with stateful variables and control flow to generate richer program classes.  We believe work along with direction will require us to study and tackle the fundamental technical issues involved in program synthesis and induction.

More information can be found in the papers below:

RobustFill: Neural Program Learning under Noisy I/O” by Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdelrahman Mohamed, and Pushmeet Kohli, in arXiv:1703.07469

Neuro-Symbolic Program Synthesis” by Emilio Parisotto, Abdelrahman Mohamed, Rishabh Singh, Lihong Li, Denny Zhou, and Pushmeet Kohli, in 5th International Conference on Learning Representations (ICLR 2017)

Contributing interns: Emilio Parisotto (CMU), Jonathan Uesato (MIT), Surya Bhuptairaju (MIT)

Up Next

Artificial intelligence, Programming languages and software engineering

Neural-Guided Deductive Search: A best of both worlds approach to program synthesis

Program synthesis — automatically generating a program that satisfies a given specification — is a major challenge in AI. In addition to changing the way we design software, it has the potential to revolutionize task automation. End users without programming skills can easily provide input-output examples of the desired program behavior. The Flash Fill feature […]

Alex Polozov

Researcher

Artificial intelligence, Programming languages and software engineering

Four Big Bets For Better AI Research: A Personal Journey

It’s a big shift to change from being motivated by getting published in prestigious conferences and journals, to being motivated by solving real problems for real people. Halfway through my 18-year research career working on program synthesis–the task of automatically constructing a program that satisfies a given high-level specification–I had a series of epiphanies that […]

Sumit Gulwani

Partner Research Manager

Human-computer interaction, Programming languages and software engineering

Neural Program Synthesis and the Quest to Democratize Programming with Dr. Rishabh Singh

Episode 10, January 31, 2018 - We can program computers to do almost anything. But what about programming computers to… program computers? That’s a task that Dr. Rishabh Singh, and the team in the Cognition group at Microsoft Research, are tackling with Neural Program Synthesis, also known as artificial programming.

Microsoft blog editor