Flash Fill Gives Excel a Smart Charge
By Douglas Gantenbein, Senior Writer, Microsoft News Center
Sitting next to him was a businesswoman. She was delighted to learn that Gulwani was both a Ph.D. in computer science and a Microsoft employee. She popped open her laptop and asked, “Is there a way to merge two columns in Excel, when one column has a first name, the other the last, so that a column has both first and last names?”
“I had to admit to her that I knew next to nothing about Excel,” Gulwani recalls. “But when I got home, I opened my own laptop and looked for Excel search words. I found hundreds of help topics. And that made me think: Maybe this is the next problem I should solve.”
Solve it he did. In a collaborative effort that spanned both Microsoft Research and the Office 2013 team, Microsoft has introduced a feature in Excel 2013 called Flash Fill. Flash Fill earned enthusiastic reviews even before Office 2013 was released Jan. 29, and it is sure to be popular with users. Flash Fill takes tedious tasks now often performed manually in Excel—or by writing one-off programs—and executes them automatically. And it does so simply, requiring only the typing in an example of what the user wants.
Gulwani credits an interdisciplinary approach to helping make Flash Fill possible. It relies on his expertise in program verification and program synthesis, an approach to generating small pieces of code based on a user’s needs. Machine learning, a strong focus of Microsoft Research, provides the ability to rank the most suitable programs Flash Fill uses in a task. And contributions from human-computer-interaction studies underpin Flash Fill’s effort to avoid the discoverability issue associated with new features by automatically invoking the synthesis engine after the user provides indication of a repetitive task.
How does Flash Fill work? Let’s say an Excel user has a column of Social Security numbers. But they’re not formatted correctly–they’re 123456789, not 123-45-6789. An Excel user simply creates a new column adjacent to the existing column, then types in the correct example: 123-45-6789. Flash Fill immediately fills in all rows below the example with properly formatted figures. The user needs just to click to accept them all.
Flash Fill works for dozens of tasks: removing or adding formatting to names or other word groups, correcting inconsistencies in punctuation or the use of symbols such as brackets, converting dates of birth from something such as 8-4-1954 to 8/4/54, and many others.
“Flash Fill brings the power of text manipulation to the hands of common, non-technical, everyday users,” says Chad Rothschiller, a program manager with the Excel 2013 team who worked closely with Gulwani. “Even people who aren’t familiar with Excel can use the feature. It takes complex functionality that someone would otherwise have to build in formulas or code and puts it in the hands of common users.”
Flash Fill performs these chores in a way that even a newcomer to Excel can appreciate. That’s a big improvement over the previous state of things, in which a user had to know programming to create tools that can automate everyday Excel jobs.
But Flash Fill’s big leap is in how it performs Excel tasks. Computer scientists have long known that a software program can be taught certain tasks through a process called “learning by demonstration.” But that requires the software user to show the software exactly how to do something.
Gulwani and his team developed Flash Fill to learn by example, not demonstration. A user simply shows Flash Fill what he or she wants to do by filling in an Excel cell with the desired result, and Flash Fill quickly invokes an underlying program that can perform the task.
It’s the difference between teaching someone how to make a pizza step by step and simply showing them a picture of a pizza and minutes later eating a hot pie.
But that simplicity comes with a price.
“The biggest challenge,” Gulwani says, “is that learning by example is not always a precise description of the user’s intent—there is a lot of ambiguity involved.
“Take the example of Rick Rashid [Microsoft Research’s chief research officer]. Let’s say you want to convert Rick Rashid to Rashid, R. Where does that ‘R’ come from? Is it the ‘R’ of Rick or the ‘R’ of Rashid? It’s very hard for a program to understand.”
For each situation, Flash Fill synthesizes millions of small programs—10-20 lines of code—that might accomplish the task. It sounds implausible, but Gulwani’s deep research background in synthesizing code makes it possible. Then, using machine-learning techniques, Flash Fill sorts through these programs to find the one best-suited for the job.
Gulwani and his team designed Flash Fill to generalize user examples into a small program, then to run that program on the remaining spreadsheet rows to populate the output column.
The underlying technology is based on recent advances in automated program generation from examples. Since the number of programs that are consistent with a small number of examples is so often huge, Flash Fill uses efficient data structures and algorithms to represent and compute these programs. It then uses machine-learning-based ranking techniques to identify the most likely program from this set.
But despite the potential of Flash Fill, Gulwani found himself facing a hard sales pitch when he first suggested its inclusion in Office 2013. Early in 2010, he gave a group of Office 2013 project managers a talk that showed how Flash Fill, then called QuickCode, could work but did not include actual demos.
“I am giving the presentation,” he recalls, “and they stop me in the middle of it and say: ‘Sumit, you don’t need to convince us this is practically useful. But we don’t believe you can do this. It looks too much like magic.’”
To show that his idea wasn’t “magic,” So Gulwani spent the next three weeks programming code so he could demonstrate Flash Fill. That caught the attention of the Office 2013 team, and work began on incorporating the new feature.
Rothschiller recalls his first impressions of the feature:
“Whenever you see something for the first time,” he says, “it takes a while to sink in—especially if the technology is complex or powerful. The general idea was really interesting, but we had some questions. Could it be broadened to cover a few more of the basic scenarios? Can we put this in the hands of basic users? And how much developer time would it take to get the code working inside the Office codebase?
“The hard thing about complex technology is making it simple enough. So the initial demos went really deep, but everything took several clicks and lots of setup to get things to work. The most compelling part of the underlying technology is that everyone can look at data and provide some examples of the output they’d like to see. It’s way harder to encode that into a programming language, script, or formula.
Now, Rothschiller says, “Flash Fill brings the power of text manipulation to the hands of common, non-technical, everyday users. Even people who aren’t familiar with Excel can use the feature.”
Rothschiller took a lead role in designing the new feature for ease of use. And both the Office team and Gulwani’s team worked to port the new feature’s C# programming language into Office’s native C++. Gulwani, together with intern Rishabh Singh and Dany Rouhana, a research software-design engineer, worked closely with Rothschiller, as well as Weide Zhong, a senior software engineer on the Office team, to integrate Flash Fill with Excel 2013.
As Fast as Possible
The Office team also worked to design the research code to make it compatible with Office and more user-friendly. Gulwani, Singh, and Zhong, meanwhile, worked to give Flash Fill the intelligence to understand, in most cases, a user’s intent with just one example, and almost never more than two or three. Their goals were to ensure that Flash Fill was as fast as possible, so users would not have to wait for it to perform a task.
Reviews of Flash Fill from technical journalists who reviewed the release-to-manufacturing version of Excel 2013 have heaped praise on Flash Fill.
“Simply type your intended target text in the next field,” wrote Jonathan Bray in PC Pro, “hit the Flash Fill button on the Data Ribbon, and Excel will complete the job for you, splitting out the text all down the appropriate column. Genius.”
Added Ars Technica: “One of the shock-and-awe features of the Excel 2013 demo was the “flash fill” feature. This is a piece of background logic that watches what you’re putting into cells in a column and looks for patterns. If it recognizes the data as being extracted from text in an adjacent cell, it will automatically “ghost” the rest of the matching content into the column as you type.”
Sociable Blog chimed in: “Kiss formulas and macros goodbye, and say hello to Flash Fill. This new feature is the current buzz among Excel power users.”
There is more to come.
“To start with,” Gulwani says, “we can perform more sophisticated data—manipulation tasks from examples, such as manipulating data types like numbers and dates, and semi-structured data like tables and webpages. However, this technology also has surprising applications in the area of computer-aided education. For instance, it can help teachers generate multiple similar problems from an example problem, for student practice or for use in exams. It can help users create structured content by providing predictive auto-completion for repetitive drawings, such as a ladder or a wheel, and mathematical terms. Ultimately, it even could lead to programming robots by examples.”
Gulwani says several Microsoft Research people helped make Flash Fill a reality: Ben Zorn, a research manager with Microsoft Research who helped initiate the connections with the Office team; Singh, a graduate student from the Massachusetts Institute of Technology who developed extensions to the core Flash Fill technology; Rouhana, who fixed bugs, helped with the product contribution, and, together with Shobana Balakrishnan, principal research program manager with Microsoft Research’s eXtreme Computing Group, helped manage interactions with the Excel team.