Transform Data by Example

Transform Data by Example

Established: May 17, 2017

Overview

 

Do you spend hours trying to clean, standardize, and transform data such as date-time, addresses, phone numbers, and URLs? Transform-Data-by-Example is an Excel add-in that helps you find the desired transformation function easily. All you need to do is to provide a few examples of the desired output, and Transform-Data-by-Example will automatically find relevant data transformation functions from a large collection that it has already indexed. Moreover, you can add your own data transformation code into the collection, and Transform-Data-by-Example will make them instantly searchable for you, all seamlessly within Excel. The add-in works on Excel 2016, Excel Online, Excel 2016 for Mac, and Excel 2016 for iPad.

 

Get Your Add-in now

 officedownload_small

How To Use

Once you installed the add-in, you can launch it within Excel by clicking on the “Insert” tab, then clicking “My Add-ins

step-insertaddin

1. Enter a few (at least two) output examples. Select an empty cell in the column you want to fill in and click “Get Transformations” button.

2. The engine returns one or more transformations that are consistent with your input/output examples. You can preview the results of a suggestion by hovering on it. You can compare two suggestions by clicking pin to pin one  and hovering over the other. You can apply and unapply the results of a suggestion by clicking on the “Apply/Unapply”  icon.

3. If none of the returned suggestions can solve your task, consider providing a few additional examples and try “Get Transformations” again.

Sample Data

Sample: Name, Phone number, Address, etc.

 

FAQ

Q1. What Excel versions are supported?

Currently, Transform Data by Example only works with Excel 2016 and Excel Online. Earlier versions of Excel are not currently supported due to JavaScript compatibility issues.

Q2. What transformation functionalities are supported and indexed?

Transform Data by Example indexes a large number of transformation functions from .NET libraries and GitHub with permissive licenses, and certain frequently accessed Wikipedia tables that can be used for transformation (e.g., states-to-abbreviations). With these resources, Transform Data by Example already supports many transformation scenarios in head domains such as name, phone number, address, URL, etc., as well as a variety of tail domains.

While Transform Data by Example supports transformations in many useful domains, it will not support all transformation functionalities out-of-box, because it is possible that the crawler has not seen the libraries that can be useful for your transformation task. If you have the code/library/web-service for the required transformation, you can use the extensibility feature to make these additional transformation functionalities available in your local version of the add-in.

Q3. What Excel native functions are used by Transform Data by Example?

While Transform Data by Example is currently an Excel add-in, it does not rely exclusively on Excel functions. In fact, it indexes and searches transformation functions from sources such as .NET and GitHub, which provide functionalities well beyond native Excel functions. It should also be noted that presently, not all Excel functionalities are indexed by Transform Data by Example.

Q4. What is the difference between Transform Data by Example and FlashFill which is also an Excel feature?

FlashFill is a popular feature in Excel that also uses an example-driven interface to automatically produce transformations. The difference is that FlashFill is mainly intended for simple string-based transformations, while Transform Data by Example leverages a large collection of sophisticated functions that enables it to perform semantic and domain-specific transformations beyond string manipulations. Please refer to our sample examples for more information.

Q5. How many output examples are needed to use Transform Data by Example?

We recommend users to provide 3 or more (ideally different) output examples, but it is not a hard requirement. Using 1 or 2 output examples can still work fine in certain cases, but for some datasets the transformation problem may become under-specified with too few output examples. This can force our system to generate many incorrect transformations intermixed with correct ones, creating difficulty for users to find the correct results. We recommend 3 or more output examples as a general rule of thumb.

Troubleshooting Problems

Q1. ERROR message: “Please select an empty cell inside the data range of the column that you are trying to fill.”

This warning message is triggered in two cases:

  • No valid data region was detected for transformations to be performed. There are two ways to select data region in Transform Data by Example: (1) The user can select any empty cell in the column for which output needs to be produced, in such a case a contiguous non-empty region to the left is automatically inferred as the data region (see the first figure below), or (2) The user explicitly selects a region where the last column is used as the output column, in which values are only partially filled (see the second figure below). No valid data region can be detected if neither condition above is met. Please explicitly select a data region and click “Get Transformations” again.
  • All cells in the output column are already filled in. If there are no empty cells to populate in the output column (see example in the figure below) this warning message is raised. Please review the data region shown in a shaded area on your spreadsheet, and adjust as necessary.

Q2. No result is returned after clicking “Get Transformations”.

First, please verify that a valid data region is selected. If you see warning messages displayed at the top of the result pane, please refer to Q1 above to ensure that a valid data region is selected.

If the data region selected is valid, please check that example values provided in the output column are correct. Currently, the programs we generate are required to be consistent with all output examples provided by users. As such, a minor typo in the output column can prevent results from being produced. In some cases, you may consider removing examples that may be too restrictive.

Lastly, it is possible that the crawler has not seen the libraries that can be useful for your transformation task. However, if you have the code/library/web-service for the required transformation, you can use the extensibility feature of Transform Data by Example to make additional transformation functionalities available to you. Note that all extensions you make to the service are visible only to you and not other users.

Q3. Too many results are returned, none of which look relevant.

This typically happens when the transformation task is under-specified, because not enough examples are provided. Please consider providing more (ideally different) output examples, and please refer to Q5 of FAQ above for more information.

Q4. I clicked on “Apply/Unapply” by mistake and now all empty cells in the output column are filled with values. How do I undo this action?

To undo this action, simply click on the same “Apply/Unapply” button again. This button works in toggle mode. Note that the Excel built-in “undo” functionality does not work for data populated by the Add-in.

Q5. ERROR message “Sorry, we had to restart because this add-in wasn’t responding.”

This happens when the connection to the backend service is lost or unstable. Please right-click any white open area in the Add-in, then click “reload” and try again.

Q6. My output data are formatted into strange numbers that I don’t recognize.

Occasionally, numbers are recognized as dates and other built-in data types, and will be automatically formatted by Excel. Please select the data region in question, right-click your mouse, click “format text” in the menu, then click “Text” from the list of options, and finally click “OK”. This will disable auto-formatting in the data region. Please refer to this article for more information.

Q7. When I scroll on the Add-In, Excel appears to freeze and become unresponsive.

There is a known issue where Excel the presence of an Add-In causes Excel to freeze. As a workaround, if you change focus to any other application on your machine, and then change focus back to Excel, it will become responsive again.

Extensibility

Even though Transform Data by Example already indexes a large collection of data transformation functions, it is also designed to be extensible, because in different industries and application domains, there are often domain-specific problems and proprietary data sets (such as your enterprise-specific data) that Transform Data by Example has not seen and does not support yet.  Users can add their own data transformation functions using the extensibility feature, as shown below.

As a simple demo example, suppose you want to add a hex-to-decimal conversion functionality. You can easily create an Azure Function and register it with Transform Data by Example.

While more tutorial of Azure Function can be found here, for this example we only need to modify the hello-world created for HTTP-triggered C# function, by adding a simple function that performs hex-to-decimal conversion (highlighted in the second figure below, the code is adapted from a StackOverflow solution here).

Azure function: Hello World

Azure function: Extended with Hex-to-Decimal (changes highlighted)

We then change the HTTP method from POST to GET in the dropdown menu at top-right — for simplicity we will pass parameters in URL by HTTP GET.

Finally, run this Azure Function from your browser by copying the provided function URL, and adding a “&name=ff” parameter at the end to test conversion for Hex value “ff” (shown highlighted in the address box in the picture below). This will invoke the newly created Azure Function deployed in the cloud. The expected result string will show up in the browser as below.

Now we can simply add that function URL above to Transform Data by Example using the extensibility interface. From the Transform Data by Example  add-in, click on the menu button (shown below).

In the next page, paste the function URL (with the parameter string) into the text box and click “Add”.

Now if we invoke our test case by searching from Excel spreadsheet again, the result will show up as the top suggestion.

 

Release Notes

This version of add-in works on Excel 2016, Excel Online,  and Excel 2016 for Mac (ver. 15.20 or later).

 

Need help?  Have feedback?

We will carefully review all your requests. Please visit our forum and let us know. Thanks.

 

Privacy  | Terms | Feedback | Rate us | Follow us on Twitter