Data Formulator: Vibe with your data, in control
- Karen Easterbrook, Microsoft; Chenglong Wang, Microsoft
Data Formulator is an AI-powered tool for analysts to iteratively explore and visualize data. Starting with data in any format (screenshot, text, CSV, or database), users can work with AI agents through a novel blended interface that combines user interface (UI) interactions and natural language (NL) inputs to communicate their intents, control branching exploration directions, and create reports to share their insights.
Explore more
- Data Formulator on GitHub
- Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way
- Data Formulator on Azure AI Foundry Labs
Transcript
[MUSIC]
[MUSIC FADES INTO SWEEPING SOUND]
KAREN EASTERBROOK: At Microsoft Research, we see fresh innovations emerge out of the labs ready to reshape how people and organizations harness technology in their everyday work.
A fellow Redmond-based colleague, Chenglong Wang, principal researcher, works with his team to reimagine how analysts interact with data. By combining natural language and intuitive interface design, they introduced Data Formulator. This tool helps users guide AI collaboratively—to visualize, explore, and truly “vibe” with data, in control.
Over to you, Chenglong.
[MUSIC]
[MUSIC FADES INTO SWEEPING SOUND]
CHENGLONG WANG: Hello, everyone. I’m Chenglong from Microsoft Research Redmond. Today, I’d like to introduce you [to] Data Formulator. It’s an interactive AI-powered visualization tool for you to explore data with AI agents, in control.
Why did we build this tool? For example, given a dataset about US product prices from 2005 to 2025, we may be curious about how much product prices have grown over the years, as everything around us has become more expensive lately, right? The best way to answer this question is to explore the data and show insights with some visualizations. But how can we explore effectively?
To get these insights, we need to go through an iterative data exploration process.
We first need to formulate plans based on our data. Then we need programming skills to transform and implement the visualizations. We often don’t solve it in one step. We may need to go back to take another branch, to backtrack, and revisit some steps. After a few iterations, we should find some insights and we’re ready to share.
This process, however, can be challenging. It requires a user to have both the data science expertise to formulate good questions and the programming skills to implement these designs. How can we make this accessible to everyone who is interested in data?
Luckily, with AI agents that can generate code from user instructions, coding is no longer a big barrier. However, to make data analysis truly accessible, we still need to lower the interaction friction between user and AI.
The first challenge is that natural language is quite universal, but it can be verbose for describing visualization intent and it may not be very precise. The second is that a chat interface is the way we usually interact with agents, but it doesn’t naturally support the kind of iteration we want for data analysis. Third, agents can be difficult to control if they are working in a black box. The user may easily lose control over what the agent is doing.
In order to solve this problem, we introduce Data Formulator. It’s an interactive data analysis tool for people to work with agents effectively to explore data.
First, it features multi-modal interaction. The multi-modal UI combines UI interaction and natural language so that users can specify their design both precisely and concisely. Second, to support iteration, we organize exploration into threads, allowing users to backtrack, follow up, and revisit. On top of that, we developed agent mode. Instead of running in a black box, it operates on top of the data thread so the user can take control whenever they want.
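One way to picture a data thread is as a tree of exploration steps, where backtracking means returning to an earlier step and following up from there. The sketch below is illustrative only and is not Data Formulator's implementation; the names `Step`, `follow_up`, and `path` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Step:
    """One exploration step (hypothetical model, not the tool's real data structure)."""

    instruction: str
    parent: Step | None = field(default=None, repr=False)
    children: list[Step] = field(default_factory=list)

    def follow_up(self, instruction: str) -> Step:
        """Add a new step that builds on this one and return it."""
        child = Step(instruction, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Return the instructions from the root of the thread down to this step."""
        steps: list[str] = []
        node: Step | None = self
        while node is not None:
            steps.append(node.instruction)
            node = node.parent
        return steps[::-1]


root = Step("load US product prices")
trend = root.follow_up("line chart of price trends")
crisis = trend.follow_up("compare COVID and the 2008 crisis")
# Backtrack to `trend` and start a second branch from the same step.
gas = trend.follow_up("correlation with gas price")
print(gas.path())
```

Because every step keeps a link to its parent, any chart can be traced back to the instructions that produced it, and a user or an agent can branch from any earlier step without losing the other branches.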
Let me show you the experience of using Data Formulator to explore data.
Let’s explore: how much did US product prices increase over the decades? Let’s first load the data. This data. We can start with agent mode and ask the AI agent to automatically explore the data on our behalf. The agent first generates a plan, and then it generates code to execute the plan by transforming the data and visualizing it.
This is the first chart recommended by the agent. It’s a line chart for temporal price trends. Wow, everything has become more expensive now.
It has now recommended a second chart. It’s a bar chart to show the product volatility. It seems the egg price is the most volatile.
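The talk does not say how the agent measures volatility. A minimal sketch, assuming volatility means the standard deviation of year-over-year percentage changes, might look like the following. The prices are made-up toy numbers, not the dataset from the demo, and `yearly_pct_changes` and `volatility` are hypothetical helper names.

```python
from statistics import pstdev


def yearly_pct_changes(prices):
    """Percentage change between each pair of consecutive yearly prices."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(prices, prices[1:])]


def volatility(prices):
    """Assumed measure: population standard deviation of yearly percentage changes."""
    return pstdev(yearly_pct_changes(prices))


# Made-up toy prices for illustration only.
toy_prices = {
    "eggs": [2.0, 3.0, 2.4, 3.6],
    "bread": [2.0, 2.1, 2.2, 2.3],
}

# Rank products from most to least volatile, as a bar chart would order them.
ranked = sorted(toy_prices, key=lambda name: volatility(toy_prices[name]), reverse=True)
print(ranked)
```

Other definitions, such as the coefficient of variation of the raw prices, would be equally reasonable and could rank products differently.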
Now we can take control; we can ask a follow-up question using natural language: how did COVID and the 2008 crisis affect prices?
It now generates a stacked bar chart, certainly not the best way to visualize it. We can use the UI to change it to a grouped bar chart. Now it’s clear. It seems that COVID affected prices more significantly. That’s very interesting.
We can now go back to the previous branch to explore something different. Hmm, what should we explore here? We can ask the AI agent for some ideas. It seems it would be interesting to see the price correlation with gas price. Let’s go with that.
Now we get a bar chart to show the price correlation with gas price. It seems that most product prices are correlated with the gas price, except … tomato?
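The talk does not state which correlation measure the agent used. As a sketch, assuming the Pearson correlation coefficient between each product's yearly prices and the gas price, the computation could look like this. All numbers are made-up toy values, and `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up toy yearly prices for illustration only.
gas = [2.0, 3.0, 2.5, 4.0]
products = {
    "milk": [3.0, 3.5, 3.25, 4.0],   # moves in step with gas in this toy data
    "tomato": [2.0, 1.5, 1.5, 2.0],  # barely tracks gas in this toy data
}

# One correlation value per product, which is what each bar would show.
correlations = {name: pearson(gas, prices) for name, prices in products.items()}
for name, r in correlations.items():
    print(f"{name}: {r:.2f}")
```

A value near 1 means the product price tends to rise and fall with the gas price, while a value near 0 means little linear relationship.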
We can continue to explore, but now we can also write a report to share our findings. We can ask AI agents to generate a report grounded on our data. It will leverage the computation, data, and charts to compose a report.
As you can see, with AI agents automating the implementations, we can easily vibe with our data. The blended UI-plus-natural-language interaction approach allows us to easily work with agents. While the agent is automating both steps, we can easily take control when we want to steer the exploration down the path we like.
So this is Data Formulator. It is an interactive tool for people to explore and visualize data. First, it features a multi-modal interface that makes it easy for users to specify visualization ideas. Second, it provides a new design, data threads, so that users can perform nonlinear interactions with AI agents in exploration tasks. Third, the agent and interactive modes allow users to vibe with data while still having full control. We invite you to try Data Formulator at data-formulator.ai. Explore with some data and let us know what you find.
Karen Easterbrook
Senior Director
Chenglong Wang
Senior Researcher
Research Forum Season 2, Episode 2