Frames Dataset


Voice assistants such as Siri, Cortana, and Google Now have popularized spoken dialogue systems. More recently, we have seen a rise in text-based conversational agents (aka chatbots). Many users prefer text to voice for privacy reasons and to avoid poor speech recognition in noisy environments. These agents are also a welcome alternative to downloading and installing applications, which makes a lot of sense for simple tasks such as ordering a cab or asking for the weather.

In most cases, much like voice assistants, these chatbots support only very simple, sequential interactions. Such interactions suffice when the user’s goal is well defined and the dialogue flow can easily be hand-crafted. However, other use cases, such as customer service or travel booking, involve a decision-making process.

Frames is meant to encourage research on conversational agents that can support decision-making in complex settings, in this case booking a vacation that includes flights and a hotel. We believe the next generation of conversational agents will need to do more than search a database: they will need to help users explore it, compare items, and reach a decision.

The dialogues in Frames were collected in a Wizard-of-Oz fashion. Two humans talked to each other via a chat interface. One was playing the role of the user and the other one was playing the role of the conversational agent. We call the latter a wizard as a reference to the Wizard of Oz, the man behind the curtain. The wizards had access to a database of 250+ packages, each composed of a hotel and round-trip flights. We gave users a few constraints for each dialogue and we asked them to find the best deal. This resulted in complex dialogues where a user would often consider different options, compare packages, and progressively build the description of her ideal trip.

Frame Tracking

With this dataset, we also present a new task: frame tracking. Our main observation is that decision-making is tightly linked to memory. In effect, to choose a trip, users and wizards talked about different possibilities, compared them and went back-and-forth between cities, dates, or vacation packages.

Current systems are memory-less. They implement slot-filling for search as a sequential process where the user is asked for constraints one after the other until a database query can be formulated. Only one set of constraints is kept in memory. For instance, in the illustration below, on the left, when the user mentions Montreal, it overwrites Toronto as destination city. However, behaviours observed in Frames imply that slot values should not be overwritten. One use-case is comparisons: it is common that users ask to compare different items and in this case, different sets of constraints are involved (for instance, different destinations). Frame tracking consists of keeping in memory all the different sets of constraints mentioned by the user. It is a generalization of the state tracking task to a setting where not only the current frame is memorized.
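The contrast between a memory-less slot-filling state and frame tracking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `SlotFillingState` and `FrameTracker` are hypothetical, and the rule used here (open a new frame whenever an already-set slot receives a different value) is a simplification, not the annotation procedure used for the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class SlotFillingState:
    """Memory-less state: one set of constraints, new values overwrite old ones."""
    slots: dict = field(default_factory=dict)

    def update(self, **constraints):
        self.slots.update(constraints)


@dataclass
class FrameTracker:
    """Keeps every set of constraints (frame) mentioned so far."""
    frames: dict = field(default_factory=dict)  # frame id -> slot dict
    active_frame: int = 0

    def update(self, **constraints):
        current = self.frames.get(self.active_frame, {})
        # Assumption: changing an already-set slot value opens a new frame.
        changed = any(s in current and current[s] != v for s, v in constraints.items())
        if not self.frames or changed:
            new_id = len(self.frames) + 1
            # The new frame inherits the slots that were not changed.
            self.frames[new_id] = {**current, **constraints}
            self.active_frame = new_id
        else:
            current.update(constraints)


flat = SlotFillingState()
tracker = FrameTracker()
for state in (flat, tracker):
    state.update(dst_city="Toronto", budget=1000)
    state.update(dst_city="Montreal")

print(flat.slots)      # only the Montreal constraints survive
print(tracker.frames)  # both the Toronto and the Montreal frames are kept
```

With the overwriting state, Toronto is lost as soon as Montreal is mentioned. The tracker keeps both frames, so a later comparison between the two destinations remains possible.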

Adding this kind of conversational memory is key to building agents which do not simply serve as a natural language interface for searching a database but instead accompany users in their exploration and help them find the best item.

Data Collection

Dialogues were performed by 12 participants over a period of 20 days.

We deployed a Slack bot named wozbot enabling participants to pair up. Wizards were given a link to a search interface at the beginning of each dialogue. The search interface was a simple graphical interface with all the searchable fields in the database (destination, origin, budget, dates, etc.).

For each dialogue, a user was paired up with an available wizard and received a new task.


Task 1

Find a vacation between September 1st and September 8th to Havana from Stuttgart for under $700. Dates are not flexible. If not available, then end the conversation.

Why this task?

The setting is simple and the user has a good idea of what she wants. Therefore, the agent only needs to help the user find suitable packages and book one. This situation is the first one that a conversational agent for travel booking should handle.


Task 2

Find a vacation between September 1st and September 8th to Havana from Stuttgart for under $700. Dates are not flexible. If not available, then end the conversation.

Why this task?

Find a vacation between September 1st and September 8th to Havana from Stuttgart for under $700. Dates are not flexible. If not available, then end the conversation.


Task 3

You either want to go to New York, Tokyo, Berlin, or Paris from Montreal. You want to travel sometime between August 23rd and September 1st. Ask for information about each package. Compare the packages and pick the one you like best.

Why this task?

For this task, the user has to compare options for different cities. Here, we model the case where a user looks at specific destinations and tries to find the best trip. This requires extensive exploration of the database and a memory of the different options which have been discussed.






Average number of turns per dialogue



Average user satisfaction (from 1-5)



Data Analysis


User and Machine Actions

Here, we provide some insight into the annotation scheme: each frame is given an id (here, 1 and 2). When values from one frame are used in another frame, we use a ref tag to point to the reference frame.

User “I’d like to book a trip to boston from London on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.” Frame #1
Act 1 inform(intent=book)
Act 2 inform(dst_city = boston, or_city = London, str_date = Saturday\, August 13\, 2016, n_adults = 8, budget = 1700)
Wizard “Hi…I checked a few options for you, and unfortunately, we do not currently have any trips that meet this criteria. Would you like to book an alternate travel option?” Frame #1
Act 1 no_result  
Act 2 suggest(dst_city)  
User “Yes, how about going to Detroit from London on August 13, 2016 for 5 adults. For this trip, my budget would be 1900.” Frame #2
Act 1 inform(dst_city = Detroit, n_adults = 5, budget = 1900, ref = [1{or_city = London, str_date = August 13\, 2016}])
Wizard “I checked the availability for those dates and there were no trips available. Would you like to select some alternate dates?” Frame #2
Act 1 no_result(str_date, end_date)  
Act 2 suggest(str_date, end_date)  
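One way to read the ref tag in the example above is as an instruction to carry slot values over from an earlier frame into the new one. The sketch below illustrates that reading in Python. The function `apply_inform` and the dictionary layout used for a reference (`{"frame": ..., "slots": ...}`) are hypothetical and chosen for illustration; they are not the dataset's serialization.

```python
def apply_inform(frames, frame_id, slots, refs=()):
    """Create or update frame `frame_id` from an inform act.

    frames: dict mapping frame id -> slot dict (modified in place)
    slots:  slot values stated directly in the act
    refs:   references to earlier frames, each a dict of the hypothetical
            form {"frame": <id>, "slots": {<slot>: <value>, ...}}
    """
    frame = frames.setdefault(frame_id, {})
    for ref in refs:
        source = frames[ref["frame"]]  # the referenced frame must already exist
        for slot, value in ref["slots"].items():
            if slot not in source:
                raise KeyError(f"frame {ref['frame']} has no slot {slot!r}")
            frame[slot] = value
    frame.update(slots)
    return frame


frames = {}
# User turn 1: everything is stated directly (frame 1).
apply_inform(frames, 1, {
    "dst_city": "boston", "or_city": "London",
    "str_date": "Saturday, August 13, 2016", "n_adults": 8, "budget": 1700,
})
# User turn 2: new destination, party size and budget; origin and start date
# are taken from frame 1 through the ref tag (frame 2).
apply_inform(
    frames, 2,
    {"dst_city": "Detroit", "n_adults": 5, "budget": 1900},
    refs=[{"frame": 1, "slots": {"or_city": "London", "str_date": "August 13, 2016"}}],
)
print(frames[2])
```

Frame 1 is left untouched, so both trips remain available for later comparison.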



Frame Change Statistics

Maximum number of frame changes

Average number of frame changes


Frame Statistics

Maximum number of frames in a conversation

Average number of frames in a conversation



Notice: Hotel names, flight information, and vacation packages are fictitious and any resemblance to actual hotel names or vacation packages is coincidental. Use at your own risk.

Dataset Format

We provide the Frames dialogues in JSON format. Each dialogue has five main fields: user_id, wizard_id, id, userSurveyRating and turns. More details on these fields can be found in the paper.

The tables below describe some of the important fields in the Frames dataset. We encourage you to read the paper for a fuller description of all the fields.
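As a minimal sketch of working with this format, the Python snippet below parses a small made-up sample and computes two summary figures. It assumes the file is a JSON list of dialogues and that `userSurveyRating` sits at the top level of each dialogue, as the field list above suggests; the released file may nest it differently, so check the data before relying on this. `SAMPLE_JSON` and `summarize` are hypothetical names, and the sample values are invented.

```python
import json

# Made-up sample standing in for the real file contents.
SAMPLE_JSON = """
[
  {"id": "dlg-1", "user_id": "U1", "wizard_id": "W1", "userSurveyRating": 4,
   "turns": [{"author": "user", "text": "Hi"},
             {"author": "wizard", "text": "Hello"}]},
  {"id": "dlg-2", "user_id": "U2", "wizard_id": "W1", "userSurveyRating": 5,
   "turns": [{"author": "user", "text": "a"}, {"author": "wizard", "text": "b"},
             {"author": "user", "text": "c"}, {"author": "wizard", "text": "d"}]}
]
"""


def summarize(dialogues):
    """Return the dialogue count, average turns and average user rating."""
    ratings = [d["userSurveyRating"] for d in dialogues
               if d.get("userSurveyRating") is not None]
    return {
        "dialogues": len(dialogues),
        "avg_turns": sum(len(d["turns"]) for d in dialogues) / len(dialogues),
        "avg_rating": sum(ratings) / len(ratings) if ratings else None,
    }


dialogues = json.loads(SAMPLE_JSON)
summary = summarize(dialogues)
print(summary)
```

To run this on the actual data, replace `json.loads(SAMPLE_JSON)` with `json.load` on an open file handle for the downloaded JSON file.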

Global Properties

Key Name | Description
id | A unique identifier for the dialogue.
user_id | A unique identifier for the user taking part in the dialogue.
wizard_id | A unique identifier for the wizard taking part in the dialogue.


Key Name | Description
userSurveyRating | A value that represents the user’s satisfaction with the wizard’s service, ranging from 1 (complete dissatisfaction) to 5 (complete satisfaction).
wizardSurveyTaskSuccessful | A boolean which is true if the wizard thinks at the end of the dialogue that the user’s goal was achieved.



Key Name | Description
author | The author of the message in a dialogue, i.e. “user” or “wizard”.
text | The exact text that the author of the turn uttered, e.g. “text”: “Consider it done. Have a great trip!”.
labels | JSON object which has three keys: active_frame, acts, and acts_without_refs. The active_frame is the id of the currently active frame. The acts are the dialogue acts for the current utterance. Each act has a name and arguments args. The name is the name of the dialogue act, for instance, offer or inform. The args contain the slot types (key) and slot values (val), for instance budget=$2000. Slot values are optional. An act contains a ref tag whenever a user or wizard refers to a past frame. The acts_without_refs are similar to the acts except that they do not have these ref tags. We define the frame tracking task as the task that takes as input the acts_without_refs and outputs the acts.
timestamp | Unix timestamp denoting the time at which the current turn occurred.
frames | List of frames up to the current turn. Each frame has the following keys: frame_id, frame_parent_id, requests, binary_questions, compare_requests, and info.
db | Can occur only during a wizard’s turn. It is a list of search queries made by the wizard with the associated list of search results, e.g. “db”: {“search”: [{“ORIGIN_CITY”: “Montreal”}], “result”: []}.
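Since the frame tracking task is defined as mapping acts_without_refs to acts, a turn list can be turned into (input, target) pairs directly. The Python sketch below shows one way to do so, together with a helper that prints ref-free acts in the name(key=val) notation used in the annotation example. It assumes each entry of args is a dictionary with a `key` and an optional `val`, as described in the table. The sample turns are invented, the way a ref is stored inside `acts` here is a guess, and `format_act` and `frame_tracking_pairs` are hypothetical names.

```python
def format_act(act):
    """Render a ref-free act as name(key=val, ...); slot values are optional."""
    parts = []
    for arg in act.get("args", []):
        if arg.get("val") is not None:
            parts.append(f"{arg['key']}={arg['val']}")
        else:
            parts.append(arg["key"])
    return f"{act['name']}({', '.join(parts)})"


def frame_tracking_pairs(turns):
    """Return one (acts_without_refs, acts) pair per turn."""
    return [(t["labels"]["acts_without_refs"], t["labels"]["acts"]) for t in turns]


# Invented sample turns; the layout of the ref argument is an assumption.
turns = [
    {"author": "user", "text": "How about Detroit?",
     "labels": {
         "active_frame": 2,
         "acts": [{"name": "inform", "args": [
             {"key": "dst_city", "val": "Detroit"},
             {"key": "ref", "val": [{"frame": 1, "slots": {"or_city": "London"}}]}]}],
         "acts_without_refs": [{"name": "inform", "args": [
             {"key": "dst_city", "val": "Detroit"},
             {"key": "or_city", "val": "London"}]}]}},
    {"author": "wizard", "text": "No trips available on those dates.",
     "labels": {
         "active_frame": 2,
         "acts": [{"name": "no_result", "args": []},
                  {"name": "suggest", "args": [{"key": "str_date"}, {"key": "end_date"}]}],
         "acts_without_refs": [{"name": "no_result", "args": []},
                               {"name": "suggest", "args": [{"key": "str_date"}, {"key": "end_date"}]}]}},
]

pairs = frame_tracking_pairs(turns)
for inputs, _targets in pairs:
    print([format_act(a) for a in inputs])
```

For turns that do not refer to a past frame, such as the wizard turn in this sample, the input and the target are identical, so the interesting cases for a model are the turns where they differ.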


Key Name | Description
frame_id | Id of the frame.
frame_parent_id | Id of the parent frame.
requests, binary_questions, compare_requests | Requests are questions related to one frame, for instance “what is the price of this package?”. Compare_requests concern several frames. For example, the user might ask to compare different packages: “What is the guest rating of these two hotels?”. Binary_questions are questions with both a slot type and a slot value. These are special cases of requests and compare_requests, for instance “are both hotels 3.5 stars?”.
info | The info contains all the constraints set by the user or the wizard in the frame. These constraints are expressed as slot types which have a value. Note that each slot can have multiple values, which accumulate as long as the frame does not change. For example, the price can be both “1000 USD” and “cheapest”. There are two additional fields to keep track of specific aspects of the dialogue:
  REJECTED: a boolean value expressing if the user negated or affirmed an offer made by the wizard.
  MOREINFO: a boolean value expressing whether the user wants to know more about the frame in question.
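The frame fields above can be explored with a few small helpers. The Python sketch below is written under loud assumptions: `info` is taken to map each slot type to a plain list of values, REJECTED and MOREINFO are taken to be plain booleans stored inside `info`, and a root frame is taken to have `frame_parent_id` set to None. The released data may encode these differently. The sample frames are invented, and `lineage`, `slot_values`, and `open_frames` are hypothetical names.

```python
def lineage(frames, frame_id):
    """Follow frame_parent_id links from `frame_id` back to the root frame."""
    by_id = {f["frame_id"]: f for f in frames}
    chain = []
    current = frame_id
    # Stop at a missing parent, or if a frame points back to one already seen.
    while current is not None and current not in chain:
        chain.append(current)
        current = by_id[current].get("frame_parent_id")
    return chain


def slot_values(frame, slot):
    """Return all values accumulated for `slot` in this frame (possibly none)."""
    return list(frame["info"].get(slot, []))


def open_frames(frames):
    """Return the ids of frames that the user has not rejected."""
    return [f["frame_id"] for f in frames if not f["info"].get("REJECTED", False)]


# Invented sample frames.
frames = [
    {"frame_id": 1, "frame_parent_id": None,
     "requests": [], "binary_questions": [], "compare_requests": [],
     "info": {"dst_city": ["Toronto"], "REJECTED": True}},
    {"frame_id": 2, "frame_parent_id": 1,
     "requests": [], "binary_questions": [], "compare_requests": [],
     "info": {"dst_city": ["Montreal"], "price": ["1000 USD", "cheapest"],
              "MOREINFO": True}},
]

print(lineage(frames, 2))
print(slot_values(frames[1], "price"))
print(open_frames(frames))
```

Returning a list from `slot_values` reflects the point made in the table that a slot can accumulate several values within a single frame.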

GitHub Repo

Code to reproduce the evaluation of the frame-tracking model in A Frame Tracking Model for Memory-Enhanced Dialogue Systems: