Data2Text: Automated Text Generation from Structured Data

Established: September 1, 2016

The Data2Text (or Data-to-Text) project aims to automatically generate fluent, fact-based descriptions or utterances from data tables. Typical business applications of text generation include writing financial and sports news stories, generating product descriptions, and analyzing and interpreting business data and Internet of Things (IoT) data. A few Data2Text applications are listed below.

Product Description Generation

Writing Assistant

Fact-based QA

Fact-based Conversation

Analytic Narrative Generation

Mainstream data-to-text methods fall into two families: rule- and template-based approaches, and neural network-based approaches. Rule- and template-based approaches dominate deployed applications because they are interpretable and controllable, which makes it easier to guarantee that the generated text is correct. However, writing rules and extracting high-quality templates requires labor-intensive manual feature engineering. By contrast, neural network-based models are data-driven, need little human intervention, and readily produce fluent, varied text. However, users often cannot directly control what content is generated, and it is difficult to ensure that the generated text is faithful to the input data.
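To make the trade-off concrete, here is a minimal sketch of the template-based approach using Python's standard `string.Template`. It is an illustration only, not the project's method: the template text, the slot names (`brand`, `name`, `weight_kg`, `price_usd`), the `realize` helper, and the sample record are all hypothetical. Every word in the output either comes from the hand-written template or is copied from the input record, which is why such systems are controllable and faithful, and also why each new domain needs new templates written by hand.

```python
from string import Template

# Hypothetical product-description template; the slot names are illustrative only.
PRODUCT_TEMPLATE = Template(
    "The $brand $name weighs $weight_kg kg and costs $price_usd USD."
)

def realize(template: Template, record: dict) -> str:
    """Fill every slot of the template from one data record.

    Template.substitute raises KeyError if any slot has no value in the
    record, so the generator cannot state a fact missing from the data.
    """
    return template.substitute(record)

# A hypothetical row from a product table.
row = {"brand": "Contoso", "name": "X200 laptop", "weight_kg": 1.3, "price_usd": 999}
print(realize(PRODUCT_TEMPLATE, row))
# The Contoso X200 laptop weighs 1.3 kg and costs 999 USD.
```

The output is correct by construction but rigid: every sentence built from this template has the same wording, and covering a new attribute or phrasing means writing another template. A neural generator avoids that manual work, but it offers no comparable guarantee that each stated value appears in the record.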

Fact-based Data-to-Text Generation

The Data2Text project aims to develop automated, high-fidelity data-to-text generation technologies that address the shortcomings of both template-based and neural network-based approaches.