Recent work in neural text generation has attracted significant interest in controlling the form of generated text, such as its style, persona, and politeness; far less work has addressed controlling its content. This project introduces the task of Content Transfer for long-form text generation: given a document and a content-rich external textual source such as a news story, generate the next sentence of the document so that it both fits its context and is grounded in the source. Our experiments on Wikipedia data show significant improvements over competitive baselines. As a further contribution, we release a benchmark dataset of 640k Wikipedia sentences, each paired with the source article it cites, to encourage exploration of this new task.
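To make the task concrete, the sketch below shows what a single content-transfer example might look like and how such examples could be read from disk. The JSON-lines layout and the field names (context, source, target) are illustrative assumptions for exposition, not the released data format; consult the repository for the actual schema.

import json

# Hypothetical layout of one content-transfer example; the released
# dataset's actual field names and file format may differ.
example = {
    "context": "Preceding sentences of the Wikipedia article ...",  # document context
    "source": "Full text of the cited news story ...",              # grounding document
    "target": "The reference next sentence to be generated.",       # gold output
}

def load_examples(path):
    # Yield (context, source, target) triples from a JSON-lines file,
    # assuming one JSON object per line with the keys sketched above.
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record["context"], record["source"], record["target"]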
Source code, models, and the dataset are available on GitHub.
This work is described in an upcoming NAACL 2019 paper, which is also available on arXiv. If you use this dataset or these models, please cite:
@inproceedings{prabhumoye2019towards,
  author    = {Prabhumoye, Shrimai and Quirk, Chris and Galley, Michel},
  title     = {Towards Content Transfer through Grounded Text Generation},
  year      = {2019},
  booktitle = {Proc. of NAACL}
}
People

Shrimai Prabhumoye, PhD student, CMU
Chris Quirk, Partner Researcher, Microsoft Research
Michel Galley, Senior Principal Research Manager, Microsoft Research