Recent work in neural text generation has attracted significant interest in controlling the form of generated text, such as its style, persona, and politeness; far less work has addressed controlling its content. This project introduces the task of Content Transfer for long-form text generation: given a document and a content-rich external textual source such as a news story, generate the next sentence of the document so that it both fits its context and is grounded in the source. Our experiments on Wikipedia data show significant improvements over competitive baselines. As a further contribution, we release a benchmark dataset of 640k Wikipedia sentences, each paired with the source article it cites, to encourage exploration of this new task.
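To make the task concrete, the sketch below shows what a single content-transfer example might look like and how such examples could be read from disk. The JSON-lines layout and the field names (context, source, target) are illustrative assumptions for exposition, not the released data format; consult the repository for the actual schema.

import json

# Hypothetical layout of one content-transfer example; the released
# dataset's actual field names and file format may differ.
example = {
    "context": "Preceding sentences of the Wikipedia article ...",  # document context
    "source": "Full text of the cited news story ...",              # grounding document
    "target": "The reference next sentence to be generated.",       # gold output
}

def load_examples(path):
    # Yield (context, source, target) triples from a JSON-lines file,
    # assuming one JSON object per line with the keys sketched above.
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record["context"], record["source"], record["target"]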
Source code, models, and the dataset are available on GitHub.
This work is described in an upcoming NAACL 2019 paper, which is also available on arXiv. If you use this dataset or these models, please cite:
@inproceedings{prabhumoye2019towards,
  author    = {Prabhumoye, Shrimai and Quirk, Chris and Galley, Michel},
  title     = {Towards Content Transfer through Grounded Text Generation},
  year      = {2019},
  booktitle = {Proc. of NAACL}
}
People

Shrimai Prabhumoye, PhD student, CMU
Chris Quirk, Partner Researcher, Microsoft Research
Michel Galley, Senior Principal Research Manager, Microsoft Research