Microsoft Icecaps


Microsoft Research blog


As natural language processing grows in popularity, more and more tools for building large systems have become publicly available. Some of these tools are intended for general-purpose NLP, while others focus on specific domains such as language modeling and text generation. However, few are designed to target conversational scenarios and the specific needs they entail.

Microsoft Icecaps was created to offer researchers and developers an open-source toolkit with a focus on conversational modeling. With a design emphasizing flexibility, modularity, and ease of use, Icecaps empowers users to build customized neural conversational systems that produce personalized, diverse, and informed responses.


Icecaps 0.1 provides a wide array of features for users to build and customize conversational systems.

  • Icecaps’ design is based on a component-chaining architecture, where models are represented as chains of components (e.g. encoders and decoders) that data flows through. This enables complex multi-task learning environments with shared components between tasks.
  • The toolkit includes several recent advances in conversational modeling: personalization embeddings, SpaceFusion, and knowledge grounding based on machine reading comprehension (MRC).
  • We provide customized decoding tools that allow users to employ maximum mutual information, token filtering, and repetition penalties to improve response quality and diversity.
  • Data processing tools are provided for users to easily convert their text data sets into binarized TFRecords. Our data processor features various text preprocessing tools, including byte pair encoding and fixed-length multi-turn context extraction.
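
To give a feel for two of the decoding ideas mentioned above, token filtering and repetition penalties, here is a minimal sketch in plain Python. It is not Icecaps code and does not use the Icecaps API: the function names `adjust_scores` and `greedy_pick`, the dictionary-based score format, and the multiplicative penalty are all assumptions made for illustration. Maximum mutual information is not covered.

```python
import math


def adjust_scores(log_probs, generated, banned, penalty=1.5):
    """Return a copy of log_probs with filtering and a repetition penalty applied.

    Hypothetical helper for illustration only (not part of Icecaps).
    log_probs: dict mapping token -> log-probability for the next step
    generated: list of tokens already emitted in this response
    banned:    set of tokens that must never be emitted (token filtering)
    penalty:   factor > 1 that makes already-used tokens less likely
    """
    seen = set(generated)
    adjusted = {}
    for token, lp in log_probs.items():
        if token in banned:
            # Filtered tokens get the lowest possible score.
            adjusted[token] = -math.inf
        elif token in seen:
            # Log-probabilities are <= 0, so multiplying by a factor
            # greater than 1 pushes the score further down.
            adjusted[token] = lp * penalty
        else:
            adjusted[token] = lp
    return adjusted


def greedy_pick(log_probs, generated, banned, penalty=1.5):
    """Pick the highest-scoring token after the adjustments above."""
    adjusted = adjust_scores(log_probs, generated, banned, penalty)
    return max(adjusted, key=adjusted.get)


# Toy next-step distribution after the partial response "i do not".
step = {"not": math.log(0.40), "know": math.log(0.35), "darn": math.log(0.25)}
history = ["i", "do", "not"]
blocked = {"darn"}

print(max(step, key=step.get))              # unadjusted choice
print(greedy_pick(step, history, blocked))  # choice after filtering and penalty
```

In this toy example the unadjusted decoder would repeat "not", while the adjusted scores favor "know" and rule out the blocked token entirely. A real decoder would apply the same kind of adjustment at every step of beam search or sampling rather than to a single greedy choice.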

We also have a number of features in the works for Icecaps 0.2:

  • More models, including stochastic answer networks and personalized transformers
  • Lexical and contextual embedding generators
  • New data processing features, including functionality for processing tree-structured JSON data
  • An interactive GUI-based decoding session with improved flexibility


Icecaps is available on GitHub. The repository includes example scripts that users can use as templates to bootstrap their own projects.

For more information on Icecaps’ features and design, see our system demonstration paper on Icecaps, published at ACL 2019.