Microsoft Icecaps

Overview

As natural language processing has grown in popularity, more and more tools for building large NLP systems have become publicly available. Some of these tools are intended for general-purpose NLP, while others focus on specific domains such as language modeling and text generation. However, few are designed to target conversational scenarios and the specific needs they entail.

Microsoft Icecaps was created to offer researchers and developers an open-source toolkit with a focus on conversational modeling. With a design emphasizing flexibility, modularity, and ease of use, Icecaps empowers users to build customized neural conversational systems that produce personalized, diverse, and informed responses.

Features

Icecaps provides a wide array of features for users to build and customize conversational systems.

  • Icecaps’ design is based on a component-chaining architecture, where models are represented as chains of components (e.g. encoders and decoders) that data flows through. This enables complex multi-task learning environments with shared components between tasks.
  • The toolkit includes recent advances in conversational modeling: personalization embeddings, SpaceFusion, and knowledge grounding models based on machine reading comprehension (MRC).
  • We provide customized decoding tools that allow users to employ maximum mutual information, token filtering, and repetition penalties to improve response quality and diversity.
  • Data processing tools are provided for users to easily convert their text data sets into binarized TFRecords. Our data processor features various text preprocessing tools, including byte pair encoding and fixed-length multi-turn context extraction.
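To make the decoding ideas above concrete, here is a minimal sketch of how maximum mutual information (MMI) re-ranking, token filtering, and a repetition penalty might be combined when scoring candidate responses. It is not the Icecaps API: the function `rescore_candidates`, its parameters, and the example scores are all hypothetical, and it assumes each candidate already comes with a conditional log-probability log p(response | context) and an unconditional language-model log-probability log p(response).

```python
from collections import Counter

def rescore_candidates(candidates, lambda_mmi=0.5, repetition_penalty=1.0,
                       banned_tokens=frozenset()):
    """Re-rank candidate responses (hypothetical helper, not part of Icecaps).

    candidates: list of (tokens, logp_conditional, logp_unconditional).
    Returns a list of (score, tokens) sorted from best to worst.
    """
    rescored = []
    for tokens, logp_cond, logp_lm in candidates:
        # Token filtering: drop any candidate that contains a banned token.
        if any(tok in banned_tokens for tok in tokens):
            continue
        # Repetition penalty: count every occurrence of a token beyond its first.
        repeats = sum(count - 1 for count in Counter(tokens).values())
        # MMI-style objective: reward responses that are likely given the
        # context but unlikely under the context-free language model.
        score = logp_cond - lambda_mmi * logp_lm - repetition_penalty * repeats
        rescored.append((score, tokens))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored

# Illustrative, made-up scores for four candidate responses.
candidates = [
    (["i", "don't", "know"], -1.0, -1.5),          # bland and generic
    (["i", "love", "hiking", "too"], -2.0, -6.0),  # specific to the context
    (["yes", "yes", "yes"], -1.5, -4.0),           # repetitive
    (["darn", "it"], -0.5, -3.0),                  # contains a banned token
]
ranked = rescore_candidates(candidates, banned_tokens=frozenset({"darn"}))
best_score, best_tokens = ranked[0]
print(" ".join(best_tokens))
```

In this sketch the generic reply has the highest conditional likelihood, but the MMI term lowers its rank because it is also likely regardless of context; the repetitive reply is pushed down by the penalty, and the filtered reply is removed entirely.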

Icecaps v0.2.0

The most recent version of Icecaps, v0.2.0, introduces the following features:

  • Personalization embeddings for transformer models
  • Early stopping variant for performing validation across all saved checkpoints
  • Implementations for both SpaceFusion and StyleFusion
  • New text data processing features, including sorting and trait grounding
  • Tree data processing features from JSON files using the new JSONDataProcessor

Resources

Icecaps is available as an open-source repository on GitHub. The repository features example scripts that users may use as templates to bootstrap their own projects.

For more information on Icecaps’ features and design, see our system demonstration paper on Icecaps, published at ACL 2019.
