Lost in Conversation
Lost in Conversation is a code repository for benchmarking LLMs on multi-turn task completion and for reproducing the experiments in the accompanying paper: “LLMs Get Lost in Multi-Turn Conversation”.
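The core setting the paper studies is task completion when instructions arrive gradually over several conversation turns rather than in one fully specified prompt. As a minimal illustration (a hypothetical sketch, not the repository's actual API), a multi-turn episode can be modeled as revealing instruction "shards" one turn at a time, querying the assistant after each shard, and scoring only the final answer; the `run_multi_turn_episode`, `toy_assistant`, and `is_correct` names below are invented for this example.

```python
# Hypothetical sketch of a multi-turn benchmarking loop; not the
# repository's real interface. Instruction "shards" are revealed one
# per turn, and only the final reply is scored.

from typing import Callable, Dict, List

def run_multi_turn_episode(
    shards: List[str],
    assistant: Callable[[List[Dict[str, str]]], str],
    is_correct: Callable[[str], bool],
) -> Dict:
    """Reveal shards turn by turn; return the scored final answer."""
    messages: List[Dict[str, str]] = []
    reply = ""
    for shard in shards:
        messages.append({"role": "user", "content": shard})
        reply = assistant(messages)  # the model sees the full history
        messages.append({"role": "assistant", "content": reply})
    return {
        "turns": len(shards),
        "final_answer": reply,
        "success": is_correct(reply),
    }

# Toy stand-in for an LLM: echoes everything the user has said so far.
def toy_assistant(messages: List[Dict[str, str]]) -> str:
    return " ".join(m["content"] for m in messages if m["role"] == "user")

result = run_multi_turn_episode(
    shards=["Write a function", "in Python", "that adds two numbers"],
    assistant=toy_assistant,
    is_correct=lambda answer: "Python" in answer,
)
```

Swapping `toy_assistant` for a real model call (and `is_correct` for a task-specific checker) turns the same loop into a multi-turn evaluation harness, which can then be compared against giving the model all shards concatenated in a single turn.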