Debug-Gym
debug-gym is a text-based interactive debugging framework, designed for debugging Python programs.
Text Adventure Learning Environment Suite (TALES) – Benchmark to evaluate language models on interactive text environments. This repository contains the files needed to benchmark language agents on a curated list of text-based games from the following frameworks: Jericho, TextWorld, TextWorld-Express,…
Structured Entity Extraction is an information extraction task, and the Approximate Entity Set OverlaP (AESOP) metric is designed to appropriately assess model performance on it.
ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries.
Tip-of-the-tongue (ToT) known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier” (i.e., “It’s on the tip of my tongue…”). The TREC ToT track aims to…
The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not…
We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer to the next. We call…
INTREPID (INTeractive learning via REPresentatIon Discovery) is a library of interactive learning algorithms that learn a representation (or latent state) from observational data in order to complete their tasks.
Codebase for generative modeling of protein sequence and structure, including code for CNNs and GNNs and custom data handling code.