Research Tools: code, datasets, & models

Tool

GenAIScript

Scripting environment with convenient tooling for file ingestion, prompt development and structured data extraction.

Tool

This repo contains the code described in LLMR, implementing the Large Language Model for Mixed Reality framework. This package serves as a prototype for “Speaking the world into existence”, which allows the…

GitHub

Tool

promptbase

promptbase is an evolving collection of resources, best practices, and example scripts for eliciting the best performance from foundation models like GPT-4. We currently host scripts demonstrating the Medprompt methodology, including examples of how we…

GitHub

Tool

Phi-2

phi-2 is a language model with 2.7 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts…

Access

Tool

Phi-1.5

The language model phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. When…

Access Publication

Tool

Phi-1

The language model phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python code from The Stack v1.2, Q&A content…

Access Publication

Tool

InferredBugs

InferredBugs is a metadata-rich dataset of bugs and fixes in the Java and C# programming languages, extracted using Infer (for Java) and InferSharp (for C#). The dataset has been constructed by systematically analyzing open-source repositories, scrutinizing…

GitHub Publication

Tool

NoFunEval

This repository hosts the official code and data artifact for the paper “NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness”. The work is a comprehensive evaluation of code language models on real-world code…

GitHub Publication

Tool

Lymphoma lesion segmentation DNN

Lymphoma lesion segmentation and quantitation play a pivotal role in the diagnosis, treatment planning, and monitoring of lymphoma patients. Accurate segmentation allows for the precise delineation of pathological regions, aiding clinicians in assessing disease extent…

GitHub Publication

Tool

Sarathi-Serve

Sarathi-Serve (a research prototype) is a high-throughput, low-latency LLM serving framework. This repository contains a benchmark suite for evaluating LLM performance from a systems point of view. It contains various workloads and scheduling…

GitHub Publication