Research Tools: code, datasets, & models

Tool

Chartifact

Declarative, interactive data documents. Chartifact is a low-code document format for creating interactive, data-driven pages such as reports, dashboards, and presentations. It travels like a document and works like a mini app. Designed for use…

GitHub

Tool

LiteBox

A security-focused library OS. Note: This project is actively evolving. While we are working toward a stable release, some APIs and interfaces may change as the design continues to mature. You are welcome to explore…

GitHub

Tool

TestExplora

This repository is the official implementation of the paper “TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation”. It can be used for baseline evaluation using the prompts mentioned in the paper. TestExplora…

GitHub Publication

Tool

SABER: Scaling-Aware Best-of-N Estimation of Risk

A Python package for predicting large-scale adversarial risk in Large Language Models under Best-of-N sampling (paper: https://arxiv.org/pdf/2601.22636). Standard LLM safety evaluations use single-shot (ASR@1) metrics,…

GitHub Publication

Tool

SigmaCollab

SigmaCollab is a dataset that enables research on human-AI physically situated collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performing…

GitHub

Tool

latent-zoning-networks

Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such…

GitHub

Tool

MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

We propose Large Market Model (LMM), an order-level generative foundation model, for financial market simulation, akin to language modeling in the digital world. Our financial Market Simulation engine (MarS), powered by LMM, addresses the domain-specific…

GitHub

Tool

MIRA: Medical Time Series Foundation Model for Real-World Health Data

MIRA is a foundation model for medical time-series, designed to learn a unified representation space across heterogeneous clinical datasets and support zero-shot forecasting in real-world healthcare settings. Unlike conventional time-series models that operate on fixed…

GitHub Publication

Tool

OptiMind

OptiMind-SFT is a specialized 20B parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems—such as supply chain planning, scheduling, and resource allocation—into…

Access Publication

Tool

Model-based Testing using LLMs

This repository contains the code for our paper, Eywa: Automating Model-based Testing using LLMs. Our framework uses LLMs to automatically construct modular protocol models from natural-language specifications and applies symbolic execution and differential testing to…

GitHub