BusyBox
BusyBox is a physical 3D-printable device for benchmarking affordance generalization in robot foundation models. Please check out our website for more details. For fully building an instrumented BusyBox capable…
Member of Technical Staff, Principal Tech Lead Manager, Image Generation
We are hiring a Principal Tech Lead Manager to own and grow Copilot’s image generation capabilities. You will set the technical direction for image generation, lead a team of Applied AI engineers and platform engineers,…
Member of Technical Staff, Senior Applied AI Engineer, Image Generation
We’re hiring a Senior Applied AI Engineer, Image Generation to join a fast‑moving, high‑ownership team building next‑generation AI assistant and productivity capabilities. This role blends LLM product engineering, evaluation science, hillclimbing, and internal tool building…
TestExplora
This repository is the official implementation of the paper “TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation.” It can be used for baseline evaluation using the prompts mentioned in the paper. TestExplora…
Systematic debugging for AI agents: Introducing the AgentRx framework
As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a…