Publication
Tabularis Revilio: Converting Text to Tables
Publication
Arithmetic Solving in Z3
Dataset Source Code
RepoClassBench
RepoClassBench (RCB) is a repository-level code-generation benchmark. Retrieve-RepoTools-Reflect (RRR) is a framework for code generation that equips large language models (LLMs) with static-analysis tools in an agent setup.
Microsoft Research Blog
RUBICON: Evaluating conversations between humans and AI systems
RUBICON evaluates AI-driven conversations and improves their quality by learning detailed domain-specific rubrics from minimal data. It gathers insights on AI assistant performance while maintaining user privacy and data security.