IndiMathBench: Autoformalization of Mathematical Reasoning Problems with a Human Touch


Reliable autoformalization remains an elusive goal even in the era of large language models (LLMs): even the best models struggle to translate natural language mathematics into formal constructs in languages like Lean. High-quality data for formal theorem proving has been a key bottleneck, given the resource costs of manually curating and validating such translations. To this end, we introduce INDIMATHBENCH, a human-verified benchmark for evaluating mathematical theorem proving, curated with an AI-powered, human-assisted pipeline for formalizing natural language problems in Lean. INDIMATHBENCH comprises 312 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Our pipeline uses category-based retrieval and a self-debug loop with feedback from a symbolic checker to generate candidate formalizations; an ensemble of LLMs produces multiple such candidates. These formalizations are presented, along with a summary, to a human reviewer through an interactive dashboard that enables efficient validation and repair. We analyze the performance of several state-of-the-art models on INDIMATHBENCH to facilitate further research on automated theorem proving. INDIMATHBENCH is available at https://github.com/prmbiy/IndiMathBench.
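The self-debug loop described above can be sketched as follows. This is a minimal illustration, not the actual IndiMathBench pipeline: the function names (`check_candidate`, `self_debug_loop`), the retry budget, and the toy generator and checker are all assumptions; in the real system the checker would invoke the Lean 4 compiler and the generator would be an LLM call.

```python
def check_candidate(candidate: str) -> tuple[bool, str]:
    """Stand-in for the symbolic checker (e.g. compiling with Lean 4).
    Here it only verifies that the candidate is a `theorem` declaration."""
    if candidate.strip().startswith("theorem"):
        return True, ""
    return False, "error: expected a `theorem` declaration"

def self_debug_loop(problem: str, generate, max_attempts: int = 3):
    """Generate a formalization, feed checker errors back, and retry.

    Returns the first candidate that passes the checker, or None if
    every attempt fails (at which point a human reviewer takes over)."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate(problem, feedback)
        ok, feedback = check_candidate(candidate)
        if ok:
            return candidate
    return None

# Toy "LLM" that produces a broken draft first, then repairs it
# once it sees checker feedback -- purely for illustration.
def toy_generate(problem: str, feedback: str) -> str:
    if feedback:
        return "theorem toy : 1 + 1 = 2 := by norm_num"
    return "lemma?? broken draft"

result = self_debug_loop("Show that 1 + 1 = 2.", toy_generate)
print(result)  # the repaired candidate on the second attempt
```

In the full pipeline, several LLMs would each run this loop independently, and the surviving candidates would be surfaced together on the dashboard for human validation.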