NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery
- Anurag Vaidya
- Felix Meissen
- Daniel Coelho de Castro
- Shruthi Bannur
- Tristan Lazard
- Drew F. K. Williamson
- Faisal Mahmood
- Javier Alvarez-Valle
- Stephanie Hyland
- Kenza Bouzid
Machine Learning for Health Symposium 2025 | Published by PMLR
Digitized histopathology analysis involves complex, time-intensive workflows and specialized expertise, limiting its accessibility. We introduce NOVA, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. NOVA integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark, verified by pathologists and biomedical scientists, that spans data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows that NOVA outperforms coding-agent baselines, and a pathologist-verified case study links morphology to prognostically relevant PAM50 subtypes, illustrating NOVA's potential for scalable discovery.
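To make the idea of "iteratively generating and running Python code" concrete, the following is a minimal sketch of a generic generate-run-observe loop. It is not NOVA's implementation: the names `run_snippet`, `agent_loop`, `scripted_proposer`, and the `final_answer` convention are all hypothetical, and the proposer is a scripted stand-in for a language model so that the example runs on its own. The toy nuclei counts are made-up illustration values, not results.

```python
import contextlib
import io
import traceback
from typing import Callable, List, Optional, Tuple

# One history entry: (code that was run, whether it succeeded, captured output or traceback)
Step = Tuple[str, bool, str]


def run_snippet(code: str, namespace: dict) -> Tuple[bool, str]:
    """Execute one snippet in a shared namespace and capture stdout or the traceback."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # a real system would sandbox this call
        return True, buffer.getvalue()
    except Exception:
        return False, buffer.getvalue() + traceback.format_exc()


def agent_loop(
    query: str,
    propose_code: Callable[[str, List[Step]], str],
    max_steps: int = 5,
) -> Tuple[Optional[object], List[Step]]:
    """Ask for code, run it, feed the outcome back, and stop once an answer is set."""
    namespace: dict = {}  # variables persist across steps
    history: List[Step] = []
    for _ in range(max_steps):
        code = propose_code(query, history)
        ok, output = run_snippet(code, namespace)
        history.append((code, ok, output))
        # Assumed convention: the generated code signals completion via `final_answer`.
        if ok and "final_answer" in namespace:
            return namespace["final_answer"], history
    return None, history


def scripted_proposer(query: str, history: List[Step]) -> str:
    """Hypothetical stand-in for a language model: one buggy attempt, then a fix."""
    if not history:
        # Deliberate typo (`count` instead of `counts`) to trigger a NameError.
        return "counts = [12, 8, 10]\nfinal_answer = sum(counts) / len(count)"
    # A real model would read the traceback in `history` before revising.
    return "final_answer = sum(counts) / len(counts)"


answer, history = agent_loop("What is the mean nuclei count per tile?", scripted_proposer)
print(answer, len(history))
```

In this sketch the first attempt fails, the traceback is recorded in `history`, and the second attempt reuses the `counts` variable that survived in the shared namespace. Keeping state across steps is one plausible design choice for such loops; how NOVA manages state and tool calls is not described in this abstract.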