SABER: Scaling-Aware Best-of-N Estimation of Risk
A Python package for predicting large-scale adversarial risk in Large Language Models under Best-of-N sampling.

Paper: https://arxiv.org/pdf/2601.22636

Standard LLM safety evaluations use single-shot (ASR@1) metrics, but real attackers can exploit…
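To make the gap between single-shot and Best-of-N risk concrete, here is a minimal sketch of the textbook baseline, not SABER's estimator and not part of the package's API. It assumes every attempt succeeds independently with the same probability, so the chance that at least one of N attempts succeeds is 1 - (1 - p)^N. The function name `best_of_n_asr` and the label "ASR@N" (by analogy with ASR@1) are hypothetical, chosen for illustration only.

```python
def best_of_n_asr(p_single: float, n: int) -> float:
    """Chance that at least one of n attempts succeeds.

    Assumption (illustrative only): attempts are independent and share the
    same per-attempt success probability p_single. This is a naive baseline,
    not the estimator described in the SABER paper.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "all n attempts fail".
    return 1.0 - (1.0 - p_single) ** n


# A 1% single-shot success rate grows quickly as the attacker resamples.
for n in (1, 10, 100, 1000):
    print(f"N={n:>4}: ASR@N = {best_of_n_asr(0.01, n):.3f}")
```

Under these assumptions, a prompt that looks nearly safe at ASR@1 can become very likely to succeed once an attacker draws many samples, which is why single-shot numbers alone can understate risk.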