Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

NeurIPS 2025

Adversarial attacks threaten the reliability of deep neural networks, particularly in black-box settings where transferability is essential. However, existing transfer-based attacks often fail when the target model’s architecture or training diverges from the surrogate, due to decision-boundary variation and representation drift. We introduce CORTA, a consensus-robust transfer attack that explicitly models these two sources of transfer failure as parameter and representation perturbations on the surrogate model. We formalize transferability as a distributionally robust optimization (DRO) problem over an uncertainty set of plausible targets, and provide efficient first-order approximations with theoretical guarantees. CORTA enforces consensus misclassification by jointly regularizing parameter sensitivity and promoting robustness to feature blending on the surrogate. Extensive experiments on ImageNet and CIFAR-100 show that CORTA consistently outperforms state-of-the-art transfer-based black-box attacks, including ensemble methods, across both convolutional and transformer architectures. For example, when transferring from ResNet-18 to Swin-B on CIFAR-100, CORTA achieves a 19.1% higher transfer success rate than the strongest baseline. Our approach establishes a new benchmark for robust black-box adversarial evaluation.
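To make the high-level idea concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of a first-order consensus attack in the spirit described above: input-gradients are averaged over randomly perturbed copies of the surrogate's parameters (a sampled approximation of the DRO uncertainty set) and over lightly blended inputs (a stand-in for representation perturbations), then used for signed ascent steps under an L-infinity budget. The toy surrogate here is a single logistic-regression layer, and all function names, hyperparameters, and the Gaussian uncertainty set are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy surrogate: a logistic-regression "network" with weight vector w.
# (Illustrative stand-in for a deep surrogate model.)
w = rng.normal(size=5)

def loss_and_grad(x, y, w):
    """Binary cross-entropy loss and its gradient w.r.t. the input x
    for the linear logit z = w @ x."""
    z = w @ x
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.log(p) if y == 1 else -np.log(1.0 - p)
    return loss, (p - y) * w  # d(loss)/dx for a linear logit

def consensus_attack(x, y, w, eps=0.5, steps=20, lr=0.1,
                     n_models=8, sigma=0.1, blend=0.1, x_ref=None):
    """Sketch of a consensus-robust transfer attack: average the
    input-gradient over n_models surrogates drawn from a Gaussian
    parameter-uncertainty set (and optionally feature-blended inputs),
    then take signed ascent steps projected onto the L-inf ball."""
    x0, x_adv = x.copy(), x.copy()
    for _ in range(steps):
        g = np.zeros_like(x)
        for _ in range(n_models):
            # Parameter perturbation: sample a plausible "target" model.
            w_pert = w + sigma * rng.normal(size=w.shape)
            # Representation perturbation: blend with a reference input.
            x_in = x_adv if x_ref is None else (1 - blend) * x_adv + blend * x_ref
            _, gi = loss_and_grad(x_in, y, w_pert)
            g += gi
        x_adv = x_adv + lr * np.sign(g)              # ascend the consensus loss
        x_adv = x0 + np.clip(x_adv - x0, -eps, eps)  # project onto the eps-ball
    return x_adv
```

Because the gradient is averaged over many perturbed surrogates, the resulting perturbation must raise the loss on the whole sampled neighborhood rather than on a single model, which is the intuition behind consensus misclassification.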