AMID: Model-Agnostic Dataset Distillation by Adversarial Mutual Information Minimization
- Aoqi Wu,
- Junming Liu,
- Yuwei Zhang,
- Weiquan Huang,
- Liang Hu,
- Yifan Yang,
- Qi Zhang,
- Jiaxing Miao,
- Yuhan Tang,
- Zhongyuan Lai
ACM Web Conference 2026
The escalating energy consumption and carbon footprint of training large-scale Web AI models pose urgent challenges for sustainable development. Dataset Distillation (DD) offers a promising avenue for green AI by compressing large datasets into small synthetic ones for efficient training. However, most existing DD methods overfit to the inductive biases of specific source architectures (e.g., CNNs or ViTs), resulting in poor cross-model generalization. This limitation necessitates redundant re-distillation processes for different architectures, severely undermining the energy-saving potential of DD. To address this, we introduce Adversarial Mutual Information Distillation (AMID), a rigorous framework designed to create highly reusable and robust synthetic datasets. From an information-theoretic perspective, we cast model-agnosticism as minimizing the mutual information (MI) between the synthetic data and the specific identity of the distillation model. We convert this intractable objective into a tractable two-player adversarial game, which unifies knowledge preservation with adversarial unlearning of architectural bias. Extensive experiments on CIFAR-10 and Tiny ImageNet demonstrate that AMID achieves state-of-the-art cross-architecture generalization across diverse CNNs and ViTs. Crucially, our analysis confirms that AMID significantly reduces the computational overhead and CO₂ emissions of downstream training while maintaining robust performance, paving the way for energy-efficient, transferable, and sustainable Web AI ecosystems.
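To make the MI objective concrete, here is a minimal toy sketch (not the paper's implementation) of the quantity AMID drives toward zero: the mutual information between a synthetic-data feature and the identity of the distillation model. The 2x2 joint distributions below are hypothetical examples; in the biased case the feature reveals which architecture produced the data, while in the model-agnostic case it carries no information about the model.

```python
import math

def mutual_info(joint):
    """MI (in nats) of a discrete joint distribution joint[x][m],
    where x indexes a synthetic-data feature and m a model identity."""
    px = [sum(row) for row in joint]              # marginal over features
    pm = [sum(col) for col in zip(*joint)]        # marginal over model IDs
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (px[i] * pm[j]))
    return mi

# Hypothetical biased case: the feature is strongly correlated with
# which architecture distilled the data (high MI -> poor transfer).
biased = [[0.45, 0.05],
          [0.05, 0.45]]

# Hypothetical model-agnostic case: feature and model identity are
# independent (MI = 0), which is what adversarial unlearning targets.
agnostic = [[0.25, 0.25],
            [0.25, 0.25]]

print(mutual_info(biased))    # positive: features leak architectural bias
print(mutual_info(agnostic))  # 0.0: no model identity survives
```

In AMID's two-player game, a critic would estimate this MI from samples while the distillation process updates the synthetic data to minimize the critic's estimate; the discrete table above only illustrates the target quantity itself.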