Parameter-Free Last-Iterate Convergence of Counterfactual Regret Minimization Algorithms

  • Linjian Meng,
  • Youzhi Zhang,
  • Shangdong Yang,
  • Zhenxing Ge,
  • Wenbin Li,
  • Tianpei Yang,
  • Bo An,
  • Yang Gao

NeurIPS 2025

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Proving last-iterate convergence in solving the original EFG therefore reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies establish last-iterate convergence only for Online Mirror Descent (OMD)-based CFR algorithms, not for Regret Matching (RM)-based CFR algorithms, in solving perturbed regularized EFGs; this yields a poor empirical convergence rate, since RM-based CFR algorithms typically outperform OMD-based ones. In addition, because a sequence of perturbed regularized EFGs must be solved, fine-tuning hyperparameters across all of them is infeasible, making parameter-free algorithms highly desirable. This paper shows that CFR, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence result for RM-based CFR algorithms in perturbed regularized EFGs. Applying CFR to solve the perturbed regularized EFGs yields Reward Transformation CFR (RTCFR). Importantly, we extend prior work on the parameter-free property of CFR, enhancing its stability, which is vital for the empirical convergence of RTCFR. Experiments show that RTCFR converges significantly faster in practice than existing algorithms that achieve theoretical last-iterate convergence.
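For intuition on the parameter-free property central to the abstract, below is a minimal, illustrative sketch of the regret-matching update that drives RM-based CFR, run in self-play on a toy two-player zero-sum normal-form game as a stand-in for the sequence-form EFG setting. All identifiers and the example game are hypothetical (not from the paper), and this sketch is not RTCFR itself, which additionally solves a sequence of perturbed regularized EFGs; it only shows that the RM update involves no step size or learning rate, and that without regularization the last iterate need not converge.

    import numpy as np

    def regret_matching(cum_regret):
        # Regret matching: play each action with probability proportional to
        # its positive cumulative regret. The update has no step size or
        # learning rate, which is the sense in which RM-based CFR algorithms
        # are parameter-free.
        positive = np.maximum(cum_regret, 0.0)
        total = positive.sum()
        if total > 0.0:
            return positive / total
        # No positive regret yet: fall back to the uniform strategy.
        return np.full(cum_regret.shape, 1.0 / cum_regret.size)

    # Hypothetical toy example: self-play on matching pennies, a two-player
    # zero-sum normal-form game (player 2's payoff matrix is -A).
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # payoff matrix for player 1
    r1, r2 = np.zeros(2), np.zeros(2)         # cumulative regrets
    avg_x = np.zeros(2)
    T = 10_000
    for t in range(T):
        x = regret_matching(r1)
        y = regret_matching(r2)
        u1 = A @ y           # expected value of each action for player 1
        u2 = -(A.T @ x)      # expected value of each action for player 2
        r1 += u1 - x @ u1    # instantaneous regret: action value minus strategy value
        r2 += u2 - y @ u2
        avg_x += x
    # The time-averaged strategy approaches the uniform NE, but the last
    # iterate x itself may keep cycling; obtaining convergence of the last
    # iterate is exactly what the perturbed regularized reformulation above
    # is for.
    print(avg_x / T, x)

In RTCFR, by contrast, CFR is run on the perturbed regularized EFGs, for which the paper establishes that the last iterate itself converges.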