Parameter-Free Last-Iterate Convergence of Counterfactual Regret Minimization Algorithms

  • Linjian Meng,
  • Youzhi Zhang,
  • Shangdong Yang,
  • Zhenxing Ge,
  • Wenbin Li,
  • Tianpei Yang,
  • Bo An,
  • Yang Gao

NeurIPS 2025

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Proving last-iterate convergence in solving the original EFG therefore reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies establish last-iterate convergence only for Online Mirror Descent (OMD)-based CFR algorithms, not for Regret Matching (RM)-based CFR algorithms, in solving perturbed regularized EFGs; this yields a poor empirical convergence rate, since RM-based CFR algorithms typically outperform OMD-based ones. In addition, because a sequence of perturbed regularized EFGs must be solved, fine-tuning hyperparameters across all of them is infeasible, making parameter-free algorithms highly desirable. This paper shows that CFR, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence result for RM-based CFR algorithms in perturbed regularized EFGs. Applying CFR to solve the perturbed regularized EFGs yields Reward Transformation CFR (RTCFR). Importantly, we extend prior work on the parameter-free property of CFR, enhancing its stability, which is vital for the empirical convergence of RTCFR. Experiments show that RTCFR converges significantly faster in practice than existing algorithms that achieve theoretical last-iterate convergence.
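For intuition on the parameter-free property central to the abstract, below is a minimal, illustrative sketch of the regret-matching update that drives RM-based CFR, run in self-play on a toy two-player zero-sum normal-form game as a stand-in for the sequence-form EFG setting. All identifiers and the example game are hypothetical (not from the paper), and this sketch is not RTCFR itself, which additionally solves a sequence of perturbed regularized EFGs; it only shows that the RM update involves no step size or learning rate, and that without regularization the last iterate need not converge.

    import numpy as np

    def regret_matching(cum_regret):
        # Regret matching: play each action with probability proportional to
        # its positive cumulative regret. The update has no step size or
        # learning rate, which is the sense in which RM-based CFR algorithms
        # are parameter-free.
        positive = np.maximum(cum_regret, 0.0)
        total = positive.sum()
        if total > 0.0:
            return positive / total
        # No positive regret yet: fall back to the uniform strategy.
        return np.full(cum_regret.shape, 1.0 / cum_regret.size)

    # Hypothetical toy example: self-play on matching pennies, a two-player
    # zero-sum normal-form game (player 2's payoff matrix is -A).
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # payoff matrix for player 1
    r1, r2 = np.zeros(2), np.zeros(2)         # cumulative regrets
    avg_x = np.zeros(2)
    T = 10_000
    for t in range(T):
        x = regret_matching(r1)
        y = regret_matching(r2)
        u1 = A @ y           # expected value of each action for player 1
        u2 = -(A.T @ x)      # expected value of each action for player 2
        r1 += u1 - x @ u1    # instantaneous regret: action value minus strategy value
        r2 += u2 - y @ u2
        avg_x += x
    # The time-averaged strategy approaches the uniform NE, but the last
    # iterate x itself may keep cycling; obtaining convergence of the last
    # iterate is exactly what the perturbed regularized reformulation above
    # is for.
    print(avg_x / T, x)

In RTCFR, by contrast, CFR is run on the perturbed regularized EFGs, for which the paper establishes that the last iterate itself converges.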