Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han, Zhiyuan He, Dongsheng Li, Yunjian Xu
2026 International Conference on Learning Representations | April 2026
Zhiyuan He, Xufang Luo, Yike Zhang, Yuqing Yang, Lili Qiu
September 2025
Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu
September 2024