Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning
Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong
September 2025