Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu, Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding, NeurIPS 2021.
Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk, Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder, EMNLP 2021.
Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu, Taking Notes on the Fly Helps Language Pre-training, ICLR 2021.
Guolin Ke, Di He, Tie-Yan Liu, Rethinking Positional Encoding in Language Pre-training, ICLR 2021.
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu, On Layer Normalization in the Transformer Architecture, ICML 2020.
Linyuan Gong, Di He, Zhuohan Li, Tao Qin, Liwei Wang, Tie-Yan Liu, Efficient Training of BERT by Progressively Stacking, ICML 2019.
Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu, MC-BERT: Efficient Language Pre-Training via a Meta Controller, arXiv preprint arXiv:2006.05744, 2020.