Tie-Yan Liu (刘铁岩) is a Distinguished Scientist at Microsoft Research AI4Science (Asia lead of Microsoft Research AI4Science and Deputy Managing Director of Microsoft Research Asia). He is a fellow of the IEEE, the ACM, and the AAIA. He is an adjunct professor at Tsinghua University (THU), the Hong Kong University of Science and Technology (HKUST), the University of Science and Technology of China (USTC), and Huazhong University of Science and Technology. He is also an honorary professor at Nottingham University.
As a researcher in an industrial lab, Tie-Yan contributes to both products and research. On one hand, many of his technologies have been transferred to Microsoft’s products and online services, such as Bing, Microsoft Advertising, Windows, Xbox, and Azure. On the other hand, he has been actively contributing to the academic community. Over the years, Tie-Yan and his team have published hundreds of high-impact papers at top conferences and journals, a good indicator of their influence. He has won quite a few awards, including the best student paper awards at SIGIR (2008) and ACML (2018), the most cited paper award at the Journal of Visual Communication and Image Representation (2004-2006), recognition as a most cited Chinese researcher (2017-2019), the China AI Leader Award for Technical Innovation (2018), and the Most Influential Scholar Award by AMiner (2007-2017). He has been invited to serve as general chair, PC chair, or area chair for a dozen top conferences, including WWW/WebConf, SIGIR, NeurIPS, ICLR, ICML, IJCAI, AAAI, KDD, ACL, and ICTIR, as well as associate editor or editorial board member of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), ACM Transactions on Information Systems (TOIS), ACM Transactions on the Web (TWEB), Information Retrieval Journal, Neurocomputing, and Foundations and Trends in Information Retrieval. Tie-Yan Liu and his work have been covered by many international media outlets, including National Public Radio, CNET, MIT Technology Review, and PCTech Magazine.
Tie-Yan’s major research achievements are summarized as follows.
[Machine learning for ranking]
Ranking is a key problem in web search, advertising, and recommender systems. Machine learning can relieve people from manually designing complex ranking heuristics; however, the task is non-trivial because ranking differs from typical machine learning problems like classification and regression.
Tie-Yan is a pioneer in machine learning for ranking (a.k.a. learning to rank). He formulated ranking as a problem of listwise permutation, which opened a new space for algorithm design. He proposed the taxonomy of learning to rank (pointwise, pairwise, and listwise) and laid down their theoretical foundations. He published many impactful papers on learning to rank, the top three of which have 6000+ citations. Tie-Yan’s seminal contribution to the field of learning to rank has been widely recognized (https://en.wikipedia.org/wiki/Learning_to_rank).
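To make the pointwise / pairwise / listwise taxonomy concrete, here is a minimal sketch of one loss from each family for a single query. The function names and the specific loss choices (squared error, a pairwise logistic loss, and a ListNet-style top-one cross entropy) are illustrative assumptions, not a reproduction of any particular paper's exact formulation.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: treat each document independently (here, squared error)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise: logistic loss over pairs where doc i is more relevant than doc j."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return total / pairs if pairs else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, labels):
    """Listwise: cross entropy between the top-one distributions induced
    by the labels and by the scores, computed over the whole list."""
    p = softmax(labels)
    q = softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    labels = [2.0, 1.0, 0.0]   # hypothetical relevance grades for one query
    good = [3.0, 2.0, 1.0]     # scores that rank the documents correctly
    bad = [1.0, 2.0, 3.0]      # scores that rank them in reverse
    for name, fn in [("pointwise", pointwise_loss),
                     ("pairwise", pairwise_loss),
                     ("listwise", listwise_loss)]:
        print(name, fn(good, labels), fn(bad, labels))
```

The pointwise loss looks at one document at a time, the pairwise loss at ordered pairs, and the listwise loss at the whole result list, which is the distinction the taxonomy draws.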
Tie-Yan is a major advocate of learning to rank: he gave the first batch of tutorials, organized the first series of workshops, wrote the first textbook, and released LETOR (https://www.microsoft.com/en-us/research/project/letor-learning-rank-information-retrieval/), an indispensable experimental platform for learning to rank. Through his efforts, learning to rank has become an important research direction and a key technology in almost all major search engines.
[Machine learning from tabular data]
While deep learning plays a dominant role in many places, it is not the best choice for learning from tabular data, an important scenario for industrial decision making. In this space, the Gradient Boosting Decision Tree (GBDT) is frequently used; however, traditional GBDTs have their own limitations, especially for high-dimensional and large-scale data.
Tie-Yan and his collaborators invented LightGBM, which has significantly pushed the frontier of GBDT. The secret sauce of LightGBM is three-fold: (1) Gradient-based one-side sampling enables accurate estimation of information gain using only a small proportion of data; (2) Exclusive feature bundling reduces the number of features without information loss; (3) Leaf-wise tree split, different from previous depth-wise and level-wise methods, produces more complex trees (and higher accuracy) at the same computational cost. Their combination, together with parallel learning, makes LightGBM more efficient, scalable, and accurate than previous models.
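As an illustration of the first ingredient, the sketch below shows the sampling step of gradient-based one-side sampling in plain Python: keep the instances with the largest absolute gradients, randomly sample from the rest, and up-weight the sampled small-gradient instances so the gain estimate stays roughly unbiased. The function name `goss_sample` and its parameters are hypothetical and chosen for this example; this is not LightGBM's actual implementation.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {instance index: weight} for one boosting iteration.

    Assumed sketch of one-side sampling:
      * keep the top `top_rate` fraction of instances by |gradient| (weight 1),
      * randomly sample `other_rate` of all instances from the remainder,
      * weight those sampled instances by (1 - top_rate) / other_rate.
    """
    n = len(gradients)
    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    top = order[:n_top]
    rest = order[n_top:]

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Compensate for under-sampling the small-gradient instances.
    small_weight = (1.0 - top_rate) / other_rate

    weights = {i: 1.0 for i in top}
    weights.update({i: small_weight for i in sampled})
    return weights


if __name__ == "__main__":
    grads = [0.1 * i for i in range(100)]  # hypothetical per-instance gradients
    selected = goss_sample(grads)
    print(len(selected), "of", len(grads), "instances kept")
```

Only the selected instances, with their weights, would then be used when estimating the information gain of candidate splits.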
Today, LightGBM has become a de facto tool for learning from tabular data. The LightGBM paper has received 5000+ citations, and its open-source implementation (https://github.com/Microsoft/LightGBM) has attracted 14000+ stars on GitHub and been leveraged by 6000+ other open-source projects. The Python package of LightGBM has 4+ million installs every month. LightGBM is one of the machine learning tools most widely used by winners of Kaggle and KDD Cup. Inside Microsoft, LightGBM has been used by 10+ products. LightGBM has also been adopted by other companies, and there are ~2000 patents filed by other companies based on LightGBM.
[Machine learning under resource constraints]
Recently, training big models has become a trend; however, this imposes extremely high requirements on computational and data resources. Tie-Yan and his collaborators made many algorithmic innovations (e.g., dual learning, multiplicative factorization, feedforward transformer) to tackle this challenge:
- [Reducing requirement on labeled training data] Dual Learning uses games to model the symmetric nature of AI tasks and enables effective learning without massive labeled training data. It helped machine translation achieve human parity for the first time in 2018 and supports 80% of rare languages in Azure Translation Services today.
- [Reducing computational cost of model learning] Multiplicative factorization (together with Metropolis-Hastings) reduces the per-sample complexity of generative model learning by several orders of magnitude. The instantiation of this technique in topic modeling results in a new algorithm called LightLDA, which enabled the first system that learned millions of topics using a moderate computer cluster.
- [Reducing computational cost of model inference] Feedforward Transformer enables efficient non-autoregressive inference for sequence generation. The application of this technique to speech synthesis results in a new algorithm called FastSpeech, which achieved 270x speedup as compared to previous neural TTS models and enabled the first real-time neural TTS service in the industry on cheap GPUs. Now most languages in Azure Speech Services are supported by FastSpeech.
[Machine learning for industry]
Digital transformation is critical for industries such as finance, logistics, telecom, and healthcare. In the past five years, Tie-Yan and his team have devoted significant effort to using machine learning technologies to tackle key industrial challenges. They found that many industrial decision problems can be modeled as spatio-temporal forecasting and resource optimization. Tie-Yan’s team developed a deep learning based forecasting framework called FOST (GitHub – microsoft/FOST), and a multi-agent reinforcement learning based resource optimization framework called MARO (GitHub – microsoft/MARO). They have also developed a dedicated open-source toolkit for the finance vertical, named QLIB (GitHub – microsoft/QLIB). These tools have been used to help many industrial partners (such as AMC, China Taiping, OOCL, Fareastone, SF-Express, Humana) achieve successful digital and intelligent transformations. One of his recent works on digital transformation was named one of the “30 Best AI Use Cases of 2019” by Synced.
[Machine learning for science]
Recently, using machine learning technologies to help scientific discovery has become a new trend. Tie-Yan and his team are very active in this important research field and are mainly working on the following two topics. First, the configurations of molecules dictate their chemical and physical properties. If one can accurately and efficiently predict these properties, it opens the door to a revolution in the way humanity approaches many existential challenges, such as drug discovery for health and catalyst design for sustainability. As a cross-lab effort in MSR, Tie-Yan and his team are working on AI for molecular representation and dynamics simulation. They developed Graphormer for molecule representation, which won the KDD Cup 2021 on molecular property prediction and the direct track of the Open Catalyst Challenge 2021. Graphormer was published at NeurIPS and was recently open-sourced: GitHub – microsoft/Graphormer. Second, there is a lot of observational data in natural science, and it is critical to make good use of this data to discover new scientific laws. In particular, Tie-Yan and his collaborators apply deep learning and symbolic regression to discover the nonconservative components in the observational data as new physics. This work was published in Physical Review E and opens up a new window for AI-assisted scientific discovery.
On July 7, 2022, Microsoft Research announced the establishment of a new global organization, named MSR AI4Science. Tie-Yan is responsible for leading the Asia team in this new organization. His team will focus on using AI/machine learning to bring orders-of-magnitude accelerations to scientific computing, uncover deep connections between different scientific observations, identify mappings between the microscopic and macroscopic scales, and conduct intelligent search in the gigantic scientific space. The goal of the team is to use AI to bring disruptive opportunities to solving tasks that are fundamental and critical for us, as human beings, to understand and change the physical world around us, such as environmental sustainability, drug discovery, and life sciences.
We Are Hiring!
- We are hiring at all levels (especially senior researchers)! If your major is machine learning (especially deep learning and distributed machine learning), biology, chemistry, physics, or material science, and you have the passion to change the world, please look at our open positions or send your resume directly to firstname.lastname@example.org.