July 22, 2022 August 21, 2023

MSR Asia Theory Lecture Series

Location: Virtual

MSR Asia Theory Lecture Series is a forum where we invite researchers around the world to share the latest theoretical advances in big data, artificial intelligence, and related areas. The Lecture series are broadcast live over Teams and Bilibili. If you would like to receive the information about the upcoming talks, please send email “Subscribe to the Lecture Series” to


4/27/2023: Understanding Adam and AdamW through proximal updates, scale-freeness, and relaxed smoothness, Francesco Orabona

  • Abstract: Adam and AdamW are the most commonly used algorithms for training deep neural networks due to their remarkable performance. However, despite a massive amount of research, it is fair to say that we are still far from understanding the true reasons why they work so well. In this talk, I’ll show you some recent results on unique characteristics of Adam and AdamW.
    First, I’ll show how AdamW can be easily understood as an approximation of a proximal update on the squared L2 regularizer. Next, I’ll show that, contrary to Adam, AdamW’s update is “scale-free”, i.e., its update is invariant to component-wise rescaling of the gradients. I’ll show how scale-freeness provides an automatic preconditioning and how it correlates with the better performance of AdamW over Adam on deep learning experiments. Finally, I’ll show the first analysis of a (minor) variant of Adam, that has a provably advantage over SGD for functions that satisfy a relaxed smoothness assumption, like the objective functions of Transformers.

    Francesco Orabona

    Bio: Francesco Orabona is an Associate Professor of Electrical & Computer Engineering at Boston University. His research interests lie in online learning, optimization, and statistical learning theory. He obtained his Ph.D. from the University of Genova in 2007. He previously was an Assistant Professor of Computer Science at Stony Brook University, a Senior Research Scientist at Yahoo Labs, and a Research Assistant Professor at the Toyota Technological Institute at Chicago. He received a Faculty Early Career Development (CAREER) from NSF in 2021 and a Google Research Award in 2017.

    Video (opens in new tab)

3/10/2023: Modeling Multiagent Game Dynamics: Approaches to Equilibrium Computation and Incentive Analysis, Xiaotie Deng

  • Abstract: This talk explores various research approaches to modeling the computation of equilibria and analysis of incentives in game dynamics. We discuss computational complexity, sequential and interactive optimization, and equilibrium analysis in multiagent systems.

    a man wearing glasses and smiling at the camera

    Bio: Xiaotie Deng is a Chair Professor at Peking University with a Ph.D. from Stanford University. His research focuses on algorithmic game theory, particularly in the context of the Internet and Blockchain Economics. Deng has taught at several universities and is a fellow of the ACM, IEEE, and CSIAM. He is a foreign member of Academia Europaea and received the 2022 Test of Time Award from ACM SIGecom.

    Video (opens in new tab)

1/12/2023: Passive and Active Multi-Task Representation Learning, Simon (Shaolei) Du

  • Abstract: Representation learning has been widely used in many applications. In this talk, I will present our work which uncovers when and why representation learning provably improves the sample efficiency, from a statistical learning point of view. Furthermore, I will talk about how to actively select the most relevant task to boost the performance.

    Simon Du

    Bio: Simon Shaolei Du is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the Universityof Washington. His research interests are broadly in machine learning, such as deep learning, representation learning, and reinforcement learning. Prior to starting as faculty, he was a postdoc at the Institute for Advanced Study of Princeton. He completed his Ph.D. in Machine Learning at Carnegie Mellon University. Simon’s research has been recognized by a Samsung AI Researcher of the Year Award, an NSF CAREER award, an Nvidia Pioneer Award, a AAAI New Faculty Highlights, and a Distinguished Dissertation Award honorable mention from CMU.

    Video (opens in new tab)

12/22/2022: Reward-free Reinforcement Learning via Sample-Efficient Representation Learning, Yingbin Liang

  • Abstract: As reward-free reinforcement learning (RL) becomes a powerful framework for a variety of multi-objective applications, representation learning arises as an effective technique to deal with the curse of dimensionality in reward-free RL. However, the existing algorithms of representation learning in reward-free RL still suffers seriously from high sample complexity, although they are polynomially efficient. In this talk, I will first present a novel representation learning algorithm that we propose for reward-free RL. We show that such an algorithm provably finds near-optimal policy as well as attaining near-accurate system identification via reward-free exploration, with significantly improved sample complexity compared to the best-known result before. I will then present our characterization of the benefit of representation learning in reward-free multitask (a.k.a. meta) RL as well as the benefit of employing the learned representation from upstream to downstream tasks. I will conclude my talk with remarks of future directions. The work to be presented was jointly with Yuan Cheng (USTC), Ruiquan Huang (PSU), Dr. Songtao Feng (OSU), Prof. Jing Yang (PSU), and Prof. Hong Zhang (USTC).

    a person posing for the camera

    Bio: Dr. Yingbin Liang is currently a Professor at the Department of Electrical and Computer Engineering at the Ohio State University (OSU), and a core faculty of the Ohio State Translational Data Analytics Institute (TDAI). She also serves as the Deputy Director of the AI-Edge Institute at OSU. Dr. Liang received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2005, and served on the faculty of University of Hawaii and Syracuse University before she joined OSU. Dr. Liang’s research interests include machine learning, optimization, information theory, and statistical signal processing. Dr. Liang received the National Science Foundation CAREER Award and the State of Hawaii Governor Innovation Award in 2009. She also received EURASIP Best Paper Award in 2014.

    Video (opens in new tab)

11/04/2022: Player-optimal Stable Regret for Bandit Learning in Matching Markets, Shuai Li

  • Abstract: The problem of matching markets has been studied for a long history in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players’ least-preferred stable matching. 

    However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players’ profits, the player-optimal stable matching would be the most desirable Though Basu et al. [2021] successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players’ preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm and show that the optimal stable regret of each player can be upper bounded by O(K log T / ∆^2) where K is the number of arms, T is the horizon and ∆ is the players’ minimum preference gap. This result significantly improves previous works which either has a weaker player-pessimal stable matching objective or applies only for markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound This work is accepted at SODA 2023.

    Shuai Li

    Bio: Shuai Li is currently an Assistant Professor in the John Hopcroft Center of Shanghai Jiao Tong University. She received PhD degree in Computer Science from the Chinese University of Hong Kong, master degree in Mathematics from University of the Chinese Academy of Sciences and bachelor degree in Mathematics from Zhejiang University. Her research interests include machine learning theory, bandit algorithms and reinforcement learning algorithms. She has published 40+ papers in top machine learning conferences like ICML/NeurIPS/AAAI/IJCAI/KDD and serves as reviewers in these conferences. She is a recipient of Shanghai Sailing Program 2020 and Google PhD fellowship 2018.


10/13/2022: What Should a Good Deep Neural Network Look Like? Insights from a Layer-Peeled Model and the Law of Equi-Separation, Weijie Su

  • Abstract: In this talk, we will investigate the emergence of geometric patterns in well-trained deep learning models by making use of a layer-peeled model and the law of equi-separation. The former is a nonconvex optimization program that models the last-layer features and weights. We use the model to shed light on the neural collapse phenomenon of Papyan, Han, and Donoho, and to predict a hitherto-unknown phenomenon that we term minority collapse in imbalanced training. This is based on joint work with Cong Fang, Hangfeng He, and Qi Long.

    The law of equi-separation is a pervasive empirical phenomenon that describes how data are separated according to their class membership from the bottom to the top layer in a well-trained neural network. We will show that, through extensive computational experiments, neural networks improve data separation through layers in a simple exponential manner. This law leads to roughly equal ratios of separation that a single layer is able to improve, thereby showing that all layers are created equal. We will conclude the talk by discussing the implications of this law on the interpretation, robustness, and generalization of deep learning, as well as on the inadequacy of some existing approaches toward demystifying deep learning. This is based on joint work with Hangfeng He.

    a man wearing glasses and looking at the camera

    Bio: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Department of Computer and Information Science, at the University of Pennsylvania. He is a co-director of Penn Research in Machine Learning. Prior to joining Penn, he received his Ph.D. from Stanford University in 2016 and his bachelor’s degree from Peking University in 2011. His research interests span privacy-preserving data analysis, deep learning theory, optimization, high-dimensional statistics, and mechanism design. He is a recipient of the Stanford Theodore Anderson Dissertation Award in 2016, an NSF CAREER Award in 2019, an Alfred Sloan Research Fellowship in 2020, the SIAM Early Career Prize in Data Science in 2022, and the IMS Peter Gavin Hall Prize in 2022.

    Video (opens in new tab)

09/22/2022: On the (Non)smoothness of Neural Network Training, Jingzhao Zhang

  • Abstract: In this talk, we will discuss the following question―why is neural network training non-smooth from an optimization perspective, and how should we analyze convergence for non smooth problems. We start by showing that the non-smoothness is essential to standard neural network training procedures, and that network training converges in an unstable manner. We then provide theoretical models for understanding why optimization in neural network is unstable, and how new definitions of convergence can reconcile theory with practice.


    Bio: Jingzhao Zhang is an assistant professor at Tsinghua, IIIS. He graduated from MIT EECS under the supervision of Prof Ali Jadbabaie and Prof Suvrit Sra. His research focuses on providing theoretical justifications and analyses to practical large-scale optimization algorithms. He is also interested in machine learning applications, especially those involving dynamical system formulations.

    Video (opens in new tab)

08/25/2022: Local Elasticity of Neural Networks and Its Inspired Theory, Zhun Deng

  • Abstract: In this talk, I will briefly review local elasticity of neural networks proposed by He et al. Then, based on that, I will introduce a new type of stability notion, which can improve over classical stability notions with respect to generalization behavior in certain situations. Specifically, among different notions of stability, uniform stability is arguably the most popular one, which yields exponential generalization bounds. However, uniform stability only considers the worst-case loss change (or so-called sensitivity) by removing a single data point, which is distribution-independent and therefore undesirable. There are many cases that the worst-case sensitivity of the loss is much larger than the average sensitivity taken over the single data point that is removed, especially in some advanced models such as random feature models or neural networks. Many previous works try to mitigate the distribution independent issue by proposing weaker notions of stability, however, they either only yield polynomial bounds or the bounds derived do not vanish as sample size goes to infinity. Given that, we propose locally elastic stability as a weaker and distribution-dependent stability notion, which still yields exponential generalization bounds. We further demonstrate that locally elastic stability implies tighter generalization bounds than those derived based on uniform stability in many situations by revisiting the examples of bounded support vector machines, regularized least square regressions, and stochastic gradient descent.

    Zhun Deng

    Bio: Zhun is a postdoctoral researcher with Toniann Pitassi (opens in new tab) and Richard Zemel (opens in new tab) at Columbia University, and also part of Simons Collaboration on the Theory of Algorithmic Fairness (opens in new tab). Previously, Zhun got his Ph.D. in Computer Science at Harvard University, advised by Cynthia Dwork (opens in new tab). His research interests lie at the intersection of theoretical computer science, machine learning, and social science. His work aims to make data science more trustworthy, statistically rigorous, and aligned with societal values. Here is the website: (opens in new tab).

    Video (opens in new tab)

08/04/2022: Toward Understanding Self-Supervised Pre-training, Tengyu Ma

  • Abstract:  AI is undergoing a paradigm shift the rise of models that are pretrained with self-supervised learning and then adapted to a wide range of downstream tasks. Despite the unprecedented empirical success, why and how pretrained models work still largely remains a mystery. This talk will discuss recent works on analyzing contrastive learning, a family of popular self-supervised pretraining methods that learn visual representations/embeddings of images from unlabeled data. We will develop a framework that views contrastive learning as a parametric version of spectral clustering on a so-called population positive-pair graph. We will also analyze the adaptability of the representations and provide sample complexity bounds. Finally, I will briefly discuss two follow-up works that study self-supervised representations’ performance under imbalanced pretraining datasets and for shifting test distributions.  Joint works with Jeff Z. Haochen, Colin Wei, Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Adrien Gaidon, and Percy Liang.

    Tengyu Ma

    Bio: Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of the ACM Doctoral Dissertation Award Honorable Mention, the Sloan Fellowship, and NSF CAREER Award.

    Video (opens in new tab) | Slides (opens in new tab)

07/22/2022: Unveiling Transformers with LEGO, Sebastien Bubeck

  • Abstract:

    The discovery of the transformer architecture was a paradigm shifting event for deep learning. However, these architectures are arguably even harder to understand than say convolutional neural networks. In this work we propose a synthetic task, called LEGO, to probe the inner workings of transformers. We obtain some insights on multi-head attention, the effect of pretraining, as well as overfitting issues. Joint work with Yi Zhang, Arturs Backurs, Ronen Eldan, Suriya Gunasekar, and Tal Wagner.

    a person posing for the camera

    Bio: Sebastien Bubeck manages the Machine Learning Foundations team in MSR Redmond. He has worked on multi-armed bandits, convex optimization, online algorithms, and adversarial examples, winning best papers at COLT (2009 and 2016), ALT (2018), and NeurIPS (2018 and 2021). At the moment he is trying to understand Transformers.

    Video (opens in new tab)