November 4, 2022

MSR Asia Theory Lecture Series

MSR Asia Theory Lecture Series is a forum where we invite researchers around the world to share the latest theoretical advances in big data, artificial intelligence, and related areas. The Lecture series are broadcast live over Teams and Bilibili. If you would like to receive the information about the upcoming talks, please send email “Subscribe to the Lecture Series” to


11/04/2022: Player-optimal Stable Regret for Bandit Learning in Matching Markets, Shuai Li

  • Abstract: The problem of matching markets has been studied for a long history in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players’ least-preferred stable matching. 

    However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players’ profits, the player-optimal stable matching would be the most desirable Though Basu et al. [2021] successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players’ preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm and show that the optimal stable regret of each player can be upper bounded by O(K log T / ∆^2) where K is the number of arms, T is the horizon and ∆ is the players’ minimum preference gap. This result significantly improves previous works which either has a weaker player-pessimal stable matching objective or applies only for markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound This work is accepted at SODA 2023.

    Shuai Li

    Bio: Shuai Li is currently an Assistant Professor in the John Hopcroft Center of Shanghai Jiao Tong University. She received PhD degree in Computer Science from the Chinese University of Hong Kong, master degree in Mathematics from University of the Chinese Academy of Sciences and bachelor degree in Mathematics from Zhejiang University. Her research interests include machine learning theory, bandit algorithms and reinforcement learning algorithms. She has published 40+ papers in top machine learning conferences like ICML/NeurIPS/AAAI/IJCAI/KDD and serves as reviewers in these conferences. She is a recipient of Shanghai Sailing Program 2020 and Google PhD fellowship 2018.


10/13/2022: What Should a Good Deep Neural Network Look Like? Insights from a Layer-Peeled Model and the Law of Equi-Separation, Weijie Su

  • Abstract: In this talk, we will investigate the emergence of geometric patterns in well-trained deep learning models by making use of a layer-peeled model and the law of equi-separation. The former is a nonconvex optimization program that models the last-layer features and weights. We use the model to shed light on the neural collapse phenomenon of Papyan, Han, and Donoho, and to predict a hitherto-unknown phenomenon that we term minority collapse in imbalanced training. This is based on joint work with Cong Fang, Hangfeng He, and Qi Long.

    The law of equi-separation is a pervasive empirical phenomenon that describes how data are separated according to their class membership from the bottom to the top layer in a well-trained neural network. We will show that, through extensive computational experiments, neural networks improve data separation through layers in a simple exponential manner. This law leads to roughly equal ratios of separation that a single layer is able to improve, thereby showing that all layers are created equal. We will conclude the talk by discussing the implications of this law on the interpretation, robustness, and generalization of deep learning, as well as on the inadequacy of some existing approaches toward demystifying deep learning. This is based on joint work with Hangfeng He.

    a man wearing glasses and looking at the camera

    Bio: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Department of Computer and Information Science, at the University of Pennsylvania. He is a co-director of Penn Research in Machine Learning. Prior to joining Penn, he received his Ph.D. from Stanford University in 2016 and his bachelor’s degree from Peking University in 2011. His research interests span privacy-preserving data analysis, deep learning theory, optimization, high-dimensional statistics, and mechanism design. He is a recipient of the Stanford Theodore Anderson Dissertation Award in 2016, an NSF CAREER Award in 2019, an Alfred Sloan Research Fellowship in 2020, the SIAM Early Career Prize in Data Science in 2022, and the IMS Peter Gavin Hall Prize in 2022.


09/22/2022: On the (Non)smoothness of Neural Network Training, Jingzhao Zhang

  • Abstract: In this talk, we will discuss the following question―why is neural network training non-smooth from an optimization perspective, and how should we analyze convergence for non smooth problems. We start by showing that the non-smoothness is essential to standard neural network training procedures, and that network training converges in an unstable manner. We then provide theoretical models for understanding why optimization in neural network is unstable, and how new definitions of convergence can reconcile theory with practice.


    Bio: Jingzhao Zhang is an assistant professor at Tsinghua, IIIS. He graduated from MIT EECS under the supervision of Prof Ali Jadbabaie and Prof Suvrit Sra. His research focuses on providing theoretical justifications and analyses to practical large-scale optimization algorithms. He is also interested in machine learning applications, especially those involving dynamical system formulations.


08/25/2022: Local Elasticity of Neural Networks and Its Inspired Theory, Zhun Deng

  • Abstract: In this talk, I will briefly review local elasticity of neural networks proposed by He et al. Then, based on that, I will introduce a new type of stability notion, which can improve over classical stability notions with respect to generalization behavior in certain situations. Specifically, among different notions of stability, uniform stability is arguably the most popular one, which yields exponential generalization bounds. However, uniform stability only considers the worst-case loss change (or so-called sensitivity) by removing a single data point, which is distribution-independent and therefore undesirable. There are many cases that the worst-case sensitivity of the loss is much larger than the average sensitivity taken over the single data point that is removed, especially in some advanced models such as random feature models or neural networks. Many previous works try to mitigate the distribution independent issue by proposing weaker notions of stability, however, they either only yield polynomial bounds or the bounds derived do not vanish as sample size goes to infinity. Given that, we propose locally elastic stability as a weaker and distribution-dependent stability notion, which still yields exponential generalization bounds. We further demonstrate that locally elastic stability implies tighter generalization bounds than those derived based on uniform stability in many situations by revisiting the examples of bounded support vector machines, regularized least square regressions, and stochastic gradient descent.

    Zhun Deng

    Bio: Zhun is a postdoctoral researcher with Toniann Pitassi and Richard Zemel at Columbia University, and also part of Simons Collaboration on the Theory of Algorithmic Fairness. Previously, Zhun got his Ph.D. in Computer Science at Harvard University, advised by Cynthia Dwork. His research interests lie at the intersection of theoretical computer science, machine learning, and social science. His work aims to make data science more trustworthy, statistically rigorous, and aligned with societal values. Here is the website:


08/04/2022: Toward Understanding Self-Supervised Pre-training, Tengyu Ma

  • Abstract:  AI is undergoing a paradigm shift the rise of models that are pretrained with self-supervised learning and then adapted to a wide range of downstream tasks. Despite the unprecedented empirical success, why and how pretrained models work still largely remains a mystery. This talk will discuss recent works on analyzing contrastive learning, a family of popular self-supervised pretraining methods that learn visual representations/embeddings of images from unlabeled data. We will develop a framework that views contrastive learning as a parametric version of spectral clustering on a so-called population positive-pair graph. We will also analyze the adaptability of the representations and provide sample complexity bounds. Finally, I will briefly discuss two follow-up works that study self-supervised representations’ performance under imbalanced pretraining datasets and for shifting test distributions.  Joint works with Jeff Z. Haochen, Colin Wei, Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Adrien Gaidon, and Percy Liang.

    Tengyu Ma

    Bio: Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of the ACM Doctoral Dissertation Award Honorable Mention, the Sloan Fellowship, and NSF CAREER Award.

    Video | Slides

07/22/2022: Unveiling Transformers with LEGO, Sebastien Bubeck

  • Abstract:

    The discovery of the transformer architecture was a paradigm shifting event for deep learning. However, these architectures are arguably even harder to understand than say convolutional neural networks. In this work we propose a synthetic task, called LEGO, to probe the inner workings of transformers. We obtain some insights on multi-head attention, the effect of pretraining, as well as overfitting issues. Joint work with Yi Zhang, Arturs Backurs, Ronen Eldan, Suriya Gunasekar, and Tal Wagner.

    a person posing for the camera

    Bio: Sebastien Bubeck manages the Machine Learning Foundations team in MSR Redmond. He has worked on multi-armed bandits, convex optimization, online algorithms, and adversarial examples, winning best papers at COLT (2009 and 2016), ALT (2018), and NeurIPS (2018 and 2021). At the moment he is trying to understand Transformers.