{"id":788912,"date":"2021-10-26T23:32:11","date_gmt":"2021-10-27T06:32:11","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=788912"},"modified":"2021-11-25T19:01:08","modified_gmt":"2021-11-26T03:01:08","slug":"optimization-in-deep-learning","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/optimization-in-deep-learning\/","title":{"rendered":"Optimization in Deep Learning"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background bg-gray-200 has-background- card-background--full-bleed\">\n\t\t\t\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 align-self-center\">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 id=\"optimization-in-deep-learning\">Optimization in Deep Learning<\/h1>\n\n\n\n<p><\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<ul class=\"wp-block-list\" type=\"1\"><li>&nbsp;Understand the dynamics of the optimizers in deep learning and its convergence rate and implicit regularization.<\/li><\/ul>\n\n\n\n<p>One important aspect to open the black-box of deep neural networks is to understand the dynamics of the optimization process in deep learning. We investigate the influence of the noise in the stochastic optimization algorithms, the influence of the local Hessian properties, and the optimization path of SGD in deep learning.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Optimization theory motivated architecture design.<\/li><\/ul>\n\n\n\n<p>We analyze the forward\/backward stability of the deep neural network and establish the conditions that guarantee stability of residual connection and multi-branch structure. Based on our new understanding, we design new neural network architectures by ensuring the stability of signal propagation across layers.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Rethinking the invariance properties in deep learning models.<\/li><\/ul>\n\n\n\n<p>Group-invariance\/equivariance widely exists in deep learning, e.g., Positively scale-invariance of ReLU networks, rotation invariance of 3D point clouds, translation and scaling equivariance of fluid dynamics, etc. 
## Publications

- Qi Meng*, Shuxin Zheng*, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, and Tie-Yan Liu. G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space. In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.
- Xufang Luo, Qi Meng, Wei Chen, Yunhong Wang, and Tie-Yan Liu. Path-BN: Towards Effective Batch Normalization in the Path Space for ReLU Networks. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.
- Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, and Tie-Yan Liu. [Stabilize Deep ResNet with A Sharp Scaling Factor τ](https://www.microsoft.com/en-us/research/publication/convergence-theory-of-learning-over-parameterized-resnet-a-full-characterization/). Machine Learning Journal, August 2021.
- Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. [On Layer Normalization in the Transformer Architecture](https://www.microsoft.com/en-us/research/publication/on-layer-normalization-in-the-transformer-architecture/). In Proceedings of the International Conference on Machine Learning (ICML), July 2020.
- Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, and Tie-Yan Liu. [Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data](https://www.microsoft.com/en-us/research/publication/convergence-of-distributed-stochastic-variance-reduced-methods-without-sampling-extra-data/). IEEE Transactions on Signal Processing, Vol. 68, June 2020.
- Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, and Vahid Tarokh. [SGD Converges to Global Minimum in Deep Learning via Star-convex Path](https://www.microsoft.com/en-us/research/publication/sgd-converges-to-global-minimum-in-deep-learning-via-star-convex-path/). In Proceedings of the International Conference on Learning Representations (ICLR), May 2019.
- Huishuai Zhang, Wei Chen, and Tie-Yan Liu. [On the local Hessian in back-propagation](https://www.microsoft.com/en-us/research/publication/on-the-local-hessian-in-back-propagation/). In Advances in Neural Information Processing Systems (NeurIPS), 2018.