Ruslan Salakhutdinov, CMU
Tuesday, July 30, 2019 | Mardi, 30 juillet 2019
Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. In the first part of the talk, I will discuss a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation.
In the second part of the talk, I will introduce XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. I will show how ideas from Transformer-XL can be integrated into XLNet pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking. Joint work with Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, and Quoc Le.
Ruslan Salakhutdinov is a UPMC Professor of Computer Science in the Department of Machine Learning at CMU. He received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016, he joined CMU. Ruslan’s primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research, served on the senior programme committee of several top-tier machine learning conferences including NIPS and ICML, and was a program co-chair for ICML 2019. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, Nvidia’s Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.
Sanja Fidler, University of Toronto
Tuesday, June 25, 2019 | Mardi, 25 juin 2019
Simulation is crucial for robotic applications such as autonomous vehicles and household robots, where agents need to be tested in a virtual environment before they are deployed to the real world. One of the bottlenecks in simulation is content creation, which is typically done manually and is time-consuming. In this talk, I will present our recent work on adaptive simulation and 3D content generation with deep learning.
Sanja Fidler is an Assistant Professor at the Department of Computer Science, University of Toronto. She joined UofT in 2014. In 2018, she took on the role of Director of AI at NVIDIA, leading a research lab in Toronto. Previously she was a Research Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. She completed her PhD in computer science at the University of Ljubljana in 2010, and was a postdoctoral fellow at the University of Toronto during 2011-2012. In 2010 she visited UC Berkeley as a visiting research scientist. She has served as a Program Chair of the 3DV conference, and as an Area Chair of CVPR, ICCV, EMNLP, ICLR, NIPS, and AAAI, and will serve as Program Chair of ICCV’21. She received the NVIDIA Pioneer of AI award, Amazon Academic Research Award, Facebook Faculty Award, and the Connaught New Researcher Award. In 2018 she was appointed a Canada CIFAR AI Chair. She has also been ranked among the top 3 most influential female AI researchers in Canada by Re-WORK. Her work on semi-automatic object instance annotation won the Best Paper Honorable Mention at CVPR’17. Her main research interests are scene parsing from images and videos, interactive annotation, 3D scene understanding, 3D content creation, and multimodal representations.
M. Alex O. Vasilescu, UCLA
Thursday, May 30, 2019 | Jeudi, 30 mai 2019
Natural images are the compositional consequence of multiple causal factors related to scene structure, illumination, and imaging. Tensor algebra, the algebra of higher-order tensors, offers a potent mathematical framework for explicitly representing and disentangling the causal factors of data formation. This allows intelligent agents to better understand and navigate the world, an important tenet of artificial intelligence and an important goal of data science. Theoretical evidence has shown that deep learning is a neural network implementation equivalent to multilinear tensor decomposition, while a shallow network corresponds to linear tensor factorization (a.k.a. CANDECOMP/PARAFAC tensor factorization).
Tensor factorizations have been successfully applied in numerous computer vision, signal processing, computer graphics, and machine learning tasks. The tensor approach was first employed in computer vision to recognize people from the way they move (Human Motion Signatures in 2001) and from their facial images (TensorFaces in 2002), but it may be used to recognize any objects or object attributes.
We will also discuss several multilinear representations that represent cause-and-effect, such as Multilinear PCA, Multilinear ICA (not to be confused with computing ICA by employing tensor methods, an approach typically employed to reparameterize deep learning models), and Compositional Hierarchical Tensor Factorization, as well as the multilinear projection operator, which is important in performing recognition.
M. Alex O. Vasilescu received her education at the Massachusetts Institute of Technology and the University of Toronto.
Vasilescu introduced the tensor paradigm in computer vision, computer graphics, and machine learning, and extended the tensor algebraic framework by generalizing concepts from linear algebra. Starting in the early 2000s, she re-framed the analysis, recognition, synthesis, and interpretability of sensory data as multilinear tensor factorization problems suitable for mathematically representing cause-and-effect and demonstrably disentangling the causal factors of observable data. The tensor framework is a powerful paradigm whose utility and value have been further underscored by theoretical evidence showing that deep learning is a neural network approximation of multilinear tensor factorization and that shallow networks are linear tensor factorizations (CP decomposition).
Vasilescu’s face recognition research, known as TensorFaces, has been funded by the TSWG, the Department of Defense’s Combating Terrorism Support Program, and by IARPA, the Intelligence Advanced Research Projects Activity. Her work was featured on the cover of Computer World, and in articles in the New York Times, the Washington Times, and elsewhere. MIT’s Technology Review Magazine named her a TR100 honoree, and the National Academy of Sciences co-awarded her the Keck Futures Initiative Grant.
Mark Riedl, Georgia Tech
Tuesday, April 30, 2019 | Mardi, 30 avril 2019
Storytelling is a pervasive part of the human experience: we as humans tell stories to communicate, inform, entertain, and educate. In this talk, I will lay out the case for the study of storytelling through the lens of artificial intelligence and present a number of ways computational narrative intelligence can facilitate the creation of intelligent applications that benefit humans and facilitate human-agent interaction. I will explore the grand challenge of building an intelligent system capable of generating fictional stories, including work from my lab using classical artificial intelligence techniques, machine learning, and neural networks.
Dr. Mark Riedl is an Associate Professor in the Georgia Tech School of Interactive Computing and director of the Entertainment Intelligence Lab. Dr. Riedl’s research focuses on human-centered artificial intelligence—the development of artificial intelligence and machine learning technologies that understand and interact with human users in more natural ways. Dr. Riedl’s recent work has focused on story understanding and generation, computational creativity, explainable AI, and teaching virtual agents to behave safely. His research is supported by the NSF, DARPA, ONR, the U.S. Army, U.S. Health and Human Services, Disney, and Google. He is the recipient of a DARPA Young Faculty Award and an NSF CAREER Award.
Dr. Marzyeh Ghassemi, University of Toronto and Vector Institute
March 25, 2019
Health is important, and improvements in health improve lives. However, we still don’t fundamentally understand what it means to be healthy, and the same patient may receive different treatments across different hospitals or clinicians as new evidence is discovered, or individual illness is interpreted.
Health is unlike many success stories in machine learning so far – games like Go and self-driving cars – because we do not have well-defined goals that can be used to learn rules. The nuance of health also requires that we keep machine learning models “healthy” – working to ensure that they do not learn biased rules or detrimental recommendations.
In this talk, Dr. Ghassemi covered some of the many novel technical opportunities for machine learning that stem from health challenges, and the important progress that can be made through careful application to the domain.