MSR Montreal Pizza & AI Distinguished Lecture Series / Série de conférences émérites (MSR Montréal Pizzas et IA)


Join us for technical AI presentations with Q&A, followed immediately by a brief reception with pizza to meet the speaker and address detailed questions. Members of the local academic community are welcome to attend.

Rejoignez-nous pour une série de conférences techniques sur l’IA. Chaque conférence sera suivie d’une session de questions-réponses ainsi que d’une réception au cours de laquelle seront servis breuvages et pizzas. Tous les membres de la communauté académique locale sont invités à participer.

Upcoming speakers

Deep Learning for Knowledge Representation and Reasoning

Andrew McCallum
Director of the Center for Data Science, College of Information and Computer Sciences, University of Massachusetts Amherst

Tuesday, September 24, 2019 | Mardi, 24 septembre 2019
4:30 PM – 6:00 PM EDT


Knowledge gathering, representation, and reasoning are among the fundamental challenges of artificial intelligence. Large-scale repositories of knowledge about entities, relations, and their abstractions are known as “knowledge bases.” Most major technology companies now have substantial efforts in knowledge base construction. But how should knowledge in KBs be represented? Information retrieval and QA simply operate on raw text. Traditional KBs, like Cyc and Freebase, operate on human-engineered symbolic schemas. Massive latent learned representations, like those in BERT, have recently been explored for their knowledge-holding capacity. In this talk I will advocate for our ‘Universal Schema’ approach—a “middle way,” incorporating aspects of non-parametric raw text, human ontologies, and large latent representations. After briefly reviewing foundational work on Universal Schema, I will introduce new research in (1) chains of reasoning, using reinforcement learning to guide the efficient search for meaningful chains, (2) aligning taxonomies and representing common sense with box-shaped embeddings, and (3) entity resolution by large-scale non-greedy clustering via Poincaré embeddings.
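To illustrate the box-shaped embeddings mentioned in (2): entailment between concepts can be modeled geometrically, with each concept an axis-aligned box and conditional probability read off from box overlap. The sketch below is a toy illustration of that geometry with made-up boxes, not code or numbers from the talk:

```python
import numpy as np

# Each concept is an axis-aligned box, stored as (min corner, max corner).
def box_volume(lo, hi):
    # Clamp negative side lengths to zero so empty intersections get volume 0.
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def intersection(a, b):
    # The intersection of two axis-aligned boxes is itself a box.
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    return lo, hi

def p_given(a, b):
    # P(a | b): the fraction of b's volume that lies inside a.
    lo, hi = intersection(a, b)
    return box_volume(lo, hi) / box_volume(*b)

# Toy taxonomy: "dog" is contained in "animal", so "dog" entails "animal".
animal = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
dog = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))
print(p_given(animal, dog))  # 1.0
print(p_given(dog, animal))  # 0.0625
```

In a trained box-embedding model the corners are learned parameters; here they are fixed by hand purely to show how containment yields asymmetric conditional probabilities.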


Andrew McCallum is a Distinguished Professor and Director of the Information Extraction and Synthesis Laboratory, as well as Director of the Center for Data Science, in the College of Information and Computer Sciences at the University of Massachusetts Amherst. He has published over 300 papers in many areas of AI, including natural language processing, machine learning, and reinforcement learning; his work has received over 60,000 citations. He obtained his PhD from the University of Rochester in 1995 with Dana Ballard and completed a postdoctoral fellowship at CMU with Tom Mitchell and Sebastian Thrun. In the early 2000s he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is an AAAI Fellow, an ACM Fellow, and the recipient of the UMass Chancellor’s Award for Research and Creative Activity, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from Google, IBM, Microsoft, Oracle, Amazon, and others. He was General Chair of the International Conference on Machine Learning (ICML) 2012, and from 2014 to 2017 served as President of the International Machine Learning Society. He is a member of the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, entity resolution, social network analysis, structured prediction, semi-supervised learning, and deep neural networks for knowledge representation. His team’s work on open peer review can be found on his web page.


Past speakers

Transformer-XL and XLNet: Generalized Autoregressive Modelling for Language Understanding

Ruslan Salakhutdinov, CMU

Tuesday, July 30, 2019 | Mardi, 30 juillet 2019


Transformers have the potential to learn longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. In the first part of the talk, I will discuss a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation.
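The segment-level recurrence idea can be sketched in a few lines: each layer attends over the current segment concatenated with a cached copy of the previous segment's hidden states, so the effective context grows beyond the segment length. The toy single-head attention below (no learned projections, no positional encoding, no gradients) is only meant to show the caching pattern, not the actual Transformer-XL implementation:

```python
import numpy as np

def attend(q, kv):
    # Toy single-head attention: softmax(q kv^T / sqrt(d)) kv
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

def xl_layer(segment, memory):
    # Keys/values see the cached previous segment as well as the current one;
    # in the real model no gradient flows into the cached states.
    kv = np.concatenate([memory, segment], axis=0)
    out = attend(segment, kv)
    new_memory = segment.copy()  # cache this segment for the next step
    return out, new_memory

d = 8
mem = np.zeros((0, d))  # empty cache before the first segment
for seg in np.random.randn(3, 4, d):  # three segments of length 4
    out, mem = xl_layer(seg, mem)
print(out.shape)  # (4, 8)
```

Because each segment's cache was itself computed with the previous cache in context, information can propagate across many segments even though each forward pass only touches two of them.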

In the second part of the talk, I will introduce XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. I will show how ideas from Transformer-XL can be integrated into XLNet pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking. Joint work with Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, and Quoc Le.
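The permutation-based objective has a simple structure: sample a factorization order, then predict each token conditioned only on the tokens that come earlier in that order, so every position eventually sees context from both sides in expectation. In the toy snippet below, `log_prob` is a stand-in for a real model; this shows the shape of the training objective, not XLNet itself:

```python
import random

def permutation_lm_loss(tokens, log_prob):
    # log_prob(target, context) -> log P(target | context) under some model.
    order = list(range(len(tokens)))
    random.shuffle(order)  # one sampled factorization order
    loss = 0.0
    for i, pos in enumerate(order):
        # Condition only on tokens earlier in the sampled order,
        # regardless of their position in the original sequence.
        context = {p: tokens[p] for p in order[:i]}
        loss -= log_prob(tokens[pos], context)
    return loss

# With a dummy model assigning constant log-probability -1 to everything,
# the loss is simply the sequence length.
print(permutation_lm_loss(["the", "cat", "sat"], lambda t, ctx: -1.0))  # 3.0
```

Averaging this loss over many sampled orders approximates the expectation over permutations described in the abstract; the full method also restricts prediction to a suffix of each order for efficiency.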


Ruslan Salakhutdinov is a UPMC Professor of Computer Science in the Department of Machine Learning at CMU. He received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016, he joined CMU. Ruslan’s primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research, has served on the senior program committees of several top-tier machine learning conferences including NIPS and ICML, and was a program co-chair for ICML 2019. He is an Alfred P. Sloan Research Fellow, a Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, and Nvidia’s Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

Towards Automatic 3D Content Creation

Sanja Fidler, University of Toronto

Tuesday, June 25, 2019 | Mardi, 25 juin 2019


Simulation is crucial for robotic applications such as autonomous vehicles and household robots where agents need to be tested in a virtual environment before they are deployed to the real world. One of the bottlenecks in simulation is content creation which is typically done manually, and is time consuming. In this talk, I will present our recent work on adaptive simulation and 3D content generation with deep learning.


Sanja Fidler is an Assistant Professor in the Department of Computer Science, University of Toronto, which she joined in 2014. In 2018, she took on the role of Director of AI at NVIDIA, leading a research lab in Toronto. Previously she was a Research Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. She completed her PhD in computer science at the University of Ljubljana in 2010, and was a postdoctoral fellow at the University of Toronto during 2011-2012. In 2010 she visited UC Berkeley as a visiting research scientist. She has served as a Program Chair of the 3DV conference and as an Area Chair of CVPR, ICCV, EMNLP, ICLR, NIPS, and AAAI, and will serve as Program Chair of ICCV’21. She received the NVIDIA Pioneer of AI award, the Amazon Academic Research Award, the Facebook Faculty Award, and the Connaught New Researcher Award. In 2018 she was appointed a Canada CIFAR AI Chair. She has also been ranked among the top three most influential female AI researchers in Canada by Re-WORK. Her work on semi-automatic object instance annotation won the Best Paper Honorable Mention at CVPR’17. Her main research interests are scene parsing from images and videos, interactive annotation, 3D scene understanding, 3D content creation, and multimodal representations.

Representing Cause-and-Effect in a Tensor Framework

M. Alex O. Vasilescu, UCLA

Thursday, May 30, 2019 | Jeudi, 30 mai 2019


Natural images are the compositional consequence of multiple causal factors related to scene structure, illumination, and imaging. Tensor algebra, the algebra of higher-order tensors, offers a potent mathematical framework for explicitly representing and disentangling the causal factors of data formation, allowing intelligent agents to better understand and navigate the world, an important tenet of artificial intelligence and an important goal of data science. Theoretical evidence has shown that deep learning is a neural network implementation equivalent to multilinear tensor decomposition, while a shallow network corresponds to linear tensor factorization (aka CANDECOMP/PARAFAC tensor factorization).
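For concreteness, a rank-R CP (CANDECOMP/PARAFAC) factorization expresses a 3-way tensor as a sum of R outer products of factor vectors, one factor matrix per mode (e.g., person, illumination, viewpoint). The minimal NumPy sketch below shows only the reconstruction step with random factor matrices; fitting the factors to data (e.g., by alternating least squares) is a separate step not shown here:

```python
import numpy as np

def cp_reconstruct(A, B, C):
    # Rank-R CP model of a 3-way tensor:
    # T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
R = 2  # rank of the factorization
A = rng.standard_normal((5, R))  # mode-1 factors
B = rng.standard_normal((4, R))  # mode-2 factors
C = rng.standard_normal((3, R))  # mode-3 factors
T = cp_reconstruct(A, B, C)
print(T.shape)  # (5, 4, 3)
```

Each mode's factor matrix captures variation due to one causal factor, which is what makes this representation suited to the kind of disentanglement the abstract describes.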

Tensor factorizations have been successfully applied in numerous computer vision, signal processing, computer graphics, and machine learning tasks. The tensor approach was first employed in computer vision to recognize people from the way they move (Human Motion Signatures, 2001) and from their facial images (TensorFaces, 2002), but it may be used to recognize any object or object attribute.

We will also discuss several multilinear representations of cause-and-effect, such as Multilinear PCA, Multilinear ICA (not to be confused with computing ICA by employing tensor methods, an approach typically used to reparameterize deep learning models), and Compositional Hierarchical Tensor Factorization, as well as the multilinear projection operator, which is important for performing recognition.


M. Alex O. Vasilescu received her education at the Massachusetts Institute of Technology and the University of Toronto.

Vasilescu introduced the tensor paradigm into computer vision, computer graphics, and machine learning, and extended the tensor algebraic framework by generalizing concepts from linear algebra. Starting in the early 2000s, she re-framed the analysis, recognition, synthesis, and interpretability of sensory data as multilinear tensor factorization problems suitable for mathematically representing cause-and-effect and demonstrably disentangling the causal factors of observable data. The tensor framework is a powerful paradigm whose utility and value have been further underscored by theoretical evidence showing that deep learning is a neural network approximation of multilinear tensor factorization and that shallow networks are linear tensor factorizations (CP decomposition).

Vasilescu’s face recognition research, known as TensorFaces, has been funded by the TSWG, the Department of Defense’s Combating Terrorism Support Program, and by IARPA, the Intelligence Advanced Research Projects Activity. Her work has been featured on the cover of Computer World and in articles in the New York Times, the Washington Times, and elsewhere. MIT’s Technology Review named her a TR100 honoree, and she was co-awarded a National Academies Keck Futures Initiative grant.

Computational Narrative Intelligence and Story Generation

Mark Riedl, Georgia Tech

Tuesday, April 30, 2019 | Mardi, 30 avril 2019


Storytelling is a pervasive part of the human experience: we as humans tell stories to communicate, inform, entertain, and educate. In this talk, I will present the case for the study of storytelling through the lens of artificial intelligence and describe a number of ways computational narrative intelligence can facilitate the creation of intelligent applications that benefit humans and facilitate human-agent interaction. I will explore the grand challenge of building an intelligent system capable of generating fictional stories, including work from my lab using classical artificial intelligence techniques, machine learning, and neural networks.


Dr. Mark Riedl is an Associate Professor in the Georgia Tech School of Interactive Computing and director of the Entertainment Intelligence Lab. Dr. Riedl’s research focuses on human-centered artificial intelligence—the development of artificial intelligence and machine learning technologies that understand and interact with human users in more natural ways. Dr. Riedl’s recent work has focused on story understanding and generation, computational creativity, explainable AI, and teaching virtual agents to behave safely. His research is supported by the NSF, DARPA, ONR, the U.S. Army, U.S. Health and Human Services, Disney, and Google. He is the recipient of a DARPA Young Faculty Award and an NSF CAREER Award.

Learning Healthy Models for Healthcare

Dr. Marzyeh Ghassemi, University of Toronto and Vector Institute

Monday, March 25, 2019 | Lundi, 25 mars 2019


Health is important, and improvements in health improve lives. However, we still don’t fundamentally understand what it means to be healthy, and the same patient may receive different treatments across different hospitals or clinicians as new evidence is discovered, or individual illness is interpreted.

Health is unlike many success stories in machine learning so far – games like Go and self-driving cars – because we do not have well-defined goals that can be used to learn rules. The nuance of health also requires that we keep machine learning models “healthy” – working to ensure that they do not learn biased rules or detrimental recommendations.

In this talk, Dr. Ghassemi covered some of the many novel technical opportunities for machine learning that stem from challenges in health, and the important progress to be made through careful application to the domain.