{"id":625197,"date":"2019-12-04T03:15:07","date_gmt":"2019-12-04T11:15:07","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=625197"},"modified":"2022-11-07T11:38:52","modified_gmt":"2022-11-07T19:38:52","slug":"going-meta-learning-algorithms-and-the-self-supervised-machine-with-dr-philip-bachman","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/going-meta-learning-algorithms-and-the-self-supervised-machine-with-dr-philip-bachman\/","title":{"rendered":"Going meta: learning algorithms and the self-supervised machine with Dr. Philip Bachman"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-625203\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1024x576.png\" alt=\"Dr. Philip Bachman on the Microsoft Research Podcast\" width=\"1024\" height=\"576\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-343x193.png 343w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h3>Episode 101 | December 4, 2019<\/h3>\n<p>Deep learning methodologies like supervised learning have been very successful in training machines to make predictions about the world. But because they\u2019re so dependent upon large amounts of human-annotated data, they\u2019ve been difficult to scale. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/phbachma\/\">Dr. Phil Bachman<\/a>, a researcher at <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-montreal\/\">MSR Montreal<\/a>, would like to change that, and he\u2019s working to train machines to collect, sort and label their own data, so people don\u2019t have to.<\/p>\n<p>Today, Dr. Bachman gives us an overview of the machine learning landscape and tells us why it\u2019s been so difficult to sort through noise and get to useful information. 
He also talks about his ongoing work on <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/deep-infomax-learning-good-representations-through-mutual-information-maximization\/\">Deep InfoMax<\/a>, a novel approach to self-supervised learning, and reveals what a conversation about ML classification problems has to do with Harrison Ford\u2019s face.<\/p>\n<h3>Related:<\/h3>\n<ul type=\"disc\">\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\">Microsoft Research Podcast<\/a>: View more podcasts on Microsoft.com<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/itunes.apple.com\/us\/podcast\/microsoft-research-a-podcast\/id1318021537?mt=2\">iTunes<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen to new podcasts each week on iTunes<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/subscribebyemail.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\">Email<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen by email<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/subscribeonandroid.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\">Android<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen on Android<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/open.spotify.com\/show\/4ndjUXyL0hH1FXHgwIiTWU\">Spotify<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Listen on Spotify<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener 
noreferrer\" target=\"_blank\" href=\"https:\/\/www.blubrry.com\/feeds\/microsoftresearch.xml\">RSS feed<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/note.microsoft.com\/ww-registration-microsoft-research-newsletter-s.html?wt.mc_id=S-webpage_podcast\">Microsoft Research Newsletter<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Sign up to receive the latest news from Microsoft Research<\/li>\n<\/ul>\n<hr>\n<h3>Transcript<\/h3>\n<p><span data-contrast=\"none\">Phil Bachman: Training a machine to look at a large amount of unannotated data and point to specific examples and say, well, I think if a human comes in and tells me exactly what that thing is, I\u2019ll learn a lot about the problem that I\u2019m trying to solve<\/span><span data-contrast=\"none\">.<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"none\">S<\/span><span data-contrast=\"none\">o this general notion of carefully selecting which of those examples you want to spend the money or spend the time to get a human to go in and provide the annotations for those examples, that\u2019s this idea of active learning.<\/span><span data-ccp-props=\"{}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host:&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">You\u2019re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. 
I\u2019m your host, Gretchen Huizinga.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host:&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">Deep learning methodologies<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;like supervised learning have been very successful in training machines to make predictions about the world<\/span><\/b><b><span data-contrast=\"auto\">.<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">B<\/span><\/b><b><span data-contrast=\"auto\">ut because they\u2019re so dependent upon large amounts of human-annotated data, they\u2019ve been difficult to scale. Dr. Phil Bachman, a researcher at MSR Montreal, would like to change that, and he\u2019s working to train machines to collect, sort and label their own data, so people don\u2019t have to.<\/span><\/b><span data-ccp-props=\"{}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Today, Dr. Bachman gives us an overview of the machine learning landscape and tells us why it\u2019s been so difficult to sort through noise and get to useful information. 
He also talks about his ongoing work on Deep&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">InfoMax<\/span><\/b><b><span data-contrast=\"auto\">, a novel approach to self-supervised learning, and reveals what a conversation about ML classification problems has to do with Harrison Ford\u2019s face.<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">That and much more on this episode of the Microsoft Research Podcast.<\/span><\/b><span data-ccp-props=\"{}\">&nbsp;<\/span><\/p>\n<p><b><i><span data-contrast=\"auto\">(music plays)<\/span><\/i><\/b><span data-ccp-props=\"{}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Phil Bachman, welcome to the podcast<\/span><\/b><b><span data-contrast=\"auto\">!<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: Hi. Thanks for having me.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: So as a researcher at MSR Montreal, you\u2019ve got a lot going on. Let\u2019s start macro and then get micro. And we\u2019ll start with a little phrase that I like in your bio that says you want to understand the ways in which actionable information can be distilled from raw data. Unpack it for us. What big problem or problems are you working on? 
What gets you up in the morning?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So I\u2019d say the key here is to sort of understand the distinction between information in general and, let\u2019s say, information that might be useful.&nbsp;<\/span><span data-contrast=\"none\">So for example, if images are coming from the camera that you are using to pilot a self-driving car, then low-level sensor noise probably doesn\u2019t provide you useful information<\/span><span data-contrast=\"none\">\u2026<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host: Hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"none\">Phil Bachman:&nbsp;<\/span><span data-contrast=\"none\">\u2026<\/span><span data-contrast=\"none\">for deciding whether to stop the car or whether to turn or make other sorts of decisions that are useful for driving.<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"auto\">So, what I\u2019m interested in, sort of this phrase, actionable information here, it\u2019s referring specifically to trying to focus on getting our models to capture the information content in the data that we\u2019re looking at that is actually going to be useful in the future for making some sorts of decisions. 
So if we\u2019re training a model that\u2019s processing the video data that\u2019s being used to drive this car, then perhaps we don\u2019t want to waste the effort of the model on trying to represent this low-level information about small variations in pixel intensity. And we\u2019d rather have the model focus its capacity for representing information on the information that corresponds to sort of higher-level structure in the image, so things like the presence or absence of a pedestrian or another car in front of it. So that\u2019s kind of what I mean with this phrase, actionable information. So this distillation from raw data is on doing learning from data that hasn\u2019t been manually curated or that doesn\u2019t have a lot of information injected into it by a human who\u2019s doing the data collection process. So going back to the self-driving car example, I\u2019d like to have a system where we could allow the computer just to watch thousands of hours of video that\u2019s captured from a bunch of cars driving around. Then what I want to be able to do is have a system that\u2019s just watching all of that video and doesn\u2019t require that much input from a person who\u2019s pointing to the video and saying specifically what\u2019s going to be interesting or useful in the future. 
So this information that\u2019s going to be useful for performing the types of tasks that we want our model to do eventually.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host: B<\/span><\/b><b><span data-contrast=\"none\">efore we get specific, give us a short historical&nbsp;<\/span><\/b><b><span data-contrast=\"none\">tour&nbsp;<\/span><\/b><b><span data-contrast=\"none\">of the deep-learning methodologies as a level set<\/span><\/b><b><span data-contrast=\"none\">,<\/span><\/b><b><span data-contrast=\"none\">&nbsp;and then tell us why we need a methodology for learning representations from unlabeled data.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: Okay. So in the context of machine learning, people often break it down into three categories. So there will be supervised learning, unsupervised learning and reinforcement learning\u2026<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host:&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">Mmm<\/span><\/b><b><span data-contrast=\"auto\">-hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: \u2026and it\u2019s not always clear what the distinction between the methods are. 
But supervised learning is sort of what\u2019s had the most immediate success and what\u2019s driving a lot of the deep-learning-powered technologies that are being used for doing things like speech recognition in phones or doing automated question answering for chat bots and stuff like that. So supervised learning refers to kind of a subset of the techniques that people apply when they have access to a large amount of data and they have a specific type of action that they want a model to perform when it processes that data. And what they do is, they get a person to go and label all the data and say, okay, well this is the input to the model at this point in time. And given this input, this is what the model should output. So you\u2019re putting a lot of constraints on what the model is doing and constructing those constraints manually by having a person looking at a set of a million images and, for each image, they say, oh, this is a cat, this is a dog, this is a person, this is a car. So after having done that for thousands of hours, you now have a large data set where you have a bunch of different images and each of those images has an associated tag. And so now the kind of techniques that we work with and the optimization methods that we use for training our models are very effective at fitting really large powerful models to large amounts of this sort of annotated data.<\/span><b><span data-contrast=\"auto\">&nbsp;<\/span><\/b><span data-contrast=\"auto\">So that\u2019s kind of the traditional supervised learning. But the major downside there is that the process of providing all of those annotations can be very expensive. So that process of supervised learning has a lot of issues with scalability. 
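The supervised recipe just described (inputs paired with human-provided tags, then an optimizer that fits a model to the annotated data) can be sketched in a few lines. Everything below is an illustrative stand-in, not anything from the episode: synthetic feature vectors replace images, and a linear softmax classifier replaces a large deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an annotated dataset: each "image" is a feature
# vector, and each human-provided tag is an integer class (say 0 = cat,
# 1 = dog, 2 = car).
n, d, k = 300, 10, 3
centers = rng.normal(size=(k, d)) * 3.0
tags = rng.integers(0, k, size=n)                   # the annotations
images = centers[tags] + rng.normal(size=(n, d))    # noisy class clusters

# Fit a linear softmax classifier by gradient descent on cross-entropy:
# the "fitting a model to annotated data" step, in miniature.
W = np.zeros((d, k))
onehot = np.eye(k)[tags]
for _ in range(200):
    logits = images @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)               # softmax probabilities
    W -= 0.1 * images.T @ (p - onehot) / n          # gradient step

train_acc = ((images @ W).argmax(axis=1) == tags).mean()
print(f"training accuracy: {train_acc:.2f}")
```

The scalability problem Bachman raises lives in the `tags` array: every entry had to come from a person, which is exactly the cost the rest of the conversation tries to avoid.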
What we\u2019d like to do<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;ideally<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;is make use of a lot of that and figure out what kinds of information is actionable. So finding the information that seems like it will be useful for making decisions. So that\u2019s getting into a contrast between supervised learning and unsupervised learning. And then there\u2019s also reinforcement learning which is a slightly different set of techniques where you actually allow a model to go out and&nbsp;<\/span><span data-contrast=\"none\">kind of perform experiments or&nbsp;<\/span><span data-contrast=\"auto\">try to do things and then<\/span><span data-contrast=\"auto\">&nbsp;somehow<\/span><span data-contrast=\"auto\">&nbsp;it receives feedback about the things that it\u2019s doing that says, oh, what you just did, that was a good thing or that was a bad thing.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: And&nbsp;<\/span><span data-contrast=\"auto\">that<\/span><span data-contrast=\"auto\">&nbsp;i<\/span><span data-contrast=\"auto\">t<\/span><span data-contrast=\"auto\">&nbsp;learns by kind of a process of trial and error. So that\u2019s a general idea of reinforcement learning.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host: Hmm. Okay<\/span><\/b><b><span data-contrast=\"none\">. 
W<\/span><\/b><b><span data-contrast=\"none\">e mentioned two flavors of this, unsupervised and then self-supervised. Is that another differentiation there?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So the self-supervised learning, it\u2019s not a completely different thing, but it\u2019s a sort of subset of those types of techniques. So, the general idea behind self-supervised learning is that we try to design procedures that will generate little supervised learning problems for a model to solve, where the process of generating those little supervised learning problems is kind of automatic. And the hope here is that the kind of procedurally-generated supervised learning problems that our little algorithm is generating, based on the unlabeled data, will force the model to capture some useful information about the structure of that data that will allow it to answer more, sort of, human-oriented questions easier in the future. So just to clarify this concept of procedurally generating supervised learning problems, one really simple example would be that you could try to train a model to have some understanding of the statistical structure of visual data by showing a model a bunch of images, but what you do is you take each image and you split it into a left half and a right half. 
So now what you do is you take your model, and all the model is allowed to see is the left half of the image\u2026<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: \u2026and then you have another model that sort of tries to form a representation of the right half of the image. And so the model that looked at the left half of the image, you present it with representations of the right halves of, like, let\u2019s say, ten thousand images, one of which is the image that it looked at. So, it\u2019s kind of got like a partner that it\u2019s looking for in this big bag of encoded right halves of images. And the job of the encoder that\u2019s processing the left half of the image is to be able to look in that bag and pick out the right half that actually corresponds to the&nbsp;<\/span><span data-contrast=\"auto\">image that it originally came from. So in this case, we\u2019re taking something that looks like unsupervised learning, but instead, here, what we\u2019re doing is treating it more like a supervised learning problem. 
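The left-half/right-half matching task can be made concrete with a small sketch. This is a toy illustration, not the actual Deep InfoMax code: the "images" are random vectors whose two halves share a latent signature, and the hypothetical encoder is just normalization, standing in for a trained network. Each left half must pick its partner out of the bag of encoded right halves, a many-way classification problem.

```python
import numpy as np

rng = np.random.default_rng(0)

n_images, half_dim = 100, 32
# Toy "images", already split: halves of the same image share a common
# latent vector plus independent noise -- a crude stand-in for the shared
# structure between the two halves of a real photo.
latent = rng.normal(size=(n_images, half_dim))
left = latent + 0.1 * rng.normal(size=(n_images, half_dim))
right = latent + 0.1 * rng.normal(size=(n_images, half_dim))

def encode(x):
    # Hypothetical encoder: plain L2 normalization, so the score below is
    # cosine similarity. A real system would use a trained deep network.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Each encoded left half scores every right half in the "bag"; taking the
# argmax is an n_images-way classification problem whose correct answer is
# the right half cut from the same image.
scores = encode(left) @ encode(right).T           # (n_images, n_images)
predictions = scores.argmax(axis=1)
accuracy = (predictions == np.arange(n_images)).mean()
print(f"matching accuracy over {n_images} candidates: {accuracy:.2f}")
```

No human labeled anything here: the correct pairings come for free from how the data was split, which is what makes the supervision "self-" supervision.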
So the model that looks at the left half of the image, its task is to solve something that looks like just a simple classification problem.&nbsp;<\/span><span data-contrast=\"none\">So this makes it like a ten-thousand-way classification problem.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: The other thing that comes to my mind is, there\u2019s this weird thing on the internet where like, Harrison Ford\u2026 you see half of his face and the other half of his face and they are completely different. Like if you put each half together with its mirror image, they wouldn\u2019t look like Harrison Ford, but together, the different halves look like him.&nbsp;<\/span><\/b><b><span data-contrast=\"none\">So that would really trick the machine, I would think.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"none\">Phil Bachman: Actually, I wouldn\u2019t be so confident about that!<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host: Really?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"none\">Phil Bachman: 
Yeah.&nbsp;<\/span><span data-contrast=\"auto\">The question that you<\/span><span data-contrast=\"auto\">\u2019<\/span><span data-contrast=\"auto\">re sort of training the machine to answer is, which of these possible things do you think is most likely associated with the thing that you<\/span><span data-contrast=\"auto\">\u2019<\/span><span data-contrast=\"auto\">re currently looking at? So unless there was somebody else\u2019s right face half, that looked significantly more Harrison Ford-<\/span><span data-contrast=\"auto\">ish<\/span><span data-contrast=\"auto\">, than his own right face half, then the model actually could do pretty reasonably, I\u2019d expect.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: That\u2019s hilarious.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So unless you had somebody who\u2026 where it was like this really strict dichotomous separation of the halves of their face, like Two-Face from Batman or something\u2026<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Right. 
That\u2019s another one!<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: \u2026in which case maybe the model would fail, but\u2026<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: I love that.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: \u2026if it\u2019s within like standard realm of human variability, I think it would be okay.&nbsp;<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Well that\u2019s good. So let\u2019s move ahead to the algorithms that we\u2019re talking about here. And you call them learning algorithms, and you\u2019ve described your goal for learning algorithms in some intriguing ways. You want to train machines to go out and fetch data for themselves and actively find out about the world, and you want to get the machine to ask itself interesting questions so it begins to build up its own knowledge base. 
Tell us about these learning algorithms for active learning and what it takes to turn a machine into an information-seeking missile?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: Yeah, so this kind of overall objective there that you\u2019ve described is targeted at kind of expanding the scope of which parts of the problems that we\u2019re currently trying to solve<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;are solved by the machine rather than by a person who is acting as a shepherd for the machine<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;or as a teacher or something along those lines<\/span><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">&nbsp;So right now, the machine learning component of most systems is&nbsp;<\/span><span data-contrast=\"none\">a&nbsp;<\/span><span data-contrast=\"auto\">very important part of the system, but there\u2019s a whole lot of human effort that surrounds the production and use of something like a practical image classifier or a practical machine translation system. So that\u2019s one part of the effort that\u2019s required for getting an automated system out there in the world. 
So part of the process is just the initial decision, like the thing that we want to do is machine translation, here\u2019s a way of formalizing that problem and specifying it such that we can go out and now perform another part of the process \u2013 so this other part of process is a data collection.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So you\u2019d have to go out and you\u2019d have to explicitly collect a lot of data that is relevant to the task that you are trying to solve. And then you have to take that data and&nbsp;<\/span><span data-contrast=\"none\">you maybe have to&nbsp;<\/span><span data-contrast=\"auto\">have somebody curate it to make that data more directly useful or more immediately useful for the kinds of algorithms that we tend to use right now. So a lot of the work that I want to do is about trying to reduce the amount of human effort that\u2019s required on those two fronts and trying to get as much of those two parts of the problem automated and built into the models that we\u2019re training so that we don\u2019t have to go out and manually annotate all the data.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Talk to me about the technical end of that. You know, our listeners are pretty sophisticated and you are talking about algorithms that are training a machine to do something for itself. 
Go a little deeper there.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: Okay. Yeah, I\u2019ll kind of jump into the learning algorithms for active learning part, which I guess I actually completely skipped over as I was answering the question before.&nbsp;<\/span><span data-contrast=\"none\">So training a machine to go out and collect its own data and point to specific examples and say, well, I think if a human comes in and tells me exactly what that thing is, I\u2019ll learn a lot about the problem that I\u2019m trying to solve. So this general notion of carefully selecting which of those examples you want to spend the money or spend the time to get a human to go in and provide the annotations for those examples, that\u2019s this idea of active learning.<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"auto\">So rather than just assuming that you have a huge batch of data and all the data is labeled, a lot of practical problems are structured more like, you have a lot of unlabeled data and you have to decide how to collect data and apply labels to it so that you can then train a model. So to do this efficiently, you take some of the data, you train a model, and then you look at what the model is doing and you try to figure out where it\u2019s weak and where it\u2019s strong. And based on where it\u2019s weak and where it\u2019s strong, you use that to try and decide how to go out and pick other examples specifically so that you can minimize the amount of data that you have to collect and provide annotations for, such that you end up with a model that makes good predictions at the end. 
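The loop just described (train on a few labels, find where the model is weak, query exactly those examples) is the classic uncertainty-sampling form of active learning. The sketch below is deliberately tiny and entirely illustrative: synthetic 1-D data and a threshold "model" stand in for the real representations and classifiers under discussion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy pool of unlabeled 1-D examples; the hidden ground truth is x > 0.
# Asking for a label is the expensive step we want to perform sparingly.
pool = rng.uniform(-1.0, 1.0, size=500)
true_labels = pool > 0.0               # plays the human annotator

labeled = [0, 1]                       # start with two labeled examples

def fit_threshold(xs, ys):
    # Tiny "model": a threshold midway between the largest known negative
    # and the smallest known positive example.
    neg, pos = xs[~ys], xs[ys]
    if len(neg) == 0 or len(pos) == 0:
        return 0.0
    return (neg.max() + pos.min()) / 2.0

for _ in range(10):
    threshold = fit_threshold(pool[labeled], true_labels[labeled])
    # Uncertainty sampling: query the unlabeled example closest to the
    # current decision boundary -- where the model is weakest.
    candidates = [i for i in range(len(pool)) if i not in labeled]
    query = min(candidates, key=lambda i: abs(pool[i] - threshold))
    labeled.append(query)              # pay for one more annotation

threshold = fit_threshold(pool[labeled], true_labels[labeled])
accuracy = ((pool > threshold) == true_labels).mean()
print(f"labels used: {len(labeled)}, accuracy: {accuracy:.3f}")
```

The point of the design is the query rule: instead of labeling all 500 points, the learner spends its dozen annotation requests on the points nearest its decision boundary, which is where each label is most informative.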
So that\u2019s just active learning<\/span><span data-contrast=\"none\">. And existing techniques for doing active learning, a lot of them revolve around assumptions about what kind of classifier<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;or what kind of decision function you are going to&nbsp;<\/span><span data-contrast=\"none\">train<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"none\">on that data that you are collecting the labels for. So there might be assumptions that all of the data already has some sort of fixed representation and then you are going to feed that representation into a linear classifier<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;for example.&nbsp;<\/span><span data-contrast=\"none\">A<\/span><span data-contrast=\"none\">nd if you make that kind of assumption, then there might be very good heuristics for going out and deciding which particular sets of features you want to apply labels to. So you can minimize the uncertainty and minimize the number of errors made by this linear classifier.&nbsp;<\/span><span data-contrast=\"auto\">But for working with more complicated data<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;or working in scenarios where you also want to learn a powerful representation of the data at the same time that you\u2019re collecting the data and applying labels, you might want to sort of transform this process where you decide on what the model is going to be and then you sit down for weeks<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;or years<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;and come up with a very clever heuristic for how to collect data efficiently to make that model succeed when it has a small amount of labeled data.
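<\/span><\/p>
<p><span data-contrast=\"auto\">The loop described here \u2013 train on what is labeled, find where the model is weak, then spend annotation effort on exactly those examples \u2013 can be sketched in a few lines of Python. The toy two-cluster data, the nearest-centroid \u201cmodel,\u201d and the least-confidence query rule below are illustrative stand-ins, not anything specific from this conversation.<\/span><\/p>

```python
import numpy as np

def least_confident(probs, k):
    """Indices of the k pool examples whose top-class probability is lowest."""
    return np.argsort(probs.max(axis=1))[:k]

rng = np.random.default_rng(0)

# Unlabeled pool: two well-separated 2-D clusters. y_pool plays the role
# of the human annotator, whom we only query for a handful of labels.
X_pool = rng.normal(size=(200, 2)) + np.repeat([[0.0, 0.0], [3.0, 3.0]], 100, axis=0)
y_pool = np.repeat([0, 1], 100)

labeled = [0, 1, 100, 101]  # tiny seed set: two examples per class

for _ in range(3):
    # 1. Train a model on the labeled subset (here: per-class centroids).
    centroids = np.stack([
        X_pool[[i for i in labeled if y_pool[i] == c]].mean(axis=0) for c in (0, 1)
    ])
    # 2. Score the whole pool: softmax over negative distances to centroids.
    dist = np.linalg.norm(X_pool[:, None, :] - centroids[None, :, :], axis=-1)
    probs = np.exp(-dist) / np.exp(-dist).sum(axis=1, keepdims=True)
    # 3. Ask the annotator to label the examples the model is least sure about.
    for i in least_confident(probs, k=5):
        if i not in labeled:
            labeled.append(int(i))

accuracy = float((probs.argmax(axis=1) == y_pool).mean())
```

<p><span data-contrast=\"auto\">Swapping in a stronger model or a different acquisition rule (margin sampling, entropy, expected error reduction) only changes steps 1 and 3; the outer loop \u2013 and the goal of ending up with a good predictor from few labels \u2013 stays the same.<\/span><\/p>
<p><span data-contrast=\"auto\">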
And you\u2019d like to replace some of those more effort-intensive parts of the process with a machine that can kind of train itself to learn what kinds of data it\u2019s going to need<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;at the same time that you are also training the model that\u2019s making the prediction.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><i><span data-contrast=\"auto\">(music plays)<\/span><\/i><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Let\u2019s spend some time talking about your current research, and there\u2019s a lot of flavors to it. Let\u2019s start with what you are calling Deep Infomax or DIM,<\/span><\/b><b><span data-contrast=\"none\">&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">but I want to point out too, that<\/span><\/b><b><span data-contrast=\"auto\">,<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;in addition to Deep Infomax, you have Augmented Multiscale Deep Infomax, or AMDIM,&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">Spatio<\/span><\/b><b><span data-contrast=\"auto\">-temporal Deep Infomax, Deep Graph Infomax<\/span><\/b><b><span data-contrast=\"auto\">\u2026<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;There\u2019s a lot of sort of offshoots I guess you might call it. So I\u2019m going to go sort of free range here because you\u2019ll be able to give us a better guided tour of the main idea<\/span><\/b><b><span data-contrast=\"auto\">,<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;and all the offshoots<\/span><\/b><b><span data-contrast=\"auto\">,<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;better than I will. 
Tell us about the Deep Infomax research family and what you\u2019re up to.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: Okay. So the kind of higher level idea that ties these things together is the idea that we want to learn to represent the data that we\u2019re looking at. So sometimes that data might be text, sometimes it might be images or in the case, for example, of the Deep Graph Infomax, it might be a graph. So the overall higher level idea of Deep Infomax is that we want to form representations that act a little bit like an associative memory. Kind of going back to what I was saying about the thing with the split faces before, we can think of the left half of a face and the right half of&nbsp;<\/span><span data-contrast=\"auto\">a<\/span><span data-contrast=\"auto\">&nbsp;face sort of as random variables. So you can think of just sampling the left half of&nbsp;<\/span><span data-contrast=\"auto\">a<\/span><span data-contrast=\"auto\">&nbsp;face and there might be slightly different versions of the right half of that face that are all sort of valid. So looking at the left half, I guess, as you were getting at with the Harrison Ford thing, the right half isn\u2019t always perfectly determined, but you can think of the distribution of all possible right half faces, and the variability there is much broader than the variability that you have if you are just looking at<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;what is the right half of Harrison Ford\u2019s face given that we\u2019re looking at the left half<\/span><span data-contrast=\"auto\">?<\/span><span data-contrast=\"auto\">&nbsp;So the mutual information between our representation of the left half of the face and the right half of the face is high. 
When our ability to predict what the right half of the face looks like is very good, relative to how well we could predict what the right half of the face looks like in the case where we didn\u2019t get to see the left half, if we were just looking at a bunch of different images that had the same shape as the images of the right halves of a face, these images have a lot of variability in their structure. Like some of them, it might be the back&nbsp;<\/span><span data-contrast=\"auto\">half or the front half of a car or something like that and looks very different from faces. So, in principle, we can sort of make a reasonable prediction,&nbsp;<\/span><span data-contrast=\"none\">for example<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"auto\">of whether or not the image that we\u2019re looking at right now encodes the right half of&nbsp;<\/span><span data-contrast=\"auto\">a<\/span><span data-contrast=\"auto\">&nbsp;face, but there\u2019s still some uncertainty there. 
And then when we add in the left half of Harrison Ford\u2019s face, and we\u2019re trying to say, okay, well out of the distribution of things that look like the right halves of faces, which ones correspond to Harrison Ford, the more precisely we can make that guess, the higher the mutual information is between our representation of the left half and the right half of the face.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host:&nbsp;<\/span><\/b><b><span data-contrast=\"none\">Well, l<\/span><\/b><b><span data-contrast=\"none\">et me ask you to go a little deeper on the technical side of this.&nbsp;<\/span><\/b><b><span data-contrast=\"none\">Y<\/span><\/b><b><span data-contrast=\"none\">ou sent me a slide that has a lot of algorithmic explanation of Deep Infomax and then how you kind of take that further with Augmented Multiscale Deep Infomax<\/span><\/b><b><span data-contrast=\"none\">\u2026<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"none\">Phil Bachman:&nbsp;<\/span><span data-contrast=\"none\">S<\/span><span data-contrast=\"none\">o the actual mutual information aspect<\/span><span data-contrast=\"none\">,&nbsp;<\/span><span data-contrast=\"none\">sort of formally<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;the way it shows up here<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;is that we sample this kind of true pair of corresponding image and audio sample<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;and then we have a distribution from which we can sample just another random audio sample and we can sample 
maybe<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;say<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"none\">a thousand<\/span><span data-contrast=\"none\">&nbsp;of those other random audio samples and we can encode them with our audio encoder. And then we can sort of present a little classification problem<\/span><span data-contrast=\"none\">\u2026<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host:&nbsp;<\/span><\/b><b><span data-contrast=\"none\">Mmm<\/span><\/b><b><span data-contrast=\"none\">-hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"none\">Phil Bachman:<\/span><span data-contrast=\"none\">&nbsp;\u2026<\/span><span data-contrast=\"none\">to the model that looked at the image, where that classification problem is telling the model that looked at the image to identify which<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;among<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;let\u2019s say<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"none\">one thousand and one<\/span><span data-contrast=\"none\">&nbsp;audio recordings is the audio recording that comes from that same point in time. 
So the mutual information here<\/span><span data-contrast=\"none\">, um,&nbsp;<\/span><span data-contrast=\"none\">what we\u2019re doing kind of more technically&nbsp;<\/span><span data-contrast=\"none\">is&nbsp;<\/span><span data-contrast=\"none\">we\u2019re constructing a lower bound on the mutual information between the random variables corresponding to the representation of the image and the representation of the audio modality.&nbsp;<\/span><span data-contrast=\"none\">S<\/span><span data-contrast=\"none\">o we first draw a sample from the joint distribution of the representations of those two modalities, and then we also have to sample a lot of samples from what\u2019s called&nbsp;<\/span><span data-contrast=\"none\">the<\/span><span data-contrast=\"none\">&nbsp;marginal distribution of that second random variable which is the representations of the audio modality.&nbsp;<\/span><span data-contrast=\"none\">S<\/span><span data-contrast=\"none\">o we draw<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;say<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"none\">a thousand<\/span><span data-contrast=\"none\">&nbsp;samples from that marginal distribution and we construct this little classification problem where the model is trying to identify which of the audio samples was the sample from the true joint distribution over audio and visual data<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;and which of the samples just came from random samples from the marginal distribution.&nbsp;<\/span><span data-contrast=\"none\">S<\/span><span data-contrast=\"none\">o this is a technique called&nbsp;<\/span><span data-contrast=\"none\">N<\/span><span data-contrast=\"none\">oise&nbsp;<\/span><span data-contrast=\"none\">C<\/span><span data-contrast=\"none\">ontrast<\/span><span data-contrast=\"none\">ive<\/span><span 
data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"none\">E<\/span><span data-contrast=\"none\">stimation&nbsp;<\/span><span data-contrast=\"none\">that\u2019s been developed and applied in a lot of different scenarios<\/span><span data-contrast=\"none\">. So a good example of this is techniques that have been used for training word vectors<\/span><span data-contrast=\"none\">.&nbsp;<\/span><span data-contrast=\"none\">But in the case where we\u2019re using it, it\u2019s a technique that can be used for constructing kind of a formally correct lower bound on the mutual information between these two random variables<\/span><span data-contrast=\"none\">, o<\/span><span data-contrast=\"none\">ne of which corresponds to you know, samples of visual data and one of which corresponds to samples of audio data.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host: Okay.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"none\">Phil Bachman: And the joint distribution over those two kind of random variables is constructed by just going around the world with a camera and a microphone and just taking little snippets of visual and audio information from different points in time and in different scenes.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: All right. 
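<\/span><\/b><\/p>
<p><span data-contrast=\"none\">The 1-of-(N+1) classification Bachman describes is the Noise Contrastive Estimation setup commonly written as the InfoNCE bound: the better each representation picks its true partner out of the candidates, the closer the score gets to its ceiling of log N nats of mutual information. The sketch below is a hypothetical stand-in \u2013 the \u201cencoders\u201d are just a shared latent observed through two noisy channels rather than trained networks \u2013 but the bound itself is computed in the standard way.<\/span><\/p>

```python
import numpy as np

def info_nce_bound(z_a, z_b):
    """InfoNCE lower bound on the mutual information I(z_a; z_b).

    Row i of the score matrix is one classification problem: representation
    z_a[i] must identify its true partner z_b[i] among all N candidates.
    The bound is log(N) minus the mean cross-entropy of that classifier,
    so it can never exceed log(N)."""
    scores = z_a @ z_b.T                                  # (N, N) similarities
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    n = z_a.shape[0]
    return np.log(n) + float(np.mean(np.diag(log_softmax)))

rng = np.random.default_rng(1)

# Paired "modalities": one latent snippet per scene, seen through two noisy
# channels, standing in for the image encoder and the audio encoder.
latent = rng.normal(size=(1000, 16))
z_img = latent + 0.1 * rng.normal(size=latent.shape)
z_aud = latent + 0.1 * rng.normal(size=latent.shape)

paired = info_nce_bound(z_img, z_aud)                    # true pairs align
broken = info_nce_bound(z_img, rng.permutation(z_aud))   # pairing destroyed
```

<p><span data-contrast=\"none\">With 1,000 candidates the bound saturates at log(1000), roughly 6.9 nats \u2013 you cannot certify more information than the classification problem is hard \u2013 which is one reason these methods use many negative samples. Shuffling the pairing destroys the correspondence and the estimate collapses.<\/span><\/p>
<p><b><span data-contrast=\"auto\">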
Well, as you described Deep Infomax, and then you have Augmented Multi<\/span><\/b><b><span data-contrast=\"auto\">s<\/span><\/b><b><span data-contrast=\"auto\">cale Deep Infomax, you call that improving Deep Infomax based on some limitations in the prior. How would you differentiate how the Augmented Multiscale Deep Infomax is better than the original idea?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: Yeah, so the original idea, depending on specifically how you implement it, has some significant downsides in some sense. The original Deep Infomax was just looking at a single version of a single image, and in this case, there\u2019s sort of an issue where, if you are just looking at the single image, and, let\u2019s say, encoding all of the little patches in the image, the way that the original Deep Infomax presentation kind of goes is that you take that image, you encode each of the patches and you also encode the whole image. 
And so here, we\u2019re going to sort of train the representation of the whole image such that it can look at all of these patches and say that oh, those are patches that came from my image.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host:&nbsp;<\/span><\/b><b><span data-contrast=\"auto\">Mmm<\/span><\/b><b><span data-contrast=\"auto\">.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So this is a little bit like that idea of associative memory<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;but it\u2019s applied on sort of a single input.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"none\">Host: Okay.&nbsp;<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So kind of procedurally how you would do this is that you would take an image, you would encode it, you get representations of all the little patches and you get a representation of the whole image. 
And now you\u2019re going to construct a little classification problem where you take a thousand other images and you also encode their patches and you sort of mix them into a bucket with all the patches from the original image that you computed a full image encoding for, and the job of the full image&nbsp;<\/span><span data-contrast=\"auto\">en<\/span><span data-contrast=\"auto\">coding is to look in that bucket and essentially pick out all the patches that are part of its image.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Hmm.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So one of the difficulties here, like one of the shortcomings of that particular way of formulating it, if you take that more restrictive interpretation, the main downside is that the encoder that\u2019s processing the full image can basically just memorize the content that\u2019s there.<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"auto\">And it\u2019s fairly easy for the model to just copy that information into the representation of the whole image, and essentially it\u2019s just memory that stores the representations of all the little patches. 
There might be some areas in which this is useful, but for some types of predictive tasks, it might not be so useful because you\u2019re not really asking the representation of the whole image to answer&nbsp;<\/span><span data-contrast=\"auto\">sort of interesting predictive problems about what kinds of other things might you see that weren\u2019t explicitly in the image that you\u2019re looking at now.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Right.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So if you are looking at left half of faces and right half of faces, if, instead of looking at left half of face and right half of face, all you did was you showed your encoder this left half of the face, you encoded it to a small vector and then you showed it the same half again and you said is this the one that you looked at before? The model might not actually have to learn that much to be able to solve that task really well. But if you take it and you change it to a task where the kinds of predictions that you are forcing that representation to make are a little bit more interesting<\/span><span data-contrast=\"auto\">, y<\/span><span data-contrast=\"auto\">ou can ask a more interesting question which is like, did this eye come from the right half of the face whose left half you looked at? 
So here, now, the model is answering kind of a more challenging question.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Right.<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: This is one of the main changes that we make when we go from the original formulation of the Deep Infomax to this Augmented Deep Infomax. So this is the augmented part, not the multiscale part. That\u2019s another thing, where we\u2019re looking at multiple scales of representation. But if we just look at the augmented part, kind of the big improvement there, is that we\u2019re forcing the model to answer questions<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;or form an associative memory<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;where the associations that we\u2019re forcing it to make are more challenging to make, so that the model has to put a little more effort into how it\u2019s going to represent the data.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><i><span data-contrast=\"auto\">(music plays)<\/span><\/i><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: I like to explore consequences, Phil, both intended and otherwise, that new technologies inevitably have on society<\/span><\/b><b><span data-contrast=\"auto\">,<\/span><\/b><b><span
data-contrast=\"auto\">&nbsp;and this is the part of the podcast where I always ask, what can possibly go wrong? So you<\/span><\/b><b><span data-contrast=\"auto\">\u2019<\/span><\/b><b><span data-contrast=\"auto\">re working in a lab that has a stated aim of teaching machines to read, think and communicate like humans. Is there anything about that, that keeps you up at night, and if so, what is it<\/span><\/b><b><span data-contrast=\"auto\">,<\/span><\/b><b><span data-contrast=\"auto\">&nbsp;and more importantly, what are you doing to address i<\/span><\/b><b><span data-contrast=\"auto\">t<\/span><\/b><b><span data-contrast=\"auto\">?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: So we do have a group here that\u2019s working on what we call fairness, accountability, transparency and ethics. So it\u2019s the FATE group. So they<\/span><span data-contrast=\"auto\">\u2019<\/span><span data-contrast=\"auto\">re working on a lot of questions that are, let\u2019s say, immediately relevant as opposed to questions that are kind of long-term relevant \u2013 or irrelevant depending on your perspective! \u2013&nbsp;<\/span><span data-contrast=\"auto\">umm\u2026&nbsp;<\/span><span data-contrast=\"auto\">so there\u2019s this idea of existential risk<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;which is more of a long-term question. So this is the kind of question like, well, if we develop a superhuman AI, is it going to care about us and take care of us, or is it going to consume us in its quest for more resources?&nbsp;<\/span><span data-contrast=\"none\">S<\/span><span data-contrast=\"none\">o we\u2019ll set that aside<\/span><span data-contrast=\"auto\">. 
And so like the more immediately salient one is the kinds of things that the FATE group is looking at, and so these are things like well, if we\u2019re training a system that\u2019s going to sit at a bank and analyze people\u2019s credit&nbsp;<\/span><span data-contrast=\"auto\">history, are there historical trends in the data that might be due to systemic discrimination or systemic disadvantaging of particular groups of people, that are going to be reflected in the data that we use to train our system such that then, when the system goes to make decision<\/span><span data-contrast=\"auto\">s<\/span><span data-contrast=\"auto\">, it\u2019s kind of implicitly or accidentally discriminating against these groups just due to the fact that they were also historically discriminated against and that\u2019s reflected in the data that we\u2019re using to train the system. So me personally, a great thing that I could do would be to create something that\u2019s like the internal combustion engine of machine learning<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;or even like the steam engine. Those things have had an incredible effect on society and that\u2019s been very empowering and it\u2019s helped with a lot of progress, but it also makes it easier for people to do bad things at scale. So I\u2019m kind of more worried about that type of problem. And I think that that type of problem isn\u2019t necessarily a technological problem. It\u2019s a little bit more of a system or social problem. Because I think the technology is going to happen<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;and so kind of the things that worry me there are along the lines of like seeing the technology<\/span><span data-contrast=\"auto\">&nbsp;<\/span><span data-contrast=\"auto\">and the way in which it increases people\u2019s leverage over the world and the ability to affect it kind of at scale.
I guess for me, on a day-to-day basis, like I don\u2019t think about it too much as I\u2019m doing research because to me<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;again, it\u2019s not really so much of a technical problem. I think it would be very hard to design the technology so that it can\u2019t do bad things.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Well listen, I happen to know you didn\u2019t start out in Montreal. So tell us a little bit about yourself. What got a young Phil Bachman interested in computer science and how did he land at Microsoft Research in Montreal?<\/span><\/b><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><span data-contrast=\"auto\">Phil Bachman: I kind of always grew up with a computer in the home. I was fortunate in that sense, that I was always around computers and I could use them for playing games and I could do a little bit of programming<\/span><span data-contrast=\"auto\">.&nbsp;<\/span><span data-contrast=\"auto\">And I\u2019m not old, but I\u2019m not in the youngest demographic that you would see<\/span><span data-contrast=\"auto\">,&nbsp;<\/span><span data-contrast=\"auto\">uhhh<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;working in tech. And one of the things that I really liked<\/span><span data-contrast=\"auto\">&nbsp;<\/span><span data-contrast=\"auto\">when I was in high school,<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"auto\">I started playing a lot of these first person games where you kind of run around and you shoot things. You know, for better or worse, it was fun. 
So one of the things that was a challenge at first for me was<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;I didn\u2019t have great internet. So what I would do is go to the school library and look around and it turned out that you could download some bots that people had made so you could sort of fake the multi-player kind of experience. So I thought that was really cool. And one of the things I had<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;you know, started thinking about there was<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;okay<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;well, you know, what is it that these bots are actually doing? So I was doing a little bit of coding and like making some little simple games<\/span><span data-contrast=\"auto\">.&nbsp;<\/span><span data-contrast=\"auto\">So thinking about that, like how would we automate this little thing that kind of is fairly simple at its core, but that, when we let it loose in this environment \u2013 so like when we let it run around and compete with the other players \u2013 it does something interesting and fun<\/span><span data-contrast=\"auto\">?<\/span><span data-contrast=\"auto\">&nbsp;And so that was sort of always at the back of my mind a bit I guess.&nbsp;<\/span><span data-contrast=\"auto\">And&nbsp;<\/span><span data-contrast=\"auto\">I bounced around&nbsp;<\/span><span data-contrast=\"none\">a little<\/span><span data-contrast=\"none\">&nbsp;bit<\/span><span data-contrast=\"none\">,<\/span><span data-contrast=\"none\">&nbsp;<\/span><span data-contrast=\"auto\">academically,&nbsp;<\/span><span data-contrast=\"auto\">and starting doing research in a slightly different field,&nbsp;<\/span><span data-contrast=\"auto\">but then eventually I kind of sat around and watched a bunch of online lectures and there were a couple of areas of machine 
learning<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;like reinforcement learning for example<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;that really started to click with me and that I was excited about because it was getting back to those kinds of questions I\u2019d asked myself about before,&nbsp;<\/span><span data-contrast=\"none\">like how do we get this little bot to do interesting things<\/span><span data-contrast=\"auto\">. So that brought me from Texas<\/span><span data-contrast=\"auto\">\u2026<\/span><span data-contrast=\"auto\">&nbsp;because I was in grad school in Texas after having&nbsp;<\/span><span data-contrast=\"auto\">done my undergraduate studies in New York<\/span><span data-contrast=\"auto\">\u2026<\/span><span data-contrast=\"auto\">&nbsp;<\/span><span data-contrast=\"auto\">B<\/span><span data-contrast=\"auto\">ut then I found this group that was in Montreal, doing reinforcement learning<\/span><span data-contrast=\"auto\">, s<\/span><span data-contrast=\"auto\">o I came and I worked with that group and that\u2019s where I did my PhD. And then afterwards<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;<\/span><span data-contrast=\"none\">I hung around and I liked the city pretty well, and&nbsp;<\/span><span data-contrast=\"auto\">I was looking around at kind of the jobs that were available elsewhere<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;and an exciting opportunity popped up here. 
So, there was a start-up called Maluuba that was based out of Waterloo, and it was developing kind of technology and software for doing virtual personal assistants, and the company wanted to sort of start getting more aggressive about pushing their technology forward, so they came to Montreal because there was a lot of machine learning cool stuff happening in Montreal<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;and then opened a research lab and<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;basically<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;as those lab doors were opening, I walked in and joined the company. And about a year later<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">&nbsp;we were actually acquired by Microsoft. So that\u2019s how I ended up at MSR.<\/span><span data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Host: Well, at the risk of heading into uncomfortable ice-breaker question territory, Phil, tell us one interesting thing about yourself that people might not know and how has it influenced your career as a researcher? 
And even if it didn\u2019t, tell us something interesting about yourself anyway!<\/span><\/b><span data-ccp-props="{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}">&nbsp;<\/span><\/p>\n<p><span data-contrast="auto">Phil Bachman: Personally, I\u2019d say one thing I\u2019ve always enjoyed is being fairly involved in at least one type of, let\u2019s say, goal-oriented physical activity. That\u2019s a super weird-sounding description. But for example, as an undergrad, I did a lot of rock climbing. That gave me a thing where I could really focus and apply myself to solving problems in some sense \u2013 a lot of climbing is about planning out what you are going to do, and it\u2019s a little bit like solving a puzzle sometimes \u2013 something separate from the work I do, but still mentally and physically active. I don\u2019t do the rock climbing anymore; what I do now is play a lot of soccer. I really enjoy the combination of the physical and mental aspects of the game \u2013 there\u2019s a lot of extemporaneous, inventive thinking. And it can be very satisfying when you do something that\u2019s exactly right at exactly the right time, especially when you realize later that you didn\u2019t even think about it; it just sort of happened. I guess that might be related to some of the better moments you have as a researcher, when you are trying to solve a problem and you\u2019re just kind of messing around, and then something clicks and you just see how you should do it.<\/span><span data-ccp-props="{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}">&nbsp;<\/span><\/p>\n<p><b><span data-contrast="auto">Host: At the end of every podcast, I give my guests the proverbial last word. 
So tell our listeners, from your perspective, what are the big challenges out there right now, and what research directions might address them, when we\u2019re talking about machine learning research? What\u2019s hype, what\u2019s hope, and what\u2019s the future?<\/span><\/b><span data-ccp-props="{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}">&nbsp;<\/span><\/p>\n<p><span data-contrast="auto">Phil Bachman: I guess one that I would say is filtering through all the different things that people are writing and saying, and trying to figure out which parts of what they are saying seem new but are really just a rewording of some concept that you\u2019re familiar with \u2013 you just have to rephrase it a little and see how it fits into your existing internal framework. Being able to figure out what\u2019s new, and how it differs from what people were trying before, allows you to be more precise in your guesses about what is actually important. A lot of it washes out in the end and doesn\u2019t really survive that long. At the beginning, as a researcher, you have to rely on other people because you don\u2019t really know where you are going yet, but over time, it\u2019s about taking those training wheels off a little bit and developing your own personal internal framework for how you think about problems, so that when you get new information, you can quickly contextualize it and figure out which bits are actually going to change the way you look at things, and which bits are just a different version of something you already have.<\/span><span data-ccp-props="{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}">&nbsp;<\/span><\/p>\n<p><b><span data-contrast="auto">Host: Phil Bachman, thanks for joining us from Montreal today.<\/span><\/b><span data-ccp-props="{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}">&nbsp;<\/span><\/p>\n<p><span data-contrast="auto">Phil Bachman: Yeah, thanks for having me.<\/span><span data-ccp-props="{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}">&nbsp;<\/span><\/p>\n<p><b><i><span data-contrast="auto">(music plays)<\/span><\/i><\/b><span 
data-ccp-props=\"{\"201341983\":0,\"335559685\":720,\"335559740\":240,\"335559991\":720,\"469777462\":[1620],\"469777927\":[0],\"469777928\":[1]}\">&nbsp;<\/span><\/p>\n<p><b><i><span data-contrast=\"auto\">To learn more about Dr. Phil Bachman, and the latest research in machine learning, visit Microsoft.com\/research<\/span><\/i><\/b><span data-ccp-props=\"{}\">&nbsp;<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deep learning methodologies like supervised learning have been very successful in training machines to make predictions about the world. But because they\u2019re so dependent upon large amounts of human-annotated data, they\u2019ve been difficult to scale. Dr. Phil Bachman, a researcher at MSR Montreal, would like to change that, and he\u2019s working to train machines to collect, sort and label their own data, so people don\u2019t have to. On the podcast, Dr. Bachman gives us an overview of the machine learning landscape and tells us why it\u2019s been so difficult to sort through noise and get to useful information. 
He also talks about his ongoing work on Deep InfoMax, a novel approach to self-supervised learning, and reveals what a conversation about ML classification problems has to do with Harrison Ford\u2019s face.<\/p>\n","protected":false},"author":39507,"featured_media":625203,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"https:\/\/player.blubrry.com\/id\/52694268\/","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[240054],"tags":[243924],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-625197","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-msr-podcast","tag-deep-infomax","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"https:\/\/player.blubrry.com\/id\/52694268\/","podcast_episode":"","msr_research_lab":[437514],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Dr. 
Philip Bachman on the Microsoft Research Podcast\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_Podcast_Phil-Bachman_Site_10_2019_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"December 4, 2019","formattedExcerpt":"Deep learning methodologies like supervised learning have been very successful in training machines to make predictions about the world. But because they\u2019re so dependent upon large amounts of human-annotated data, they\u2019ve been difficult to scale. Dr. 
Phil Bachman, a researcher at MSR Montreal, would like&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/625197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=625197"}],"version-history":[{"count":7,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/625197\/revisions"}],"predecessor-version":[{"id":896307,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/625197\/revisions\/896307"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/625203"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=625197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=625197"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=625197"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=625197"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=625197"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=625197"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.
microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=625197"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=625197"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=625197"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=625197"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=625197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}