Microsoft Research Podcast

An ongoing series of conversations bringing you right up to the cutting edge of Microsoft Research.

Holograms, spatial anchors and the future of computer vision with Dr. Marc Pollefeys

April 10, 2019 | By Microsoft blog editor

Episode 71, April 10, 2019

Dr. Marc Pollefeys is a Professor of Computer Science at ETH Zurich, a Partner Director of Science for Microsoft, and the Director of a new Microsoft Mixed Reality and AI lab in Switzerland. He’s a leader in the field of computer vision research, but it’s hard to pin down whether his work is really about the future of computer vision, or about a vision of future computers. Arguably, it’s both!

On today’s podcast, Dr. Pollefeys brings us up to speed on the latest in computer vision research, including his innovative work with Azure Spatial Anchors, tells us how devices like Kinect and HoloLens may have cut their teeth in gaming, but turned out to be game changers for both research and industrial applications, and explains how, while it’s still early days now, in the future, you’re much more likely to put your computer on your head than on your desk or your lap.

Final Transcript

Marc Pollefeys: So, instead of carrying a small device with you, or having a computer screen in front of you, the computer or the device will not anymore be a physical thing that you look at. It will be something that can place information anywhere in the world. And so, you can have screens that move with you. You can choose how big or how many screens you want to place around you as you work. The difference with now, having to take out your phone and, if you want to see one of those holograms that we would share, you have to actively look for it. If you have these glasses on, you will just be walking around, and if there’s something relevant for you, it will just appear in front of your eyes.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Dr. Marc Pollefeys is a Professor of Computer Science at ETH Zurich, a Partner Director of Science for Microsoft, and the Director of a new Microsoft Mixed Reality and AI lab in Switzerland. He’s a leader in the field of computer vision research, but it’s hard to pin down whether his work is really about the future of computer vision, or about a vision of future computers. Arguably, it’s both!

On today’s podcast, Dr. Pollefeys brings us up to speed on the latest in computer vision research, including his innovative work with Azure Spatial Anchors, tells us how devices like Kinect and HoloLens may have cut their teeth in gaming, but turned out to be game changers for both research and industrial applications, and explains how, while it’s still early days now, in the future, you’re much more likely to put your computer on your head than on your desk or your lap. That and much more on this episode of the Microsoft Research Podcast.

Host: Marc Pollefeys, welcome to the podcast.

Marc Pollefeys: Thank you.

Host: So, you wear a few hats. You’re a professor of computer science at ETH Zurich, a Partner Director of Science for Microsoft, and now you’re overseeing the creation of a new Microsoft Mixed Reality and AI lab in Switzerland. So, it’s pretty obvious what gets you up in the morning. Tell us a little bit about each of your roles and how you manage to switch hats and work everything into a day.

Marc Pollefeys: Sure! I’ve been a professor for quite a while here in Switzerland, and before that in the US. Then almost three years ago I joined Microsoft to work with Alex Kipman on mixed reality. I spent two years in Redmond, working with a large team of scientists and engineers on moving computer vision technology on HoloLens forward. In particular, we worked on HoloLens 2, which was recently announced. I told Alex that I was going to do this for two years and then I wanted to come back to Zurich, back to being a professor at ETH, but after those two years, I realized that it was, in a sense, very complementary. So on the one hand, I’m really excited about doing academic, basic research, but I was also always very interested in doing more applied research, and so I can partially do that at ETH, but of course, there’s no place to have a bigger impact with that applied research than Microsoft.

Host: Right.

Marc Pollefeys: And so, I realized that I wanted to continue doing that. And so at that point I discussed with Alex what we could do, what would make sense, and I realized that it was something really interesting that could be done, which was to set up a lab here in Zurich, ETH being one of the top schools to recruit talent for the type of research that we need to do for mixed reality. At the same time, from the side of ETH, with my ETH hat on, a great opportunity to provide opportunities for students to work with Microsoft to get access to devices, to resources that we would not necessarily have at ETH. A lot of exciting projects to propose to the students. And so really, essentially, saw that there was a real win-win, between, you know, what ETH can offer and what Microsoft can offer. And so, both for myself, but actually for everybody involved, being able to kind of find all those different elements and bring them together and have something really nice come out of that.

Host: Yeah.

Marc Pollefeys: So yeah, so there’s synergies. It is a lot of work, but there’s also a lot of nice synergies. Yes.

Host: I want you to talk a little bit about how collaboration happens, particularly with what you’ve got going on. Microsoft researchers collaborate with ETH Zurich and EPFL researchers in the Swiss Joint Research Center or the JRC. Tell us more about the Swiss JRC and how it’s helped you bridge the gap between your ETH Zurich role and your Microsoft role.

Marc Pollefeys: Yes, so actually the JRC is a program that’s been going on for about ten years to stimulate collaboration between Microsoft, ETH and EPFL. And I was actually involved, I think in 2008 or so, this was one of the first grants that I got from Microsoft then, so I worked on the ETH side as a PI, but then over the years, the scheme became much more collaborative with PIs on both sides, so in the second generation of these collaborative projects, I had a project with a colleague here, Otmar Hilliges, who was actually a former Microsoft researcher, and then collaborating with researchers at Microsoft Research in Cambridge and then in Redmond.

Host: Hmm.

Marc Pollefeys: So that’s more like the projects in the past. Now, currently, we have also great synergies with the JRC because having that framework in place that facilitates collaborative projects between the different schools and Microsoft, it means that we can also, in the area of mixed reality, where we want to foster more collaboration, also in the context of the lab here, this gives us a framework and a way to facilitate extra collaborations.

Host: Your primary area of research is computer vision, and you’ve specialized in 3D computer vision. So, give us an overview of your work in the field, in general, just to set the stage for our conversation today. In broad strokes, what are the big questions you’re asking and the big problems you’re trying to solve?

Marc Pollefeys: So, essentially, computer vision is about extracting information from image data, from images or video. This can be information like recognizing things in the scene, or it can also be geometric information. I’m much more focused on extracting geometric information from images. As an example, this can be, as you move with a camera through a scene, being able to recover the 3-dimensional surfaces in the scene, 3-dimensional reconstruction of the scene. This can also be recovering the motion of the camera through the scenes, like which way the camera is moving through that scene. It can also be finding back the location. It can be figuring out the calibration of the camera. It can be many different things related to this geometry. More and more, in recent years, I’ve been also combining that with, at the same time, also extracting semantic information from the scene, and so having both geometry and semantic information so that for example, when we do reconstruction, we don’t only reconstruct surfaces, but then we know that part of the surface is a table, another part of the surface is the floor, for example, and a third part of the surface could be the wall surfaces and so on. And so, by doing both at the same time, we can achieve more and get better results for the different tasks at once.

Host: Well let’s talk a little bit more deeply about the different kinds of computer vision work going on. Microsoft has a lot of research going on in computer vision and from a variety of angles. And when you and I talked before, you sort of differentiated what you’re doing with what some of the other research areas in Microsoft are doing. So, tell us how the work you’re doing is different.

Marc Pollefeys: Yeah, so there is a lot of research in computer vision. We’ll look at an image and we’ll extract semantic information from that. It will recognize those objects, or it will be able to label different parts of the scene. This is an area that, for a long time, was struggling to get really good results.

Host: Mmmm.

Marc Pollefeys: But now with the advent of deep learning, there’s been tremendous progress in that area. When you look at what we are doing for mixed reality and also what I’m doing here in my lab, extracting this geometric information from images is not something that you can as simply tackle with these methods that are now very popular in computer vision of using deep learning. A lot of the kind of geometric computations are ill-suited to be, you know, just easily tackled with deep learning, convolutional neural networks or that type of approach.

Host: Yeah.

Marc Pollefeys: The classical methods that leverage our strong understanding of geometry are still, you know, very strongly needed to get good quality results. So, for example, for a device like HoloLens, where it is critically important to be able to know exactly how the device moves through the environment, because it’s actually what’s needed to give the impression of, like, a hologram being static in the environment.

Host: Right.

Marc Pollefeys: So, if I move my head, I need to somehow be able to fake the impression that what I display on the screen is static in the world as opposed to being static on the screen. To do that, I need to know very, very precisely how I’m moving my head through the environment. We do that by a combination of inertial sensors together with the camera image data.

Host: Oh.

Marc Pollefeys: And so, we analyze the image data to compute how the headset is moving through the world. That’s why HoloLens has a whole set of cameras that observe the world.

Host: Right.

Marc Pollefeys: It’s really to be able to track its position all the time.
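
As an illustrative aside, the idea of keeping a hologram static in the world rather than static on the screen can be sketched in a few lines. This is a minimal 2D toy, not HoloLens code: the function name `world_to_head` and the pose convention (head position plus a counter-clockwise yaw angle, with the head's x-axis pointing forward) are assumptions made for the example.

```python
import math

def world_to_head(point_world, head_position, head_yaw):
    """Express a world-fixed 2D point in the head's own coordinate frame.

    head_yaw is the head's counter-clockwise rotation in radians.
    (Hypothetical helper for illustration only.)
    """
    dx = point_world[0] - head_position[0]
    dy = point_world[1] - head_position[1]
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    # Apply the inverse of the head rotation to the offset vector.
    return (c * dx + s * dy, -s * dx + c * dy)

# The hologram never moves in the world...
hologram = (2.0, 0.0)

# ...so its head-relative position must change whenever the head does.
facing_it = world_to_head(hologram, (0.0, 0.0), 0.0)
turned_left = world_to_head(hologram, (0.0, 0.0), math.pi / 2)
```

With the head facing the hologram it sits two units straight ahead; after a quarter turn to the left it has to be drawn two units to the right. Any error in the estimated head pose shows up directly as a hologram that appears to drift, which is why the tracking has to be so precise.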

Host: That’s interesting, because you take for granted that when I move my head – like if I’m talking to you and I move my head, you’re not going to move with me.

Marc Pollefeys: That’s right.

Host: But with a device, it’s going to be a little bit of a different experience unless you guys fix the technical aspects of it. So how are you tackling that technically?

Marc Pollefeys: On the HoloLens, you know, these are techniques that combine, very much like we do with our own eyes, where we combine our visual sensing together with our inner ear which is more inertial sensing. Combining those two, we can get a very good impression of how we’re moving through space. So, we’re actually doing roughly the same for mixed reality. It’s actually also very similar to what is being used in robotics. You can call it visual inertial odometry, which is determining your own motion from visual inertial data, or, even if you go beyond that, people often call it SLAM. This stands for Simultaneous Localization and Mapping. It means that, while you are localizing yourself, your relative motion in the environment, you’re at the same time building up a map of the environment so that if you revisit it, you can actually recognize and realize that you have already seen part of the scene and correct your position, or take that into account to kind of continue to estimate the position.
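
As an illustrative aside, the benefit of combining inertial and visual sensing can be shown with a very small toy. The sketch below uses a simple complementary filter on a single angle; this is a stand-in chosen for brevity, not the method used on HoloLens, and the function name `fuse_yaw` and the `blend` parameter are assumptions made for the example.

```python
def fuse_yaw(gyro_rates, visual_yaws, dt, blend=0.1, initial_yaw=0.0):
    """Toy 1D fusion of inertial and visual estimates of a heading angle.

    gyro_rates:  angular rates (rad/s), one per time step
    visual_yaws: camera-based absolute yaw per step, or None if unavailable
    blend:       how strongly each visual measurement pulls the estimate
    (Hypothetical helper for illustration only.)
    """
    yaw = initial_yaw
    history = []
    for rate, visual in zip(gyro_rates, visual_yaws):
        # Inertial prediction: integrate the measured rate (this drifts).
        yaw += rate * dt
        # Visual correction: pull the estimate toward the camera's answer.
        if visual is not None:
            yaw += blend * (visual - yaw)
        history.append(yaw)
    return history

# The head is actually still, but the gyro has a constant 0.1 rad/s bias.
steps, dt = 1000, 0.01
biased_gyro = [0.1] * steps

inertial_only = fuse_yaw(biased_gyro, [None] * steps, dt)
with_vision = fuse_yaw(biased_gyro, [0.0] * steps, dt)
```

Integrating the biased gyro alone drifts by about one radian over ten seconds, while the version that also listens to the visual measurement stays close to the true value of zero. The inertial signal supplies fast, smooth updates and the visual signal keeps the drift bounded.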

Host: Yeah.

Marc Pollefeys: So, this is used in robotics, in self-driving cars, and also very much in mixed reality and also for augmented reality on phones, the same techniques are being used. So, this is a key element in HoloLens. And so, this device needs to be able to track itself through the environment. Mm hmmm.

Host: So, Marc, I’ll argue that you’re doing the science behind what our future life with computers will look like, and a lot of it has to do with how we experience reality. And you’ve alluded to that just recently here. Currently, I’ll say we have “real-life reality.” I put that in air quotes, because that in itself is, you know, arguable. And also, what most people refer to as “virtual reality.” But we’re increasingly headed toward a world of augmented and mixed reality where the lines are a bit more blurred. So, tell us about your vision for this world. And I don’t want you to be sensational about it. I mean, for real, we’re heading into a different kind of paradigm here. How does it work and why do we need it even?

Marc Pollefeys: OK, so, if you look at computers and mobile phones, for example, there’s a lot of information that relates to the real world that’s available. The first generation of computing, essentially, was, you know, it’s a computer on your desk. And you might use mapping or other tools to predetermine the route to a particular place of interest. But then you still have to take that with you. That’s what changed completely once we went to mobile phones, which essentially is a mobile computer in your pocket, so you always have it with you. That computer actually knows about its approximate position in space. And so, a lot of things became possible like for navigation, for example, or also applications like Uber or other ride-sharing services and so on, because you have now information that is spatially, kind of, with you in the real world. The next generation, going beyond that, and where we go with mixed reality, is really about having information not only, you know, broadly contextually placed in space, but now actually going to very precise positioning in space. So, meaning that, in general, you can expect, not only to know this is roughly where you have to go or look at your phone and getting instructions, but you can imagine now really to mix the digital information that you currently see on the screen of your phone and then the real-world information that you see in front of you, you can expect those to essentially be merged, to be mixed together in one reality, that’s the reality in front of you. And so the information that you need to do the task, which could be a navigation task, or it could be a, if you’re a technician, a complicated task to repair a machine needing to carefully know which part you have to take out first and which, you know, button you have to press, et cetera…

Host: Mm-hmm.

Marc Pollefeys: All those things can now be communicated to you, or, you know, in the future as we’re going already now in a number of contexts with mixed reality, it can be communicated to you in a very intuitive way. Just by overlaying the instructions on top of the world, visually. So, you can just see things in front of you, press this button, and it just shows you, with an arrow like the button, or it shows you, you know, an example, of exactly what to do. So, it makes it a lot simpler to process the information and be able to tackle more complicated tasks in that sense.

(music plays)

Host: Let’s talk about Microsoft’s HoloLens. Many people think of it as merely another device or maybe a pair of virtual reality glasses, but it’s actually, as you start to describe it, the world’s first self-contained holographic computer. Give us a brief history of HoloLens to clear up any misperceptions we might have, and then tell us where and how HoloLens is primarily being used today. And then we’ll start talking about some of the specific work you’re doing.

Marc Pollefeys: So, if you look at the first-generation HoloLens, it was developed as a development kit, essentially, for Microsoft to learn about where this could be used. The initial concept was already going all the way to the long-term vision, and so you can see that, on the first-generation HoloLens, it wasn’t clear if this was going to be something for consumers or something for industry or where, exactly, it would be applied. And it was put out there as, you know, in a sense this magical device of the future, to see where it could be useful to learn what type of tools and what type of solutions could be implemented on it. So, we learned a lot. I joined, you know, between HoloLens 1 and HoloLens 2, I joined the team. And it became clear, very quickly, that we still have a long ways to go in terms of form factor, in terms of a number of aspects, to get to something that makes sense to use in your daily life, all the time, you know, as you currently use your cell phone. The device is still too bulky, it’s too expensive for consumers, et cetera. So, it’s too early for that. However, it’s not at all too early to use a device like HoloLens 1 or now, obviously, HoloLens 2, in settings where you have task workers, people that have to do often complicated tasks in the real world, repair machines, or it can also be a surgeon, for example, who’s also a first-line worker, a person that’s out in the real world having to do a complicated operation, and essentially needs access to information in as seamless as possible way to help him do that task. That’s where HoloLens turned out to be incredibly valuable because it’s a full computer that you wear on your head. It allows you to place as much information as you want around you. You can, in very natural ways, interface with it by speaking to it, by doing gestures, so you can interact with the device without even having to touch anything. 
You can use your hands to actually do the task you’re supposed to do in the real world and still get access to all the information you need to help you do that. This magic of HoloLens is that you have this full computer, but you can still use your hands to do the task that you have to do.

Host: HoloLens has some really interesting applications, but right now, it’s mainly an enterprise tool and it’s particularly useful for researchers. So, tell us about this thing called Research Mode in HoloLens. What is it and why is it such a powerful tool, especially for computer vision researchers?

Marc Pollefeys: So, HoloLens has all of these sensors built in. So, you have this device on your head that has sensors that look at the world from the same viewpoint as you are looking at the world. We have four cameras tracking the environment. We have this depth camera, which can be used in two different modes, one for tracking the hands and then a second mode which is a smaller field of view, but a more powerful signal out to sense further away, which we use for doing reconstruction of the 3D environment. So, essentially, we have all these different imaging sensors moving with you through the world. So, this is perfect for all types of computer vision research. Potentially also for robotics research or other applications. You now have this device that can collect data in real time. You can either process it on device or over Wi-Fi, send it to a more-beefy computer outside to do more expensive computations if you want to, or you can just store the data to do experiments. But you can collect this very rich data from all these different sensors to do all types of computer vision experiments. In particular, if you’re thinking of doing things like trying to understand what the user is doing from a first-person point of view, you can actually use these different sensor streams to then develop computer vision algorithms to understand what a person is doing.

Host: Hmmm. You just mentioned a more-beefy computer, which raises the question what kind of a computer or a processing unit, shall we say… I’ve heard you refer to it as an HPU or a holographic processing unit… How big is the processing unit? What are we talking about here?

Marc Pollefeys: Well, it’s a small coprocessor for HoloLens. So, you can very much compare it to a state-of-the-art cell phone in terms of the general-purpose computing, but then on top of that, because it needs to continuously run all of these computer vision tasks to be able to track itself in the environment, to be able to track the hands, to be able, in HoloLens 2, to also track your eyes to know where you’re looking or to recognize you based on iris. So, all of these different tasks, most of them need to run all the time.

Host: Yeah.

Marc Pollefeys: This means that this is for hours at a time, it runs. If you look at the cell phone, and you take your cell phone and you run one of those fancy augmented reality apps for example, you will notice that after a few minutes, your phone is running extremely hot. Because it’s consuming a lot of power to do all those computations.

Host: Yeah.

Marc Pollefeys: And your battery is draining very quickly. We cannot afford that on HoloLens. So, if you just have this general-purpose processor, you could run HoloLens for 10 minutes and your battery would be empty.

Host: And your head would be hot.

Marc Pollefeys: Exactly. So, this is exactly why Microsoft had to develop its own ASIC, so its own chip, which is the HPU, which is a chip that’s dedicated to do all of these computer vision tasks and other signal processing tasks very efficiently at very low power and can sustain that all along. If you look at HoloLens, the whole system is, you know, below 10 watts of power consumption. If you actually look carefully, the whole design of HoloLens is really done around being able to consume as little power as possible and be able to stay passively cooled so that it doesn’t heat up your forehead and so on.

Host: Okay, so how are you doing that? Is it algorithms that help you out? I mean, what is the technical approach to solving those problems?

Marc Pollefeys: Well, you have to be thinking, in every algorithm, in everything you do, you really have to be very careful and thinking, right from the beginning, how much power it’s going to consume. It’s a lot of engineering effort to get to a system that consumes that little power and does that amount of computer vision processing all the time.

Host: Mmm.

Marc Pollefeys: It means that you can put some very efficient processing units that are well-suited to do all these image processing operations. It means that some things need to happen to hide the latency in the rendering. All of these tasks are hardware-accelerated with some dedicated hardware in the HPU to make them run very efficiently. And it also means that you have to be smart how you use the algorithms and code things very efficiently. When you don’t need all the sensors all the time, you don’t use them all the time. It means that you really try to, at every point, everywhere you can, you try to save energy and just do what you need to do but not more.

Host: Sounds like a teenager… Um… One of the coolest things you’re working on is something you call spatial anchors. Talk about the science behind spatial anchors. How do they work? What are they, actually, and what are some compelling virtual content in the real-world use cases for spatial anchors?

Marc Pollefeys: So, spatial anchors are a type of visual anchoring in the real world. So, as you move your device to a particular location, and this can be both the HoloLens, or it can be a cell phone that runs one of the ARKit or ARCore applications, you are essentially always generating a little map of the environment. And when you want to place information in the world, you will attach it to this local little map of the environment. Currently, with HoloLens, you can place holograms in the world, and on HoloLens itself, it will be attached to a little map so that HoloLens is continuously building a map. And so, you can place it there, then it knows in the map where that hologram is placed. And then with HoloLens, you can see again that hologram when you walk by the same place. Now what Azure Spatial Anchors is doing is allowing you to extract a local little map and share that with others in the cloud so that they can also access your hologram if you want to share it with them. So that means that I can, for example, put on HoloLens and place a hologram somewhere in the world, and then you could come by with your mobile phone and use ARKit or ARCore to find back this hologram and see it at the same place where I placed it in the world.
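
As an illustrative aside, the coordinate bookkeeping behind sharing a hologram through a common anchor can be sketched as follows. This is a 2D toy that assumes each device has already localized the anchor in its own coordinate frame; it does not use or represent the Azure Spatial Anchors SDK, and the names `to_anchor_frame` and `from_anchor_frame` are hypothetical.

```python
import math

def to_anchor_frame(anchor_pose, point):
    """Convert a point from a device's frame into the anchor's local frame.

    anchor_pose is (x, y, theta): where the device sees the anchor,
    and how the anchor is rotated, in the device's own coordinates.
    """
    ax, ay, theta = anchor_pose
    dx, dy = point[0] - ax, point[1] - ay
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)

def from_anchor_frame(anchor_pose, offset):
    """Convert an anchor-relative offset back into a device's frame."""
    ax, ay, theta = anchor_pose
    c, s = math.cos(theta), math.sin(theta)
    return (ax + c * offset[0] - s * offset[1],
            ay + s * offset[0] + c * offset[1])

# Device A (say, a HoloLens) sees the anchor here and places a hologram.
anchor_seen_by_a = (1.0, 2.0, 0.0)
hologram_in_a = (1.5, 2.0)

# Only the anchor-relative offset needs to be shared between devices.
shared_offset = to_anchor_frame(anchor_seen_by_a, hologram_in_a)

# Device B (say, a phone) started elsewhere, so it sees the same
# anchor at a different position and orientation in its own frame.
anchor_seen_by_b = (4.0, 0.0, math.pi / 2)
hologram_in_b = from_anchor_frame(anchor_seen_by_b, shared_offset)
```

The two devices disagree about every coordinate, since each started its own map from a different origin, yet both place the hologram at the same physical spot because its position is stored relative to the anchor. The hard part, which this sketch skips entirely, is recognizing the anchor's surroundings from camera images so that each device can work out where the anchor is.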

Host: Mmm.

Marc Pollefeys: That means that now you can start thinking of applications where, for example, I can put virtual breadcrumbs in the world, and allow you to navigate, so these are for more consumer-oriented applications. But if you look at applications like indoor navigation or, you know, if you think of applications in an enterprise where there’s all types of machinery, there’s all types of sensors, there’s things that we call digital twins. This means that you have, for a real machine in the real world, you also have somewhere a digital representation of it in your servers in the cloud. That information that is available in the cloud, you would like to be able to align it also in the real world, so that if you walk around with your holographic computer with your HoloLens on, you can actually see, on top of the real world, on top of the real machine, you can actually also access, in context, all of the information that relates to it. Now to do that, you need to know where you are and where it is in the world and so that’s where technology like Azure Spatial Anchors can essentially allow you to recover that. Now we are currently just at the beginning of this technology. There’s a lot of things we still have to work out. But the basics are there, and you can see online a lot of people are giving it a try and are having fun with it.

Host: Can you remove them?

Marc Pollefeys: Of course, you can remove them. You can move them around. And everybody can have their own view of the world, meaning a service technician might want to see very different things than, you know, like a random person walking through a mall. This all becomes possible and different people would have, you know, different filters, in a sense, on all this virtual information. And you would have different levels of access and privacy, and all these different things would come into play to let you see the things that are relevant to you.

Host: So, Microsoft Kinect. Let’s talk about that for a second, because it has an interesting past. Some have characterized it as a failure that turned out to be a huge success. Again, I think a short history might be in order, and I’m not sure that you’re the historian of Kinect, but I know you’ve had a lot of connection with the people who are connected to Kinect. Let’s say that.

Marc Pollefeys: Yes.

Host: Tell us how Kinect has impacted the work you’re doing today and how advances in cloud computing, cameras, microphone arrays, deep neural nets, et cetera, have facilitated that.

Marc Pollefeys: So yeah, so if you look back, Kinect was introduced as a device for gaming, for the Xbox. It had initially a great success there, but it was something more for the casual gamer, than for the hardcore gamer. But, at the same time, there also, a little bit like with research mode, Kinect got opened up, and people could access this 3D sensing data that Kinect was producing. And this was something that created quite a revolution in robotics and in computer vision where people suddenly had access to a very cheap, standardized, very powerful 3D camera. And so, this stimulated all types of research, both at Microsoft Research and everywhere, in every vision and robotics lab around the world, people had Kinect. What’s interesting is to see a lot of the research that was developed on that. For example, one of those is Kinect Fusion, which consisted of taking single Kinect depth images and, as you move through the environment, having many of those images, you know, aligned to each other to create a more global 3D reconstruction of the environment. So, people developed all kinds of very interesting techniques. Many of those came back and enabled Microsoft to develop much more efficiently and already have an idea of what was possible with the camera that was going to be integrated in HoloLens, what could all be put on it in terms of algorithms and what was possible and so on, because they could just see what happened with Kinect in the research world.
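
As an illustrative aside, the basic step of turning several depth images into one shared 3D reconstruction can be sketched as below. This is a deliberately simplified toy: it assumes the camera pose for each frame is already known and simply merges points, whereas Kinect Fusion itself estimates the poses while scanning and fuses the measurements into a volumetric surface model. The function names and the pinhole parameters `fx`, `fy`, `cx`, `cy` are assumptions made for the example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (rows of metres) into 3D points in camera space
    using a pinhole camera model. Zero depth means no measurement."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def to_world(points, rotation, translation):
    """Apply a camera-to-world pose (3x3 rotation, 3-vector translation)."""
    world = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)))
    return world

def fuse(frames, fx, fy, cx, cy):
    """Merge depth frames with known poses into one world-space point cloud."""
    cloud = []
    for depth, rotation, translation in frames:
        cloud.extend(to_world(backproject(depth, fx, fy, cx, cy),
                              rotation, translation))
    return cloud

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Two one-pixel "images" of the same wall: first from 2 m away, then
# after the camera has moved 1 m forward, so the wall is 1 m away.
frames = [
    ([[2.0]], identity, (0.0, 0.0, 0.0)),
    ([[1.0]], identity, (0.0, 0.0, 1.0)),
]
cloud = fuse(frames, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Both measurements land on the same world point two metres in front of the starting position, which is what it means for the depth images to be aligned to each other. With real data the poses are never known exactly, so the alignment itself has to be estimated from the overlapping geometry.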

Host: Hmm.

Marc Pollefeys: This was actually one of the big reasons also there why I pushed for having a research mode, made available on HoloLens, because I think there, also, we can both provide a great tool to the research community to work with, in terms of using HoloLens as a computer vision device, but at the same time, also really leverage that and benefit from it by learning all the amazing things that are possible that we might not think of ourselves. And this more or less coincided with the time where we were ready with our second-generation sensor in HoloLens, which is an amazing sensor that was built for HoloLens 2. And so, you know, putting things together, it became clear that it made a lot of sense to reuse that sensor and put it in a third generation Kinect that now is clearly not made for gaming but is really directly targeted at intelligent cloud type of scenarios where you can have a device in the world, sensing, with best-in-class depth sensor. It can essentially do 1 megapixel, 1 million separate depth measurements at 30 frames per second and do all of that below a watt…

Host: Wow.

Marc Pollefeys: …of power consumption. So, an amazing sensor. To combine that with a color camera, and a state-of-the-art microphone array, to bring back the Kinect sensor package, but in a much, much higher-quality setting now.

Host: Yeah.

Marc Pollefeys: …and that’s what Azure Kinect is.

(music plays)

Host: It seems like we’re heading for a huge paradigm shift, moving from a computer screen world to a holographic world. I mean, maybe that’s an oversimplification, but I’m going to go there. And people have suggested that holograms have what it takes to be the next internet, the next big technical paradigm shift going beyond individual devices and manufacturers. So, what does the world look like if you’re wildly successful?

Marc Pollefeys: Well essentially it means that people will wear a device like HoloLens or, actually, think more of a device like normal glasses. Maybe a little bit beefy glasses, but much closer to today’s glasses than to today’s HoloLens, that will enable them to see information in a natural way on top of the world. So, instead of carrying a small device with you, or having a computer screen in front of you, the computer or the device will not anymore be a physical thing that you look at. It will be something that can place information anywhere in the world. And so, you can have screens that move with you. You can choose how big or how many screens you want to place around you as you work. The difference with now, having to take out your phone and, if you want to see one of those holograms that we would share, you have to actively look for it. If you have these glasses on, you will just be walking around, and if there’s something relevant for you, it will just appear in front of your eyes. The information will just kind of always be there in context, just what you need, ideally. So hopefully we can have some good AI help with that and moderate and just pick up the right things…

Host: Right.

Marc Pollefeys: …and show you what’s helpful to you. It will be very different from where we are now.

Host: Well, that leads right into the time on the podcast where I ask the “what could possibly go wrong” question. I actually get kind of excited about what I could do with a holographic computer on my head. But to be honest, I’m not sure I’m as excited about what other people could do, and I’ll call them “bad actors” or “advertisers.” Given the future that you’re literally inventing – or you’re working on inventing – does anything keep you up at night about that?

Marc Pollefeys: I certainly care a lot about the impact on privacy for this. So, I think there’s challenges but there’s also opportunities. I think it will be very important, as we will more and more have sensors continuously move and sense the environment, be it on a HoloLens or on your car, be it self-driving or just with driver assistance systems, be it any other systems, or robots that will roam around maybe your living room, all of those will have a lot of sensors, so even more than your cell phone in your pocket now, you will have sensors, including cameras and so on, all the time sensing.

Host: Mm-hmm.

Marc Pollefeys: And so it’s really important to figure out how to build systems that tackle the problems that we want to tackle, like being able to know where you are in the world so that you can see the holograms and the things that are relevant to you. But at the same time, not expose that in a way that harms your privacy. If you have a map of the environments around you, be able to maybe represent those in a way that allows you to do re-localization, to be able to retrieve your holograms where they should be, but doesn’t allow others to look at the inside of your house or inside of spaces that they’re not supposed to look into. So, these are actually active research topics that we are working on today also at the same time as we are pushing the technology forward, umm…

Host: Right. Well, and two other things pop into my head, like advertisers and spatial anchor spam. And transparency concerns. Like, if I’m wearing glasses that look pretty natural, how do I signal – or, how do we signal – to other people, hey, we have a lot of information about you that you might not be aware of?

Marc Pollefeys: I think you’re exactly right. So, those are all really important issues, and it’s really important to think about that. As an example, the first-generation HoloLens was designed in a way that all of those sensors that are needed to run continuously to just be able to operate HoloLens, that all this data would not be accessible to applications but just be accessible to the operating system. And be isolated in the HPU actually, and not be exposed on the general-purpose processor where the applications live as a way to ensure privacy by design in the hardware there.

Host: Right.

Marc Pollefeys: So, it’s clear that these types of things are very important. It’s often a tradeoff – or at least let’s say the easy solution is a tradeoff – between privacy and functionality.

Host: Yeah.

Marc Pollefeys: But I think that’s where we have to be smart and where we have to start already doing research in that space to work towards a future that we can live in.

Host: Marc, talk a little bit about yourself, your background and your journey. What got you started in computer science, and how did you end up doing computer vision research, and what was your path to Microsoft, not to mention all your other destinations?

Marc Pollefeys: Well, you know, so, I come from Belgium. Long ago, when I was like 12 years old or so, I wanted to get a game computer, and my father suggested it was maybe a better idea to get a computer on which I could program a bit.

Host: That’s just like a dad.

Marc Pollefeys: Yeah! And so, you know, the result was that I kind of thought it was pretty cool to program and all. And then, I actually didn’t study computer science eventually. I thought I was going to study computer science, but I ended up studying electrical engineering, which is close enough. And one of the last exams of my studies, I had a computer vision exam, and the professor there asked if I was interested in maybe doing a PhD with him. And so that’s how I got started in computer vision, and in particular, I picked a topic that, you know, was really about 3D reconstruction from images and had a lot of geometry in there. And from Belgium, from Leuven, University of Leuven, I then moved to the University of North Carolina in Chapel Hill, where I had a lot of colleagues doing computer graphics. Computer vision was very complementary to that. After a number of years, I got the opportunity to move back to ETH Zurich, which is, you know, really one of the top schools worldwide, so I decided to move to Switzerland and then, in 2015, I guess, I got approached by Alex Kipman and the mixed reality team, the HoloLens team, at Microsoft and I hesitated for a while, but then I realized that there was really an opportunity to have a big impact. You know, at some point, even, I had a conversation with Satya that kind of helped convince me to come over and you know, help realize the vision of mixed reality.

Host: I hear he’s very persuasive.

Marc Pollefeys: Uh, he is! And I was really impressed. We were very aligned on the vision and on where we could go with this technology and what we could do, and so this was actually a very good conversation.

Host: So, you packed up and moved to Redmond…

Marc Pollefeys: That’s right.

Host: …and now you’re back in Zurich.

Marc Pollefeys: That’s right. Actually, before I decided to join, I told Alex, I said, I’m going to come for two years and you have to be okay with that, otherwise I’m not coming at all. He convinced me that he was okay with it, and so, you know, the end result is now that eventually I didn’t really want to fully leave Microsoft. I wanted actually to both continue having impact on Microsoft but also felt that I could really do something in between that, you know, and have, in some way, the best of both worlds. In a sense, two “half jobs” turns out to be more than one job…

Host: Yeah.

Marc Pollefeys: …but I’m really excited about the opportunity I got to do this.

Host: Well maybe we can have a Marc Pollefeys “spatial anchor” here in Redmond and work with you there… As we close, what advice or wisdom or even direction would you give to any of our listeners who might be interested in the work you’re doing? Where would you advise people to dig in, research-wise, in this world of the holographic computer?

Marc Pollefeys: I think at this point, I really care about figuring out how to get all of this amazing sensing technology out in the world, but at the same time, make sure that we have systems that preserve privacy. Figuring out how to marry those things I think is really exciting, and so that’s one of the areas I’m really working on and I hope a lot of people are going to work on that.

Host: Marc Pollefeys, great to have you with us from Zurich today. Thanks for coming on the podcast.

Marc Pollefeys: Yeah, thank you for having me. It was great!

(music plays)

To learn more about Dr. Marc Pollefeys and Microsoft’s vision for the future of computer vision, visit Microsoft.com/research
