Microsoft Research Podcast

An ongoing series of conversations bringing you right up to the cutting edge of Microsoft Research.

Project Triton and the physics of sound with Dr. Nikunj Raghuvanshi

March 20, 2019 | By Microsoft blog editor

Episode 68, March 20, 2019

If you’ve ever played video games, you know that for the most part, they look a lot better than they sound. That’s largely due to the fact that audible sound waves are much longer – and a lot more crafty – than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels – in rooms, around corners, behind walls, outdoors – and he’s using computational physics to do it.

Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes.


Final Transcript

Nikunj Raghuvanshi: In a game scene, you will have multiple rooms, you’ll have caves, you’ll have courtyards, you’ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it’s making its way around geometry. And the question becomes how do you compute even the direct sound? Even the initial sound’s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: If you’ve ever played video games, you know that for the most part, they look a lot better than they sound. That’s largely due to the fact that audible sound waves are much longer – and a lot more crafty – than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels – in rooms, around corners, behind walls, outdoors – and he’s using computational physics to do it.

Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes. That and much more on this episode of the Microsoft Research Podcast.

Host: Nikunj Raghuvanshi, welcome to the podcast.

Nikunj Raghuvanshi: I’m glad to be here!

Host: You are a senior researcher in MSR’s Interactive Media Group, and you situate your research at the intersection of computational acoustics and graphics. Specifically, you call it “fast computational physics for interactive audio/visual applications.”

Nikunj Raghuvanshi: Yep, that’s a mouthful, right?

Host: It is a mouthful. So, unpack that! How would you describe what you do and why you do it? What gets you up in the morning?

Nikunj Raghuvanshi: Yeah, so my passion is physics. I really like the mixture of computers and physics. So, the way I got into this was, many, many years ago, I picked up this book on C++ and it was describing graphics and stuff. And I didn’t understand half of it, and there was a color plate in there. It took me two days to realize that those were not photographs, they were generated by a machine, and I was like, somebody took a photo of a world that doesn’t exist. So, that is what excites me. I was like, this is amazing. This is as close to magic as you can get. And then the idea was I used to build these little simulations and I was like, the exciting thing is you just code up these laws of physics into a machine and you see all this behavior emerge out of it. And you didn’t tell the world to do this or that. It’s just basic Newtonian physics. So, that is computational physics. And when you try to do this for games, the challenge is you have to be super-fast. You have 1/60th of a second to render the next frame, to produce the next buffer of audio. Right? So, that’s the fast portion. How do you take all these laws and compute the results fast enough that it can happen in 1/60th of a second, repeatedly? So, that’s where the computer science enters the physics part of it. So, that’s the sort of mixture of things I like to work in.

Host: You’ve said that light and sound, or video and audio, work together to make gaming, augmented reality, virtual reality, believable. Why are the visual components so much more advanced than the audio? Is it because the audio is the poor relation in this equation, or is it that much harder to do?

Nikunj Raghuvanshi: It is kind of both. Humans are visually dominant creatures, right? Because visuals are what is on our conscious mind and when you describe the world, our language is so visual, right? Even for sound, sometimes we use visual metaphors to describe things. So, that is part of it. And part of it is also that for sound, the physics is in many ways tougher because you have much longer wavelengths and you need to model wave diffraction, wave scattering and all these things to produce a believable simulation. And so, that is the physical aspect of it. And also, there’s a perceptual aspect. Our brain has evolved in a world where both audio and visual cues exist, and our brain is very clever. It goes for the physical aspects of both that give us separate information, unique information. So, visuals give you line-of-sight, high resolution, right? But audio is lower resolution directionally, but it goes around corners. It goes around rooms. That’s why if you put on your headphones and just listen to music at a loud volume, you are a danger to everybody on the street because you have no awareness.

Host: Right.

Nikunj Raghuvanshi: So, audio is the awareness part of it.

Host: That is fascinating because you’re right. What you can see is what is in front of you, but you could hear things that aren’t in front of you.

Nikunj Raghuvanshi: Yeah.

Host: You can’t see behind you, but you can hear behind you.

Nikunj Raghuvanshi: Absolutely, you can hear behind yourself and you can hear around stuff, around corners. You can hear stuff you don’t see, and that’s important for anticipating stuff.

Host: Right.

Nikunj Raghuvanshi: People coming towards you and things like that.

Host: So, there’s all kinds of people here that are working on 3D sound and head-related transfer functions and all that.

Nikunj Raghuvanshi: Yeah, Ivan’s group.

Host: Yeah! How is your work interacting with that?

Nikunj Raghuvanshi: So, that work is about, if I tell you the spatial sound field around your head, how does it translate into a personal experience in your two ears? So, the HRTF modeling is about that aspect. My work with John Snyder is about, how does the sound propagate in the world, right?

Host: Interesting.

Nikunj Raghuvanshi: So, if there is a sound down a hallway, what happens during the time it gets from there up to your head? That’s our work.

Host: I want you to give us a snapshot of the current state-of-the-art in computational acoustics and there’s apparently two main approaches in the field. What are they, and what’s the case for each and where do you land in this spectrum?

Nikunj Raghuvanshi: So, there’s a lot of work in room acoustics where people are thinking about, okay, what makes a concert hall sound great? Can you simulate a concert hall before you build it, so you know how it’s going to sound? And, based on the constraints in those areas, people have used a lot of ray tracing approaches which borrow from a lot of literature in graphics. And for graphics, ray tracing is the main technique, and it works really well, because the idea is you’re using a short-wavelength approximation. So, light wavelengths are submicron and if they hit something, they get blocked. But the analogy I like to use is sound is very different, the wavelengths are much bigger. So, you can hold your thumb out in front of you and blot out the sun, but you are going to have a hard time blocking out the sound of thunder with a thumb held out in front of your ear because the waves will just wrap around. And, that’s what motivates our approach, which is to actually go back to the physical laws and say, instead of doing the short-wavelength approximation for sound, we revisit and say, maybe sound needs the more fundamental wave equation to be solved, to actually model these diffraction effects for us. The usual thinking is that, you know, in games, you are thinking about, we want a certain set of perceptual cues. We want walls to occlude sound, we want a small room to reverberate less. We want a large hall to reverberate more. And the thought is, why are we solving this expensive partial differential equation again? Can’t we just find some shortcut to jump straight to the answer instead of going through this long-winded route of physics? And our answer has been that you really have to do all the hard work because there’s a ton of information that’s folded in and what seems easy to us as humans isn’t quite so easy for a computer and there’s no neat trick to get you straight to the perceptual answer you care about.
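To put rough numbers on the thumb-and-thunder analogy, here is a minimal sketch that computes wavelengths from the standard relation wavelength = speed / frequency. The specific frequencies, the thumb width, and the function name are illustrative assumptions, not figures from the episode.

```python
# Wavelength = propagation speed / frequency.
SPEED_OF_SOUND_M_S = 343.0          # assumed: air at roughly 20 degrees C
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum

THUMB_WIDTH_M = 0.02                # assumed: a thumb is about 2 cm wide
THUNDER_HZ = 100.0                  # assumed: a low rumble, for illustration
GREEN_LIGHT_HZ = 5.5e14             # assumed: mid-visible light


def wavelength_m(speed_m_s: float, frequency_hz: float) -> float:
    """Return the wavelength in metres for a wave of the given speed and frequency."""
    return speed_m_s / frequency_hz


if __name__ == "__main__":
    sound = wavelength_m(SPEED_OF_SOUND_M_S, THUNDER_HZ)
    light = wavelength_m(SPEED_OF_LIGHT_M_S, GREEN_LIGHT_HZ)
    # A wave is blocked cleanly only when the obstacle is much larger than the wavelength.
    print(f"sound wavelength / thumb width: {sound / THUMB_WIDTH_M:.1f}")
    print(f"light wavelength / thumb width: {light / THUMB_WIDTH_M:.2e}")
```

Under these assumptions the sound wave is far longer than the thumb is wide, while the light wave is far shorter, which is why the short-wavelength (ray) approximation suits light better than sound.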

(music plays)

Host: Much of the work in audio and acoustic research is focused on indoor sound where the sound source is within the line of sight and the audience and the listener can see what they are listening to…

Nikunj Raghuvanshi: Um-hum.

Host: …and you mentioned that the concert hall has a rich literature in this field. So, what’s the gap in the literature when we move from the concert hall to the computer, specifically in virtual environments?

Nikunj Raghuvanshi: Yeah, so games and virtual reality, the key demand they have is the scene is not one room, and with time it has become much more difficult. So, a concert hall is terrible if you can’t see the people who are playing the sound, right? So, it allows for a certain set of assumptions that work extremely nicely. The direct sound, which is the initial sound, which is perceptually very critical, just goes in a straight line from source to listener. You know the distance so you can just use a simple formula and you know exactly how loud the initial sound is at the person. But in a game scene, you will have multiple rooms, you’ll have caves, you’ll have courtyards, you’ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it’s making its way around geometry. And the question becomes, how do you compute even the direct sound? Even the initial sound’s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly. So, that’s the challenge.
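The "simple formula" for line-of-sight direct sound is not named in the conversation. One common choice, assumed here, is the free-field inverse-distance law for a point source, where amplitude falls off as 1/distance (about 6 dB per doubling of distance). The function names and the 1 m reference distance are hypothetical.

```python
import math


def direct_sound_gain(distance_m: float, reference_m: float = 1.0) -> float:
    """Amplitude gain of the direct sound relative to its level at reference_m.

    Assumes an unobstructed straight path (the concert-hall case). Distances
    inside the reference radius are clamped so the gain never exceeds 1.
    """
    return reference_m / max(distance_m, reference_m)


def gain_to_db(gain: float) -> float:
    """Convert an amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    for d in (1.0, 2.0, 10.0):
        print(f"{d:5.1f} m -> {gain_to_db(direct_sound_gain(d)):6.1f} dB")
```

This only works when the straight path is clear. Once walls, doorways, and holes in the geometry sit between source and listener, as in the game scenes described above, the straight-line distance no longer determines the loudness or the arrival direction.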

Host: All right. So, let’s talk about how you’re addressing it. A recent paper that you’ve published made some waves, sound waves probably. No pun intended… It’s called Parametric Directional Coding for Pre-computed Sound Propagation. Another mouthful. But it’s a great paper and the technology is so cool. Talk about this research that you’re doing.

Nikunj Raghuvanshi: Yeah. So, our main idea is, actually, to look at the literature in lighting again and see the kind of path they’d followed to kind of deliver on this computational challenge of how you do these extensive simulations and still hit that stringent CPU budget in real time. And one of the key ideas is you precompute. You cheat. You just look at the scene and just compute everything you need to compute beforehand, right? Instead of trying to do it on the fly during the game. So, it does introduce the limitation that the scene has to be static. But then you can do these very nice physical computations and you can ensure that the whole thing is robust, it is accurate, it doesn’t suffer from all the sort of corner cases that approximations tend to suffer from, and you have your result. You basically have a giant look-up table. If somebody tells you that the source is over there and the listener is over here, tell me what the loudness of the sound would be. We just say okay, we have this giant table, we’ll just go look it up for you. And that is the main way we bring the CPU usage under control. But it generates a knock-on challenge that now we have this huge table, there’s this huge amount of data that we’ve stored and it’s 6-dimensional. The source can move in 3 dimensions and the listener can move in 3 dimensions. So, we have the giant table which is terabytes or even more of data.
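A back-of-the-envelope sketch of why a 6-dimensional table grows so quickly: the number of entries is the number of source positions times the number of listener positions. The scene size, grid spacing, and bytes stored per pair below are all illustrative assumptions, not figures from the episode.

```python
import math


def grid_points(scene_size_m, spacing_m):
    """Number of sample positions on a uniform 3-D grid covering the scene."""
    return math.prod(math.ceil(side / spacing_m) for side in scene_size_m)


def raw_table_bytes(scene_size_m, spacing_m, bytes_per_pair):
    """Bytes needed to store one response for every (source, listener) pair."""
    points = grid_points(scene_size_m, spacing_m)
    # Source and listener each range over the same 3-D grid: 3 + 3 = 6 dimensions.
    return points * points * bytes_per_pair


# Assumed values, chosen only to show the scaling.
SCENE_SIZE_M = (100.0, 100.0, 10.0)  # width, depth, height of the level
SPACING_M = 1.0                      # one sample position per metre
BYTES_PER_PAIR = 48_000 * 4          # one second of 48 kHz float32 response

if __name__ == "__main__":
    total = raw_table_bytes(SCENE_SIZE_M, SPACING_M, BYTES_PER_PAIR)
    print(f"raw table: about {total / 1e12:.0f} TB")
```

Because the pair count is the square of the position count, halving the grid spacing multiplies the raw table size by 64, which is why storing raw simulation output does not scale.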

Host: Yeah.

Nikunj Raghuvanshi: And the game’s typical budget is like 100 megabytes. So, the key challenge we’re facing is, how do we fit everything in that? How do we take this data and extract out something salient that people listen to and use that? So, you start with full computation, you start as close to nature as possible and then we’re saying okay, now what would a person hear out of this? Right? Now, let’s do that activity of, instead of doing a shortcut, now let’s think about okay, a person hears the direction the sound comes from. If there is a doorway, the sound should come from the doorway. So, we pick out these perceptual parameters that are salient for human perception and then we store those. That’s the crucial way you kind of bring down this enormous data set to a sort of memory budget that’s feasible.
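The idea of storing a few perceptual parameters per source and listener pair, then looking them up at run time, can be sketched as follows. This is a hypothetical illustration only: the class names, the choice of three parameters, and the grid quantization are assumptions, and it does not represent Project Triton's actual data format or compression.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AcousticParams:
    """A handful of perceptual numbers kept instead of a full simulated response."""
    loudness_db: float        # how loud the initial sound is at the listener
    arrival_direction: Vec3   # unit vector the sound appears to come from
    reverb_time_s: float      # how long the sound keeps reverberating


def to_cell(position: Vec3, spacing_m: float = 1.0) -> Cell:
    """Quantize a world position to the grid cell that contains it."""
    x, y, z = position
    return (math.floor(x / spacing_m), math.floor(y / spacing_m), math.floor(z / spacing_m))


class PrecomputedAcoustics:
    """Look-up table keyed by (source cell, listener cell)."""

    def __init__(self, spacing_m: float = 1.0) -> None:
        self.spacing_m = spacing_m
        self._table: Dict[Tuple[Cell, Cell], AcousticParams] = {}

    def bake(self, source: Vec3, listener: Vec3, params: AcousticParams) -> None:
        """Offline step: store parameters extracted from a simulation."""
        key = (to_cell(source, self.spacing_m), to_cell(listener, self.spacing_m))
        self._table[key] = params

    def lookup(self, source: Vec3, listener: Vec3) -> Optional[AcousticParams]:
        """Run-time step: a dictionary read, with no physics solved per frame."""
        key = (to_cell(source, self.spacing_m), to_cell(listener, self.spacing_m))
        return self._table.get(key)


if __name__ == "__main__":
    table = PrecomputedAcoustics()
    # Sound from the next room should appear to arrive from the doorway direction.
    table.bake((2.0, 0.0, 5.0), (8.0, 0.0, 1.0), AcousticParams(-18.0, (-1.0, 0.0, 0.0), 0.6))
    print(table.lookup((2.4, 0.0, 5.7), (8.1, 0.0, 1.9)))
```

A real system would also interpolate between neighbouring cells and compress the stored values; both are omitted here to keep the sketch short.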

Host: So, that’s the paper.

Nikunj Raghuvanshi: Um-hum.

Host: And how has it played out in practice, or in project, as it were?

Nikunj Raghuvanshi: So, a little bit of history on this is, we had a paper at SIGGRAPH 2010, me and John Snyder and some academic collaborators, and at that point, we were trying to think of just physical accuracy. So, we took the physical data and we were trying to stay as close to physical reality as possible and we were rendering that. And around 2012, we got to talking with Gears of War, the studio, and we were going through what the budgets would be, how things would be. And we were like we need… this needs to… this is gigabytes, it needs to go to megabytes…

Host: Really?

Nikunj Raghuvanshi: …very quickly. And that’s when we were like, okay, let’s simplify. Like, what are the four, like, most basic things that you really want from an acoustic system? Like walls should occlude sound and things like that. So, we kind of rewound and came to it from this perceptual viewpoint that I was just describing. Let’s keep only what’s necessary. And that’s how we were able to ship this in 2016 in Gears of War 4, by just rewinding and doing this process.

Host: How is that playing in to, you know… Project Triton is the big project that we’re talking about. How would you describe what that’s about and where it’s going? Is it everything you’ve just described or is there… other aspects to it?

Nikunj Raghuvanshi: Yeah. Project Triton is this idea that you should precompute the wave physics, instead of starting with approximations. Approximate later. That’s one idea of Project Triton. And the second is, if you want to make it feasible for real games and real virtual reality and augmented reality, switch to perceptual parameters. Extract that out of this physical simulation and then you have something feasible. And the path we are on now, which brings me back to the recent paper you mentioned…

Host: Right.

Nikunj Raghuvanshi: …is, in Gears of War, we shipped some set of parameters. We were like, these make a big difference. But one thing we lacked was if the sound is, say, in a different room and you are separated by a doorway, you would hear the right loudness of the sound, but its direction would be wrong. Its direction would be straight through the wall, going from source to listener.

Host: Interesting.

Nikunj Raghuvanshi: And that’s an important spatial cue. It helps you orient yourself when sounds funnel through doorways.

Host: Right.

Nikunj Raghuvanshi: Right? And it’s a cue that sound designers really look for and try to hand-tune to get good ambiances going. So, in the recent 2018 paper, that’s what we fixed. We call this portaling. It’s a made-up word for this effect of sounds going around doorways, but that’s what we’re modeling now.

Host: Is this new stuff? I mean, people have tackled these problems for a long time.

Nikunj Raghuvanshi: Yeah.

Host: Are you people the first ones to come up with this, the portaling and…?

Nikunj Raghuvanshi: I mean, the basic ideas have been around. People know that, perceptually, this is important, and there are approaches to try to tackle this, but I’d say, because we’re using wave physics, this problem becomes much easier because you just have the waves diffract around the edge. With ray tracing you face the difficult problem that you have to trace out the rays “intelligently” somehow to hit an edge, which is like hitting a bullseye, right?

Host: Right.

Nikunj Raghuvanshi: So, the ray can wrap around the edge. So, it becomes really difficult. Most practical ray tracing systems don’t try to deal with this edge diffraction effect because of that. Although there are academic approaches to it, in practice it becomes difficult. But as I worked on this over the years, I’ve kind of realized, these are the real advantages of this. Disadvantages are pretty clear: it’s slow, right? So, you have to precompute. But we’re realizing, over time, that going to physics has these advantages.

Host: Well, but the precompute part is innovative in terms of a thought process on how you would accomplish the speed-up…

Nikunj Raghuvanshi: There have been papers on precomputed acoustics, academically, before, but this realization that mixing precomputation and extracting these perceptual parameters? That is a recipe that makes a lot of practical sense. Because a third thing that I haven’t mentioned yet is, going to the perceptual domain, now the sound designer can make sense of the numbers coming out of this whole system. Because it’s loudness. It’s reverberation time, how long the sound is reverberating. And these are numbers that are super-intuitive to sound designers, they already deal with them. So, now what you are telling them is, hey, you used to start with a blank world, which had nothing, right? Like the world before the act of creation, there’s nothing. It’s just empty space and you are trying to make things reverberate this way or that. Now you don’t need to do that. Now physics will execute first, on the actual scene with the actual materials, and then you can say I don’t like what physics did here or there, let me tweak it, let me modify what the real result is and make it meet the artistic goals I have for my game.

(music plays)

Host: We’ve talked about indoor audio modeling, but let’s talk about the outdoors for now and the computational challenges to making natural outdoor sounds, sound convincing.

Nikunj Raghuvanshi: Yeah.

Host: How have people hacked it in the past and how does your work in ambient sound propagation move us forward here?

Nikunj Raghuvanshi: Yeah, we’ve hacked it in the past! Okay. This is something we realized on Gears of War because the parameters we use were borrowed, again, from the concert hall literature and, because they’re parameters informed by concert halls, things sound like halls and rooms. Back in the days of Doom, this tech would have been great because it was all indoors and rooms, but in Gears of War, we have these open spaces and it doesn’t sound quite right. Outdoors sounds like a huge hall and you know, how do we do wind ambiances and rain that’s outdoors? And so, we came up with a solution for them at that time which we called “outdoorness.” It’s, again, an invented word.

Host: Outdoorness.

Nikunj Raghuvanshi: Outdoorness.

Host: I’m going to use that. I like it.

Nikunj Raghuvanshi: Because the idea it’s trying to convey is, it’s not a binary indoor/outdoor. When you are crossing a doorway or a threshold, you expect a smooth transition. You expect that, I’m not hearing rain inside, I’m feeling nice and dry and comfortable and now I’m walking into the rain…

Host: Yeah.

Nikunj Raghuvanshi: …and you want the smooth transition on it. So, we built a sort of custom tech to do that outdoor transition. But it got us thinking about, what’s the right way to do this? How do you produce the right sort of spatial impression of, there’s rain outside, it’s coming through a doorway, the doorway is to my left, and as you walk, it spreads all around you. You are standing in the middle of rain now and it’s all around you. So, we wanted to create that experience. So, the ambient sound propagation work was an intern project and now we finished it up with our collaborators at Cornell. And that was about, how do you model extended sound sources? So, again, going back to concert halls, usually people have dealt with point-like sources which might have a directivity pattern. But rain is like a million little drops. If you try to model each and every drop, that’s not going to get you anywhere. So, that’s what the paper is about, how to treat it as one aggregate that somebody gave us? And we produce an aggregate sort of energy distribution of that thing along with its directional characteristics and just encode that.

Host: And just encode it.

Nikunj Raghuvanshi: And just encode it.

Host: How is it working?

Nikunj Raghuvanshi: It works nice. It sounds good. To my ears it sounds great.

Host: Well you know, and you’re the picky one, I would imagine.

Nikunj Raghuvanshi: Yeah. I’m the picky one and also when you are doing iterations for a paper, you also completely lose objectivity at some point. So, you’re always looking for others to get some feedback.

Host: Here, listen to this.

Nikunj Raghuvanshi: Well, reviewers give their feedback, so, yeah.

Host: Sure. Okay. Well, kind of riffing on that, there’s another project going on that I’d love for you to talk as much as you can about called Project Acoustics and kind of the future of where we’re going with this. Talk about that.

Nikunj Raghuvanshi: That’s really exciting. So, up to now, Project Triton was an internal tech which we managed to propagate from research into actual Microsoft product, internally.

Host: Um-hum.

Nikunj Raghuvanshi: Project Acoustics is being led by Noel Cross’s team in Azure Cognition. And what they’re doing is turning it into a product that’s externally usable. So, trying to democratize this technology so it can be used by any game audio team anywhere backed by Azure compute to do the precomputation.

Host: Which is key, the Azure compute.

Nikunj Raghuvanshi: Yeah, because you know, it took us a long time, with Gears of War to figure out, okay, where is all this precompute going to happen?

Host: Right.

Nikunj Raghuvanshi: We had to figure out the whole cluster story for ourselves, how to get the machines, how to procure them, and there’s a big headache of arranging compute for yourself. And so that’s, logistically, a key problem that people face when they try to think of precomputed acoustics. The run-time side, Project Acoustics, we are going to have plug-ins for all the standard game audio engines and everything. So, that makes things simpler on that side. But a key blocker in my view was always this question of, where are you going to precompute? So, now the answer is simple. You get your Azure Batch account and you just send your stuff up there and it just computes.

Host: Send it to the cloud and the cloud will rain it back down on you.

Nikunj Raghuvanshi: Yes. It will send down data.

Host: Who is your audience for Project Acoustics?

Nikunj Raghuvanshi: Project Acoustics, the audience is the whole game audio industry. And our real hope is that we’ll see some uptake on it when we announce it at GDC in March, and we want people to use it, as many teams, small, big, medium, everybody, to start using this because we feel there’s a positive feedback loop that can be set up where you have these new tools available, designers realize that they have these new tools available that have shipped in Triple A games, so they do work. And for them to give us feedback. If they use these tools, we hope that they can produce new audio experiences that are distinctly different so that then they can say to their tech director, or somebody, for the next game, we need more CPU budget. Because we’ve shown you value. So, a big exercise was how to fit this within current budgets so people can produce these examples of novel possible experiences so they can argue for more. So, to increase the budget for audio and kind of bring it on par with graphics over time as you alluded to earlier.

Host: You know, if we get nothing across in this podcast, it’s like, people, pay attention to good audio. Give it its props. Because it needs it. Let’s talk briefly about some of the other applications for computational acoustics. Where else might it be awesome to have a layer of realism with audio computing?

Nikunj Raghuvanshi: One of the applications that I find very exciting is for audio rendering for people who are blind. I had the opportunity to actually show the demo of our latest system to Daniel Kish, who, if you don’t know, he’s the human echo-locator. And he uses clicks from his mouth to actually locate geometry around him and he’s always oriented. He’s an amazing person. So that was a collaboration, actually, we had with a team in the Garage. They released a game called Ear Hockey and it was a nice collaboration, like there was a good exchange of ideas over there. That’s nice because I feel that’s a whole different application where it can have a potential social positive impact. The other one that’s very interesting to me is that we lived in 2-D desktop screens for a while and now computing is moving into the physical world. That’s the sort of exciting thing about mixed reality, is moving compute out into this world. And then the acoustics of the real world being folded into the sounds of virtual objects becomes extremely important. If something virtual is right behind the wall from you, you don’t want to listen to it with full loudness. That would completely break the realism of something being situated in the real world. So, from that viewpoint, good light transport and good sound propagation are both required things for the future compute platform in the physical world. So that’s a very exciting future direction to me.

(music plays)

Host: It’s about this time in the podcast I ask all my guests the infamous “what keeps you up at night?” question. And when you and I talked before, we went down kind of two tracks here, and I felt like we could do a whole podcast on it, but sadly we can’t… But let’s talk about what keeps you up at night. Ironically to tee it up here, it deals with both getting people to use your technology…

Nikunj Raghuvanshi: Um-hum.

Host: And keeping people from using your technology.

Nikunj Raghuvanshi: No! I want everybody to use the technology. But I’d say like five years ago, what used to keep me up at night is like, how are we going to ship this thing in Gears of War? Now what’s keeping me up at night is how do we make Project Acoustics succeed and how do we, you know, expand the adoption of it and, in a small way, try to improve, move the game audio industry forward a bit and help artists do the artistic expression they need to do in games? So, that’s what I’m thinking right now, how can we move things forward in that direction? I frankly look at video games as an art form. And I’ve gamed a lot in my time. To be honest, all of it wasn’t art, I was enjoying myself a lot and I wasted some time playing games. But we all have our ways to unwind and waste time. But good games can be amazing. They can be much better than a Hollywood movie in terms of what they leave you with. And I just want to contribute in my small way to that. Giving artists the tools to maybe make the next great story, you know.

Host: All right. So, let’s do talk a little bit, though, about this idea of you make a really good game…

Nikunj Raghuvanshi: Um-hum.

Host: Suddenly, you’ve got a lot of people spending a lot of time. I won’t say wasting. But we have to address the nature of gaming, and the fact that there are you know… you’re upstream of it. You are an artist, you are a technologist, you are a scientist…

Nikunj Raghuvanshi: Um-hum.

Host: And it’s like I just want to make this cool stuff.

Nikunj Raghuvanshi: Yeah.

Host: Downstream, it’s people want people to use it a lot. So, how do you think about that and the responsibilities of a researcher in this arena?

Nikunj Raghuvanshi: Yeah. You know, this reminds me of Kurt Vonnegut’s book, Cat’s Cradle? He kind of makes – well, there’s a scientist who makes ice-nine and it freezes the whole planet or something. So, you see things about video games in the news and stuff. But I frankly feel that the kind of games I’ve participated in making, these games are very social experiences. People meet on the games a lot. Like Sea of Thieves is all about, you get a bunch of friends together, you’re sitting on the couch together, and you’re just going crazy like on these pirate ships and trying to just have fun. So, they are not the sort of games where a person is being separated from society by the act of gaming and just is immersed in the screen and is just not participating in the world. They are kind of the opposite. So, games have all these aspects. And so, I personally feel pretty good about the games I’ve contributed to. I can at least say that.

Host: So, I like to hear personal stories of the researchers that come on the podcast. So, tell us a little bit about yourself. When did you know you wanted to do science for a living and how did you go about making that happen?

Nikunj Raghuvanshi: Science for a living? I was the guy in 6th grade who’d get up and say I want to be a scientist. So, that was then, but what got me really hooked was graphics, initially. Like I told you, I found the book which had these color plates and I was like, wow, that’s awesome! So, I was at UNC Chapel Hill, graphics group, and I studied graphics for my graduate studies. And then, in my second or third year, my advisor, Ming Lin, she does a lot of research in physical simulations. How do we make water look nice in physical simulations? Lots of it is CGI. How do you model that? How do you model cloth? How do you model hair? So, there’s all this physics for that. And so, I took a course with her and I was like, you know what? I want to do audio because you get a different sense, right? It’s simulation, not for visuals, but you get to hear stuff. I’m like okay, this is cool. This is different. So, I did a project with her and I published a paper on sound synthesis. So, like how rigid bodies, like objects rolling and bouncing around and sliding make sound, just from physical equations. And I found a cool technique and I was like okay, let me do acoustics with this. It’s going to be fun. And I’m going to publish another paper in a year. And here I am, still trying to crack that problem of how to do acoustics in spaces!

Host: Yeah, but what a place to be. And speaking of that, you have a really interesting story about how you ended up at Microsoft Research and brought your entire PhD code base with you.

Nikunj Raghuvanshi: Yeah. It was an interesting time. So, when I was graduating, MSR was my number one choice because I was always thinking of this technology as, it would be great if games used this one day. This is the sort of thing that would have a good application in games. And then, around that time, I got hired at MSR. It was a multicore incubation back then, and my group was looking at how these multicore systems enable all sorts of cool new things. And one of the things my hiring manager was looking at was how we can do physically based sound synthesis and propagation. So, that’s what my PhD was, so they licensed the whole code base and I built on that.

Host: You don’t see that very often.

Nikunj Raghuvanshi: Yeah, it was nice.

Host: That’s awesome. Well, Nikunj, as we close, I always like to ask guests to give some words of wisdom or advice or encouragement, however it looks to you. What would you say to the next generation of researchers who might want to make sound sound better?

Nikunj Raghuvanshi: Yeah, it’s an exciting area. It’s super-exciting right now. Because even like just to start from more technical stuff, there are so many problems to solve with acoustic propagation. I’d say we’ve taken just the first step of feasibility, maybe a second one with Project Acoustics, but we’re right at the beginning of this. And we’re thinking there are so many missing things, like outdoors is one thing that we’ve kind of fixed up a bit, but we’re going towards what sorts of effects can you model in the future? Like directional sources is one we’re looking at, but there are so many problems. I kind of think of it as the 1980s of graphics when people first figured out that you can make this work. You can make light propagation work. What are the things that you need to do to make it ever closer to reality? And we’re still at it. So, I think we’re at that phase with acoustics. We’ve just figured out this is one way that you can actually ship in practical applications and we know there are deficiencies in its realism in many, many places. So, I think of it as a very rich area that students can jump in and start contributing.

Host: Nowhere to go but up.

Nikunj Raghuvanshi: Yes. Absolutely!

Host: Nikunj Raghuvanshi, thank you for coming in and talking with us today.

Nikunj Raghuvanshi: Thanks for having me.

(music plays)

To learn more about Dr. Nikunj Raghuvanshi and the science of sound simulation, visit
