Microsoft Research Podcast
Dr. Arul Menezes

Arul Menezes – Partner Research Manager. Photo courtesy of Maryatt Photography.

Episode 24, May 16, 2018

Humans are wired to communicate, but we don’t always understand each other. Especially when we don’t speak the same language. But Arul Menezes, the Partner Research Manager who heads MSR’s Machine Translation team, is working to remove language barriers to help people communicate better. And with the help of some innovative machine learning techniques, and the combined brainpower of machine translation, natural language and machine learning teams in Redmond and Beijing, it’s happening sooner than anyone expected.

Today, Menezes talks about how the advent of deep learning has enabled exciting advances in machine translation, including applications for people with disabilities, and gives us an inside look at the recent “human parity” milestone at Microsoft Research, where machines translated a news dataset from Chinese to English with the same accuracy and quality as a person.



Arul Menezes: The thing about research is, you never know when those breakthroughs are going to come through, you know? So, when we started this project last year, we thought it would take a couple of years, but uh, you know, we made faster progress than we expected, and then sometime last month, we were like, it looks like we’re there! We should just publish! And that’s what we did!

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.


Host: Arul Menezes, welcome to the podcast today.

Arul Menezes: Thank you. I’m delighted to be here.

Host: So, you’re a Partner Research Manager at Microsoft Research, and you head the machine translation team, which, if I’m not wrong, falls under the umbrella of Human Language Technologies?

Arul Menezes: Yes.

Host: What gets you up in the morning? What’s the big goal of your team?

Arul Menezes: Well, translation is just a fascinating problem, right? I’ve been working in it for almost two decades now, and, uh, it never gets old, because there’s always something interesting or unusual or unique about getting the translations right. The nice thing is that we’ve been getting steadily better over the last few years. So, it’s not a solved problem, but we’re making great progress. So, it’s sort of like a perfect problem to work on.

Host: So, it’s enough to get you out of bed and keep you going, because you’re making…

Arul Menezes: Yeah, it’s not so hard that you give up, and it’s not solved yet, so it’s perfect.

Host: You don’t want to go back to bed. So, your team has just made a major breakthrough in machine translation, and we’ll get into the technical weeds about how you did it in a bit. But for now, tell us what you achieved and why it’s noteworthy.

Arul Menezes: So, the result we showed was that our latest research system is essentially at parity with professional human translators. The way we showed that is that we took a public test set of Chinese-English news that’s generally used in the research community. We had it translated by professional translators, and we also had it translated by our latest research systems. Then we gave the translations to evaluators who are bilingual speakers. And of course, it’s a blind test, so they couldn’t tell which was which. At the end of the evaluation, our system and the humans scored essentially the same. So, you know, for the first time, really, we have a credible result that says humans and machines are at parity on this translation task. Now, of course, keep in mind, this is a very specific domain. This is news, and it’s one language pair. So, you know, we don’t want to oversell it. But it is exciting.

Host: What about the timing of it? You had had plans to do this, but did it come when you expected?

Arul Menezes: The thing about research is you never know when those breakthroughs are going to come through, you know? So, when we started this project last year, we thought it would take a couple of years, but uh, you know, we made faster progress than we expected, and then sometime last month, we were like, it looks like we’re there! We should just publish! And that’s what we did!

Host: Is this sort of like a Turing Test for machine translation? “Which one did it, a computer or a human?”

Arul Menezes: In a limited sense. We didn’t ask people to actually detect which was the human and which was the machine, because there may be little tells like, you know, maybe there’s a certain capitalization pattern or whatever. What we did was we had people just score the translation on a scale, like, just a slider, really, and tell us how good the translation was. So, it was a very simple set of instructions that the evaluators got. And the reason we do that is so that we can get very consistent results and people can understand the instructions. And so, you score the translations, sentence by sentence, and then you take the averages across the different humans and the machine. It turned out they basically were the same score.
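The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the WMT or Microsoft evaluation tooling: it assumes each judgment is a hypothetical (system, sentence_id, rater, score) tuple on a 0 to 100 slider, averages each sentence’s scores across raters, and then averages those sentence means per system. The judgments themselves are invented.

```python
from collections import defaultdict
from statistics import fmean


def system_averages(judgments):
    """Average slider scores per system.

    judgments: iterable of (system, sentence_id, rater, score) tuples,
    a hypothetical format chosen only for this sketch.
    """
    # Collect every rater's score for each (system, sentence) pair.
    per_sentence = defaultdict(list)
    for system, sentence_id, _rater, score in judgments:
        per_sentence[(system, sentence_id)].append(score)

    # Average across raters for each sentence, then across sentences per system.
    per_system = defaultdict(list)
    for (system, _sentence_id), scores in per_sentence.items():
        per_system[system].append(fmean(scores))
    return {system: fmean(means) for system, means in per_system.items()}


# Invented example judgments, purely for illustration.
judgments = [
    ("human", 1, "r1", 80), ("human", 1, "r2", 70), ("human", 2, "r1", 90),
    ("machine", 1, "r1", 75), ("machine", 1, "r2", 75), ("machine", 2, "r1", 85),
]
print(system_averages(judgments))  # {'human': 82.5, 'machine': 80.0}
```

Averaging per sentence first keeps a sentence that happened to get more raters from dominating the system’s overall score.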

Host: Why did you choose Chinese-English translation first?

Arul Menezes: So, we wanted to pick a publicly used test set, because, you know, we’re trying to engage with the research community here, and we wanted a test set that other people have worked on, so that we could release all of our findings and results and evaluations. There’s an annual workshop on machine translation called WMT that’s been going on for ten or more years. So, we use the same methodology that they use for evaluation, and we also wanted to use the same test set. They used to be focused more on European languages, but they recently added Chinese. And so, we thought that would be a good one to tackle, especially because it’s an important language pair, and, um, you know, it’s hard, but not too hard, obviously. At least as it turned out.

Host: You’ve had another very impressive announcement recently, just this month even, that impacts what I can do with machine translation on my phone. And I’m all ears. What is it, and why is it different from other machine translation apps?

Arul Menezes: Yeah, so we’re super excited about that, because, you know, we’ve had a translator app for Android and Apple phones for a while. And one of the common use cases is, of course, when people are traveling. The number one request we get from users is, “Can I do translation on my phone even though I’m not connected? Because when I’m traveling, I don’t always have a data plan. I’m not always connected to Wi-Fi at the point when I’m trying to communicate with someone like a taxi driver or a waiter or reception at a hotel.” And so, we’ve had for a while what we call an offline pack. You can download this pack before you travel, and once you have it, you can do translations on your phone without being connected to the Cloud. But the thing about these packs is that they haven’t been using the latest neural net technology, because neural nets are very expensive. They take a lot of computation. And no one’s really been able to run neural machine translation on a phone before. So last year, we started working with a major phone manufacturer. They had a phone with a special neural chip, and we thought it would be super exciting to run neural translation offline, on the phone, using this chip. Since then, we have been working to improve the efficiency and doing a lot of careful engineering, and we managed to get it working on any phone without relying on the special hardware. So, what we released this month is that anyone who has an Android phone or an iPhone can download these packs, and then they’ll have neural translation on their phone. That means even if they’re not connected to the Cloud, they’re going to get really fluent translations.

Host: So, it’s the latest cutting-edge translation technology?

Arul Menezes: Right, yeah.

Host: On a regular phone.

Arul Menezes: Running right on your phone, yeah. Super exciting.

Host: I wish I had that last summer.

Arul Menezes: Me too actually, yeah. You know, it’s a very useful app when you travel.

Host: Is it unique to Microsoft Research and Microsoft in general, or…?

Arul Menezes: Yeah, as far as I know, nobody else has neural translation running on the phone. Now, this is only text translation. We don’t yet have the speech recognition.

Host: Are you working on that?

Arul Menezes: We are. Uh, we don’t really have a date for that yet, but it’s something that we’re interested in.

Host: I’ll postpone my next trip until you’ve got it done.

(music plays)

Host: Let’s get specific about the technology behind MSR’s research in machine translation. You told me that neural network architectures are the foundation for the AI training systems.

Arul Menezes: Right.

Host: But your team used some additional training methods to help boost your efforts to achieve human parity in Chinese-English news translation. So, let me ask you about each one in turn. And let’s start with a “round-trip” translation technique called Dual Learning. What is it? How did it help?

Arul Menezes: Right. So, one of the techniques we used to improve the quality of the research system that reached human parity was what we call Dual Learning. The way you train a regular machine translation system is, typically, with parallel data. These are previously translated documents in, say, Chinese and English that are aligned at the sentence level, and the neural net model essentially learns to translate each sentence from Chinese into English; that’s the signal we use to train the models. Now, you can do the same thing in the opposite direction, English to Chinese. What we do with Dual Learning is couple those two systems and train them jointly. So, you use the signal from the English-to-Chinese translation to improve the Chinese-to-English one, and vice versa. It really is very much like what a human would do: you might do a round-trip translation, where you translate from English to Chinese, but you’re not sure if it’s good, so you translate back into English and see how it went. If the result is consistent, you have some faith that the translation may be good. So this is what we did; it’s basically a joint loss function for the two systems. And there’s another thing you can do once you have dual learning working: in addition to the parallel data, you can also use monolingual data. Let’s say you have Chinese text. You can send it through the Chinese-to-English system and then the English-to-Chinese system, and then compare the results. That’s a signal you can use to train both systems.
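The round-trip signal on monolingual text can be illustrated with a toy. In this sketch, two hypothetical word-for-word dictionaries stand in for the neural Chinese-to-English and English-to-Chinese systems, and the “signal” is simply the fraction of source tokens recovered after the round trip. Real dual learning scores the reconstruction with the models’ own probabilities and uses it to update both networks; none of that training machinery is shown here.

```python
def translate(tokens, table):
    """Toy word-for-word 'model': look each token up, <unk> if missing."""
    return [table.get(tok, "<unk>") for tok in tokens]


def round_trip_reward(source_tokens, forward_table, backward_table):
    """Translate forward, translate back, and score how much of the source survives."""
    pivot = translate(source_tokens, forward_table)
    reconstruction = translate(pivot, backward_table)
    matches = sum(a == b for a, b in zip(source_tokens, reconstruction))
    return matches / max(len(source_tokens), 1)


# Hypothetical toy dictionaries standing in for the two neural systems.
ZH_EN = {"我": "I", "爱": "love", "猫": "cats"}
EN_ZH = {"I": "我", "love": "爱", "cats": "猫"}
EN_ZH_NOISY = {"I": "我", "love": "喜欢", "cats": "猫"}  # one inconsistent entry

monolingual_zh = ["我", "爱", "猫"]
print(round_trip_reward(monolingual_zh, ZH_EN, EN_ZH))        # 1.0
print(round_trip_reward(monolingual_zh, ZH_EN, EN_ZH_NOISY))  # 0.6666666666666666
```

The same function works in the English-to-Chinese-to-English direction by swapping the tables, which is what lets one kind of signal inform both systems.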

Host: So, another technique you used is called Deliberation Networks. What is that, and how does that add to the translation accuracy?

Arul Menezes: Right. So, I should say that both the Dual Learning and the Deliberation Network work was actually done by our partners in Microsoft Research Asia. It was a joint effort between my team here in Redmond, which is the machine translation team, and two teams in Microsoft Research Beijing: the natural language group and the machine learning team there. Both the Dual Learning and the Deliberation Network came out of the machine learning team in MSR Beijing.

Host: Cool.

Arul Menezes: The way Deliberation Networks work is essentially a two-pass translation process. You can think of it as creating a rough translation and then refining it, which is what a human might do too: you create a first draft and then you edit it. So, the architecture of the Deliberation Network is that you have a first-pass neural network encoder-decoder that produces the first translation. Then you have a second pass that takes both the original input in Chinese and the first-pass output as inputs in parallel, and produces a translation by looking over both. It’s essentially learning which parts of the first-pass translation to copy over and which parts need to be changed, and for the parts it changes, it looks at the original. The output of the second pass is our final translation. I mean, in theory, you could keep doing this, but, you know, we just do two passes, and that seems to be enough.
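The two-pass data flow can be made concrete with a deliberately tiny example. The real second pass is a neural decoder that attends over both the source encoding and the first-pass output; here, as a hypothetical stand-in, the first pass is a word-for-word gloss and the second pass is a small rule table keyed on a source span together with the matching draft span. Where a rule fires, the second pass rewrites that span; everywhere else it copies the draft. The rule table and the 1:1 alignment between source and draft tokens are assumptions made only for this sketch.

```python
def first_pass(source, gloss):
    """Rough draft: translate each source token independently."""
    return [gloss.get(tok, "<unk>") for tok in source]


def second_pass(source, draft, rules, span=2):
    """Refine the draft by looking at the source and the draft together.

    rules maps (source_span, draft_span) -> replacement tokens.
    This toy assumes the draft is aligned 1:1 with the source.
    """
    output, i = [], 0
    while i < len(source):
        key = (tuple(source[i:i + span]), tuple(draft[i:i + span]))
        if key in rules:
            output.extend(rules[key])  # change this span, guided by the source
            i += span
        else:
            output.append(draft[i])  # keep the draft token as-is
            i += 1
    return output


# Hypothetical toy data.
GLOSS = {"我": "I", "很": "very", "喜欢": "like", "猫": "cats"}
RULES = {(("很", "喜欢"), ("very", "like")): ["really", "like"]}

source = ["我", "很", "喜欢", "猫"]
draft = first_pass(source, GLOSS)          # ['I', 'very', 'like', 'cats']
final = second_pass(source, draft, RULES)  # ['I', 'really', 'like', 'cats']
print(final)
```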

Host: Yeah. I was actually going to ask that. It’s like, how many passes is enough before you kind of land on…

Arul Menezes: I would imagine that after, like, two passes, you’re likely to converge.

Host: So, the third tool that we talked about is called Joint Training, or left-to-right/right-to-left consistency. Explain that and how it augments the system.

Arul Menezes: Yeah, so again, this is work from the natural language group in MSR Beijing. They noticed that if you produce a translation one word at a time from left to right, or you train a different system that produces the translation, again one word at a time, but from right to left, you actually get different translations. The idea was that if you could make these two translations consistent, you might get a better translation. The reason is that in many languages, when you produce a sentence, there are later parts of the sentence that need to be consistent, say, grammatically or in terms of gender or number or pronoun, with something earlier in the sentence. But sometimes you need something earlier in the sentence to be consistent with something later in the sentence, and you haven’t produced that yet, so you don’t know what to produce. Whereas if you did it right to left, you’d be able to get that right. So, by forcing the left-to-right system and the right-to-left system to be consistent with each other, we could improve the quality of the translation. And again, this is a very similar iterative process to what we were talking about with Dual Learning, except that instead of the consistency being between Chinese-to-English and English-to-Chinese, it’s between left-to-right and right-to-left.
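A simpler, commonly used way to exploit the same left-to-right/right-to-left complementarity is to rescore candidate translations with both directions at inference time. The sketch below does that rather than the joint training described above, using hypothetical toy bigram tables in place of neural decoders. The English article “a”/“an” mirrors the problem just described: a left-to-right model must choose the article before it has produced the noun, whereas a right-to-left model has already seen the noun.

```python
FLOOR = -10.0  # log-probability assigned to bigrams missing from the toy tables


def bigram_logprob(tokens, table):
    """Sum of log P(token | previous token), starting from <s>."""
    total, prev = 0.0, "<s>"
    for tok in tokens:
        total += table.get((prev, tok), FLOOR)
        prev = tok
    return total


def r2l_logprob(tokens, table):
    """A right-to-left model scores the sequence in reverse order."""
    return bigram_logprob(list(reversed(tokens)), table)


def rerank(candidates, l2r_table, r2l_table):
    """Pick the candidate with the best combined score from both directions."""
    return max(
        candidates,
        key=lambda c: bigram_logprob(c, l2r_table) + r2l_logprob(c, r2l_table),
    )


# Hypothetical toy log-probabilities.
L2R = {("<s>", "a"): -0.5, ("<s>", "an"): -1.5,
       ("a", "apple"): -1.0, ("an", "apple"): -0.9}
R2L = {("<s>", "apple"): -1.0, ("apple", "an"): -0.2, ("apple", "a"): -3.0}

candidates = [["a", "apple"], ["an", "apple"]]
left_only = max(candidates, key=lambda c: bigram_logprob(c, L2R))  # ['a', 'apple']
combined = rerank(candidates, L2R, R2L)                            # ['an', 'apple']
print(left_only, combined)
```

On its own, the left-to-right table prefers the ungrammatical “a apple”; adding the right-to-left score, which sees “apple” first, flips the choice.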

Host: So, what was the fourth technique that you added to the mix to get this human parity in the Chinese to English translation?

Arul Menezes: Yeah, so we also did what’s called System Combination. We trained a number of different systems with different techniques, with variations on those techniques, and with different initializations. Then we took our best six systems and combined them. In our case, it was what’s called a sentence-level combination, so it really is just picking, of the six, which one is the best. Essentially, each of the six systems produces an n-best list, say the ten best candidate translations, so now you’ve got sixty translations, and you rescore them and pick the best. People have done system combination at the word level before, where you take part of a translation from one system and part of a translation from another. But that doesn’t work very well with neural translation, because you can really destroy the fluency of the sentence by cutting and pasting pieces from here and there.
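Sentence-level combination chooses whole candidates rather than splicing fragments. This sketch assumes each system returns an n-best list of (translation, log-probability) pairs and rescores the pooled candidates with a hypothetical linear score: the best log-probability any system gave the candidate, plus a bonus for each n-best entry that proposed it. The features and weights actually used for the human-parity system are not described in this conversation, so these are placeholders.

```python
def combine_nbest(nbest_lists, w_logprob=1.0, w_votes=0.5):
    """Pool n-best lists from several systems and return one whole translation.

    nbest_lists maps a system name to a list of (translation, logprob) pairs.
    The weights are hypothetical placeholders, not tuned values.
    """
    best_logprob = {}  # highest log-probability any system gave each candidate
    votes = {}         # number of n-best entries (across systems) proposing it
    for nbest in nbest_lists.values():
        for translation, logprob in nbest:
            previous = best_logprob.get(translation, float("-inf"))
            best_logprob[translation] = max(previous, logprob)
            votes[translation] = votes.get(translation, 0) + 1

    def rescore(translation):
        return w_logprob * best_logprob[translation] + w_votes * votes[translation]

    # Pick a complete candidate; nothing is spliced together across systems.
    return max(best_logprob, key=rescore)


# Three toy systems with 2-best lists (the setup described used six systems,
# each contributing roughly its ten best candidates).
systems = {
    "sys1": [("the cat sat", -1.0), ("a cat sat", -1.2)],
    "sys2": [("a cat sat", -0.9), ("the cat sits", -1.5)],
    "sys3": [("a cat sat", -1.1), ("the cat sat", -1.3)],
}
print(combine_nbest(systems))  # a cat sat
```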

Host: Right. Yeah, we’ve seen that done without machines. It gets butchered in translation.

(music plays)

Host: Most of us have fairly general machine translation needs, but your researchers addressed some of the needs in a very domain-specific arena, in the form of Presentation Translator. Can you tell us more about that?

Arul Menezes: Right, so, uh, Presentation Translator is this unique add-in that we have developed for PowerPoint where, when you are giving a presentation, you just click the button and you can get transcripts of your lecture displayed on screen so that people in the audience can follow along. In addition, the transcripts are made available to audience members on their own phones. They use our app and just enter a code, and then they can connect to the same transcription feed. And they can get it either in the language of the speaker or in their own language. So essentially, with this one add-in, we’re addressing two real needs. One is for people who are deaf or hard of hearing, where the transcript can help them follow along with what’s going on in the classroom or in a lecture. The other is for language learners and foreign students, who can follow along in their own language if they are not that familiar with the language of the speaker. And so, we’ve had a lot of excitement about this in the educational field, with both school districts and colleges. In particular, the Rochester Institute of Technology, where one of the colleges is the National Technical Institute for the Deaf, has a very large population of deaf students. They have been providing sign language interpretation, and this gave them an opportunity to expand coverage by providing transcription in the classroom.

Host: So is it from text to text on the PowerPoint presentation to…

Arul Menezes: So, it’s the user’s speech…

Host: It is?

Arul Menezes: Yeah, so the professor is lecturing, and everything that they say is transcribed both on screen and on people’s phones.

Host: Oh my gosh.

Arul Menezes: And then because it’s on their phone, they can also save the transcript and then that becomes class notes. And the other thing that’s really cool about Presentation Translator is that it uses the content of your PowerPoint – this is why it’s connected to PowerPoint – it uses the content of your slides to customize the speech recognition system so that you can actually use the specialized terminology of the class and it will be recognized. So, you know, if someone’s teaching a biology class, it’ll recognize things like “mitochondria” or “ribosome,” which in other contexts would not be recognized.
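How Presentation Translator actually folds slide content into its speech recognizer is not described here. As a hypothetical illustration of the general idea, often called contextual biasing, the sketch collects longer words from the slide text and gives a small bonus to recognition hypotheses that use them, so a specialized term like “mitochondria” can beat a similar-sounding everyday phrase. The length heuristic, the bonus value, and the hypothesis scores are all invented for the example.

```python
import re


def slide_vocabulary(slide_texts, min_length=6):
    """Collect longer words from slide text as candidate domain terms (toy heuristic)."""
    words = set()
    for text in slide_texts:
        for word in re.findall(r"[A-Za-z]+", text):
            if len(word) >= min_length:
                words.add(word.lower())
    return words


def biased_best(hypotheses, vocabulary, bonus=2.0):
    """hypotheses: [(transcript, recognizer_score)]; a higher score is better."""
    def score(item):
        transcript, base = item
        hits = sum(word in vocabulary for word in transcript.lower().split())
        return base + bonus * hits
    return max(hypotheses, key=score)[0]


slides = ["Cell structure: mitochondria and ribosomes", "The mitochondria produce energy"]
hypotheses = [
    ("the my toe con dria produce energy", -4.0),
    ("the mitochondria produce energy", -5.0),
]
vocab = slide_vocabulary(slides)
print(biased_best(hypotheses, vocab))  # the mitochondria produce energy
```

Without the slide vocabulary, the everyday-word hypothesis wins on its raw score; with it, the domain term is preferred.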

Host: So, you told me about how you can use this with domain-specific – or business-specific – needs as well. So, tell us about that.

Arul Menezes: Right. One of the things we’re super-excited about is that we have the ability to customize our machine translation system for the domain and the terminology of specific companies. We have a lot of customers who use translation to translate their documentation, their internal communications, product listings… and the way to get really high-quality translation for all of these scenarios is to customize the translation to the terminology that’s being used by that business.

Host: Part of the challenge of machine translation is that human language can’t be reduced to ones and zeros. It’s got nuance, it’s got richness and fluidity. And so, there are detractors that start to criticize how “unsophisticated” machine translation is. But you said that they’re missing the point, sort of, of what the goal is?

Arul Menezes: Yeah.

Host: Talk about that a little bit and how should we manage our expectations around machine translation?

Arul Menezes: Yeah so, I mean, the kinds of scenarios we’re focused on with machine translation today have to do with, sort of, everyday needs that people have, whether you’re a traveler, or you want to read a website, a news article, or a newspaper. Or you’re a company communicating with customers who speak a different language, or communicating between branches of the enterprise that speak different languages. Most of the language being translated today is pretty prosaic. I mean, it’s not that hard… Well, it is hard, but we’ve gotten to the point where we can do a pretty good job of translating that kind of text. Of course, if you start getting into fiction and poetry, it is very hard, and we’re obviously nowhere near there with that kind of text. But that’s not our goal at this point.

Host: So, how would you define your goal?

Arul Menezes: I think the goal for translation today is to make the language barrier disappear for people in everyday contexts, you know, at work, when they’re traveling, so that they can communicate without a language barrier.

Host: Right. So that kind of leads into the idea that every language has its own formal grammar and semantics. And it also has local patois, as it were. And it often leads to humorous mistranslations. So how are machine learning researchers tackling that “lost-in-translation” problem so machines don’t end up making that classic video game mistranslation, “All your base are belong to us?”

Arul Menezes: There are two things. With better techniques, we have gotten a lot better at producing fluent translations, so we would not produce something like that today. But it is still the case that we’re very dependent on the data we have available. In the languages where we have sufficient data, we can do a really good job. When you get to languages where there’s not that much data, or to dialects or variations of a language where there’s not that much data, it becomes a lot tougher. And I think this is something machine translation shares with all AI and machine learning fields: we’re very dependent on the data. There are ways to get iteratively better by continually learning based on how people use your product, right.

Host: How much are you dealing, interdisciplinarily, with other fields? You’re computer scientists, right? And your data is language, which is human and expressive and from all over the world. Who do you bring in to help you?

Arul Menezes: So, we have linguists on our team who, you know, make sure that we’re translating things correctly. For example, one of the linguists on our team looks for things that our automatic metrics don’t catch. Every time we produce a new version of our translation system, we have various scoring functions. The one we use, which is a very popular metric, is called BLEU, and it gives you a single number that says how well your system is doing. So, you know, in principle, if the version of your system this month has a slightly better BLEU score than the version last month, you’re like, great, it’s better! Ship it! But what Lee, the linguist on my team, does is look at the output and try to spot things that may not be caught by that score. For example, how are we doing with names? How are we doing with capitalization? How are we doing with dates and times and numbers? There are a lot of phenomena that are very noticeable to humans but are not necessarily picked up by the automatic metric.
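For readers unfamiliar with BLEU, the metric combines clipped n-gram precisions (usually up to 4-grams) with a brevity penalty for output that is shorter than the reference. The following is a simplified single-reference, unsmoothed, corpus-level version for illustration only; published scores are normally computed with standard tooling such as sacreBLEU, whose tokenization and defaults differ from this sketch.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU: one reference per sentence, no smoothing."""
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    totals = [0] * max_n   # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # any zero precision makes unsmoothed BLEU zero
    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on the".split()
# Every n-gram precision is 1, so the score is just the brevity penalty exp(1 - 6/5).
print(round(corpus_bleu([hypothesis], [reference]), 4))  # 0.8187
```

This also shows why a linguist’s review still matters: the score counts overlapping word sequences, so a mangled name or date costs only a few n-grams, however noticeable it is to a reader.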

(music plays)

Host: Let’s talk about you for a second. How did you end up doing machine translation research at Microsoft Research?

Arul Menezes: Yeah, so I was in a PhD program in, sort of, the systems area in the computer science department at Stanford. And I spent a couple of summers up here, and at the end of my second summer, I decided I wanted to stay. And so, I did. I just never went back. And I worked on a number of products at Microsoft. But at some point, I wanted to get back into research. And so, I moved to Microsoft Research, and I started the translation project actually in about the year 2000. So, basically, the year my daughter was born, and now she’s going off to college, so…

Host: And you’ve watched her language grow over the years.

Arul Menezes: Yeah, it’s actually, when you’re studying language, listening to how kids learn language is fascinating. It’s just wonderful.

Host: There’s a spectrum here, at Microsoft of, you know, pure research to applied research, stuff that ends up in products. You seem to straddle what your work is about, being in products, but also in the research phase.

Arul Menezes: Yeah, one of the things that’s super exciting about our team, and it makes us somewhat unique, I think, is that we have everything from the basic research in translation, to the web service that serves up the APIs, you know, the cloud service that people call, to the apps that we have on the phone. So, we have everything from, you know, the things that users are directly using down to the basic research, and it’s all in one team. So when somebody comes up with something cool, we can get it out to users very quickly. And that’s very exciting.

Host: I always ask my podcast guests my version of the “what could possibly go wrong” question. Which is, is there anything about your work in machine translation that keeps you up at night?

Arul Menezes: Well, so, we always have this challenge that we are learning from the data. And the data is sometimes misleading. And so, we have things that we do to try and clean up the data. We do have a mechanism, for example, to be able to respond to those kinds of issues quickly. And it has happened. We’ve had situations where somebody discovered a translation that we produced that was offensive and posted it on Twitter. And, you know, it kind of went viral, and some people were upset about it, and so we had to respond quickly and fix it. And so, we have people who are on call 24 hours a day to fix any issue that arises like that.

Host: So, it’s a thing that literally does keep somebody up at night?

Arul Menezes: Definitely, yeah.

Host: At least doing the night shift version of it! As we wrap up, Arul, what advice would you give to aspiring researchers that might be interested in working in human language technologies, and why would someone want to come to Microsoft Research to work on those problems?

Arul Menezes: So, I think we live in an absolutely fascinating time, right? Like, people have been working on AI – or machine translation, for that matter – for fifty, sixty years. And for decades, it was a real struggle. And I would say that just in the last ten years, with the advent of deep learning, we’ve been making amazing progress on these really, really hard tasks, where at some point people had almost given up hope that we would ever be successful at recognizing speech or translating anywhere close to the level that a human can. But here we are. It’s a super exciting time. What’s even more exciting is that not only have we made tremendous progress on the research side, but now all of those techniques are being put into products, and they’re impacting people on a daily basis. And, um, I think Microsoft is an amazing place to be doing this, because we have such breadth, you know? We have a range of products that go all the way from individual users in their homes to multinational companies. So we have so many places where our technology can be used. The range of opportunity here at Microsoft, I think, is incredible.

(music plays)

Host: Arul Menezes, thank you for taking time to come out and talk to us today. It’s been really interesting.

Arul Menezes: Thank you. Thank you.

To learn more about Arul Menezes and the exciting advances in machine translation, visit