HE compilers for Private AI and other game changers with Dr. Olli Saarikivi


Episode 87, August 28, 2019

As computing moves to the cloud, there is an increasing need for privacy in AI. In an ideal world, users would have the ability to compute on encrypted data without sacrificing performance. Enter Dr. Olli Saarikivi, a post-doctoral researcher in the RiSE group at MSR. He, along with a stellar group of cross-disciplinary colleagues, is bridging the gap with CHET, a compiler and runtime for homomorphic evaluation of tensor programs that keeps data private while making the complexities of homomorphic encryption schemes opaque to users.

On today’s podcast, Dr. Saarikivi tells us all about CHET, gives us an overview of some of his other projects, including Parasail, a novel approach to parallelizing seemingly sequential applications, and tells us how a series of unconventional educational experiences shaped his view of himself, and his career as a researcher.



Transcript

Olli Saarikivi: When we looked at homomorphic encryption as a target for training, as we were doing in Parasail, we noticed that there’s actually a lot of other lower hanging fruit that we can do for homomorphic encryption, and instead of training, we started looking at inference. So, how do you even evaluate a neural network model on top of homomorphic encryption? Which is a thing you need to be able to do before you can actually do training. So, what CHET is doing is, it is building a compiler for homomorphic encryption that automates many of these concerns that we would otherwise have to deal with by hand.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: As computing moves to the cloud, there’s an increasing need for privacy in AI. In an ideal world, users would have the ability to compute on encrypted data without sacrificing performance. Enter Dr. Olli Saarikivi, a post-doctoral researcher in the RiSE group at MSR. He, along with a stellar group of cross-disciplinary colleagues, is bridging the gap with CHET, a compiler and runtime for homomorphic evaluation of tensor programs that keeps data private while making the complexities of homomorphic encryption schemes opaque to users.

On today’s podcast, Dr. Saarikivi tells us all about CHET, gives us an overview of some of his other projects, including Parasail, a novel approach to parallelizing seemingly sequential applications, and tells us how a series of unconventional educational experiences shaped his view of himself, and his career as a researcher. That and much more on this episode of the Microsoft Research Podcast.

Host: Olli Saarikivi, welcome to the podcast!

Olli Saarikivi: Thank you.

Host: So, you’re a post-doc researcher in the RiSE group, which is Research in Software Engineering, and you’re interested in, and I quote, “distributing ML training with semantics-preserving parallelization and advancing Private AI with homomorphic encryption.”

Olli Saarikivi: Yes. And that was kind of the short story… I’m touching a lot of other things currently also, but yeah.

Host: If that’s the short story, we’re in trouble! Um, there’s a bunch of stuff in there that we’ll unpack as we get into the specific ways that those are playing out in the research you’re doing. But for now, I want to start kind of broad strokes. Tell us what big questions you’re asking, what big problems you’re trying to solve, what gets you up in the morning?

Olli Saarikivi: So, currently, all of my projects are, in some way or the other, about performance. And it’s not just about looking at the specific application and figuring out what’s the best way to get this to run fast, it’s about finding ways to make performance accessible to developers. And, um, well there was that privacy preserving thing…

Host: Right.

Olli Saarikivi: …there we’re looking at making homomorphic encryption which is a technique for…

Host: Sure.

Olli Saarikivi: …preserving privacy, more accessible, while still giving good performance when doing that.

Host: Okay. So, let’s go in a little bit further on that because I’ve had several of your colleagues in RiSE on the podcast – some of my favorite people at MSR – and they’re working on problems that I care a lot about in terms of testing and verification in software, which I know you’ve had a lot of experience in those areas as well. But your most recent work has shifted to the thing you just mentioned: performance. And so, I want to know what you mean by that and what prompted the pivot from your interest in testing and verification to performance.

Olli Saarikivi: Yeah. It is a very broad term. So, indeed my background is in program analysis topics, like symbolic execution, software verification, that kind of stuff. But it was actually when I came to Microsoft, we worked on this project for optimizing stream processing programs and here, it actually turned out that the same techniques that we used for analyzing programs for safety also work for analyzing programs for applying optimizations. So, performance really is about performance as it’s unlocked by powerful compiler analysis. And I think these kinds of topics are becoming more and more important as the computing landscape gets more and more heterogeneous. We’re getting GPUs and FPGAs and all kinds of accelerators, and these are hard to use. And really, we need to start thinking about these problems in a very domain-specific way, about what the specific constraints are of what we’re compiling to. And it’s a thing that you can make more accessible to a user if you can provide good abstractions on top of it through powerful compiler optimizations.

Host: Right.

Olli Saarikivi: The homomorphic encryption libraries that we’re using are getting implementations on top of GPUs and FPGAs. On a very high level, it looks a lot like one of these accelerators. You get things like a very constrained programming model, weird performance constraints and all of these kinds of low-level details that a typical developer has a hard time grappling with. And this is the part where building a good compiler for it can help, in the same way that having a good compiler helps target a GPU if you don’t have to write the lowest level of code and you can use…

Host: Right.

Olli Saarikivi: …a bit of a higher-level language. So, that’s kind of what we want to do for homomorphic encryption, which is kind of just another target in the landscape of heterogeneous computing.

Host: Let’s talk about that developer right now. Every researcher has a reason, usually in the form of a group of people, for the work they’re doing. So, is that how you would define your target audience, is developers? Who are – or who do you think will be – the main beneficiaries of the work you’re doing?

Olli Saarikivi: So, for the work we’re doing with homomorphic encryption, it is definitely developers in some specific domain. Let’s say you’re a developer working for a bank and you want to increase your privacy by adopting homomorphic encryption. Now, the thing is that that developer probably will not be a cryptographer who is like intimately familiar with all the details of homomorphic encryption, but they have all of this domain-specific knowledge for their own domain.

Host: Sure.

Olli Saarikivi: And now we want to enable them to effectively use tools from homomorphic encryption in their own domain without burdening them with all of the crypto-specific details.

Host: So, they can be using the tools, but not being experts in the science behind the tools.

Olli Saarikivi: Yeah. And that’s really the aim of any compiler project.

Host: Yeah.

Olli Saarikivi: Like for traditional programs, it’s that you don’t want to force people to use assembly for their coding and instead use, I don’t know, C# or something.

Host: Well, let’s talk about stream processing for a minute because I want to land on a couple of other big projects that you’re involved in that are really cool, but this area of efficient stream processing is something that you’ve done a lot of work in. Give us an overview of the high points of what you’re doing in this area. What are the technical underpinnings, motivation, rationale, and what do you hope the outcomes will be?

Olli Saarikivi: So, this is work that I did during my two internships at Microsoft Research before I became a post-doc. So, again, the point here is to make performance accessible without having to kind of go in and do all the low-level details yourself. So, the idea here is that, in these stream processing applications, like, let’s say you want to parse a log file and then do some kind of processing on the things you’ve parsed out of it and maybe then do a query on top of that and then encode your data and then write it back to disk. So, you have kind of many stages when you process input into output. And a nice way to write these kinds of programs is to write them as separate stages.

Host: Right.

Olli Saarikivi: It can actually happen that that’s not the best way to do it for performance. One reason is that, if you write it as separate stages, you typically get some kind of buffering in between the stages. Sometimes buffering is good, but typically, you would get kind of excessive buffering…

Host: Right.

Olli Saarikivi: …if you just write it in these small stages that you compose together. And another reason is that there’s a lot of opportunities that you’re leaving on the table. Let’s say you have a very defensively coded component somewhere later on in your pipeline that does all of these kinds of checks on the input, that it’s properly formatted or whatever. But if you now compose it with some component that is actually correct, and guaranteed to produce properly formatted data, then all of those defensive checks in these latter components are unnecessary. Like, you should just remove them. But having the developer remove them by hand is… it’s a lot of work and then your code becomes less modular. So, what we do instead is that we actually compile all of these stages separately into this model of computation called symbolic transducers, which is very suited to representing stream computations. And the nice thing about these is that we have a fusion operator defined, which allows us to combine many of these stages, as symbolic transducers, into one big symbolic transducer.

Host: Okay.

Olli Saarikivi: It’s basically a form of in-lining, kind of, with some fancy, solver-assisted optimizations…

Host: Right.

Olli Saarikivi: …happening inside there. And, now that we have this one big transducer that’s fused… and they do get big – like, even though we have the solver helping, there is a blow up – but now at this point we can actually start applying these program analysis-based optimizations onto that, for example, looking at reachability of certain control states, and start pruning and removing stuff from that symbolic transducer, and this allows us to implement these optimizations that were not available when these stages were considered just separately. And then when we generate code for this, we can actually get some very efficient code that does these inter-stage optimizations and removes buffering and stuff like that.
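To make the fusion idea concrete, here is a toy sketch in Python. It is illustrative only: the stage functions and the log format are invented for the example, and the real system fuses symbolic transducers with solver-assisted optimizations rather than hand-writing loops.

```python
# A toy sketch of the fusion idea (illustrative only; the real system fuses
# symbolic transducers with solver help rather than hand-written loops).

def parse_stage(lines):
    # Stage 1: parse each "key,value" log line; buffers its whole output.
    return [tuple(line.split(",", 1)) for line in lines]

def filter_stage(records):
    # Stage 2: defensively re-checks the shape of its input before filtering.
    return [(k, v) for k, v in records if k and v and k.isalnum()]

def fused(lines):
    # Fused pipeline: one pass per line, no intermediate buffer between stages,
    # and redundant cross-stage checks become visible and removable.
    out = []
    for line in lines:
        k, v = line.split(",", 1)
        if k and v and k.isalnum():
            out.append((k, v))
    return out

logs = ["user1,login", "user2,logout"]
assert filter_stage(parse_stage(logs)) == fused(logs)
```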

Host: Okay. Who does this matter for the most?

Olli Saarikivi: So, the targets for this kind of thing are mainly when you are actually dealing with enough data that throughput matters. So, typical things might be cloud query applications. We were actually looking at integrating this into an internal database system: systems where you are already burning lots of computational power running queries, and you want to bring that down to a lower level to save money.

Host: So… Yeah, I was going to say, it saves times and money…

Olli Saarikivi: It saves time and money.

Host: Well time is money.

Olli Saarikivi: Yes.

Host: In computation and other areas. So, are there other areas like in the regex field…?

Olli Saarikivi: Yes. Yeah, so that is actually a direction we took this project. So, we actually looked at doing regular expression matching using theory based on symbolic automata, which is actually important because regular expressions are not just over some small alphabets. If you’re dealing just with ASCII, which is 128 characters, or Extended ASCII, 256 characters, you’re fine dealing with it concretely.

Host: Right.

Olli Saarikivi: But symbolic automata allow you to deal with Unicode and larger alphabets, which is the reality in dealing with strings and doing pattern matching these days.

Host: Right.

Olli Saarikivi: And there is again, like, all of these automata theoretic algorithmics available for optimizing these symbolic automata, so you’re able to get some very efficient regular expression matching routines out of that. And, yeah, that was actually a very fruitful line of work we are…

Host: Right.

Olli Saarikivi: …currently beating RE2, which is this well-known kind of default library that people go for…

Host: Right.

Olli Saarikivi: …high performance regular expression matching.
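As a rough illustration of the symbolic-automaton idea (a sketch with invented names, not the actual matching engine), transitions can be labeled with predicates over characters instead of individual characters, which keeps very large alphabets like Unicode manageable:

```python
# A minimal sketch of a symbolic automaton: edges carry predicates over
# characters, so the alphabet never has to be enumerated explicitly.

def is_letter(c):
    return c.isalpha()

def is_letter_or_digit(c):
    return c.isalpha() or c.isdigit()

# States: 0 -> 1 on a letter, 1 -> 1 on letters/digits; roughly an identifier
# pattern, but over the full Unicode alphabet with one predicate per edge.
transitions = {
    0: [(is_letter, 1)],
    1: [(is_letter_or_digit, 1)],
}
accepting = {1}

def matches(s, start=0):
    state = start
    for ch in s:
        for pred, nxt in transitions.get(state, []):
            if pred(ch):
                state = nxt
                break
        else:
            return False
    return state in accepting

print(matches("变量1"))  # True: non-ASCII letters work without huge tables
print(matches("1abc"))   # False: must start with a letter
```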

(music plays)

Host: Well, let’s talk about another cool project you’ve been a part of called Parasail. And your agenda with the project is, and I quote, again, “high performance ML and encrypted ML.” What is Parasail and what are the research challenges? And how does it defy conventional wisdom?

Olli Saarikivi: So, it’s actually a very interesting project. It’s more of a meta-project rather than, like, just one specific project. So, Parasail is a line of research that is concerned with parallelizing seemingly sequential computations. And the idea behind that is that, if you take a sequential computation, which basically means that there’s some state that evolves through, like, a sequence of computational steps, it seems, on the face of it, hard to parallelize this because you have this sequential dependency with the state getting threaded through each of the steps of computation. But it actually turns out that you can do symbolic execution on each of these steps, given a concrete input, and by doing symbolic execution such that you kind of assume that the starting state is unknown, you can do some pre-computation based on that input. And this allows you to parallelize a lot of the computation before you actually have to do this final sequential step of stringing together these individually executed steps.
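Here is a minimal sketch of that idea, with all details invented for illustration and no claim to reflect Parasail’s actual machinery: each chunk of input is summarized as a function from any possible starting state to the resulting state (a stand-in for symbolic execution over an unknown state), and only the final chaining of summaries is sequential.

```python
# Toy example: tracking whether we are inside or outside a quoted region,
# a tiny sequential state machine parallelized by per-chunk summaries.

from concurrent.futures import ThreadPoolExecutor

STATES = ("outside", "inside")  # toggles on '"' characters, e.g. CSV quoting

def step(state, ch):
    if ch == '"':
        return "inside" if state == "outside" else "outside"
    return state

def summarize(chunk):
    # Run the chunk from every possible start state: a finite table standing
    # in for symbolic execution with an unknown starting state.
    return {s: _run(chunk, s) for s in STATES}

def _run(chunk, state):
    for ch in chunk:
        state = step(state, ch)
    return state

def parallel_run(text, chunks=4, start="outside"):
    n = max(1, len(text) // chunks)
    parts = [text[i:i + n] for i in range(0, len(text), n)]
    # The expensive per-chunk work can happen in parallel workers
    # (a thread pool here, purely as a stand-in).
    with ThreadPoolExecutor() as pool:
        tables = list(pool.map(summarize, parts))
    state = start
    for table in tables:  # cheap sequential phase: chain the summaries
        state = table[state]
    return state

text = 'a,"b,c",d,"e'
assert parallel_run(text) == _run(text, "outside") == "inside"
```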

Host: So, why is that important?

Olli Saarikivi: So… there’s a lot of data in the world. For example, doing large aggregated queries over cloud-scale databases, which have data split across terabytes and terabytes. And then, obviously, machine learning. There’s huge amounts of data available for machine learning and you want to parallelize these kinds of processes. So, actually, it’s through this first thing that I mentioned that I got into the Parasail project. So, we were actually looking at parallelizing streaming computations represented as symbolic transducers.

Host: Okay.

Olli Saarikivi: Which is that other line of work…

Host: Right.

Olli Saarikivi: …that I was mentioning. But it would also be useful to parallelize that. And now the idea is to do symbolic execution on a symbolic transducer to do the parallelization. Now, if we actually look at the machine learning part, it’s very different. The ideas are kind of the same on the very high level, but instead of symbolic execution, we’re looking at a second-order optimization method, which has kind of the same flavor, but it’s not the same tools. And this is why I find this project especially interesting. The meta-level idea of this kind of parallelization has been very fruitful, and when we look at specific problems, for example, this stream processing stuff or machine learning stuff, you end up with very different instantiations of the same idea. So, we are kind of getting a lot out of this very simple idea just by applying it to different domains.

Host: Right.

Olli Saarikivi: And to be clear, it is a lot of work to apply it to a new domain, but it kind of sets the framework for the research.

Host: Yeah, “to be clear!” It’s a lot of work in general, and then you’ve got new domains to fit it to, right?

Olli Saarikivi: Yeah, yeah.

Host: Well, let’s switch streams again. I’ve had a couple of guests on the podcast who specialize in homomorphic encryption, which we’ve talked about briefly, but I’m not going to assume that all our listeners know exactly what that is. So, while it’s not your core expertise, it is central to the project we’re going to talk about next. Give us a quick remedial course on homomorphic encryption and how it works, including the difference between so-called “regular homomorphic encryption” and this flavor that you’re working with called “fully homomorphic encryption.”

Olli Saarikivi: Yeah. So, it actually turns out that many existing encryption schemes are slightly homomorphic with respect to some operations. So, let’s take, as an example, RSA, which is a commonly used encryption scheme. So, it has the property that, if you encrypt an integer A, using RSA and you get a ciphertext for A, and then you encrypt an integer B, also with RSA, so now you have two ciphertexts…

Host: Right.

Olli Saarikivi: …so, you can multiply these two ciphertexts together. So, RSA has a special homomorphic property that if you multiply two ciphertexts together, you get a new ciphertext that is the encryption of what the multiplication of A and B would have been. So, what, in effect, you have done is that you have done computation on encrypted values. And the magical thing is that you didn’t need the secret key to be able to do this compute. So, homomorphic encryption is a form of encryption that allows you to do computation on encrypted data, without having read-access to the data. The thing is that if you just have multiplication, that’s not very useful by itself. For example, if you want to evaluate a polynomial, you need both multiplication and addition. And that is actually the hard part for the cryptographers to arrive at. There’s a lot of examples of encryption schemes that give you either addition or multiplication, but an encryption scheme that gives you both is a relatively new thing. So, the first homomorphic encryption scheme that supported both operations, and could be called fully homomorphic, was introduced ten years ago. And the encryption schemes have come a long way since then. So, now we have encryption schemes that support both addition and multiplication of encrypted integers. The thing is that it is still a bit slower than normal computation, but the great thing about it is that it gives you a trust model that really nothing else can. So, with homomorphic encryption, you keep the secret key, you don’t give it to anyone, and you only have to trust the math…

Host: Right.

Olli Saarikivi: …basically.
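A tiny worked example of the multiplicative property Olli describes, using toy textbook RSA with deliberately insecure demonstration parameters:

```python
# Textbook RSA with tiny, insecure parameters, purely to show the
# multiplicative homomorphism: Enc(a) * Enc(b) decrypts to a * b.

p, q = 61, 53
n = p * q        # 3233
e = 17           # public exponent
d = 2753         # private exponent: e * d ≡ 1 (mod (p-1)*(q-1))

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 9
ca, cb = encrypt(a), encrypt(b)

# Multiply the ciphertexts without ever touching the secret key...
c_product = (ca * cb) % n

# ...and decryption recovers the product of the plaintexts.
assert decrypt(c_product) == (a * b) % n
print(decrypt(c_product))  # 63
```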

Host: I want to do a little bit of a detour and bring the issue of privacy front and center because good artificial intelligence requires data and a great deal of data is stuff that we gather from what we do on the internet or put in the cloud, and without certain safeguards, like homomorphic encryption, generally things are not private, right?

Olli Saarikivi: Yeah. That’s true.

Host: So, there’s an urgent need, in my mind at least, for a new paradigm, and this is what people are calling Private AI. What is Private AI and how can it help us?

Olli Saarikivi: So, Private AI is actually a very broad term. And it is very broad because there are lots of very different kinds of privacy concerns. So, if we take, for example, homomorphic encryption, what homomorphic encryption allows you to do is you can make some parts of your data encrypted. So, as a privacy concern, it addresses the concern of your data being leaked as you hand it off to someone else. So, if you encrypt it with homomorphic encryption instead and you can do your AI on the encrypted data, then you’ve kind of plugged that one hole. But now, there’s other kinds of privacy concerns. For example, let’s say someone uses your data as a part of their training set for training their machine learning model. Now, even if your data doesn’t get leaked, it gets somehow integrated into, like, a part of the model that’s getting trained because that’s what training means. It has to learn something about your data. But you wouldn’t want someone to be able to fully reconstruct your data just by looking at that model. And this is a form of privacy that is addressed by something called differential privacy. And, as a technique, this is completely orthogonal to homomorphic encryption and it addresses a concern that is very orthogonal to what homomorphic encryption can address.

Host: Right.

Olli Saarikivi: So, I think, for Private AI, the goal should be to, first off, look at what kind of privacy problems there are in AI and provide education to users because the problem is that users don’t even realize what kinds of privacy problems there can be. And now, if you’ve identified the problems, then obviously, we should also find the solutions for them…

Host: We’ve got to fix it.

Olli Saarikivi: …but the education aspect is actually very important…

Host: I agree.

Olli Saarikivi: …for Private AI.

Host: Right.

Olli Saarikivi: Making people realize what the actual implications are of giving out their data or having it used in training.

Host: Okay. Well, let’s talk about CHET now. Not the guy who works in Men’s Suits at Nordstrom, but the, and I quote, “optimizing compiler for fully homomorphic neural network inferencing.”

Olli Saarikivi: Again, a mouthful!

Host: I love it. And, alternately, you call it an “optimizing compiler for practical and interactive Private AI.” Um, it’s a tool stack for cloud inference on private data. Give us a Virtual Earth 3D view of CHET and unpack the stack for us.

Olli Saarikivi: Yeah. So, this is actually a project that got started with this Parasail research that I was talking about previously. So, it actually turns out that doing parallelization helps with homomorphic encryption because homomorphic encryption behaves better with parallel computations than serial computations. So, this is how we got into working with homomorphic encryption in the first place. Then when we looked at homomorphic encryption as a target for training, as we were doing in Parasail, we noticed that there’s actually a lot of other lower hanging fruit that we can do for homomorphic encryption, and instead of training, we started looking at inference. So, how do you even, like, evaluate a neural network model on top of homomorphic encryption? Which is a thing you need to be able to do before you can actually do training. So, what CHET is doing is, it is building a compiler for homomorphic encryption that automates many of these concerns that we would otherwise have to deal with by hand.

Host: Right.

Olli Saarikivi: For example, it selects encryption parameters automatically based on the program you want to run. And now, it turns out that neural network inferencing is actually something that maps well onto the capabilities of homomorphic encryption, so it’s a very attractive application to look at.
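As a very rough sketch of the kind of decision such a compiler automates, the helper below picks a ring dimension and a coefficient-modulus chain from the multiplicative depth of the program. The function name, the bit counts, and the security table are illustrative assumptions only, not CHET’s or SEAL’s actual rules.

```python
# A hypothetical, simplified parameter-selection heuristic (assumptions mine).

def select_parameters(mult_depth, scale_bits=40, security_table=None):
    # One modulus "level" per ciphertext multiplication, plus first/last primes.
    coeff_modulus_bits = [60] + [scale_bits] * mult_depth + [60]
    total_bits = sum(coeff_modulus_bits)

    # Illustrative cap on total coefficient-modulus bits per ring dimension,
    # loosely in the spirit of published 128-bit-security recommendations.
    security_table = security_table or {4096: 109, 8192: 218, 16384: 438, 32768: 881}
    for poly_modulus_degree, max_bits in sorted(security_table.items()):
        if total_bits <= max_bits:
            return poly_modulus_degree, coeff_modulus_bits
    raise ValueError("circuit too deep for the supported parameter sets")

# E.g. a small model needing 2 ciphertext-ciphertext multiplications:
print(select_parameters(mult_depth=2))  # -> (8192, [60, 40, 40, 60])
```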

Host: Okay, so let’s back up just a little bit and say, does CHET stand for something? Compiler for Homomorphic Encryption…?

Olli Saarikivi: Yes. So, the T is coming from tensors, so it’s a Compiler for Homomorphic Evaluation of Tensor Programs, which is kind of like a more programming language-flavored term for neural networks.

Host: So, I’ve seen posters and decks and explanations and it’s super technical. It sounds easy when you say “automating” it. There’s a lot of work that goes into getting it automated so that someone who doesn’t have the expertise you do can use it.

Olli Saarikivi: So, there is a lot of work, but I feel that the work is actually in understanding the problem. Actually, in CHET, most of the techniques are rather traditional compiler techniques.

Host: Interesting.

Olli Saarikivi: I think the magic comes in, like, finding the right things to automate, finding the right abstraction to expose to the user, and then just implementing it. But there’s a lot of kind of low-hanging fruit in this space, all along the tool stack, and this is kind of what we are addressing. Figuring out what the problems are and then just taking rather standard techniques to address them. Now I’m glossing over details. There’s also…

Host: Yes, you are.

Olli Saarikivi: …hard problems in this space, but so far, the first things we’ve addressed are kind of simple things that no one had really looked at before. And I think a reason that this project has been very fruitful is that it’s a collaboration between the right people. So, we are collaborating with the homomorphic encryption group, who have the knowledge of homomorphic encryption and can explain the ideas to us and what needs to be done and what is kind of difficult to do, and then if we need to solve some problem, they can explain the mechanics of it. And we are coming in as programming languages/compilers people and we have the know-how on how to build programming language tool stacks on top of computational targets. And we can just view homomorphic encryption as a software CPU. It’s an emulator for a CPU with a very limited instruction set that just happens to provide security if you use it. And we can just target that as any other compiler would.

Host: How do you feel, when you are working with other people with different expertise, and you start to “get” what they’re doing? I’m sensing, like you said, this project informed this project and, you know, “it turns out that” this helps that, and I see this flow of research and discovery, but it’s pulling from different disciplines and maybe applying a lens to a problem that you didn’t before and then it expands your own horizons.

Olli Saarikivi: Yes, definitely. It has been a lot of learning. So, just as an anecdote, when I came to Microsoft Research for my post doc, I had no experience with machine learning or homomorphic encryption previously. I started out with the CHET project, which was doing machine learning on top of homomorphic encryption. So, like, it was combining two application areas, on both the compiler front end and the back end, that I had no experience with. But the key part is that I had knowledge about programming language techniques there in the middle and that’s allowed me to be effective in this area. So, there was kind of a relative advantage for me working on this problem, but it was also lots of learning.

Host: And now do you feel like you know just enough about those other areas to be dangerous?

Olli Saarikivi: Yes. I would say so.

Host: I would only be dangerous at a cocktail party where I could say “homomorphic encryption” and some people would think I’m way smarter than I am.

Olli Saarikivi: Yeah, saying those words at a cocktail party really kills the mood!

(music plays)

Host: All right, well, Olli, we’ve reached the part of the podcast where we discuss the potential outcomes of your research, both intended and unintended, and talk about what could possibly go wrong. Is there anything about your work that “keeps you up at night?” Anything that concerns you that we ought to be thinking about, or that we hope you’re thinking about?

Olli Saarikivi: Uh… yes! So, most of my projects are rather safe. I cannot see how they would go horribly wrong. But obviously the homomorphic encryption part could go, like, horribly wrong. Since we are building a compiler for targeting homomorphic encryption, it places a lot of responsibility on us to actually get it right.

Host: Right.

Olli Saarikivi: Now, homomorphic encryption is a thing that is relatively easy to use in a safe way, meaning that most of the mistakes you make will not actually leak user data, they will make it return garbage, which is kind of a nice property. But there’s a few things that we still need to be careful with. For example, we need to be careful to select correct encryption parameters, so that is something we’ve put a lot of thought into and there’s also like a “defense in depth” kind of perspective here, so CHET selects encryption parameters and then SEAL, which is Microsoft’s homomorphic encryption library…

Host: Right.

Olli Saarikivi: …checks that those have been selected correctly, so there’s kind of two stages in that. Another thing that is important in this space is open sourcing. So, the SEAL encryption library is actually currently open source. It’s available on GitHub and we are in the process of open sourcing CHET. And, really, in this kind of a space where people are really concerned about their privacy, these kinds of projects really need to be open source. There’s no other way about it. It really just buys trust. If someone is going to use it, they should be able to see what it does.

Host: Well, tell us a little bit about your personal story, or your journey from Finland to Redmond. What got a young Olli Saarikivi in Helsinki, interested in computer research and how did he end up at Microsoft Research?

Olli Saarikivi: So, I actually started out studying physics. So, I did a year of that, and noticed that it was not for me. I think the main reason was that I wasn’t actually very interested in the day-to-day of doing physics, but I was interested in the day-to-day of doing computer science, which is programming, and I enjoyed the process of it, so that was really the early driving factor in going deeper into computer science. Then, I worked with a professor, Keijo Heljanko, who had an extensive background in model checking. And that’s how I got into this testing and software verification stuff. And, in the field of testing and verification, Microsoft Research is the best in the world. It has, like, a significant selection of the best people in the world, specifically in the RiSE group, in this field. So, you couldn’t not be aware of Microsoft Research when working in this area. And I think, like, the biggest thing that struck me about the papers I was reading, coming out of Microsoft Research, was the access to important problems. So, inside Microsoft, there is obviously a lot of software getting written, so the field of program analysis just has a lot of material to work on. And that is so important. So, working at a company like Microsoft is really an advantage. So, that’s why I wanted to get into Microsoft Research. And now having gotten into it…

Host: Don’t want to leave!

Olli Saarikivi: …it’s amazing. Like, you work with such great people. I love it.

Host: So, what was your educational background? Where did you go to school and how did you, you know, wander across the pond, as it were?

Olli Saarikivi: So, that’s actually an interesting question. I started out school in India.

Host: Whoa!

Olli Saarikivi: Which I hadn’t mentioned before.

Host: No, you didn’t!

Olli Saarikivi: And the school in India was an American school. The American Embassy School, which is a very interesting school. It’s basically like a piece of America just transplanted into the middle of New Delhi. And on the first day going to school there, I did not know a word of English. So, I was just kind of thrown into the deep end, basically, but that gave me a great background in English, and I think that has actually helped me a lot as a researcher, because you have to communicate so much. Research is, like, all about, you know, communication. So, having a strong grasp of the language has helped me.

Host: So, I’m intrigued now. How did you go from Finland to India to an American school? Was there some family connection, or…?

Olli Saarikivi: Yeah. So, it was actually my father who was an engineer/manager at Nokia, and he got sent on an assignment to India to set up the business there. So, that was only two years of my childhood. Then I came back to…

Host: OK, that’s fascinating.

Olli Saarikivi: …to Finland, and…

Host: And went to university there?

Olli Saarikivi: …yeah, yeah. So, all of my university schooling has been in Finland.

Host: Including your PhD?

Olli Saarikivi: Yes. Yes.

Host: And then the internships were what brought you here?

Olli Saarikivi: Yeah. So, I managed to get an internship during my PhD studies and, apparently it went well, so I managed to get another and then they finally hired me on as a post-doc. So, yeah.

Host: Well, actually, as to the question I was going to ask you, about what one interesting trait has helped you in your career as a researcher, that little tidbit about an American school in India from a Finnish guy is pretty close to the funniest thing I’ve heard. But is there anything else that you have, like some tidbit that we couldn’t find about you on a web search, that has actually influenced your career as a researcher?

Olli Saarikivi: Yeah. I actually went to boarding school for my high school, but it was a strange kind of boarding school. It was a STEM-focused boarding school in the middle of the Finnish countryside, like no civilization for kilometers around it. Takes in twenty students a year and everyone’s a nerd, basically. And that was the first place where I realized that there’s lots of people who are a lot smarter than me. So, I was, like, mediocre in that school, which I think was an important formative experience. As a researcher, it teaches like intellectual humility, which I think is a good trait to have.

Host: Yes, it is! So, twenty kids?

Olli Saarikivi: Uh, twenty kids a year, yeah.

Host: And how were you selected?

Olli Saarikivi: There was a selection process.

Host: A test?

Olli Saarikivi: Yeah, you had to take a test, basically like high school level math. I think they also select people a lot due to personality. Like they need to find twenty kids that can actually live in a boarding school, going home only every second week. And uh…

Host: Right. So, personal character and, you know, get-along-ability…

Olli Saarikivi: Yeah. People who fit in… into the group.

Host: So, would you say, prior to that, that you thought you were the smartest kid on the planet?

Olli Saarikivi: Yeah, so, in the grades before that, there was, like, maybe one kid who kind of rivaled me, so I think that’s a typical experience for many people going into research early on, and I think it’s important to get quickly to the stage where you’re not the smartest person in the room!

Host: As we close, I want to give you the last word. So, here is your chance to say anything you want, by way of advice or inspiration, or maybe wisdom or warning, to our listeners, many of whom are in… somewhere close to where you are on your career path, maybe a little bit behind. What would you say to emerging researchers who might be interested in following your footsteps?

Olli Saarikivi: Well, as you’ve heard now, I’ve touched many topics in my research. But I think one thing that has helped me in each of these topics is working on a project that someone actually cares about. That there is some user that you can picture in your mind that you can sympathize with and you can like think of what their problems are. It changes the research question from, what tricky problem could I solve? into more of like, what tricky problem do I need to solve for this specific application that I’m looking at? And it just makes going deep into the research problem so much easier because you can just imagine your problem and look at what the problems to solve are. And it also makes writing papers easier. Your motivation section comes for free when you like have actual motivation! So, I would really encourage people to do the work of defining their problem well and finding a problem that is actually well-motivated. And for sure, it is work to do that. You could just take the first hard problem that comes along, but not all hard problems are actually useful problems to solve.

Host: Olli Saarikivi, thank you so much for coming in!

Olli Saarikivi: Thank you very much.

(music plays)

Host: To learn more about Dr. Olli Saarikivi and how researchers are inviting efficiency to the privacy party, visit Microsoft.com/research
