Project Orleans and the distributed database future with Dr. Philip Bernstein

Published April 8, 2020

Episode 114 | April 8, 2020

Forty years ago, database research was an “exotic” field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn’t deter Dr. Philip Bernstein, now a Distinguished Scientist in MSR’s Data Management, Exploration and Mining group, and a pioneer in the field.

Today, Dr. Bernstein talks about his pioneering work in databases over the years and tells us all about Project Orleans, a distributed systems programming framework that makes life easier for programmers who aren’t distributed systems experts. He also talks about the future of database systems in a cloud scale world, and reveals where he finds his research sweet spot along the academic-industrial spectrum.


Transcript

Phil Bernstein: It’s very unusual to have to build a data management service where it could be a blob store, a JSON store, a relational database, it could be anything, and it could be any one of many products for each of those data structures and yet you only want to have to build this feature once and then have it run successfully no matter what underlying storage system you’re plugging in.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Forty years ago, database research was an “exotic” field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn’t deter Dr. Philip Bernstein, now a Distinguished Scientist in MSR’s Data Management, Exploration and Mining group, and a pioneer in the field.

Today, Dr. Bernstein talks about his pioneering work in databases over the years and tells us all about Project Orleans, a distributed systems programming framework that makes life easier for programmers who aren’t distributed systems experts. He also talks about the future of database systems in a cloud scale world, and reveals where he finds his research sweet spot along the academic–industrial spectrum. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: Phil Bernstein, welcome to the podcast!

Phil Bernstein: Thank you.

Host: You’re a Distinguished Scientist, and a bit of an OG at Microsoft Research and you’ve been at the forefront of innovation in database technology for several decades, but currently you’re working at MSR under the umbrella of the Data Management, Exploration and Mining Group, or DMX. Before we dive deeper on you and your specific work, give us an overview of DMX and how database research is situated in the broader framework of Microsoft Research today.

Phil Bernstein: Well, Microsoft has a huge database business, and the database business, in general, from the very beginning in the 70s, was largely driven by research. So research has always been a very important ingredient in improving database products; there’s this need to innovate all the time. And that comes both from the engine side, building the core technology to manipulate large amounts of data, complex data, but also the tools to make it possible to design the database, to be able to manage it, and the DMX Group covers both. It covers base engine technology for manipulating data, for building cloud services, and then also tools to integrate data, to find ways to reduce the cost of ownership by reducing the level of effort on the part of database administrators. So it’s the full range, and that’s where the name comes from: data management; exploration, being able to look around and understand your data; and mining, to really get into the tools to analyze the data afterwards.

Host: Well, let’s situate you, now. How would you describe your “research identity” in terms of what gets you excited about the work you do and what gets you up in the morning?

Phil Bernstein: Well, I look for high impact. I’m trying to figure out what to work on that’s going to make a difference, and also where my incremental value is going to be high because there aren’t enough people working on it or paying attention to the problem. There are two technical areas where I’ve focused, mostly, over the decades. One is transaction processing, which is how to build systems like online retail or banking systems, money transfer systems, those sorts of things, and that stuff is very low level, you’re very deep into the database engine. And then, on the flip side, much higher level, the integration of data. So the data’s in the database. You’ve got to manage it. How do you tie it together? And I’ve worked on both, and I’ve kind of flip-flopped back and forth between them depending on the problem of the day and where the short- and the medium-term opportunities tend to be.

Host: Well, I want to take it back for a minute because you just mentioned a couple of topics that I think are important. You’ve done some seminal work in transaction processing and distributed databases, so let’s go back several years. Give us a snapshot of the computing landscape when you started and then tell us what changes you’ve seen over the years, what things look like for researchers in the cloud era, and why understanding the past is helpful when innovating for the future?

Phil Bernstein: Well, it’s been a long road. I mean I started my research in the mid-1970s, so it’s over forty years ago. At that point, the database business was small. I mean, it was barely a business. It was absolutely disreputable as a research topic because data management sounded like business data processing, which was believed to be not intellectually interesting. You’re writing Cobol programs and that was the end of it. And hardly any of the fundamental issues had really been explored at all, and the ones that had been explored, certainly not in any depth. So the opportunities were everywhere. And in fact, early in my career I – I had to stop working on some problems not because they weren’t interesting, but because there were just too many problems to work on. I had to – had to focus more in order to get something done. Also, those were the, you know, the mainframe computers, I mean, there was no distributed anything.

Host: Right.

Phil Bernstein: We knew it was coming, but it hadn’t come yet. Database management was all in a glass house and it was, you know, a glass-enclosed, air-conditioned room used for business data processing, period…

Host: Hmm.

Phil Bernstein: …no personal computers. You know, you talked to people about working on computing and that was considered very exotic.

Host: Right.

Phil Bernstein: You know, now everybody’s got one and – at least – and has a pretty good feel for what they all do.

Host: In their pocket.

Phil Bernstein: Yeah, it’s really different.

Host: Okay, so when did you start seeing changes and how did that impact what you were doing as a researcher?

Phil Bernstein: There’ve always been changes, so it’s hard to say that there was any given point where the changes were really big. I started out looking at database design for my PhD research, but as soon as I left and embarked on my own career, I got involved in distributed databases, which seemed like one of the next big things, and I worked on it for many years. I stopped for a while and then, with cloud computing, it all came back, and I’m working on it again so…

Host: Right.

Phil Bernstein: …it’s a sort of a pendulum. These topics come and go depending on what the workloads are that are needed, what the computing environment is that has to support the data management.

Host: Well, the cloud presented a massive jump in scale for distributed systems, writ large, so you say you kind of came back into it. Was it because hey, this is a big, new nut to crack and I want to be in on it?

Phil Bernstein: Certainly, I wanted to be in on it, but I was also asked to be in on it!

Host: Help!

Phil Bernstein: At the time, it was… Microsoft was starting to think about changing its database strategy, its base products, to work on commodity hardware, to scale out on large numbers of inexpensive machines running in the data center. And they kind of looked around and, you know, who is it that we have on staff who knows something about this? And I was one of the people that they tapped and so we developed a new strategy and it took many, many years for that to unfold. This was back in 2006 so it was more than ten years ago, but where we are now with the products, we actually landed roughly where we were trying to get to back at the beginning, so…

Host: Well, OK, so rounding out this four-part question that I kind of just laid out and am still walking through with you because I’m really interested in this. Bill Buxton was on the show and he talked about the long nose of innovation, and things come and go, and the idea that you can’t innovate for the future unless you really understand the past, so why, from a database perspective, is understanding the past helpful when you’re trying to innovate for the future?

Phil Bernstein: The set of mechanisms that we use to solve database problems, they don’t change very fast. Back in the early days, we were learning about certain base technologies for the first time, but now, there’s this repertoire of ingredients that you put into solving a database problem. I’m very sympathetic to graduate students who are trying to learn this stuff because, you know, I learned it slowly over a period of many years as it was unfolding, but people getting into the field, they learn it in a very compressed amount of time and they don’t necessarily have a deep understanding of why things are the way they are and so when they encounter a problem, they’re trying to solve it just based on an understanding of the problem and then trip over some approach that they think, oh, I’ll bet that would be helpful, but then they don’t realize this is actually a variation on something that has been applied in several other contexts before.

(music plays)

Host: Well, let’s get specific and talk about some of your current work and there’s a project you’ve been working on called Orleans, which you’ve called, somewhat generally, a “distributed systems programming framework” or a “programming model and run time for building cloud-native services.” Both are pretty high-level, so tell us what is Orleans and what’s the motivation behind it? Or the pain point that prompted it?

Phil Bernstein: So maybe we should start with, what’s the programming framework?

Host: There you go.

Phil Bernstein: So it’s a form of middleware. That’s to say that it’s generic software. It’s not application-specific but it’s not a low-level platform either. Generally, a framework takes a bunch of services that are available from operating systems, networking, distributed systems, and packages them up to be easier to use by integrating them in some nice way, and so Orleans is a programming framework. What it does is this integration of lower-level services. The problem it’s addressing is that of building distributed applications that run in data centers, in the cloud, on large numbers of machines. And the reason why this is a problem is that mainstream programmers who have learned how to build applications are generally not distributed systems experts and there are many ways to go wrong when you try to carve up an application and get it to run on a lot of servers. It needs to be elastic. That is to say, without changing the application, you need to be able to add servers if the workload increases, or reduce the number of servers if you don’t have so many customers using it. It needs to be reliable because these machines are relatively inexpensive and they actually fail at a significant rate and so you don’t want the whole thing to come tumbling down every time you lose a server. And it needs to balance the load: if you’re going to spread the workload on multiple machines, maybe you don’t do it so well and one of the machines becomes a bottleneck and sort of the whole thing grinds to a halt because this one machine is being overtaxed. So these are the kinds of problems that an application developer faces and Orleans is basically trying to factor them out so that you don’t have to worry about them at all. The framework does all that. You just focus on building the application.

Host: All right. So that’s kind of like the “problem statement” of why it exists. Tell us how you would define it.

Phil Bernstein: Maybe, from a practical standpoint, it would be good to just mention the kinds of applications you would use it for.

Host: Right.

Phil Bernstein: And these are what are sometimes characterized as “stateful interactive services.” What do I mean by that?

Host: Yeah.

Phil Bernstein: Well, maybe easiest to see by example: internet of things, games, telemetry to monitor some other system, typically a computer system, social networking, mobile computing. In all of these cases, the application is managing information about something going on in the world. That’s the main application function. And so the second characteristic is that these applications are all object-oriented in the technical sense. Like in an object-oriented programming language.

Host: Yeah.

Phil Bernstein: In internet of things, the objects are, well, they’re things, you know? They’re sensors, they’re devices of various kinds. In games, they might be things like players, games, scoreboards and the like. Obviously, for mobile computing, they’re mobile devices. So in all these cases, your application that’s running in the cloud has objects that are surrogates or models of the physical thing, or logical thing in the case of games, that are out there in the world and so what you’re doing is, the application is spreading its workload across servers by spreading the objects around. Now, if you want those objects to be spread around on multiple servers, they better not share memory because they may not be co-located on the same server…

Host: Right.

Phil Bernstein: …which means that the only way they’re going to be able to interact is to send messages to each other. And another decision that was made in Orleans is to have the objects be single-threaded, so there’s no internal parallelism in these objects. And the reason for that is that programming with internal parallelism is much more challenging for application developers because now you’ve got parallel activities that are going at the state of this object and they can trip over each other and so they need to synchronize, and engineers, historically, have a hard time getting that right. Conceptually, it doesn’t sound that bad, but when you actually have to write programs that run at high speed and they access this shared state of the object, it actually is quite hard to get it right in all cases. So Orleans said, no, we’re just not going to allow that. So objects are single-threaded and they don’t share memory and they communicate by exchanging messages. Now, what’s new in Orleans is something called the virtual actor model. The characteristics that I just described, of single threading, no shared memory, message-based communication, in the technical literature, that’s often called an actor. It’s just another word for an object that has these characteristics. And in the virtual actor model, the application developer does not control when the object is instantiated, when it’s activated, where it’s placed on machines. All of that is handled by the framework. When the application invokes an object, Orleans will first look around to see if the object is running and, if it is, then it will perform the function that was requested. If it’s not running, then Orleans will pick a server on which to activate the object, will spin up the object on that server, and then will do the invocation that was requested by the application and will remember where the object’s located so that future calls can go to that copy that’s already running.
If the object isn’t used for a while, Orleans will notice that also and it will deactivate the object and free up its resources. So it’s sort of like a paging system in operating systems where you bring in pages of memory as needed and then evict them when they’re no longer needed. It’s sort of the same thing here, but it’s being done with objects. And this was a new concept when Orleans was developed.
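Illustration (not from the episode): the activate-on-demand and deactivate-when-idle behavior described above can be sketched in a few lines. This is a minimal single-process model, not Orleans code; Orleans itself is a .NET framework and Python is used here only for brevity. The names `CounterActor`, `Activation`, `VirtualActorRuntime`, `invoke` and `collect_idle` are hypothetical, the round-robin placement is a placeholder, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List


class CounterActor:
    """Toy actor: private state, touched only by one message at a time."""

    def __init__(self, actor_id: str) -> None:
        self.actor_id = actor_id
        self.count = 0  # in-memory state; lost when the actor is deactivated

    def receive(self, message: str) -> int:
        if message == "increment":
            self.count += 1
        return self.count


@dataclass
class Activation:
    actor: CounterActor
    server: str
    last_used: float


class VirtualActorRuntime:
    """Hypothetical runtime: callers address actors by id only."""

    def __init__(self, servers: List[str], idle_timeout: float) -> None:
        self.servers = servers
        self.idle_timeout = idle_timeout
        self.directory: Dict[str, Activation] = {}  # actor id -> live activation
        self._next = 0  # round-robin cursor, a placeholder placement policy

    def invoke(self, actor_id: str, message: str, now: float) -> int:
        activation = self.directory.get(actor_id)
        if activation is None:
            # Not running anywhere: choose a server and activate on demand.
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            activation = Activation(CounterActor(actor_id), server, now)
            self.directory[actor_id] = activation  # remember for future calls
        activation.last_used = now
        return activation.actor.receive(message)

    def collect_idle(self, now: float) -> List[str]:
        """Deactivate actors unused for longer than idle_timeout."""
        idle = [actor_id for actor_id, a in self.directory.items()
                if now - a.last_used > self.idle_timeout]
        for actor_id in idle:
            del self.directory[actor_id]  # frees the actor's memory
        return idle
```

The caller never constructs or destroys an actor; it only names one. Because `invoke` is synchronous here, each actor trivially handles one message at a time. A real runtime would have to enforce that with a per-actor queue.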

Host: And when exactly was Orleans developed?

Phil Bernstein: The project started, I think it was like 2008, 2009, in there.

Host: Let’s drill in a little more technically – you’ve alluded to several of the things that I think are important about the project itself – and then unpack some of the big challenges you’ve addressed, like scalability and reliability, in the cloud scale world.

Phil Bernstein: Reliability and scalability are natural consequences of the virtual actor model, so let’s look at scalability. Remember that if you invoke an object and it’s not running, Orleans will place it on a server. So it’s up to Orleans to balance the load across all these servers. Ideally, when you activate an object that was not running, you want to put it on a lightly-loaded server so that you don’t overload any other servers. So Orleans is in charge of keeping the load balanced across the servers and that enables scalability. Let’s look at the reliability part. Suppose a server fails. Well, obviously all the objects that were running on that server are immediately gone, but the next time any of those objects are invoked, Orleans will recognize the fact that they’re not running anymore and so it will just resurrect the object. It will just activate it on one of the servers that is healthy, that is running, and will continue making forward progress. So the application developer doesn’t have to be too concerned about balancing the load across servers and doesn’t have to be worried about fault tolerance, which is something that previous actor-oriented systems all exposed to the application developer. Orleans lets you forget about that. But there is one consequence of this, which is that when an object is activated, what state is it in? You know, what does it know about itself? And that is an application programming problem because, at the moment that that server fails and the objects go away, their state in main memory is lost.
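Illustration (not from the episode): a rough sketch of how placement on a lightly loaded server and recovery after a server failure can both fall out of "activate on next invocation". The `Cluster` class and its methods are hypothetical, and the number of live activations per server is only a stand-in for whatever load signal a real framework would use.

```python
from typing import Dict, List


class Cluster:
    """Hypothetical placement directory for a set of servers."""

    def __init__(self, servers: List[str]) -> None:
        self.placement: Dict[str, str] = {}  # actor id -> server
        # Assumption: load is approximated by the count of live activations.
        self.load: Dict[str, int] = {server: 0 for server in servers}

    def locate_or_activate(self, actor_id: str) -> str:
        server = self.placement.get(actor_id)
        if server is None:
            # Activate on the least-loaded healthy server (ties broken by name).
            server = min(self.load, key=lambda s: (self.load[s], s))
            self.placement[actor_id] = server
            self.load[server] += 1
        return server

    def fail_server(self, server: str) -> List[str]:
        """Drop a failed server; its actors simply stop being 'running'."""
        lost = sorted(a for a, s in self.placement.items() if s == server)
        for actor_id in lost:
            del self.placement[actor_id]
        del self.load[server]
        return lost


cluster = Cluster(["a", "b"])
cluster.locate_or_activate("x")   # placed on "a"
cluster.locate_or_activate("y")   # placed on "b"
cluster.fail_server("a")          # "x" is gone, with its in-memory state
cluster.locate_or_activate("x")   # next call reactivates "x" on "b"
```

No recovery code runs at failure time in this sketch. The next invocation finds no entry for the actor and activates it again, which is why the actor's state has to be reloaded by the application.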

Host: Hmm.

Phil Bernstein: And so when the object is reactivated on another server, it’s going to be entirely up to the application program for that object to reinitialize its state. And state is another word for data, and reading data to initialize an object is just another way of saying it needs to do data management, and that’s how I got into this game was that I said, gee, I think you folks could use some help because this is a pretty big burden on the application developer to figure out how to do all of this state management.

Host: Well, talk about how this open source project has evolved and grown over the last few years. How have you added to the work and why have you moved in those directions?

Phil Bernstein: Well, we’ve gained a lot by being open source. Orleans was one of the first projects that went open source. As I said a little while ago, I got into this because I could see that application developers had to do a lot of state management and that the standard abstractions that are part of the database repertoire are relevant to building these sorts of applications so maybe I can just start adding them, you know, add indexing, add transactions, add geo-distribution, replication, and just make it easier for the application developer. I wasn’t even sure if this was research because it was just applying what I knew about data management to yet another product, if you will. But it turned out that it was research, which I didn’t really see going in, and it’s research for two reasons. One is that it uses storage that’s actually cloud storage. It’s not storage that’s running on the server with the application. That’s very unusual. When you build a data management system, you expect to be able to control storage. I mean that’s such an important ingredient in doing data management. But here, the storage is – it’s a service. And the second is, because it’s plug-in, it can be anything. Again, it’s very unusual to have to build a data management service where it could be a blob store, a JSON store, a relational database, it could be anything, and it could be any one of many products for each of those data structures and yet you only want to have to build this feature once and then have it run successfully no matter what underlying storage system you’re plugging in. And that is a pretty unique challenge. It’s not something I had ever seen done before, so it has required us to rethink these abstractions from the beginning…. And it’s interesting.
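Illustration (not from the episode): the "build the feature once, plug in any storage" constraint can be pictured as a feature written against a narrow storage interface. Everything here is an assumption made for the example: the `StateStore` protocol, the two toy backends (an in-memory stand-in for a blob service and a SQLite table standing in for a relational database) and the `record_score` feature. It does not reflect the actual Orleans storage provider API.

```python
import json
import sqlite3
from typing import Dict, Optional, Protocol


class StateStore(Protocol):
    """Hypothetical plug-in contract: whole-record read and write by key."""

    def read(self, key: str) -> Optional[dict]: ...
    def write(self, key: str, state: dict) -> None: ...


class BlobStore:
    """Stand-in for a blob service: opaque bytes per key."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def read(self, key: str) -> Optional[dict]:
        raw = self._blobs.get(key)
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def write(self, key: str, state: dict) -> None:
        self._blobs[key] = json.dumps(state).encode("utf-8")


class RelationalStore:
    """Stand-in for a relational database, using an in-memory SQLite table."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE actor_state (actor_key TEXT PRIMARY KEY, body TEXT NOT NULL)")

    def read(self, key: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT body FROM actor_state WHERE actor_key = ?", (key,)).fetchone()
        return None if row is None else json.loads(row[0])

    def write(self, key: str, state: dict) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO actor_state (actor_key, body) VALUES (?, ?)",
            (key, json.dumps(state)))
        self._db.commit()


def record_score(store: StateStore, player: str, points: int) -> int:
    """A feature written once; it never learns which backend it runs on."""
    state = store.read(player) or {"score": 0}
    state["score"] += points
    store.write(player, state)
    return state["score"]
```

The cost of this design is visible in the interface: the feature can rely only on what every backend offers, so anything a particular store does well, such as server-side queries or an append-only log, is out of reach.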

Host: So what have you done, additionally, or how have you “new and improved it,” as it were?

Phil Bernstein: Well, take transactions as an example. When you build a transaction system you have to keep track of which transactions have succeeded – which is called committing the transaction – and which ones have not. And that’s generally done in a log, and that log is in storage. And the rate at which you can run the transactions is heavily dependent on the rate at which you can record that information in the log. So it’s a good idea to have one log and be able to simply append these descriptions of transactions that start and commit in this log. But there’s a problem here, which is that cloud storage doesn’t offer a log. And so every database system I know of has a log and here we’re going to implement transactions and there is no log. Um… what are we supposed to do? And you know, so we said, well, we’ve got plenty of storage, so I guess we’re going to have to do our own log on top of cloud storage, which is what we did, and that worked, but it created some complexity in the system that our customers didn’t like very much and we had to go back and do it again a different way because they didn’t really want this custom log we had built…

Host: Interesting.

Phil Bernstein: …and so what we did was we re-did it so that we managed the state of the transaction as part of the state of the object. So we piggyback our own log information on the storage that’s used by the object.

Host: Interesting.

Phil Bernstein: And that was something we hadn’t seen done before, so…

Host: And how was that received?

Phil Bernstein: They liked it a lot. That one stuck and that’s what’s shipping.
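Illustration (not from the episode): one way to picture "piggybacking log information on the storage that's used by the object" is to keep a pending update and a list of committed transaction ids inside each object's own record, so no separate log is needed. This is a simplified sketch under stated assumptions, not the protocol Orleans ships. The record layout and the `prepare`, `commit`, `recover` and `transfer` helpers are hypothetical, the source account's record doubles as the place where the commit decision is recorded, and concurrency control, timeouts and trimming of the committed list are all omitted.

```python
from typing import Dict

# Stand-in for cloud storage: one record per object, written whole, no shared log.
Storage = Dict[str, dict]


def new_account(balance: int) -> dict:
    # Transaction bookkeeping lives in the same record as the balance.
    return {"balance": balance, "pending": None, "committed": []}


def prepare(storage: Storage, key: str, txn: str, new_balance: int) -> None:
    record = dict(storage[key])
    record["pending"] = {"txn": txn, "balance": new_balance}
    storage[key] = record  # a single write covers both state and log info


def commit(storage: Storage, key: str, txn: str) -> None:
    record = dict(storage[key])
    pending = record["pending"]
    if pending is not None and pending["txn"] == txn:
        record["balance"] = pending["balance"]
        record["pending"] = None
        record["committed"] = record["committed"] + [txn]
        storage[key] = record


def recover(storage: Storage, key: str, decider: str) -> None:
    """On reactivation, settle a leftover pending entry via the decider's record.

    Assumes no transfer is still in flight while recovery runs.
    """
    pending = storage[key]["pending"]
    if pending is None:
        return
    if key != decider and pending["txn"] in storage[decider]["committed"]:
        commit(storage, key, pending["txn"])  # decision was commit: finish it
    else:
        record = dict(storage[key])
        record["pending"] = None  # never decided: abort
        storage[key] = record


def transfer(storage: Storage, src: str, dst: str, amount: int, txn: str,
             crash_after_decision: bool = False) -> None:
    prepare(storage, src, txn, storage[src]["balance"] - amount)
    prepare(storage, dst, txn, storage[dst]["balance"] + amount)
    commit(storage, src, txn)  # this write is the commit decision
    if crash_after_decision:
        return  # simulate a server failure before dst is updated
    commit(storage, dst, txn)
```

If a failure interrupts a transfer after the decision is written, calling `recover(storage, "bob", decider="alice")` when `bob` is reactivated completes the update from information already stored with the two objects.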

Host: Every research project has at least one “not yet.” I’m putting that in air quotes. Probably a lot more than one… meaning things that we don’t support yet, we can’t do yet, that aren’t on the map yet. What are the open problems that you still face in this arena and how do you think you’re getting closer to – or at least thinking about getting closer to – solving them?

Phil Bernstein: Well, one of the big things that we don’t do that we want to do is what’s now called serverless operation. What that means is that when you develop the application and deploy it, you’re unaware of the fact that there are many servers out there. That’s not reality today with Orleans because Orleans is simply a programming framework and when you develop your application, you have to explicitly reserve servers in Azure and then deploy your application on those servers. So you’re very much aware of the fact that there are servers, and they’re your servers, you’ve reserved them to run your application. Now, what we’d like is to have this be a serverless service where you don’t know about any servers. You still write your application in the way you always have, and you just drop it in the in-hopper and press a button and our infrastructure on Azure then just grabs that code, and we take care of all this server stuff of provisioning the server and uploading the code to those servers and deal with the failures and add servers and reduce the number of servers and all of that stuff in a way that’s completely transparent to the person, or the group, that’s running this application operationally. So serverless operation is a big one. And the other is kind of related, which is just automating system management, capacity planning. Let’s say you’re a game developer and you’ve got thousands, tens of thousands, perhaps, of gamers, you know, playing. Just monitoring it, figuring out what’s going wrong, looking at the behavior of the users. Right now, that’s all part of the application and yet it’s something that every application developer faces. So why should everybody have to do this in a custom way, on their own? Can’t we do something to automate it? And then third is, I mean, there’s still data management abstractions which are not built into Orleans that I would be interested in adding someday. Um you know, we’ve added some. 
We’ve got transactions, indexing, geo-distribution in there, but there certainly are others that we could add over time, depending on the need of the applications and competing priorities.

(music plays)

Host: Phil, you’ve been in every situation along the research spectrum from academia to industrial research to product, and back, and in that sense, you’re kind of a walking, talking example of human tech transfer yourself. Talk about your experiences in each of these areas and what the value is, having had experience in each of them, as you’ve landed here at Microsoft Research?

Phil Bernstein: Sure. Well, you know, I have done all this other work. I was a professor, I did a start-up, I was working for a hardware company for some years in product development, and now I’m in industrial research in a software company, but I’ve been at Microsoft for twenty-five years so that says something about which one I prefer. But let me talk about some of the common features in all this work. New ideas are the coin of the realm. I mean, your job is to come up with new ideas to solve a problem, in some cases a problem that maybe others have identified. The highest-impact work of this kind is generally done in teams, so you’re always working with other people. Customer pain points are generally good motivation for research. Partnerships are often worth nurturing. There are many activities that you’re required to do as a researcher. It’s not just doing research but it’s also participating in the research community. It’s writing research papers. It’s reviewing research papers written by others. And everybody feels under time pressure to do well at all of these things and so learning how to align your activities so they’re all pointing at the same goal is important. So all that is true for everybody. But beyond that, academic research, product development and industrial research are different in many ways. Academic research tends to be entrepreneurial, in that the professor is generally running their own research group. That means you’re writing grant proposals. You’re expected to teach. There are committees. Everybody’s got to do their share of committee work. So it’s a very complex job, but when it works out well, it’s super exciting. It’s really like running your own company, although you’re doing it in the context of a research group. Product development is quite different and so is industrial research because you end up doing a larger fraction of the work yourself. In product development, you’re writing specifications, you’re writing a lot of code.
Speed is a virtue. You’ve got to be willing to live with the fact that there’s often insufficient time to do the complete solution you want because the product’s got to go out the door at a certain time and if you’re not ready with your piece, well, the train’s going to leave the station whether you’re on board or not and so, in order to ship products, you have to learn how to steer a path to the right technical compromises of what goes into the product and what gets saved for the next version. And when you’re in academia you don’t have to do that, you know. You just basically include everything you want to include and, you know, it doesn’t have to be product-quality so it’s okay.

Host: Right!

Phil Bernstein: On the other hand, shipping to a large audience is really a kick. I mean, getting feedback from grateful customers, it’s a unique emotional experience that is really wonderful when it all works, and makes working long hours super worthwhile because you’ve really done something that has a tangible effect in the world. Now, where does industrial research fit in this? You know, it’s somewhere in between, right? It’s research, but you’re doing it in an industrial setting. Well, the main thing is that we have more time. We are not under the same time constraint so we can actually work out the details. We have more control over selecting our problems and so we can identify problems maybe the product group isn’t even ready to think about yet, and de-risk them, you know. Get it to the point where the product group can pick it up and feel like they can put it on a schedule and they know how long it’s going to take and have lots of confidence that it’s actually going to work in the end.

Host: We talked about what gets you up in the morning, Phil, but now’s the time on the podcast where I ask what keeps you up at night? So what kinds of things keep you up at night and what are you and your colleagues doing about it?

Phil Bernstein: What I think about most is, am I working on the right problem? Really, problem selection is everything in research. If I solve it, is it going to have high impact? Is it likely to be something that a product group is going to pick up? You know, what is the barrier to actually making it real? Maybe I understand the nature of the problem very well, but I don’t have any really brilliant research idea on how to solve it. And sometimes I worry, you know, whether I’m just too far ahead of my time, which is a unique thing about industrial research, you know. We don’t tend to work on problems that the product group can solve. They’re every bit as smart as we are and they have a lot more people, and so anything that they’re going to do in the next couple of years, it’s really not a good idea for us to work on. We have very little added value. And we don’t really want to be working on stuff that’s ten years out. That’s a good thing in a university, but you’ve got to pay the bills. So we tend to work in this two to five year range and there are times that I just get it wrong. I just, I think this is going to be an important thing four or five years from now, and two years into it, it still seems like it’s going to be four or five years and it’s just, the goal posts keep moving out and I think maybe this was not the ideal place to be. So that’s probably the biggest thing that I worry about outside of doing the work itself.

Host: Right. You have a long and varied path in high tech. We’ve alluded to it a bit in our conversation. But I’d love to hear your story. Tell us about your roots, your journey and your ultimate path. You know, give us the Reader’s Digest version…. Is Reader’s Digest even a thing anymore? Give us the Twitter…

Phil Bernstein: I’m old enough to know!

Host: Give us the tweet version!

Phil Bernstein: Sure, and it really is a journey. When I look back on it, there are so many forks in the road where, if I had taken the other path, it would have turned out very differently and I had no idea how. So I got a PhD in computer science and I had a choice between a research lab and a university. I went to a university. I became a professor at Harvard. That all sounds very impressive except that, at the time, Harvard’s computer science department was not very good and so it’s um…

Host: You made it good, Phil.

Phil Bernstein: It was – it was impressive to people in the real world, but in the computer science world it was like, why would you go there? And I had done a lot of consulting on the side, partly to enrich my understanding of real problems, and partly because universities don’t pay very well. And that led to a gig with a start-up doing computer development and they ultimately offered me to be in charge of their whole software operation and so I left academia and I became a vice president at a start-up for two years, and after about a year and a half I decided I really hated it, umm… and that was not the right place for me, and I actually went back to a university for a couple of years, completed some research that I had been doing before that sojourn at a start-up, and then they shut down the university, um… which um… was a bit of a shock, but it was a start-up university. It was called Wang Institute of Graduate Studies. Its goal was to create a professional degree program in software engineering, which is still a very good idea, much like, you know, a law school versus a philosophy department, or a medical school versus a biology department. Anyway, I had to go do something else, so I went to work for a hardware company, Digital Equipment Corporation, and worked on their transaction processing products for a while and then their middleware for data integration for a while, and then they started unraveling. I seem to have a history of this that um… and I don’t think I was a cause, but… because of this work I had done on metadata management and integration, I got a call from Microsoft to be architect for a development that they were doing in this area, Microsoft Repository. And so I took it and that’s what brought me to Microsoft. I worked on that product for four years, at which point it became clear to me that there were just other things that the company thought were more important and so I moved back into research and that’s where I’ve been ever since.

Host: Are you an east coast kid?

Phil Bernstein: I grew up on the east coast, yeah, in New York, and then went to school in Canada at the University of Toronto.

Host: Oh, you did?

Phil Bernstein: And then after that I moved to Boston and I had this long string of jobs in Boston. Harvard, the start-up, back to another university, then Digital Equipment Corporation, so all living in the same place.

Host: So you did the big jump to the west coast by Microsoft?

Phil Bernstein: Yeah.

Host: Tell us something we don’t know about you. I’ve been asking this in the context of whether it’s a personal trait or a defining life moment that may have influenced a career in research, but if I’m honest, I actually just want to know what goes on in the lives of researchers outside the lab. So however you want to answer it, Phil.

Phil Bernstein: Something about me that you wouldn’t ordinarily know: I am fascinated by finance, investing. Now you might think that, oh boy, you know, he really likes to make a lot of money and all, and I’d love to make a lot of money. I’m actually really bad at it. I mean it’s like I don’t manage my own savings. I delegate that to a professional. But what I like about it, it’s endlessly complex, it’s always changing, and there’s one success metric. There’s no way to fake it. Either you’re making more money or you’re not, so I’m just totally hooked. I mean, I read a lot about it, you know, it’s a hobby. I get no, really, personal benefit. My wife makes a joke that I sound very good. You know, people ask me about investments and I sound extremely knowledgeable and all, and then she looks at me and she says, but how come you can’t make any money? Well, you know, you can’t have everything.

Host: Well, Phil, it’s time to wrap up. Before we go, I want to give you a chance to offer some parting advice to our listeners. And many of them are just getting started on their path to high tech research and you’re a veteran in the field so you’re in a unique position to impart some wisdom. Knowing what you know, and having done what you’ve done over the course of your career, what thoughts would you share with our audience? I’ll give you the last word.

Phil Bernstein: Thanks for the opportunity. I actually have strong opinions on this one. I think the most important thing is to know what you’re optimizing and I think there are only four possibilities: money, power, fame or personal happiness. Now, everybody wants all four, but if you don’t prioritize one of them over the others, you might not get any of them at the level that you really want. There will be many forks in the road along the way, and if every time you face that fork in the road, you choose based on a different optimization criterion, you’re lowering the chances that you’re going to get the one that you want most. But beyond that there are many other little snippets of advice. I’ll try to do them quickly! Early in your career, choose your research for the long term. It’s so easy to pick something because it’s a hot topic, but if you want to succeed in a big way, you want to be an expert at something that’s going to be super-important fifteen years from now. When you’ve gotten past that apprentice/journeyman stage, you’re now considered to be an expert and now this thing is super-important and you’ve had fifteen years to really become one of the best people working in that area. So choose a topic where your incremental value is higher, which means probably it’s going to be an unpopular topic, which means you have to be brave. Exploit what you’re good at, but also work around what you’re not good at and look for opportunities to grow. Also, you want to exploit synergies with your environment. Based on what’s around you, you can get research leverage from the fact that your company is really good at a certain something and therefore you have a competitive edge in working in that area. But despite all of this, you still want to be flexible. Opportunities will show up randomly and that may turn out to be the most important thing in terms of your long-term success, that you grabbed the right opportunity at the right time, which might have meant leaving behind something you had actually invested quite a bit of time in. And then finally, a piece of advice I got very early in my career as a researcher, which is, if you want to be good, write a lot of research papers. If you want to be great, never publish a weak one. Because you want people, when they see your name on a paper, to say, oh, his or her papers are always super interesting. I’ve got to read this one. If only every third paper you write is like that, you’re much less likely to get their attention. I’ll stop there.

Host: I could stay here for a long time because you’ve given me some advice that I could use. These are great. Phil Bernstein, thank you for coming on!

Phil Bernstein: My pleasure. Thank you, Gretchen.

(music plays)

To learn more about Dr. Philip Bernstein and the latest research in database management, exploration and mining, visit Microsoft.com/research.