Proving the Correctness of Multiprocess Programs
When I first learned about the mutual exclusion problem, it seemed easy and the published algorithms seemed needlessly complicated. So, I dashed off a simple algorithm and submitted it to CACM. I soon received a referee’s report pointing out the error. This had two effects. First, it made me mad enough at myself to sit down and come up with a real solution. The result was the bakery algorithm described in . The second effect was to arouse my interest in verifying concurrent algorithms. This has been a very practical interest. I want to verify the algorithms that I write. A method that I don’t think is practical for my everyday use doesn’t interest me.
In the course of my work on parallelizing sequential code (see ), I essentially rediscovered Floyd’s method as a way of extracting properties of a program. When I showed a colleague what I was doing, he went to our library at Massachusetts Computer Associates and gave me a copy of the original tech report version of Floyd’s classic paper Assigning Meanings to Programs. I don’t remember when I read Hoare’s An Axiomatic Basis for Computer Programming, but it was probably not long afterwards.
In the mid-70s, several people were thinking about the problem of verifying concurrent programs. The seminal paper was Ed Ashcroft’s Proving Assertions About Parallel Programs, published in the Journal of Computer and System Sciences in 1975. That paper introduced the fundamental idea of invariance. I discovered how to use the idea of invariance to generalize Floyd’s method to multiprocess programs. As is so often the case, in retrospect the idea seems completely obvious. However, it took me a while to come to it. I remember that, at one point, I thought that a proof would require induction on the number of processes.
This paper introduced the concepts of safety and liveness as the proper generalizations of partial correctness and termination to concurrent programs. It also introduced the terms “safety” and “liveness” for those classes of properties. I stole those terms from Petri nets, where they have similar but formally very different meanings. (Safety of a Petri net is a particular safety property; liveness of a Petri net is not a liveness property.)
At the same time I was devising my method, Susan Owicki was writing her thesis at Cornell under David Gries and coming up with very much the same ideas. The generalization of Floyd’s method for proving safety properties of concurrent programs became known as the Owicki-Gries method. Owicki and Gries did not do anything comparable to the method for proving liveness in my paper. (Nissim Francez and Amir Pnueli developed a general proof method that did handle liveness properties, but it lacked a nice way of proving invariance properties.) My method had deficiencies that were corrected with the introduction of temporal logic, discussed in .
The Owicki-Gries version of the method for proving safety properties differed from mine in two ways. The significant way was that I made the control state explicit, while they had no way to talk about it directly. Instead, they introduced dummy variables to capture the control state. The insignificant way was that I used a flowchart language while they used an Algol-like language.
The insignificant syntactic difference in the methods turned out to have important ramifications. For writing simple concurrent algorithms, flowcharts are actually better than conventional toy programming languages because they make the atomic actions, and hence the control state, explicit. However, by the mid-70s, flowcharts were passé and structured programming was all the rage, so my paper was forgotten and people read only theirs. The paper was rediscovered about ten years later, and is now generally cited alongside theirs in the mandatory references to previous work.
More important though is that, because they had used a “structured” language, Owicki and Gries thought that they had generalized Hoare’s method. To the extent that Floyd’s and Hoare’s methods are different, it is because Hoare’s method is based on the idea of hierarchical decomposition of proofs. The Owicki-Gries method doesn’t permit this kind of clean hierarchical decomposition. Gries, commenting in 1999, said: “We hardly ever looked at Floyd’s work and simply did everything based on Hoare’s axiomatic theory.” I suspect that, because they weren’t thinking at all about Floyd’s approach, they didn’t notice the difference between the two, and thus they didn’t realize that they were generalizing Floyd’s method and not Hoare’s.
The result of presenting a generalization of Floyd’s method in Hoare’s clothing was to confuse everyone. For a period of about ten years, hardly anyone really understood that the Owicki-Gries method was just a particular way of structuring the proof of a global invariant. I can think of no better illustration of this confusion than the EWD A Personal Summary of the Owicki-Gries Theory that Dijkstra wrote and subsequently published in a book of his favorite EWDs.( Throughout his career, Edsger Dijkstra wrote a series of notes identified by an “EWD” number.) If even someone as smart and generally clear-thinking as Dijkstra could write such a confusing hodge-podge of an explanation, imagine how befuddled others must have been. A true generalization of Hoare’s method to concurrent programs didn’t come until several years later in .
I think it soon became evident that one wanted to talk explicitly about the control state. Susan Owicki obviously agreed, since we introduced the at, in, and after predicates for doing just that in . Quite a bit later, I had more to say about dummy variables versus control state in .
Dummy variables were more than just an ugly hack to avoid control variables. They also allowed you to capture history. Adding history variables makes it possible to introduce behavioral reasoning into an assertional proof. (In the limit, you can add a variable that captures the entire history and clothe a completely behavioral proof in an assertional framework.) What a program does next depends on its current state, not on its history. Therefore, a proof that is based on a history variable doesn’t capture the real reason why a program works. I’ve always found that proofs that don’t use history variables teach you more about the algorithm. (As shown in , history variables may be necessary if the correctness conditions themselves are in terms of history.)
When we developed our methods, Owicki and I and most everyone else thought that the Owicki-Gries method was a great improvement over Ashcroft’s method because it used the program text to decompose the proof. I’ve since come to realize that this was a mistake. It’s better to write a global invariant. Writing the invariant as an annotation allows you to hide some of the explicit dependence of the invariant on the control state. However, even if you’re writing your algorithm as a program, more often than not, writing the invariant as an annotation rather than a single global invariant makes things more complicated. But even worse, an annotation gets you thinking in terms of separate assertions rather than in terms of a single global invariant. And it’s the global invariant that’s important. Ashcroft got it right. Owicki and Gries and I just messed things up. It took me quite a while to figure this out.
Sometime during the ’80s, Jay Misra noticed that the definition of well-foundedness (Definition 8 on page 136) is obviously incorrect.