Comega

Established: April 8, 2004

Cω Overview

This section discusses the motivation behind Cω and provides a fairly detailed, yet high-level, account of the main features of the language.

Important: Cω is an experimental research language. There are no plans to turn it into a commercial language supported by Microsoft. It is not supported by either the C# or the Visual Studio teams. There are no plans to integrate it into any product.

Introduction

In the last decade, strongly typed, garbage-collected object-oriented languages have left the laboratory and become mainstream industrial tools. This has led to improvements in programmer productivity and software reliability, but some common tasks are still harder than they should be. Two of the most critical for the development of the next generation of loosely-coupled, networked applications and web services are concurrent programming and the processing of relational and semi-structured data.

It is a truth universally acknowledged that concurrent programming is significantly more difficult than sequential programming. It used to be the case that only highly skilled developers, working on such projects as operating system kernels or the core engines of databases, did non-trivial concurrent programming. Over time, however, more developers have had to deal with more concurrency. This began to happen with the arrival of multi-tasking GUIs and multi-user server-side applications, and the initial approach was to encapsulate the concurrency somewhere where the typical application programmer wouldn’t have to deal with it. GUI frameworks are typically single-threaded despite the natural concurrency of graphical interfaces, whilst a common approach to concurrency control in server applications is to delegate it to the transactional mechanisms of a database. However, we believe the arrival of network-centric computing has exposed these palliatives as inadequate.

Even the simplest application now has to manage both a GUI and network communications simultaneously, and locking up a single-threaded user interface whilst performing a lengthy remote operation is unacceptable. As applications perform more remote communications with more remote services, we are increasingly forced to replace synchronous RPC with asynchronous (one-way) messaging. But asynchronous programming is also difficult: incoming messages arrive at unpredictable times and in unpredictable orders, and one naturally ends up needing multiple threads to handle them.

Unfortunately, support for concurrent programming, especially asynchrony, in current mainstream programming languages is weak. Shared memory concurrency on a single machine, even in modern languages like C# and Java, is handled by a 1970s threads-and-locks model, which is implemented entirely in terms of library routines. Distributed concurrency and asynchronous messaging are handled by different library routines with their own model; the .NET libraries, for example, include a fairly complex delegate-based API for asynchronous messaging that still offers little support for the hard problem of handling incoming messages.

The current situation with regard to external data is just as bad. Most web-based (and many non-web-based) applications are basically thin interface-generating layers over relational databases (the ‘3-tier’ model). Yet dealing with relational data from within current languages is messy, error-prone and dangerous. APIs for database access typically construct SQL queries as strings and return results as untyped collection objects which are deconstructed imperatively. This makes code lengthy and unreadable, negates much of the benefit of working in a language with strong static safety guarantees and loses much of the advantage of the underlying declarative query language. Worse still, it is insecure: when query strings are derived from user input, preventing SQL injection attacks requires very careful coding.
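The problem with string-built queries can be illustrated with a small sketch. It is written in Python against the standard sqlite3 module rather than in Cω, and the table, column names and helper functions are hypothetical, chosen only for illustration. It shows a query assembled by concatenation being subverted by crafted input, and the results coming back as plain tuples that the host language's type system knows nothing about.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def lookup_by_string(conn, name):
    # The query is an ordinary string: the host language cannot check it,
    # and user input becomes part of the SQL text itself.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_by_parameter(conn, name):
    # The careful version: input is passed as data, never as SQL text.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()

conn = make_db()
malicious = "x' OR '1'='1"

leaked = lookup_by_string(conn, malicious)   # WHERE clause is always true
safe = lookup_by_parameter(conn, malicious)  # no user has this literal name

# Either way the rows are untyped tuples, deconstructed by position.
for name, secret in lookup_by_parameter(conn, "alice"):
    print(name, secret)
```

Even the parameterized version still carries the query as an unchecked string and returns untyped rows, which is the part of the problem that careful coding alone does not address.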

The other form of external data with which applications increasingly have to deal is tree-shaped semistructured data, such as XML documents. Here again, support for processing such data in current languages is weak. One either uses an untyped DOM-like representation, giving up static guarantees about conformance to a particular DTD or Schema, or uses an external tool to produce a more strongly typed mapping into the language’s type system, with an accompanying loss of fidelity, efficiency and genericity in queries. Even in the DOM case, in which one can more easily support a reusable library of query methods, constructing queries is still significantly more complex than it is in a more domain-specific language such as XQuery. On the other hand, interfaces from general purpose languages to external XML processing languages are often string-based, with all the same impedance mismatches and security holes as in the SQL case.

Methodology

The design of Cω is based on three principles:

Asynchronous concurrency and processing of relational and semistructured data are sufficiently important that they should be directly supported in a modern general purpose programming language. The advantages of direct linguistic support include:

Stronger compile-time guarantees.

Intentions and invariants are more apparent in the code. They become part of the interface rather than being buried in the dynamic flow of control into mysterious library routines.

The compiler has more information and so has the freedom to choose different implementation strategies (e.g. performing query optimizations).

More natural syntax.

Better support from other tools such as editors and debuggers.

We should extend an already-popular language, rather than design a new one from scratch.

The extensions should be principled. The aim is to take models and lessons learnt from the design of more academic, special-purpose languages and try to incorporate them within the mainstream object-oriented framework. In the case of concurrency, we took ideas from a theoretical model called the join calculus and a join-based concurrent functional language called JoCaml. In the case of our data extensions, many of the underlying ideas come from functional programming.

Cω Concurrency – The basic idea

In Cω, methods can be defined as either synchronous or asynchronous. When a synchronous method is called, the caller is blocked until the method returns, as is normal in C#. However, when an asynchronous method is called, there is no result and the caller proceeds immediately without being blocked. Thus from the caller’s point of view, an asynchronous method is like a void one, but with the useful extra guarantee of returning immediately. We often refer to asynchronous methods as messages, as they are a one-way communication from caller to receiver (think of posting a letter, as opposed to asking a question and waiting for an answer during a face-to-face conversation).
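The caller's view of the two kinds of method can be approximated in Python. This is only a sketch under stated assumptions: the one_way decorator and the Logger class are hypothetical names invented here, and starting a thread per call is merely one way to obtain the "returns immediately, no result" behaviour, not a description of how Cω implements it.

```python
import functools
import threading
import time

def one_way(method):
    """Hypothetical decorator: the defining side marks a method as a message."""
    @functools.wraps(method)
    def send(*args, **kwargs):
        # Hand the work to another thread and return at once, with no result.
        threading.Thread(target=method, args=args, kwargs=kwargs).start()
    return send

class Logger:
    def __init__(self):
        self.lines = []
        self.delivered = threading.Event()  # used only so the demo can wait

    def write(self, text):
        # Synchronous: the caller is blocked until the work is done.
        time.sleep(0.5)
        self.lines.append(text)

    @one_way
    def post(self, text):
        # Asynchronous: like a void method that is guaranteed to return at once.
        time.sleep(0.5)
        self.lines.append(text)
        self.delivered.set()

log = Logger()

start = time.monotonic()
result = log.post("posted")           # posting a letter
async_elapsed = time.monotonic() - start
seen_at_return = list(log.lines)      # the message has not been handled yet

log.delivered.wait(timeout=5)         # demo only: wait for the message to land
log.write("asked")                    # asking a question and waiting
```

Note that the decision to be asynchronous is attached to the declaration of post, so every caller gets the one-way behaviour without opting in.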

By themselves, asynchronous method declarations are not particularly novel. Indeed, .NET already has a widely used set of library classes which allow any method to be invoked asynchronously. Note, though, that in this standard pattern it is the caller who decides to invoke a method asynchronously, whereas in Cω it is the callee (defining) side which declares a particular method to be asynchronous. The significant innovation in Cω is the way in which method bodies are defined.

In most languages, including C#, methods in the signature of a class are in bijective correspondence with the code of their implementations — for each method which is declared, there is a single, distinct definition of what happens when that method is called. In Cω, however, a body may be associated with a set of (synchronous and/or asynchronous) methods. We call such a definition a chord, and a particular method may appear in the header of several chords. The body of a chord can only execute once all the methods in its header have been called. Thus, when a method is called there may be zero, one, or more chords which are enabled:

If no chord is enabled then the method invocation is queued up. If the method is asynchronous, then this simply involves adding the arguments (the contents of the message) to a queue. If the method is synchronous, then the calling thread is blocked.

If there is a single enabled chord, then the arguments of the calls involved in the match are de-queued, any blocked thread involved in the match is awakened, and the body runs.

When a chord which involves only asynchronous methods runs, then it does so in a new thread.

If there are several chords which are enabled then an unspecified one of them is chosen to run.

Similarly, if there are multiple calls to a particular method queued up, we do not specify which call will be de-queued when there is a match.
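The rules above can be made concrete with a small simulation of a single chord that joins a synchronous get with an asynchronous put. The sketch is in Python, not Cω; the Buffer class and its method names are assumptions made for illustration, and a condition variable stands in for whatever matching machinery a real compiler would generate.

```python
import threading
from collections import deque

class Buffer:
    """Simulates one chord: the body 'return item' runs once both
    get() and put(item) have been called."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending_puts = deque()  # queued arguments of put messages

    def put(self, item):
        # Asynchronous method: queue the message contents and never block.
        with self._cond:
            self._pending_puts.append(item)
            self._cond.notify()  # a blocked get() may now be able to match

    def get(self):
        # Synchronous method: block while no chord is enabled.
        with self._cond:
            while not self._pending_puts:
                self._cond.wait()
            # The chord is enabled: de-queue one put and run the body.
            # Which queued call is chosen is unspecified in the text above;
            # taking the oldest is just one permitted choice.
            return self._pending_puts.popleft()

buf = Buffer()

# Case 1: get() is called first, so the calling thread blocks until a put arrives.
received = []
consumer = threading.Thread(target=lambda: received.append(buf.get()))
consumer.start()
buf.put("hello")   # completes the chord and wakes the blocked thread
consumer.join()

# Case 2: put() is called first, so the message simply waits in the queue.
buf.put("a")
queued = buf.get() # the chord is enabled immediately
```

Because this chord contains a synchronous method, its body runs in the thread that called get. A chord made up only of asynchronous methods would instead need a new thread, as described above, and that case is not covered by this sketch.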

Conclusion

Cω is an experimental programming language intended to make it easier to write the data-intensive distributed applications being written today. To do so we need to provide application writers with better support for data and control. Cω contains elegant primitives for asynchronous communication, and offers a strongly-typed integration of the object, relational and semi-structured data models.