Established: December 16, 2004

The Dryad Project is investigating programming models for writing parallel and distributed programs that scale from a small cluster to a large data center.


New! Dryad and DryadLINQ are now available in source form at the Dryad GitHub repository, with pre-built binaries available from NuGet.org. For release documentation see our Getting Started with DryadLINQ page.

Most of the information below is historical and will be updated over time and migrated to the DryadLINQ documentation site.

Dryad is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming.

The Structure of Dryad Jobs

A Dryad programmer writes several sequential programs and connects them using one-way channels. The computation is structured as a directed graph: programs are graph vertices, while the channels are graph edges. A Dryad job is a graph generator which can synthesize any directed acyclic graph. These graphs can even change during execution, in response to important events in the computation.
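The graph structure described above can be illustrated with a small single-machine sketch. This is not Dryad's API: `run_job`, the vertex names, and the list-based "channels" are all hypothetical stand-ins chosen for illustration. Each vertex is an ordinary sequential function, each edge is a one-way channel, and the runner simply executes vertices in dependency order.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+


def run_job(vertices, edges, inputs):
    """Run a DAG of sequential programs (hypothetical helper, not Dryad's API).

    vertices: name -> function taking a list of input channels (lists of
              records) and returning one output list.
    edges:    (src, dst) pairs, each a one-way channel.
    inputs:   name -> records fed to vertices that have no incoming edge.
    """
    preds = {name: [] for name in vertices}
    for src, dst in edges:
        preds[dst].append(src)

    outputs = {}
    # static_order() yields predecessors first and raises CycleError
    # if the graph is not acyclic.
    for name in TopologicalSorter(preds).static_order():
        if preds[name]:
            channels = [outputs[p] for p in preds[name]]
        else:
            channels = [inputs.get(name, [])]
        outputs[name] = vertices[name](channels)
    return outputs


def split_words(channels):
    # Sequential program: reads lines from its single input channel.
    return [word for line in channels[0] for word in line.split()]


def count_words(channels):
    # Sequential program: merges every incoming channel and counts words.
    return sorted(Counter(w for chan in channels for w in chan).items())


vertices = {"read_a": split_words, "read_b": split_words, "count": count_words}
edges = [("read_a", "count"), ("read_b", "count")]
inputs = {"read_a": ["a b a"], "read_b": ["b c"]}

result = run_job(vertices, edges, inputs)["count"]
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

None of the vertex functions contains any concurrency logic. In this sketch the runner is sequential, but because the vertices communicate only through channels, `read_a` and `read_b` could run on different machines without changes to their code.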

Dryad is quite expressive: it completely subsumes other computation frameworks, such as Google’s MapReduce and the relational algebra. Dryad also handles job creation and management, resource management, job monitoring and visualization, fault tolerance, re-execution, scheduling, and accounting.
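One way to see why a graph model can express map-reduce is to write map-reduce as a fixed two-stage graph: a layer of map vertices, a layer of reduce vertices, and a channel from every map vertex to every reduce vertex. The sketch below simulates that graph on one machine. The function name `map_reduce_as_graph` and its parameters are assumptions made for illustration, not part of Dryad or of Google's implementation.

```python
from collections import defaultdict


def map_reduce_as_graph(partitions, map_fn, reduce_fn, num_reducers):
    """Simulate map-reduce as a two-stage DAG (illustrative sketch only)."""
    # Stage 1: one map vertex per input partition. Each map vertex has
    # num_reducers outgoing channels and picks one by hashing the key.
    channels = [[[] for _ in range(num_reducers)] for _ in partitions]
    for m, partition in enumerate(partitions):
        for record in partition:
            for key, value in map_fn(record):
                channels[m][hash(key) % num_reducers].append((key, value))

    # Stage 2: reduce vertex r reads channel r of every map vertex,
    # groups the values by key, and applies reduce_fn to each group.
    results = {}
    for r in range(num_reducers):
        groups = defaultdict(list)
        for m in range(len(partitions)):
            for key, value in channels[m][r]:
                groups[key].append(value)
        for key, values in groups.items():
            results[key] = reduce_fn(key, values)
    return results


def emit_words(line):
    # Map function: one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]


def add_counts(word, counts):
    # Reduce function: total occurrences of one word.
    return sum(counts)


word_counts = map_reduce_as_graph(
    partitions=[["x y", "x"], ["y z z"]],
    map_fn=emit_words,
    reduce_fn=add_counts,
    num_reducers=2,
)
print(word_counts)
```

Map-reduce fixes this particular graph shape. A general directed acyclic graph can add further stages, or give a vertex several differently typed inputs, without leaving the model.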

The Dryad Software Stack

As proof of Dryad’s versatility, a rich software ecosystem has been built on top of Dryad:

  • SSIS on Dryad executes many instances of SQL Server, each in a separate Dryad vertex, taking advantage of Dryad’s fault tolerance and scheduling. This system is currently deployed in production as part of one of Microsoft’s AdCenter log processing pipelines.
  • DryadLINQ generates Dryad computations from the LINQ Language-Integrated Query extensions to C#.
  • The distributed shell generalizes the pipe concept from the Unix shell. Whereas Unix pipes allow the construction of one-dimensional (1-D) process structures, the distributed shell allows the programmer to build 2-D structures in a scripting language. It generalizes Unix pipes in three ways:
    1. It makes it easy to connect multiple file descriptors of each process, hence the 2-D aspect.
    2. It allows the construction of pipes spanning multiple machines, across a cluster.
    3. It virtualizes the pipelines, allowing the execution of pipelines with many more processes than available machines, by time-multiplexing processors and buffering results.
  • Several languages are compiled to distributed shell processes. PSQL was an early one; it has recently been replaced by Scope.
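The third generalization of the distributed shell, virtualized pipelines, can be sketched on a single machine. The code below is a rough analogy rather than the distributed shell itself: `run_virtual_pipelines` is a hypothetical name, threads stand in for machines, and in-memory lists stand in for buffered pipe contents. It runs more logical pipeline processes than there are workers by time-multiplexing the workers and buffering each stage's output.

```python
from concurrent.futures import ThreadPoolExecutor


def run_virtual_pipelines(rows, stages, max_workers=2):
    """Run len(rows) parallel pipelines of len(stages) stages each.

    Illustrative sketch: len(rows) * len(stages) logical processes share
    max_workers threads. Each stage's output is buffered in memory until
    the next stage consumes it.
    """
    buffers = list(rows)  # one buffer per pipeline
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # Every pipeline's copy of this stage is a separate task; the
            # pool time-multiplexes the tasks over the available workers.
            buffers = list(pool.map(stage, buffers))
    return buffers


def double_all(values):
    return [v * 2 for v in values]


rows = [[3, 1, 2], [9, 7, 8], [6, 5, 4], [0, 11, 10]]
stages = [sorted, double_all, sum]

# 4 pipelines x 3 stages = 12 logical processes on 2 workers.
totals = run_virtual_pipelines(rows, stages, max_workers=2)
print(totals)  # [12, 48, 30, 42]
```

This sketch covers only the scheduling aspect. It does not model pipes that span machines, nor processes connected through several file descriptors at once.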




