Established: January 25, 2010

DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters.


The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).

Dryad provides reliable, distributed computing on thousands of servers for large-scale data parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio.

DryadLINQ translates LINQ programs into distributed Dryad computations:

  • C# and LINQ data objects become distributed partitioned files.
  • LINQ queries become distributed Dryad jobs.
  • C# methods become code running on the vertices of a Dryad job.
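
As a rough illustration of this mapping, the following Python sketch (not DryadLINQ or Dryad code) models a data set as a list of partitions, a query as a small job, and a user method as the function each vertex applies to its partition. The names `partition`, `run_vertex`, and `run_job` are hypothetical, and a local thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n round-robin partitions (a stand-in for partitioned files)."""
    return [items[i::n] for i in range(n)]


def run_vertex(part):
    """User-supplied logic executed on one partition, as a vertex would run it."""
    return [x * x for x in part if x % 2 == 0]


def run_job(items, n_partitions=3):
    """Run the same vertex function over every partition, then merge the outputs."""
    parts = partition(items, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map returns the per-partition results in partition order.
        results = list(pool.map(run_vertex, parts))
    # Merge stage: flatten the per-partition outputs into one sorted result.
    return sorted(x for part in results for x in part)


result = run_job(list(range(10)))
print(result)
```

The caller writes only the sequential per-record logic; splitting the input, scheduling the work, and merging the outputs belong to the surrounding job, which is the division of labor the list above describes.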

DryadLINQ has the following features:

  • Declarative programming: computations are expressed in a high-level language that offers a superset of the best features of SQL, functional programming, and .NET.
  • Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. DryadLINQ also exploits multi-core parallelism on each machine.
  • Integration with Visual Studio: DryadLINQ programmers can use the full Visual Studio tool set: IntelliSense, code refactoring, integrated debugging, build, and source code management.
  • Integration with .NET: all .NET libraries and languages, including Visual Basic and dynamic languages, are available.
  • Type safety: distributed computations are statically type-checked.
  • Automatic serialization: data transport mechanisms automatically handle all .NET object types.
  • Job graph optimizations:
    • static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
    • dynamic: run-time query plan optimizations automatically adapt the plan to the statistics of the data set being processed.
  • Conciseness: a complete implementation of the Map-Reduce computation framework fits in a single line of DryadLINQ code.
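
The conciseness claim rests on the observation that Map-Reduce is a flat-map followed by a group-by whose groups are then reduced, which in LINQ terms corresponds to standard operators such as `SelectMany` and `GroupBy`. The sketch below shows that shape in plain Python; it is an analogue under stated assumptions, not DryadLINQ code, and the function name `map_reduce` and its parameter names are hypothetical.

```python
from collections import defaultdict


def map_reduce(source, mapper, key_selector, reducer):
    """Flat-map each record, group the mapped items by key, then reduce each group."""
    groups = defaultdict(list)
    for record in source:
        for item in mapper(record):  # map phase: one record -> many items
            groups[key_selector(item)].append(item)  # shuffle: group by key
    # reduce phase: one result per (key, items) group
    return [reducer(key, items) for key, items in groups.items()]


# Usage: word count over two lines of text.
lines = ["a b a", "b c"]
counts = dict(
    map_reduce(
        lines,
        mapper=str.split,  # line -> words
        key_selector=lambda word: word,  # group identical words
        reducer=lambda word, words: (word, len(words)),
    )
)
print(counts)
```

Here the grouping is spelled out as a loop for clarity; in a query language that already provides flat-map and group-by operators, the same composition collapses to a single expression.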


A commercial implementation of Dryad and DryadLINQ was released in beta form in 2011 under the name LINQ to HPC: http://msdn.microsoft.com/en-us/library/hh378101.aspx.





Customers Get Dryad, DryadLINQ

By Douglas Gantenbein, Senior Writer, Microsoft News Center

Researchers and businesspeople around the world now have at their disposal a new way to perform massive computations over large quantities of unstructured data more quickly and easily than they’ve ever imagined.…

January 2011

Microsoft Research Blog

Terapixel Project: Lots of Data, Expertise

By Rob Knies, Managing Editor, Microsoft Research

How can you achieve the impossible? Easy, as long as you have the right people and the right tools. The Terapixel project from Microsoft Research Redmond is proof positive. The effort, to create the largest,…

July 2010

Microsoft Research Blog

Project Trident: Navigating a Sea of Data

By Rob Knies, Managing Editor, Microsoft Research

How deep is the ocean? Geologically, the answer is straightforward: almost seven miles. This we know from a series of surveys, beginning in the 19th century, of the depth of the Mariana Trench,…

July 2009

Microsoft Research Blog

Dryad: Programming the Datacenter

By Rob Knies, Managing Editor, Microsoft Research

Concurrent programming is demanding. While part of a program is modifying data, the other parts must be prevented from doing likewise. Manually organizing such tasks is challenging for the most adept experts. People…

October 2008

Microsoft Research Blog