Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability, and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with very different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span a diverse array of execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general-purpose programming languages such as C# and F#, and therefore provides an expressive data model and native language integration for user-defined functions. This enables programmers to write applications using standard high-level languages and development tools, independent of any specific execution context. Dandelion automatically and transparently distributes the data-parallel portions of a program to the available computing resources, including compute clusters for distributed execution and the CPU and GPU cores of individual compute nodes for parallel execution. To enable the automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses a GPU dataflow runtime called EDGE to manage GPU execution. This paper describes the design and implementation of the Dandelion compiler and runtime, focusing on the distributed CPU and GPU implementation. We report on our evaluation of the system using a diverse set of workloads and execution contexts.