Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving | Spark Summit Europe 2018

We present Spark Serving, a new Spark computing mode that lets users deploy any Spark computation as a web service with sub-millisecond latency, backed by any Spark cluster. Attendees will explore the architecture of Spark Serving and discover how to deploy services on a variety of cluster types, such as Azure Databricks, Kubernetes, and Spark Standalone. We will also demonstrate its simple yet powerful API for deploying SparkSQL, SparkML, and deep network computations as RESTful services, which is the same API used for batch and streaming workloads. In addition, we will explore the “dual architecture”: HTTP on Spark. This architecture converts any Spark cluster into a distributed web client with the familiar and pipelinable SparkML API. Together, these two contributions provide the fundamental Spark communication primitives needed to integrate and deploy any computation framework into the Spark ecosystem. We will explore how Microsoft has used this work to leverage Spark as a fault-tolerant microservice orchestration engine in addition to an ETL and ML platform. We will also walk through two examples drawn from Microsoft’s ongoing work: Cognitive Service composition and unsupervised object detection for snow leopard recognition.
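The core idea in the abstract, that one computation can back both a batch job and a web service, can be illustrated with a minimal sketch. This is not the Spark Serving API; it uses only the Python standard library, and every name in it (`transform`, `ServingHandler`, `start_server`, `call_service`) is hypothetical and chosen for illustration. The function `transform` stands in for an arbitrary pipeline, applied once to a batch of rows and once per HTTP request.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def transform(rows):
    """Hypothetical 'pipeline': add a doubled column to each input row."""
    return [{**row, "doubled": row["x"] * 2} for row in rows]


class ServingHandler(BaseHTTPRequestHandler):
    """Serves the same transform over HTTP, one JSON row per POST request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        row = json.loads(self.rfile.read(length))
        body = json.dumps(transform([row])[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging to keep the example output clean.
        pass


def start_server():
    """Start the service on an OS-assigned local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ServingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port, row):
    """POST one row to the local service and return the decoded JSON reply."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    request = Request(
        f"http://127.0.0.1:{port}/",
        data=json.dumps(row).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with opener.open(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Batch mode: apply the transform to a whole collection at once.
    print(transform([{"x": 1}, {"x": 2}]))

    # Serving mode: the same transform answers individual HTTP requests.
    server = start_server()
    print(call_service(server.server_address[1], {"x": 21}))
    server.shutdown()
```

In this sketch the batch path and the serving path share a single function, so a change to the computation reaches both at once. The talk describes the same property at cluster scale; the distribution, fault tolerance, and latency characteristics it claims are not modeled here.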

