This article walks through the development of a technique for running Spark jobs in parallel on Azure Databricks. The technique reduced the processing times for JetBlue's reporting threefold while keeping the business logic implementation straightforward. It can be reused for any notebook-based Spark workload on Azure Databricks.
This code story describes a collaboration with ZenCity around detecting trending topics at scale. We discuss the datasets, data preparation, models used, and the deployment story for this scenario.
This code story describes CSE's work with ZenCity to create a data pipeline on Azure Databricks supported by a CI/CD pipeline on TravisCI. The aim of the collaboration was to create a pipeline capable of processing a stream of social posts, analyzing them, and identifying trends.
A scalable unsupervised approach for driver safety estimation on Pointer Telocation's dataset
We used Azure Service Bus to achieve zero-downtime reconfiguration and management of the Spark Streaming job used in Project Fortis.