Why Azure Data Factory Version 2?
Kajal Mukherjee, Cloud Solution Architect
Azure Data Factory (ADF) is a Microsoft Azure PaaS solution for data transformation and load. ADF supports data movement between many on-premises and cloud data sources. The list of supported platforms is extensive and includes both Microsoft and other vendor platforms. ADF is a powerful tool that provides flexibility for moving structured and unstructured data sets, including RDBMS, XML, JSON, and various NoSQL data stores. A core strength of ADF is the flexibility of using U-SQL or HiveQL for transformations.
Many Microsoft customers have used SQL Server Integration Services (SSIS) for many years for their data movement needs, primarily involving SQL Server databases. Integration of SSIS with ADF has been a key customer requirement for migrating ETL to the PaaS platform without rewriting the data transformation logic that already exists across the enterprise.
The recent release of Azure Data Factory – Azure Data Factory Version 2 (ADF v2) – has taken a major step towards meeting this requirement. SSIS packages can now be integrated with ADF and can be scheduled/orchestrated using ADF v2. The SSIS package execution capability makes all fine-grained transformation capabilities and SSIS connectors available from within ADF. Customers can utilize existing ETL assets while expanding ETL capabilities with the ADF platform.
ADF v2 allows SSIS packages to be moved to the cloud using the Integration Runtime (IR), which is used to deploy, execute, manage, and monitor these packages in Azure. The IR supports three different scenarios: Azure (pure PaaS with endpoints), self-hosted (within a private network), and Azure-SSIS (combination of the two).
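As a rough illustration of how an SSIS package might be referenced from an ADF v2 pipeline, the sketch below builds a pipeline definition as a Python dictionary and prints it as JSON. The property names (ExecuteSSISPackage, connectVia, packageLocation) follow the shape of the publicly documented Execute SSIS Package activity and should be checked against the current documentation before use. The pipeline name, integration runtime name, and package path are hypothetical placeholders, not values from this article.

```python
import json


def build_ssis_pipeline(pipeline_name, ir_name, package_path):
    """Return a pipeline definition (as a dict) with one Execute SSIS Package activity.

    All argument values are supplied by the caller; nothing here talks to Azure.
    """
    return {
        "name": pipeline_name,
        "properties": {
            "activities": [
                {
                    "name": "RunSsisPackage",  # hypothetical activity name
                    "type": "ExecuteSSISPackage",
                    "typeProperties": {
                        # Points the activity at an Azure-SSIS integration runtime.
                        "connectVia": {
                            "referenceName": ir_name,
                            "type": "IntegrationRuntimeReference",
                        },
                        # Path of the deployed package: folder/project/package.
                        "packageLocation": {"packagePath": package_path},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical names used only for illustration.
    pipeline = build_ssis_pipeline(
        "NightlyLoadPipeline",
        "MyAzureSsisIR",
        "FinanceFolder/LoadProject/LoadSales.dtsx",
    )
    print(json.dumps(pipeline, indent=2))
```

The resulting JSON is only a definition; it would still need to be deployed to a data factory through the portal, PowerShell, or an SDK.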
SSIS package integration has also led to the expansion of a core feature of the ADF platform: there is now a separate Control Flow. Activities are now divided into Data Transformation activities and Control Flow activities, which is similar to the SSIS platform.
In addition to the SSIS integration, ADF v2 has expanded its functionality on a few other fronts. It supports an extended library of expressions and functions that can be used in JSON string values. Data pipeline monitoring is available using Operations Management Suite (OMS) tools in addition to the Azure portal. This is a big step towards meeting the requirements of customers who already rely on OMS tools to monitor their data movement activities.
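To make the idea of expressions inside JSON string values more concrete, here is a minimal sketch that scans a definition and reports which string values would be treated as expressions. It assumes the documented convention that a string value starting with "@" is evaluated as an expression and that "@@" escapes a literal "@"; string interpolation with "@{...}" is deliberately not handled. The helper names and the sample property values are hypothetical.

```python
def is_expression(value):
    """True if a JSON value looks like an ADF v2 expression ('@...' but not '@@...')."""
    return (
        isinstance(value, str)
        and value.startswith("@")
        and not value.startswith("@@")
    )


def find_expressions(node, path=""):
    """Walk nested dicts and lists; return {path: expression} for expression strings."""
    found = {}
    if isinstance(node, dict):
        for key, child in node.items():
            found.update(find_expressions(child, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found.update(find_expressions(child, f"{path}/{index}"))
    elif is_expression(node):
        found[path] = node
    return found


if __name__ == "__main__":
    # Hypothetical property bag mixing expressions and plain strings.
    type_properties = {
        "folderPath": "@concat('output/', pipeline().parameters.runDate)",
        "fileName": "result.csv",
        "owner": "@@data-team",  # escaped: a literal string starting with '@'
    }
    for location, expression in find_expressions(type_properties).items():
        print(location, "->", expression)
```

This helper only locates expressions; evaluating them is done by the ADF service at run time.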
Job scheduling has also changed in ADF v2. In the prior version, jobs were scheduled based on time slices. ADF v2 expands on this: jobs can also be scheduled based on triggering events, such as the completion of a data refresh in the source datastore.
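The contrast between time-based and event-based scheduling can be sketched as two trigger definitions, built here as Python dictionaries. The trigger types and property names (ScheduleTrigger, BlobEventsTrigger, recurrence, blobPathBeginsWith, events, scope) reflect the publicly documented trigger JSON as an assumption to verify against the current documentation. The trigger names, pipeline name, path prefix, and storage scope are hypothetical placeholders.

```python
import json


def pipeline_ref(pipeline_name):
    """Reference from a trigger to the pipeline it should start."""
    return {
        "pipelineReference": {
            "referenceName": pipeline_name,
            "type": "PipelineReference",
        }
    }


def schedule_trigger(name, pipeline_name, start_time, interval_hours):
    """Time-based trigger: start the pipeline every `interval_hours` hours."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Hour",
                    "interval": interval_hours,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [pipeline_ref(pipeline_name)],
        },
    }


def blob_event_trigger(name, pipeline_name, storage_scope, path_prefix):
    """Event-based trigger: start the pipeline when a blob is created under a prefix."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": path_prefix,
                "events": ["Microsoft.Storage.BlobCreated"],
                "scope": storage_scope,  # resource ID of the storage account
            },
            "pipelines": [pipeline_ref(pipeline_name)],
        },
    }


if __name__ == "__main__":
    # Hypothetical values used only for illustration.
    hourly = schedule_trigger("HourlyTrigger", "LoadPipeline", "2018-01-01T00:00:00Z", 1)
    on_arrival = blob_event_trigger(
        "OnNewFileTrigger",
        "LoadPipeline",
        "<storage-account-resource-id>",
        "/landing/blobs/sales/",
    )
    print(json.dumps([hourly, on_arrival], indent=2))
```

In the event-based case, the arrival of a new file stands in for "the source data has been refreshed", so the pipeline runs when the data is ready rather than at a fixed time.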
You can find more information about ADF v2, its key features, and how it differs from ADF v1 in the articles below.
ADF v2 is a significant step forward for the Microsoft data integration PaaS offering. There are many opportunities for Microsoft partners to build services that integrate customer data using ADF v2, or that upgrade existing customer ETL operations built on SSIS to the ADF v2 PaaS platform without rebuilding everything from scratch.
Find the list of supported platforms for ADF v2 here. There is a series of Quickstart Tutorials for various real-life data integration scenarios using ADF v2 for a hands-on experience.