Data & Analytics Partners: Giving Extract, Transform, and Load a new paradigm

By Kajal Mukherjee, Cloud Solutions Architect and Rishi Arora, Cloud Solutions Architect

Integrating data from varied sources is a requirement for enterprise analytics. Business users look well beyond internal data sources: decision making depends on structured and unstructured data drawn from a mix of internal and external systems. Extract, transform, and load (ETL) tools have long been used to integrate data from these sources and transform it into a proper, usable format. These tools are powerful for moving structured or semi-structured data in batches, but they fall short when working with unstructured data and real-time data movement.

With the growth of streaming data from sources such as the Internet of Things (IoT) and weblogs, the need arises to transform big data volumes in short order. That's where traditional ETL tools can fall short and data orchestration tools like Azure Data Factory thrive, by offloading the transformation of big telemetry volumes to compute engines such as HDInsight, Azure Data Lake Analytics, and SQL Data Warehouse. Even our favorite ETL tool, SQL Server Integration Services (SSIS), can't transform billions of records as efficiently as Azure Data Lake Analytics or HDInsight can.

On last month’s Data Platform & Advanced Analytics Partner Community call, we discussed Azure Data Factory. We’re continuing the conversation this month.

Data movement service in the cloud

Think about having the flexibility of spinning up an HDInsight cluster just to perform a transformation and then spinning it down as soon as the job completes. That’s all possible today within an Azure Data Factory pipeline. Data Factory is a globally deployed data orchestration service in the cloud. It can reliably move data from internal and external sources to a common platform for enterprise analytics and decision support systems. Data Factory supports movement of both structured and unstructured data across on-premises and cloud sources and targets, making it easier to integrate varied data types originating from diverse sources like traditional databases, machine data sources, and web sources.
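To make the spin-up/spin-down idea concrete, here is a sketch of an on-demand HDInsight compute linked service as it might appear in a Data Factory JSON definition. The names (`OnDemandHDInsightLinkedService`, `StorageLinkedService`) and the specific values are illustrative assumptions, not a prescribed configuration:

```json
{
  "name": "OnDemandHDInsightLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "clusterSize": 4,
      "osType": "Linux",
      "timeToLive": "00:05:00",
      "linkedServiceName": "StorageLinkedService"
    }
  }
}
```

With a definition along these lines, Data Factory provisions the cluster when an activity needs it, and `timeToLive` controls how long the idle cluster lingers after the last activity completes before it is deleted, so you pay only for the compute the transformation actually uses.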


Data Factory is simple in design. It uses the powerful U-SQL and Hive query languages for data transformation, in addition to standard T-SQL stored procedures. It can scale both horizontally and vertically to support larger data loads. It can tap into Azure Machine Learning and integrate it as part of a pipeline to understand data patterns and even identify anomalies. Today, Data Factory pipelines are typically authored using JSON-based templates in which data sources, compute scripts, and targets can be plugged in for custom deployments. However, the newly developed Data Integration App, available in preview, offers a drag-and-drop GUI-style interface for developing your Data Factory pipelines. More enhancements are coming soon to Data Factory via the Azure portal.
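The JSON-based authoring described above can be sketched as a minimal pipeline that runs a Hive script on HDInsight. All names here (the pipeline, datasets, linked services, and the script path) are hypothetical placeholders for your own deployment:

```json
{
  "name": "TransformLogsPipeline",
  "properties": {
    "description": "Transform raw web logs with a Hive script on HDInsight",
    "activities": [
      {
        "name": "RunHiveScript",
        "type": "HDInsightHive",
        "linkedServiceName": "OnDemandHDInsightLinkedService",
        "typeProperties": {
          "scriptPath": "scripts/transformlogs.hql",
          "scriptLinkedService": "StorageLinkedService"
        },
        "inputs": [ { "name": "RawLogsDataset" } ],
        "outputs": [ { "name": "TransformedLogsDataset" } ],
        "policy": { "concurrency": 1, "retry": 2 },
        "scheduler": { "frequency": "Hour", "interval": 1 }
      }
    ],
    "start": "2017-03-01T00:00:00Z",
    "end": "2017-03-02T00:00:00Z"
  }
}
```

The activity plugs together the three pieces the template model exposes: a data source (the input dataset), a compute script (the Hive query and the cluster that runs it), and a target (the output dataset). Swapping the activity type would let the same pipeline shape invoke U-SQL on Azure Data Lake Analytics or a T-SQL stored procedure instead.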

Partner opportunity

Data movement is important to the success of many customers, and Data Factory makes it attractive for Microsoft partners to provide services to support customer needs where traditional extract, transform, and load tools fall short. Data Factory is not tied to any specific data source, making it attractive for use with data sources from a diverse set of vendors. Data movement projects are often enterprise initiatives and can lead to additional opportunities with other data and analytics efforts.

Join the community call on Tuesday, March 21

Join us for the next Data Platform and Advanced Analytics Partner Community call for a discussion about this topic.

Sign up for the March 21 partner call

Data Platform, Intelligence, and Analytics Partner Community
