Near-real-time analytics for transactional workloads
Part of the SQL Server 2022 blog series.
Traditionally, data to serve analytical systems have been extracted from operational data stores using custom-built extract, transform, and load (ETL) processes. These processes are often long-running, exert pressure on the source systems, and only run periodically in batch mode. While this kind of latency and overhead may be acceptable for some workloads, more and more companies are finding themselves in a place where they need to do analytics over operational data closer to real-time—something that traditional ETL systems cannot support.
Enter Azure Synapse Link for SQL.
Azure Synapse Link for SQL provides an automated way to extract data from source operational systems without having to build custom ETL processes. Some of the benefits of Azure Synapse Link for SQL are:
- Low code/no code solution: With Azure Synapse Link for SQL, you don’t need to build custom processes to extract the data and load it into an analytical system. You choose the tables that you want to replicate, specify how you want them stored in the target Azure Synapse Analytics dedicated SQL pool, and Azure Synapse Link for SQL takes care of the rest.
- Minimal impact on the source systems: We have strived to minimize the impact of data extraction from the source system. Where a traditional ETL process will run queries against the source tables, which can get expensive, Azure Synapse Link for SQL uses the new change feed functionality built into SQL Server 2022 and Azure SQL Database to get the data without having to run custom queries.
- Near-real-time data movement: Data is continually moved from the source systems into the Azure Synapse Analytics environment. Optionally, you can switch to “scheduled mode” if you don’t need near-real-time data movement.
How does it work?
Azure Synapse Link for SQL is powered by the new change feed functionality that has been added to SQL Server 2022 and Azure SQL Database. This functionality allows us to monitor tables for changes as they happen without the additional overhead that is brought along by a change data capture (CDC)–based data movement solution.
When a transaction is committed on a table that is being replicated by Azure Synapse Link for SQL, that transaction is written into a “landing zone,” which is a Gen2 Azure Data Lake storage (ADLS) account. From there, an ingestion service picks up the data and loads it into an Azure Synapse Analytics dedicated SQL pool. Once the data lands there, you can query the data like any other dedicated SQL pool.
Who will benefit?
Here are some examples of scenarios that would benefit from Azure Synapse Link for SQL:
- Database consolidation: Azure Synapse Link for SQL allows you to bring data from multiple source databases together into a single dedicated SQL pool for analytics. Whether you have multiple tenant databases that you want to use for market-based analytics, or you have grown by acquisition and have multiple source systems to bring together for analytics, Azure Synapse Link for SQL can bring all of that data together into a unified analytical platform.
- Hybrid on-premises/cloud: Since Azure Synapse Link for SQL supports both Azure SQL Database and SQL Server 2022, you can bring data into a common analytical system from wherever it lives.
- Near-real-time extension: If you have an ETL system that meets most of your needs but have a few tables where you want data to arrive closer to real-time, you could use Azure Synapse Link for SQL to transfer those tables from the source systems into the Azure Synapse Analytics dedicated SQL pool alongside the data that is processed in your nightly ETL system, and perform reporting an analytics tasks over all of the data.