Easier management of PolyBase and relational data through SQL Server 2019
The days when a database administrator (DBA) could specialize solely in a single database technology are rapidly ending. Today, we’re much more likely than ever before to be asked to bring together many types of data from diverse sources. Although specialization still has its place, having the knowledge and tools at our disposal to cross those boundaries makes us much more useful.
That’s one reason to get excited about the continued expansion of the PolyBase technology introduced in SQL Server 2016, which has become much more powerful in the release of SQL Server 2019.
Before PolyBase, when trying to use external data sources from SQL Server, you either had to transfer data from one source to another or query both sources and then write custom logic to joining and integrate the data at the client level. PolyBase simplifies the process of reading from external data sources. It does so by enabling your SQL Server instance to process Transact-SQL (T-SQL) queries that access both external data and relational data inside the instance.
Initially, PolyBase targeted Apache Hadoop and Azure Blob Storage. The ability to target big data inside Hadoop nodes expanded the ability to do modern analytics seamlessly from a SQL Server platform. No additional software needs to be installed in the Hadoop environment, and you don’t even need to know Hadoop intimately to query it. PolyBase even pushed some computations to the Hadoop node to optimize queries.
What’s awesome about PolyBase in 2019 is that Microsoft has added connectors that allow you to query a much wider range of external data. PolyBase now works with Oracle, Teradata, MongoDB, and any database that conforms to Open Database Connectivity (ODBC) standards. You can query the data where it lives without having to bring it into SQL Server. Best of all, it provides distributed, scalable query performance.
PolyBase can also simplify things when you do have to move data from one place to another. For example, you can bring data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. This allows you to take advantage of the fast analysis and columnstore capabilities of SQL Server without the need for a separate extract, transform, load (ETL) or import tool. On the other hand, it’s also easy to export data from SQL Server to cost-effective online storage. Features like PolyBase make it much easier for database administrators to help organizations get maximum value from all their data, while optimizing the ratio of cost to performance.
To learn more about what you can do with Microsoft SQL Server 2019, check out the free Packt guide Introducing Microsoft SQL Server 2019. If you’re ready to jump to a fully managed cloud solution, check out the Essential Guide to Data in the Cloud.