PolyBase

PolyBase is a breakthrough in data processing, used in SQL Server 2012 Parallel Data Warehouse to enable truly integrated queries across Hadoop and relational data.

Complementing Microsoft’s overall Big Data strategy, PolyBase is a new technology in the data processing engine of SQL Server 2012 Parallel Data Warehouse, designed as the simplest way to combine non-relational data and traditional relational data in your analysis. Customers would normally have to burden IT with pre-populating the warehouse with Hadoop data, or undergo extensive training on MapReduce, in order to query non-relational data. PolyBase handles this seamlessly, giving you the benefits of “Big Data” without the complexities.

Key Capabilities

  • Unifies Relational and Non-relational Data

    PolyBase is one of the most exciting technologies to emerge in recent times because it unifies the relational and non-relational worlds at the query level. Instead of learning a new programming model such as MapReduce, customers can leverage what they already know (T-SQL).

    • Integrated Query: Accepts a standard T-SQL query that joins relational tables with tables in a Hadoop cluster, so users do not need to learn MapReduce.
    • Advanced query options: Apart from simple SELECT queries, users can perform JOINs and GROUP BYs on data in the Hadoop cluster.
  • Enables In-place Queries with Familiar BI Tools

    Microsoft Business Intelligence (BI) integration enables users to connect to Parallel Data Warehouse (PDW) with familiar tools such as Microsoft Excel, create compelling visualizations, and quickly make key business decisions from structured or unstructured data.

    • Integrated BI tools: End users can connect to both relational and Hadoop data with Excel, which abstracts the complexities of each.
    • Interactive visualizations: Explore data residing in HDFS using Power View for immersive interactivity and visualizations.
    • Query in-place: IT doesn’t have to pre-load or pre-move data from Hadoop into the data warehouse, or pre-join the data, before end users do the analysis.
  • Part of an Overall Microsoft Big Data Story

    PolyBase is part of an overall Microsoft “Big Data” solution that already includes HDInsight (a 100% Apache Hadoop-compatible distribution for Windows Server and Windows Azure), Microsoft Business Intelligence, and SQL Server 2012 Parallel Data Warehouse.

    • Integrated with HDInsight: PolyBase can source non-relational data for analysis from HDInsight, Microsoft’s 100% Apache Hadoop-compatible distribution.
    • Built into PDW: PolyBase is built into SQL Server 2012 Parallel Data Warehouse to bring “Big Data” benefits to a traditional data warehouse.
    • Integrated BI tools: PolyBase has native integration with familiar BI tools like Excel (through Power View and PowerPivot).
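
As an illustration of the integrated query capability described above, the following is a minimal T-SQL sketch, not a definitive implementation. It assumes that the Hadoop data has already been exposed to PDW as a queryable table. The table names (dbo.Customer, dbo.ClickStream_Hadoop) and column names are hypothetical and do not come from any Microsoft sample.

```sql
-- Hypothetical schema, for illustration only:
--   dbo.Customer           a regular relational table stored in PDW
--   dbo.ClickStream_Hadoop assumed to be a table over files in the Hadoop cluster
-- The query is ordinary T-SQL: a JOIN across both sources plus a GROUP BY.
SELECT c.Region,
       COUNT(*) AS ClickCount
FROM dbo.Customer AS c
JOIN dbo.ClickStream_Hadoop AS h
    ON h.CustomerId = c.CustomerId
WHERE h.EventDate >= '2013-01-01'
GROUP BY c.Region
ORDER BY ClickCount DESC;
```

The point of the sketch is that nothing in the query text distinguishes the Hadoop-resident table from the relational one, so no MapReduce code is written by the user.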

Microsoft’s research team under Dr. David DeWitt’s leadership talks about PolyBase and their vision for its future.

PolyBase was developed in Microsoft’s Gray Systems Lab in cooperation with the University of Wisconsin. Learn about this unique research facility, which is breaking new ground under Dr. David DeWitt’s leadership.