Indexing HDFS data in PDW: splitting the data from the index

  • Vinitha Reddy Gankidi
  • Nikhil Teletia
  • Jignesh M. Patel
  • Alan Halverson
  • David J. DeWitt

Very Large Data Bases

Published by VLDB Endowment


There is a growing interest in making relational DBMSs work synergistically with MapReduce systems. However, there are interesting technical challenges associated with figuring out the right balance between the use and co-deployment of these systems. This paper focuses on one specific aspect of this balance, namely how to leverage the superior indexing and query processing power of a relational DBMS for data that is often more cost-effectively stored in Hadoop/HDFS. We present a method to use conventional B+-tree indices in an RDBMS for data stored in HDFS and demonstrate that our approach is especially effective for highly selective queries.
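
The general idea of keeping an index in a database while the records stay in external files can be illustrated with a small sketch. This is not the paper's implementation: it assumes a local CSV file as a stand-in for an HDFS file, SQLite as a stand-in for the RDBMS, and an index that stores a byte offset and length per record. All names (`build_split_index`, `lookup`, `idx`, `part-00000.csv`) are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_split_index(data_path, db):
    """Scan the external file once and store (key, offset, length) in the database."""
    db.execute("CREATE TABLE idx (key INTEGER, offset INTEGER, length INTEGER)")
    offset = 0
    with open(data_path, "rb") as f:
        for line in f:
            # Assumption: each record is one line and its first field is the key.
            key = int(line.split(b",", 1)[0])
            db.execute("INSERT INTO idx VALUES (?, ?, ?)", (key, offset, len(line)))
            offset += len(line)
    # SQLite's B-tree index stands in here for an RDBMS B+-tree index.
    db.execute("CREATE INDEX idx_key ON idx (key)")
    db.commit()


def lookup(data_path, db, lo, hi):
    """Answer a selective range query: probe the index, then read only matching records."""
    rows = db.execute(
        "SELECT offset, length FROM idx WHERE key BETWEEN ? AND ? ORDER BY offset",
        (lo, hi),
    ).fetchall()
    records = []
    with open(data_path, "rb") as f:
        for offset, length in rows:
            f.seek(offset)  # jump straight to the record, skipping everything else
            records.append(f.read(length).decode().rstrip("\n"))
    return records


def demo():
    """Build a 1000-record file, index it, and fetch three records by key range."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-00000.csv")  # hypothetical file name
        with open(path, "w", newline="\n") as f:
            for k in range(1000):
                f.write(f"{k},payload-{k}\n")
        db = sqlite3.connect(":memory:")
        build_split_index(path, db)
        result = lookup(path, db, 500, 502)
        db.close()
        return result


if __name__ == "__main__":
    print(demo())
```

In this sketch a query that matches 3 of 1000 records reads only those 3 from the data file, which hints at why an approach of this kind would pay off most for highly selective queries; a query matching most records would gain little over a full scan.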