Opportunities and Challenges for Analyzing Cancer Data at the Inter- and Intra-Institutional Levels

  • Julie Wu
  • Jordan Bryan
  • Samuel M. Rubinstein
  • Lucy Wang
  • Michele Lenoue-Newton
  • Raed Zuhour
  • Mia Levy
  • Christine Micheel
  • Yaomin Xu
  • Suresh K. Bhavnani
  • Jeremy L. Warner

JCO Precision Oncology

PURPOSE
Our goal was to identify the opportunities and challenges in analyzing data from the American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange (GENIE), a multi-institutional database derived from clinically driven genomic testing, at both the inter- and the intra-institutional level. Inter-institutionally, we identified genotypic differences between primary and metastatic tumors across the 3 most represented cancers in GENIE. Intra-institutionally, we analyzed the clinical characteristics of the Vanderbilt-Ingram Cancer Center (VICC) subset of GENIE to inform the interpretation of GENIE as a whole.

METHODS
We performed overall cohort matching on the basis of age, ethnicity, and sex of 13,208 patients stratified by cancer type (breast, colon, or lung) and sample site (primary or metastatic). We then determined whether detected variants, at the gene level, were associated with primary or metastatic tumors. We extracted clinical data for the VICC subset from VICC’s clinical data warehouse. Treatment exposures were mapped to a 13-class schema derived from the HemOnc ontology.
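The gene-level comparison described above can be sketched as a 2x2 association test per gene, followed by a multiple-testing correction. The abstract does not name the statistical test or the correction procedure, so the Fisher exact test and the Benjamini-Hochberg adjustment used here are assumptions chosen for illustration, and the function names and counts are hypothetical rather than taken from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: metastatic samples with a variant in the gene
    b: metastatic samples without one
    c: primary samples with a variant
    d: primary samples without one
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value (assumed test, not stated in the source).

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical counts per gene: (met_with, met_without, prim_with, prim_without).
example_counts = {
    "GENE_A": (30, 70, 10, 190),
    "GENE_B": (12, 88, 25, 175),
}
genes = list(example_counts)
pvals = [fisher_exact_two_sided(*example_counts[g]) for g in genes]
qvals = benjamini_hochberg(pvals)
for g, q in zip(genes, qvals):
    print(g, round(odds_ratio(*example_counts[g]), 2), q)
```

An odds ratio above 1 in this layout indicates that variants in the gene are over-represented in metastatic samples. With hundreds of genes tested per cancer type, reporting q-values rather than raw p-values controls the expected share of false discoveries.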

RESULTS
Across 756 genes, variant frequencies differed significantly between primary and metastatic samples in all 3 cancer types. In breast cancer, ESR1 variants were over-represented in metastatic samples (odds ratio, 5.91; q < 10⁻⁶). TP53 mutations were over-represented in metastatic samples across all cancers. VICC had a significantly different cancer type distribution than GENIE as a whole, but patients were well matched with respect to age, sex, and sample type. Treatment data from VICC were used for a bipartite network analysis, which revealed some clusters containing a mix of histologies and others that were more histology specific.

CONCLUSION
This article demonstrates the feasibility of deriving meaningful insights from GENIE at the inter- and intra-institutional level and illuminates the opportunities and challenges of the data GENIE contains. The results should help guide future development of GENIE, with the goal of fully realizing its potential for accelerating precision medicine.