3730 - Building a Comprehensive Cancer Database across Fourteen Treatment Facilities in a Large Health Care System
Presenter(s)
L. Tchelebi1, W. D. Lindsay2, K. Yee2, J. Wishinsky1, M. E. Labarca1, and L. Potters1; 1Northwell, New Hyde Park, NY, 2Oncora Medical, Inc, Philadelphia, PA
Purpose/Objective(s): Healthcare systems typically utilize several disparate electronic medical records (EMRs) across various disciplines in both inpatient and outpatient settings throughout their network. Consolidating patient data from these varied EMRs into one unified database has the potential to significantly improve patient care. The goal of this study was to build and validate a unified cancer database across a 14-hospital healthcare system to support advanced analytics, quality improvement initiatives, and multi-center oncology research.
Materials/Methods: Multiple data sources covering 2014-2024 were integrated, including inpatient and outpatient electronic health records, oncology information systems from multiple vendors, and data from the individual cancer registries of each of the 14 hospital facilities. Extracted data encompassed medications, laboratory and imaging results, surgical procedures, diagnoses, genomic data, and structured cancer registry data, including stage, site-specific data elements, and survival. Data standardization was performed via the data extraction platform. Oncology information system integrations provided granular treatment details across academic, teaching, and community hospitals.
Results: The integrated database consolidated over 100,000 registry-confirmed cancer cases across 14 facilities, totaling 212,436 patient records. Key achievements include: implementation of longitudinal survival tracking through registry-sourced vital status data; integration of comprehensive treatment records from multiple systems across radiation, surgical, and medical oncology; incorporation of genomic profiles from internal and external laboratories; and successful data integration from all facilities, with per-facility contributions ranging from 2,089 to 73,299 records. Analysis identified 316 distinct primary sites, with prostate adenocarcinoma (15,412 cases), invasive ductal carcinoma of the breast (11,795 cases), non-small cell lung cancer (11,697 cases), cutaneous squamous cell carcinoma (10,864 cases), and adenocarcinoma of the colon (10,067 cases) representing the most prevalent malignancies.
Conclusion: This study demonstrates the successful development and implementation of a methodology for integrating heterogeneous oncology data into a unified, reliable database. The implemented data provenance approach ensures confidence in merged records while enabling cross-hospital research and quality monitoring. Future research will focus on expanding genomic data integration and developing predictive analytics capabilities for personalized cancer care.
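Illustrative sketch (not the study's implementation): the abstract does not describe how its data provenance approach works, so the following Python example shows only one common way to merge field-level data from several source systems while recording which system supplied each value. The source names ("registry", "ois", "ehr"), their priority order, the field names, and the `UnifiedRecord` and `merge_records` identifiers are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical priority order: earlier sources win when values conflict.
SOURCE_PRIORITY = ["registry", "ois", "ehr"]


@dataclass
class UnifiedRecord:
    patient_id: str
    values: dict = field(default_factory=dict)      # field name -> chosen value
    provenance: dict = field(default_factory=dict)  # field name -> source system


def merge_records(rows):
    """Merge (source, patient_id, field_name, value) tuples into unified records.

    For each patient and field, the value from the highest-priority source is
    kept, and the supplying source is recorded alongside it.
    """
    rank = {source: i for i, source in enumerate(SOURCE_PRIORITY)}
    unified = {}
    for source, patient_id, name, value in rows:
        if value is None:
            continue  # skip missing values so they never overwrite real data
        record = unified.setdefault(patient_id, UnifiedRecord(patient_id))
        current_source = record.provenance.get(name)
        if current_source is None or rank[source] < rank[current_source]:
            record.values[name] = value
            record.provenance[name] = source
    return unified
```

With this sketch, a stage value reported by both a hypothetical "ehr" feed and a "registry" feed would resolve to the registry value, and the record's `provenance` entry for that field would read "registry", so that any merged value can be traced back to its origin.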