Datasets

The Humana Institute offers a range of health datasets containing medical, pharmacy, laboratory, and social variables. Access requires specific training and regulatory approval.
  • Partnership with Greater Houston HealthConnect (GHH) 

    Humana Institute faculty can request data from the largest health information exchange (HIE) in the region. The GHH network spans more than 2,500 venues of care and more than 16 million unique (i.e., unduplicated) patients, who are tracked through a master-file process that links encounters across disparate systems. The same process allows GHH to link HIE records to existing, a priori samples, provided that sufficient identifiers are available and the appropriate regulatory approvals are in place. Completing a research data request typically carries a cost.
  • Humana: Diabetes Claims Dataset

    This dataset covers a subset of Medicare Advantage enrollees with Type 2 diabetes. It includes over 300 variables detailing demographics, calculated probabilities of health behaviors (diet habits, ability to lose weight, alcohol abuse, etc.), medical data, medical claims, prescription claims, and laboratory testing for each enrollee from 2016 to 2020. A codebook can be provided upon request. Access is through a secure SAS server, so SAS expertise is required, and access is granted only after all applicable research regulatory processes are completed.
  • National Inpatient Sample + Cost to Charge Files

    Obtained from the Healthcare Cost & Utilization Project, this dataset is a stratified sample of hospital discharges in the US (excluding rehabilitation and long-term acute care hospitals). Variables include ICD-10 codes, patient demographics, hospital characteristics, total charges and payment sources, severity and comorbidity measures, length of stay, and discharge status. Each year contains approximately 7 million individual observations.
  • Emergency Department Sample + Cost to Charge Files

    Obtained from the Healthcare Cost & Utilization Project, this dataset is a stratified sample of emergency department visits in US hospitals. Variables include ICD-10 codes, discharge status, total charges and payment sources, patient demographics, and hospital characteristics. The full description of included data elements can be found here. Each year contains 20 to 30 million individual observations (unweighted).
  • Health Care Cost Institute (HCCI) Data

    HCCI holds data on over 25 million commercially insured individuals per year (2018–2022), covering costs, utilization patterns, and trends over time. The data comes primarily from major health insurers and offers a comprehensive view of healthcare costs across sectors, including hospital services, prescription drugs, and physician care. Analysis of this data helps policymakers, researchers, and the public better understand the drivers of healthcare costs and informs strategies to control spending while maintaining access to quality care. There is a cost associated with using this dataset.
     
  • Texas Inpatient Hospital Discharge Data

    This dataset contains information on patients discharged from hospitals in the state of Texas. It typically includes detailed information about each hospital stay, such as patient demographics, diagnoses, procedures performed, length of stay, hospital charges, and payer information.
Please email humana-institute@uh.edu with questions about using any of these resources.

Research Highlights