Medicaid Data for Research

Health Analytics for Medicaid Claims Data

The Health Analytics group at Georgia Tech conducts research in data science to improve decision making in health care delivery and public health. Georgia Tech (with support from IPaT, ISyE, and Children’s Healthcare of Atlanta) purchased Medicaid claims data for multiple states and years to be used for health analytics research. Additional data sets have also been obtained for research. To learn more about the data set and how it has been and may be used for research, please see the Q&A document. For questions or more information, please contact Richard Starr.

The document answers these questions:

  1. What kind of Medicaid data do you have?
  2. What populations, states, and years are available?
  3. How can researchers access the data?
  4. How can scientists from Children’s Healthcare of Atlanta, Emory, or other institutions use the data for research?
  5. Does doing research with this data cost money?
  6. Can other data be combined with the MAX Medicaid claims?
  7. What research topics can be studied?
  8. Can other research questions be studied?
  9. Does the GT Health Analytics group use other data for research?
  10. I am working with someone else at Georgia Tech; do they have access to the data?
  11. What else do I need to know?


Health Analytics for Medicaid Claims Data

We have the Medicaid Analytic eXtract (MAX) files, which are person-level Research Identifiable Files (RIFs). The five file types are listed below. A National Provider ID (NPI) and Characteristics file, which provides additional information on providers, is available for 2009 and later years. The files include claims paid for patients under both managed care organizations and fee-for-service plans. More information is available from CMS.

1. Personal Summary: patients, demographics, birthdate, etc.
2. Inpatient: claims, diagnoses, procedures, length of stay (LOS), payment
3. Other Therapy: claims for physician, lab, clinic, outpatient
4. Long Term Care: facility type, date of service, etc.
5. Prescription Drug: paid drug claims

The initial Data Use Agreement and approved research protocol cover all Medicaid beneficiaries who are children or pregnant women. The initial data obtained covers years 2005-2009 inclusive for 14 states, including those in the Southeast (Georgia, Alabama, Arkansas, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Texas) and comparison areas (California, Minnesota, New York, Pennsylvania).

Georgia Tech has initiated the process for purchasing additional data with expanded populations. This data request covers all 50 states, years 2010 and 2011, and all beneficiaries under Medicaid. Additional years will be purchased when they are available. The request is to purchase four file types: Personal Summary, Inpatient, Other Therapy, and Prescription Drug.

The MAX claims data includes confidential information and is protected by a Data Use Agreement between Georgia Tech and CMS. The existing Data Management Plan allows direct access to the raw and non-aggregated data only by employees and students of Georgia Tech who have completed appropriate training and obtained IRB approval.

Data can be extracted and aggregated on-site at Georgia Tech, and sufficiently aggregated data can be provided to collaborators. The shared data must contain at least 11 entries in each cell and in both the numerator and the denominator of any ratio. Data is also reviewed at Georgia Tech before release to ensure that patients or providers cannot be identified across multiple data sets.
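
As an illustration of the minimum cell size rule, the sketch below counts records per cell and withholds any cell or ratio that falls below 11 entries. It is a minimal example under stated assumptions, not the procedure used at Georgia Tech: the field names ("state", "year") and the function names are hypothetical, and the manual review described above is not replaced by a check like this.

```python
from collections import Counter

MIN_CELL_SIZE = 11  # threshold stated in the text


def aggregate_counts(records, keys):
    """Count records per cell; a cell is the tuple of values for `keys`."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def suppress_small_cells(counts, min_size=MIN_CELL_SIZE):
    """Return a copy of `counts` with any cell below `min_size` set to None."""
    return {cell: (n if n >= min_size else None) for cell, n in counts.items()}


def safe_ratio(numerator, denominator, min_size=MIN_CELL_SIZE):
    """Return the ratio only if both counts meet the threshold, else None."""
    if numerator < min_size or denominator < min_size:
        return None
    return numerator / denominator


# Hypothetical person-level records; the field names are assumptions.
records = [{"state": "GA", "year": 2008}] * 12 + [{"state": "AL", "year": 2008}] * 3

counts = aggregate_counts(records, ("state", "year"))
shareable = suppress_small_cells(counts)
# The GA cell (12 entries) is kept; the AL cell (3 entries) is withheld as None.
```

A check of this kind covers only the size threshold. Withheld cells can sometimes be recovered from row or column totals, which is one reason a review before release is still needed.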