Capstone Projects

Assisting Health Plans Predict Payments by Drug Manufacturers to Physicians through Machine Learning Methods

Program: Data Science Master's
Location: Tulsa, Oklahoma (remote)
Student: Cullen Hogan

The objective of this project was to predict payments made to physicians by pharmaceutical and medical device companies. To do this, the CMS Open Payments database was combined with the Medicare Part D prescription drug database. Recent studies of the Open Payments database have helped to identify fraud, waste, and abuse (FWA) among physicians who receive extraordinary amounts of payments from drug and device companies. A predictive model for these payments could help health plan investigators and audit teams work more efficiently to reduce provider-related FWA.

This project focused specifically on physicians in Oklahoma and Arkansas because of the size of the Open Payments and Part D drug data. Part D drug data features such as total days' supply, number of beneficiaries, total drug costs, and provider specialty were summarized for every physician who received payments in the Open Payments database. Multiple machine learning methods were tested, including Linear Regression, Decision Trees, Random Forests, and the AutoML Tree-based Pipeline Optimization Tool (TPOT). TPOT yielded the most accurate model, an Extra Trees Regressor, which predicted payments made to physicians with an R-squared value of 0.629 (62.9%). A future step for this project would be to leverage internal claims databases to improve results; these would add medical claims for each provider that cannot be obtained from the Part D drug data. This model is ready to use and provides good direction for health plans looking to prioritize audits and investigations.
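
As a rough illustration of the per-physician summarization and the R-squared metric described above, the following is a minimal sketch using only the Python standard library. It is not the project's actual code: the field names (`npi`, `days_supply`, `beneficiaries`, `drug_cost`), the sample rows, and the helper functions are all hypothetical, and the real CMS files use different column names and far more records.

```python
from collections import defaultdict

# Hypothetical Part D rows, one per (physician, drug). Field names and values
# are illustrative assumptions, not the actual CMS column names or data.
part_d_rows = [
    {"npi": "1001", "specialty": "Cardiology", "days_supply": 900, "beneficiaries": 30, "drug_cost": 4500.0},
    {"npi": "1001", "specialty": "Cardiology", "days_supply": 300, "beneficiaries": 10, "drug_cost": 1500.0},
    {"npi": "1002", "specialty": "Family Practice", "days_supply": 600, "beneficiaries": 20, "drug_cost": 800.0},
    {"npi": "1003", "specialty": "Neurology", "days_supply": 150, "beneficiaries": 5, "drug_cost": 2000.0},
]

# Hypothetical total Open Payments amount per physician (the prediction target).
open_payments = {"1001": 2500.0, "1002": 120.0}


def summarize_part_d(rows):
    """Collapse drug-level rows into one feature record per physician."""
    summary = defaultdict(lambda: {
        "specialty": None,
        "total_days_supply": 0,
        "total_beneficiaries": 0,
        "total_drug_cost": 0.0,
    })
    for row in rows:
        record = summary[row["npi"]]
        record["specialty"] = row["specialty"]
        record["total_days_supply"] += row["days_supply"]
        # Simplification: summing across drugs can count the same beneficiary twice.
        record["total_beneficiaries"] += row["beneficiaries"]
        record["total_drug_cost"] += row["drug_cost"]
    return dict(summary)


def build_dataset(summary, payments):
    """Keep only physicians who appear in the payments data and attach the target."""
    return [
        {"npi": npi, **features, "total_payment": payments[npi]}
        for npi, features in sorted(summary.items())
        if npi in payments
    ]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


summary = summarize_part_d(part_d_rows)
dataset = build_dataset(summary, open_payments)
```

Each record in `dataset` could then serve as one training row for a regression model, with `total_payment` as the target. Under this definition, an R-squared of 0.629 means the model's predictions account for about 62.9% of the variance in the observed payments.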