Using Machine Learning to Predict Changes in Client Health
Program: Data Science Master's Degree
Location: California (onsite)
Student: Franck Reyherme
This project analyzes the use of machine learning algorithms to predict changes in the health of clients using Illuminate Education’s flagship product, the Data & Assessment (DnA) application. The primary objectives are to (1) develop a model that reliably predicts changes in client health and (2) identify leading indicators of those changes. The ability to anticipate changes in client health and to understand what drives them represents considerable business value for Illuminate Education: a model that reliably predicts these changes can help the company reduce churn, increase customer satisfaction, and deliver high-quality customer service at scale. To this end, extensive customer behavior and demographic data are collected from over 800 databases across more than 40 servers and multiple APIs, then cleaned and prepared for modeling. Metric cohort analysis is conducted on over 20 customer behaviors to assess each behavior's relationship to client health. The results show that the metrics most closely associated with the engagement of district-level administrators are the strongest leading indicators of client health. Based on a review of the relevant academic literature, the study focuses on two popular machine learning algorithms: Weighted Logistic Regression and Cost-Sensitive Support Vector Machines. Hundreds of models are evaluated using 10-fold cross-validation, with the F1 score as the primary evaluation metric. The study shows that the Support Vector Machine algorithm significantly outperforms Logistic Regression and produces a high-performing predictive model that reliably forecasts changes in the health of DnA clients.
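The evaluation protocol described above (a class-weighted logistic regression compared against a cost-sensitive SVM, each scored by F1 under 10-fold cross-validation) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the study's implementation: both models are linear and trained by full-batch (sub)gradient descent, the class weights follow the common "balanced" heuristic, folds are stratified, and the data is synthetic. All function names (`train_linear`, `cross_val_f1`, etc.) and hyperparameters are hypothetical.

```python
# Sketch only: linear models, synthetic data, hypothetical names and
# hyperparameters. Not the study's actual pipeline or SVM configuration.
import math
import random


def class_weights(y):
    # "Balanced" heuristic (an assumption): weight = n / (2 * n_class), so the
    # rarer class contributes as much total loss as the common class.
    n, n_pos = len(y), sum(y)
    return {1: n / (2 * n_pos), 0: n / (2 * (n - n_pos))}


def logistic_dloss(m):
    # Derivative of the log-loss log(1 + exp(-m)) w.r.t. the margin m, computed stably.
    if m >= 0:
        e = math.exp(-m)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(m))


def hinge_dloss(m):
    # Subgradient of the SVM hinge loss max(0, 1 - m); linear kernel only.
    return -1.0 if m < 1 else 0.0


def train_linear(X, y, dloss, epochs=100, lr=0.1, lam=0.01):
    # Full-batch (sub)gradient descent on
    #   (1/n) * sum_i w[y_i] * loss(s_i * (coef . x_i + bias)) + (lam/2) * ||coef||^2
    # where s_i = +1 for label 1 and -1 for label 0.
    w = class_weights(y)
    d, n = len(X[0]), len(X)
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        g, gb = [lam * c for c in coef], 0.0
        for xi, yi in zip(X, y):
            s = 1.0 if yi == 1 else -1.0
            m = s * (sum(c * v for c, v in zip(coef, xi)) + bias)
            scale = w[yi] * dloss(m) * s / n
            for j in range(d):
                g[j] += scale * xi[j]
            gb += scale
        coef = [c - lr * gj for c, gj in zip(coef, g)]
        bias -= lr * gb
    return coef, bias


def predict(coef, bias, x):
    # Positive class when the linear score is non-negative.
    return 1 if sum(c * v for c, v in zip(coef, x)) + bias >= 0 else 0


def f1(y_true, y_pred):
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no true positives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def stratified_folds(y, k=10, seed=0):
    # Deal each class's shuffled indices round-robin into k folds so that every
    # fold keeps roughly the overall class ratio.
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_val_f1(X, y, dloss, k=10):
    # Mean F1 over k stratified folds: train on k-1 folds, score the held-out one.
    scores = []
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        coef, bias = train_linear([X[i] for i in train], [y[i] for i in train], dloss)
        preds = [predict(coef, bias, X[i]) for i in fold]
        scores.append(f1([y[i] for i in fold], preds))
    return sum(scores) / k


if __name__ == "__main__":
    # Synthetic, imbalanced toy data (NOT DnA data): ~20% positives and two
    # features whose class means differ.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.2 else 0 for _ in range(200)]
    X = [[rng.gauss(1.0 if t else -1.0, 1.0) for _ in range(2)] for t in y]
    for name, dloss in (("weighted logistic", logistic_dloss), ("cost-sensitive SVM", hinge_dloss)):
        print(name, round(cross_val_f1(X, y, dloss), 3))
```

Because both models share the same class weighting and training loop and differ only in the loss (log-loss versus hinge), the comparison isolates the effect of the loss function. A production pipeline would more likely rely on a library implementation, for example scikit-learn's `class_weight` option on `LogisticRegression` and `SVC` together with `StratifiedKFold`, and would tune the regularization strength and kernel inside the cross-validation loop.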