Inspecting Restaurant Inspections – Using Machine Learning to Improve Sanitation in NYC
Program: Data Science Master's Degree
Location: Not Specified (remote)
Student: Sean Murphy
The NYC Department of Health and Mental Hygiene (DOHMH) conducts annual sanitation inspections of the nearly 27,000 restaurants across NYC's boroughs. Each inspection results in a grade of A, B, or C, depending on the number of violations found. Grades of B or C indicate more violations and can trigger follow-up work such as reinspections, forced closures, extra correspondence, and adjudication of contested grades. Moreover, poor restaurant sanitation can increase the incidence of foodborne illness and food poisoning among customers. This research investigates the use of machine learning to predict future inspection grades from past inspection data, with the objective of achieving high recall for at-risk restaurants. With reliable predictions, the DOHMH could proactively mitigate sanitation risk through targeted advertising, support, education, and reminders for eateries that would otherwise be high-risk. Logistic regression, random forest, and gradient boosting models were chosen for their interpretability. After fitting on past inspection data, none of the models proved satisfactory. Recall and F1 score were highest for the gradient boosting model but still fell short of a useful level, and none of the models improved in accuracy over naïve prediction of the majority class. The difficulties in modeling can be attributed primarily to two factors: a lack of comprehensive data collection over time, and the weak signal for inspection grades in the predictor set. Future modeling attempts will need to overcome both of these challenges in order to achieve significant lift and satisfactory performance.
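The evaluation described above, in which the three model families are compared against a naïve majority-class predictor on accuracy, recall, and F1 score, might look roughly like the sketch below. It is a minimal illustration and not the author's pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the DOHMH inspection records, and treats label 1 as a hypothetical "at-risk" (grade B or C) class. The feature counts, class weights, and hyperparameters are placeholders chosen only so the example runs.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for inspection data (assumption, not real DOHMH records).
# Label 1 plays the role of an at-risk restaurant and is the minority class.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=3,
    weights=[0.85, 0.15],
    flip_y=0.1,  # label noise, mimicking a weak signal in the predictors
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The naive baseline always predicts the most frequent class in the training set.
models = {
    "majority_baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # Recall and F1 are computed for the at-risk class (label 1).
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

for name, metrics in results.items():
    print(
        f"{name:20s} "
        f"accuracy={metrics['accuracy']:.3f} "
        f"recall={metrics['recall']:.3f} "
        f"f1={metrics['f1']:.3f}"
    )
```

The baseline row makes the abstract's point concrete: a majority-class predictor can score well on accuracy while having zero recall for the at-risk class, so a model is only useful here if it raises recall without giving up that accuracy.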