Predictive Modeling for Ozone in the Colorado Springs Area: Application of Machine Learning Techniques
Program: Data Science Master's Degree
Location: Colorado Springs, Colorado (remote)
Student: Paul Michael Trygstad
This project developed predictive models for ground-level ozone in Colorado Springs, CO. Ground-level ozone is hazardous to health, and noncompliance with EPA regulations can harm the local economy. Sufficiently accurate models could improve ozone forecasting, helping to protect public health and maintain regulatory compliance. Mean EPA air quality data and NOAA meteorological data were gathered and combined in Python to model ground-level ozone independently at the two permanent ozone monitors in the Colorado Springs area, AFA and MAN. Two datasets were prepared in RStudio, one with the original variables and one with transformed variables. A modeling suite with nested 5-fold cross-validation (inner and outer loops) was run for each monitor with each dataset. The models examined were standard linear regression, generalized linear regression, linear regression with forward selection, LASSO-penalized linear regression, generalized additive models, random forests, gradient boosted trees, and artificial neural networks.
Gradient boosted trees produced the best fit in every run, and the dataset with the original variables yielded better model performance than the transformed dataset. The final model fits had R² scores of 0.66 and 0.71 for AFA and MAN, respectively. The most influential variables were temperature, relative humidity, and calendar date. Atmospheric mixing height was also investigated and found to have moderate relative influence in the final model fits. The results indicate that data mining solutions may be a feasible means of improving ozone forecasts, thereby protecting public health and supporting compliance management.
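The inner and outer cross-validation scheme described above can be illustrated with a minimal Python sketch. This is not the project's code: the project's modeling was done in RStudio with a suite of model types, whereas the sketch swaps in a single-predictor ridge regression so that it runs with the standard library alone. The data are synthetic, and every name here (`k_fold_indices`, `fit_ridge_1d`, `nested_cv`, `LAMBDAS`, the temperature-like predictor) is a hypothetical chosen for illustration. The inner loop selects a penalty using only the outer training data, and the outer loop scores the selected model with R² on data it has never seen.

```python
import random

def k_fold_indices(n, k, seed):
    """Shuffle the positions 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Fit y = a + b*x with an L2 penalty lam on the slope b (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / (sxx + lam)
    return my - b * mx, b

def r2_score(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def split(xs, ys, held):
    """Return (train_x, train_y, held_x, held_y) for one held-out fold."""
    held_set = set(held)
    train = [i for i in range(len(xs)) if i not in held_set]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in held], [ys[i] for i in held])

def mean_cv_r2(xs, ys, lam, k, seed):
    """Inner loop: mean held-out R2 of one candidate penalty over k folds."""
    scores = []
    for held in k_fold_indices(len(xs), k, seed):
        x_tr, y_tr, x_ho, y_ho = split(xs, ys, held)
        a, b = fit_ridge_1d(x_tr, y_tr, lam)
        scores.append(r2_score(y_ho, [a + b * x for x in x_ho]))
    return sum(scores) / k

def nested_cv(xs, ys, lams, k=5, seed=0):
    """Outer loop: tune on the training folds only, then score on the test fold."""
    results = []
    for test in k_fold_indices(len(xs), k, seed):
        x_tr, y_tr, x_te, y_te = split(xs, ys, test)
        best = max(lams, key=lambda lam: mean_cv_r2(x_tr, y_tr, lam, k, seed + 1))
        a, b = fit_ridge_1d(x_tr, y_tr, best)
        results.append((best, r2_score(y_te, [a + b * x for x in x_te])))
    return results

# Synthetic stand-in data (NOT the project's EPA/NOAA data): an ozone-like
# response that rises with a temperature-like predictor, plus Gaussian noise.
rng = random.Random(42)
temperature = [rng.uniform(0.0, 35.0) for _ in range(200)]
ozone = [20.0 + 0.8 * t + rng.gauss(0.0, 5.0) for t in temperature]

LAMBDAS = [0.0, 1.0, 10.0, 100.0]  # hypothetical candidate penalties
results = nested_cv(temperature, ozone, LAMBDAS, k=5)
for fold, (lam, score) in enumerate(results, start=1):
    print(f"outer fold {fold}: selected lambda={lam}, test R2={score:.3f}")
```

Because tuning never touches the outer test fold, the outer R² values estimate out-of-sample performance without the optimistic bias that arises when the same folds are used both to select and to score a model.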