Application of Machine Learning Techniques for Isoprene Forecasting: A Colorado Front Range Case Study

Program: Data Science Master's Degree
Host Company: Boulder Atmosphere Innovation Research (AIR)/NA
Location: Boulder, Colorado (remote)
Student: Gabriel Greenberg

This project focused on isoprene, a terrestrially emitted compound that contributes to air pollution through its impact on ozone, a toxic respiratory irritant. The main objective was to construct and evaluate three machine learning architectures (SARIMA, neural networks, and random forests) to determine whether a data-based machine learning (ML) approach can plausibly forecast isoprene 24 hours in advance, compared with existing physics-based models. Little research exists on using machine learning to forecast isoprene, and an ML approach could improve ozone forecasting and help protect local community members, particularly those with respiratory conditions, from air pollution. The models constructed in this project were compared with existing physics-based isoprene models, which have long been in use but suffer from several limitations (coarse spatial resolution, challenging setup, and inaccuracies associated with inhomogeneous vegetative ground cover) that a local data-based approach can mitigate. The project used ground-based isoprene measurements from Boulder AIR, a local Colorado air monitoring company, along with meteorological, chemical, drought, and botanical data. None of the models forecasted isoprene particularly well, but they showed promise given continued work and improved model architectures such as an LSTM. Although a direct comparison is difficult, the ML models performed similarly to, though slightly worse than, a physics-based model study conducted in 2015. With continued work, ML models for predicting isoprene or other atmospheric pollutants such as ozone may provide a useful and simpler alternative to existing physics-based simulations.