Program: Data Science Master's
Host Company: National Park Service
Location: San Francisco, California (onsite)
Student: Elizabeth Edson
A drought in 2021 left sections of Redwood Creek in Marin County, California, inhospitable to juvenile endangered salmon. This required unscheduled, rapid intervention by National Park Service (NPS) staff, who physically relocated fish to deeper pools upstream. A data science approach was therefore taken to address three management questions: where juvenile salmonids are predicted to be and what their risk is of living in critical quality habitat in the fall, how many field days will be needed for fish rescues, and which sections are the highest priority to visit. To answer these questions, a combination of long-term monitoring data and opportunistic sampling data was used to assess the performance of several supervised learning techniques. Random forest models were selected through a double k-fold cross-validation process as providing the highest predictive accuracy for both the juvenile data and the habitat quality data; for the habitat data, a synthetic minority oversampling technique (SMOTE) was also used to balance the categorical response variable. The resulting predictions were combined into a dashboard for easy dissemination of information. The final dashboard contained high-level alert boxes, a map of sections and risk values, and a visit/work schedule prioritization table. Because Redwood Creek is dynamic, the model selection process that feeds the dashboard was written to be fully repeatable in future years, allowing the dashboard to continue using the best predictive models based on existing and newly available data.
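The two resampling ideas named above can be illustrated with a minimal, standard-library-only Python sketch. It is not the project's code. It assumes that "double k-fold cross-validation" means nested cross-validation (an outer loop for estimating performance, an inner loop for model selection), and it uses a simplified SMOTE-style interpolation. The function names, fold counts, and data values are hypothetical, and the random forest fitting step is omitted; in practice a library such as scikit-learn or imbalanced-learn would typically handle these steps.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split range(n) into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_splits(n, outer_k=5, inner_k=3, seed=0):
    """Yield (inner_train, inner_val, outer_test) lists of row indices.

    The outer fold is held out to estimate performance; the inner folds
    split the remaining rows for model selection.
    """
    outer = k_fold_indices(n, outer_k, seed)
    for o, test in enumerate(outer):
        rest = [i for j, fold in enumerate(outer) if j != o for i in fold]
        for inner in k_fold_indices(len(rest), inner_k, seed + 1 + o):
            held = set(inner)
            val = [rest[p] for p in inner]
            train = [rest[p] for p in range(len(rest)) if p not in held]
            yield train, val, test


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: interpolate between a minority-class point and
    one of its k nearest minority-class neighbours (needs >= 2 points)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


# Made-up (depth_m, temperature_C) rows standing in for the rare habitat class.
rare = [(0.10, 18.0), (0.15, 19.5), (0.12, 17.0), (0.20, 18.5)]
extra = smote(rare, n_new=6)
splits = list(nested_splits(n=20))
print(len(extra), len(splits))  # 6 synthetic rows, 5 * 3 = 15 splits
```

In a workflow like this, oversampling is normally applied only to the rows in each training split, so that synthetic points never leak into the validation or test folds.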