Capstone Projects

Shapelet-Based Time Series Classification: Diagnosing Atrial Fibrillation: a Case Study

Program: Data Science Master's Degree
Location: Not Specified (remote)
Student: Kathryn Douglass

Electrocardiogram (ECG) classification is a widely studied time series classification (TSC) task with critical applications in clinical diagnostics. Traditional approaches typically rely either on highly domain-specific signal processing techniques or on features derived from deep learning models, which lack interpretability.

This study investigates shapelet-based TSC as a domain-agnostic and inherently explainable alternative to these methods, using atrial fibrillation (AF) diagnosis as a case study. The project used open-source ECG recordings, available through PhysioNet, from individuals with normal and AF heart rhythms. The Scalable and Accurate Subsequence Transform (SAST) was used to extract shapelets and apply a shapelet transform to the waveforms, producing features that were then used to train XGBoost classifiers. Model explainability was assessed by visualizing the most influential shapelets and evaluating their alignment with the established pathophysiology of AF.
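To make the idea of a shapelet transform concrete, the following is a minimal sketch, not the SAST implementation used in the project. It assumes the common formulation in which each feature is the minimum Euclidean distance between a z-normalized shapelet and every z-normalized window of the same length in a series. The function names (`znorm`, `shapelet_distance`, `shapelet_transform`) and the toy signals are hypothetical and chosen only for illustration.

```python
import math


def znorm(x):
    """Z-normalize a sequence; a (near-)constant sequence maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std < 1e-8:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the z-normalized shapelet and
    every z-normalized sliding window of the same length in the series."""
    m = len(shapelet)
    s = znorm(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, s)))
        best = min(best, d)
    return best


def shapelet_transform(dataset, shapelets):
    """Map each series to a feature vector with one distance per shapelet."""
    return [[shapelet_distance(series, sh) for sh in shapelets]
            for series in dataset]


# Toy signals (made up for illustration, not ECG data).
peaked = [0, 0, 1, 3, 1, 0, 0, 0]   # contains a sharp peak
scaled = [0, 0, 2, 6, 2, 0, 0, 0]   # same peak at twice the amplitude
ramp = [0, 1, 2, 3, 4, 5, 6, 7]     # no peak at all

shapelets = [peaked[2:5]]           # candidate shapelet: the peak itself
features = shapelet_transform([peaked, scaled, ramp], shapelets)
```

In this sketch, series that contain the peak get a feature value near zero regardless of amplitude, while the ramp gets a clearly larger value. A feature matrix of this kind is what a downstream classifier, such as the XGBoost models described above, would be trained on; because each column corresponds to one shapelet, an influential feature can be traced back to a waveform segment and plotted.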

The results demonstrate that shapelet-based models facilitate transparent communication of key features to medical professionals. The findings also show that shapelets effectively capture clinically relevant features, such as P-wave abnormalities – a signature that conventional signal processing techniques struggle to detect. Although this study’s best model was not sufficiently accurate for clinical deployment, this research indicates that shapelet-based methods offer a feasible and promising direction for AF diagnosis, thus meriting further exploration for this application and others in future work.