Capstone Projects

Programmatic Automation of Quality Assurance of Clickstream Data Anomalies

Program: Data Science Master's Degree
Location: Not Specified (remote)
Student: Alan Scieszinski

Like many e-commerce websites, Acme Clothing’s website is continuously refined and updated, and each update can have unforeseen consequences. For example, digital tags can stop firing the data they are supposed to send. More concerning still, online orders placed by customers can stop being captured by the order management systems. Today, ensuring data quality requires many hours of manually clicking through and debugging the website. This manual quality assurance is limited by the time and speed of the person doing it, and it is more prone to human error than a programmatic approach. This client-based project describes the programmatic automation of quality assurance for clickstream data anomalies using unsupervised and semi-supervised machine learning algorithms. It also shows how joining offline data with online data produces new business insights, and it presents the data in online dashboards to support the company’s ongoing business management and decision-making needs regarding clickstream data.

Project Objectives  

  1. Improve QA’s completeness, accuracy, and speed by utilizing a programmatic approach.   
  1. Develop a reusable workflow to source, clean, and analyze the product page’s clickstream data variables in preparation for modeling.  
  1. Establish alert thresholds for the subset of variables analyzed in this project.  
  1. Create a dashboard mockup to display the data in a way that the end stakeholders will understand and use. 
  1. Decide which clickstream variables to add to Acme Clothing’s database to improve analytics capabilities.
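
To make the alert-threshold objective concrete, the following is a minimal sketch of one possible way to derive thresholds for a single clickstream variable. It is not the project's actual method: the robust median/MAD rule, the function names `mad_alert_bounds` and `flag_anomalies`, the cutoff `k=3.5`, and the sample counts are all assumptions chosen for illustration.

```python
from statistics import median


def mad_alert_bounds(counts, k=3.5):
    """Return (lower, upper) alert bounds from historical daily counts.

    Uses the median and the median absolute deviation (MAD), which are
    less sensitive to past outliers than the mean and standard deviation.
    The cutoff k is an illustrative assumption, not a project setting.
    """
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    # 1.4826 rescales the MAD so it is comparable to a standard
    # deviation when the data are roughly normal.
    spread = 1.4826 * mad
    # Note: if more than half of the history is identical, MAD is 0 and
    # the bounds collapse to the median; a real workflow would need a floor.
    return med - k * spread, med + k * spread


def flag_anomalies(history, new_counts, k=3.5):
    """Return the new counts that fall outside the alert bounds."""
    lower, upper = mad_alert_bounds(history, k)
    return [c for c in new_counts if c < lower or c > upper]


# Hypothetical daily fire counts for one product-page tag.
history = [1020, 980, 1005, 995, 1010, 990, 1000]
print(flag_anomalies(history, [1003, 12, 998]))  # prints [12]
```

In this sketch, a day on which the tag nearly stops firing (12 events against a typical 1,000) falls below the lower bound and is flagged, while ordinary day-to-day variation is not.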