Use of Autoencoders for Fraud Prediction Compared to Supervised Learning Modeling Techniques: A Case Study

Program: Data Science Master's Degree
Location: Not Specified (remote)
Student: Katelyn Woodruff

Fraudulent transactions are a costly dilemma for retailers. Failing to monitor fraud costs companies lost sales and inventory, but being too strict about fraud costs them legitimate sales and the future value of their customers. Companies must therefore strike a balance between preventing losses from fraudulent transactions and avoiding penalizing good sales. This project evaluates the use of autoencoders against supervised learning techniques, including logistic regression, random forest, and XGBoost. The study used a synthetic data set sourced from Kaggle. The findings indicate that the autoencoder did not outperform the XGBoost model. The autoencoder captured some trends in the data, an improvement over logistic regression; however, it had an exceptionally high false-positive rate. Because the autoencoder is unsupervised, it appears to have flagged other anomalies in the transactions, not only fraudulent ones. Further comparison using a data set from a real-world business and a more robust feature set is needed to establish the usefulness of autoencoders for fraud detection.
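The explanation for the high false-positive rate rests on how an autoencoder scores transactions: it learns to reconstruct typical inputs, and any transaction it reconstructs poorly is flagged, whether or not that transaction is fraud. The sketch below illustrates this mechanism on toy data only. It is not the model, data set, or threshold used in this study. The one-hidden-layer linear architecture, the six synthetic features, the 99th-percentile threshold, and all function names (`train_autoencoder`, `reconstruction_error`, `flag`) are hypothetical choices made for illustration. A production fraud model would usually be deeper and use nonlinear activations.

```python
import numpy as np

def train_autoencoder(X, hidden=2, lr=0.02, epochs=2000, seed=0):
    """Fit a small linear autoencoder by full-batch gradient descent.

    Hypothetical architecture: d features -> `hidden` codes -> d features.
    Loss is the mean over rows of the summed squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = X @ W1 + b1            # encoder: compress each transaction
        R = H @ W2 + b2            # decoder: reconstruct the original features
        G = 2.0 * (R - X) / n      # gradient of the loss with respect to R
        GH = G @ W2.T              # backpropagate to the codes (before W2 changes)
        W2 -= lr * (H.T @ G); b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH); b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(X, params):
    """Anomaly score: per-transaction mean squared reconstruction error."""
    W1, b1, W2, b2 = params
    R = (X @ W1 + b1) @ W2 + b2
    return ((X - R) ** 2).mean(axis=1)

# Toy data (assumption): legitimate transactions have 6 features driven by
# 2 hidden factors; "fraud" is random noise with no shared structure.
rng = np.random.default_rng(42)
A = rng.normal(size=(2, 6))

def legit(n, noise):
    return rng.normal(size=(n, 2)) @ A + noise * rng.normal(size=(n, 6))

train = legit(2000, 0.05)                     # unlabeled, mostly typical transactions
test_normal = legit(200, 0.05)                # typical legitimate transactions
test_unusual = legit(20, 2.0)                 # legitimate but atypical transactions
test_fraud = 4.0 * rng.normal(size=(20, 6))   # toy fraud

mu, sd = train.mean(axis=0), train.std(axis=0)

def z(X):
    """Standardize with training-set statistics."""
    return (X - mu) / sd

params = train_autoencoder(z(train))
# Assumed threshold: the 99th percentile of training reconstruction error.
threshold = np.quantile(reconstruction_error(z(train), params), 0.99)

def flag(X):
    """True where a transaction is reconstructed worse than the threshold."""
    return reconstruction_error(z(X), params) > threshold

print("normal flagged:       ", flag(test_normal).mean())
print("unusual legit flagged:", flag(test_unusual).mean())
print("fraud flagged:        ", flag(test_fraud).mean())
```

In this toy setup, the unusual but legitimate transactions exceed the threshold about as readily as the toy fraud does, because the model has no labels that tell it which kind of anomaly matters. This is consistent with the abstract's explanation of the false positives, whereas supervised models such as XGBoost learn directly from fraud labels.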