Capstone Projects

Automated Document Classification System

Program: Data Science Master's
Location: Not Specified (remote)
Student: Saritha Thampan

Document classification is the process of assigning documents to categories based on their content. It is widely used in business processes such as email spam filtering, sentiment analysis, web page classification, legal document classification, and financial document classification.

The Automated Document Classification System is a case study with the following objectives:

(1) Research and implement an automated document classifier that reads scanned images of documents, extracts the text, and classifies each document into a category based on the extracted text.

(2) Compare the performance of different machine learning and deep learning methods and recommend the best methodology for document classification.

(3) Gain a better understanding of computer vision and text classification methods and deep learning networks.

(4) Apply data mining concepts learned in the program to develop the automated document classifier.
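
To make objectives (1) and (2) concrete, the following is a minimal sketch of the classification and comparison steps. It is not the project's actual implementation: the summary does not say which models or libraries were used. The sketch assumes the text has already been extracted from the scanned images by an OCR step upstream, and it compares a bag-of-words multinomial Naive Bayes classifier against a majority-class baseline. The class names, the `accuracy` helper, the categories, and the sample texts are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer; a real system would clean OCR noise first.
    return text.lower().split()


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented stand-ins for OCR output; not data from the project.
train_texts = [
    "invoice total amount due payment",
    "invoice number billing amount tax",
    "payment due invoice balance",
    "agreement between parties hereby terms",
    "contract terms parties signature",
]
train_labels = ["invoice", "invoice", "invoice", "contract", "contract"]

test_texts = [
    "amount due on this invoice",
    "the parties agree to these terms",
    "signature of both parties",
]
test_labels = ["invoice", "contract", "contract"]

nb_acc = accuracy(NaiveBayes().fit(train_texts, train_labels), test_texts, test_labels)
base_acc = accuracy(MajorityBaseline().fit(train_texts, train_labels), test_texts, test_labels)
print(f"naive bayes: {nb_acc:.2f}  majority baseline: {base_acc:.2f}")
```

Scoring every candidate model with the same held-out set and the same metric is what makes a comparison like objective (2) meaningful. A toy set this small says nothing about real performance; it only shows the shape of the evaluation loop.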