Redacting Protected Health Information (PHI) and Personally Identifiable Information (PII) from X-Ray Files

Program: Data Science Master's Degree
Location: Greater Boston, Massachusetts (remote)
Student: Abhishek Alexander

This capstone project aimed to build an automated system that detects and redacts protected health information (PHI) in X-ray images using machine learning, addressing concerns about patient privacy and data security in the healthcare industry. The system used convolutional neural networks (CNNs) to detect PHI elements such as patient names and medical record numbers in the X-ray images, then redacted those elements using optical character recognition (OCR) and image manipulation techniques. Key stages included data collection, preprocessing, model development, and performance evaluation. An extensive dataset of labeled X-ray images containing PHI was curated and used to train CNN architectures. Model performance was assessed with metrics such as precision and recall, and generalization was tested on unseen data. A successful implementation could enhance patient privacy, reduce unauthorized access to sensitive information, and streamline compliance with data protection regulations such as HIPAA. Overall, the project contributed to advancing machine learning in healthcare and addressed the need for automated PHI redaction in medical imaging.
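The abstract does not specify how detected PHI regions are masked or how precision and recall are counted for detections. The following is a minimal sketch under stated assumptions, not the project's implementation: detections are assumed to arrive as pixel bounding boxes (left, top, width, height), the same form of word boxes an OCR engine such as Tesseract reports; the image is a grayscale array of nested lists so the example has no dependencies (a real pipeline would more likely use NumPy or OpenCV arrays); and a prediction counts as a true positive only if it overlaps an unclaimed ground-truth box with intersection-over-union (IoU) of at least 0.5. The function names `redact`, `iou`, and `precision_recall` and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, width, height) in pixels.
Box = Tuple[int, int, int, int]


def redact(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of a grayscale image with every box painted over by `fill`."""
    out = [row[:] for row in image]  # copy so the original image is untouched
    h = len(out)
    w = len(out[0]) if h else 0
    for left, top, bw, bh in boxes:
        # Clip each box to the image bounds so boxes near the edge are handled safely.
        for y in range(max(0, top), min(h, top + bh)):
            for x in range(max(0, left), min(w, left + bw)):
                out[y][x] = fill
    return out


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (left, top, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred: List[Box], truth: List[Box], thresh: float = 0.5) -> Tuple[float, float]:
    """Box-level precision and recall with greedy one-to-one matching.

    Each ground-truth box can be claimed by at most one prediction.
    """
    unmatched = list(truth)
    tp = 0
    for p in pred:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            tp += 1
            unmatched.remove(best)  # a matched truth box cannot be reused
    fp = len(pred) - tp  # predictions that hit no PHI box
    fn = len(truth) - tp  # PHI boxes that were missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    img = [[255] * 6 for _ in range(4)]
    masked = redact(img, [(1, 1, 2, 2)])
    print(masked)
    print(precision_recall([(0, 0, 10, 10), (50, 50, 5, 5)],
                           [(0, 0, 10, 10), (20, 20, 5, 5)]))
```

In the usage example, one prediction matches a labeled PHI box exactly and the other hits nothing, while one labeled box is missed, so both precision and recall come out to 0.5. Recall is usually the metric to watch for redaction, since a missed PHI region (a false negative) leaks patient data, whereas a false positive only hides extra pixels.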