
Capstone Projects

Data mining PII via optical character recognition on public image hosting sites

Program: Data Science Master's
Location: Not Specified (onsite)
Student: Shawn Peters

Previous research has shown that credential compromise is the catalyst for many data breaches and cyberattacks. Most web applications offer safeguards that give users control over their security and privacy, but cybercriminals’ tactics continue to evolve. This study aims to demonstrate that username and password data can be mined from nontraditional sources, specifically text in images. An optical character recognizer is used to scrape text from images on an image hosting service; the text is then categorized by keywords and mined for credentials. Text scraped from 1.18 million images is analyzed for personally identifiable information and mined for user-, service-, and system-level credentials. Analysis of a focused subset of the data uncovered over 1,000 usernames and passwords, and an additional branch of mining uncovered several Social Security numbers. This investigation demonstrates that compromising textual data is contained in publicly hosted images and that this data could be collected for criminal use.
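
The categorize-then-mine steps described in the abstract can be illustrated with a minimal Python sketch. It assumes the OCR step has already produced plain text for each image. The abstract does not name the OCR engine, the keyword lists, or the matching rules used in the study, so the keyword table, the regular expressions, and the function names below are all hypothetical and are not the author's method.

```python
import re

# Hypothetical keyword lists; the study's actual keywords are not given.
KEYWORDS = {
    "credentials": ("username", "password", "login"),
    "ssn": ("ssn", "social security"),
}

# Illustrative patterns only (assumptions, not the study's rules).
# A "user: X ... password: Y" pair within a short window of text.
CREDENTIAL_RE = re.compile(
    r"(?:user(?:name)?|login)\s*[:=]\s*(\S+)"
    r".{0,40}?"
    r"(?:password|pass|pwd)\s*[:=]\s*(\S+)",
    re.IGNORECASE | re.DOTALL,
)
# Digits in the common 3-2-4 hyphenated Social Security number layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def categorize(text):
    """Return the sorted category names whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    )


def mine(text):
    """Return candidate credential pairs and SSN-shaped strings from the text."""
    return {
        "credentials": CREDENTIAL_RE.findall(text),
        "ssns": SSN_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Username: alice\nPassword: hunter2"  # made-up OCR output
    print(categorize(sample))
    print(mine(sample))
```

Filtering by keyword first keeps the more expensive pattern matching confined to text that is likely to be relevant, which matters at the scale of 1.18 million images. Pattern matches of this kind are only candidates; OCR errors and coincidental digit strings would need manual or statistical validation.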