Capstone Projects

ML Applications in Data Profiling for Metadata Management

Program: Data Science Master's
Location: Not Specified (remote)
Student: Munsoor H. Razack

To be competitive in today’s data-driven world and keep up with changing regulations and privacy laws, organizations must understand their data sources, the data itself, and its purpose and context. Metadata is where this information about the data is compiled and managed. This exploratory case study describes the different types of metadata and their role in a data management strategy, and explores methods for data profiling. Proprietary tools simplify and automate data profiling and cataloging; some of these tools and their features are described. The case study examines machine learning techniques that other research has applied to data profiling, such as classification methods. It also explores automated machine learning (AutoML) as a way for micro and small businesses to get started with data profiling for metadata management without a large upfront investment while they evaluate their business use cases.