Capstone Projects

Machine Learning Based Entity Resolution

Program: Data Science Master's
Location: Not Specified (remote)
Student: Deepa Madhavan

Machine Learning Based Entity Resolution is a system that automatically resolves and links entities across disparate data sources. A resolver model extracts features from the input data, compares them against a standard entity mapper, and classifies each entity using machine learning algorithms. The goal of this effort is to design and build a configurable, scalable, and automated system for resolving and linking entities spread across those sources.

Imagine having access to data sources of various types (databases, CSV files, etc.), along with metadata about each data source, metadata about its individual data fields, and a sample of data records for each field.
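As a minimal sketch of what such an input might look like, the snippet below models one data source with its source-level metadata, field-level metadata, and sample records. The class names (`DataSource`, `FieldProfile`) and all example values are hypothetical and chosen only for illustration; the project description does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldProfile:
    """Metadata and sample records for one data field (hypothetical shape)."""
    name: str
    description: str
    samples: List[str] = field(default_factory=list)


@dataclass
class DataSource:
    """Metadata for one data source plus its field profiles (hypothetical shape)."""
    name: str
    kind: str  # e.g. "database" or "csv"
    description: str
    fields: List[FieldProfile] = field(default_factory=list)


# Illustrative values only; not taken from the project.
source = DataSource(
    name="crm_export",
    kind="csv",
    description="Customer contact export",
    fields=[
        FieldProfile(
            name="contact_email",
            description="Primary email address",
            samples=["a@example.com", "b@example.org"],
        ),
    ],
)
```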

This domain-independent Auto Resolver System uses machine learning to understand the data and make intelligent linkage decisions. It helps organizations improve data quality and reliability, reduce costs, make better decisions, and comply with industry standards.

  • Build and train the model on multiple heterogeneous data sets
  • Compare and link any number of entities quickly
  • Build features from the input data and compare them against the standard entity mapper
  • Make the system highly configurable and scalable so that its intelligent decisions replace manual effort
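
The feature-building and comparison step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the contents of `STANDARD_ENTITIES`, the two features, and the hand-picked weights are all hypothetical, and the weighted score stands in for the trained machine learning classifier that the project describes.

```python
import re
from difflib import SequenceMatcher

# Hypothetical standard entity mapper: canonical entity name -> value pattern.
STANDARD_ENTITIES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone_number": re.compile(r"^\+?[\d\s().-]{7,}$"),
    "postal_code": re.compile(r"^\d{5}(-\d{4})?$"),
}


def extract_features(field_name, samples, entity, pattern):
    """Build two simple features comparing a field against one standard entity."""
    # Similarity between the field name and the canonical entity name.
    name_similarity = SequenceMatcher(None, field_name.lower(), entity).ratio()
    # Fraction of sample records that match the entity's value pattern.
    matches = sum(1 for s in samples if pattern.match(s.strip()))
    pattern_match_rate = matches / len(samples) if samples else 0.0
    return {
        "name_similarity": name_similarity,
        "pattern_match_rate": pattern_match_rate,
    }


def resolve(field_name, samples, threshold=0.5):
    """Return (entity, score) for the best match, or (None, score) below threshold.

    The fixed weights are an assumption; a trained classifier would
    replace this scoring rule in a real system.
    """
    best_entity, best_score = None, 0.0
    for entity, pattern in STANDARD_ENTITIES.items():
        feats = extract_features(field_name, samples, entity, pattern)
        score = 0.4 * feats["name_similarity"] + 0.6 * feats["pattern_match_rate"]
        if score > best_score:
            best_entity, best_score = entity, score
    if best_score < threshold:
        return None, best_score
    return best_entity, best_score


entity, score = resolve("contact_email", ["a@example.com", "b@example.org"])
```

Keeping the mapper as plain data is one way to make the resolver configurable: supporting a new entity type means adding an entry rather than changing the comparison code.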