Capstone Projects

Utilizing Natural Language Processing Techniques to Drive Source to Target Mappings in ETL Processes

Program: Data Science Master's
Host Company: Cogitativo, Inc.
Location: Berkeley, California (onsite)
Student: Shawn Chapler

This project explores the use of NLP techniques to drive source-to-target mappings in ETL processes. It applies a noisy channel model to resolve abbreviations, so that common tokens can be created across terms. It also examines whether topic modeling techniques can be used to identify sub-topics within the file headers. While the focus is on the healthcare domain, both the techniques and the findings can be applied more broadly.
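
To make the noisy channel idea concrete, the following is a minimal Python sketch, not the project's actual implementation. It treats an abbreviation as a full term that has passed through a channel which deletes characters, and picks the vocabulary term w that maximizes P(w) * P(abbreviation | w). The vocabulary, its counts, the deletion probabilities, and the function names (`channel_log_prob`, `expand`, `normalize_header`) are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical vocabulary with made-up counts; these stand in for the prior P(w).
VOCAB = {
    "patient": 50,
    "payment": 30,
    "provider": 40,
    "procedure": 20,
    "date": 60,
    "diagnosis": 25,
}

VOWELS = set("aeiou")


def channel_log_prob(abbr, word, p_drop_vowel=0.6, p_drop_consonant=0.3):
    """Return log P(abbr | word) under a toy deletion-only channel.

    Assumption: an abbreviation keeps the first letter of the word and is
    formed by deleting characters, with vowels dropped more readily than
    consonants. A greedy left-to-right alignment decides which characters
    were kept and which were dropped.
    """
    if not abbr or not word or abbr[0] != word[0]:
        return float("-inf")
    logp = 0.0
    i = 0  # position of the next abbreviation character to match
    for ch in word:
        p_drop = p_drop_vowel if ch in VOWELS else p_drop_consonant
        if i < len(abbr) and ch == abbr[i]:
            logp += math.log(1.0 - p_drop)  # character survived the channel
            i += 1
        else:
            logp += math.log(p_drop)  # character was deleted
    # The abbreviation must be fully consumed, i.e. be a subsequence of word.
    return logp if i == len(abbr) else float("-inf")


def expand(abbr, vocab=VOCAB):
    """Pick the term maximizing log P(w) + log P(abbr | w).

    Falls back to the lowercased abbreviation when no term can explain it.
    """
    abbr = abbr.lower()
    total = sum(vocab.values())
    best_word, best_score = abbr, float("-inf")
    for word, count in vocab.items():
        score = math.log(count / total) + channel_log_prob(abbr, word)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def normalize_header(header, vocab=VOCAB):
    """Expand each underscore-separated token of a file header."""
    return "_".join(expand(tok, vocab) for tok in header.split("_"))


if __name__ == "__main__":
    for header in ["pt_dt", "prov_dt", "dx_dt"]:
        print(header, "->", normalize_header(header))
```

With this toy vocabulary, headers such as `pt_dt` and `patient_date` normalize to the same tokens, which is the property that lets terms from different sources be compared. A realistic version would estimate both the prior and the channel probabilities from observed header data rather than from fixed constants.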