Improve Pipeline and Query Performance for Natural Language Analysis
Program: Data Science Master's Degree
Location: Illinois (onsite)
Student: Priscilla Ian
My capstone project improves the performance of Speciate’s queries, primarily in terms of accuracy and relevance. Most of the project centers on text deduplication. A related problem is ambiguity: when searching for the term “Apple,” for example, one should be able to distinguish between the fruit and the computer company. The objective of the project is to write software that helps solve these problems by cleaning the dataset, structuring it, and enriching the documents with useful information; storing the data efficiently so that it can be queried cost-effectively; and automatically creating a visualization from the data for an arbitrary query.
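To make the deduplication step concrete, the sketch below shows one common way to drop near-duplicate documents: compare sets of word shingles using Jaccard similarity. This is only an illustration under stated assumptions, not the method the project actually uses. The function names, the shingle size `k`, the `threshold` value, and the sample documents are all hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of the text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs, threshold=0.8, k=3):
    """Keep the first document of each group of near-duplicates.

    Assumption: two documents count as duplicates when the Jaccard
    similarity of their shingle sets is at least `threshold`.
    """
    kept = []
    kept_shingles = []
    for doc in docs:
        s = shingles(doc, k)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # near-duplicate of a document already kept
        kept.append(doc)
        kept_shingles.append(s)
    return kept


# Hypothetical sample data: the first two differ only in punctuation.
docs = [
    "Apple releases a new laptop this fall.",
    "Apple releases a new laptop this fall!",
    "Apple orchards report a record harvest this fall.",
]
unique_docs = deduplicate(docs)
```

This pairwise comparison is quadratic in the number of documents, so a large corpus would likely need an approximate technique such as MinHash with locality-sensitive hashing. Note that deduplication alone does not separate the two senses of “Apple”; that would require a separate disambiguation step.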