Capstone Projects

An overview of search and an introduction to applications of machine learning techniques in the field of search

Program: Data Science Master's
Host Company: Bothell
Location: Washington (onsite)
Student: Gregory Parker

Many business organizations have a great need for search-based software applications. However, ‘Enterprise Search’ is a domain in which solutions are notoriously difficult to implement. To make matters worse, search does not have the extensive free online learning resources that some other domains do. As a result, many junior software engineers may find themselves tasked with quickly designing and implementing the search engine component of an enterprise search application with little to no formal training. The first part of this paper provides a concise overview of the field of search and its important concepts, so that a junior engineer can get started without having to read entire books to see the big picture of how search works. The second part gives the junior engineer a rapid overview of ways to incorporate machine learning layers on top of the search system. Data science topics include using neural networks to generate synonyms, using ML ranking algorithms to re-rank results, and using unsupervised learning techniques such as clustering to group documents in order to aid users. This second part first introduces common business problems associated with enterprise search applications, then discusses the machine learning approaches that can address those problems.