Capstone Projects

Evaluating High Performance Computing Strategies to Deliver Timely Insights

Program: Data Science Master's
Location: Not Specified (remote)
Student: Andrew Seewald

Extracting insights from large, unstructured text data has become common in recent years. This paper covers background on analyzing large text datasets, compares technical frameworks for doing so, and recommends how a less experienced team can readily deliver value from text data. Three approaches are reviewed for their utility in text analysis: standard Python libraries, Apache Spark, and NVIDIA RAPIDS. Sample reviews gathered from mobile application stores were used to perform the analysis. The analysis showed that standard Python libraries were the slowest processing method. However, in some situations, particularly for new analytics teams, their ease of use and lower hardware requirements make them an attractive alternative to the much faster Apache Spark and NVIDIA RAPIDS.