Program: Data Science Master's
Host Company: National Radio Astronomy Observatory
Location: Charlottesville, Virginia (onsite)
Student: Erica Keller
The Atacama Large Millimeter/Submillimeter Array (ALMA) Helpdesk answers user questions about ALMA data proposals, acquisition, reduction, and analysis, as well as all associated tools produced by the observatory. A decade of historical tickets (N = 4,857) is accessible to staff to help answer user questions. This project used Natural Language Processing to build models and tools that help staff find similar tickets faster and so answer new tickets more efficiently. The objectives were the following:
- Build classification models for ALMA Helpdesk tickets and provide a comparison of performance/accuracy to determine the best model.
- Build a navigable network visualization showing how ALMA Helpdesk tickets are related based on their content.
- Measure the similarity between historical ALMA Helpdesk tickets and a newly submitted ticket. Provide a list of similar tickets to staff to decrease ticket resolution time.
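The similarity objective above can be illustrated with a minimal sketch, assuming TF-IDF weighting and cosine similarity; the abstract does not state which similarity measure the project used. The ticket IDs, ticket texts, and the function names `tokenize`, `tfidf_vectors`, `cosine`, and `most_similar` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_text, history, top_n=3):
    """Rank historical (ticket_id, text) pairs by similarity to a new ticket."""
    vectors, idf = tfidf_vectors([text for _, text in history])
    tf = Counter(tokenize(new_text))
    # Terms never seen in the historical tickets carry no weight.
    query = {t: tf[t] * idf[t] for t in tf if t in idf}
    scored = [(cosine(query, vec), tid) for (tid, _), vec in zip(history, vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(tid, round(score, 3)) for score, tid in scored[:top_n]]


# Hypothetical historical tickets, invented for this example.
HISTORY = [
    ("T-001", "CASA tclean crashes during imaging of calibrated data"),
    ("T-002", "Question about proposal submission deadline in the Observing Tool"),
    ("T-003", "Cannot download data from the archive request handler"),
    ("T-004", "tclean imaging produces artifacts in continuum image"),
]

if __name__ == "__main__":
    for ticket_id, score in most_similar("tclean fails when imaging my data", HISTORY):
        print(ticket_id, score)
```

A ranked list like this is the kind of output that could be posted to a new ticket for staff. With thousands of tickets, a library implementation with sparse matrices would likely be preferable to this pure-Python version.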
The project was limited to seven Helpdesk departments of the North American ALMA Regional Center. Existing support helpdesk systems were reviewed to determine appropriate models and tools. Ticket labels, and models to label new tickets, were generated using K-Nearest Neighbor, Support Vector Machine, and Random Forest algorithms. A separate labelling model is defined for each Helpdesk department. All models achieve greater than 90% accuracy using only the ticket subject text. The generated ticket label, a list of similar tickets, and a link to the interactive network are posted to each new ticket for staff to use in their investigation. All objectives of the project were met.
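As a rough illustration of labelling a ticket from its subject text alone, the sketch below implements only a K-Nearest Neighbor classifier, using Jaccard overlap between token sets as the similarity. This is an assumption made for brevity, not the project's actual feature representation, and the Support Vector Machine and Random Forest models are not shown. The subjects, the labels, and the names `tokens`, `jaccard`, `knn_predict`, and `accuracy` are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a subject line."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(subject, train, k=3):
    """Predict a label by majority vote among the k most similar subjects."""
    query = tokens(subject)
    # sorted() is stable, so equally similar rows keep their training order.
    ranked = sorted(train, key=lambda row: jaccard(query, tokens(row[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(test, train, k=3):
    """Fraction of held-out subjects whose predicted label matches the true one."""
    correct = sum(knn_predict(subject, train, k) == label for subject, label in test)
    return correct / len(test)


# Hypothetical labelled subjects for a single department, invented for this example.
TRAIN = [
    ("tclean imaging crash", "imaging"),
    ("imaging artifacts in continuum", "imaging"),
    ("tclean mask for line imaging", "imaging"),
    ("archive download fails", "archive"),
    ("cannot download archive data", "archive"),
    ("archive query returns nothing", "archive"),
    ("proposal submission deadline", "proposal"),
    ("proposal time estimate question", "proposal"),
    ("observing tool proposal validation error", "proposal"),
]
TEST = [
    ("tclean imaging question", "imaging"),
    ("download from archive", "archive"),
    ("proposal deadline extension", "proposal"),
]

if __name__ == "__main__":
    print("held-out accuracy:", accuracy(TEST, TRAIN))
```

Because one model is defined per department, a routine like this would be trained separately on each department's tickets. The same held-out accuracy measurement could then be used to compare candidate algorithms.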