Machine Learning Techniques for Healthcare Translations in a Low Resource Language: Hmoob Dawb

Program: Data Science Master's Degree
Location: Not Specified (remote)
Student: Sweetie Pa

Hmong patients with limited English proficiency (LEP) may struggle to take control of their care during healthcare encounters, even when an interpreter is present. One proposed solution is a machine translation model, focused on healthcare terminology, that LEP patients could use in place of a human interpreter. Building such a model proved challenging because Hmong is a low-resource language. To address this, a small public dataset of parallel translations was compiled from various open-source health and healthcare resources and published to Hugging Face. The study also produced four machine translation models, built by combining two base models with two different functionalities, and each model was evaluated to identify the best of the four. The findings yielded several takeaways, proposed modifications, and suggestions for future research on building a Hmong translation model for healthcare encounters. These include continuing to fine-tune models pretrained on other tonal languages, creating a larger dataset to reduce overfitting, taking the needs of the Hmong community into account, and adopting practices that would further this research.