Capstone Projects

Analysis of Amyotrophic Lateral Sclerosis and Huntington’s Disease Trends and Their Links to Other Genetic-based Research Using Text Mining and Network Analysis

Program: Data Science Master's Degree
Location: Not Specified (onsite)
Student: Eleonore Molstad

In biomedical research, large sums of money are available for diseases with high incidence rates, such as cancer or heart disease, but little is available for rare diseases, such as Amyotrophic Lateral Sclerosis (ALS) or Huntington’s Disease (HD). Rare diseases typically have a genetic component and are often fatal. While many of these diseases, like HD, are related to a single genetic mutation, others, like ALS, seem to be related to a non-additive combination of genes. Understanding these combinations is necessary for creating diagnostic tests and devising effective treatments. Using text mining and network analysis to extract characteristics such as gene or protein names, while building a network of related papers, could help rare disease researchers leverage the research done on more commonly studied diseases. For this project, articles were downloaded from PubMed using ALS and HD as search terms. Their abstracts were analyzed using LDA topic modeling and dynamic topic modeling. Genes referenced in the articles were identified, and network analysis was used to examine the genetic research on ALS and HD. Historical research trends and areas of potential research were also identified.
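
The abstract does not say how the gene network was built, so the following is only a minimal sketch of one plausible approach: match gene symbols in each abstract against a lexicon, then count how often two genes are mentioned in the same abstract. The lexicon `GENE_LEXICON`, the placeholder texts in `TOY_ABSTRACTS`, and the function names are all assumptions made for illustration; none of them come from the project or from PubMed.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene lexicon (assumption); a real pipeline would likely use
# a curated gene-name resource rather than a hand-written set.
GENE_LEXICON = {"SOD1", "C9orf72", "TARDBP", "FUS", "HTT"}


def extract_genes(text, lexicon=GENE_LEXICON):
    """Return the lexicon genes that appear as exact tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & lexicon


def cooccurrence_network(abstracts, lexicon=GENE_LEXICON):
    """Map each gene pair to the number of abstracts mentioning both genes."""
    edges = Counter()
    for text in abstracts:
        genes = sorted(extract_genes(text, lexicon))
        for pair in combinations(genes, 2):
            edges[pair] += 1
    return edges


def degree(edges):
    """Count the distinct co-mentioned partners of each gene."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


# Placeholder texts invented for this sketch (not real PubMed abstracts).
TOY_ABSTRACTS = [
    "Placeholder abstract mentioning SOD1 and TARDBP.",
    "Placeholder abstract mentioning FUS, TARDBP and SOD1.",
    "Placeholder abstract mentioning HTT and TARDBP.",
]

if __name__ == "__main__":
    network = cooccurrence_network(TOY_ABSTRACTS)
    for (gene_a, gene_b), weight in sorted(network.items()):
        print(gene_a, gene_b, weight)
    print(degree(network))
```

In a network like this, edge weights indicate how often two genes are discussed together, and a gene's degree gives a rough sense of how widely it is connected across the literature. Exact token matching is a deliberate simplification: it misses gene aliases and can pick up false matches, which a dedicated named-entity recognition step would handle better.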