Classification of Podcasts using Content-Based and Context-Based Data

Program: Data Science Master's Degree
Location: Not Specified (onsite)
Student: Samuel A. Bailey

This case study applies supervised and unsupervised natural language processing techniques to text classification, comparing the utility of podcast transcripts (the larger data source) with that of podcast metadata (the smaller one) for the task of categorization. Transcripts and metadata from 300 podcasts across 10 categories were used to evaluate several text-classification methods. Knowing how transcripts and metadata differ in their usefulness for classification helps a podcast streaming platform decide where to focus its time, effort, and investment in categorizing audio content, in support of an easy-to-navigate and personalized user experience.
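
To make the comparison concrete, the following is a minimal sketch of one way such an evaluation could be set up: the same supervised classifier is trained once on metadata text and once on transcript text, and the held-out accuracy of each is compared. It is not the study's actual pipeline. The classifier (a small multinomial Naive Bayes written from scratch), the field names `metadata`, `transcript`, and `category`, and the toy records are all assumptions invented for illustration, and accuracies on this toy data say nothing about the study's findings. The unsupervised methods mentioned above are not covered.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(train, test, field):
    """Train on one text field and return the fraction of test items classified correctly."""
    model = NaiveBayes().fit([d[field] for d in train], [d["category"] for d in train])
    hits = sum(model.predict(d[field]) == d["category"] for d in test)
    return hits / len(test)


# Hypothetical toy records, invented for this sketch (not from the study's data).
train = [
    {"category": "sports", "metadata": "weekly football talk",
     "transcript": "the team scored a goal in the final match and the coach praised the players"},
    {"category": "sports", "metadata": "basketball news show",
     "transcript": "the players won the game after a late basket and the season continues"},
    {"category": "cooking", "metadata": "kitchen recipes podcast",
     "transcript": "add butter and flour to the pan then bake the bread in the oven"},
    {"category": "cooking", "metadata": "home cooking tips",
     "transcript": "chop the onion and simmer the sauce then season the soup with salt"},
]
test = [
    {"category": "sports", "metadata": "weekly football interviews",
     "transcript": "the coach said the team played a great match and the players scored"},
    {"category": "cooking", "metadata": "home recipes for food",
     "transcript": "simmer the sauce and bake the bread with butter in the oven"},
]

# Same model, same split; only the text source changes.
results = {field: accuracy(train, test, field) for field in ("metadata", "transcript")}
print(results)
```

Holding the classifier and the train/test split fixed while swapping only the text field keeps the comparison about the data source rather than the model. On a real corpus the split would normally be stratified by category and repeated (for example with cross-validation) before drawing any conclusion.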