Capstone Projects

Graph Database Prototype for Influencer Identification, Audience Segmentation, and Topic Analysis on Behalf of a Corporate Communications Department

Program: Data Science Master's
Location: Minnesota (onsite)
Student: James Warden

The news and social media monitoring tools used by the public relations industry have traditionally emphasized volume. However, the falling cost and ease of use of graph database management systems open new ways to analyze conversations and identify who is driving them. This project demonstrated how a graph database could improve insights for a corporate communications department. The author first designed a workflow for building a graph database from news articles, blogs, and social media posts. A Python script then used named-entity recognition and post metadata to identify the entities associated with each piece of content and the relationships between them, and stored these as nodes and edges in the graph database. Finally, the author applied centrality measures, similarity scores, and the Louvain and label propagation community detection algorithms to identify influencers, segment audiences, and surface context that an analysis based on volume alone would miss. The resulting proof of concept showed how communications departments equipped with these tools are better positioned to understand the environment in which they operate.
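
The pipeline described above (entities and relationships stored as nodes and edges, then centrality and community detection) can be illustrated with a minimal sketch. It is not the author's implementation: it uses an in-memory adjacency map in plain Python rather than a graph database, hard-codes the entities that a named-entity recognition step would normally extract, and uses a simplified, deterministic variant of label propagation (ties are broken alphabetically instead of randomly). The sample posts and the function names `build_graph`, `degree_centrality`, and `label_propagation` are hypothetical. Similarity scores and the Louvain algorithm are omitted for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (author, entities mentioned in the post).
# In a real workflow the entities would come from an NER step.
POSTS = [
    ("alice", ["AcmeCorp", "GreenEnergy"]),
    ("bob", ["AcmeCorp"]),
    ("carol", ["AcmeCorp", "GreenEnergy"]),
    ("dave", ["DataPrivacy"]),
    ("erin", ["DataPrivacy"]),
]


def build_graph(posts):
    """Return an undirected graph as {node: set(neighbors)}."""
    adj = defaultdict(set)
    for author, entities in posts:
        for entity in entities:
            adj[author].add(entity)
            adj[entity].add(author)
    return dict(adj)


def degree_centrality(adj):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def label_propagation(adj, max_iter=20):
    """Simplified label propagation with deterministic tie-breaking.

    Each node starts with its own label and repeatedly adopts the most
    common label among its neighbors (alphabetically smallest on ties).
    """
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nbr] for nbr in adj[node])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    # Sort communities by their members so the output order is stable.
    return sorted(communities.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    graph = build_graph(POSTS)
    centrality = degree_centrality(graph)
    print("Most central node:", max(centrality, key=centrality.get))
    for community in label_propagation(graph):
        print("Community:", sorted(community))
```

In this toy data the most connected node is the one that the most authors mention, and the two communities separate the authors by the topics they write about. A volume count would report only that five posts exist, without revealing either pattern.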