September 23, 2024 in Five-Minute Analyst

Words Matter: Sentiment Analysis of a Political Debate

Nick Ulmer, CAP


https://doi.org/10.1287/LYTX.2024.04.02

The U.S. presidential election has always had a touch of spectacle, and the “prize fight” of election season tends to be the debates. These are planned, negotiated and trained for in a manner reminiscent of Ali vs. Foreman in the “Rumble in the Jungle.”

The debate on September 10, 2024, in which Vice President Kamala Harris squared off against former President Donald Trump, followed this tradition. A lot of analysis has already been spent on the direct meaning of their words; this article focuses instead on the overall tone those words convey, rather than on their content as expressions of policy.

I previously wrote a Five-Minute Analyst piece on the Biden vs. Trump debates. Harris vs. Trump took on a different character. For this analysis, I use three sentiment lexicons, chosen for their general applicability as well as convenience. Just like bids from contractors, it is always good to get three. These lexicons work by assigning a score, either point based or value based, to individual words out of context. Thus, “bad” would have a negative score, and “not bad” would also score negative: although the phrase conveys a positive emotion, each word is scored on its own and the negation is ignored. This is because words matter.

The workflow is, in order: extract the transcript of the debate [1] into a single file, assign speakers to comments and, after some preparation, create a unified data set of the words we want to analyze. There were 2,022 words from Harris and 2,144 words from Trump. With each lexicon analysis, we normalize the numbers by proportionally scaling Harris’ results up to match Trump’s word count. This assumes that, had Harris spoken the additional words, they would have reflected the same proportions of sentiment as the words she did speak. We feel this is valid because each speaker had, to a first approximation, “equal” opportunity to speak as measured by time.
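As a rough illustration of word-by-word scoring and of the proportional scaling, here is a minimal Python sketch. It is not the RMarkdown analysis behind this article: the four-word `TOY_SCORES` lexicon and the helper names are hypothetical stand-ins, and only the two word counts come from the text above.

```python
import re

# Hypothetical toy lexicon for illustration only; a real lexicon is far larger.
TOY_SCORES = {"bad": -3, "good": 3, "great": 3, "terrible": -3}

# Word counts reported in the article.
HARRIS_WORDS = 2022
TRUMP_WORDS = 2144


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_words(text, lexicon):
    """Return (word, score) pairs for every token found in the lexicon.

    Each word is looked up on its own, so surrounding context
    (such as a preceding "not") has no effect on the score.
    """
    return [(word, lexicon[word]) for word in tokenize(text) if word in lexicon]


def scale_factor(reference_words, other_words):
    """Factor that scales the shorter speaker's counts up to the reference."""
    return reference_words / other_words


factor = scale_factor(TRUMP_WORDS, HARRIS_WORDS)

print(score_words("not bad", TOY_SCORES))  # [('bad', -3)]
print(round(factor, 4))                    # 1.0603
```

Under this assumption, every count attributed to Harris would be multiplied by roughly 1.06 before being compared with Trump’s.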

The first of the three sentiment lexicons is “afinn.” This lexicon has the fewest words to match against, but each word carries a numeric score from -5 (most negative) to 5 (most positive) [2]. Trump’s comments match with 349 words and Harris’ with 277. Again, I normalize to produce comparable statistics. Because there is a numeric score, I chose to produce a chart of the cumulative sentiment score of each candidate’s language. Harris dips negative by the middle of the debate but manages to pull her cumulative score back up by the end. Trump’s language is so negative that his line drops at a steady rate. If this is not a perfect picture of how the debate went, then I’m not sure one exists.

Figure 1. A cumulative score of matched “afinn” sentiment lexicon words, from beginning to the end of the debate, shows Harris’ balanced language against an increasingly negative tone set by Trump.
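The cumulative score plotted in Figure 1 is a running total of the matched word scores in speaking order. A minimal sketch of that calculation follows; the two score sequences are made up for illustration and are not taken from the debate, and the normalization step is left out because the article does not spell out how it was applied to this chart.

```python
from itertools import accumulate


def cumulative_sentiment(scores):
    """Running total of word scores, in the order the words were spoken."""
    return list(accumulate(scores))


# Hypothetical scores of matched words for two speakers (not debate data).
speaker_a = [2, -1, -2, 1, 3]
speaker_b = [-2, -3, 1, -2, -1]

print(cumulative_sentiment(speaker_a))  # [2, 1, -1, 0, 3]
print(cumulative_sentiment(speaker_b))  # [-2, -5, -4, -6, -7]
```

A speaker who mixes positive and negative words hovers near zero, as the first sequence does, while a steadily negative speaker produces a line that keeps falling.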

The next analysis uses the “bing” sentiment lexicon [3]. This lexicon includes 6,786 words and acts as a binary classifier, with “positive” and “negative” responses. Trump matched 305 words in the list during the debate, whereas Harris matched 272. Over the 90-minute debate, Trump racked up more than two-thirds of his 305 in the negative category, while Harris was almost balanced between positive and negative words.

Figure 2. A count of words from the “bing” sentiment lexicon reveals that more than two-thirds of Trump’s 305 match words were negative, and Harris was almost equal across her 272 match words. Note that Harris’ word count was scaled proportionally to ensure a fair visual comparison.
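A binary lexicon reduces the analysis to counting labels. The sketch below shows one way to do that count and then apply the proportional scaling described earlier; the five-word `TOY_BING` lexicon, the sample sentence and the helper names are hypothetical, and only the 2,144 and 2,022 word counts come from the article.

```python
from collections import Counter

# Hypothetical toy lexicon for illustration; the real one has 6,786 entries.
TOY_BING = {
    "strong": "positive",
    "proud": "positive",
    "disaster": "negative",
    "weak": "negative",
    "failed": "negative",
}


def polarity_counts(words, lexicon):
    """Count how many matched words fall under each polarity label."""
    return Counter(lexicon[word] for word in words if word in lexicon)


def scale_counts(counts, factor):
    """Scale every count by the same factor (proportional normalization)."""
    return {label: n * factor for label, n in counts.items()}


# Made-up sentence, already tokenized; not from the debate transcript.
words = ["a", "strong", "and", "proud", "nation", "not", "a", "weak", "one"]

counts = polarity_counts(words, TOY_BING)
scaled = scale_counts(counts, 2144 / 2022)

print(counts["positive"], counts["negative"])  # 2 1
print(scaled)
```

Because every label is multiplied by the same factor, scaling changes the bar heights but not the balance between positive and negative words for a speaker.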

Moving to the third and last sentiment lexicon, we looked at the “nrc” word counts [4]. This lexicon has a large pool of words (~14,000) but assigns words across eight possible emotions (trust, fear, anticipation, sadness, anger, joy, disgust and surprise). The good news is that on this lexicon, both candidates had more words associated with “trust” than any other emotion.

Figure 3. A count of words from the “nrc” sentiment lexicon reveals Harris “wins” in trust and anticipation words, whereas Trump “wins” in sadness and anger words. Note that Harris’ word count is again scaled up proportionally to match Trump’s.
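Counting against an emotion lexicon works the same way, with the wrinkle that a single word can be associated with more than one emotion, so one matched word may add to several tallies. The sketch below assumes that structure; the three entries in `TOY_NRC` are hypothetical associations chosen for illustration, not lookups from the actual lexicon.

```python
from collections import Counter

# Hypothetical word-to-emotion associations, for illustration only.
TOY_NRC = {
    "promise": {"trust", "anticipation", "joy"},
    "threat": {"fear", "anger"},
    "loss": {"sadness", "fear"},
}


def emotion_counts(words, lexicon):
    """Tally emotions over all matched words.

    A word tagged with several emotions contributes one count to each.
    """
    counts = Counter()
    for word in words:
        counts.update(lexicon.get(word, ()))
    return counts


# Made-up token list; not from the debate transcript.
words = ["promise", "threat", "loss", "promise", "economy"]
counts = emotion_counts(words, TOY_NRC)

for emotion in sorted(counts):
    print(emotion, counts[emotion])
```

One consequence of this design is that the emotion totals can add up to more than the number of matched words, so they are best compared emotion by emotion between speakers.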

This has been a great analysis, but it is only valuable if communicated and understood broadly. Please vote responsibly and stay safe, my fellow analysts.

I would also like to thank my longtime collaborator (and original Five-Minute Analyst) Harrison Schramm for technical recommendations and feedback during this five-minute analysis.

Download a text file of debate transcript here.

Download an RMarkdown file of the analysis here.

References

[1] ABC 7 News Chicago, “Harris-Trump presidential debate transcript,” https://abc7chicago.com/read-harris-trump-presidential-debate-transcript/15289001/.
[2] Nielsen, A., 2011, “AFINN,” Informatics and Mathematical Modelling, Technical University of Denmark, https://www2.imm.dtu.dk/pubdb/pubs/6010-full.html.
[3] Liu, Bing, 2020, “Sentiment Analysis: Mining Opinions, Sentiments, and Emotions,” Cambridge University Press, https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.
[4] Mohammad, Saif M. and Turney, Peter, 2011, “NRC Word-Emotion Association Lexicon,” https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm.

Nick Ulmer, CAP

Nick Ulmer, CAP, has been an operations research analyst since 2014. He is the inaugural chair of the INFORMS Military Veterans Interest Forum and a Principal Operations Research Analyst for CANA LLC, leading teams of analytics professionals to produce high-level analytics products across federal and commercial domains.
