January 27, 2021 in Five-Minute Analyst

Inauguration Speeches

Harrison Schramm


https://doi.org/10.1287/LYTX.2021.02.01

Today is the day after President Joseph Biden’s inauguration, and several of my friends (and news sources) have remarked on the quality of his speech. It seemed like a worthy five-minute project to compare his speech with former president Trump’s from 2017. We have done this type of analysis twice before, first on State of the Union speeches and later on nomination acceptance speeches. In the final installment of this column under my pen, we’ll consider the differences between the past two inauguration speeches. As a technical note, we make heavy use of the “tm” package in R for basic text mining, as well as the versatile “RWeka” package for creating higher-order tokenizers. Both speeches are readily available; Biden’s speech was found here, and Trump’s was obtained here.

The key steps in text analysis are as follows: get the text, save the text in a machine-friendly format, import the text, perform cleaning operations in preparation, and, in this case, create a “term-document matrix.” Two of the cleaning operations are of particular note. One is removing “stopwords,” which are best thought of as the words that would be removed when sending a telegram (or a tweet?). The other is to “stem” the document, sending words such as “America,” “American” and “Americans” to the same root.
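The cleaning steps above can be sketched in a few lines. The column itself used R's tm package; what follows is a Python stand-in that relies only on the standard library. The stopword list, the `toy_stem` rule, and all function names are assumptions made for illustration, and the toy stemmer is far cruder than a real one.

```python
import re
from collections import Counter

# Illustrative stopword list only (an assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
             "we", "our", "that", "this", "for", "with", "will"}

def toy_stem(word):
    """Toy stemmer: drop a plural 's', then a trailing 'n' after 'a'.

    A stand-in for a real stemming algorithm, not an implementation of one.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    if len(word) > 5 and word.endswith("an"):
        word = word[:-1]
    return word

def clean(text):
    """Lowercase, split into alphabetic tokens, drop stopwords, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]

def term_document_matrix(docs):
    """Map each term to its count in every document (terms as rows)."""
    counts = {name: Counter(clean(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    return {term: {name: counts[name][term] for name in docs} for term in vocab}

docs = {"A": "America and Americans", "B": "The American dream"}
tdm = term_document_matrix(docs)
print(tdm["america"])  # counts of the shared root in each document
```

With this toy rule, “America,” “American” and “Americans” all collapse to the same root, which is the behavior the stemming step is meant to provide.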

The first thing that we do in comparing the speeches is to simply count the words and bin them into three separate categories: those that Biden uses most frequently, those that Trump uses most frequently, and those that are common to both, producing the plot shown in Figure 1.

**Figure 1:** Use of words in inaugural speeches by speaker. The “common” words were chosen as words that made up more than 0.5% of each speech. Of the remaining words, the 10 most common for each speaker were chosen.
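The binning rule in the caption can be written down directly. This is a minimal sketch in Python, not the code behind the figure; the function name `bin_words` and its parameters are hypothetical. It assumes that shares are computed against the word totals after cleaning, and that a speaker's top words may still appear in the other speech as long as they are not in the common bin.

```python
from collections import Counter

def bin_words(counts_a, counts_b, share=0.005, top_n=10):
    """Split words into a common bin and a top-N list per speaker.

    counts_a, counts_b: word counts for the two speeches.
    share: minimum fraction of EACH speech for a word to count as common.
    top_n: how many of the remaining words to keep per speaker.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # A word is common only if it clears the threshold in both speeches.
    common = {
        w for w in counts_a.keys() & counts_b.keys()
        if counts_a[w] / total_a > share and counts_b[w] / total_b > share
    }

    def top(counts):
        rest = Counter({w: c for w, c in counts.items() if w not in common})
        return [w for w, _ in rest.most_common(top_n)]

    return common, top(counts_a), top(counts_b)

# Tiny made-up counts, with a large threshold so the example stays readable.
speech_a = Counter({"america": 5, "unity": 3, "story": 2})
speech_b = Counter({"america": 4, "job": 3, "border": 1})
common, top_a, top_b = bin_words(speech_a, speech_b, share=0.3, top_n=1)
print(common, top_a, top_b)
```

The counts in the usage example are invented purely to exercise the function; they are not taken from either speech.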

As one would expect, by far the most commonly used word in an inaugural speech is one that maps to “America,” as described in the discussion of stemming above.

Counting individual words is interesting, but it does not tell the whole story. A more interesting tale emerges when we consider combinations of words, so-called “n-grams.” As these documents are both relatively short (fewer than 2,000 words) and the speakers both have signature styles, we chose not to remove stopwords or stem when building and analyzing n-grams, thus analyzing the text in its rawest form.
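Counting n-grams on raw text takes very little code. The column used RWeka to build its tokenizers; the sketch below is a Python stand-in using only the standard library, and `ngram_counts` is a hypothetical name. It keeps stopwords and does no stemming, in line with the choice described above. One simplifying assumption is that n-grams are allowed to span sentence boundaries.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count n-word phrases in raw text (no stopword removal, no stemming)."""
    # Keep straight and curly apostrophes so contractions stay in one token.
    tokens = re.findall(r"[a-z'’]+", text.lower())
    # Zipping n staggered copies of the token list yields each n-word window.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows)

sample = "We will make America strong. We will win."
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(1))
print(trigrams["we will make"])
```

The sample sentence is made up for the example. Setting `n` to 2 gives the two-word phrases and `n` to 3 gives the three-word phrases.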

**Figure 2:** Common two-word phrases (“digrams”) from inaugural speeches. The most common by far is former president Trump’s “we will” (also used by President Biden), followed closely by Biden’s “we can.”

Finally, for the sake of completeness, we include two plots of “trigrams” (three-word phrases; see Figure 3).

**Figure 3:** Trigrams in inaugural speeches. These trigrams hint at higher-order phrases, not analyzed here. Specifically, former president Trump’s “we will make” and “will make America” are part of the same phrase, as are President Biden’s “my whole soul” and “soul is in.”

Conclusion

As O.R. professionals, we have the opportunity to put concrete numbers behind matters that, without our voice, would be driven by opinion and passion. We should capitalize on these opportunities when they arise, or when we can create them. After a hiatus this year working on other things within INFORMS, I have returned to this column to restart it . . . and to hand it off. Look forward to new topics coming from a new voice in the near future.

As a final note, some years ago I was doing some text mining work, and my son handed me the pencil sketch below. Hearing me talk about tokenizers, he produced the “superhero meets transformer” sketch (Figure 4). The tokenizer collects tokens as a video game, then fires them as a dragon, and assumes robot form (with “T” forehead) in between.

**Figure 4:** Superhero meets transformer, producing the tokenizer.

Harrison Schramm

Harrison Schramm, CAP, PStat, is a senior lecturer at Naval Postgraduate School, splitting his time between Defense Management and Operations Research where, in addition to teaching, he runs the Contested At-Sea Logistics Lab (CASLL). He served as the inaugural chair of the INFORMS Security Conference and is a past president of the INFORMS Analytics Society.
