March 7, 2022 in Five-Minute Analyst
Text Mining National Security
https://doi.org/10.1287/LYTX.2022.02.13
Recently, I found myself teaching about strategic messaging. The national security sector has quite a few strategic documents across all levels, so I chose to start at the top. Because national security is my primary line of work, I am exploring the collection of National Security Strategy (NSS) documents. This type of quick-look analysis could be performed on anything. Perhaps you want to know what successful businesses put in their strategic documents? Or maybe mine the plethora of market forecast articles to see whether there is common guidance? The list can go on and on.
So much effort goes into national strategic documents, and their guidance trickles down into many initiatives, programs, etc. Lately, the phrase “strategic deterrence” has been floating around, which to those with gray hair may feel like déjà vu. And it should. But how do we map this back to strategic guidance? What do guidance documents tell us?
NSS documents date back to when Reagan was president of the United States. For many years they were produced annually. Then they became less frequent, and currently we see them approximately once per presidential term of office. In total, if you count the “interim” one produced last year, there are 19 documents to explore. In this five-minute analytic jaunt, I am going to look up the most common words in each document. First, each document should be, at a minimum, lightly scrubbed and put into an appropriate format. Sorry, no easy way around that. Then, using the tm package in R, we dig into the documents.
The tm package allows us to remove punctuation, change case and remove common words. It also allows us to look at the stem of a word versus the entire word. An example is strategi being a stem for strategic, strategies and strategically – although not strategery. Table 1 highlights the most common words in each NSS document.
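The cleaning steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the tm workflow used for Table 1: the stop-word list is a tiny assumed sample, the function name `top_words` is hypothetical, and stemming is left as an optional hook (`stem`) because tm's stemmer is not reproduced here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is", "for", "our", "we", "that", "on"}

def top_words(text, n=12, stem=lambda w: w):
    """Return the n most common tokens after lowercasing, stripping
    punctuation and digits, and dropping stop words.

    `stem` is a hypothetical hook for a stemmer; by default words are left whole.
    """
    # Keep runs of letters only, which removes punctuation and numbers.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words and one-letter fragments (e.g. the "s" left by "nation's").
    kept = [stem(t) for t in tokens if t not in STOP_WORDS and len(t) > 1]
    return Counter(kept).most_common(n)

# Made-up sample sentence, not taken from any NSS document.
sample = "The security of the nation is the strategy. National security, and security forces!"
print(top_words(sample, n=3))
```

Passing a real stemmer as `stem` would merge variants such as "nation" and "national" into one entry, which is what produces stems like econom and secur in the table.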
| 1987 | 1988 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 2000 | 2001 | 2002 | 2006 | 2010 | 2015 | 2017 | 2021 |
| Reagan | Reagan | HW_Bush | HW_Bush | HW_Bush | HW_Bush | Clinton | Clinton | Clinton | Clinton | Clinton | Clinton | Clinton | W_Bush | W_Bush | Obama | Obama | Trump | Biden |
| soviet | secur | forc | econom | econom | econom | nation | nation | nation | secur | secur | intern | state | nation | secur | nation | will | will | will |
| forc | strategi | econom | will | nation | nation | will | secur | unit | intern | intern | secur | unit | will | nation | unit | secur | state | nation |
| nation | econom | secur | secur | must | must | econom | will | will | must | nation | nation | secur | unit | strategi | secur | global | unit | secur |
| state | soviet | nation | new | unit | unit | secur | forc | state | state | state | state | promot | secur | state | advancin | govern | american | strateg |
| militari | militari | defens | nation | secur | secur | forc | state | econom | will | will | econom | continu | state | will | strengthen | state | econom | guidanc |
| unit | nation | polit | forc | will | intern | region | econom | forc | region | unit | unit | nation | develop | unit | will | strategi | secur | interim |
| secur | capabl | soviet | region | intern | state | unit | militari | secur | interest | promot | region | intern | must | democraci | intern | econom | nation | american |
| econom | forc | militari | soviet | state | will | democraci | democraci | support | nation | region | will | will | freedom | develop | promot | intern | partner | econom |
| defens | interest | technolog | unit | effort | effort | american | region | intern | econom | effort | promot | engag | threat | challeng | strategi | nation | govern | intern |
| support | support | maintain | defens | region | region | effort | unit | effort | also | develop | support | strategi | intern | freedom | build | continu | must | technolog |
| will | countri | will | threat | america | america | militari | peac | american | america | cooper | develop | must | govern | govern | invest | strengthen | militari | world |
| capabl | europ | state | world | democraci | democraci | strategi | interest | region | effort | econom | cooper | threat | use | terrorist | pursu | support | threat | interest |
| union | promot | also | region | engag | system | global |
Table 1: Top word stems by count in each NSS document. Note how econom peaks in the early 1990s and drops off in the early 2000s. What other trends could we tease out?
In this quick analysis, I did not remove the stems for the words national, security or strategy. One could leave them in, depending on whether their presence in the document is worth noting. The top two rows are document year and sitting U.S. president. I’ve done some conditional formatting to highlight interesting points. One is the declining use of the word Soviet. Given historical context, this makes complete sense. We have yet to see either Soviet or any other country rise to the same relative level in recent documents. The other interesting finding is the stem econom. This one peaks under George H. W. Bush and then falls until it completely disappears at the end of Bill Clinton’s time in office and does not reappear until almost the end of Barack Obama’s second term in office. It has yet to return to the same relative level of importance. A fitting follow-on study might compare this finding with the actual economic conditions that existed during and in the years following each of these documents.
Table 1 also shows some obvious new words at appropriate places, such as the words terrorist and freedom in 2006. Using the wordcloud package in R as well as generic boxplots, with some grouping of the data, we can quickly provide informative visuals. Some might find it useful to look at how the two parties compare. Let’s look at the word clouds (Figure 1) and the top 10 words for each. The differences in frequency are not very large, and most of the words overlap between the two parties. The stem econom appears more frequently among Republicans, whereas intern, which is a stem for international, is more common for Democrats. Militari, which is a stem for some obvious words, shows up in the top 10 for Republicans but not Democrats. Republicans also use forc[e] at a higher frequency in the NSS documents.
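The party comparison amounts to pooling per-document counts by a group label and then comparing the top-10 sets. Here is a rough sketch of that idea in Python. The helper names (`group_counts`, `top_set`), the document labels and the counts are all invented for illustration; they are not the NSS results.

```python
from collections import Counter

def group_counts(doc_counts, doc_group):
    """Sum per-document word counts into one Counter per group label."""
    grouped = {}
    for doc, counts in doc_counts.items():
        grouped.setdefault(doc_group[doc], Counter()).update(counts)
    return grouped

def top_set(counter, n=10):
    """Set of the n most common words in a Counter."""
    return {word for word, _ in counter.most_common(n)}

# Toy counts, invented for illustration only (not the NSS results).
doc_counts = {
    "doc_a": Counter({"secur": 5, "forc": 3, "econom": 2}),
    "doc_b": Counter({"secur": 4, "intern": 3}),
    "doc_c": Counter({"secur": 2, "forc": 1}),
}
doc_group = {"doc_a": "group_1", "doc_b": "group_2", "doc_c": "group_1"}

grouped = group_counts(doc_counts, doc_group)
shared = top_set(grouped["group_1"]) & top_set(grouped["group_2"])
only_1 = top_set(grouped["group_1"]) - top_set(grouped["group_2"])
print(sorted(shared), sorted(only_1))
```

One caveat with this sketch: pooling raw counts lets longer documents dominate a group, so dividing each document's counts by its total word count first may be the fairer comparison.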


Figure 1a and b: Word clouds of the entire collection of NSS documents split into those produced under Republican (a) and Democratic (b) presidents, respectively.


Figure 2a and b: Rather than the busy word cloud, we can plot the top 10 words for each political party. Notice how many of the words overlap but vary in their relative position.
We can also look at the political party bar plots stacked together with all words represented (see Figure 2). This format helps us more easily visualize where differences exist between the two parties. The final visualization comes from grouping results by sitting president. Note that with each grouping, we reshuffle the top results; thus, grouping by president will yield slightly different results than looking at each individual document, as we began to do in this five-minute look. Which grouping is best depends on the question being asked. In Figures 3 and 4, we can observe that more recent NSS documents have larger percentage differences within the top 10. In this example, the difference is small, but it shows a trend in how these documents are written. More recent documents are perhaps more repetitive.
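One way to put a number on differences within the top 10 is to convert counts to shares of the total and look at the gap between the largest and smallest share. The sketch below is one possible reading, not necessarily the calculation behind Figures 3 and 4. The function names and the toy counts are assumptions made for illustration.

```python
from collections import Counter

def top_shares(counter, n=10):
    """Percent of all counted tokens taken by each of the n most common words."""
    total = sum(counter.values())
    return [(word, 100.0 * count / total) for word, count in counter.most_common(n)]

def spread(shares):
    """Gap, in percentage points, between the largest and smallest share."""
    values = [share for _, share in shares]
    return max(values) - min(values)

# Invented toy counts (not from any NSS document).
counts = Counter({"will": 50, "nation": 30, "secur": 20})
shares = top_shares(counts, n=3)
print(shares, spread(shares))
```

A larger spread means a few words dominate the top of the list, which is one rough signal of more repetitive wording.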
In closing, let’s think about how this same analysis could be used elsewhere. Within national security, this kind of analysis could be used to analyze one’s own messaging, but also that of an adversary. It could also be used to explore the content of any number of other platforms and messaging products. The tools exist to do this quickly and easily. Just add data and stir.
Nick Ulmer, CAP, has been an operations research analyst since 2014. He is the inaugural chair of the INFORMS Military Veterans Interest Forum and a Principal Operations Research Analyst for CANA LLC, leading teams of analytics professionals to produce high-level analytics products across federal and commercial domains.