Puzzle—An OR Approach for WORdle

Published Online:https://doi.org/10.1287/ited.2023.0285

Abstract

Modeling exercises are designed around the popular game Wordle. These exercises can possibly be used in a computational or modeling laboratory course or in a course on modeling operations research (OR) problems. They assume some familiarity with modeling integer programs, a programming language like Python for basic text processing, and optimization software like AMPL or PYOMO. These exercises were tried in a master’s-level course on modeling OR problems. Two sessions of three hours each are ideal for the whole exercise.

Wordle is a popular single-player word-guessing game (Victor 2022). A player gets six chances to guess a five-letter target word. After every guess, the player receives feedback for each letter in the word as follows:

  1. If the letter is at the same position as in the target word, it appears green (G).

  2. If the letter is present in the target word, but is not in the correct position, it appears yellow (Y).

  3. If the letter does not appear in the target word, it appears black (B).

Based on the feedback provided after each guess, the player can make better subsequent guesses. If the player correctly guesses the word in six trials, they win, and lose otherwise. The gameplay is similar to the two-player board game Mastermind, which became popular during 1975 (Knuth 1976). The game received a lot of interest from the Operations Research (OR) community as well, and several studies were published to win it in the fewest possible attempts (Neuwirth 1982, Bernier et al. 1996, Kooi 2005). Wordle is very similar to Mastermind, except that in Wordle, the guess should fit the pattern and also be a valid word. Wordle has a dictionary of words known to the player, and the target word is selected from a subset of this dictionary. Several internet-based versions (The New York Times Company 2022, Wordlegame 2022) of the game use words from the English dictionary. Following are a few examples of the color string that Wordle would show for the guessed words.

  1. Target: POWER; Guess: ROPED. The feedback string is YGYGB.

  2. Target: SMILE; Guess: SPREE. The feedback string is GBBBG. The first “E” is black in the result, and the second “E” is at the correct position.

  3. Target: AMPLE; Guess: KAPPA. The feedback string is BYGBB. Note that the guess word contains “A” twice. The first instance is yellow, whereas the second is black, and similar is the case with “P.”

An OR Approach

One possible solution approach is to break the problem into two tasks: (a) First, discover which letters appear in the target, and (b) then guess the target by rearranging the discovered letters (and some additional ones if needed). The first task can be modeled as a decision problem. The exercises below can be used as a guide to develop a mathematical model. The second task is somewhat easier and is left to the player. The resulting model is not perfect. It leaves some, hopefully easier, tasks to the player and does not even guarantee finding all letters. So this approach is just a rough tool to aid the decision maker and not a “solver” for the problem. On the other hand, this approach is quite straightforward. The strategy is “fixed” and independent of the feedback received in the first four tries. The player has to merely remember four simple words that can be used for every game. There are other interesting strategies (Anderson and Meyer 2022, Bertsimas and Paskov 2022) that can be used to fully solve any given five-letter Wordle, but, not surprisingly, they are more complicated.

The goal of the following exercises is to find a fixed strategy so that the player can maximize the number of identified letters in the first four tries so that they can make “good” guesses in the remaining two attempts. Because there are 26 letters in the English alphabet, one needs six five-letter words to discover all letters in the target. Even if one uses six such words, the letters appearing multiple times (for example, EAGLE) will not be known. Instead, one can try to pick four five-letter words that cover 20 distinct letters. For example, the set S1{CLING, JUMPY, SHARK, TOWED} can be used to cover 20 letters, but it misses the letter “V” that is more common than “J.” The exercises below attempt to find the best such four words.

  • Exercise 1: Can you find another set S2 of four words that covers 20 distinct letters?

  • Exercise 2: Count the frequency of each of the 26 English letters in the dictionary. One can use the relevant parts1 of Google Web Trillion Word Corpus (Norvig 2009b) for a dictionary.

  • Exercise 3: Based on the above frequencies, can you decide whether S1 is better than S2 for the strategy described above?

  • Exercise 4: The dictionary has words with repeated letters, which should not be used for the above-described strategy. Also, one would like to avoid the six least frequent letters. Based on these two arguments, can you remove some words that will not be in an “optimal” strategy? How long is the filtered list after removing all such words?

  • Exercise 5: Which four words from the filtered list obtained above should you select so that all the 20 most frequent letters that you have identified appear exactly once?

    One way to find such words is by formulating an integer programming model with a decision variable, say, xi, that takes value one if word i from the filtered list is selected in the solution, and zero otherwise. Constraints are defined in a way to ensure that each of the 20 letters appears exactly once. Hence, there can be 20 constraints, each corresponding to a frequent letter. Report a solution after solving this model. It is likely that the words you find are esoteric, proper nouns, or even abbreviations.

  • Exercise 6: Observe that the above optimization problem had no objective function. So, can you introduce an objective to help get a better solution that is easy to remember?

    One objective could be to select those words in a solution that are more frequent and commonly used. The above dictionary of words also contains the frequency of how often each word is used. This frequency is obtained based on the occurrence of words in all the publicly available web pages (Norvig 2009a). One way to do this exercise is by introducing an objective function that maximizes the total frequency of words selected. Can you model the complete optimization problem with this as an objective and constraints as described above? Report the solution and discuss your observations based on it.

    The “optimal” solution obtained from above may have one or two commonly used words, whereas others may not. Note that the frequency of the most frequent word “ABOUT” in the filtered list is 1.2×109, and that of the second most frequent word is 9.8×108. A scatter plot of their frequencies (after sorting) shows an exponential decrease in frequency from “ABOUT” to “IGOLE” (the least frequent word). Can you now explain why the optimal solution has rather uncommon words?

  • Exercise 7: One possible way to better model the objective is to take the logarithm of the frequency of each word. Because the logarithm grows slowly, the range from the highest-frequency word to the lowest-frequency word is not very drastic. It may keep the objective coefficients for all the variables within an order of magnitude only. Check whether the new objective yields an easier-to-remember solution set.

Play the game with your friends or online and see whether the above set is useful in solving the problem correctly. Can you think of a “hardest” target word, where the above set provides minimum information (say, only one or two letters)?

Endnote

1 A list is available online at https://github.com/ashutoshmahajan/wordle-dict.

References