Wordle: The best word to start the game, according to a language researcher
If you’ve been on any social media platform in the past two weeks, you’ve probably seen a grid of green, yellow, and black squares. This is Wordle, the latest pandemic phenomenon: a free online game that gives players a new word puzzle every day. It was created by Josh Wardle for his crossword-loving partner. As of January 10, the game had 2.7 million players.
In Wordle, players have six tries to guess a target five-letter word. After each guess, they are told which letters are in the word and in the correct position (green), and which letters are in the word but in a different position (yellow). It’s a bit like the board game Mastermind, with one key difference. In Mastermind, all six colors are equally likely to appear in the target. In Wordle, guesses and targets must all be real words, so some letters are more likely to appear than others, which makes some guesses better than others.
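The coloring rule can be made concrete with a small Python function. This is a minimal sketch: `wordle_feedback` is a hypothetical name, and the handling of repeated letters (greens are assigned first, then yellows only while unmatched copies of that letter remain in the target) is a common convention assumed here, not something described above.

```python
from collections import Counter


def wordle_feedback(guess: str, target: str) -> str:
    """Color a five-letter guess against the target word.

    Returns one character per letter: "G" (green, right letter in the right
    position), "Y" (yellow, letter is in the word elsewhere) or "B" (black,
    not in the word). Repeated letters follow an assumed two-pass convention.
    """
    guess, target = guess.lower(), target.lower()
    result = ["B"] * len(guess)
    # Target letters that are not matched exactly; only these can turn yellow.
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)
    # Pass 1: exact matches are green.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
    # Pass 2: right letter, wrong position, while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] == "B" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


print(wordle_feedback("soare", "arose"))  # YYYYG
print(wordle_feedback("speed", "abide"))  # BBYBY: only one "e" turns yellow
```

The second example shows why the convention matters: the target has a single “e”, so only one of the guess’s two e’s is colored yellow.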
This leads to a question I’ve seen people discuss at length online: what’s the best first word to guess?
How do I find the best first guess?
For now, let’s define the “best first guess” as the one that is most likely to share the most letters with the target word. What we need to know, then, is how often each of the 26 letters appears in English five-letter words. And not just any five-letter words: only those that have a chance of showing up as targets.
Obscure words like “nisus” (a mental or physical effort to achieve an end) or “winze” (a link between the different levels of a mine) need not apply.
I found a recent study that looked at over 60,000 English words and how widely known each one is. This kind of measure interests language researchers like me because it reflects how easily a word can be processed: on average, the best-known words are read faster.
I took all the five-letter words that were known to at least 50% of the people studied (if you knew “nisus” or “winze”, which I certainly didn’t, you share this feat with only 7% of the sample). Then, for each letter, I counted how many of these words contain it at least once.
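The two steps above (filter by how widely known a word is, then count the words containing each letter) can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a mapping from word to the proportion of people who know it, and the prevalence values in the usage lines are invented, not taken from the study.

```python
from collections import Counter
from string import ascii_lowercase


def known_five_letter_words(prevalence: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Keep alphabetic five-letter words known to at least `threshold` of people."""
    return [w.lower() for w, p in prevalence.items()
            if len(w) == 5 and w.isalpha() and p >= threshold]


def letter_word_frequency(words: list[str]) -> dict[str, float]:
    """Fraction of words containing each letter at least once (assumes a non-empty list).

    set(word) ensures a repeated letter (the two e's in "geese")
    is counted once per word, not once per occurrence.
    """
    counts = Counter()
    for word in words:
        counts.update(set(word))
    return {c: counts[c] / len(words) for c in ascii_lowercase}


# Hypothetical prevalence values, invented purely for illustration.
prevalence = {"arose": 0.99, "geese": 0.98, "whump": 0.60, "nisus": 0.07}
words = known_five_letter_words(prevalence)
freq = letter_word_frequency(words)
print(words)                 # ['arose', 'geese', 'whump']
print(round(freq["e"], 2))   # 0.67, since "e" is in 2 of the 3 words
```

Counting words rather than raw letter occurrences matches the question being asked: what matters for a guess is whether a letter is in the target at all.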
The most common letter was “e”, appearing in 46% of the words. This is a well-known pattern that holds for English in general.
A notable exception is Georges Perec’s novel A Void, which was intentionally written without the letter “e”. Sherlock Holmes even relied on this pattern in The Adventure of the Dancing Men, cracking a cipher made up of dancing stick figures by reasoning that the most common symbol would stand for “e”.
Image: The mysterious sequence of dancing stick figures that Holmes deciphers in The Adventure of the Dancing Men. Author provided.
One reason “e” is so common is the advent of the silent “e” at the end of words in the 16th century, which was used to signal something about the preceding sounds. For example, “tone” is pronounced differently from “ton”.
The next most common letters were “a” (39%), “r” (34%), “o” (29%), and “i” and “s”, tied for fifth (28%). Out of those six letters, one word immediately “arose” as the best option! Want a particularly bad first guess? Try “whump” (a dull thud), which is about the worst by that metric.
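One natural way to turn “most likely to share the most letters” into a number, assumed here rather than taken from the article, is to add up the frequencies of a word’s distinct letters. By linearity of expectation, that sum is the expected number of distinct letters the guess shares with a target drawn at random from the same word list. The sketch below uses only the six percentages reported above; every other letter counts as zero simply because its value is not given, so the comparison is only meaningful between words built from those six letters.

```python
def expected_shared_letters(word: str, freq: dict[str, float]) -> float:
    """Sum of P(letter appears in target) over the word's distinct letters.

    This equals the expected number of distinct letters the guess shares
    with a target drawn from the word list that `freq` was computed on.
    Letters missing from `freq` contribute 0.
    """
    return sum(freq.get(c, 0.0) for c in set(word.lower()))


# Proportions of words containing each letter, as reported in the article.
# Other letters are omitted because their values are not given.
REPORTED = {"e": 0.46, "a": 0.39, "r": 0.34, "o": 0.29, "i": 0.28, "s": 0.28}

candidates = ["raise", "arose"]
best = max(candidates, key=lambda w: expected_shared_letters(w, REPORTED))
print(best, round(expected_shared_letters(best, REPORTED), 2))  # arose 1.76
```

With these figures, “arose” (1.76) edges out “raise” (1.75) only because “o” (29%) is slightly more common than “i” (28%).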
But while “arose” is the most likely to land you letters that are in the target, they may not be in the correct positions.
If we want a word that is most likely to get letters in their correct positions, the best option is “samey” (monotonous, repetitive, unchanging). But let’s not stop there. If we combine these two approaches into a final score, we get a word that sounds oddly familiar: “soare” (a young hawk), which is “arose” with its letters in a more strategic order.
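The positional measure and the final combination can also be sketched in Python. The article does not say how its two measures were merged, so `combined_score` below simply adds the expected number of shared letters to a weighted expected number of greens; that formula, the `weight` parameter, and all function names are assumptions for illustration. The three-word target list is invented and is not the study’s data. On it, “soare” and “arose” share exactly the same letters, so only letter order separates them.

```python
from collections import Counter


def letter_word_frequency(words: list[str]) -> dict[str, float]:
    """Fraction of words containing each letter at least once."""
    counts = Counter()
    for word in words:
        counts.update(set(word))
    return {c: k / len(words) for c, k in counts.items()}


def positional_frequency(words: list[str]) -> list[dict[str, float]]:
    """For each of the five positions, the fraction of words with each letter there."""
    n = len(words)
    columns = [Counter(word[i] for word in words) for i in range(5)]
    return [{c: k / n for c, k in column.items()} for column in columns]


def expected_greens(word: str, pos_freq: list[dict[str, float]]) -> float:
    """Expected number of green squares against a target drawn from the list."""
    return sum(pos_freq[i].get(c, 0.0) for i, c in enumerate(word))


def expected_shared_letters(word: str, freq: dict[str, float]) -> float:
    """Expected number of distinct letters shared with the target."""
    return sum(freq.get(c, 0.0) for c in set(word))


def combined_score(word: str, freq: dict[str, float],
                   pos_freq: list[dict[str, float]], weight: float = 1.0) -> float:
    """Hypothetical final score: shared letters plus weighted expected greens."""
    return expected_shared_letters(word, freq) + weight * expected_greens(word, pos_freq)


# A toy target list, invented for illustration (not the study's word list).
targets = ["arose", "stare", "share"]
freq = letter_word_frequency(targets)
pos_freq = positional_frequency(targets)

for guess in ["arose", "soare"]:
    print(guess, round(expected_greens(guess, pos_freq), 2),
          round(combined_score(guess, freq, pos_freq), 2))
# arose 2.33 6.67
# soare 3.0 7.33
```

Both words score the same on shared letters; “soare” wins on the positional term because, in this toy list, “s” tends to come first and “a” and “r” tend to sit in the middle.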
One last thing to note. While writing this article, I discovered that people had looked at the source code of the Wordle website and found the actual list of words that could appear as targets. I decided not to use this list because I found it more fun to try to answer the question with the language resources available. Also, this list might change and I wanted to find a more general answer.
But, just to reassure you, when I repeat all of the above with this list of “official” Wordle targets, “soare” comes out on top again. So there you have it. What you do with guesses two through six is up to you.
This article was originally published on The Conversation.