# The Puzzle

Josh Mermelstein and I have decided to begin challenging each other to a series of a math/programming puzzles. He asked me to generate all possible valid English-language four-by-four crossword grids; that is, a grid of sixteen letters where every row and column is a real English word. One caveat was that no grid could have the same word twice.

An example grid:

ACTS --> (across) acts, lore, idea, teem
LORE     ( down ) alit, code, tree, seam
IDEA
TEEM


## The Difficulty

It’s easy to think of a two really bad ways to solve this problem:

• You could try all $26^{16} = 4.36\times10^{22}$ grids and filter by validating their component words, or
• you could find real four-letter english words (there are a little over 2000 of them, so $2354 \text{ choose } 8 = 2.31\times10^{22}$ possible grids ) and try and fit them, eight at a time, on a grid.

# The Solution

The best way I found was to:

• precompute a dictionary (called paths here) with
• keys like "a", "ab", "abc", and
• values corresponding to the lists of letters which, if put after their keys, would lead to real four-letter words.
• (For example, paths["wok"] = ['e','s'], or (perhaps less obviously), paths["z"] = [('a','e','i','o'].)
• lay out all possible starting grids, where
• the top row and leftmost column were filled out
• with two real four-letter words
• whose first letters were the same
• for each grid (node, really),
• find the next blank,
• find the partial word above it,
• find the partial word to the left of it,
• look them both up in paths,
• take the intersection of the resultant lists
• for each character in this intersection,
• create a list of child nodes with the blank square filled in.

In this way, we guarantee that any placement will always lead to a real word in that row and column.

## Annotated Code

I’ll present the code annotated below, or in its entirety in the attached gist.

With comments, this is ~160 lines; without, this is ~70.

OK, what you came for. There are $686739$ distinct four-by-four crossword grids in English with no repeats and no diagonal symmetry. The script runs in just over a minute and uses far short of 100% of my memory (unlike several intermediate versions).

# Runtime

Run normally, this script is fairly fast:

But it was not always so. During development I made extensive use of the built-in GHC profiler:

This writes out to words.prof and looks something like: (see the full words.prof profiler output on Gist.)

Which gives a bit of a hint where things might be taking up a lot of time. In this case, nextIn is taking up 20% of our time, and it’s an $O(\log n)$ pre-optimized Map lookup function. Might be time to stop optimizing, with a near-one-minute runtime.

# More Fun with our Solver

What else can we do with our program now that we have a pretty speedy crossword solver? Well, because Haskell is lazy it’s not that hard to see if there exists a 5x5 or 6x6 grid. By doing main = putStrLn $gridPrint$ head grids we end up executing incredibly fast:

Here are some grids I found:

0.062s    0.164s     54.0s

ABED      ABACI      ABBESS
BLUR      BOWED      SEESAW
EERY      AXIAL      CATTLE
TWOS      SENSE      ERRATA
EDGED      NEATER
DRYER


## Palindromes

Additionally, we can find “palindromic” crosswords, where the words are valid even if the grid is rotated 90, 180, or 270 degrees:

In 6.627s it finds the following palindromic 4x4:

DRAB      WARD      DRAW      BARD
RAGA      AJAR      RAJA      AGAR
AJAR      RAGA      AGAR      RAJA
WARD      DRAB      BARD      DRAW


There are 10 such palindromic 4x4s; eight of them rely on the quartet raga/raja/ajar/agar at their centers; the other two rely on time/tide/emit/edit.

DRAB  DRAB  DRAB  DRAB  DRAW  DRAW  TRAM  TRAM  STEP  STEP
RAGA  RAJA  RAGA  RAJA  RAGA  RAJA  RAGA  RAJA  TIDE  TIME
AJAR  AGAR  AJAR  AGAR  AJAR  AGAR  AJAR  AGAR  EMIT  EDIT
WARD  WARD  YARD  YARD  YARD  YARD  PART  PART  WETS  WETS


## Word Frequency

By running

We can write a list of all grids to a file grids.txt and perform some rudimentary Bash-level analysis on what sorts of words appear in the rows most commonly:

These don’t quite look like the letter distributions we’re used to in full-fledged English – let’s look at letter frequencies and see how different it is for four-letter words, and which letters are more likely to appear in crosswords.

Looks like q only appears 33 times; only ever in the context of quay, aqua, quip, quad, quit, or quid.

# Conclusion

I have challenged Josh Mermelstein to solve the following puzzle:

For a set of n letters ('a','b','d') you can make m real English words ("bad", "dab", etc...). Find the subset of all 26 letters with the highest ratio of words to letters; in other words, the most bang for your buck.

• View the source code on Github.