##### Solving the Pairs of Percentages Puzzle
###### 2016-10-29

Here’s a math puzzle inspired by the election. I noticed a while back that the two candidates were at 81.5% and 18.5%, respectively. The numbers 185 and 815 are anagrams, and add to 1000 ($10^3$). How many such pairs of positive integers add to $10^3$? I’ve been doing a bit of Haskell recently, so that’s my language of choice.

# Exploring the space

The first step is to define a function which takes an integer pair and decides if the pair is valid (that is, if the two integers are anagrams or not.)

Let’s use this valid function to check a range of numbers (a,b) where $a+b=1000$, and valid a b == True.

This is pretty cool. Let’s fit this into a function $f(n)$, where

• $f(n)$ is the power $10^n$ we’re matching against,
• pairs n returns the list of potential pairs,
• validPairs n = filter(\(a,b) -> valid a b) $pairs n, and • f n = length$ validPairs n, which is what we’re ultimately trying to find an expression for.

If we run validPairs 4, we see a cool pattern.

Before we go further, a few notes on this: for performance reasons, we only compute pairs (a,b), not (b,a). Because the last pair computed is always something like (50,50), (500,500), etc., the total number of pairs is actually $2f(n) - 1$. We’ll keep this in mind for later.

Here’s what we have so far:

n 1 2 3 4 5 6
f(n) 1 1 6 6 141 141
2*f(n)-1 1 1 11 11 281 281

Computing $f(6)$ took a really long time, so let’s eventually wrap this in a .hs file, compile, and execute it with timing.

But before we go further, let’s check the Online Encyclopedia of Integer Sequences to see if any of the sequences (1,1,6,6,141,141), (1,6,141), or (1,11,281) appear at all. It turns out (1,6,141) are the first few digits of A241015, which is the

Number of pairs of endofunctions $f$, $g$ on $[n]$ satisfying $g(g(g(f(i)))) = f(i)$ for all $i$ in $[n]$.

…whatever that means.

If our pattern matches theirs, we might expect the next number in the sequence to be is 6184. We’ll find out in a bit, but first let’s bring runtime down a bit.

# Profiling

OK, it’s time to wrap this in a file, compile, and execute it with timing.

Just for kicks, ./puzzle takes two arguments; the first is the number $n$, and the second is either L, to print the full list of matches, or _, to print the value $f(n)$.

We have a few ways to run this. One is to open up a ghci shell, import the file, and run validPairs <num> or f <num> right there in the shell:

## with ghci

This is great for testing, but we don’t get too many stats. Let’s use a real-time thing like runhaskell.

## with runhaskell

To add stats, we use the +RTS runtime system flag with the -s summary flag.

Very cool. As a final step, we’ll properly compile and optimize it with ghc.

We can run this with +RTS -s too. $./puzzle 3 _ +RTS -s generates the following table for f(1) through f(7): n 1 2 3 4 5 6 7 f(n) 1 1 6 6 141 141 5591 time 0.001s 0.001s 0.003s 0.017s 0.190s 2.301s 26.711s Let’s see if this matches our sequence from earlier. We now have the next value, putting our sequence at (1,6,141,5591). This is not in the OEIS.. So we’ll have to find an expression ourselves, and check with the scripts. Unfortunately, our timing statisics are not optimistic. At this rate,$f(8)$will take around five minutes to calculate. It would be a good idea to parallelize. # Parallelization We can use the Control.Parallel library, (see the docs for more) which gives us the handy par and pseq functions, which force the compiler to split evaluations into separate threads and then wait to recombine. Much of the code has to be modified slightly. I run this on a four-core processor, so I rewrote this to split up the list of candidates into four roughly equal parts, evaluate the validity of each element in each of them, compute the length of that valid sublist, and recombine afterwards. We’ll call this new file puzzle-parallel.hs. You’ll notice we drop support for printing the full list – constructing the sum was easier to parallelize. The file puzzle-parallel.hs can now be compiled to be multithreaded with ghc: and executed on four cores with the +RTS flag -N4 (or -N2, or however many cores you have. The above code is written for four cores, but it’s not hard to modify for any other number of cores.) n 1 2 3 4 5 6 7 8 f(n) 1 1 6 6 141 141 5591 5591 time w/o parallelization 0.001s 0.001s 0.003s 0.017s 0.190s 2.301s 26.711s 501.863s time w/ parallelization 0.001s 0.001s 0.003s 0.021s 0.133s 1.567s 18.489s 208.285s We see around a 2x speedup from this – we expect a 4x from threading, but there is some loss to overhead. This speedup helps more the higher$n$grows. This is better performance across the board. Now calculating$f(9)$isn’t quite as impossible. # Expression Rather than delve deeper into optimization, let’s focus on trying to find a closed-form expression for this sequence. We do this first by visual inspection of the types of pairs generated. Let’s introduce some nomenclature here to turn this from a programming puzzle into a math puzzle. For some$n$, we define$f(n)$to be the number of possible pairs, and$S(n)$to be the set of pairs itself. Thus, f(n) = length S(n). Some observations: • For all even values$n$, we can see S(n) = [10*i for i in S(n-1)]; that is, every element of S(even n) is an element of S(odd n) multiplied by ten. • Thus,$f(n) = f(n-1)$for even n. ## Mapping to a simpler domain We can see from the examples above that for sets generated by, say,$n=3$and$n=4$, only the first two digits really fluctuate. We suspect the problem reduces to a combinatorics problem focused on strings of that length, which we call$m$. So, let’s map the domain of possible$n$values to a simpler domain: We’ll switch to Python here for readability, and also because our algorithm will be imperative in a moment: This alongside future memoizations will help us avoid having to calculate$f(n)$when$f(n-1)$is known. ## The 9-pair We first define the concept of a 9-pair. A 9-pair is a set of two positive numbers which add to 9. 9-pairs are: (0,9), (1,8), (2,7), (3,6), and (4,5). ## Numbers and 9-pairs Here is an example pair from$S(n=5)$: (48155,51845). We can reduce this to its tuple of interest: (4815, 5184). It seems (1,8) and (4,5) are the 9-pairs which compose the alphabet from which this tuple was generated. Since 1 and 8 are a 9-pair, if 1 appears in the first number of the tuple, then we know for sure that: • 1 appears in the second item of the tuple, • 8 appears in the first item of the tuple, • 8 appears in the second item of the tuple. # Generalization: elements and element pairs Let’s generalize this away from digits and number and into elements, sets, and permutations. We will call (a,b) a tuple, a and b strings in that tuple, and the characters in the strings a and b characters. For a given string a or b, any character in a or b uniquely defines its counterpart within an element-pair (i,j): if i in a, then j in b. Since the strings a and b are by definition anagrams, this means j in a as well. Thus a is a unique ordering of pairs of characters, selected from a known alphabet of character pairs. ## Combinatorics on that domain We can reduce this problem to finding the number of unique orderings of a list composed of selecting$x$element-pairs from a list of$p$element-pairs. We call this function$G(x,p)$. Because our real problem lies on the domain of digits, there will only eve be five element-pairs (the five 9-pairs), so$p=5$, always. This is simple (if inefficient) in Python: To get our desired$f(n)$, we can just write: ## Unmemoized timing t0 We can run this with timing: n 1 2 3 4 5 6 7 8 9 10 f(n) 1 1 6 6 141 141 5591 5591 281566 281566 t0 5.9e-6s 1.3e-5s 4.2e-5s 2.2e-5s 1.5e-4s 1.4e-4s 4.3e-3s 4.0e-3s 0.94s 0.94s ## Memoization t1 We can improve performance a lot through memoization. Functions like$G(n,5)$get called repeatedly for the same value of$n$, for example. n 1 2 3 4 5 6 7 8 9 10 t1 9.1e-6s 3.1e-6s 5.5e-5s 3.1e-6s 1.6e-4s 2.9e-6s 5.5e-3s 4.1e-6s 0.91s 1.6e-5s The most obvious boost is that$f(n)$where$n$is even now takes almost no time to compute, since$f(n) = f(n-1)$for even$n$. ## Memoization t2 We can memoize further. The alphabet of characters fed into len(list(set(itr(alphabet)))) is usually something like ['a','A','a','A'] or ['b','B','b','B']. However, it seems obvious to us that the number of unique pairs from the first of these two alphabets will be the same as that of the second. It would be useful to create a non-unique hash of some kind, representing the pair distribution within an alphabet, and check whether len(list(set(itr(alphabet)))) for an equivalent alphabet has already been calculated. This lets us rewrite$G(x,p)$as: This memoization brings down computation time even further, and lets us calculate$f(11)=f(12)$for the first time ever.$n$1 2 3 4 5 6 7 8 9 10 11$f(n)$1 1 6 6 141 141 5591 5591 281566 281566 16397596 t2 6.9e-6 3.1e-6 6.8e-5 1.3e-5 1.4e-4 3.1e-6 6.9e-4 5.0e-6 6.8e-2 7.2e-6 10.5 # Conclusions That’s all! I wasn’t able to further optimize the problem, and I’m not convinced a closed-form solution necessarily exists. That said, we found the number of anagram pairs which sum to$100,000,000,000$, and that’s pretty cool. I also learned a lot about parallelization in Haskell and using the built-in optimization flags in ghc, and a bit of memoization in Python. # Update (11/11/2016) It’s been around a week since last I touched this puzzle. My buddy Josh bounced the problem to his boss Piotr, who came up with this phenomenal closed-form solution for$G(x,p=5)$. It’s brilliant and involves reducing the combinatorics problem, searching for and finding the elusive closed-form solution in the literature, and writing a small haskell script to implement it. In short: The combinatorics section above describes a sequence$a_n^N := \Sigma \left( \dfrac{n!}{p_1! p_2! … p_N!}\right)^2$, where the sum runs over sets of non-negative integers$p_1,..p_N$summing to$n$. This sequence can be found by a recurrence relation, and was performed in Sums of squares of binomial coefficients, with applications to Picard-Fuchs equations by H. A. Verrill in 2008. Table 1 in the above paper describes recurrence relations for$a_n^N$, and we care about the case where$N=5$: the fourth equation is the object of our desire. It reads: $0 = n^4 a_n^5 - (35n^4 - 70n^2 + 63n^2 - 28n + 5)a^5_{n-1}$ $\phantom{0 = } + (n-1)^2(259(n-1)^2 + 26)a^5_{n-2} - (3 \cdot 5)^2(n-1)^2 a^5_{n-3}$ Piotr implemented this as so: (I’ve changed it the tiniest bit to better perform testing on it.) and called with runhaskell math.hs <num> +RTS -s (assuming you save it as math.hs.) As expected, this has a phenomenal runtime.$n\$ 1 10 1e2 1e3 1e4 2e4 3e4 4e4 5e4
runtime 0.183s 0.183s 0.180s 0.198s 0.542s 1.232s 2.032s 3.203s 4.643s

I think that’s about as far as this problem goes. Thanks to Josh and Piotr for doing the actual difficult mathematics.