Effect of Line Solution Caching on Pbnsolve Run-times

Jan Wolter

Pbnsolve is a program for solving paint-by-number puzzles (also called nonograms). This page describes the impact of caching line solutions on its performance.

I first saw this idea used in the BGU solver. Whenever we solve a line, we store the solution in a hash table, keyed by the row clues, and the initial state. We check that table before running the line solver each time, and if there is a solution stored there, we use that instead of running the line solver. I implemented this idea in version 1.07 of pbnsolve.

A few details of that implementation:

In later versions of pbnsolve I added a refinement to the algorithm. Sometimes you'll have clues that are identical except for being reversed. For example two rows of the same length might have clues "3 1 1" and "1 1 3". These rows are actually the same. If you recognized them during the "clue numbering" phase, then whenever you are saving or looking up a solution for one of those lines you can reverse the inital and final state strings of the reversed clues. This costs very little extra computation, and I was hoping it would reduce the size of the hash tables and improve the hit rate. It does, but only very slightly. Even on 9-Dom, which has lots of reversed clues, the hit rate was improved only 0.3%.

Here are some runtimes and hit rates on some sample puzzles. These statistics were collected on computer B using pbnsolve 1.07. (There are newer, faster versions of pbnsolve, but I have not rerun these statistics.)

Pbnsolve normally uses probing to make guesses, but there is some older code in it that uses a heuristic function (not an amazingly good one in version 1.07). Probing naturally involves more re-solving of similar problems, so I thought it would be interesting to see how the caching worked with a heuristic search algorithm as well. A "+" sign indicates a runtime longer than 30 minutes.

Pbnsolve 1.07 Run Times with and without Caching
puzzle size Probing Heuristic
No CachingCachingHit RateImprovement No CachingCachingHit RateImprovement
pbnsolve command line flags: -aLEP -aLEHP     -aLEG -aLEHG    
#436: Petro* 40 x 35 0.12s 0.06s 88% 50% 2.2m 1.6m 86% 27%
#4645: M&M 50 x 70 0.10s 0.07s 90% 30% 0.14s 0.12s 70% 14%
#3541: Signed 60 x 50 0.04s 0.04s 72% 0% 2.0s 1.6s 74% 20%
#803: Light 50 x 45 0.95s 0.68s 98% 28% + +    
#6574: Forever* 25 x 25 16.8m 8.0m 98% 52% 2.1s 1.4s 93% 33%
#2040: Hot 55 x 60 1.24s 0.56s 94% 54% + +    
#6739: Karate 40 x 40 7.7s 3.2s 93% 58% 1.5m 47.2s 93% 48%
#8098: 9-Dom* 19 x 19 28.1s 20.3s 96% 28% 7.3m 5.7m 93% 22%
#2556: Flag 65 x 45 + +     7.8s 5.2s 92% 33%
#2712: Lion 47 x 47 1.9m 44.9s 97% 61% + +    

Note that the hit rates are very high. I would never have guessed that they would be as high as this. In solving the "Forever" puzzle with probing pbnsolve did 986 million line solves, of which 974 million were satisfied from the cache, even though 4 cache flushes were done.

As expected, the hit rates are a bit lower when we do heuristic search rather than probing, but still pretty high.

We get substantial run time improvements even when the hit rates are modest, but the hit rate alone doesn't really predict how much improvement we get in run time. Pbnsolve's line solver is very highly optimized, and on some lines probably isn't that much slower than a hash lookup (especially with the overhead of compressing and uncompressing lines). If we are mostly hitting on lines that would have solved fast anyway, then the speedup isn't going to be so great.

Note that the speedups tend to be higher on larger puzzles, which have longer lines, and thus more expensive line solves. The 9-Dom puzzle might seem like a natural candidate for caching, because it has many duplicate clues, but the lines are short and the clues are simple, so caching doesn't save us as much. (This may be why it is one of the puzzles that gives the BGU solver the most trouble.)

So this is a very effective enhancement that is easy to implement within almost any solver.