Pbnsolve is a program for solving paint-by-number puzzles (also called nonograms). This page describes the impact of caching line solutions on its performance.

I first saw this idea used in the BGU solver. Whenever we solve a line, we store the solution in a hash table, keyed by the row clues, and the initial state. We check that table before running the line solver each time, and if there is a solution stored there, we use that instead of running the line solver. I implemented this idea in version 1.07 of pbnsolve.

A few details of that implementation:

- Normally pbnsolve uses two hash tables, one for rows, one for columns.
However, if the puzzle is square, then just one table is used.
- Before starting, it assigns "clue numbers" to each clue, such that rows or
columns have the same clue number if and only if they have the same
length and the same clues. These numbers are stored with the solutions in
the hash table, so that solutions for one line can be reused for other
lines with identical clues.
- For each line solution, it coverts the initial and final state of the line
into a compressed binary representation to minimize the storage used.
The initial state and the clue number are concatinated to form the hash
key. The final state is the data.
- The hash tables are fixed size. Hashing is done with a standard double
hashing algorithm.
- When the hash tables reach 90% fill, they are simply emptied. This
limits their memory consumption, but obviously negatively impacts the
hit rates.
- Each time the hash table is emptied, it's size is doubled. For
rectangular puzzles we start with two tables of size 25,000. For
square puzzles we start with one of size 50,000. They are doubled
until they reach 800,000 and then the doubling stops. This keeps us
from using too much memory for puzzles that don't need that much
searching, while making cache flushes relatively infrequent for more
difficult puzzles.
- Hashing is not turned on until the first guess is made. If the puzzle can be solved entirely by the line solver, without doing any search, then it never gets used. This is again a memory saver, since hit rates are likely to be lower during that phase, and pbnsolve doesn't really need to be sped up on such puzzles. Note that though hit rates are low during that phase, they tend to be higher than one would expect, due to the large number of duplicate clues in many puzzles.

Here are some runtimes and hit rates on some sample puzzles. These statistics were collected on computer B using pbnsolve 1.07. (There are newer, faster versions of pbnsolve, but I have not rerun these statistics.)

Pbnsolve normally uses probing to make guesses, but there is some older code in it that uses a heuristic function (not an amazingly good one in version 1.07). Probing naturally involves more re-solving of similar problems, so I thought it would be interesting to see how the caching worked with a heuristic search algorithm as well. A "+" sign indicates a runtime longer than 30 minutes.

puzzle | size | Probing | Heuristic | ||||||
---|---|---|---|---|---|---|---|---|---|

No Caching | Caching | Hit Rate | Improvement | No Caching | Caching | Hit Rate | Improvement | ||

pbnsolve command line flags: | -aLEP | -aLEHP | -aLEG | -aLEHG | |||||

#436: Petro* |
40 x 35 | 0.12s | 0.06s | 88% | 50% | 2.2m | 1.6m | 86% | 27% |

#4645: M&M | 50 x 70 | 0.10s | 0.07s | 90% | 30% | 0.14s | 0.12s | 70% | 14% |

#3541: Signed | 60 x 50 | 0.04s | 0.04s | 72% | 0% | 2.0s | 1.6s | 74% | 20% |

#803: Light | 50 x 45 | 0.95s | 0.68s | 98% | 28% | + | + | ||

#6574: Forever* |
25 x 25 | 16.8m | 8.0m | 98% | 52% | 2.1s | 1.4s | 93% | 33% |

#2040: Hot | 55 x 60 | 1.24s | 0.56s | 94% | 54% | + | + | ||

#6739: Karate | 40 x 40 | 7.7s | 3.2s | 93% | 58% | 1.5m | 47.2s | 93% | 48% |

#8098: 9-Dom* |
19 x 19 | 28.1s | 20.3s | 96% | 28% | 7.3m | 5.7m | 93% | 22% |

#2556: Flag | 65 x 45 | + | + | 7.8s | 5.2s | 92% | 33% | ||

#2712: Lion | 47 x 47 | 1.9m | 44.9s | 97% | 61% | + | + |

Note that the hit rates are very high. I would never have guessed that they would be as high as this. In solving the "Forever" puzzle with probing pbnsolve did 986 million line solves, of which 974 million were satisfied from the cache, even though 4 cache flushes were done.

As expected, the hit rates are a bit lower when we do heuristic search rather than probing, but still pretty high.

We get substantial run time improvements even when the hit rates are modest, but the hit rate alone doesn't really predict how much improvement we get in run time. Pbnsolve's line solver is very highly optimized, and on some lines probably isn't that much slower than a hash lookup (especially with the overhead of compressing and uncompressing lines). If we are mostly hitting on lines that would have solved fast anyway, then the speedup isn't going to be so great.

Note that the speedups tend to be higher on larger puzzles, which have longer lines, and thus more expensive line solves. The 9-Dom puzzle might seem like a natural candidate for caching, because it has many duplicate clues, but the lines are short and the clues are simple, so caching doesn't save us as much. (This may be why it is one of the puzzles that gives the BGU solver the most trouble.)

So this is a very effective enhancement that is easy to implement within almost any solver.