The 'pbnsolve' Paint-by-Number Puzzle Solver

Jan Wolter

Version: 1.10
Source Repository

1. Introduction

"Paint-By-Number" is one of many names for a type of graphical logic puzzle originating in Japan. Other common names for these puzzles are Nonograms, Griddlers, Hanjie, and Picross. The puzzles consist of a blank grid with numbers along the top and one side. The numbers tell the sizes of the blocks of the foreground color in that row and column. To solve the puzzle, you must reconstruct the image by figuring out how the color blocks must be place in each row and column. One place to find examples of these puzzles is at my web site,

The 'pbnsolve' program is a tool to automatically solve such puzzles. It was written primary to validate puzzles created by users for the site. We solve the puzzle to determine if it has a unique solution and sometimes to get a feel for how easy or difficult it is. There are many similar programs available on the web. Steve Simpson has a nice solver, and links to many others, including a very good solver by Mirek and Petr Olšák. I have posted a page with comparisons of some of these solvers.

The Olšák's solver came close to doing what I needed for, but the fact that the code is all in Czech makes it hard to modify for a person ignorant of that language, so I wrote my own.

Note that pbnsolve is a distinct program from the "helper" in the puzzle solving environment on the site. The helper is implemented in Javascript and is not a full solver. Pbnsolve is used on only in the puzzle creation environment to check newly submitted puzzles.

2. Features

3. Algorithm

3.1. Line Solver

A line-solver is an algorithm that given a single row or column, and the solution so far of that line, tries to figure out what additional cells can be marked. Most humans solve puzzles mostly in this way, by working on one line at a time.

Like the Javascript "helper" program on, and like most every other solver anyone else has built, pbnsolve is based on a left-right overlap algorithm for line solving. It works like this:

  1. Find the left-most solution, with all blocks pushed as far to the left as possible.

  2. Find the right-most solution, with all blocks pushed as far to the right as possible.

  3. Intersect the two solutions.

    1. Any cell that is in the same block in both solutions must be that color.

    2. Any cell that is between the same pair of blocks in both solutions, must be white.

This is fast, but not complete. It will miss some cells that could be marked.

The trickiest part in coding this is finding the left-most and right-most solutions. The code for this in pbnsolve is fairly fast, but extremely ugly, with many special cases.

Unlike most other solvers, pbnsolve handles multicolor puzzles. For each cell, it maintains a list of what colors that cell could be. (This is a major improvement over the "helper" program, in which cells are always either one specific color, or completely unknown.)

The left-right overlap algorithm is extended slightly to handle this case. When we intersect the left and right solutions, we do the following:

For all blocks
For all cells between position of the left-most cell of the block in the left solution, and the right-most cell of the block in the right solution
That cell may be the color of that block.
If you do this, you typically find that there are some cells that can't be some colors, for example if the first clue for a line is a green seven, then the first seven cells in the line can't be red.

Steve Simpson has described what appears to be a better line solving algorithm. Someday I should figure out if that can be adapted to multi-color puzzles.

3.2. Logical Solving

The first thing pbnsolve tries to do is solve the puzzle logically. It creates a job list, which is just a list of lines that we want to run the line solver on (this job list is actually implemented as a heap, so that the highest rated job can always be quickly found and extracted). Initially, it puts every row and column on the job list, rating them by some some simple algorithm that doesn't actually make a whole lot of difference.

Then it enters a loop where it keeps grabbing a line off the job list, and handing it to the line solver. Every time the line solver marks a cell, the crossing lines are put on the job list (or have their priority bumped up if they are already there).

Often, this process will suffice to solve the puzzle. If it succeeds, we are assured that there is a unique solution, and that it is solvable by mortals humans, since humans are as good or better at line solving than the left-right overlap algorithm.

In many cases, however, this will fail to fully solve the puzzle. The job list will empty out, with some cells still in an unknown state.

When pbnsolve reaches this point, it does an exhaustive check to make sure that every cell that can be marked by looking at one clue at a time has been marked. This is necessary because our line solver algorithm is not complete and can miss marking some cells. The exhaustive check tries marking every unsolved cell with each of the colors that it could possibly be, and then checks if the row and column containing that cell can still be solved. If not, it removes that color from that cell's list of possible colors. If this algorithm succeeds in marking any cells, we resume the normal line-by-line solving algorihtm.

If the exhaustive check fails, then logical solving will have failed us, and it is time to try something else.

3.3. Guessing

When the logical solver stalls, we need to fall back to a search strategy. My first cut at this was to implement a straight-forward depth-first search. This can still be used if pbnsolve is run with the -aLG flag.

When the logic solver stalls, the program makes a guess by picking a cell and assigning it a color. It then restarts the logical solver, this time keeping an undo history of all the changes it makes.

One of three things will happen. The first possibility is that the logical slover will stall again, with the puzzle still incomplete. If so, we make another guess and proceed by starting up the logical solver again.

The second possibility is that the logical solver could hit a contradition. This occurs if the line-solver is unable to find any legal solutions for a line. If that happens, we backtrack to the most recent guess, using the undo-history to undo everything we did. We then invert the guess. This is not a guess anymore, we know that the cell must be some color other than the color we guessed before. We can then restart the logical solver and see how far we get.

The third possibility is that the puzzle solves completely. Then, of course, we can stop, unless we are trying to determine if the solution is unique. If we are trying to prove uniqueness, we need to backtrack just as we did in the previous case. If backtracking leads to other solutions, then we know there are multiple solutions. If it only leads to contraditions, then the first solution we found must have been unique.

Choosing good guesses is fairly important to the performance of this algorithm. Several different guessing algorithms are implemented in pbnsolve. You choose between them by editing the config.h file and recompiling.

This guessing algorithm worked well on most puzzles, but there were some puzzles which humans could solve quite well, that the program would just run (seemingly) forever on.

Depth-first search gets hopelessly inefficient if the chain of choice points gets too long. If you have to make ten guesses to to get to a solution, then the search tree has something like 210 branches. Exhaustively exploring all that is pretty hopeless.

However on many of the puzzles where this kind of thing was happening, you could do much better if you made different guesses. Some guesses would get you much further than others, so fewer guesses would be needed to reach a solution. But the guessing heuristics weren't good enough to find these for all puzzles.

3.4. Probing

By default pbnsolve now uses a probing approach. Instead of using some simple heurisitic to make a guess when the logical solver stalls, we generate a long list of possible guesses, explore each one without committing to it, and then finally choose the one that gets closest to solving the puzzle.

The way it works is that it chooses a set of cells that seem likely candidates for a guess. Specifically, it chooses:

  1. Every unsolved cell that is a neighbor of a cell whose state has changed since the last guess was made, and
  2. Every unsolved cell that has two or more neighbor cells that are solved (edge cells count as solved).
(Older versions used only the second set of cells, the idea of adding the first set comes from the Ben Gurion University Solver.) For each of these cells it guesses every color that the cell could be (except that if the cell was set to that color by a probe on a previous cell from the set, we skip that color). For each of these candidate guesses, we run the logical solver until it stalls. The rating for that guess is the number of cells that are solved as a result. When we have explored all candidates, we pick the one that got us furthest, and choose that.

There are some important special cases that come up. Sometimes running the logical solver on one of the candidate guesses will end up solving the puzzle completely. In that case, obviously, we can abort the probing and run with the solution we found.

Similarly, sometimes one of the candidate guesses will lead to a contradiction. This is almost as good. Again, we abort the probing phase, invert the guess that led to the contradiction, and proceed with that.

There are a lot of inefficiencies in this. It does a lot of computation to explore in various directions, and discards all the results of all that work. Then it chooses one and repeats the logical solution of that. Really kind of stupid, and it's not surprising that many of the puzzles that solved quickly with the guessing algorithm, run quite a bit slower with the probing approach.

But it's not exponentially slower in those cases. In many cases where the guessing approach got lost in an exponential search, the probing approach finds better guesses, making the search tree substantially shallower, and thus winning an exponential speed improvement. This can be seen in the Full Results Table, where it can be seen that though pbnsolve is a bit slower on many puzzles of moderate difficulty, it is a lot faster on some really hard puzzles.

Originally I believed that the success of the probing algorithm was founded on the fact that it finds guesses that get us farther toward the solution, thus reducing the number of branch-points in the search tree and reducing the n in its 2n size. But when I started collecting statistics, I discovered that something different was actually happening. On the puzzles where probing really performs well, pbnsolve was almost never choosing the probe that got it the furthest toward the goal, because it was almost always finding a contradiction before the probe sequence ended. Here are some statistics on a few difficult puzzles, from pbnsolve-1.08:

Puzzle Size CPU Time Number of Probing Sequences
Total Ended in Contradiction
or Solution
Ended in
a Guess
#1611: Merka 55 x 60 0.01 sec 25 25 100.0% 0
#1694: Tragic 45 x 50 0.03 sec 193 191 99.0% 2
#6739: Karate 40 x 40 0.79 sec 9,961 9,351 93.9% 610
#2040: Hot 55 x 60 0.82 sec 2,508 2,417 96.4% 91
#2712: Lion 47 x 47 6.0 sec 36,036 33,910 94.1% 2,126
#6574: Forever 25 x 25 9.4 sec 221,846 194,838 87.8% 27,008
#8098: 9-Dom 19 x 19 10.2 sec 164,216 152,037 92.6% 12,178
#2556: Flag 65 x 45 118.7 min  127,421,555  102,554,824 80.5%  24,866,731

Many of the most troublesome puzzles are the ones with very large numbers of solutions, like Lion and Flag, or ones where you have to look many moves ahead to find contradictions, like 9-Dom. These are exactly the kinds of situations where contradictions are hard to find.

So it appears what probing is really about is pruning the solution space by aggressively searching for contradictions.

3.5. Plod & Sprint

The table above suggests that while most of the time probing works very well, on a few rare puzzles it fails badly. Curiously, about half of these puzzles were solved much more effectively by the old guessing algorithm. I think this mostly happens in puzzles with large numbers of very different solutions, where searching for contradictions tends to fail because there are solutions down many paths. Then just diving down paths until you find two solutions and can stop may be much faster than mapping out the whole solution space.

So in version 1.09 pbnsolve, by default, uses a somewhat kludgy "plod & sprint" algorithm. It starts out plodding through the solution space with the standard probing algorithm. But if it is not finding many contradictions, if the rate of finding them falls below 88%, then we switch to sprint mode for a while, making our decisions with the guessing algorithm for a while. After about 4000 iterations of that, we give it up and resume plodding for a while.

In practice, most puzzles are solved without ever sprinting. When we do sprint, it actually slows us down a little in about half the cases. But in the remaining cases it gives us quite impressive speed ups, solving some puzzles in a fraction of a second that used to take hours. So this is a useful trick, if not a particularly brilliant one.

3.6. Merging

If pbnsolve is run with the -aM flag, the it tries merging the results of all probes. In probing we try setting a cell to different colors, and compute all the consequences of each of those alternatives. With merging extra bookkeeping is done so that we can notice if any other cell was set to the same value no matter what value the original cell was set to. If this is the case, then that cell can be set to that value. Thus, this basically does Two-Way logic.

Though the merging code works, it doesn't actually improve the performance of the solver. The overhead for the extra bookkeeping almost always exceeds the gains made. So, by default, pbnsolve does not do merging, but it can be turned on by a command line option.

I think basically this would only get us better performance if it found merges that worked in situations where contradictions could not be found. Most of the time we do find contradictions, and merges work out in only a fairly small percentage of the remaining cases, so they don't give you a lot of benefit.

3.7. Caching

A group of students at Ben-Gurion University wrote a solver that uses similar algorithms to pbnsolve. One of the interesting tricks they introduce was to cache the solutions from the line solver in a hash table. A version of this has been added into pbnsolve 1.07.

Basically, everytime we run the line solver described in section 3.1, we save the result in a hash table. The line clue and the initial state of all the cells in the line form the hash key, and the final state of the line forms the data. We check the hash table before every call to the line solver, and skip the call if we can get the solution from the hash table. If the hash table gets too big, we simply empty it. In pbnsolve's implementation, we enlarge the hash table after the first few times it is is flushed. We also do a lot of compression of the data so the hash table doesn't get too big.

This works amazingly well. I would have expected hit rates on the table to be rather low, but they aren't. For a typical hard puzzles, one sees hit rates in the 80% to 98% range. Apparantly once the solver starts searching there is a lot of re-solving of the same problems. So this seems to cut the run-times on hard puzzles by 33% to 50%. The memory cost is substantial, but even small, frequently flushed tables pay off pretty well.

4. Performance

A detailed comparision between pbnsolve and other available solvers is available. Most puzzles designed by humans for humans to solve actually turn out to be quite easy and can be readily solved by almost any solver. Pbnsolve can solve the vast majority of puzzles in a fraction of a second.

However, for every solver there seems to be a few rare puzzles that cause much more difficulty, where the search becomes lost in an exponentially branching tree of guesses. Pbnsolve seems to do better on most of these than other solvers, probably due to the probing algorithm, but there some rare puzzles that pbnsolve cannot solve in a reasonable amount of time.

All solvers are going to go exponential on some puzzles. We know this, thanks to some folks who proved that solving paint-by-number uzzles is NP-complete. Pbnsolve does fairly well, but I think there is still room for improvement.

5. Directions for Improvement

A better line solver would help, because better line solving means less need for guessing, and less guessing means shallower search trees. I suspect that this would only give a fairly modest performance improvement in most cases, as most of the things that would find are currently caught by the exhaustive check.

I think a bigger performance improvement would come from a better search algorithm, perhaps using something like an A* algorithm instead of a depth-first search. Such searches don't explore in only a single direction, but instead explore along many branches of the search tree simultaneously. The down side is that all the leaves of the tree have to be kept in memory simultaneously, so this can eat more memory the longer it runs. (The current pbnsolve program uses only a modest amount of memory and memory consumption doesn't grow much no matter how long it runs.)

An interesting direction to go would be to try to apply more of the special reasoning tricks that humans use in solving more complex puzzles, like edge logic. This would be hard to do, but it would make the program into a better tool for assessing how hard the puzzle is for humans to solve.

There are whole classes of other non-traditional search algorithms that might work well here. Sometimes when a person solves a puzzle, they find that due to a mistake they are in a contradictory state. Sometimes then you can move things around to try to fudge the solution back into shape. Some kind of relaxation algorithm that temporarily produces illegal solutions and then fudges them back into shape might be interesting.