"Paint-By-Number" is one of many names for a type of graphical logic puzzle originating in Japan. Other common names for these puzzles are Nonograms, Griddlers, Hanjie, and Picross. The puzzles consist of a blank grid with numbers along the top and one side. The numbers tell the sizes of the blocks of the foreground color in that row and column. To solve the puzzle, you must reconstruct the image by figuring out how the color blocks must be place in each row and column. One place to find examples of these puzzles is at my web site, webpbn.com.
The 'pbnsolve' program is a tool to automatically solve such puzzles. It was written primary to validate puzzles created by users for the webpbn.com site. We solve the puzzle to determine if it has a unique solution and sometimes to get a feel for how easy or difficult it is. There are many similar programs available on the web. Steve Simpson has a nice Javascript solver, and links to many others, including a good solver by Mirek and Petr Olák. The latter came close to doing what I needed for webpbn.com, but the fact that the code is all in Czech makes it hard to modify for a non-speaker of that language, so I wrote my own.
Note that pbnsolve is a distinct program from the "helper" in the puzzle solving environment on the webpbn.com site. The helper is implemented in Javascript and is not a full solver. Pbnsolve is used on webpbn.com only in the puzzle creation environment to check puzzles.
Like the Javascript "helper" program on webpbn.com, and like most every other solver anyone else has built, pbnsolve is based on a left-right overlap algorithm for line solving. It works like this:
This is fast, but not complete. It will miss some cells that could be marked.
The trickiest part in coding this is finding the left-most and right-most solutions. The code for this in pbnsolve is fairly fast, but extremely ugly, with many special cases.
Unlike most other solvers, pbnsolve handles multicolor puzzles. For each cell, it maintains a list of what colors that cell could be. (This is a major improvement over the webpbn.com "helper" program which in which cells are always either one specific color, or completely unknown.)
The left-right overlap algorithm is extended slightly to handle this case. When we intersect the left and right solutions, we do the following:
For all blocksIf you do this, you typically find that there are some cells that can't be some colors, for example if the first clue for a line is a green seven, then the first seven cells in the line can't be red.For all cells between position of the left-most cell of the block in the left solution, and the right-most cell of the block in the right solutionThat cell may be the color of that block.
Steve Simpson has described what appears to be a better line solving algorithm. Someday I should figure out if that can be adapted to multi-color puzzles.
Then it enters a loop where it keeps grabbing a line off the job list, and handing it to the line solver. Every time the line solver marks a cell, the crossing lines are put on the job list (or have their priority bumped up if they are already there).
Often, this process will suffice to solve the puzzle. If it succeeds, we are assured that there is a unique solution, and that it is solvable by mortals humans, since humans are as good or better at line solving than the left-right overlap algorithm.
In many cases, however, this will fail to fully solve the puzzle. The job list will empty out, with some cells still in an unknown state. If pbnsolve is run with the -t flag, then it will halt at that point, with the puzzle incompletely solved.
When the logic solver stalls, the program makes a guess by picking a cell and assigning it a color. It then restarts the logical solver, this time keeping an undo history of all the changes it makes.
One of three things will happen. The first possibility is that the logical slover will stall again, with the puzzle still incomplete. If so, we make another guess and proceed by starting up the logical solver again.
The second possibility is that the logical solver could hit a contradition. This occurs if the line-solver is unable to find any legal solutions for a line. If that happens, we backtrack to the most recent guess, using the undo-history to undo everything we did. We then invert the guess. This is not a guess anymore, we know that the cell must be some color other than the color we guessed before. We can then restart the logical solver and see how far we get.
The third possibility is that the puzzle solves completely. Then, of course, we can stop, unless we are trying to determine if the solution is unique. If we are trying to prove uniqueness, we need to backtrack just as we did in the previous case. If backtracking leads to other solutions, then we know there are multiple solutions. If it only leads to contraditions, then the first solution we found must have been unique.
Choosing good guesses is fairly important to the performance of this algorithm. Several different guessing algorithms are implemented in pbnsolve. You choose between them by editing the config.h file and recompiling.
This guessing algorithm worked well on most puzzles, but there were some puzzles which humans could solve quite well, that the program would just run (seemingly) forever on.
Depth-first search gets hopelessly inefficient if the chain of choice points gets too long. If you have to make ten guesses to to get to a solution, then the search tree has something like 210 branches. Exhaustively exploring all that is pretty hopeless.
However on many of the puzzles where this kind of thing was happening, you could do much better if you made different guesses. Some guesses would get you much further than others, so fewer guesses would be needed to reach a solution. But the guessing heuristics weren't good enough to find these for all puzzles.
The way it works is that it chooses every unsolved cell that has two or more neighbor cells that are solved (edge cells count as solved), and it guesses every color that the cell could be for each cell. For each of these candidate guesses, we run the logical solver until it stalls. The rating for that guess is the number of cells that are solved as a result. When we have explored all candidates, we pick the one that got us furthest, and choose that.
There are some special cases that come up. Sometimes running the logical solver on one of the candidate guesses will end up solving the puzzle completely. In that case, obviously, we can abort the probing and run with the solution we found.
Similarly, sometimes one of the candidate guesses will lead to a contradiction. This is almost as good. Again, we abort the probing phase, invert the guess that led to the contradiction, and proceed with that.
There are a lot of inefficiencies in this. It does a lot of computation to explore in various directions, and discards all the results of all that work. Then it chooses one and repeats the logical solution of that. Really kind of stupid, and it's not surprising that many of the puzzles that solved quickly with the guessing algorithm, run quite a bit slower with the probing approach.
But it's not exponentially slower in those cases. In many cases where the guessing approach got lost in an exponential search, the probing approach finds better guesses, making the search tree substantially shallower, and thus winning an exponential speed improvement.
Pbnsolve was tested on every puzzle currently in the webpbn.com database. Of the 1006 published puzzles, which generally meet minimal standards of sanity, it was able to solve all but puzzle 672. This puzzle is not logically solvable by humans either, nor is the Olák program able to solve it.
The unpublished puzzles on the webpbn.com site represent puzzles that either the authors were never happy with, or which were rejected from the site, usually for being insolvable by the users. Of 106 unpublished puzzles on the site, 15 could not be solved by pbnsolve.
All solvers are going to go exponential on some puzzles. We know this, thanks to some folks who proved that solving paint-by-number puzzles is NP-complete. Pbnsolve does fairly well, but I think there is still a lot of room for improvement.
A better line solver would help, because better line solving means less need for guessing, and less guessing means shallower search trees. I suspect that this would only give a fairly modest performance improvement in most cases, as most of the things that would find are currently caught by the special case of a probe leading to a contradiction. However more puzzles would be solved by line-solving alone, and thus could be automatically identified as being humanly solvable, so doing this would make pbnsolve into a better tool for automatic puzzle checking.
I think a bigger performance improvement would come from a better search algorithm, perhaps using something like an A* algorithm instead of a depth-first search. Such searches don't explore in only a single direction, but instead explore along many branches of the search tree simultaneously. The down side is that all the leaves of the tree have to be kept in memory simultaneously, so this can eat more memory the longer it runs. (The current pbnsolve program uses only a modest amount of memory and memory consumption doesn't grow much no matter how long it runs.)
An interesting direction to go would be to try to apply more of the special reasoning tricks that humans use in solving more complex puzzles, like edge logic. This would be hard to do, but it would make the program into a better tool for assessing how hard the puzzle is for humans to solve.
There are whole classes of other non-traditional search algorithms that might work well here. Sometimes when a person solves a puzzle, they find that due to a mistake they are in a contradictory state. Sometimes then you can move things around to try to fudge the solution back into shape. Some kind of relaxation algorithm that temporarily produces illegal solutions and then fudges them back into shape might be interesting.
pbnsolve-1.0.tgz