Survey of Paint-by-Number Puzzle Solvers

Jan Wolter

1. Introduction
1.1. Procedure
1.2. Solving vs. Validation
1.3. Notes on Run Times
1.4. Other Reports
2. Sample Results
2.1. Run Times
2.2. Memory Usage
3. Full Results
4. Random Results
5. Discussions of Individual Solvers
5.1. Dedicated Stand-Alone Solvers
5.1.1. Jan Wolter's pbnsolve Program
5.1.2. Mirek and Petr Olšák's Nonogram Solver
5.1.3. Steve Simpson's Nonogram Solver
5.1.4. Ben-Gurion University Solver
5.1.5. Jakub Wilk's Nonogram Solver
5.1.6. Kenichi Ishigaki's Games::Nonogram Perl Module
5.1.7. Richard Wareham's Nonogram Solver
5.1.8. Frans Faase's Solver Project
5.2. Solvers Based on General Purpose Tools
5.2.1. Mikael Lagerkvist's Gecode Nonogram Example
5.2.2. Hakan Kjellerstrand's MiniZinc Nonogram Solver
5.2.3. Robert Bosch's IP Solver
5.2.4. Andrew Makhorin's Example IP Solver
5.3. Embedded Solvers
5.3.1. Juraj Simlo's Griddlers Solver
5.3.2. Alexander Meshcheryakov's QNonograms
5.3.3. Jan Wolter's webpbn.com Javascript Helper
6. Conclusions
7. Other Solvers

1. Introduction

"Paint-By-Number" is one of many names for a type of graphical logic puzzle originating in Japan. Other common names for these puzzles are Nonograms, Griddlers, Hanjie, and Picross. The puzzles consist of a blank grid with numbers along the top and one side. The numbers tell the sizes of the blocks of the foreground color in that row and column. To solve the puzzle, you must reconstruct the image by figuring out how the color blocks must be place in each row and column. One of the many charms of these puzzles is that they are quite easy to create as well as to solve.

I run a community paint-by-number website, webpbn.com, where people create and solve such puzzles. To help validate puzzles created there, I wrote a program to automatically solve such puzzles, pbnsolve, which is open source. I'm still not entirely happy with its capabilities and performance, so I thought I'd survey some of the solvers out there, using Steve Simpson's list of solvers as a starting point.

1.1. Procedure

My testing procedure is generally to figure out how to get the solver running on my computer (which normally runs Linux but can be booted into Windows), write some code so that I can export puzzles from the webpbn.com database in the correct format to be read by the solver, and then test the solvers on the puzzles. I try to document in detail how I went about installing and running the solver.

I start out by testing each solver on a small collection of interesting puzzles, collecting run times in a table. Then for the faster solvers, I run them on a collection consisting of every black & white puzzle posted on webpbn.com as of a certain date. Some statistics on memory usage were also collected, and a round of tests on randomly generated puzzles has been done.

Note that this document has been updated many times. Older links and/or citations may well have refered to older versions which were significantly different.

1.2. Solving vs. Validation

There is an important distinction to be made between simply solving a puzzle and validating it. In solving a puzzle, the goal is simply to find a solution. This seems to be the goal that most designers of solvers have in mind for their programs.

But I wrote my solver as a tool to check puzzles designed by users of the webpbn site. Designers of puzzles always work by drawing a puzzle image, and then generating the clues from that. But not every image makes a good puzzle. Some lead to puzzles with multiple solutions. Others may have a unique solution, but be too difficult or too easy. Thus designers need to test solve their puzzles, make adjustments, and then test solve them again. This procedure can grow tedious, so it is nice to have a program that can automatically check the puzzle, determining if the solution is unique, and perhaps giving some feedback on how hard it is to solve.

A program to do puzzle validation has a bit of a tougher job than a simple solver. A solver can stop after finding one solution. A validator must try to find a second. In the case of a puzzle that only has one solution, it must completely explore the solution space, eliminating every possible alternative solution.

This difference impacts the design of the solver in a fairly deep way. The heuristics in a simple solver should be designed to get it to the goal solution as fast as possible. They might, for example, exploit regularities that often appear in human designed puzzles to guess correctly which cells should be black, so as to get more quickly to the solution and avoid running into dead ends.

A validator, on the other hand, loves finding the contradictions that constitute dead ends, because they reduce size of the search tree. It's in no particular hurry to find the correct solution. Since the puzzle was generated from a drawing, we know that the correct solution is out there somewhere. Our goal is not to find it, but to explore the whole search tree. The heuristic search strategy is likely to be focused on making guesses that have many logical consequences or lead to contradictions, rather than on making guesses that are "correct".

So far as I can tell, only two of the programs here were consciously designed as validators, my own pbnsolve program and the Ben-Gurion University Solver. Most of the other solvers can be used as validators simply by asking them to return two solutions, but being able to do the task isn't the same as being designed for the task.

In the run-time tests performed for this survey, I always ask the solvers to find two solutions, unless the solver doesn't support that option. This obviously puts the solvers that weren't really designed for that at a disadvantage. When you run them, you often see them spit out one solution very quickly, and then spend a long time before it finally decides that there are no others. This may seem unfair to these solvers, and it is, but I don't really see much of any practical use for a simple solver. A validator is much more useful so that's what I'm focusing on here.

Paradoxically, there are cases where the strategy of looking for a solution fast is actually the superior strategy for validation. This occurs in certain puzzles with very large numbers of solutions. Because of the large number of possible solutions, contradictions are difficult to find, and trying to explore the whole search space can bog you down hopelessly. A simple dash to the finish line, even if it has to be performed twice, may be more effective when finish lines are plentiful. So it's possible that the best possible solver would actually have to find some middle ground between these strategies.

It should also be noted that if the application is puzzle validation, then there is another useful input that could be given to a validator: the drawing from which the puzzle was generated. My pbnsolve program is the only one I've seen that can take this as in input and use it in an aid to validation. For example, if the first solution found does not match this, then you already know you have multiple solutions without having to search further. More could be done though. There are, for example, certain patterns that could be easily recognized in the solution that indicate that there are multiple solutions.

In the tests performed here, we never give the solver the goal image, placing the solvers and the validators on leveler ground, but designers of practical solvers should probably be thinking of effective ways to use such data.

1.3. Notes on Run Times

A few comments on the run-times reported in this survey:

1.4. Other Reports

Steve Simpson has an excellent collection of links to nonogram solvers solvers and othere nonogram related resources, which has been invaluable to me.

Hakan Kjellerstrand has done some independent testing of some constraint-based solvers on the same sample data set used here.

2. Sample Results

I've accumulated a small sample set of puzzles that I use for testing solvers. The intent is to include a range of difficulties and enough diversity to challenge most browsers.

2.1. Run Times

Run time results for the sample set are given in the table below. There is a key below the table.

Run Times on Sample Set of Black & White Puzzles
puzzle size Notes Wolter Olšák Simp-
son
BGU Lager-
kvist/
Gecode
Kjeller-
strand/
Gecode
Kjeller-
strand/
Lazyfd
Wilk* Ishi-
gaki
*
Ware-
ham
*
Bosch/
glpk
*
Mak-
horin/
glpk
*
#1: Dancer* 5 x 10 Line 0 0 0 0.06s 0.01s 0,01s 0.04s 0 0.04s 0.05s 0.01s 0.03s
#6: Cat* 20 x 20 Line 0 0 0 0.08s 0.01s 0.02s 0.44s 0 0.28s 0.12s 2.0s 7.2s
#21: Skid* 14 x 25 Line, Blanks 0 0 0 0.08s 0.01s 0,02s 0.64s 0 Fail X X 7.1s
#27: Bucks* 27 x 23 Blanks 0 0 0 0.16s 0.01s 0.02s 1.1s 0 Fail X X 12.7s
#23: Edge* 10 x 11   0 0 0 0.09s 0.01s 0.01s 0.08s 0 Fail Fail 0.49s 0.15s
#2413: Smoke 20 x 20   0 0 0 0.19s 0.01s 0.02s 0.60s 0 1.3s 17.3m 10.8m +
#16: Knot* 34 x 34 Line 0 0 0 0.20s 0.01s 0.02s 5.5s 0 18.8s Fail Fail +
#529: Swing* 45 x 45 Line 0 0 0 0.25s 0.02s 0.04s 13.2s 0.03s 4.3s Fail - -
#65: Mum* 34 x 40   0.01s 0 0.01s 0.26s 0.02s 0.04s 11.0s 0.36s + Fail - -
#1694: Tragic 45 x 50   0.04s 0.03s 0.54s 0.62s 0.14s 3.6m 32.1s 1.9s + - - -
#1611: Merka* 55 x 60 Blanks 0.01s 0.01s 0 0.35s 0.03s + 1.1m 5.6m + X X -
#436: Petro* 40 x 35   0.06s 15.2s 0.10s 0.58s 1.4m 1.6s 6.2s 9.2m + - - -
#4645: M&M 50 x 70 Blanks 0.07s 0.10s 0.02s 0.84s 0.93s 0.28s 48.1s 14.0m - X X -
#3541: Signed 60 x 50   0.04s 1.1s 5.4m 0.68s 0.57s 0.56s 1.0m + - - - -
#803: Light 50 x 45 Blanks 0.38s + 0.02s 1.1s + + 4.7s + - - - -
#6574: Forever* 25 x 25   3.7s 2.0s 18.9s 44.3s 4.7s 2.3s 1,7s + - + + +
#2040: Hot 55 x 60   0.83s + 1.2m 0.93s + + 2.6m + - - - -
#6739: Karate 40 x 40 Multiple 0.80s 17.3s 18.3m 0.61s 56.0s 57.8s 13.6s + - - - -
#8098: 9-Dom* 19 x 19   11.0s 4.0m + 2.8m 12.6m 3.8s 1.8s 0.01s + - + -
#2556: Flag 65 x 45 Multiple, Blanks 0.55s 1.5s 0 24.2s 3.4s + 4.2s + - X X -
#2712: Lion 47 x 47 Multiple 6.3s + + 7.8s + + + + - - + -

Notes:
 Line= Puzzle is line solvable. That is, it can be solved one line at a time.
Blanks= Puzzle contains blank rows or columns.
Multiple= Puzzle has more than one solution.
Footnotes:
*= Puzzles designer has given permission to redistribute this puzzle
*= Solver was only looking for one solution to puzzles, instead of two.
Run Times:
0= Run time less than a hundredth of a second
#s= Run time in seconds
#m= Run time in minutes
+= Run time exceeded 30 minutes
Fail= Program terminated without completely solving the puzzle
X= Puzzle had blank lines and program couldn't handle them
-= Test was not performed

I've had several people report running times quite a bit different from the ones reported here. Hakan Kjellerstand used the lazyfd solver to solve the "Lion" puzzle in two minutes on his computer, but it took me 55 minutes to solve. Mikael Kjellerstand solved it with gecode in 13 minutes on his computer, but I gave up after 11 hours. I don't know why there are these big differences. The computers are obvoiusly different, but not that different.

The 9-Dom puzzle is especially interesting because you can easily make larger versions of it, 10-Dom, 11-Dom, etc. You can use this series of puzzles to test the effect of puzzle size on solver run times.

I currently only know of two solvers that can handle multicolor puzzles, but I decided to start developing a sample set of multicolor puzzles just so I can use it to track improvements in my solver. Generally speaking, multicolor puzzles are easier to solve, but unskilled puzzle designers are often attracted to using a lot of colors, so there are a disproportionate number of really awful multicolor puzzles in the webpbn database. Some of these are quite a challenge for a solver, while being exactly the class of puzzles where automated validation would be most useful.

Note that when we count colors, we include the background color (white), so a black & white puzzle has two colors, and multicolor puzzles have three or more. Webpbn only supports five colors.

Run Times on Sample Set of Multicolor Puzzles
puzzle size colors Notes Wolter Olšák
#47: Map 31 x 18 5 Line 0 0
#220: Onions 40 x 40 5 Line 0 0
#1503: Bob 25 x 25 3   0.01s 15.7s
#2257: JarJar 55 x 60 3   0.10s 14.4s
#4940: Babe 45 x 45 3   0.11s 0.10s
#5193: Face 30 x 30 5 Multiple 1.3s 0.05s
#2684: Censored 48 x 64 4 Multiple 7.7s 2.78s
#2073: Peanut 35 x 35 3   0.04s 6.3m
#4364: Snowman 40 x 40 5   0.04s 6.3m
#2817: Shark 50 x 45 3   0.13s 5.6m
#4809: EFB 50 x 40 4 Multiple 3.2s 15.7m
#2814: Comet 45 x 50 3   0.07s +
#3149: Ex 40 x 40 5 Multiple 0.58s +
#4445: Pinup 66 x 99 4 Multiple 0.40s +
#2984: Heart 25 x 25 5 Multiple 0.05s 0
#2498: Gecko 75 x 45 5 Multiple 1.5s +
#3620: Addition 20 x 20 5 Multiple + 0.01s
#672: Web 52 x 52 4 Multiple + +

2.2. Memory Usage

I've made some attempt to study the memory usage of the different solvers, but this is difficult as there is no easy way under Linux to get the maximum or average memory usage of a process. The operating system simply does not collect that information. So instead I did repeated runs under different address space rlimits, until I found the smallest memory limit under which the program would still run. This method requires many runs of the program however, so I could only do it for puzzles where the run times were fairly small.

It should also be noted that measuring the memory usage of processes on modern operating systems like Linux can be a bit complex, since some memory is shared between different processes. I believe the values reported here include all memory used by the program, including memory for shared libraries and such. If your server has other things running that use the same libraries (e.g., other Perl or Java programs) then the memory cost might be lower than these numbers suggest.

The table below summarizes these results. The first column shows the memory used to solve a very small, very easy 10x5 puzzle. This is more or less the minimum memory usage.

Then runs were done on various other puzzles from the list above that had fairly small runtimes. Each solver was tested on a different set of puzzles, since solvers are fast on different puzzles. The run which used the most memory is reported. This second value shouldn't be taken as an upper limit to the solver's memory usage, just as an indication of how much memory usage grows as the puzzle complexity grows. Note that some solvers, including the first three on the list and the last on the list, used pretty much the same amount of memory on all puzzles tested.

Solver Memory Usage on
#1: Dancer
Largest Memory Usage
Observed on Sample Set
Simpson 3.8M 3.8M all puzzles
Olšák 7.2M 7.2M all puzzles
Wilk 10.4M 10.4M all puzzles
Wolter 13.6M 21.5M on #6739: Karate
Bosch/glpk 9.4M 15.1M on #6: Cat
Makhorin/glpk 9.6M 17.6M on #6: Cat
Ishigaki/perl 19.4M 27.4 M on #2413: Smoke
Lagerkvist/Gecode 106.6M 112.4M on #4645: M&M
Kjellerstrand/Gecode 108.9M 122.3M on #4645: M&M
Wareham/mono 160.0M 163.4M on #6: Cat
Kjellerstrand/Lazyfd 20.5M 689.2M on #4645: M&M
BGU/java ~2,240M ~2,240M all puzzles

The differences here are quite spectacular, but, of course, it should come as no surprise that the systems that use interpretors or complex general purpose libraries have a bigger memory footprint. The BGU solver breaks all previous records, using twice as much memory as all other solvers combined. But it's not clear to what extent it actually uses all that memory. Java seems to like to grab a lot of extra virtual memory.

3. Full Results

The puzzles in the table above are not by any means typical puzzles. Most of the puzzles created for humans to solve are similar to the first two on the list, and can be quickly solved by even fairly simple-minded solvers. The puzzles listed later in the list were mostly selected from those that gave the pbnsolve program problems, and so the sample is somewhat skewed against pbnsolve.

In my experience, solvers tend to be rather finicky. A change in the algorithm that makes it run a bit faster on one puzzle sometimes makes it run much slower on others. For example, pbnsolve's probing algorithm helps it cope with many difficult puzzles, but it fails badly on the "Flag" puzzle in the list above, which it can only solve if you turn off probing. Because of this, it's hard to judge any solver based on just a short list of puzzles.

So given a more typical set of puzzles, how many of them does each solver solve? To test this, I used a snapshot of the webpbn.com puzzle database (as of Oct 2, 2009) containing 2,491 black and white puzzles and 3,232 multicolor puzzles, and ran some of the more interesting solvers on every single one. Note that one puzzle in the sample set, #8098, is not included in this "full" set, since the snapshot predates it.

Some important notes about this test set:

The number of black and white puzzles solved in various time ranges is tabulated below, sorted by the rightmost columns. They were run on the same computer with the same compilation and execution options used in the sample results. To keep these tests from running excessively long, any run that took longer than two minutes was interupted.

2,491 Black & White Puzzles
Solver 0 to
0.09 sec
0.10 to
0.19 sec
0.20 to
0.49 sec
0.50 to
0.99 sec
1.00 to
3.99 sec
4.00 to
9.99 sec
10.00 to
29.99 sec
30.00 to
59.99 sec
1 min to
2 min
2 min
or more
Wolter 2474 8 3 4 1 1 0 0 0 0
BGU 632 756 1029 59 11 2 1 1 0 0
Simpson 2439 10 4 5 10 2 3 0 2 16
Olšák 2414 13 10 8 16 3 6 3 2 16
Lagerkvist/Gecode 2380 23 20 12 12 12 4 5 5 18
Kjellerstrand/Lazyfd 125 92 409 343 712 424 284 66 17 19
Kjellerstrand/Gecode 2306 42 31 15 20 14 5 5 3 50
Wilk* 2107 60 76 52 54 31 19 9 11 72

The rows of this table are sorted by the right-most columns. The BGU solver was on top for a while, but I borrowed some of there better ideas and edged them out again. It's not particularly amazing that the two solvers that were developed with this sample set in hand come out on top.

But what this table really demonstrates is that the vast majority of puzzles are easy. It doesn't actually take all that much sophistication to solve most of them pretty darned fast. Only a tiny fraction of the puzzles designed for humans to solve are really hard for programs to solve.

Another thing that stands out in these tables is that the Olšák solver and the Simpson solver look very similar. But this similarity is actually somewhat illusionary. Both solvers required more than two minutes for 16 puzzles, but it isn't the same 16 puzzles. In fact, there are only 3 puzzles on both lists.

Given that most puzzles are easy, and different puzzles are hard for different solvers, it is very hard to thoroughly test a solver without a large sample set of puzzles to test it on. Tests based on small sample sets give very misleading results. The sample set I used in the previous section was partly generated by picking puzzles from these full runs that gave various solvers problems, so it includes at least some "hard" puzzles for every solver, but there is no guarantee that that will be true for the next solver to come along.

On my webserver, I limit runtimes to one second of CPU time. Most of these solvers would be able to solve the vast majority of puzzles in that amount of time:

Solver Percent Solved
in Under
a Second
Wolter 99.9%
BGU 99.4%
Simpson 98.7%
Olšák 98.2%
Lagerkvist/Gecode 97.8%
Kjellerstrand/Gecode 96.1%
Wilk* 92.1%
Kjellerstrand/Lazyfd 38.9%
Since the basic design goal for pbnsolve was to solve as many puzzles as possible in under a second, it's not surprising to find it on top of this list. It actually solves all but 21 puzzles in under a tenth of a second.

Only pbnsolve and the Olšák solver can handle multicolor puzzles. The following results were found when running both solvers on all 3,232 multicolor puzzles in the webpbn.com database.

3,232 Multicolor Puzzles
Solver 0 to
0.09 sec
0.10 to
0.19 sec
0.20 to
0.49 sec
0.50 to
0.99 sec
1.00 to
3.99 sec
4.00 to
9.99 sec
10.00 to
29.99 sec
30.00 to
59.99 sec
1 min to
2 min
2 min
or more
Wolter 3215 6 2 1 3 1 0 0 0 4
Olšák 3178 9 5 4 7 6 4 0 0 19

Though handling color puzzles adds a lot of complexity to the solver, since many more special cases can come up, the majority of color puzzles are easier to solve than black and white puzzles, just because contradictions between what colors the row clues says a square can be and what colors the column clues say it can be help narrow down the range of possibilities. However, the rare pathelogical multicolor puzzle can manage to be a lot more pathelogical, because the fact that white spaces between color blocks aren't required means there can be a whole lot more possible solutions to a line where the crossing clues happen to be unhelpful.

Again, the puzzles that are hard for the two solvers are not always the same puzzles. There are only four multicolor puzzles that neither solver could solve in less than two minutes: #672, #2498, #3085 and #3114.

4. Random Results

Testing against webpbn puzzles is not ideal, because of the biases introduced by the historical entanglement of that site with the pbnsolve program. One way one might try to avoid that is to test against a set of randomly generated puzzles instead.

For this test, I generated a set of 5000 two-color, 30x30 puzzles. Each puzzle was generated by painting each grid cell black or white with a 50% probability. Each solver was then run against this set of puzzles. An tarball containing the puzzle grids used in this test can be downloaded.

These puzzles are not very similar to human designed puzzles. The vast majority have multiple solutions, and scarcely any are solvable without guessing by working on one line at a time. The breakdown is shown below, in comparison to that of the full sample of black and white puzzles from webpbn:

    Random Puzzles     Webpbn Puzzles  
Multiple solutions: 97.5% 2.6%
Unique solution but not line solvable: 2.1% 15.6%
Unique solution and line solvable: 0.4% 81.8%

So, though these tests avoid a priori biases, the data isn't very much like any real world puzzle data, and it is certainly going to favor some solvers over others. In particular, solvers whose main strength is fast line solving aren't going to perform as well as ones whose main strength is effective searching.

Here are the run time results, sorted by the rightmost column:

5,000 Randomly Generated 30x30 Puzzles
Solver 0 to
0.09 sec
0.10 to
0.19 sec
0.20 to
0.49 sec
0.50 to
0.99 sec
1.00 to
3.99 sec
4.00 to
9.99 sec
10.00 to
29.99 sec
30.00 to
59.99 sec
1 min to
2 min
2 min
or more
BGU 0 23 2852 1790 284 22 12 8 4 5
Wolter 4362 221 169 70 88 38 21 6 7 18
Kjellerstrand/Lazyfd 0 0 0 0 616 2921 1208 176 50 29
Olšák 2879 333 381 244 405 199 146 84 61 268
Kjellerstrand/Gecode 2227 461 495 311 491 238 205 115 97 360
Lagerkvist/Gecode 2325 381 468 308 460 251 235 104 79 389
Simpson 2568 306 309 213 348 215 224 105 91 621
Wilk* 644 310 484 370 697 424 452 237 231 1151

Once again, the Lazyfd solver and the BGU solver are on a completely different planet than the other solvers. The Lazyfd solver didn't find a single puzzle it could solve in under a second. One took 1.7 seconds, one took 1.8 seconds and all the rest took more than two seconds. (Note that lazyfd's run times are very sensitive to the size of the puzzle. All of the fast times in the full sample are on puzzles much smaller than 30x30.) The BGU solver is less extreme, but it still needs more than half a second for 99.5% of all puzzles. But though these solvers are slow on the easy puzzles, they are quite amazing on the hard ones. They solved substantially more puzzles in under 2 minutes than any other solver.

Simpson's solver didn't do as well as I expected on this set of puzzles. The performance of pure depth-first search systems depends very much on the choice of good heuristics for choosing which direction to search in. Simpson's heuristics seems to work very well for human-designed puzzles but not as well for these randomly generated puzzles. This is not too surprising, since this is a ridiculous set of puzzles, and nobody in their right mind would tune their solver to work optimally on them.

Pbnsolve and the BGU solver rely less on heuristics than on probing strategies, which are kind of a poor-man's version of breadth-first search, so its not too surprising that they adapt fairly well to this set of puzzles.

The two Gecode solvers are pretty close together on this test. Evidentally neither of the two search heuristics has a pronounced advantage on this problem set.

As a way of deciding which program is "best" at solving nonograms, testing on random puzzles is absurd. These puzzles are quite unlike the population of puzzles likely to be encountered in any practical application. I think these tests do, however, give interesting insights into the strengths and weaknesses of these different solvers.

5. Discussions of Individual Solvers

5.1. Dedicated Stand-Alone Solvers

The solvers in this section are programs written from scratch for the express purpose of solving paint-by-number puzzles, and not much else. Most are command line programs that read in input describing the puzzle and return a solution.

5.1.1 Jan Wolter's pbnsolve Program

URL: http://webpbn.com/pbnsolve.html

Version Evaluated: 1.09

Limitations: None.

Assessment: Among the fastest and most flexible solvers.

Description: This is the solver I wrote myself, and I have already described it in way too much detail. It should be noted that the study of other solvers for this survey has had substantal positive results for pbnsolve, as significant improvements have been made based ideas from the BGU solver, Simpson's solver, and others. The performance of some past versions is documented on a separate page.

The most salient point here is that it uses a "probing" strategy. When its straight logic solving gets stuck, then instead of using some kind of heuristic function to make a guess, it generates a long list of possible guesses, and evaluates every single one with its logic solving engine until it stalls again. Usually at least one of these will end in a contradiction, so the inverse of the guess can be considered proven. If no probes lead to a contradiction, we take the one that solved the most cell as our guess, and let the search proceed from there.

This gives the solver some of the attributes of a breadth-first search system, in that it looks some distance ahead on many different possible paths instead of being guided by a heuristic function. But it doesn't consume memory as hungrily as a real breadth-first search system. It pays the price in doing a lot of extra logic solving, but as the logic solver is highly optimized, the run-time overhead is not too horrible. Caching line solver solutions (an idea borrowed from the BGU solver) further reduces the overhead.

Set-up: Compilation instructions are in the README file.

In production, pbnsolve is given a copy of the intended solution, and can use this to make uniqueness checking faster in some cases. To be fair to the other solvers we did not do that in these tests. All test runs were done with the -u flag to disable the use of goal solutions from the input file.

By default pbnsolve takes input in XML format. But XML files can sometimes be extremely slow to parse, since doing so may require fetching the DTD files over the Internet. To avoid this, most tests were done with input files in Steve Simpson's .non format.

Results: It's most interesting to compare this to Simpson's solver, which does a pure depth-first search with a carefully tuned search heuristic to select guesses. Both solvers are very fast at finding logical consequences. Simpson's solver is usually bit faster on the easier puzzles, where pbnsolve's expensive strategy for making guesses costs it more than it gains it. The payoff is on the harder puzzles, where pbnsolve is substantially more likely to be able to find its way to a solution. Better guesses mean fewer guesses and thus less combinatorial explosion.

On some puzzles, like the Flag, this approach used to fail pbnsolve badly. so it would take pbnsolve almost two hours to solve that puzzle. Eventually I "fixed" that problem by having it notice when too few probes were resulting in contradictions. Then it switches over to a classic heuristic search strategy, using heuristic functions borrowed from Simpson's solver.

5.1.2. Mirek and Petr Olšák's Nonogram Solver

URL: http://www.olsak.net/grid.html

Version Evaluated: 1.2

Limitations: None.

Assessment: A fast and flexible solver, with excellent line solving and heuristic search heuristic search capabilities.

Description: This is an open-source C-language solver that can handle color puzzles and triddlers as well as traditional paint-by-number solvers. The code is richly commented, but in Czech. English documentation is minimal. The documentation claims the line solver has linear complexity. Can read two different file formats. A number of test puzzles are included.

Set-up: It's all in one C file. Compile it and you're ready to go. I ran my tests like

   ./grid -total 2 -log 1 datafile
The -log option turns off the profuse default logging.

There are upper bounds for various things compiled into the program. To be able to handle all the puzzles in my multicolor puzzle set, I had to increase MAXBLOKU, which I think is the maximum number of clue numbers in a line. (Note that color puzzles can have twice as many clues per line as black & white puzzles.)

Results: The numbers in the table above tell the story. This is a very fast solver.

Note that earlier versions of this survey showed this solver performing somewhat poorly on the full results, but that turned out to be due to a bug in my batch execution script.

5.1.3. Steve Simpson's Nonogram Solver

URL: http://www.comp.lancs.ac.uk/~ss/software/nonowimp/

Version Evaluated: nonogram-1.9.12 with nonolib-4.3.0.

Limitations: Black and white only.

Assessment: Excellent solver with a terrific heuristic search capability.

Description: An open source solver library and command line interface written in C99. A Java version also exists but was not tested.

Set-up: Precompiled versions are availabe for Windows and RISC OS, but alas, I wanted to run in under Linux. To do that you need to get both the nonolib library and the nonogram command line tool.

The first step in building the library is to create a file named nonolib-env.mk and put some configuration information into it. There is a sample in the configuration in the INSTALL file, but it's bizarre, suggesting a -ansi flag although the program is in C99, not ANSI C. I used:

    INSTALL=install
    INSTALL_PATH=/usr/local
    RANLIB=ranlib
    AR=ar
    CFLAGS += -std=c99 
Then 'make' and 'make install' work fine. I tried doing this with -O2 added to the CFLAGS but this cause gcc-4.2 to give a segmentation fault while compiling fcomp.c. I've since tested this with gcc-4.4.1 and it compiles fine with that.

Compiling the nonogram command line program follows a similar procedure. For nonogram-env.mk, I used:

    INSTALL=install
    INSTALL_PATH=/usr/local
    CFLAGS += -I$(INSTALL_PATH)/include
    LDFLAGS += -L$(INSTALL_PATH)/lib
    CFLAGS += -std=c99
For all tests, I ran it with the command ./nonogram -s 1 -i datafile .

Results: As can be seen from the run times in the table, this is clearly a first rate solver, very close in performance to the Olšák's solver. It also consumes far less memory than the other solvers.

5.1.4. Ben-Gurion University Solver

URL: http://www.cs.bgu.ac.il/~benr/nonograms/

Version Evaluated: 1.0.1 / Java 1.6.0_17

Limitations: Black and White only.

Assessment: An extremely good solver, especially on the hardest puzzles.

Description: This is the first instance of a solver written by people who had already seen this survey and had a copy of the full webpbn data set to test on during development. In December 2009, Ben Raziel wrote me saying he was part of a group of students writing a nonogram solver for an undergraduate 3rd year project at Ben-Gurion University. He wanted test data. I sent him the 2491 puzzles used in the full tests here. I tested the first version, which use JaCoP for line solving in March 2010. The second version, described here uses their own code for line solving.

From the description on the web page, it's clear that it uses a probing approach quite similar to that used by pbnsolve, but it does more probes at each decision point. This gives it a better chance of finding a guess that gets it far, but increases the cost of guessing.

A clever idea introduced in this solver was the use of a hash table to cache the results of line solver runs, so they can be reused if the same situation comes up again. Especially with a probing approach, one tends to solve similar problems over and over again. I liked this idea so much I copied it in pbnsolve, where I routinely see amazingly high hit rates on the cache. This give a big performance boost even if the line solver is fairly fast. I think this simple idea would improve the performance of most solvers.

Set-up: This solver reads datafiles in the same format as Wilk's solver, probably because that's the format I supplied them with test data in. Hurrah! For the first time ever, I didn't need to write a new translator!

I downloaded the file bgusolver.jar from the web site. It is run like

   java -jar bgusolver.jar -file datafile.nin -maxsolutions 2
I was advised that on 64-bit computers, 64-bit versions of Java give the best results. I was using a 64-bit version of Java.

Results: Memory usage is extremely high, far more than any other solver tested, but constant. It needs a bit over two gigabytes of memory to run, but it uses the about the same amount of memory for all puzzles and memory usage does not increase as the run continues. Curiously, the memory usage does vary from one run to another, so if you run it twice on the same data, the memory usage will be a bit different. (This variation from run to run wrecks havoc with my method of testing maximum memory usage by doing repeated runs with different rlimits, so my memory measurements are somewhat approximate.) It's not clear how much of this memory it actually uses. Java just seems to like to grab a lot of virtual memory, whether it is going to use it or not.

The run times on simple problems are not spectacular. Java is an interpreted language, and thus, naturally, rather slow. The authors report that compiling the program improves its performance on the easier puzzles (though not on the harder puzzles). I haven't tried that. But my suspicion is that part of this is also a basic design trade off - the authors were focusing more on solving hard puzzles fast than on solving easy puzzles fast.

But this really works very well on the harder puzzles. It solves every puzzle in the full 2,491 webpbn puzzle data set in under 45 seconds. No other solver tested was able to solve all puzzles.

Of course, this solver, like pbnsolve, was trained on the webpbn data set, which certainly gives it an advantage when being tested against that data set. The 9-Dom puzzle, which takes it 2.8 minutes to solve was added to the sample set more recently. It's the only puzzle I've seen that was solved more quickly by the previous version of the BGU solver than the new solver. Still, if it can quickly solve everything in the webpbn puzzle set, then that's a pretty strong indication that it can solve most of the puzzles likely to be encountered in any real application.

5.1.5. Jakub Wilk's Nonogram Solver

URL: http://jwilk.nfshost.com/software/nonogram.html

Version Evaluated: nonogram_0.8.4.4

Limitations: Black and white only.

Assessment: Good at line solving, but not so good at searching.

Description: This is an open-source C-language solver. It comes with a large collection of nice data files. Documentation is minimal. Looks like a linesolver with some search capability on top of that. It does an uncommonly nice job of pretty-printing the solved puzzles.

Set-up: Compile it by typing "make" in the distribution directory.

I was running it with the command

   ./nonogram -m < datafile
The -m gives less garish output.

Note that there does not seem to be any way to ask this solver to find two solutions, which we normally do for all solvers. Thus our runtime numbers actually overstate it's performance on many puzzles. The extremely fast run time on the 9-Dom puzzles is probably just because it got lucky. The Olšák solver found the first solution equally quickly, but then spent a long time confirmining that there were no other solutions. Wilk's solver is just stopping at the first.

Results: This solver appears to be fast and capable at line solving, though maybe a bit less so than some of the top solvers. But if line solving doesn't fully solve the puzzle, it declares the puzzle "ambiguous" but says it will try to solve it anyway, starting a heuristic search. It's heuristic search gets the right answers, but not particularly fast. This solver's performance on puzzles where search is needed falls significantly behind some of the other solvers.

5.1.6. Kenichi Ishigaki's Games::Nonogram Perl Module

URL: http://search.cpan.org/~ishigaki/Games-Nonogram-0.01/

Version Evaluated: Games-Nonogram-0.01

Limitations: Black and white only. No blank lines?

Assessment: Somewhat weak and somewhat slow.

Description: This is somewhat weirdly packaged as a Perl module. Documentation is sparse. I don't know what algorithm it uses, but it seems to be some kind of line solver with some search capability. No sample data is given in the distribution, and the details of the input format require some guessing. It's not clear how blank lines should be handled, if they can be handled at all.

Set-up: You install the module just like you'd install any other perl module, following the simple steps in the README file. I think you can ignore any messages that "make test" gives about Test::Pod modules not being installed. But this only installs a library. To run the thing you've got to write a Perl program that uses it. But luckily it's a very short Perl program:

    #!/usr/bin/perl
    use Games::Nonogram;
    $start= (times)[0];
    Games::Nonogram->bootstrap;
    $end= (times)[0];
    printf "CPU Time: %.2f seconds\n",$end-$start;
Put that in a file, permit it executable, and you can run it, giving the file name for the input file on the command line.

Note that this is looking only for a single solution, while we were asking most of the other solvers to find two solutions.

Results: This solves most small puzzles just fine, though slower than other solvers. It is, after all, written in an interpreted language.

It seems to be able to handle at least some simple puzzles with multiple solutions, reporting all possible solutions, so it must have some trial and error search capabilities.

However it failed to solve some other fairly easy puzzles, like the line-solvable Skid puzzle, where it printed out a partially complete solution. It also got only partway through Bucks which is not line solvable. For the Edge puzzle, which is not line solvable, it says:

   Block 1 is broken. (Games::Nonogram::Block::left overflow: 4)
   Unless you're trying to solve by brute force,
   there may be something wrong in the puzzle data.

5.1.7. Richard Wareham's Nonogram Solver

URL: http://www-sigproc.eng.cam.ac.uk/ga/index.php?title=User:RichWareham/Nonograms

Version Evaluated: no version number.

Limitations: Black and white only.

Assessment: Slow and seriously buggy.

Description: An open source solver written in C#. A few sample puzzles are included in the testgames.txt file. Nothing in the way of algorithmic documentation. It appears to be a a line solver, though the line solving algorithm looks kind of inefficient (try all possible arrangements of white spaces). Might have some guessing ability.

Set-up: I used the mono compiler to build this. It was actually already installed on my Linux system, and is probably available in packages for most linux systems. I built it with "make -f Makefile.Nonograms".

It doesn't read the puzzle data from a data file like most solvers, instead it takes them from the command line. This seemed rather awkward to me, so I wrote a little script that would run it like this:

  mono Nonograms.exe `cat datafile`

I believe this is looking for only one solution, not two solutions.

Results: It ran OK on a couple of smaller puzzles, though the "Edge" puzzle caused it to simply say "Oh I give up".

Then I tried it on the Knot puzzle. On computer A, it locked up my windowing system until I could only kill it with difficulty, and I gave up on further tests. On computer B, I was able to watch its memory usage quickly climb over 4 gigabytes before it crashed. The same thing appeared to happen on the next few puzzles I tried. My impression is that some bug causes it to go into an infinite memory allocating loop on most larger puzzles.

5.1.8. Frans Faase's Solver

URL: http://www.iwriteiam.nl/Dpuzzle.html#nono

Version Evaluated: mega_nono_090823.cpp

Limitations: Black and white only.

Assessment: Inconclusive, but doesn't seem to work well.

Description: Frans Faase has been keeping an on line diary since before anyone was calling them blogs, and he occasionally mentions work on a nonogram solver. I tried testing the version linked to by his latest blog entry.

Set-up: To be able to compile with g++ under Linux I had to make one change to the file, replacing the constant "CLK_TCK" with "CLOCKS_PER_SEC". When run a lot of output is generated on standard output, and an image of the puzzle solution found is generated in a file called "mega_nono_2.sol".

Results This solves some simple puzzles quickly, but it fails on most complex puzzles. Even some line solvable puzzles like the Knot were not fully solved, and no puzzles that were not line solvable were solved. It never seems to run for long, giving up quickly on most hard puzzles, but the author's blog contains many references to it running for a long time, so maybe I'm doing something wrong, or maybe the latest is not the greatest.

5.2. Solvers Based on General Purpose Tools

These solvers use general purpose constraint programming or linear programming tools to solve paint-by-number puzzles. They are generally very small, simple demonstration programs built on top of very sophisticated packages designed to solve broad classes of problems.

The Constraint Programming (CP) community especially has long used nonograms (everyone in the CP community call paint-by-number puzzles by that name) as a standard example problem. A description of the nonogram problem is included in a standard list of constraint problems.

Because these are usually written just as demonstration programs, they typically don't have very elaborate user interfaces. Many require recompilation to load in new puzzles. If you wanted to use these in a practical nonogram solving application, you might need to wrap some more code around them.

5.2.1. Mikael Lagerkvist's Gecode Nonogram Example

URL: http://www.gecode.org/doc-latest/reference/classNonogram.html

Version Evaluated: Version distributed with gecode 3.2.2

Limitations: Black and white only.

Assessment: An amazingly good solver, especially for a simple demo program.

Description: The first example of such an implementation that I tested was one written by Mikael Lagerkvist and distributed as an example program for the Gecode constraint solving library. Gecode is an open source package written in C++. Lagerkvist's example nonogram solver is a simple little C++ program, with only a couple hundred lines of code, that constructs a constraint set describing a paint-by-number puzzle, calls Gecode to solve it, and prints the result. It's a bit bare-bones, lacking even the ability to read data in from files. The puzzles to be solved have to be compiled into the program.

The basic concept here is to describe the problem to be solved, in this case a particular paint-by-number puzzle by a set of constraints. One form of constraints that Gecode supports is the requirement that a string match a regular expression. As it happens, line clues in paint-by-number puzzles are easily converted to regular expressions, and that's the way this solver represents the puzzle to the solver.

Apparantly the program can also provide some portion of the search heuristics to be used to find a solution, maybe choosing between different search strategies and providing heuristic functions. I haven't studied this in detail.

The heuristic search strategy being used turns out to be fairly similar to the usual one-line-at-a-time depth-first search algorithms used by other solvers, since it does a one-constraint-at-a-time depth-first search.

Set-up: I downloaded the tarball from gecode.com, and followed their compilation instructions, first doing

   ./configure
   make 
This compiles the package, but doesn't install it. You can run the examples without installing it, just by the LD_LIBRARY_PATH environment variable to the build directory and then invoking the program like
   ./example/nonogram -solutions 2 1 
The "-solutions 2" argument tells how many solutions to find. The default is one, but since we are benchmarking solvers' ability to do uniqueness checking for puzzles, we want it to try to find two solutions if possible. The last argument tells which of the compiled in-examples to solve.

To be able to run the test problems, I modified the examples/nonogram.cpp file from the distribution by embedding my own examples into that file. The fact that data is compiled in instead of being read in at runtime gives a bit of an unfair advantage to this program, but it would obviously not be difficult to add that capability to the program, and if you did it would not increase its run time noticably.

Note that the runtimes I report are not the ones printed by the program. I report the same Unix CPU time values that I report for all the other programs tested.

Results: This performs quite well. The table of results doesn't show zero times for the smaller puzzles like some of the other solvers do. It seems like there is about 0.01 seconds of overhead in all runs, but that hardly matters. What matters is that it solves many of the harder puzzles with good times, easily outperforming the Wilk solver. It comes pretty close to matching the dedicated solvers.

This is especially impressive because the faster solvers are large, complex programs custom designed to solve paint-by-number puzzles. This one is a general purpose solver with just a couple hundred custom lines of code to generate the constraints, run the solver, and print the result. Considering that this is a simple application of a general purpose solving tool rather than hugely complex and specialized special purpose solving tool, this is an astonishingly good result.

Getting really first class search performance usually requires a lot of experimentation with different search strategies. This is awkward and slow to do if you have to implement each new strategies from scratch. I suspect that a tool like Gecode lets you try out lots of different strategies with relatively little coding to implement each one. That probably contributes a lot to getting to better solvers faster.

5.2.2. Hakan Kjellerstrand's MiniZinc Nonogram Solver

URL: http://www.hakank.org/constraint_programming_blog/2009/09/at_last_2_a_nonogram_solver_us.html

Version Evaluated: nonogram_create_automaton2.mzn / minizinc 1.0.3 / Gecode 3.2.2

Limitations: Black and white only.

Assessment: An very good solver, especially for a simple demo program.

Description: This is another example of a paint-by-number solver from the constraint programming community, written in MiniZinc. MiniZinc is a special language for describing constraint programming problems that is supposed to be solver-independent, so that you can describe your problem once, and then try solving it with multiple different solvers. Basically the high-level MiniZinc model is translated into a low-level language called FlatZinc, which is understood by many different solvers.

Kjellerstrand's MiniZinc model for solving nonograms is one of hundreds of programs he has developed to solve different kinds of logic puzzles in different constraint programming systems. It converts the line clues into regular expression constraints, exactly the same way that Lagerkvist's solver does. It's a tiny bit more of a rounded program than Lagerkvist's Gecode solver, since the descriptions of the particular puzzle to solve is loaded from a separate data file, but the data files basically still need to be compiled in.

Because the the same model description can be used with different solvers, the performance varies depending on which solver you use.

Set-up: Kjellerstrand's script is called nonogram_create_automaton2.mzn. It takes data files in the MiniZinc .dzn format. He provides many examples.

I tried three different setups to try it with different solvers. One general annoyance here is that if you get things configured a bit wrong, or leave some of the options off the command lines, it will generally not give you an error message. Instead it will fall back to solving in some less efficient way than you thought you were asking for. This can make debugging your setup tricky. I got lots of help from Kjellerstrand.

Note also that there are plenty of other solvers that understand FlatZinc that I did not test.

Setup with G12 Solver

First I installed the current release of the MiniZinc-to-FlatZinc translator. I downloaded minizinc-1.0.3 from the G12 website. This includes precompiled binaries, so I just had to unpack the tarball and run "sh SETUP" in the distribution directory. The bin subdirectory contained mzn2fzn, a MiniZinc to FlatZinc translator, flatzinc, a FlatZinc-based solver, and minizinc, a wrapper that runs the previous two programs. To run the program with the G12 FlatZinc solver, we do:
   minizinc --data datafile.dzn nonogram_create_automaton2.mzn 
But the whole point of the MiniZinc language is that you can create FlatZinc programs for different solvers, so the translation to FlatZinc and the execution of the solver are normally separated, like this:
   mzn2fzn --data datafile.dzn nonogram_create_automaton2.mzn
   flatzinc nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}" 
Here mzn_show.pl is Kjellerstrand's Perl pretty printing program for displaying the final solution. Apparantly the output logic in the Kjellerstand's model ceases to work when execution is separated from translation, so he wrote an external program to present pretty output in that case.

I did only minimal tests with this configuration. It works fine, but both Kjellerstand and I had better results with the Gecode solver.

Setup with the Gecode Solver
I had already installed the Gecode 3.2.2 package to be able to test Lagerkvist's solver, and set-up instructions are given in that section of this survey. Of course, you also need the G12 mzn2fzn program installed in the previous section.

One more thing needs to be done before you can use mzn2fzn to generate a FlatZinc file for Gecode/FlatZinc. We need to give the translator some more knowledge of Gecode's capabilities. Under Minizinc's lib/minizinc directory, you should create a new directory named gecode and copy the contents of the directory gecode/flatzinc/mznlib from the Gecode distribution into it. Gecode/FlatZinc can run FlatZinc files generated without these extensions, but performance is greatly improved by using them.

Once you've done this, you can run the model in Gecode/FlatZinc with the commands below. This is what I used for the speed tests reported above.

   mzn2fzn -G gecode --data datafile.dzn nonogram_create_automaton2.mzn
   fz -solutions 2 nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}" 
Here the '-G gecode' flag on the mzn2fzn command causes the translator to use the special Gecode/Flatzinc library we copied over, and the -solutions 2' command on the fz command tells it to try to produce two solutions, so that we are doing uniqueness checking instead of simple solving.
Setup with the G12 Lazyfd Solver
After I posted the results of the tests on the Gecode solver, Kjellerstrand suggested trying it using the G12 Lazyfd solver, which apparantly uses a hybrid of two different solving techniques.

To do this, I first had to install a pre-release version of the G12 Minizinc package. I downloaded the "release of the day" for November 2, 2009 and installed it in just the same way as I installed the version 1.0.3 package.

Then I had to first make a small modification to the nonogram_create_automaton2.mzn model by deleting the search strategy hints. Apparantly the solver performs better without such hints. We change this hunk of code:

    solve :: int_search(
         [x[i,j] | j in 1..cols, i in 1..rows],
         first_fail,
         indomain_min,
         complete)
    satisfy;
To this:
    solve satisfy;
Then we run it with the commends below:
   mzn2fzn -G g12_lazyfd --data datafile.dzn nonogram_create_automaton2.mzn
   flatzinc -S -n 2 --solver lazy nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}" 

For the time tests, I included only the time to run the solver, fz or flatzinc, not the time to run mzn2fzn. Maybe I should have included both, since you have to run both each time you want to solve a new puzzle. However, this would only have added a small constant to all run times, so it's not going to make that big a difference to the overall assessment.

Results: I ran tests with Kjellerstrand's MiniZinc model on both the Gecode and G12 Lazyfd solver.

Results with the Gecode Solver

It's interesting to compare the performance with Gecode to Lagerkvist's C++ Gecode solver. Though they are quite similar, there are distinct differences. In the sample results there are puzzles where each is faster, but overall Lagerkvist's solver looks a little better. The full results make it clear that the difference is quite substantial.

When comparing this to the dedicated paint-by-numbers solvers, it's easy to get distracted by the fact that it is slower than the dedicated solvers on the easier puzzles. But though most of the puzzles in the full test set are easy puzzles, we really aren't talking about much of a speed difference, not more than a twentieth of a second. It's on the minority of hard puzzles that this solver shines.

Looking at the full results, it is clear that this system still falls short of the top three dedicated solvers. But it's important to remember that nonogram_create_automaton2.mzn contains only about 100 lines of code. From Kjellerstrand's blog, it is obvious that these 100 lines are the result of a fair amount of thought and experimentation, and a lot of experience with MiniZinc, but the Olšák solver is something like 2,500 lines and pbnsolve approaches 7,000. If you tried turning this into a fully useful tool rather than a technology demo, with input file parsers and such, it would get a lot bigger, but clearly the constraint programming approach has big advantages, achieving good search results with low development cost.

Results with the G12 Lazyfd Solver

The sample results for Kjellerstrand's MiniZinc model running on the G12 Lazyfd solver look quite different from any other solver's results. It is honestly pathetically slow on the easy problems.

But the remarkable thing is that it solves almost everything in the sample set. Actually, it even solved the Lion puzzle, although it took 55.2 minutes to do it. Now, I didn't allow a lot of other solvers 55 minutes to run, but that's still pretty impressive. (Note that Kjellerstrand got much faster solving times for the Lion than I did.)

In the full results, we see that it solved all but 19 puzzles in under 2 minutes, which puts it close to the Simpson and Olšák solver. I suspect that the 2 minute cutoff in that test is actually too low for Lazyfd. For the other solvers, the majority of puzzles that don't solve in under two minutes won't solve in two hours either. It's not clear that this is true for Lazyfd.

It does have a down side though. On some puzzles this thing uses staggering amounts of memory. When I tried running it on my full collection of puzzles on computer A it got hung up on puzzle #354, because it needed 1.4 gigabytes of memory. Since computer A only has a gigabyte of memory, this started it thrashing hopelessly. This is a large puzzle, but not especially hard. On computer B, which has more memory, the Lazyfd solver could do it in 46.5 seconds. And Simpson's solver could do it in 0.01 seconds using 3.8M memory.

Without actually knowing a lot about how lazyfd works, I'd guess that a lot more effort is being spent on choosing optimal directions to search in. The memory consumption suggests that it may be mixing some degree of breadth-first searching with the depth-first strategies used by most solvers. On easy puzzles, this is massive overkill because they are easily solved no matter which way you direct the search. But it starts paying off on puzzles where other solvers run into a wall of combinatorial explosion and go exponential. From the fact that most of the "hard" puzzles are easily solved by at least one of the solvers surveyed, we know that for any particular puzzle there exists a good search strategy. Lazyfd seems to be pretty good at finding the right strategy for each puzzle, though it spends a lot of extra computation and memory doing so.

These two results highlight the advantage of a solver-independent modeling language like MiniZinc. You can describe your problem once, and then try out a wide variety of different solvers and heuristics without having to code every single one from scratch. You can benefit from the work of the best and the brightest solver designers. It's hard to imagine that this isn't where the future lies for the development of solvers for applications like this.

5.2.3. Robert Bosch's IP Solver

URL: http://www.oberlin.edu/math/faculty/bosch/pbn-page.html

Version Evaluated: No version number / glpk version 4.40

Limitations: Black and white only. No blank lines?

Assessment: Interesting, but slow.

Description: Bosch's tiny program takes an input file that describes the clues to a puzzle and translates it into a set of equations in a form that can be solved by standard integer programming software. Integer programs are like linear programs, except the values of the variables need to take integer values, and, while there are delightfully fast algorithms to solve linear programs, solving integer programs is a substantially more difficult task. Bosch provides a paper describing his algorithm, and two sample data files, spaceman.dat and dragonfly.dat.

Set-up: I started by downloading and installing version 4.40 of the Gnu glpk package, which includes a command-line program called "glpsol" which can solve integer programs expressed in the same CPLEX format generated by Bosch's little program. I have no idea how the performance of glpk compares to other packages such as ILOG CPLEX, which is most likely what Bosch used. I chose glpk because it was free.

Running the solver is a multistep process:

  1. Copy the clue file you want to solve into pbn.dat
  2. Run Bosch's pbn.c program. This reads the clue file, and writes out a CPLEX file named pbn.lp.
  3. Run the integer program solver. If you are using glpk, the command is:
    glpsol --cpxlp pbn.lp -o pbn.sol
  4. The output file from this, pbn.sol does not, of course, include a pretty picture of the solved puzzles. It just gives the values of the variables in the final solution. You want to look at the "Z" varibles, which are 1 for black and 0 for white.
I wrote a little perl script that would do all this and then pretty print the result.

Results: This solves most small puzzles pretty easily, but generally much slower than the line solvers do. Because it doesn't work line-by-line, some of the puzzles that give the more simple-minded line solvers problems are relatively easy for it. It did the edge logic puzzle, #23, in a half second.

The constraint generating program could use some work. It crashes on puzzles with blank lines. For some reason it generated a syntactically invalid CPLEX file for puzzle 16.

Interestingly, the second puzzle included in the distribution, the 20x20 dragonfly puzzle, required a touch under five hours of CPU time to solve on computer A, yet the puzzle is line solvable. My pbnsolve program solves that so fast that the CPU time registers as zero.

Luiz Mingote and Francisco Azevedo have generalized this approach to multicolor puzzles. Their paper compares its results to the Olšák solver and found them similar for a few tiny puzzles. I doubt if it would remain comparable if presented with a larger set of more difficult puzzles, but I cannot test it, as their source code does not appear to be available.

5.2.4. Andrew Makhorin's Example IP Solver

URL: http://www.gnu.org/software/glpk/

Version Evaluated: version distributed with glpk version 4.40

Limitations: Black and white only.

Assessment: A cleaner, slower implementation of Bosch's solver.

Description: Andrew Makhorin, the author of the Gnu glpk package that I used to run Bosch's solver above, wrote me to point out that there is an implementation of a version of that solver included among the example programs in the glpk distribution. The example is written in the Gnu MathProg modeling language, which is a subset of AMPL. I believe it is intended as more of a demonstration of glpk's modeling capabilities than a practical solver.

Set-up: I had already installed glpk to test Bosch's program. The example nonogram solver is called examples/pbn.mod in the distribution. An example puzzle description is embedded in that file. It can be executed by doing:

glpsol --math pbn.mod
For my tests, I replaced the sample test data with data generated by the webpbn export program. This is obviously much simpler to run than the Bosch solver, and it even pretty prints the final puzzle image for us.

Results: Unsurprisingly, this performs somewhat similarly to the Bosch solver, but it is usually substantially slower. On the plus side, it can handle puzzles with blank rows or columns just fine.

5.3. Embedded Solvers

These solvers are embedded in more general tools for designing and solving paint-by-number puzzles. They usually have graphic user interfaces. It is generally impossible to get any runtime results from them.

5.3.1. Juraj Simlo's Griddlers Solver

URL: http://jsimlo.sk/griddlers/

Version Evaluated: gs13

Limitations: Black and white only. Puzzles must be 50x50 or less.

Assessment: A good solver.

Description: This is a Windows application offering a complete puzzle creation/playing/auto-solving environment, but the emphasis is on the auto-solver. It is not open source. Many fine puzzles are included in the package. It is fairly well documented, including a general description of the algorithm.

Set-up: Nothing much to set up here. Unzip and run.

Results: I had no way to get CPU time information from this program, since the solver is embedded in a GUI, But it solves most puzzles quickly, while you watch.

If you give it puzzles that are too big, it crashes, saying "Access violation at address 00463EAE in module 'gridsolv.exe' read of address 00000012". It seemed like this happened for puzzles where any dimension exceeded 50, but I'm not sure. Because of this I was unable to test it against most of the harder puzzles on my list. It succeeded with all the smaller ones except the "Lion", where, like most other solvers tested, it got hopelessly bogged down in an interminable brute force search.

5.3.2. Alexander Meshcheryakov's QNonograms

URL: http://qnonograms.sourceforge.net/

Version Evaluated: qnonograms-0.2

Limitations: Black and white only.

Assessment: A good line solver with no search capability.

Description: An open source puzzle creation and solving environment written in C++. When run, it brings up a window with the empty puzzle grid, where you can solve manually or run the solver. The program is fairly minimalistic, and looks like a project in its early stages. It can read input files in two different binary formats (suffixes .jap and .jcr) and one plain text format (suffix .cwd) all documented only by internal comments, and then only scantily.

Setup: To compile from the source package on a Linux system, first do "qmake", which generates the Makefile and then do "make". The ready-to-run binary will be in the build subdirectory. You can give a data file name on the command line, or select one after you run it.

Results: Because the solver is integrated with the rest of the program, it is somewhat difficult to obtain timing results. It appears to be a fairly competent line solver. It solves all the line solvable puzzles on our list above, but for the rest it solves only as much as can be line-solved and then stops.

On the Flag puzzle, it didn't appear to do all possible line solving before quitting.

5.3.3. Jan Wolter's webpbn.com Javascript Helper

URL: webpbn.com

Version Evaluated: No version numbers.

Limitations:

Assessment: A somewhat dim-witted solver with no search capability.

Description: Webpbn.com is a community website where users create puzzles for other users to solve. Users solve puzzles on a page where Javascript is used to render the puzzle and update the screen image in response to mouse clicks. There is an automated solver available in this environment that can be activated with the "helper" button, however this button is only available if you are solving a puzzle you created yourself, or are solving a puzzle that you have previously solved manually.

The webpn helper is a small Javascript solver that normally uses a left-right overlap algorithm for line solving. This is normally implemented using Javascript's regular expression engine to find possible solutions, but on browsers where the regular expressions are brain-damaged a version of the line solving algorithm from pbnsolve is used instead.

This left-right overlap algorithm for line solving is not complete, and there is no search capability in the helper, so there are many puzzles that will only be partially solved by the helper. However, when the helper stalls, it is possible for the user to manually mark a few more cells and then restart the helper.

Setup: None.

Results: Puzzle designers need to solve their puzzles many times as they make small changes in hopes of improving the playability. Doing this manually is extremely tedious, and error prone. The helper is designed to streamline this process. It only does things that any compentent human solver would do. Then it stops, so the puzzle designer can see if more sophisticated reasoning can be used to continue. Thus, its stupidity as a solver is actually a feature. It is designed to solve cooperatively with the puzzle designer, allowing the puzzle designer to assess how well the puzzle solves and how difficult it is without having to solve it entirely by hand.

6. Conclusions

The only practical applications that I know of for nonogram solvers are (1) validating new puzzle designs to ensure they have a unique solution, (2) providing hints to human solvers when they get stuck on a puzzle, and (3) cheating in puzzle solving competitions. In each case, we are solving puzzles designed by humans for humans to solve, and those just aren't usually that difficult. Nobody designs nonograms on 10,000 by 10,000 grids because nobody wants to solve them. The "Lion" puzzle and "Flag" puzzles which are among the most difficult puzzles in our test set are badly botched designs by inexperienced puzzle designers. Any experienced puzzle designer can tell at a glance that these puzzles have thousands of possible solutions, so no experienced designer would ever have created them. Thus, puzzles like this are rarities. Puzzles like "Forever" and "9-Dom" on the other hand, are extremely clever constructs by skilled human designers aiming to explore unconventional puzzle solving techniques. They are solvable by skilled solvers. But puzzles like that are rare too. So, although nonogram solving is known to be NP-complete, the fact that real world puzzles are typically small and well-designed means that it is nevertheless quite easy to build a practical solver that works well nearly all of the time.

Some of the solvers described here were built to be practical, real-world solvers. Pbnsolve and the Simpson and Olsak solvers are probably the most highly developed practical solvers. Pbnsolve is actually used to validate dozens of puzzles every day. It is optimized to run very fast in little memory, not so much because we need answers in a tenth of a second, but because it runs on a web server and any resources it consumes have a negative impact on the server's ability to serve web pages. For solvers like this, the most significant part of the sample results table is all the "zero" or near zero run times on the easy puzzles. Solving most puzzles fast is more important than solving the rare pathelogical puzzles at all.

Other solvers described here are written by people who have no practical interest in actually solving puzzles, but only want to use the problem as a demonstration of the power of their general purpose problem solving platforms. For those authors, this survey provides a benchmark to compare their solving systems against other general solving platforms, and against hand built, dedicated solvers. They hope to prove not that they have the best tool for solving nonograms, but that they have the best general problem solving system.

But if you are using this data that way, you run up against the problem that we don't actually have a lot of hard problems in our data set. It is perfectly possible to tune a solver to solve the particular hard problems in this set. That proves nothing about the general power of your solving framework, unless we can find more hard problems to throw at it. Ideally we would have an endless supply of puzzles of arbitrary size and difficulty. But there is no bottomless pool of hard nonogram problems. People just don't make them. The best you could do is to test on randomly generated puzzles, but if you optimize your solvers for those, then you are no longer working on anything with any practical application at all, because the characteristics of randomly generated puzzles are quite different from the puzzles that people really solve.

So it's hard to say anything about what solver is best, because of the different goals of the different solvers, and because of the limitations of the test data available. Still some interesting observations can be made.

I think Simpson's solver is a terrific example of a well-tuned, super-efficient, pure heuristic search system. Compared to some of the other solvers, it's memory consumption is microscopic, but it solves nearly all puzzles very quickly. It seems to all come down to having well-chosen search heuristics that are well-tuned to solving human-designed puzzles. The fact that it's performance falls off badly when it is presented with randomly generated puzzles just goes to demonstrate how well it is tuned for the application it was designed for.

The Olsak solver and the two Gecode solvers are also pure heuristic search systems. They don't do quite as well as Simpson's on the human designed puzzles, and don't fall apart quite as badly on the randomly generated puzzles.

The Gecode solvers are impressive in another sense. Though their average run time falls a bit short of that of the Simpson and Olsak solvers, the amount of effort that went into their development is far less. These are short little programs, probably written in a couple days, and they are competitive with solvers that are the result of months or years of work. The power of the underlaying search libraries and the ease of experimentation with alternative search heuristics obviously serves them well. If you need a solver fast, building on top of one of these packages is clearly the right way to go. As these general purpose solving environments continue to grow in sophistication, I think the day may well come when building dedicated solvers from the ground up makes no sense at all.

These heuristic search programs are basically doing depth-first search, using quickly evalated heuristic functions to choose which branches to explore first, in hopes of finding the solution more quickly. My own pbnsolve system doesn't actually use heuristic functions. Instead, it explores each possible branch for some distance before choosing one. This probing approach expends a lot more computation at each decision point in hopes of winning a pay off by finding a short route to the solution. It works more often then it fails, so performance is generally good, even on randomly generated puzzles. It does however, sacrifice some speed on easier puzzles, where all that probing turns out to be overkill. This was a very tricky part of the design, getting better performance on the rare hard problems without getting too sluggish on the common easy ones.

The BGU system goes further in this direction, doing even more probing at each decision point. It succeeds in solving everything in our test set, but it also seems to sacrifice even more speed on easy puzzles.

I don't know the details of how the Lazyfd solver works, but I suspect that it is an approach similar to probing that spends more computation to try to make better choices, or otherwise adds more of a breadth-first search flavor into its strategy. Again, the focus is more on getting through hard problems than solving simple ones quickly.

I suspect that there is a lot of room for progress here. The fact that few puzzles are hard for all solvers suggests that if a hybrid approach that more effectively combines their strengths should be possible.

7. Other Solvers

This is just a list of links to pages describing solvers that I maybe ought to investigate someday. Some of them are puzzlingly undocumented. Others would require me to figure out how to install and configure all sorts of strange packages before I could run them. Some are windows packages that I probably wouldn't be able to collect run time data from anyway. But who knows, maybe I'll get around to them someday anyway. If anyone is especially eager to have me look at one of these, or knows of others to add to the list, please let me know.
Last Update: Wed Apr 28 10:39:20 PDT 2010