Survey of Paint-by-Number Puzzle Solvers

Jan Wolter

1. Introduction
1.1. Procedure
1.2. Solving vs. Validation
1.3. Notes on Run Times
1.4. Other Reports
1.5. Appendices
2. Sample Results
2.1. Run Times
2.2. Memory Usage
3. Full Results
4. Random Results
5. Discussions of Individual Solvers
5.1. Dedicated Stand-Alone Solvers
5.1.1. Jan Wolter's pbnsolve Program
5.1.2. Mirek and Petr Olšák's Nonogram Solver
5.1.3. Steve Simpson's Nonogram Solver
5.1.4. Ben-Gurion University Solver
5.1.5. Kuang-che Wu's Naughty
5.1.6. Evgeniy Syromolotov's JSolver
5.1.7. Jakub Wilk's Nonogram Solver
5.1.8. Kenichi Ishigaki's Games::Nonogram Perl Module
5.1.9. Richard Wareham's Nonogram Solver
5.1.10. Vladimir Sukhoy's C++ Paint by Numbers Solver
5.1.11. Kyle Keen's Nonogram Solver
5.1.12. Frans Faase's Solver Project
5.1.13. Yuvai Lando's Nonogram Solver
5.2. Solvers Based on General Purpose Tools
5.2.1. Mikael Lagerkvist's Gecode Nonogram Example
5.2.2. Hakan Kjellerstrand's MiniZinc Nonogram Solver
5.2.3. Robert Bosch's IP Solver
5.2.4. Andrew Makhorin's Example IP Solver
5.2.5. Naoyuki Tamura's Copris Nonogram Solver
5.2.6. Jan Wolter's Copris Multicolor Nonogram Solver
5.3. Embedded Solvers
5.3.1. Juraj Simlovic's Griddlers Solver
5.3.2. Alexander Meshcheryakov's QNonograms
5.3.3. Jan Wolter's webpbn.com Javascript Helper
6. Conclusions
7. Other Solvers

1. Introduction

"Paint-By-Number" is one of many names for a type of graphical logic puzzle originating in Japan. Other common names for these puzzles are Nonograms, Griddlers, Hanjie, and Picross. The puzzles consist of a blank grid with numbers along the top and one side. The numbers tell the sizes of the blocks of the foreground color in that row and column. To solve the puzzle, you must reconstruct the image by figuring out how the color blocks must be place in each row and column. One of the many charms of these puzzles is that they are quite easy to create as well as to solve.

I run a community paint-by-number website, webpbn.com, where people create and solve such puzzles. To help validate puzzles created there, I wrote a program to automatically solve such puzzles, pbnsolve, which is open source. I'm still not entirely happy with its capabilities and performance, so I thought I'd survey some of the solvers out there, using Steve Simpson's list of solvers as a starting point.

1.1. Procedure

My testing procedure is generally to figure out how to get the solver running on my computer (which normally runs Linux but can be booted into Windows), write some code so that I can export puzzles from the webpbn.com database in the correct format to be read by the solver, and then test the solvers on the puzzles. I try to document in detail how I went about installing and running the solver.

I start out by testing each solver on a small collection of interesting puzzles, collecting run times in a table. Then for the faster solvers, I run them on a collection consisting of every black & white puzzle posted on webpbn.com as of a certain date. Some statistics on memory usage were also collected, and a round of tests on randomly generated puzzles has been done.

Note that this document has been updated many times. Older links and/or citations may well have refered to older versions which were significantly different.

1.2. Solving vs. Validation

There is an important distinction to be made between simply solving a puzzle and validating it. In solving a puzzle, the goal is simply to find a solution. This seems to be the goal that most designers of solvers have in mind for their programs.

But I wrote my solver as a tool to check puzzles designed by users of the webpbn site. Designers of puzzles always work by drawing a puzzle image, and then generating the clues from that. But not every image makes a good puzzle. Some lead to puzzles with multiple solutions. Others may have a unique solution, but be too difficult or too easy. Thus designers need to test solve their puzzles, make adjustments, and then test solve them again. This procedure can grow tedious, so it is nice to have a program that can automatically check the puzzle, determining if the solution is unique, and perhaps giving some feedback on how hard it is to solve.

A program to do puzzle validation has a bit of a tougher job than a simple solver. A solver can stop after finding one solution. A validator must try to find a second. In the case of a puzzle that only has one solution, it must completely explore the solution space, eliminating every possible alternative solution.

This difference impacts the design of the solver in a fairly deep way. The heuristics in a simple solver should be designed to get it to the goal solution as fast as possible. They might, for example, exploit regularities that often appear in human designed puzzles to guess correctly which cells should be black, so as to get more quickly to the solution and avoid running into dead ends.

A validator, on the other hand, loves finding the contradictions that constitute dead ends, because they reduce size of the search tree. It's in no particular hurry to find the correct solution. Since the puzzle was generated from a drawing, we know that the correct solution is out there somewhere. Our goal is not to find it, but to explore the whole search tree. The heuristic search strategy is likely to be focused on making guesses that have many logical consequences or lead to contradictions, rather than on making guesses that are "correct".

So far as I can tell, only two of the programs surveyed here were consciously designed as validators, my own pbnsolve program and the Ben-Gurion University Solver. Most of the other solvers can be used as validators simply by asking them to return two solutions, but being able to do the task isn't the same as being designed for the task.

In the run-time tests performed for this survey, I always ask the solvers to find two solutions, unless the solver doesn't support that option. This obviously puts the solvers that weren't really designed for that at a disadvantage. When you run them, you often see them spit out one solution very quickly, and then spend a long time before it finally decides that there are no others. This may seem unfair to these solvers, and it is, but I don't really see much of any practical use for a simple solver. A validator is much more useful so that's what I'm focusing on here.

Paradoxically, there are cases where the strategy of looking for a solution fast is actually the superior strategy for validation. This occurs in certain puzzles with very large numbers of solutions. Because of the large number of possible solutions, contradictions are difficult to find, and trying to explore the whole search space can bog you down hopelessly. A simple dash to the finish line, even if it has to be performed twice, may be more effective when finish lines are plentiful. So it's possible that the best possible solver would actually have to find some middle ground between these strategies.

It should also be noted that if the application is puzzle validation, then there is another useful input that could be given to a validator: the drawing from which the puzzle was generated. My pbnsolve program is the only one I've seen that can take this as in input and use it in an aid to validation. For example, if the first solution found does not match this, then you already know you have multiple solutions without having to search further. More could be done though. There are, for example, certain patterns that could be easily recognized in the solution that indicate that there are multiple solutions.

In the tests performed here, we never give the solver the goal image, placing the solvers and the validators on leveler ground, but designers of practical solvers should probably be thinking of effective ways to use such data.

Another difference between solving and validating is the types of puzzles likely to be fed to it. Solvers mostly get good, well designed puzzles. But designing a good puzzle is often a long process, and early versions are quite likely to be nasty things that are not solvable by humans. We'd like our validator to give the designer useful feedback even on those. Our test set includes several such puzzles, which an experienced designer would never release to the public, but which are quite typical of the sorts of things a validator must cope with.

1.3. Notes on Run Times

A few comments on the run-times reported in this survey:

1.4. Other Reports

Steve Simpson has an excellent collection of links to nonogram solvers solvers and other nonogram related resources, which has been invaluable to me.

Hakan Kjellerstrand has done some independent testing of some constraint-based solvers on the same sample data set used here.

1.5. Appendices

I have several other pages closely related to this page:

2. Sample Results

I've accumulated a small sample set of puzzles that I use for testing solvers. The intent is to include a range of difficulties and enough diversity to challenge most browsers.

2.1. Run Times

Run time results for the sample set are given in the table below. The table has been split in two to separate solvers that were searching for two solutions from solvers that could not be told to do so. Note that the solvers in the first table are solving substantially more difficult problems than those in second table, since they are checking if there is a unique solution rather than just finding one solution.

There is a key below the tables.

Run Times on Sample Set of Black & White Puzzles
Part I: Solvers Capable of Uniqueness Checking

puzzle size Notes Wu Syro-
molo-
tov
Wolter Olšák Simp-
son
BGU Tamura/
Copris
Lager-
kvist/
Gecode
Kjeller-
strand/
Gecode
Kjeller-
strand/
Lazyfd
#1: Dancer* 5 x 10 Line 0 0 0 0 0 0.06s 0.92s 0.01s 0,01s 0.04s
#6: Cat* 20 x 20 Line 0 0 0 0 0 0.08s 4.0s 0.01s 0.02s 0.44s
#21: Skid* 14 x 25 Line, Blanks 0 0 0 0 0 0.08s 4.1s 0.01s 0,02s 0.64s
#27: Bucks* 27 x 23 Blanks 0 0 0 0 0 0.16s 6.2s 0.01s 0.02s 1.1s
#23: Edge* 10 x 11   0 0 0 0 0 0.09s 1.3s 0.01s 0.01s 0.08s
#2413: Smoke 20 x 20   0 0 0 0 0 0.19s 5.5s 0.01s 0.02s 0.60s
#16: Knot* 34 x 34 Line 0 0 0 0 0 0.20s 9.7s 0.01s 0.02s 5.5s
#529: Swing* 45 x 45 Line 0 0 0 0 0 0.25s 10.3s 0.02s 0.04s 13.2s
#65: Mum* 34 x 40   0 0.01s 0.01s 0 0.01s 0.26s 9.5s 0.02s 0.04s 11.0s
#7604: DiCap 52 x 63   0.01s 0.11s 0.02s 0 0 2.1s 19.7s 0.02s 0.06s +
#1694: Tragic 45 x 50   0.02s 0.02s 0.04s 0.03s 0.54s 0.62s 10.9s 0.14s 3.6m 32.1s
#1611: Merka* 55 x 60 Blanks 0.02s 0.01s 0.01s 0.01s 0 0.35s 15.5s 0.03s + 1.1m
#436: Petro* 40 x 35   0.01s 0.04s 0.06s 15.2s 0.10s 0.58s 8.3s 1.4m 1.6s 6.2s
#4645: M&M 50 x 70 Blanks Size 0.07s 0.07s 0.10s 0.02s 0.84s 12.2s 0.93s 0.28s 48.1s
#3541: Signed 60 x 50   0.07s 0.09s 0.04s 1.1s 5.4m 0.68s 12.9s 0.57s 0.56s 1.0m
#803: Light* 50 x 45 Blanks 0.20s 0.19s 0.38s + 0.02s 1.1s 8.4s + + 4.7s
#6574: Forever* 25 x 25   0.34s 13.9s 3.7s 2.0s 18.9s 44.3s 6.4s 4.7s 2.3s 1,7s
#10810: Center 60 x 60   5.4s 0.05s 8.6s 0 0.01s 0.86s 21.3s 0.04s 0.07s 27.8s
#2040: Hot 55 x 60   0.11s 0.06s 0.83s + 1.2m 0.93s 16.1s + + 2.6m
#6739: Karate 40 x 40 Multiple 0.06s 0.03s 0.80s 17.3s 18.3m 0.61s 10.8s 56.0s 57.8s 13.6s
#8098: 9-Dom* 19 x 19   20.3s 11.7s 11.0s 4.0m + 2.8m 4.9s 12.6m 3.8s 1.8s
#2556: Flag 65 x 45 Multiple, Blanks Size 0.02s 0.55s 1.5s 0 24.2s 1.1m 3.4s + 4.2s
#2712: Lion 47 x 47 Multiple 53.4s 24.8s 6.3s + + 7.8s 20.3s + + +
#10088: Marley 52 x 63 Multiple 54.0s 4.2s + + + 22.0s 28.1s + + 2.9m
#18297: Thing 36 x 42 Multiple 8.4m + 6.7m + + 2.0m 46.8s + + 3.6m
#9892: Nature 50 x 40 Multiple 1.8m + + 18.6m + + 1.2m + + 2.4m
#12548: Sierp 47 x 40 Multiple + 2.2m + + + + 1.4m + + +
#22336: Gettys 99 x 59 Multiple Size + + + + + 10.5m + + +
Knotty* 40 x 40 Multiple? + + + + + + + + + +
Meow 75 x 50 Multiple? Size + + + + + + + + +
Faase* 95 x 80 Multiple Size + + + + + + + + +

Run Times on Sample Set of Black & White Puzzles
Part II: Solvers Not Capable of Uniqueness Checking

puzzle size Notes Wilk* Suk-
hoy
*
Ishi-
gaki
*
Ware-
ham
*
Bosch/
glpk
*
Mak-
horin/
glpk
*
Keen* Lando*
#1: Dancer* 5 x 10 Line 0 0 0.04s 0.05s 0.01s 0.03s 0.05s 9.3s
#6: Cat* 20 x 20 Line 0 0 0.28s 0.12s 2.0s 7.2s 2.1s 9.2s
#21: Skid* 14 x 25 Line, Blanks 0 Fail Fail X X 7.1s 7.9s 9.4s
#27: Bucks* 27 x 23 Blanks 0 Fail Fail X X 12.7s 35.3s +
#23: Edge* 10 x 11   0 Fail Fail Fail 0.49s 0.15s 0.17s +
#2413: Smoke 20 x 20   0 0.04s 1.3s 17.3m 10.8m + Fail +
#16: Knot* 34 x 34 Line 0 0.02s 18.8s Fail Fail + + 9.4s
#529: Swing* 45 x 45 Line 0.03s 0.04s 4.3s Fail - - - Fail
#65: Mum* 34 x 40   0.36s Fail + Fail - - - Fail
#7604: DiCap 55 x 55   2.91s 0.47s + Fail - - - -
#1694: Tragic 45 x 50   1.9s 3.8m + - - - - -
#1611: Merka* 55 x 60 Blanks 5.6m Fail + X X - - -
#436: Petro* 40 x 35   9.2m + + - - - - -
#4645: M&M 50 x 70 Blanks 14.0m Fail - X X - - -
#3541: Signed 60 x 50   + + - - - - - -
#803: Light* 50 x 45 Blanks + Fail - - - - - -
#6574: Forever* 25 x 25   + Fail Fail + + + + -
#10810: Center 60 x 60   15.0s + - - - - - -
#2040: Hot 55 x 60   + + - - - - - -
#6739: Karate 40 x 40 Multiple + + + - - - - -
#8098: 9-Dom* 19 x 19   0.01s Fail + - + - + -
#2556: Flag 65 x 45 Multiple, Blanks + Fail - X X - X -
#2712: Lion 47 x 47 Multiple + + - - + - - -
#10088: Marley 52 x 63 Multiple + - - - - - - -
#18297: Thing 36 x 42 Multiple; + - - - - - - -
#9892: Nature 50 x 40 Multiple + - - - - - - -
#12548: Sierp 47 x 40 Multiple - - - - - - - -
#22336: Gettys 99 x 59 Multiple - - - - - - - -
Knotty* 40 x 40 Multiple? - - - - - - - -
Meow 75 x 50 Multiple? - - - - - - - -
Faase* 95 x 80 Multiple - - - - - - - -

Notes:
 Line= Puzzle is line solvable. That is, it can be solved one line at a time.
Blanks= Puzzle contains blank rows or columns.
Multiple= Puzzle has more than one solution.
Footnotes:
*= Puzzles designer has given permission to redistribute this puzzle
*= Solver was only looking for one solution to puzzles, instead of two.
Run Times:
0= Run time less than a hundredth of a second
#s= Run time in seconds
#m= Run time in minutes
+= Run time exceeded 30 minutes
Fail= Program terminated without completely solving the puzzle
Size= Program failed because puzzle size exceeded its limit
X= Puzzle had blank lines and program couldn't handle them
-= Test was not performed

I've had several people report running times quite a bit different from the ones reported here. Hakan Kjellerstand used the lazyfd solver to solve the "Lion" puzzle in two minutes on his computer, but it took me 55 minutes to solve. Mikael Kjellerstand solved it with gecode in 13 minutes on his computer, but I gave up after 11 hours. I don't know why there are these big differences. The computers are obviously different, but not that different.

The Sierp" puzzle was just recently added. It's not terribly big, but no tested solver has solved it. I don't actually know if the solution is unique, though I think it is. I once let webpbn chew on it for almost eleven days before I gave up.

The 9-Dom puzzle is especially interesting because you can easily make larger versions of it, 10-Dom, 11-Dom, etc. You can use this series of puzzles to test the effect of puzzle size on solver run times.

I currently only know of three solvers that can handle multicolor puzzles, but I decided to start developing a sample set of multicolor puzzles just so I can use it to track improvements in my solver. Generally speaking, multicolor puzzles are easier to solve, but unskilled puzzle designers are often attracted to using a lot of colors, so there are a disproportionate number of really awful multicolor puzzles in the webpbn database. Some of these are quite a challenge for a solver, while being exactly the class of puzzles where automated validation would be most useful.

Note that when we count colors, we include the background color (white), so a black & white puzzle has two colors, and multicolor puzzles have three or more. Webpbn only supports five colors.

Run Times on Sample Set of Multicolor Puzzles
puzzle size colors Notes Wolter Olšák Wolter/
Copris
#47: Map* 31 x 18 5 Line 0 0 7.6s
#220: Onions* 40 x 40 5 Line 0 0 13.3s
#1503: Bob 25 x 25 3   0.01s 15.7s 9.6s
#2257: JarJar 55 x 60 3   0.10s 14.4s 23.1s
#4940: Babe 45 x 45 3   0.11s 0.10s 11.3s
#5193: Face 30 x 30 5 Multiple 1.3s 0.05s 10.2s
#2684: Censored 48 x 64 4 Multiple 7.7s 2.78s 15.5s
#2073: Peanut 35 x 35 3   0.04s 6.3m 11.9s
#4364: Snowman 40 x 40 5   0.04s 6.3m 11.5s
#2817: Shark 50 x 45 3   0.13s 5.6m 12.9s
#4809: EFB 50 x 40 4 Multiple 3.2s 15.7m 28.8s
#2814: Comet 45 x 50 3   0.07s + 14.3s
#3149: Ex 40 x 40 5 Multiple 0.58s + 19.4s
#4445: Pinup 66 x 99 4 Multiple 0.40s + 2.1m
#2984: Heart 25 x 25 5 Multiple 0.05s 0 10.2s
#2498: Gecko 75 x 45 5 Multiple 1.5s + 24.4s
#3620: Addition 20 x 20 5 Multiple + 0.01s 7.3s
#672: Web 52 x 52 4 Multiple + + 29.1s

2.2. Memory Usage

I've made some attempt to study the memory usage of the different solvers, but this is difficult as there is no easy way under Linux to get the maximum or average memory usage of a process. The operating system simply does not collect that information. So instead I did repeated runs under different address space rlimits, until I found the smallest memory limit under which the program would still run. This method requires many runs of the program however, so I could only do it for puzzles where the run times were fairly small.

It should also be noted that measuring the memory usage of processes on modern operating systems like Linux can be a bit complex, since some memory is shared between different processes. I believe the values reported here include all memory used by the program, including memory for shared libraries and such. If your server has other things running that use the same libraries (e.g., other Perl programs) then the memory cost might be lower than these numbers suggest.

I am not reporting memory tests on the Java programs, that is, the BGU solver and Copris. It seems that when you run a Java program, you have to tell it in advance how much heap memory to allocate, and it always uses that much, no more and no less. I could try running each puzzle with different heap sizes to try to figure out how much is actually needed for any given puzzle, but there is little sense in that, because in practice you never know how much memory you will need, so you always set it very high, just in case.

The table below summarizes these results. As usual, the dashes indicate tests not done, usually because I lacked the patience to sit through dozens of long runs.

Solver Memory Usage
#1: Dancer*
5x10
#23: Edge*
10x11
#6: Cat*
20x20
#2413: Smoke
20x20
#6739: Karate
40x40
#4645: M&M
50x70
#8098: 9-Dom
19x19
Simpson 3.8M 3.8M 3.8M 3.8M 3.8M 3.8M -
Olšák 7.2M 7.2M 7.2M 7.2M 7.2M 7.2M -
Wilk 10.4M 10.4M 10.4M 10.4M - - 10.4M
Syromolotov 11.3M 11.3M 11.3M 11.4M 12.2M 14.0M 12.9M
Wu 12.9M 13.0M 12.9M 13.0M 14.4M Size 33.9M
Wolter 13.6M 14.9M 13.6M 14.9M 21,5M 16.5M 32.0M
Bosch/glpk 9.4M - 15.1M - - - -
Makhorin/glpk 9.6M - 17.6M - - - -
Ishigaki/perl 19.4M - - 27.4 M - - -
Keen/python 32.3M 32.6M 33.2M - - - -
Lagerkvist/Gecode 106.5M 106.5M 106.5M 106.5M - 106.5M -
Kjellerstrand/Gecode 108.9M 109.2M 109.2M 109.5M - 122.3M 111.1M
Wareham/mono 160.0M - 163.4M - - - -
Kjellerstrand/Lazyfd 20.5M 23.1M 56.6M 68.3M 345.3M 689.2M 48.5M

The differences here are quite significant, but, of course, it should come as no surprise that the systems that use interpretors or complex general purpose libraries have a bigger memory footprint.

It's a bit hard to spot on this table, but there are also big differences in the growth rate of memory usage with increased problem size and complexity. Though some solvers seem to use about the same amount of memory for all problems (e.g., Simpson and Olšák), others eat a lot more memory for harder problems (e.g., Lazyfd).

3. Full Results

The puzzles in the tables above are not by any means typical puzzles. Most of the puzzles created for humans to solve are similar to the first two on the list, and can be quickly solved by even fairly simple-minded solvers. The puzzles listed later in the list were mostly selected from those that historically gave pbnsolve and various other solvers problems.

In my experience, solvers tend to be rather finicky. A change in the algorithm that makes it run a bit faster on one puzzle sometimes makes it run much slower on others. For example, pbnsolve's probing algorithm helps it cope with many difficult puzzles, but it fails badly on the "Flag" puzzle in the list above, which it used to be able to solve only if you turned off probing. (Version 1.09 learned to switch to heuristic search in such cases.) Because programs that work well on some puzzles often work badly on others, it's misleading to judge any solver based on just a short list of puzzles.

So given a more typical set of puzzles, how many of them does each solver solve? To test this, I used a snapshot of the webpbn.com puzzle database (as of Oct 2, 2009) containing 2,491 black and white puzzles and 3,232 multicolor puzzles, and ran some of the more interesting solvers on every single one. Note that six puzzles in the sample set ("DiCap", "9-Dom", "Nature", "Center", "Sierp" and "Marley") are not included in this "full" set, since the snapshot predates those puzzles.

Some important notes about this test set:

The number of black and white puzzles solved in various time ranges is tabulated below, sorted by the rightmost columns. They were run on the same computer with the same compilation and execution options used in the sample results. To keep these tests from running excessively long, any run that took longer than two minutes was interupted.

2,491 Black & White Puzzles
Solver 0 to
0.09 sec
0.10 to
0.19 sec
0.20 to
0.49 sec
0.50 to
0.99 sec
1.00 to
3.99 sec
4.00 to
9.99 sec
10.00 to
29.99 sec
30.00 to
59.99 sec
1 min to
2 min
2 min
or more
Wolter 2474 8 3 4 1 1 0 0 0 0
BGU 632 756 1029 59 11 2 1 1 0 0
Wu 2376 3 9 0 2 0 1 0 1 0
Tamura/Copris 0 0 0 33 367 1760 326 2 3 0
Syromolotov 2474 9 4 1 0 0 2 0 0 1
Simpson 2439 10 4 5 10 2 3 0 2 16
Olšák 2414 13 10 8 16 3 6 3 2 16
Lagerkvist/Gecode 2380 23 20 12 12 12 4 5 5 18
Kjellerstrand/Lazyfd 125 92 409 343 712 424 284 66 17 19
Kjellerstrand/Gecode 2306 42 31 15 20 14 5 5 3 50
Wilk* 2107 60 76 52 54 31 19 9 11 72

Note that the numbers for the Wu solver add to 2392 instead of 2491 because 99 puzzles in the test set were too big for it to handle.

The rows of this table are sorted by the right-most columns, which is not necessarily the best metric for which is fastest. Syromolotov's solver is overall much faster than the BGU solver, but it has problems with one puzzle in the data set. Still, it's not particularly amazing that the three solvers that were developed with this data set in hand come out on top of this listing.

One also shouldn't take this as an assurance that these three solvers can solve all puzzles in under a minute. Actually, I've since found puzzles that take longer (see "9-Dom", "Nature" and "Marley" in the sample set above).

The two-minute cutoff in this table is too short to really show off the strength of the Lazyfd solver. It would look a lot better if we allowed three minutes because unlike other solvers, it solves a lot of puzzles in the 2-3 minute range.

But what this table really demonstrates is that the vast majority of puzzles are easy. It doesn't actually take all that much sophistication to solve most of them pretty darned fast. Only a tiny fraction of the puzzles designed for humans to solve are really hard for programs to solve.

Another thing that stands out in these tables is that the Olšák solver and the Simpson solver look very similar. But this similarity is actually somewhat illusionary. Both solvers required more than two minutes for 16 puzzles, but it isn't the same 16 puzzles. In fact, there are only 3 puzzles on both lists.

Given that most puzzles are easy, and different puzzles are hard for different solvers, it is very hard to thoroughly test a solver without a large sample set of puzzles to test it on. Tests based on small sample sets give very misleading results. The sample set I used in the previous section was partly generated by picking puzzles from these full runs that gave various solvers problems, so it includes at least some "hard" puzzles for every solver, but there is no guarantee that that will be true for the next solver to come along.

On my webserver, I limit runtimes to one second of CPU time. Most of these solvers would be able to solve the vast majority of puzzles in that amount of time:

Solver Percent Solved
in Under
a Second
Wolter 99.9%
Wu 99.8%
Syromolotov 99.8%
BGU 99.4%
Simpson 98.7%
Olšák 98.2%
Lagerkvist/Gecode 97.8%
Kjellerstrand/Gecode 96.1%
Wilk* 92.1%
Kjellerstrand/Lazyfd 38.9%
Since the basic design goal for pbnsolve was to solve as many puzzles as possible in under a second, it's not surprising to find it on top of this list. It actually solves all but 21 puzzles in under a tenth of a second.

Only a few of the solvers surveyed can handle multicolor puzzles. The following results were found when running both solvers on all 3,232 multicolor puzzles in the webpbn.com database.

3,232 Multicolor Puzzles
Solver 0 to
0.09 sec
0.10 to
0.19 sec
0.20 to
0.49 sec
0.50 to
0.99 sec
1.00 to
3.99 sec
4.00 to
9.99 sec
10.00 to
29.99 sec
30.00 to
59.99 sec
1 min to
2 min
2 min
or more
Wolter/Copris 0 0 0 20 135 2073 992 10 0 2
Wolter 3215 6 2 1 3 1 0 0 0 4
Olšák 3178 9 5 4 7 6 4 0 0 19

Though handling color puzzles adds a lot of complexity to the solver, since many more special cases can come up, the majority of color puzzles are easier to solve than black and white puzzles, just because contradictions between what colors the row clues says a square can be and what colors the column clues say it can be help narrow down the range of possibilities. However, the rare pathelogical multicolor puzzle can manage to be a lot more pathelogical, because of the fact that white spaces between color blocks aren't required means there can be a whole lot more possible solutions to a line where the crossing clues happen to be unhelpful.

4. Random Results

Testing against webpbn puzzles is not ideal, because of the biases introduced by the historical entanglement of that site with the pbnsolve program. One way one might try to avoid that is to test against a set of randomly generated puzzles instead.

For this test, I generated a set of 5000 two-color, 30x30 puzzles. Each puzzle was generated by painting each grid cell black or white with a 50% probability. Each solver was then run against this set of puzzles. An tarball containing the puzzle grids used in this test can be downloaded.

These puzzles are not very similar to human designed puzzles. The vast majority have multiple solutions, and scarcely any are solvable by working on one line at a time. The breakdown is shown below, in comparison to that of the full sample of black and white puzzles from webpbn:

    Random Puzzles     Webpbn Puzzles  
Multiple solutions: 97.5% 2.6%
Unique solution but not line solvable: 2.1% 15.6%
Unique solution and line solvable: 0.4% 81.8%

So, though these tests avoid a priori biases, the data isn't very much like any real world puzzle data, and it is certainly going to favor some solvers over others. In particular, solvers whose main strength is fast line solving aren't going to perform as well as ones whose main strength is effective searching.

Here are the run time results, sorted by the rightmost column:

5,000 Randomly Generated 30x30 Puzzles
Solver 0 to
0.09 sec
0.10 to
0.19 sec
0.20 to
0.49 sec
0.50 to
0.99 sec
1.00 to
3.99 sec
4.00 to
9.99 sec
10.00 to
29.99 sec
30.00 to
59.99 sec
1 min to
2 min
2 min
or more
Tamura/Copis 0 0 0 0 0 4691 303 5 0 1
Wu 4075 321 229 122 171 57 18 2 1 4
BGU 0 23 2852 1790 284 22 12 8 4 5
Syromolotov 4417 313 129 40 41 26 13 4 5 12
Wolter 4362 221 169 70 88 38 21 6 7 18
Kjellerstrand/Lazyfd 0 0 0 0 616 2921 1208 176 50 29
Olšák 2879 333 381 244 405 199 146 84 61 268
Kjellerstrand/Gecode 2227 461 495 311 491 238 205 115 97 360
Lagerkvist/Gecode 2325 381 468 308 460 251 235 104 79 389
Simpson 2568 306 309 213 348 215 224 105 91 621
Wilk* 644 310 484 370 697 424 452 237 231 1151

Tamura's Copris solver is the clear leader in solving the most puzzles in under two minutes. In fact only one puzzle, number 3054 takes more than a minute. But no puzzle solves in less than 7 seconds either.

Wu's solver also does pretty well, solving nearly all puzzles in under two minutes and the great majority in under a tenth of a second. Possibly the fact that Wu's solver was designed for a competition where randomly generated puzzles were solved contributed to its strength in this area.

Webpbn, on the other hand, was designed to solve as many puzzles as possible in under a second, and it does well by that metric, but not as well as Syromolotov's solver:

Solver Percent Solved
in Under
a Second
Syromolotov 98.0%
Wolter 96.4%
Wu 94.9%
BGU 93.3%
Olšák 76.7%
Simpson 74.6%
Kjellerstrand/Gecode 69.8%
Lagerkvist/Gecode 69.6%
Wilk* 36.2%
Kjellerstrand/Lazyfd 0.0%
Tamura/Copris 0.0%

Plainly the Lazyfd solver makes a bad showing when we measure things this way. It didn't find a single puzzle it could solve in under a second. One took 1.7 seconds, one took 1.8 seconds and all the rest took more than two seconds. Lazyfd's run times are very sensitive to the size of the puzzle. All of the fast times in the full sample were puzzles much smaller than 30x30. The BGU solver is less extreme, but it still needs more than half a second for 99.5% of all puzzles. But though these solvers are slow on the easy puzzles, they are still quite good on the hard ones.

Simpson's solver didn't do as well as I expected on this set of puzzles. The performance of pure depth-first search systems depends very much on the choice of good heuristics for choosing which direction to search in. Simpson's heuristics seems to work very well for human-designed puzzles but not as well for these randomly generated puzzles. This is not too surprising, since this is a ridiculous set of puzzles, and nobody in their right mind would tune their solver to work optimally on them.

Pbnsolve and the BGU solver rely less on heuristics than on probing strategies, which are kind of a poor-man's version of breadth-first search, so its not too surprising that they adapt fairly well to this set of puzzles.

The two Gecode solvers are pretty close together on this test. Evidentally neither of the two search heuristics has a pronounced advantage on this problem set.

As a way of deciding which program is "best" at solving nonograms, testing on random puzzles is absurd. These puzzles are quite unlike the population of puzzles likely to be encountered in any practical application. I think these tests do, however, give interesting insights into the strengths and weaknesses of these different solvers.

5. Discussions of Individual Solvers

5.1. Dedicated Stand-Alone Solvers

The solvers in this section are programs written from scratch for the express purpose of solving paint-by-number puzzles, and not much else. Most are command line programs that read in input describing the puzzle and return a solution.

5.1.1 Jan Wolter's pbnsolve Program

URL: https://webpbn.com/pbnsolve.html

Version Evaluated: 1.09

Limitations: None.

Assessment: Among the fastest and most flexible solvers.

Description: This is the solver I wrote myself, and I have already described it in way too much detail. It should be noted that the study of other solvers for this survey has had substantal positive results for pbnsolve, as significant improvements have been made based ideas from the BGU solver, Simpson's solver, and others. The performance of some past versions is documented on a separate page.

The most salient point here is that it uses a "probing" strategy. When its straight logic solving gets stuck, then instead of using some kind of heuristic function to make a guess, it generates a long list of possible guesses, and evaluates every single one with its logic solving engine until it stalls again. Usually at least one of these will end in a contradiction, so the inverse of the guess can be considered proven. If no probes lead to a contradiction, we take the one that solved the most cell as our guess, and let the search proceed from there.

This gives the solver some of the attributes of a breadth-first search system, in that it looks some distance ahead on many different possible paths instead of being guided by a heuristic function. But it doesn't consume memory as hungrily as a real breadth-first search system. It pays the price in doing a lot of extra logic solving, but as the logic solver is highly optimized, the run-time overhead is not too horrible. Caching line solver solutions (an idea borrowed from the BGU solver) further reduces the overhead.

Note: The version tested here is the fastest version, but not the newest version. Version 1.10 added the ability to solve puzzles where some clue numbers have been blotted out. This required some generalization of the algorithms, so version 1.10 is slightly slower than 1.09. Test results for 1.10 are included in the comparison of pbnsolve versions.

Set-up: Compilation instructions are in the README file.

In production, pbnsolve is given a copy of the intended solution, and can use this to make uniqueness checking faster in some cases. To be fair to the other solvers we did not do that in these tests. All test runs were done with the -u flag to disable the use of goal solutions from the input file.

By default pbnsolve takes input in XML format. But XML files can sometimes be extremely slow to parse, since doing so may require fetching the DTD files over the Internet. To avoid this, most tests were done with input files in Steve Simpson's .non format.

Results: It's most interesting to compare this to Simpson's solver, which does a pure depth-first search with a carefully tuned search heuristic to select guesses. Both solvers are very fast at finding logical consequences. Simpson's solver is usually bit faster on the easier puzzles, where pbnsolve's expensive strategy for making guesses costs it more than it gains it. The payoff is on the harder puzzles, where pbnsolve is substantially more likely to be able to find its way to a solution. Better guesses mean fewer guesses and thus less combinatorial explosion.

On some puzzles, like the Flag, this approach used to fail pbnsolve badly. so it would take pbnsolve almost two hours to solve that puzzle. Eventually I "fixed" that problem by having it notice when too few probes were resulting in contradictions. Then it switches over to a classic heuristic search strategy, using heuristic functions borrowed from Simpson's solver.

Finally, it should be remembered that pbnsolve and the Olšáks' solver are the only two solvers in this survey that solve multicolor puzzles. This adds a lot of additional complexity to the algorithms and is likely to slow down the solver in comparision to solvers that only do black and white puzzles.

5.1.2. Mirek and Petr Olšák's Nonogram Solver

URL: http://www.olsak.net/grid.html

Version Evaluated: 1.2

Date Evaluated: December 2009

Limitations: None.

Assessment: A fast and flexible solver, but has been overtaken by newer solvers.

Description: This is an open-source C-language solver that can handle color puzzles and triddlers as well as traditional paint-by-number solvers. It was probably one of the first really good solvers published on the web. The code is richly commented, but in Czech. English documentation is minimal. The documentation claims the line solver has linear complexity. Can read two different file formats. A number of test puzzles are included.

Set-up: It's all in one C file. Compile it and you're ready to go. I ran my tests like

   ./grid -total 2 -log 1 datafile
The -log option turns off the profuse default logging.

There are upper bounds for various things compiled into the program. To be able to handle all the puzzles in my multicolor puzzle set, I had to increase MAXBLOKU, which I think is the maximum number of clue numbers in a line. (Note that color puzzles can have twice as many clues per line as black & white puzzles.)

Results: The numbers in the table above tell the story. This is a very fast solver.

Note that earlier versions of this survey showed this solver performing somewhat poorly on the full results, but that turned out to be due to a bug in my batch execution script.

This remains the only solver I know of that can solve triddlers. The fact that its algorithms are generalized to support this may slow it down a bit, but I don't really think the speed impact of that is substantial.

5.1.3. Steve Simpson's Nonogram Solver

URL: http://www.comp.lancs.ac.uk/~ss/software/nonowimp/

Version Evaluated: nonogram-1.9.12 with nonolib-4.3.0.

Date Evaluated: November 2011

Limitations: Black and white only.

Assessment: Very good solver with a terrific heuristic search capability.

Description: An open source solver library and command line interface written in C99. A Java version also exists but was not tested.

Set-up: Precompiled versions are availabe for Windows and RISC OS, but alas, I wanted to run in under Linux. To do that you need to get both the nonolib library and the nonogram command line tool.

The first step in building the library is to create a file named nonolib-env.mk and put some configuration information into it. There is a sample in the configuration in the INSTALL file, but it's bizarre, suggesting a -ansi flag although the program is in C99, not ANSI C. I used:

    INSTALL=install
    INSTALL_PATH=/usr/local
    RANLIB=ranlib
    AR=ar
    CFLAGS += -std=c99 -O2
Then 'make' and 'make install' work fine. When I first tried this with gcc-4.2, the compiler gave a segmentation fault while compiling fcomp.c with the -O2 flag. It works fine with gcc-4.4.1 though.

Compiling the nonogram command line program follows a similar procedure. For nonogram-env.mk, I used:

    INSTALL=install
    INSTALL_PATH=/usr/local
    CFLAGS += -I$(INSTALL_PATH)/include
    LDFLAGS += -L$(INSTALL_PATH)/lib
    CFLAGS += -std=c99 -O2
For all tests, I ran it with the command ./nonogram -s 2 -i datafile .

Results: As can be seen from the run times in the table, this is clearly a first rate solver, very close in performance to the Olšák's solver. It also consumes far less memory than the other solvers.

Simpson's solver would probably do even better on a benchmark where the goal was to find only one solution instead of two. Instead of being designed to find contradictions, pruning back the search tree as much as possible, it is designed to sprint down the most likely path toward a solution. The "Flag" puzzle is an example where the sprinting strategy works very well. It has lots of solutions, and it is not hard to find one, back up a bit, and find another, so Simpson's solver solves this in under a hundredth of a second. Solvers like mine and the BGU solver, who seek contradictions find this puzzle very hard because there aren't many contradictions to be found.

5.1.4. Ben-Gurion University Solver

URL: http://www.cs.bgu.ac.il/~benr/nonograms/

Version Evaluated: 1.0.1 / Java 1.6.0_17

Date Evaluated: April 2010

Limitations: Black and White only.

Assessment: An extremely good solver, especially on the hardest puzzles.

Description: This was the first instance of a solver written by people who had already seen this survey and had a copy of the full webpbn data set to test on during development. In December 2009, Ben Raziel wrote me saying he was part of a group of students writing a nonogram solver for an undergraduate 3rd year project at Ben-Gurion University. He wanted test data. I sent him the 2491 puzzles used in the full tests here. I tested the first version, which used JaCoP for line solving in March 2010. The second version, described here uses their own code for line solving.

From the description on the web page, it's clear that it uses a probing approach quite similar to that used by pbnsolve, but it does more probes at each decision point. This gives it a better chance of finding a guess that gets it far, but increases the cost of guessing.

A clever idea introduced in this solver was the use of a hash table to cache the results of line solver runs, so they can be reused if the same situation comes up again. Especially with a probing approach, one tends to solve similar problems over and over again. I liked this idea so much I copied it in pbnsolve, where I routinely see amazingly high hit rates on the cache. This gives a big performance boost even if the line solver is fairly fast. I think this simple idea would improve the performance of most solvers.

Set-up: This solver reads datafiles in the same format as Wilk's solver, probably because that's the format I supplied them with test data in. Hurrah! For the first time ever, I didn't need to write a new translator!

I downloaded the file bgusolver.jar from the web site. It is run like

   java -jar bgusolver.jar -file datafile.nin -maxsolutions 2
I was advised that on 64-bit computers, 64-bit versions of Java give the best results. I was using a 64-bit version of Java.

If the puzzle has multiple solutions and is not square, then the solver will crash after finding the second solution instead of exiting cleanly. This would probably be easy to fix. I accepted the run times for these cases because it looked to me as if the solver was basically done before it crashed.

Results: Memory usage is extremely high, but that's just because Java forces you to predefine how much memory you are going to use at run time. If you implemented the same algorithm in a sensible language, it would probably be more reasonable.

The run times on simple problems are not spectacular. Java is an interpreted language, and thus, naturally, rather slow. The authors report that compiling the program improves its performance on the easier puzzles (though not on the harder puzzles). I haven't tried that. But my suspicion is that part of this is also a basic design trade off - the authors were focusing more on solving hard puzzles fast than on solving easy puzzles fast.

But this really works very well on the harder puzzles. It was the first solver to solve every puzzle in the full 2,491 webpbn puzzle data set in under 2 minutes, and, in fact, solved them all in under 45 seconds. I was so impressed that I copied some key ideas in subsequent versions of pbnsolve, which helped substantially in improving pbnsolve's performance, so that it too now solves all puzzles.

Of course, this solver, like pbnsolve, was trained on the webpbn data set, which certainly gives it an advantage when being tested against that data set. The 9-Dom puzzle, which takes it 2.8 minutes to solve was added to the sample set more recently. It's the only puzzle I've seen that was solved more quickly by the previous version of the BGU solver than the new solver. Still, if it can quickly solve everything in the webpbn puzzle set, then that's a pretty strong indication that it can solve most of the puzzles likely to be encountered in any real application.

5.1.5. Kuang-che Wu's Naughty

URL: http://kcwu.csie.org/~kcwu/nonogram/naughty/

Version Evaluated: 87

Date Evaluated: December 2011

Limitations: Black and White only. Maximum Puzzle Size is 63x63.

Assessment: Among the fastest solvers tested so far.

Description: This is a dedicated nonogram solver written in C++. It was one of several programs that participated in nonogram solving competitions at a series of artificial intelligence conferences in Taiwan. I reviewed a pre-release version, but there seem to have been no major changes for the release version.

It takes input in the same format as Steve Simpson's solver.

Like pbnsolve and the BGU solver, naughty's programmer had access to our test data set, and likely trained on it to at least some degree.

Set-up: The solver evidentally makes some use of bit strings to represent line states, because it it is normally limited to puzzles of size 31x31, but can be compiled to work for puzzles of sizes up to 63x63 by editing the configuration file. By default it finds only one solution, but it can be modified to do uniqueness checking with an edit of the main.cc file. The README file explains these fixes, which were both applied for our test version. Other than that, just type "make" and it compiles and is ready to run.

Results: This solver is among the fastest we've tested, maybe the fastest, depending on which metrics you prefer to look at. Interestingly, it was soundly trounced in its most recent competition by something called "LaLaFrogKK," from the National Chiao Tung University in Taiwan. Maybe I'll get my hands on that someday.

5.1.6. Evgeniy Syromolotov's JSolver

URL: http://sourceforge.net/projects/jsolver/

Version Evaluated: 1.2

Date Evaluated: November 2012

Limitations: Black and White only.

Assessment: Among the fastest solvers I have tested.

Description: This is a dedicated nonogram solver written in C++. Documentation is minimal, and the source code is uncommented with most variable names in Russian, so I have no clue how it works, other than well.

Set-up: The README file gives says "g++ -O2 -o jsolver jsolver.cpp" so that's what I did. All test runs were done with the "-n 2" command line option, which instructs it to try to find two solutions. The input file format is undocumented, but easy enough to figure out from the examples included in the distribution.

Results: This was the first solver of all those tested that could solve the "Sierp" puzzle and it did it in only about two minutes. I ran my webpbn solver for two days on that once without getting a result.

It solved all but three puzzles in 2,491 puzzle test set in under a second, a record matched only by pbnsolve and naughty. Performance on the randomly generated puzzle set was also outstanding.

Of the puzzles in the full test set that took jsolver more than a second to solve, two were "Forever" and "Lion" which it still performed quite well on. But there was one puzzle in the test set, number 3867, which it seemed unable to solve - it ran over an ninety minutes without terminating. This is odd, because though it is a large puzzle with multiple solutions, it isn't typically one of the hardest puzzles for other solvers. The author says he has another version of his solver that solves puzzle 3867 fast, but that solver is slower on most other puzzles.

5.1.7. Jakub Wilk's Nonogram Solver

URL: http://jwilk.nfshost.com/software/nonogram.html

Version Evaluated: nonogram_0.8.4.4

Date Evaluated: December 2009

Limitations: Black and white only. Always halts after first solution.

Assessment: Good at line solving, but not so good at searching.

Description: This is an open-source C-language solver. It comes with a large collection of nice data files. Documentation is minimal. Looks like a linesolver with some search capability on top of that. It does an uncommonly nice job of pretty-printing the solved puzzles.

Set-up: Compile it by typing "make" in the distribution directory.

I was running it with the command

   ./nonogram -m < datafile
The -m gives less garish output.

Note that there does not seem to be any way to ask this solver to find two solutions, which we normally do for all solvers. Thus our runtime numbers actually overstate it's performance on many puzzles. The extremely fast run time on the 9-Dom puzzles is probably just because it got lucky. The Olšák solver found the first solution equally quickly, but then spent a long time confirmining that there were no other solutions. Wilk's solver is just stopping at the first.

Results: This solver appears to be fast and capable at line solving, though maybe a bit less so than some of the top solvers. But if line solving doesn't fully solve the puzzle, it declares the puzzle "ambiguous" but says it will try to solve it anyway, starting a heuristic search. It's heuristic search gets the right answers, but not particularly fast. This solver's performance on puzzles where search is needed falls significantly behind some of the other solvers.

5.1.8. Kenichi Ishigaki's Games::Nonogram Perl Module

URL: http://search.cpan.org/~ishigaki/Games-Nonogram-0.01/

Version Evaluated: Games-Nonogram-0.01

Date Evaluated: May 2009

Limitations: Black and white only. Always halts after first solution. No blank lines?

Assessment: Somewhat weak and somewhat slow.

Description: This is somewhat weirdly packaged as a Perl module. Documentation is sparse. I don't know what algorithm it uses, but it seems to be some kind of line solver with some search capability. No sample data is given in the distribution, and the details of the input format require some guessing. It's not clear how blank lines should be handled, if they can be handled at all.

Set-up: You install the module just like you'd install any other perl module, following the simple steps in the README file. I think you can ignore any messages that "make test" gives about Test::Pod modules not being installed. But this only installs a library. To run the thing you've got to write a Perl program that uses it. But luckily it's a very short Perl program:

    #!/usr/bin/perl
    use Games::Nonogram;
    $start= (times)[0];
    Games::Nonogram->bootstrap;
    $end= (times)[0];
    printf "CPU Time: %.2f seconds\n",$end-$start;
Put that in a file, permit it executable, and you can run it, giving the file name for the input file on the command line.

Note that this is looking only for a single solution, while we were asking most of the other solvers to find two solutions.

Results: This solves most small puzzles just fine, though it is slower than other solvers. But it is, after all, written in an interpreted language.

It seems to be able to handle at least some simple puzzles with multiple solutions, reporting all possible solutions, so it must have some trial and error search capabilities.

However it failed to solve some other fairly easy puzzles, like the line-solvable Skid puzzle, where it printed out a partially complete solution. It also got only partway through Bucks which is not line solvable. For the Edge puzzle, which is not line solvable, it says:

   Block 1 is broken. (Games::Nonogram::Block::left overflow: 4)
   Unless you're trying to solve by brute force,
   there may be something wrong in the puzzle data.

5.1.9. Richard Wareham's Nonogram Solver

URL: http://www-sigproc.eng.cam.ac.uk/ga/index.php?title=User:RichWareham/Nonograms

Version Evaluated: no version number. Date Evaluated: August 2011

Limitations: Black and white only. Can find only a single solution.

Assessment: Slow and seriously buggy.

Description: An open source solver written in C#. A few sample puzzles are included in the testgames.txt file. Nothing in the way of algorithmic documentation. It appears to be a a line solver, though the line solving algorithm looks kind of inefficient (try all possible arrangements of white spaces). Might have some guessing ability.

Set-up: I used the mono compiler to build this. It was actually already installed on my Linux system, and is probably available in packages for most linux systems. I built it with "make -f Makefile.Nonograms".

It doesn't read the puzzle data from a data file like most solvers, instead it takes them from the command line. This seemed rather awkward to me, so I wrote a little script that would run it like this:

  mono Nonograms.exe `cat datafile`

I believe this is looking for only one solution, not two solutions.

Results: It ran OK on a couple of smaller puzzles, though the "Edge" puzzle caused it to simply say "Oh I give up".

Then I tried it on the Knot puzzle. On computer A, it locked up my windowing system until I could only kill it with difficulty, and I gave up on further tests. On computer B, I was able to watch its memory usage quickly climb over 4 gigabytes before it crashed. The same thing appeared to happen on the next few puzzles I tried. My impression is that some bug causes it to go into an infinite memory allocating loop on most larger puzzles.

5.1.10. Vladimir Sukhoy's C++ Paint by Numbers Solver

URL: http://cgi.public.iastate.edu/cgi-bin/cgiwrap/sukhoy/moin.cgi/PaintByNumbers/CPPSolver

Version Evaluated: version 0.991

Date Evaluated: October 2010

Limitations: Black and white only. Can find only a single solution.

Assessment: Buggy and slow.

Description: The author says this was "written primarily as an effort on self-learning some of the more advanced C++ features" and is not a fully developed system. The distribution does not contain any documentation except for a description of the input file format, which is identical to the "CWD" format used by QNonograms.

The author does supply a collection of 440 sample puzzles in CWD format, each with a solution generated by his program. This would seem to be a good sized test set, but is actually pretty limited. All but eight of the puzzles are line solvable. Most of the rest require only minimal searching, two having unique solutions (0204, 0333) and four having multiple solutions (0055, 0086, 0221, 0240). Puzzle 0370 requires a bit more searching, though is nothing like the "hard" puzzles in our sample set. Puzzle 0340 is a bit of a mystery: my solver and the Olsak solver agree that it is unsolvable, but Sukhoy's program finds a solution anyway. On the whole, this is not actually that challenging a test set.

Set-up: This compiled easily under Unix with the command:

   g++ -o pbn -O2 pbn-0-991.cpp
It accepts no command line options except for the input file name.

Results: This solver had quite a bit of trouble with our sample puzzle set. It solved a few, but for many it terminated with an incomplete solution. On the "Mum" and "Forever" puzzle, it found solutions in a reasonable amount of time (0.21 seconds and 2.4 minutes, respectively) but the solutions printed were incorrect. Some of the partial solutions it produced, such as the one for "Skid", were also incorrect.

Given that it is stopping after it finds the first solution, its performance on the puzzles that it does complete is OK, but not outstanding.

It seems likely that the line solver has some bugs, leading to cells being incorrectly marked in some circumstance and the occasional generation of incorrect solutions or even solutions where there are no solutions. Its search capabilities are also limited.

5.1.11. Kyle Keen's Nonogram Solver

URL: http://kmkeen.com/nonogram/

Version Evaluated: version 2 with Python 2.6.2

Date Evaluated: August 2010

Limitations: Black and white only. Can find only a single solution.

Assessment: Very compact, but very slow.

Description: Kyle Keen shares a number of python programs on his website, including a nonogram solver. I previously evaluated the initial version, which was not too impressive except for the fact that it was only 125 lines of code. His second version is twice the size, 285 lines, which is still very small, can read input from a file, solve puzzles with blank lines, and succeeds on many puzzles where the previous version failed.

Keen provides a nice discussion of the program on his web page. It seems to be using a collection of simple rules to solve lines, instead of a more general line-solver algorithm. I suspect that you could make the program both smaller and faster by using a left-right-overlap algorithm for line solving. You could use python's regular expression library to find the left and right solutions.

Set-up: I cut/pasted the code from his page into a file which I called nonogram.py. You run it by doing:

python nonogram.py input-file

Results: The solver can handle many small puzzles which is no small accomplishment for such a small program. It does crash on some, like the "Smoke" puzzle. It is quite slow, especially given that it is just stopping when it finds the a solution, not checking for more. It isn't quite as bad as it looks in the sample results, because that sample set contains predominantly large difficult puzzles.

So this is nice demonstration of concise python programming, but not a serious nonogram solver.

5.1.12. Frans Faase's Solver

URL: http://www.iwriteiam.nl/Dpuzzle.html#nono

Version Evaluated: mega_nono_090823.cpp

Date Evaluated: August 2010

Limitations: Black and white only.

Assessment: Does logic solving only and not completely.

Description: Frans Faase has been keeping an on line diary since before anyone was calling them blogs, and since 2003 he has occasionally mentioned work on nonogram solving. Much of his effort has focused on solving a single 80x95 puzzle that Kerrin Mansfield challenged him to solve. This puzzle is now the "Faase" puzzle in our sample set, and is, so far, unsolved by any solver.

There have apparently been many different programs over the years. I tried testing the version linked to by his Aug 23, 2009 blog entry.

That solver apparantly does not search, since searching on his monster puzzle is unlikely to be profitable. So it fills in as many cells as it can, then stops, functioning as an aid to solving rather than a complete solver.

Set-up: To be able to compile with g++ under Linux I had to make one change to the file, replacing the constant "CLK_TCK" with "CLOCKS_PER_SEC". When run, a lot of output is generated on standard output, and an image of the puzzle solution found is generated in a file called "mega_nono_2.sol".

Results This solves some simple puzzles quickly, but it fails on most complex puzzles. Naturally it will not completely solve any puzzle that is not line solvable, since it does not search. But it doesn't solve all line solvable puzzles either, for example, it does not solve the Knot.

5.1.13. Yuvai Lando's Nonogram Solver

URL: https://github.com/ylando2/nonogram-solver

Version Evaluated: April 15, 2012 commit

Date Evaluated: July 2012

Limitations: Black and white only.

Assessment: Only works on small line-solvable puzzles.

Description: A simple solver implemented in Racket, which seems to be a Lisp dialect.

Set-up: I installed Racket 5.2.1 to be able to run this. I used the "racket3m" version, which is typically faster than the "racketcgc" version. The puzzle descriptions are embedded in the source code of the nonogram-user.rkt file, so I wrote a script to insert them there. Then to run, I just did

    racket3m nonogram-user.rkt

Results The run times are remarkably consistant, all being slightly over nine seconds, which is very slow. The values reported by the built-in time function are much smaller. It appears that it is taking about nine seconds just to start up Racket and load the program. Which is pathetic, but likely not the author's fault. I'm reluctant to try to factor that out of the run times however, because it would require me to measure timings differently than for all other programs in this survey. So I believe this is slow, but not as spectacularly slow as it appears in the table.

I was only able to solve the line-solvable puzzles with this. The non-line solvable puzzles all seem to take a very long time to solve, if it can solve them at all. I let some tests run for as long as 15 hours of CPU time before I gave up and let interupted them.

On the "Swing" puzzle, which is line-solvable, it failed with an "SIGSEGV MAPERR si_code 1 fault on addr (nil)" error. The "Mum" puzzle generated a similar crash. I was not impressed by helpfulness of the error messages, but that is like the fault of Racket. This seems to happen on larger puzzles. Possibly there is a puzzle size limit I don't know about.

So this seems to be a very basic solver implemented in a language with a rather unimpressive interpretor. If it has heuristic search capabilities, to fall back on when line solving stalls, then they don't work. It's possible it just keeps linesolving even when progress ceases.

5.2. Solvers Based on General Purpose Tools

These solvers use general purpose constraint programming or linear programming tools to solve paint-by-number puzzles. They are generally very small, simple demonstration programs built on top of very sophisticated packages designed to solve broad classes of problems.

The Constraint Programming (CP) community especially has long used nonograms (everyone in the CP community call paint-by-number puzzles by that name) as a standard example problem. A description of the nonogram problem is included in a standard list of constraint problems.

Because these are usually written just as demonstration programs, they typically don't have very elaborate user interfaces. Many require recompilation to load in new puzzles. If you wanted to use these in a practical nonogram solving application, you might need to wrap some more code around them.

5.2.1. Mikael Lagerkvist's Gecode Nonogram Example

URL: http://www.gecode.org/doc-latest/reference/classNonogram.html

Version Evaluated: Version distributed with gecode 3.2.2

Date Evaluated: December 2009

Limitations: Black and white only.

Assessment: An amazingly good solver, especially for a simple demo program.

Description: The first example of such an implementation that I tested was one written by Mikael Lagerkvist and distributed as an example program for the Gecode constraint solving library. Gecode is an open source package written in C++. Lagerkvist's example nonogram solver is a simple little C++ program, with only a couple hundred lines of code, that constructs a constraint set describing a paint-by-number puzzle, calls Gecode to solve it, and prints the result. It's a bit bare-bones, lacking even the ability to read data in from files. The puzzles to be solved have to be compiled into the program.

The basic concept here is to describe the problem to be solved, in this case a particular paint-by-number puzzle by a set of constraints. One form of constraints that Gecode supports is the requirement that a string match a regular expression. As it happens, line clues in paint-by-number puzzles are easily converted to regular expressions, and that's the way this solver represents the puzzle to the solver.

Apparantly the program can also provide some portion of the search heuristics to be used to find a solution, maybe choosing between different search strategies and providing heuristic functions. I haven't studied this in detail.

The heuristic search strategy being used turns out to be fairly similar to the usual one-line-at-a-time depth-first search algorithms used by other solvers, since it does a one-constraint-at-a-time depth-first search.

Set-up: I downloaded the tarball from gecode.com, and followed their compilation instructions, first doing

   ./configure
   make 
This compiles the package, but doesn't install it. You can run the examples without installing it, just by the LD_LIBRARY_PATH environment variable to the build directory and then invoking the program like
   ./example/nonogram -solutions 2 1 
The "-solutions 2" argument tells how many solutions to find. The default is one, but since we are benchmarking solvers' ability to do uniqueness checking for puzzles, we want it to try to find two solutions if possible. The last argument tells which of the compiled in-examples to solve.

To be able to run the test problems, I modified the examples/nonogram.cpp file from the distribution by embedding my own examples into that file. The fact that data is compiled in instead of being read in at runtime gives a bit of an unfair advantage to this program, but it would obviously not be difficult to add that capability to the program, and if you did it would not increase its run time noticably.

Note that the runtimes I report are not the ones printed by the program. I report the same Unix CPU time values that I report for all the other programs tested.

Results: This performs quite well. The table of results doesn't show zero times for the smaller puzzles like some of the other solvers do. It seems like there is about 0.01 seconds of overhead in all runs, but that hardly matters. What matters is that it solves many of the harder puzzles with good times, easily outperforming the Wilk solver. It comes pretty close to matching the dedicated solvers.

This is especially impressive because the faster solvers are large, complex programs custom designed to solve paint-by-number puzzles. This one is a general purpose solver with just a couple hundred custom lines of code to generate the constraints, run the solver, and print the result. Considering that this is a simple application of a general purpose solving tool rather than hugely complex and specialized special purpose solving tool, this is an astonishingly good result.

Getting really first class search performance usually requires a lot of experimentation with different search strategies. This is awkward and slow to do if you have to implement each new strategies from scratch. I suspect that a tool like Gecode lets you try out lots of different strategies with relatively little coding to implement each one. That probably contributes a lot to getting to better solvers faster.

5.2.2. Hakan Kjellerstrand's MiniZinc Nonogram Solver

URL: http://www.hakank.org/constraint_programming_blog/2009/09/at_last_2_a_nonogram_solver_us.html

Version Evaluated: nonogram_create_automaton2.mzn / minizinc 1.0.3 / Gecode 3.2.2

Date Evaluated: December 2009

Limitations: Black and white only.

Assessment: An very good solver, especially for a simple demo program.

Description: This is another example of a paint-by-number solver from the constraint programming community, written in MiniZinc. MiniZinc is a special language for describing constraint programming problems that is supposed to be solver-independent, so that you can describe your problem once, and then try solving it with multiple different solvers. Basically the high-level MiniZinc model is translated into a low-level language called FlatZinc, which is understood by many different solvers.

Kjellerstrand's MiniZinc model for solving nonograms is one of hundreds of programs he has developed to solve different kinds of logic puzzles in different constraint programming systems. It converts the line clues into regular expression constraints, exactly the same way that Lagerkvist's solver does. It's a tiny bit more of a rounded program than Lagerkvist's Gecode solver, since the descriptions of the particular puzzle to solve is loaded from a separate data file, but the data files basically still need to be compiled in.

Because the the same model description can be used with different solvers, the performance varies depending on which solver you use.

Set-up: Kjellerstrand's script is called nonogram_create_automaton2.mzn. It takes data files in the MiniZinc .dzn format. He provides many examples.

I tried three different setups to try it with different solvers. One general annoyance here is that if you get things configured a bit wrong, or leave some of the options off the command lines, it will generally not give you an error message. Instead it will fall back to solving in some less efficient way than you thought you were asking for. This can make debugging your setup tricky. I got lots of help from Kjellerstrand.

Note also that there are plenty of other solvers that understand FlatZinc that I did not test.

Setup with G12 Solver

First I installed the current release of the MiniZinc-to-FlatZinc translator. I downloaded minizinc-1.0.3 from the G12 website. This includes precompiled binaries, so I just had to unpack the tarball and run "sh SETUP" in the distribution directory. The bin subdirectory contained mzn2fzn, a MiniZinc to FlatZinc translator, flatzinc, a FlatZinc-based solver, and minizinc, a wrapper that runs the previous two programs. To run the program with the G12 FlatZinc solver, we do:
   minizinc --data datafile.dzn nonogram_create_automaton2.mzn 
But the whole point of the MiniZinc language is that you can create FlatZinc programs for different solvers, so the translation to FlatZinc and the execution of the solver are normally separated, like this:
   mzn2fzn --data datafile.dzn nonogram_create_automaton2.mzn
   flatzinc nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}" 
Here mzn_show.pl is Kjellerstrand's Perl pretty printing program for displaying the final solution. Apparantly the output logic in the Kjellerstand's model ceases to work when execution is separated from translation, so he wrote an external program to present pretty output in that case.

I did only minimal tests with this configuration. It works fine, but both Kjellerstand and I had better results with the Gecode solver.

Setup with the Gecode Solver
I had already installed the Gecode 3.2.2 package to be able to test Lagerkvist's solver, and set-up instructions are given in that section of this survey. Of course, you also need the G12 mzn2fzn program installed in the previous section.

One more thing needs to be done before you can use mzn2fzn to generate a FlatZinc file for Gecode/FlatZinc. We need to give the translator some more knowledge of Gecode's capabilities. Under Minizinc's lib/minizinc directory, you should create a new directory named gecode and copy the contents of the directory gecode/flatzinc/mznlib from the Gecode distribution into it. Gecode/FlatZinc can run FlatZinc files generated without these extensions, but performance is greatly improved by using them.

Once you've done this, you can run the model in Gecode/FlatZinc with the commands below. This is what I used for the speed tests reported above.

   mzn2fzn -G gecode --data datafile.dzn nonogram_create_automaton2.mzn
   fz -solutions 2 nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}" 
Here the '-G gecode' flag on the mzn2fzn command causes the translator to use the special Gecode/Flatzinc library we copied over, and the -solutions 2' command on the fz command tells it to try to produce two solutions, so that we are doing uniqueness checking instead of simple solving.
Setup with the G12 Lazyfd Solver
After I posted the results of the tests on the Gecode solver, Kjellerstrand suggested trying it using the G12 Lazyfd solver, which apparantly uses a hybrid of two different solving techniques.

To do this, I first had to install a pre-release version of the G12 Minizinc package. I downloaded the "release of the day" for November 2, 2009 and installed it in just the same way as I installed the version 1.0.3 package.

Then I had to first make a small modification to the nonogram_create_automaton2.mzn model by deleting the search strategy hints. Apparantly the solver performs better without such hints. We change this hunk of code:

    solve :: int_search(
         [x[i,j] | j in 1..cols, i in 1..rows],
         first_fail,
         indomain_min,
         complete)
    satisfy;
To this:
    solve satisfy;
Then we run it with the commends below:
   mzn2fzn -G g12_lazyfd --data datafile.dzn nonogram_create_automaton2.mzn
   flatzinc -S -n 2 --solver lazy nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}" 

For the time tests, I included only the time to run the solver, fz or flatzinc, not the time to run mzn2fzn. Maybe I should have included both, since you have to run both each time you want to solve a new puzzle. However, this would only have added a small constant to all run times, so it's not going to make that big a difference to the overall assessment.

Results: I ran tests with Kjellerstrand's MiniZinc model on both the Gecode and G12 Lazyfd solver.

Results with the Gecode Solver

It's interesting to compare the performance with Gecode to Lagerkvist's C++ Gecode solver. Though they are quite similar, there are distinct differences. In the sample results there are puzzles where each is faster, but overall Lagerkvist's solver looks a little better. The full results make it clear that the difference is quite substantial.

When comparing this to the dedicated paint-by-numbers solvers, it's easy to get distracted by the fact that it is slower than the dedicated solvers on the easier puzzles. But though most of the puzzles in the full test set are easy puzzles, we really aren't talking about much of a speed difference, not more than a twentieth of a second. It's on the minority of hard puzzles that this solver shines.

Looking at the full results, it is clear that this system still falls short of the top three dedicated solvers. But it's important to remember that nonogram_create_automaton2.mzn contains only about 100 lines of code. From Kjellerstrand's blog, it is obvious that these 100 lines are the result of a fair amount of thought and experimentation, and a lot of experience with MiniZinc, but the Olšák solver is something like 2,500 lines and pbnsolve approaches 7,000. If you tried turning this into a fully useful tool rather than a technology demo, with input file parsers and such, it would get a lot bigger, but clearly the constraint programming approach has big advantages, achieving good search results with low development cost.

Results with the G12 Lazyfd Solver

The sample results for Kjellerstrand's MiniZinc model running on the G12 Lazyfd solver look quite different from any other solver's results. It is honestly pathetically slow on the easy problems.

But the remarkable thing is that it solves all but two of the puzzles in the sample set. Actually, it even solved the remaining two puzzles ("Lion" and "DiCap") as well, but each took slightly over 55 minutes to solve. (Note that in Kjellerstrand's own tests he got much faster solving times for the Lion than I did.)

The fact that it found "DiCap" difficult is a bit of a mystery. None of the other good solvers have much problem with that puzzle. Is it just afraid of pictures of lions? So it seems that though this is generally very, very good, it can be slow in some rare and not obviously predictable circumstances.

In the full results, we see that it solved all but 19 puzzles in under 2 minutes, which puts it close to the Simpson and Olšák solver. I suspect that the 2 minute cutoff in that test is actually too low for Lazyfd. For the other solvers, the majority of puzzles that don't solve in under two minutes won't solve in two hours either. It's not clear that this is true for Lazyfd.

It does have a down side though. On some puzzles this thing uses staggering amounts of memory. When I tried running it on my full collection of puzzles on computer A it got hung up on puzzle #354, because it needed 1.4 gigabytes of memory. Since computer A only has a gigabyte of memory, this started it thrashing hopelessly. This is a large puzzle, but not especially hard. On computer B, which has more memory, the Lazyfd solver could do it in 46.5 seconds. And Simpson's solver could do it in 0.01 seconds using 3.8M memory.

Without actually knowing a lot about how lazyfd works, I'd guess that a lot more effort is being spent on choosing optimal directions to search in. The memory consumption suggests that it may be mixing some degree of breadth-first searching with the depth-first strategies used by most solvers. On easy puzzles, this is massive overkill because they are easily solved no matter which way you direct the search. But it starts paying off on puzzles where other solvers run into a wall of combinatorial explosion and go exponential. From the fact that most of the "hard" puzzles are easily solved by at least one of the solvers surveyed, we know that for any particular puzzle there exists a good search strategy. Lazyfd seems to be pretty good at finding the right strategy for each puzzle, though it spends a lot of extra computation and memory doing so.

These two results highlight the advantage of a solver-independent modeling language like MiniZinc. You can describe your problem once, and then try out a wide variety of different solvers and heuristics without having to code every single one from scratch. You can benefit from the work of the best and the brightest solver designers. It's hard to imagine that this isn't where the future lies for the development of solvers for applications like this.

5.2.3. Robert Bosch's IP Solver

URL: http://www.oberlin.edu/math/faculty/bosch/pbn-page.html

Version Evaluated: No version number / glpk version 4.40

Date Evaluated: May 2009

Limitations: Black and white only. Always halts after first solution. No blank lines?

Assessment: Interesting, but slow.

Description: Bosch's tiny program takes an input file that describes the clues to a puzzle and translates it into a set of equations in a form that can be solved by standard integer programming software. Integer programs are like linear programs, except the values of the variables need to take integer values, and, while there are delightfully fast algorithms to solve linear programs, solving integer programs is a substantially more difficult task. Bosch provides a paper describing his algorithm, and two sample data files, spaceman.dat and dragonfly.dat.

Set-up: I started by downloading and installing version 4.40 of the Gnu glpk package, which includes a command-line program called "glpsol" which can solve integer programs expressed in the same CPLEX format generated by Bosch's little program. I have no idea how the performance of glpk compares to other packages such as ILOG CPLEX, which is most likely what Bosch used. I chose glpk because it was free.

Running the solver is a multistep process:

  1. Copy the clue file you want to solve into pbn.dat
  2. Run Bosch's pbn.c program. This reads the clue file, and writes out a CPLEX file named pbn.lp.
  3. Run the integer program solver. If you are using glpk, the command is:
    glpsol --cpxlp pbn.lp -o pbn.sol
  4. The output file from this, pbn.sol does not, of course, include a pretty picture of the solved puzzles. It just gives the values of the variables in the final solution. You want to look at the "Z" varibles, which are 1 for black and 0 for white.
I wrote a little perl script that would do all this and then pretty print the result.

Results: This solves most small puzzles pretty easily, but generally much slower than the line solvers do. Because it doesn't work line-by-line, some of the puzzles that give the more simple-minded line solvers problems are relatively easy for it. It did the edge logic puzzle, #23, in a half second.

The constraint generating program could use some work. It crashes on puzzles with blank lines. For some reason it generated a syntactically invalid CPLEX file for puzzle 16.

Interestingly, the second puzzle included in the distribution, the 20x20 dragonfly puzzle, required a touch under five hours of CPU time to solve on computer A, yet the puzzle is line solvable. My pbnsolve program solves that so fast that the CPU time registers as zero.

Erwin Kalvelgen posted a blog article and a follow-up that describe how he implemented Bosch's algorithm using GAMS as a solver and Excell as an interface. He seems to have solved the blank line problem and gotten much better performance. Links to his model and some test data can be found near the bottom of this page. I haven't tried this myself because a full version of GAMS costs money and the demo version looks like it is too limited to be able to solve any interesting puzzles.

Luiz Mingote and Francisco Azevedo have generalized this approach to multicolor puzzles. Their paper compares its results to the Olšák solver and found them similar for a few tiny puzzles. I doubt if it would remain comparable if presented with a larger set of more difficult puzzles, but I cannot test it, as their source code does not appear to be available.

Luiz Mingote's master's thesis describes the same results as the paper just mentioned, but goes one step further. He builds a hybrid solver by first using my pbnsolve program to do pure line solving, and then, instead of letting pbnsolve start probing and searching, he passing the rest of the problem to his IP solver to complete. He tested the pure IP, pbnsolve, and the hybrid on a collection of randomly generated puzzles. The hybrid vastly outperformed the pure IP, but pbnsolve outperformed the hybrid. A disappointing result for the researchers, but not really a surprise. They did have some results suggesting that the SCIP IP solver outperformed the CPLEX solver they used in most of their tests.

5.2.4. Andrew Makhorin's Example IP Solver

URL: http://www.gnu.org/software/glpk/

Version Evaluated: version distributed with glpk version 4.40

Date Evaluated: March 2010

Limitations: Black and white only.

Assessment: A cleaner, slower implementation of Bosch's solver.

Description: Andrew Makhorin, the author of the Gnu glpk package that I used to run Bosch's solver above, wrote me to point out that there is an implementation of a version of that solver included among the example programs in the glpk distribution. The example is written in the Gnu MathProg modeling language, which is a subset of AMPL. I believe it is intended as more of a demonstration of glpk's modeling capabilities than a practical solver.

Set-up: I had already installed glpk to test Bosch's program. The example nonogram solver is called examples/pbn.mod in the distribution. An example puzzle description is embedded in that file. It can be executed by doing:

glpsol --math pbn.mod
For my tests, I replaced the sample test data with data generated by the webpbn export program. This is obviously much simpler to run than the Bosch solver, and it even pretty prints the final puzzle image for us.

Results: Unsurprisingly, this performs somewhat similarly to the Bosch solver, but it is usually substantially slower. On the plus side, it can handle puzzles with blank rows or columns just fine.

5.2.5. Naoyuki Tamura's Copris Nonogram Solver

URL: http://bach.istc.kobe-u.ac.jp/copris/puzzles/nonogram/

Version Evaluated: Version 1.1 running in Scala version 2.9.3 and Java 1.6.0_24

Date Evaluated: September 2013

Limitations: Black and white only.

Assessment: Very good at solving extremely hard puzzles.

Description: Naoyuki Tamura has published a nonogram solver written in Copris, a Constraint Programming DSL (Domain-Specific Language) embedded in Scala. Scala is written in Java. This page includes test results comparing his run times to results reported here, but since his computer is faster than my computer they are a bit hard to interpret.

Set-up: Tamura's page provides a very clear description of how to run this solver. I downloaded the "tgz" version of Scala version 2.9.3, unpacked it, and added the bin subdirectory directory to my path. The run command is then:

   scala -cp copris-nonogram-v1-1.jar nonogram.Solver cwd_file_name
I found that the program crashed with "java.lang.OutOfMemoryError: GC overhead limit exceeded" errors for several puzzles (including puzzle numbers 2992, 3867, and 10810). I fixed this by editing the scala-2.9.3/bin/scala script to change the -Xmx256M java heap size setting to -Xmx1024M. I think I could have set an environment variable to change this.

The scala script does some annoying stuff with stty settings that makes it difficult to run programs in background. I didn't try to fix this.

Results: Execution times and memory usage vary quite a bit from run to run, but this is normal for Java programs. Twenty runs to solve the "Bucks" puzzle had run times ranging from 5.93 seconds to 6.54 seconds. For the sample results, I always reported the average of 20 runs. For the full results I did only one run for each puzzle. I don't know why anyone uses Java for anything.

Generally this solver is much slower than other solvers for solving "normal" puzzles, but it works amazingly well on the pathelogical puzzles that cause other solvers problems. It solved all 2,491 puzzles in our "full" test set. Only three puzzles took more than 40 seconds to solve, and of those none took more than 75 seconds. As far as being able to consistantly solve everything, this is the best solver I've tested to date.

The results on the 5,000 randomly generated puzzles were equally good. It solved every puzzle except number 3054 in under a minute, and it solved puzzle 2054 in just 2.4 minutes.

Generally, when I find a solver that solves everything in my test sets, I go looking for puzzles that it can't solve, so I can add them to my test set. So I ran this solver on the 4,071 black and white puzzles that had been posted to webpbn more recently. I found one puzzle, now listed as "Gettys" in the test set, that required just over ten minutes to solve, one that required 81 seconds, one that required 64 seconds, and all the rest were under a minute. So this program was able to solve all 6,562 published black and white puzzles on webpbn. That's very impressive.

But there's another place I can look for hard puzzles. On the webpbn site, when people create puzzles they typically need to work on them for a while before they are ready to be "published" for others to solve. Some puzzles never get published, either because the designer never managed to get them into workable shape, or because they were designing the puzzle for their own private use and didn't want to share it. So naturally this set of "unpublished" puzzles is likely to contain unusually difficult puzzles, which are nevertheless within the range of what can be expected to be presented to a puzzle verification programs. I don't usually use them for testing here, because I can't usually publish the puzzles in this report without getting the express permission of the designers, which is not usually possible.

So, in hopes of finding the limits of this solver, I ran it on the 1,081 unpublished black and white puzzles that were currently in the webpbn database. I found six puzzles that were not solved after expending a half hour of CPU time. I was able add two of them, "Knotty" and "Meow", to the sample set.

So, in the end, I was able to turn up a few puzzles that this solver couldn't solve, but there are very few, they are hard to find, and no other tested solver was able to solve any of them either. This a very impressive solver.

It really is pretty slow on simple puzzles though. Solving a trivial one-by-one puzzle takes it two-thirds of a second of CPU time.

Tests reported on Tamura's page indicate that the solver is faster when other SAT solvers are used, with average run times cut almost in half. I have not seen a need to repeat those tests on my computer.

5.2.6. Jan Wolter's Copris Multicolor Nonogram Solver

URL: https://webpbn.com/survey/copris.html

Version Evaluated: Unnumbered version running in Scala version 2.9.3 and Java 1.6.0_24

Date Evaluated: September 2013

Limitations: None.

Assessment: Same as Tamura's solver, but with color.

Description: This is a straight-forward extension of Naoyuki Tamura's solver to handle multicolor puzzles.

Set-up: This was run identically to Tamura's solver.

For black and white puzzles, a heap size limit of one megabyte proved sufficient to solve all puzzles, but for a few large multicolor puzzles this needed to be increased. Perhaps this is because there are more constraints in multicolor puzzles. Clearly this is a fairly memory hungry solver.

Results: When run on black and white puzzles, this performs nearly identically to Tamura's solver, which is no surprise since in that case it will be solving the identical set of constraints using the identical solver. Because of this, I am not reporting test results on black and white puzzles. The random puzzle set is also all black and white, so I didn't do those tests either.

This solver was able to solve every puzzle in my original 3,232 puzzle multicolor test set, plus all 8,512 multicolor puzzles posted to webpbn since then. Among these, one puzzle require 17 minutes to solve, but all others were under three minutes.

I tried it on 1,712 unpublished multicolor puzzles from webpbn, and it was able to solve all of those as well, although one took it over two hours to solve. The vast majority of puzzles were solved in less than 30 seconds.

Of course, there are still some monochromatic puzzles that it can't solve, as was mentioned in the discussion of Tamura's original solver. It's just that multicolor puzzles tend to be a bit easier than monochromatic puzzles.

5.3. Embedded Solvers

These solvers are embedded in more general tools for designing and solving paint-by-number puzzles. They usually have graphic user interfaces. It is generally impossible to get any runtime results from them.

5.3.1. Juraj Simlovic's Griddlers Solver

URL: http://jsimlo.sk/griddlers/

Version Evaluated: gs13

Limitations: Black and white only. Puzzles must be 50x50 or less.

Assessment: A good solver.

Description: This is a Windows application offering a complete puzzle creation/playing/auto-solving environment, but the emphasis is on the auto-solver. It is not open source. Many fine puzzles are included in the package. It is fairly well documented, including a general description of the algorithm.

Set-up: Nothing much to set up here. Unzip and run.

Results: I had no way to get CPU time information from this program, since the solver is embedded in a GUI, But it solves most puzzles quickly, while you watch.

If you give it puzzles that are too big, it crashes, saying "Access violation at address 00463EAE in module 'gridsolv.exe' read of address 00000012". It seemed like this happened for puzzles where any dimension exceeded 50, but I'm not sure. Because of this I was unable to test it against most of the harder puzzles on my list. It succeeded with all the smaller ones except the "Lion", where, like most other solvers tested, it got hopelessly bogged down in an interminable brute force search.

5.3.2. Alexander Meshcheryakov's QNonograms

URL: http://qnonograms.sourceforge.net/

Version Evaluated: qnonograms-0.2

Limitations: Black and white only.

Assessment: A good line solver with no search capability.

Description: An open source puzzle creation and solving environment written in C++. When run, it brings up a window with the empty puzzle grid, where you can solve manually or run the solver. The program is fairly minimalistic, and looks like a project in its early stages. It can read input files in two different binary formats (suffixes .jap and .jcr) and one plain text format (suffix .cwd) all documented only by internal comments, and then only scantily.

Setup: To compile from the source package on a Linux system, first do "qmake", which generates the Makefile and then do "make". The ready-to-run binary will be in the build subdirectory. You can give a data file name on the command line, or select one after you run it.

Results: Because the solver is integrated with the rest of the program, it is somewhat difficult to obtain timing results. It appears to be a fairly competent line solver. It solves all the line solvable puzzles on our list above, but for the rest it solves only as much as can be line-solved and then stops.

On the Flag puzzle, it didn't appear to do all possible line solving before quitting.

5.3.3. Jan Wolter's webpbn.com Javascript Helper

URL: webpbn.com

Version Evaluated: No version numbers.

Limitations:

Assessment: A somewhat dim-witted solver with no search capability.

Description: Webpbn.com is a community website where users create puzzles for other users to solve. Users solve puzzles on a page where Javascript is used to render the puzzle and update the screen image in response to mouse clicks. There is an automated solver available in this environment that can be activated with the "helper" button, however this button is only available if you are solving a puzzle you created yourself, or are solving a puzzle that you have previously solved manually.

The webpn helper is a small Javascript solver that normally uses a left-right overlap algorithm for line solving. This is normally implemented using Javascript's regular expression engine to find possible solutions, but on browsers where the regular expressions are brain-damaged a version of the line solving algorithm from pbnsolve is used instead.

This left-right overlap algorithm for line solving is not complete, and there is no search capability in the helper, so there are many puzzles that will only be partially solved by the helper. However, when the helper stalls, it is possible for the user to manually mark a few more cells and then restart the helper.

Setup: None.

Results: Puzzle designers need to solve their puzzles many times as they make small changes in hopes of improving the playability. Doing this manually is extremely tedious, and error prone. The helper is designed to streamline this process. It only does things that any compentent human solver would do. Then it stops, so the puzzle designer can see if more sophisticated reasoning can be used to continue. Thus, its stupidity as a solver is actually a feature. It is designed to solve cooperatively with the puzzle designer, allowing the puzzle designer to assess how well the puzzle solves and how difficult it is without having to solve it entirely by hand.

6. Conclusions

The only practical applications that I know of for nonogram solvers are (1) validating new puzzle designs to ensure they have a unique solution, (2) providing hints to human solvers when they get stuck on a puzzle, and (3) cheating in puzzle solving competitions. In each case, we are solving puzzles designed by humans for humans to solve, and those just aren't usually that difficult. Nobody designs nonograms on 10,000 by 10,000 grids because nobody wants to solve them.

The "Lion", "Flag", "Nature", "Marley" and "Gettys" puzzles which are among the most difficult puzzles in our test set are poor designs by inexperienced puzzle designers. Any experienced puzzle designer can tell at a glance that these puzzles have thousands of possible solutions, so no experienced designer would ever have created them. Thus, puzzles like this never occur in books and magazines. However, they often occur as intermediate stages during the development of new puzzles, so it is important that a validator be able to handle them.

Puzzles like "Forever" and "9-Dom", on the other hand, are extremely clever constructs by skilled human designers aiming to explore unconventional puzzle solving techniques. They are solvable by skilled human solvers. But puzzles like that are rare too.

So, although nonogram solving is known to be NP-complete, the fact that real world puzzles are typically small and well-designed means that it is nevertheless quite easy to build a practical solver that works well nearly all of the time.

Some of the solvers described here were built to be practical, real-world solvers. They were designed by people who actually wanted to solve nonogram puzzles. My own solver, pbnsolve, is actually used to validate dozens of puzzles every day. It is optimized to run very fast in little memory, not so much because we need answers in a tenth of a second, but because it runs on a web server and any resources it consumes have a negative impact on the server's ability to serve web pages. For solvers like this, the most significant part of the sample results table is all the "zero" or near zero run times on the easy puzzles. Solving most puzzles fast is more important than solving the rare pathelogical puzzles at all.

Wu's solver was designed to win a programming contest. It is tuned to type of data used in that contest, which includes randomly generated puzzles.

Other solvers described here are written by people who have no practical interest in actually solving puzzles, but only want to use the problem as a demonstration of the power of their general purpose problem solving platforms. For those authors, this survey provides a benchmark to compare their solving systems against other general solving platforms, and against hand built, dedicated solvers. They hope to prove not that they have the best tool for solving nonograms, but that they have the best general problem solving system.

But if you are using this data that way, you run up against the problem that we don't actually have a lot of hard problems in our data set. It is perfectly possible to tune a solver to solve the particular hard problems in this set. That proves nothing about the general power of your solving framework, unless we can find more hard problems to throw at it. Ideally we would have an endless supply of puzzles of arbitrary size and difficulty. But there is no bottomless pool of hard nonogram problems. People just don't make them. The best you could do is to test on randomly generated puzzles, but if you optimize your solvers for those, then you are no longer working on anything with any practical application at all, because the characteristics of randomly generated puzzles are quite different from the puzzles that people really solve.

That being said, I must say that I'm impressed with the ability of Tamura's Copris solver to solve almost every puzzle I can come up with to throw at it.

So it's hard to say anything about what solver is best, because of the different goals of the different solvers, and because of the limitations of the test data available. Still some interesting observations can be made.

I think Simpson's solver is a terrific example of a well-tuned, super-efficient, pure heuristic search system. Compared to some of the other solvers, it's memory consumption is microscopic, but it solves nearly all puzzles very quickly. It seems to all come down to having well-chosen search heuristics that are well-tuned to solving human-designed puzzles. The fact that it's performance falls off badly when it is presented with randomly generated puzzles just goes to demonstrate how well it is tuned for the application it was designed for.

The Olsak solver and the two Gecode solvers are also pure heuristic search systems. They don't do quite as well as Simpson's on the human designed puzzles, and don't fall apart quite as badly on the randomly generated puzzles.

The various solvers based on constraint solving systems are impressive in another sense. Though their average run time may not match the hand-built solvers (yet!), the amount of effort that went into programming them to solve nonograms is far less. These are short little programs, probably written in a couple days, and they are competitive with dedicated solvers that are the result of months or years of work. The power of the underlaying search libraries and the ease of experimentation with alternative search heuristics obviously serves them well. If you need a solver fast, building on top of one of these packages is clearly the right way to go. As these general purpose solving environments continue to grow in sophistication, I think the day may well come when building dedicated solvers from the ground up makes no sense at all.

These heuristic search programs are basically doing depth-first search, using quickly evalated heuristic functions to choose which branches to explore first, in hopes of finding the solution more quickly. My own pbnsolve system doesn't actually use heuristic functions. Instead, it explores each possible branch for some distance before choosing one. This probing approach expends a lot more computation at each decision point in hopes of winning a pay off by finding a short route to the solution. It works more often then it fails, so performance is generally good, even on randomly generated puzzles. It does however, sacrifice some speed on easier puzzles, where all that probing turns out to be overkill. This was a very tricky part of the design, getting better performance on the rare hard problems without getting too sluggish on the common easy ones.

The BGU system goes further in this direction, doing even more probing at each decision point. It succeeds in solving everything in our test set, but it also seems to sacrifice even more speed on easy puzzles.

I don't know the details of how the Lazyfd and Copris solvers work, but I suspect that it is an approach similar to probing that spends more computation to try to make better choices, or otherwise adds more of a breadth-first search flavor into its strategy. Again, the focus is more on getting through hard problems than solving simple ones quickly.

Solving nonograms with integer programming systems has been studied quite a bit, but I think the results are underwhelming. I don't think this is a promising approach to the problem.

I suspect that there is a lot of room for progress here. The fact that few puzzles are hard for all solvers suggests that if a hybrid approach that more effectively combines their strengths should be possible.

7. Other Solvers

This is just a list of links to pages describing solvers that I maybe ought to investigate someday. Some of them are puzzlingly undocumented. Others would require me to figure out how to install and configure all sorts of strange packages before I could run them. Some are windows packages that I probably wouldn't be able to collect run time data from anyway. Some I just haven't gotten around to - thoroughly testing a new solver takes me a day or two. But who knows, maybe I'll get around to them someday. If anyone is especially eager to have me look at one of these, or knows of others to add to the list, please let me know.
Last Update: Wed Sep 25 06:35:01 PDT 2013