1. Introduction
2. Sample Results
3. Full Results
4. Random Results
5. Discussions of Individual Solvers
5.1. Dedicated Stand-Alone Solvers6. Conclusions
5.1.1. Jan Wolter's pbnsolve Program5.2. Solvers Based on General Purpose Tools
5.1.2. Mirek and Petr Olák's Nonogram Solver
5.1.3. Steve Simpson's Nonogram Solver
5.1.4. Jakub Wilk's Nonogram Solver
5.1.5. Kenichi Ishigaki's Games::Nonogram Perl Module
5.1.6. Richard Wareham's Nonogram Solver
5.1.7. Frans Faase's Solver Project
5.2.1. Mikael Lagerkvist's Gecode Nonogram Example5.3. Embedded Solvers
5.2.2. Hakan Kjellerstrand's MiniZinc Nonogram Solver
5.2.3. Robert Bosch's IP Solver
5.3.1. Juraj Simlo's Griddlers Solver
5.3.2. Alexander Meshcheryakov's QNonograms
5.3.3. Jan Wolter's webpbn.com Javascript Helper
7. Other Solvers
"Paint-By-Number" is one of many names for a type of graphical logic puzzle originating in Japan. Other common names for these puzzles are Nonograms, Griddlers, Hanjie, and Picross. The puzzles consist of a blank grid with numbers along the top and one side. The numbers tell the sizes of the blocks of the foreground color in that row and column. To solve the puzzle, you must reconstruct the image by figuring out how the color blocks must be place in each row and column. One of the many charms of these puzzles is that they are quite easy to create as well as to solve.
I run a community paint-by-number website, webpbn.com, where people create and solve such puzzles. To help validate puzzles created there, I wrote a program to automatically solve such puzzles, pbnsolve, which is open source. I'm still not entirely happy with its capabilities and performance, so I thought I'd survey some of the solvers out there, using Steve Simpson's list of solvers as a starting point.
My testing procedure is generally to figure out how to get the solver running on my computer (which normally runs Linux but can be booted into Windows), write some code so that I can export puzzles from the webpbn.com database in the correct format to be read by the solver, and then test the solvers on the puzzles.
I started out by testing each solver on a small collection of interesting puzzles, collecting run times in a table. Then for the faster solvers, I ran them on a collection consisting of every black & white puzzle posted on webpbn.com as of a certain date. Some statistics on memory usage were also collected, and a round of tests on randomly generated puzzles has been done.
Hakan Kjellerstrand has done some independent testing of some constraint-based solvers on the same sample data set used here.
This document has been updated many times. Older links and/or citations may
well have refered to older versions which were significantly different.
2. Sample Results
I initially tested each solver on a sample set of puzzles listed below. A couple things to keep in mind:
Note however that the copyrights for all puzzles on webpbn are retained by the designers of the puzzles, so though you can download puzzles for your own use, you can not redistribute them.
However, the authors of some of these puzzles have given permission for their puzzles to be freely redistributed as long as the original attribution and copyright message is retained. These puzzles are marked with a * in the table below. You can consider those marked puzzles to be under something like a Creative Commons attribution license.
Computer B:Previous versions of this document used test results from an older computer, "Computer A". Those results are still available", but, except for all the run times being a bit bigger, they don't really differ in any interesting way.
CPU: 2.6GHz AMD Phenom II X4 810 quad-core processor Memory: 8 Gb OS: OpenSuse 11.2 Compiler: gcc 4.4.1
Both computers are bootable into Windows, but only a few solvers required windows and we weren't able to collect runtime statistics from any of them.
Note that computer B has four processors. None of the solvers surveyed so far exploit this. They are all single-threaded, so they are running on just one processor. If a solver was clever enough to distribute its workload over multiple processors, then the CPU time reported here would be the sum of the CPU times over all processors, which could then be larger than the actual elapsed time from start to end of the run.
Run time results are given in the table below. On the right are CPU times for various solvers. Times are usually reported in seconds, but sometimes in minutes. Runtimes of "0" are below a hundredth of a second. "Fail" means it failed to completely solve the problem. "X" means it couldn't handle the puzzle, usually because the puzzle contains blank lines and the solver can't read them. (Blank lines aren't really particularly hard to handle, but many authors seem to forget to test that special case.) "+" means it ran for longer than half an hour. A "-" means I didn't try it.
| puzzle | size | Unique Solution? |
Line Solvable? |
Blank Lines? |
Wolter | Olák | Simp- son |
Lager- kvist/ Gecode |
Kjeller- strand/ Gecode |
Kjeller- strand/ Lazyfd |
Wilk | Ishi- gaki |
Ware- ham |
Bosch/ glpk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1: Dancer * | 5 x 10 | Yes | Yes | No | 0 | 0 | 0 | 0.01s | 0,01s | 0.04s | 0 | 0.04s | 0.05s | 0.01s |
| #6: Cat * | 20 x 20 | Yes | Yes | No | 0 | 0 | 0 | 0.01s | 0.02s | 0.44s | 0 | 0.28s | 0.12s | 2.0s |
| #21: Skid * | 14 x 25 | Yes | Yes | Yes | 0 | 0 | 0 | 0.01s | 0,02s | 0.64s | 0 | Fail | X | X |
| #27: Bucks * | 27 x 23 | Yes | No | Yes | 0 | 0 | 0 | 0.01s | 0.02s | 1.1s | 0 | Fail | X | X |
| #23: Edge * | 10 x 11 | Yes | No | No | 0 | 0 | 0 | 0.01s | 0.01s | 0.08s | 0 | Fail | Fail | 0.49s |
| #2413: Smoke | 20 x 20 | Yes | No | No | 0 | 0 | 0 | 0.01s | 0.02s | 0.60s | 0 | 1.3s | 17.3m | 10.8m |
| #16: Knot * | 34 x 34 | Yes | Yes | No | 0 | 0 | 0 | 0.01s | 0.02s | 5.5s | 0 | 18.8s | Fail | Fail |
| #529: Swing * | 45 x 45 | Yes | Yes | No | 0 | 0 | 0 | 0.02s | 0.04s | 13.2s | 0.03s | 4.3s | Fail | - |
| #65: Mum * | 34 x 40 | Yes | No | No | 0.01s | 0 | 0.01s | 0.02s | 0.04s | 11.0s | 0.36s | + | Fail | - |
| #1694: Tragic | 45 x 50 | Yes | No | No | 0.05s | 0.03s | 0.54s | 0.14s | 3.6m | 32.1s | 1.9s | + | - | - |
| #1611: Merka | 55 x 60 | Yes | No | Yes | 0.01s | 0.01s | 0 | 0.03s | + | 1.1m | 5.6m | + | X | X |
| #436: Petro * | 40 x 35 | Yes | No | No | 0.10s | 15.2s | 0.10s | 1.4m | 1.6s | 6.2s | 9.2m | + | - | - |
| #4645: M&M | 50 x 70 | Yes | No | Yes | 0.10s | 0.10s | 0.02s | 0.93s | 0.28s | 48.1s | 14.0m | - | X | X |
| #3541: Signed | 60 x 50 | Yes | No | No | 0.04s | 1.1s | 5.4m | 0.57s | 0.56s | 1.0m | + | - | - | - |
| #803: Light | 50 x 45 | Yes | No | No | 0.96s | + | 0.02s | + | + | 4.7s | + | - | - | - |
| #6574: Forever * | 25 x 25 | Yes | No | No | 17.0m | 2.0s | 18.9s | 4.7s | 2.3s | 1,7s | + | - | + | + |
| #2040: Hot | 55 x 60 | Yes | No | No | 1.3s | + | 1.2m | + | + | 2.6m | + | - | - | - |
| #6739: Karate | 40 x 40 | No | No | No | 7.9s | 17.3s | 18.3m | 56.0s | 57.8s | 13.6s | + | - | - | - |
| #2556: Flag | 65 x 45 | No | No | Yes | + | 1.5s | 0 | 3.4s | + | 4.2s | + | - | X | X |
| #2712: Lion | 47 x 47 | No | No | No | 1.9m | + | + | + | + | + | + | - | - | + |
I've had several people report running times quite a bit different from the ones reported here. Hakan Kjellerstand used the lazyfd solver to solve the "Lion" puzzle in two minutes on his computer, but it took me 55 minutes to solve. Mikael Kjellerstand solved it with gecode in 13 minutes on his computer, but I gave up after 11 hours. I don't know why there are these big differences.
I've made some attempt to study the memory usage of the different solvers, but this is difficult as there is no easy way under Linux to get the maximum or average memory usage of a process. The operating system simply does not collect that information. So instead I did repeated runs under different address space rlimits, until I found the smallest memory limit under which the program would still run. This method requires many runs of the program however, so I could only do it for puzzles where the run times were fairly small.
The table below summarizes these results. The first column shows the memory used to solve a very small, very easy 10x5 puzzle. This is more or less the minimum memory usage.
Then runs were done on various other puzzles from the list above that had fairly small runtimes. Each solver was tested on a different set of puzzles, since solvers are fast on different puzzles. The run which used the most memory is reported. This second value shouldn't be taken as an upper limit to the solver's memory usage, just as an indication of how much memory usage grows as the puzzle complexity grows. Note that first three solvers in the list used pretty much the same amount of memory on all puzzles tested.
| Solver | Memory Usage on #1: Dancer |
Largest Memory Usage Observed on Sample Set | |
|---|---|---|---|
| Simpson | 3.8M | 3.8M | all puzzles |
| Olák | 7.2M | 7.2M | all puzzles |
| Wilk | 10.4M | 10.4M | all puzzles |
| Wolter | 13.5M | 14.1M | on #4645: M&M |
| Bosch/glpk | 9.4M | 15.1M | on #6: Cat |
| Ishigaki/perl | 19.4M | 27.4 M | on #2413: Smoke |
| Lagerkvist/Gecode | 106.6M | 112.4M | on #4645: M&M |
| Kjellerstrand/Gecode | 108.9M | 122.3M | on #4645: M&M |
| Wareham/mono | 160.0M | 163.4M | on #6: Cat |
| Kjellerstrand/Lazyfd | 20.5M | 689.2M | on #4645: M&M |
The differences here are quite spectacular, but, of course, it should
come as no surprise that the systems that use interpretors or complex general
purpose libraries have a bigger memory footprint.
3. Full Results
The puzzles in the table above are not by any means typical puzzles.
Most of the puzzles created for humans to solve are similar to the first two
on the list, and can be quickly solved by even fairly simple-minded solvers.
The puzzles listed later in the list were mostly selected from those that
gave the pbnsolve program problems,
and so the sample is somewhat skewed against pbnsolve.
In my experience, solvers tend to be rather finicky. A change in the algorithm that makes it run a bit faster on one puzzle sometimes makes it run much slower on others. For example, pbnsolve's probing algorithm helps it cope with many difficult puzzles, but it fails badly on the "Flag" puzzle in the list above, which it can only solve if you turn off probing. Because of this, it's hard to judge any solver based on just a short list of puzzles.
So given a more typical set of puzzles, how many of them does each solver solve? To test this, I used a snapshot of the webpbn.com puzzle database (as of Oct 2, 2009) containing 2,491 black and white puzzles and 3,232 multicolor puzzles, and ran some of the more interesting solvers on every single one.
Some important notes about this test set:
The number of black and white puzzles solved in various time ranges is tabulated below, sorted by the leftmost column. They were run on the same computer with the same compilation and execution options used in the sample results. To keep these tests from running excessively long, any run that took longer than two minutes was interupted.
| Solver | 0 to 0.09 sec |
0.10 to 0.19 sec |
0.20 to 0.49 sec |
0.50 to 0.99 sec |
1.00 to 3.99 sec |
4.00 to 9.99 sec |
10.00 to 29.99 sec |
30.00 to 59.99 sec |
1 min to 2 min |
2 min or more |
|---|---|---|---|---|---|---|---|---|---|---|
| Wolter | 2443 | 34 | 5 | 3 | 2 | 1 | 0 | 0 | 1 | 2 |
| Simpson | 2439 | 10 | 4 | 5 | 10 | 2 | 3 | 0 | 2 | 16 |
| Olák | 2414 | 13 | 10 | 8 | 16 | 3 | 6 | 3 | 2 | 16 |
| Lagerkvist/Gecode | 2380 | 23 | 20 | 12 | 12 | 12 | 4 | 5 | 5 | 18 |
| Kjellerstrand/Gecode | 2306 | 42 | 31 | 15 | 20 | 14 | 5 | 5 | 3 | 50 |
| Wilk | 2107 | 60 | 76 | 52 | 54 | 31 | 19 | 9 | 11 | 72 |
| Kjellerstrand/Lazyfd | 125 | 92 | 409 | 343 | 712 | 424 | 284 | 66 | 17 | 19 |
The last row clearly looks very different from all the rest. All the other solvers listed are all really using variations on the same basic algorithm, and their performance looks grossly similar. The Lazyfd solver uses a strategy that makes it substantially slower on the ordinary easy puzzles, but serves it very well on the rare hard puzzles.
The other solvers solve the vast majority of puzzles very quickly. Any of the first four solvers listed can solve more than 96% of all puzzles in under a second. Pbnsolve, with a little help from the bias in the data set mentioned above, solved 99.8% of all puzzles in under a second. Clearly, most puzzles are easy. Only a tiny fraction of the puzzles designed for humans to solve are really hard for programs to solve.
Another thing that stands out in this table is that the Olák solver and the Simpson solver look very similar. But this similarity is actually somewhat illusionary. Both solvers required more than two minutes for 16 puzzles, but it isn't the same 16 puzzles. In fact, there are only 3 puzzles on both lists. Every puzzle in the database can be solved in under two minutes by at least one solver, though no solver can solve them all that quickly.
Given that most puzzles are easy, and different puzzles are hard for different solvers, it is very hard to thoroughly test a solver without a large sample set of puzzles to test it on. Tests based on small sample sets give very misleading results. The sample set I used in the previous section was partly generated by picking puzzles from these full runs that gave various solvers problems, so it includes at least some "hard" puzzles for every solver, but there is no guarantee that that will be true for the next solver to come along.
Only pbnsolve and the Olák solver can handle multicolor puzzles. The following results were found when running both solvers on all 3,232 multicolor puzzles in the webpbn.com database.
| Solver | 0 to 0.09 sec |
0.10 to 0.19 sec |
0.20 to 0.49 sec |
0.50 to 0.99 sec |
1.00 to 3.99 sec |
4.00 to 9.99 sec |
10.00 to 29.99 sec |
30.00 to 59.99 sec |
1 min to 2 min |
2 min or more |
|---|---|---|---|---|---|---|---|---|---|---|
| Wolter | 3199 | 14 | 6 | 2 | 1 | 1 | 1 | 2 | 0 | 6 |
| Olák | 3178 | 9 | 5 | 4 | 7 | 6 | 4 | 0 | 0 | 19 |
Though handling color puzzles adds a lot of complexity to the solver, since many more special cases can come up, the majority of color puzzles are easier to solve than black and white puzzles, just because contradictions between what colors the row clues says a square can be and what colors the column clues say it can be help narrow down the range of possibilities. However, the rare pathelogical multicolor puzzle can manage to be a lot more pathelogical, because the fact that white spaces between color blocks aren't required means there can be a whole lot more possible solutions to a line where the crossing clues happen to be unhelpful.
Again, the puzzles that are hard for the two solvers are not always the same puzzles. There are only four multicolor puzzles that neither solver could solve in less than two minutes: #672, #2498, #3085 and #3114.
For this test, I generated a set of 5000 two-color, 30x30 puzzles. Each puzzle was generated by painting each grid cell black or white with a 50% probability. Each solver was then run against this set of puzzles. An tarball containing the puzzle grids used in this test can be downloaded.
These puzzles are not very similar to human designed puzzles. The vast majority have multiple solutions, and scarcely any are solvable without guessing by working on one line at a time. The breakdown is shown below, in comparison to that of the full sample of black and white puzzles from webpbn:
| Random Puzzles | Webpbn Puzzles | |
| Multiple solutions: | 97.5% | 2.6% |
| Unique solution but not line solvable: | 2.1% | 15.6% |
| Unique solution and line solvable: | 0.4% | 81.8% |
So, though these tests avoid a priori biases, the data isn't very much like any real world puzzle data, and it is certainly going to favor some solvers over others. In particular, solvers whose main strength is fast line solving aren't going to perform as well as ones whose main strength is effective searching.
Here are the run time results, sorted by the rightmost column:
| Solver | 0 to 0.09 sec |
0.10 to 0.19 sec |
0.20 to 0.49 sec |
0.50 to 0.99 sec |
1.00 to 3.99 sec |
4.00 to 9.99 sec |
10.00 to 29.99 sec |
30.00 to 59.99 sec |
1 min to 2 min |
2 min or more |
|---|---|---|---|---|---|---|---|---|---|---|
| Kjellerstrand/Lazyfd | 0 | 0 | 0 | 0 | 616 | 2921 | 1208 | 176 | 50 | 29 |
| Wolter | 3736 | 379 | 302 | 142 | 198 | 74 | 59 | 17 | 27 | 66 |
| Olák | 2879 | 333 | 381 | 244 | 405 | 199 | 146 | 84 | 61 | 268 |
| Kjellerstrand/Gecode | 2227 | 461 | 495 | 311 | 491 | 238 | 205 | 115 | 97 | 360 |
| Lagerkvist/Gecode | 2325 | 381 | 468 | 308 | 460 | 251 | 235 | 104 | 79 | 389 |
| Simpson | 2568 | 306 | 309 | 213 | 348 | 215 | 224 | 105 | 91 | 621 |
| Wilk | 644 | 310 | 484 | 370 | 697 | 424 | 452 | 237 | 231 | 1151 |
Once again, the Lazyfd solver is on a completely different planet than the other solvers. Among all 5000 puzzles, it didn't find a single puzzle it could solve in under a second. One took 1.7 seconds, one took 1.8 seconds and all the rest took more than two seconds. (Note that lazyfd's run times are very sensitive to the size of the puzzle. All of the fast times in the full sample are on puzzles much smaller than 30x30.) But though it is slow on the easy puzzles, it's amazing on the hard ones. It solved substantially more puzzles in under 2 minutes than any other solver.
Simpson's solver didn't do as well as I expected on this set of puzzles. The performance of pure depth-first search systems depends very much on the choice of good heuristics for choosing which direction to search in. Simpson's heuristics seems to work very well for human-designed puzzles but not as well for these randomly generated puzzles. This is not too surprising, since this is a ridiculous set of puzzles, and nobody in their right mind would tune their solver to work optimally on them.
Pbnsolve relies less on heuristics than on its probing strategy, which is kind of a poor-man's version of breadth-first search, so its not too surprising that it adapts fairly well to this set of puzzles.
The two Gecode solvers are pretty close together on this test. Evidentally neither of the two search heuristics has a pronounced advantage on this problem set.
As a way of deciding which program is "best" at solving nonograms, testing on random puzzles is absurd. These puzzles are quite unlike the population of puzzles likely to be encountered in any practical application. I think these tests do, however, give interesting insights into the strengths and weaknesses of these different solvers.
5.1.1 Jan Wolter's pbnsolve Program
URL: http://webpbn.com/pbnsolve.htmlVersion Evaluated: 1.05 (version 1.06 exists but performance is the same)
Limitations: None.
Assessment: Among the fastest and most flexible solvers.
Description: This is the solver I wrote myself, and I have already described it in way too much detail.
The most salient point here is that it uses a "probing" strategy. When its straight logic solving gets stuck, then instead of using some kind of heuristic function to make a guess, it generates a long list of possible guesses, and evaluates every single one with its logic solving engine until it stalls again. All the results of this are discarded, keeping only record of which choice solved the most cells. That one choice is then chosen as a guess, and the search proceeds, starting with the recomputation of the logical consequences of that guess.
This gives the solver some of the attributes of a breadth-first search system, in that it looks some distance ahead on many different possible paths instead of being guided by a heuristic function. But it doesn't consume memory as hungrily as a real depth first search system. It pays the price in doing a lot of extra logic solving, but as the logic solver is highly optimized, the run-time overhead is not too horrible.
Set-up: Compilation instructions are in the README file.
In production, pbnsolve is given a copy of the intended solution, and can use this to make uniqueness checking faster in some cases. To be fair to the other solvers we did not do that in these tests. All test runs were done with the -u flag to disable the use of goal solutions from the input file.
By default pbnsolve takes input in XML format. But XML files can sometimes be extremely slow to parse, since doing so may require fetching the DTD files over the Internet. To avoid this, most tests were done with input files in Steve Simpson's .non format.
Results: It's most interesting to compare this to Simpson's solver, which does a pure depth-first search with a carefully tuned search heuristic to select guesses. Both solvers are very fast at finding logical consequences. Simpson's solver is usually bit faster on the easier puzzles, where pbnsolve's expensive strategy for making guesses costs it more than it gains it. The payoff is on the harder puzzles, where pbnsolve is substantially more likely to be able to find its way to a solution. Better guesses mean fewer guesses and thus less combinatorial explosion.
On some puzzles, like the Flag, this fails pbnsolve miserably though. I suspect that there are lots of situations coming up in that puzzle where none of the candidate guesses brings you substantially further than the others, so probing fails to find useful results, and just becomes a very, very expensive way to make bad guesses.
URL: http://www.olsak.net/grid.htmlVersion Evaluated: 1.2
Limitations: None.
Assessment: A fast and flexible solver, with excellent line solving and heuristic search heuristic search capabilities.
Description: This is an open-source C-language solver that can handle color puzzles and triddlers as well as traditional paint-by-number solvers. The code is richly commented, but in Czech. English documentation is minimal. The documentation claims the line solver has linear complexity. Can read two different file formats. A number of test puzzles are included.
Set-up: It's all in one C file. Compile it and you're ready to go. I ran my tests like
./grid -total 2 -log 1 datafileThe -log option turns off the profuse default logging.There are upper bounds for various things compiled into the program. To be able to handle all the puzzles in my multicolor puzzle set, I had to increase MAXBLOKU, which I think is the maximum number of clue numbers in a line. (Note that color puzzles can have twice as many clues per line as black & white puzzles.)
Results: The numbers in the table above tell the story. This is a very fast solver.
Note that earlier versions of this survey showed this solver performing somewhat poorly on the full results, but that turned out to be due to a bug in my batch execution script.
URL: http://www.comp.lancs.ac.uk/~ss/software/nonowimp/Version Evaluated: nonogram-1.9.12 with nonolib-4.3.0.
Limitations: Black and white only.
Assessment: Excellent solver with a terrific heuristic search capability.
Description: An open source solver library and command line interface written in C99. A Java version also exists but was not tested.
Set-up: Precompiled versions are availabe for Windows and RISC OS, but alas, I wanted to run in under Linux. To do that you need to get both the nonolib library and the nonogram command line tool.
The first step in building the library is to create a file named nonolib-env.mk and put some configuration information into it. There is a sample in the configuration in the INSTALL file, but it's bizarre, suggesting a -ansi flag although the program is in C99, not ANSI C. I used:
INSTALL=install INSTALL_PATH=/usr/local RANLIB=ranlib AR=ar CFLAGS += -std=c99Then 'make' and 'make install' work fine. I tried doing this with -O2 added to the CFLAGS but this cause gcc-4.2 to give a segmentation fault while compiling fcomp.c. I've since tested this with gcc-4.4.1 and it compiles fine with that.Compiling the nonogram command line program follows a similar procedure. For nonogram-env.mk, I used:
INSTALL=install INSTALL_PATH=/usr/local CFLAGS += -I$(INSTALL_PATH)/include LDFLAGS += -L$(INSTALL_PATH)/lib CFLAGS += -std=c99For all tests, I ran it with the command ./nonogram -s 1 -i datafile .Results: As can be seen from the run times in the table, this is clearly a first rate solver, very close in performance to the Olák's solver. It also consumes far less memory than the other solvers.
URL: http://jwilk.nfshost.com/software/nonogram.htmlVersion Evaluated: nonogram_0.8.4.4
Limitations: Black and white only.
Assessment: Good at line solving, but not so good at searching.
Description: This is an open-source C-language solver. It comes with a large collection of nice data files. Documentation is minimal. Looks like a linesolver with some search capability on top of that. It does an uncommonly nice job of pretty-printing the solved puzzles.
Set-up: Compile it by typing "make" in the distribution directory.
I was running it with the command
./nonogram -m < datafileThe -m gives less garish output.Results: This solver appears to be fast and capable at line solving, though maybe a bit less so than some of the top solvers. But if line solving doesn't fully solve the puzzle, it declares the puzzle "ambiguous" but says it will try to solve it anyway, starting a heuristic search. It's heuristic search gets the right answers, but not particularly fast. This solver's performance on puzzles where search is needed falls significantly behind some of the other solvers.
URL: http://search.cpan.org/~ishigaki/Games-Nonogram-0.01/Version Evaluated: Games-Nonogram-0.01
Limitations: Black and white only. No blank lines?
Assessment: Somewhat weak and somewhat slow.
Description: This is somewhat weirdly packaged as a Perl module. Documentation is sparse. I don't know what algorithm it uses, but it seems to be some kind of line solver with some search capability. No sample data is given in the distribution, and the details of the input format require some guessing. It's not clear how blank lines should be handled, if they can be handled at all.
Set-up: You install the module just like you'd install any other perl module, following the simple steps in the README file. I think you can ignore any messages that "make test" gives about Test::Pod modules not being installed. But this only installs a library. To run the thing you've got to write a Perl program that uses it. But luckily it's a very short Perl program:
#!/usr/bin/perl use Games::Nonogram; $start= (times)[0]; Games::Nonogram->bootstrap; $end= (times)[0]; printf "CPU Time: %.2f seconds\n",$end-$start;Put that in a file, permit it executable, and you can run it, giving the file name for the input file on the command line.Results: This solves most small puzzles just fine, though slower than other solvers. It is, after all, written in an interpreted language.
It seems to be able to handle at least some simple puzzles with multiple solutions, reporting all possible solutions, so it must have some trial and error search capabilities.
However it failed to solve some other fairly easy puzzles, like the line-solvable Skid puzzle, where it printed out a partially complete solution. It also got only partway through Bucks which is not line solvable. For the Edge puzzle, which is not line solvable, it says:
Block 1 is broken. (Games::Nonogram::Block::left overflow: 4) Unless you're trying to solve by brute force, there may be something wrong in the puzzle data.
URL: http://www-sigproc.eng.cam.ac.uk/ga/index.php?title=User:RichWareham/NonogramsVersion Evaluated: no version number.
Limitations: Black and white only.
Assessment: Slow and seriously buggy.
Description: An open source solver written in C#. A few sample puzzles are included in the testgames.txt file. Nothing in the way of algorithmic documentation. It appears to be a a line solver, though the line solving algorithm looks kind of inefficient (try all possible arrangements of white spaces). Might have some guessing ability.
Set-up: I used the mono compiler to build this. It was actually already installed on my Linux system, and is probably available in packages for most linux systems. I built it with "make -f Makefile.Nonograms".
It doesn't read the puzzle data from a data file like most solvers, instead it takes them from the command line. This seemed rather awkward to me, so I wrote a little script that would run it like this:
mono Nonograms.exe `cat datafile`Results: It ran OK on a couple of smaller puzzles, though the "Edge" puzzle caused it to simply say "Oh I give up".
Then I tried it on the Knot puzzle. On computer A, it locked up my windowing system until I could only kill it with difficulty, and I gave up on further tests. On computer B, I was able to watch its memory usage quickly climb over 4 gigabytes before it crashed. The same thing appeared to happen on the next few puzzles I tried. My impression is that some bug causes it to go into an infinite memory allocating loop on most larger puzzles.
URL: http://www.iwriteiam.nl/D0603.html#2.2Version Evaluated: no version number.
Limitations: Black and white only.
Assessment: Incomplete.
Description: Steve Simpson must have searched very hard to find a link to someone's solver mentioned in their blog. It's in C++ but it seems incomplete and non-functional. In any case, not ready for prime time.
The Constraint Programming (CP) community especially has long used nonograms (everyone in the CP community call paint-by-number puzzles by that name) as a standard example problem. A description of the nonogram problem is included in a standard list of constraint problems.
Because these are usually written just as demonstration programs, they typically don't have very elaborate user interfaces. Many require recompilation to load in new puzzles. If you wanted to use these in a practical nonogram solving application, you might need to wrap some more code around them.
5.2.1. Mikael Lagerkvist's Gecode Nonogram Example
URL: http://www.gecode.org/doc-latest/reference/classNonogram.htmlVersion Evaluated: Version distributed with gecode 3.2.2 (on Computer B) and gecode 3.2.1 (on Computer A)
Limitations: Black and white only.
Assessment: An amazingly good solver, especially for a simple demo program.
Description: The first example of such an implementation that I tested was one written by Mikael Lagerkvist and distributed as an example program for the Gecode constraint solving library. Gecode is an open source package written in C++. Lagerkvist's example nonogram solver is a simple little C++ program, with only a couple hundred lines of code, that constructs a constraint set describing a paint-by-number puzzle, calls Gecode to solve it, and prints the result. It's a bit bare-bones, lacking even the ability to read data in from files. The puzzles to be solved have to be compiled into the program.
The basic concept here is to describe the problem to be solved, in this case a particular paint-by-number puzzle by a set of constraints. One form of constraints that Gecode supports is the requirement that a string match a regular expression. As it happens, line clues in paint-by-number puzzles are easily converted to regular expressions, and that's the way this solver represents the puzzle to the solver.
Apparantly the program can also provide some portion of the search heuristics to be used to find a solution, maybe choosing between different search strategies and providing heuristic functions. I haven't studied this in detail.
The heuristic search strategy being used turns out to be fairly similar to the usual one-line-at-a-time depth-first search algorithms used by other solvers, since it does a one-constraint-at-a-time depth-first search.
Set-up: I downloaded the tarball from gecode.com, and followed their compilation instructions, first doing
./configure makeThis compiles the package, but doesn't install it. You can run the examples without installing it, just by the LD_LIBRARY_PATH environment variable to the build directory and then invoking the program like./example/nonogram -solutions 2 1The "-solutions 2" argument tells how many solutions to find. The default is one, but since we are benchmarking solvers' ability to do uniqueness checking for puzzles, we want it to try to find two solutions if possible. The last argument tells which of the compiled in-examples to solve.To be able to run the test problems, I modified the examples/nonogram.cpp file from the distribution by embedding my own examples into that file. The fact that data is compiled in instead of being read in at runtime gives a bit of an unfair advantage to this program, but it would obviously not be difficult to add that capability to the program, and if you did it would not increase its run time noticably.
Note that the runtimes I report are not the ones printed by the program. I report the same Unix CPU time values that I report for all the other programs tested.
Results: This performs quite well. The table of results doesn't show zero times for the smaller puzzles like some of the other solvers do. It seems like there is about 0.01 seconds of overhead in all runs, but that hardly matters. What matters is that it solves many of the harder puzzles with good times, easily outperforming the Wilk solver. It comes pretty close to matching the dedicated solvers.
This is especially impressive because the faster solvers are large, complex programs custom designed to solve paint-by-number puzzles. This one is a general purpose solver with just a couple hundred custom lines of code to generate the constraints, run the solver, and print the result. Considering that this is a simple application of a general purpose solving tool rather than hugely complex and specialized special purpose solving tool, this is an astonishingly good result.
Getting really first class search performance usually requires a lot of experimentation with different search strategies. This is awkward and slow to do if you have to implement each new strategies from scratch. I suspect that a tool like Gecode lets you try out lots of different strategies with relatively little coding to implement each one. That probably contributes a lot to getting to better solvers faster.
URL: http://www.hakank.org/constraint_programming_blog/2009/09/at_last_2_a_nonogram_solver_us.htmlVersion Evaluated: nonogram_create_automaton2.mzn / minizinc 1.0.3 / Gecode 3.2.2
Limitations: Black and white only.
Assessment: An very good solver, especially for a simple demo program.
Description: This is another example of a paint-by-number solver from the constraint programming community, written in MiniZinc. MiniZinc is a special language for describing constraint programming problems that is supposed to be solver-independent, so that you can describe your problem once, and then try solving it with multiple different solvers. Basically the high-level MiniZinc model is translated into a low-level language called FlatZinc, which is understood by many different solvers.
Kjellerstrand's MiniZinc model for solving nonograms is one of hundreds of programs he has developed to solve different kinds of logic puzzles in different constraint programming systems. It converts the line clues into regular expression constraints, exactly the same way that Lagerkvist's solver does. It's a tiny bit more of a rounded program than Lagerkvist's Gecode solver, since the descriptions of the particular puzzle to solve is loaded from a separate data file, but the data files basically still need to be compiled in.
Because the the same model description can be used with different solvers, the performance varies depending on which solver you use.
Set-up: Kjellerstrand's script is called nonogram_create_automaton2.mzn. It takes data files in the MiniZinc .dzn format. He provides many examples.
I tried three different setups to try it with different solvers. One general annoyance here is that if you get things configured a bit wrong, or leave some of the options off the command lines, it will generally not give you an error message. Instead it will fall back to solving in some less efficient way than you thought you were asking for. This can make debugging your setup tricky. I got lots of help from Kjellerstrand.
Note also that there are plenty of other solvers that understand FlatZinc that I did not test.
Setup with G12 Solver
First I installed the current release of the MiniZinc-to-FlatZinc translator. I downloaded minizinc-1.0.3 from the G12 website. This includes precompiled binaries, so I just had to unpack the tarball and run "sh SETUP" in the distribution directory. The bin subdirectory contained mzn2fzn, a MiniZinc to FlatZinc translator, flatzinc, a FlatZinc-based solver, and minizinc, a wrapper that runs the previous two programs. To run the program with the G12 FlatZinc solver, we do:Setup with the Gecode Solverminizinc --data datafile.dzn nonogram_create_automaton2.mznBut the whole point of the MiniZinc language is that you can create FlatZinc programs for different solvers, so the translation to FlatZinc and the execution of the solver are normally separated, like this:mzn2fzn --data datafile.dzn nonogram_create_automaton2.mzn flatzinc nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}"Here mzn_show.pl is Kjellerstrand's Perl pretty printing program for displaying the final solution. Apparantly the output logic in the Kjellerstand's model ceases to work when execution is separated from translation, so he wrote an external program to present pretty output in that case.I did only minimal tests with this configuration. It works fine, but both Kjellerstand and I had better results with the Gecode solver.
I had already installed the Gecode 3.2.2 package to be able to test Lagerkvist's solver, and set-up instructions are given in that section of this survey. Of course, you also need the G12 mzn2fzn program installed in the previous section.Setup with the G12 Lazyfd SolverOne more thing needs to be done before you can use mzn2fzn to generate a FlatZinc file for Gecode/FlatZinc. We need to give the translator some more knowledge of Gecode's capabilities. Under Minizinc's lib/minizinc directory, you should create a new directory named gecode and copy the contents of the directory gecode/flatzinc/mznlib from the Gecode distribution into it. Gecode/FlatZinc can run FlatZinc files generated without these extensions, but performance is greatly improved by using them.
Once you've done this, you can run the model in Gecode/FlatZinc with the commands below. This is what I used for the speed tests reported above.
mzn2fzn -G gecode --data datafile.dzn nonogram_create_automaton2.mzn fz -solutions 2 nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}"Here the '-G gecode' flag on the mzn2fzn command causes the translator to use the special Gecode/Flatzinc library we copied over, and the -solutions 2' command on the fz command tells it to try to produce two solutions, so that we are doing uniqueness checking instead of simple solving.After I posted the results of the tests on the Gecode solver, Kjellerstrand suggested trying it using the G12 Lazyfd solver, which apparantly uses a hybrid of two different solving techniques.To do this, I first had to install a pre-release version of the G12 Minizinc package. I downloaded the "release of the day" for November 2, 2009 and installed it in just the same way as I installed the version 1.0.3 package.
Then I had to first make a small modification to the nonogram_create_automaton2.mzn model by deleting the search strategy hints. Apparantly the solver performs better without such hints. We change this hunk of code:
solve :: int_search( [x[i,j] | j in 1..cols, i in 1..rows], first_fail, indomain_min, complete) satisfy;To this:solve satisfy;Then we run it with the commends below:mzn2fzn -G g12_lazyfd --data datafile.dzn nonogram_create_automaton2.mzn flatzinc -S -n 2 --solver lazy nonogram_create_automaton2.fzn | perl mzn_show.pl "{trtr:12: #:} {nospace} {noresult}"For the time tests, I included only the time to run the solver, fz or flatzinc, not the time to run mzn2fzn. Maybe I should have included both, since you have to run both each time you want to solve a new puzzle. However, this would only have added a small constant to all run times, so it's not going to make that big a difference to the overall assessment.
Results: I ran tests with Kjellerstrand's MiniZinc model on both the Gecode and G12 Lazyfd solver.
Results with the Gecode Solver
It's interesting to compare the performance with Gecode to Lagerkvist's C++ Gecode solver. Though they are quite similar, there are distinct differences. In the sample results there are puzzles where each is faster, but overall Lagerkvist's solver looks a little better. The full results make it clear that the difference is quite substantial.When comparing this to the dedicated paint-by-numbers solvers, it's easy to get distracted by the fact that it is slower than the dedicated solvers on the easier puzzles. But though most of the puzzles in the full test set are easy puzzles, we really aren't talking about much of a speed difference, not more than a twentieth of a second. It's on the minority of hard puzzles that this solver shines.
Looking at the full results, it is clear that this system still falls short of the top three dedicated solvers. But it's important to remember that nonogram_create_automaton2.mzn contains only about 100 lines of code. From Kjellerstrand's blog, it is obvious that these 100 lines are the result of a fair amount of thought and experimentation, and a lot of experience with MiniZinc, but the Olák solver is something like 2,500 lines and pbnsolve approaches 7,000. If you tried turning this into a fully useful tool rather than a technology demo, with input file parsers and such, it would get a lot bigger, but clearly the constraint programming approach has big advantages, achieving good search results with low development cost.
Results with the G12 Lazyfd Solver
The sample results for Kjellerstrand's MiniZinc model running on the G12 Lazyfd solver look quite different from any other solver's results. It is honestly pathetically slow on the easy problems.But the remarkable thing is that it solves almost everything in the sample set. Actually, it even solved the Lion puzzle, although it took 55.2 minutes to do it. Now, I didn't allow a lot of other solvers 55 minutes to run, but that's still pretty impressive. (Note that Kjellerstrand got much faster solving times for the Lion than I did.)
In the full results, we see that it solved all but 19 puzzles in under 2 minutes, which puts it close to the Simpson and Olák solver. I suspect that the 2 minute cutoff in that test is actually too low for Lazyfd. For the other solvers, the majority of puzzles that don't solve in under two minutes won't solve in two hours either. It's not clear that this is true for Lazyfd.
It does have a down side though. On some puzzles this thing uses staggering amounts of memory. When I tried running it on my full collection of puzzles on computer A it got hung up on puzzle #354, because it needed 1.4 gigabytes of memory. Since computer A only has a gigabyte of memory, this started it thrashing hopelessly. This is a large puzzle, but not especially hard. On computer B, which has more memory, the Lazyfd solver could do it in 46.5 seconds. And Simpson's solver could do it in 0.01 seconds using 3.8M memory.
Without actually knowing a lot about how lazyfd works, I'd guess that a lot more effort is being spent on choosing optimal directions to search in. The memory consumption suggests that it may be mixing some degree of breadth-first searching with the depth-first strategies used by most solvers. On easy puzzles, this is massive overkill because they are easily solved no matter which way you direct the search. But it starts paying off on puzzles where other solvers run into a wall of combinatorial explosion and go exponential. From the fact that most of the "hard" puzzles are easily solved by at least one of the solvers surveyed, we know that for any particular puzzle there exists a good search strategy. Lazyfd seems to be pretty good at finding the right strategy for each puzzle, though it spends a lot of extra computation and memory doing so.
These two results highlight the advantage of a solver-independent modeling language like MiniZinc. You can describe your problem once, and then try out a wide variety of different solvers and heuristics without having to code every single one from scratch. You can benefit from the work of the best and the brightest solver designers. It's hard to imagine that this isn't where the future lies for the development of solvers for applications like this.
URL: http://www.oberlin.edu/math/faculty/bosch/pbn-page.htmlVersion Evaluated: No version number / glpk version 4.40
Limitations: Black and white only. No blank lines?
Assessment: Interesting, but slow.
Description: Bosch's tiny program takes an input file that describes the clues to a puzzle and translates it into a set of equations in a form that can be solved by standard integer programming software. Integer programs are like linear programs, except the values of the variables need to take integer values, and, while there are delightfully fast algorithms to solve linear programs, solving integer programs is a substantially more difficult task. Bosch provides a paper describing his algorithm, and two sample data files, spaceman.dat and dragonfly.dat.
Set-up: I started by downloading and installing version 4.40 of the Gnu glpk package, which includes a command-line program called "glpsol" which can solve integer programs expressed in the same CPLEX format generated by Bosch's little program. I have no idea how the performance of glpk compares to other packages such as ILOG CPLEX, which is most likely what Bosch used. I chose glpk because it was free.
Running the solver is a multistep process:
I wrote a little perl script that would do all this and then pretty print the result.
- Copy the clue file you want to solve into pbn.dat
- Run Bosch's pbn.c program. This reads the clue file, and writes out a CPLEX file named pbn.lp.
- Run the integer program solver. If you are using glpk, the command is:
glpsol --cpxlp pbn.lp -o pbn.sol- The output file from this, pbn.sol does not, of course, include a pretty picture of the solved puzzles. It just gives the values of the variables in the final solution. You want to look at the "Z" varibles, which are 1 for black and 0 for white.
Results: This solves most small puzzles pretty easily, but generally much slower than the line solvers do. Because it doesn't work line-by-line, some of the puzzles that give the more simple-minded line solvers problems are relatively easy for it. It did the edge logic puzzle, #23, in a half second.
The constraint generating program could use some work. It crashes on puzzles with blank lines. For some reason it generated a syntactically invalid CPLEX file for puzzle 16.
Interestingly, the second puzzle included in the distribution, the 20x20 dragonfly puzzle, required a touch under five hours of CPU time to solve on computer A, yet the puzzle is line solvable. My pbnsolve program solves that so fast that the CPU time registers as zero.
Luiz Mingote and Francisco Azevedo have generalized this approach to multicolor puzzles. Their paper compares its results to the Olák solver and found them similar for a few tiny puzzles. I doubt if it would remain comparable if presented with a larger set of more difficult puzzles, but I cannot test it, as their source code does not appear to be available.
5.3.1. Juraj Simlo's Griddlers Solver
URL: http://jsimlo.sk/griddlers/Version Evaluated: gs13
Limitations: Black and white only. Puzzles must be 50x50 or less.
Assessment: A good solver.
Description: This is a Windows application offering a complete puzzle creation/playing/auto-solving environment, but the emphasis is on the auto-solver. It is not open source. Many fine puzzles are included in the package. It is fairly well documented, including a general description of the algorithm.
Set-up: Nothing much to set up here. Unzip and run.
Results: I had no way to get CPU time information from this program, since the solver is embedded in a GUI, But it solves most puzzles quickly, while you watch.
If you give it puzzles that are too big, it crashes, saying "Access violation at address 00463EAE in module 'gridsolv.exe' read of address 00000012". It seemed like this happened for puzzles where any dimension exceeded 50, but I'm not sure. Because of this I was unable to test it against most of the harder puzzles on my list. It succeeded with all the smaller ones except the "Lion", where, like most other solvers tested, it got hopelessly bogged down in an interminable brute force search.
URL: http://qnonograms.sourceforge.net/Version Evaluated: qnonograms-0.2
Limitations: Black and white only.
Assessment: A good line solver with no search capability.
Description: An open source puzzle creation and solving environment written in C++. When run, it brings up a window with the empty puzzle grid, where you can solve manually or run the solver. The program is fairly minimalistic, and looks like a project in its early stages. It can read input files in two different binary formats (suffixes .jap and .jcr) and one plain text format (suffix .cwd) all documented only by internal comments, and then only scantily.
Setup: To compile from the source package on a Linux system, first do "qmake", which generates the Makefile and then do "make". The ready-to-run binary will be in the build subdirectory. You can give a data file name on the command line, or select one after you run it.
Results: Because the solver is integrated with the rest of the program, it is somewhat difficult to obtain timing results. It appears to be a fairly competent line solver. It solves all the line solvable puzzles on our list above, but for the rest it solves only as much as can be line-solved and then stops.
On the Flag puzzle, it didn't appear to do all possible line solving before quitting.
URL: webpbn.comVersion Evaluated: No version numbers.
Limitations:
Assessment: A somewhat dim-witted solver with no search capability.
Description: Webpbn.com is a community website where users create puzzles for other users to solve. Users solve puzzles on a page where Javascript is used to render the puzzle and update the screen image in response to mouse clicks. There is an automated solver available in this environment that can be activated with the "helper" button, however this button is only available if you are solving a puzzle you created yourself, or are solving a puzzle that you have previously solved manually.
The webpn helper is a small Javascript solver that normally uses a left-right overlap algorithm for line solving. This is normally implemented using Javascript's regular expression engine to find possible solutions, but on browsers where the regular expressions are brain-damaged a version of the line solving algorithm from pbnsolve is used instead.
This left-right overlap algorithm for line solving is not complete, and there is no search capability in the helper, so there are many puzzles that will only be partially solved by the helper. However, when the helper stalls, it is possible for the user to manually mark a few more cells and then restart the helper.
Setup: None.
Results: Puzzle designers need to solve their puzzles many times as they make small changes in hopes of improving the playability. Doing this manually is extremely tedious, and error prone. The helper is designed to streamline this process. It only does things that any compentent human solver would do. Then it stops, so the puzzle designer can see if more sophisticated reasoning can be used to continue. Thus, its stupidity as a solver is actually a feature. It is designed to solve cooperatively with the puzzle designer, allowing the puzzle designer to assess how well the puzzle solves and how difficult it is without having to solve it entirely by hand.
Clearly among the dedicated solvers, pbnsolve, Simpson's solver and the Olák solver are the most highly developed.
The two constraint-based systems by Lagerkvist and Kjellerstrand come quite close in performance to the dedicated solvers, although both are more in the category of demonstrations of constraint programming than fully developed solving applications. The power of the underlaying search libraries and the ease of experimentation with alternative search heuristics obviously serves them well. I think it very likely that approaches based on these kinds of methods will ultimately prove the most effective.
I suspect that there is a lot of room for progress here.
The fact that few puzzles are hard for all solvers suggests that if a hybrid
approach that more effectively combines their strengths might be possible.
7. Other Solvers
This is just a list of links to pages describing solvers that I maybe ought
to investigate someday. Some of them are puzzlingly undocumented. Others
would require me to figure out how to install and configure all sorts of
strange packages before I could run them. Some are windows packages that
I probably wouldn't be able to collect run time data from anyway. But who
knows, maybe I'll get around to them someday anyway. If anyone is especially
eager to have me look at one of these, or knows of others to add to the list,
please let me know.