Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ianprime0509/pbnsolve


https://github.com/ianprime0509/pbnsolve

Last synced: 3 days ago
JSON representation

Awesome Lists containing this project

README

        

# pbnsolve

A continuation of the [pbnsolve](https://webpbn.com/pbnsolve.html) program by
Jan Wolter using Zig.

Original README follows (to be updated at some point):

---

PBNSOLVE VERSION 1.10
Jan Wolter

This software is open source, copyrighted by Jan Wolter, but released under
an Apache 2.0 License.

This software usually requires libxml2 which seems to be available on most
modern Unix systems. It is also available from http://xmlsoft.org/.

It is possible to build pbnsolve without libxml2 by setting a flag in the
config.h file, but all the supported non-xml input file formats are rather
pathetic so this will pretty seriously cripple pbnsolve. People only
interested in black and white puzzle might be able to get by with that.

Sample puzzles to test on can be obtained at http://webpbn.com/export.cgi

To Compile:
-----------

This program was developed for Unix systems. It would probably not be
difficult to port to other operating systems. Libxml2 is available for
most operating systems. The only-unix specific code that comes to mind
is the resource limitation stuff. If you do a port, I'd be interested in
your diffs.

On Unix:

edit config.h

Might want to change some of the settings here, though likely the
ones in the distribution are OK.

edit Makefile

Mainly to get include file paths for libxml2 right and decide between
-O2 (for production) and -g (for development).

NOTE: -O2 makes a HUGE difference. It can make pbnsolve runs more
than twice as fast.

"make pbnsolve"

Should compile without warnings.

Usage:
------

Run syntax is:

pbnsolve -[bdhlopt] -[v] [-n] [-s] [-x] [-d]
[-f] [-a] []

Input files may be in any of too many formats, described in the "Input Format"
section below. Pbnsolve will try to guess the file format based on the
filename suffix. If that doesn't work, it will try to guess it based on
the content of the file, though it doesn't do that terribly well.

If the path is omitted, then it reads from standard input. In
this case it will always expect "xml" format.

Command line options:

-b
Brief output. Error messages are as usual, but normal output is just
one line containing one or more of the following words separated by
spaces:

unique - puzzle had a unique solution.
multiple - puzzle had multiple solutions.
line - puzzle was solvable with line & color logic alone.
depth-2 - puzzle was solvable with two-level lookahead.
trivial - puzzle was too easy, with little or no whitespace.
solvable - found solution but none of the above was proven.
contradiction - puzzle had no solutions.
timeout - cpu limit exceeded without a solution.
stalled - only partially solved puzzle.

Without the -b flag you get wordier output along with ASCII images
of one or two puzzle solutions. You don't get triviality reported
in that case, unless you give the -t flag and notice a difficulty
rating of 100.

Which outputs are possible depends to some extent to the algorithms
selected with the -a flag. "depth-2" occurs only if algorithm C
(Contradiction Check) is enabled. "stalled occurs only if no
search algorithm is enabled (G, P or M).

Note that with the -u or -c flags, we will always get one of
'unique' or 'multiple' if we find any solution at all. If the puzzle
can be solved by line and color logic alone, it will be flagged
logical, but since other solution techniques are possible, it should
not be assumed that other puzzles are not logically solvable in a
broader sense.

-u
Check uniqueness. If we had to do any searching (that is we actually
needed to invoke algorithms G, P or M) then we don't necessarily know
that the first solution we find is the only solution to the puzzle.
If the -u flag is given, then in such cases we proceed to try to find
a second solution. If none are found, the puzzle is unique.

-c
Input puzzle is expected to include a goal solution grid. We try
to see if we can find a solution different from that one, and if
so, report it. This is sometimes faster than -u, and in the case
of multiple solutions always reports back a non-goal solution, but
is otherwise similar.

-o
Print a description of the puzzle data structure before starting
to solve it. This is mainly for debugging the puzzle reading
code. It's not pretty.

-a

Use the listed algorithms. Possible values are listed below. The
order in which the options are given does not determine the order
in which they are tried. Default is -aLHEGP.

L - LRO Line Solving. This is normally the first thing we try,
examining rows and columns one at a time, comparing the leftmost
possible position of the blocks to their rightmost possible
position and marking all cells that fall in the same block
either way. Keep doing this until the puzzle is done or it
we stall.

H - Cache Line Solver Solutions. This is a supplement to the
Line solver. It stores line solver results in a hash table
so they can be reused it the same situation comes up again,
which occurs surprisingly often. Cache hit rates of 80% to
90% are not unusual with large puzzles, so this can speed
up the solver substantially, reducing run times by 30% to
50%, but it increases memory consumption substantially.

E - Exhaustive Check. After line solving stalls, but before
trying anything else, double check every cell on the board
to make sure it really can have all of it's listed values.
Doing this this is probably as likely to slow us down as speed
us up, but it ensures that any puzzle solvable by line and color
logic alone will be flagged as 'logical'. You can use this
as a substitute for the line solver instead of as a backup, but
it is slower.

C - Contradiction Check. Try every possible color for every cell
that has more than one possible color left, and search to see
if that leads to a contradiction, but only search a couple
levels down. If a contradiction is found, eliminate that
color possiblity. Otherwise forget it. This should solve
most cells that humans can readily solve, using techniques
like edge logic, because humans only have limited look-ahead.
The depth limit is set by the -d option. The contradiction
check is done after the line solving algorithms (L and E)
have stalled, but before trying heuristic search algorithms
(GPM).

G - Guessing. Start a depth-first search for a solution, using
heuristics to guess colors for cells, continuing forward until
we find either a solution or a contradiction, and then back
tracking to the last guess to try something else. This was
the default in older versions of pbnsolve. If both guessing
and probing (P or M) are enabled, then probing will be used
first, but if it seems to be stalling, guessing will be tried
for a while.

You can further choose between three different heuristic
functions, by suffixing the flag with a digit. The G1 and G2
algorithms are old and bad. G3 is the one used in older
version of pbnsolve. G4 is the default now. It's based on
the heuristic functions used by Steve Simpson's solver. G5
and G6 are experimental functions that try to use information
from a known solution to make guesses. They work very badly.

P - Probing. Similar to guessing but instead of using heuristics
to choose a guess, we actually explore the consequences of
each possibility, and choose the one that makes the most
progress for us. Thus we invest more computation in making
a good choice. This can slow us down on many relatively easy
puzzles, where the quality of the choice doesn't matter much,
but speeds us up substantially on harder puzzles. If this is
set M is unset, but it can be combined with guessing (G). In
that case we mostly probe, but if it seems to be stalling,
we try guessing.

Different variations of probing can selected by suffixing the
flag with a digit. P1 is our old probing algorithm, which
probed on cells with two or more solved neighbors. P2 adds
probing on all cells adjacent to cells set since the last
guess. P3 instead adds probing on cells who are given a good
rating by the heuristic function from the guessing algorithm.
P4 does all three. P2 is the default.

M - Merged Probing. Similar to probing but whenever we probe on
different possible settings for a cell, we check to see if
all the alternatives set some other cells to the same values.
If they do, set those values. This was implemented in the
hopes that it would improve performance, but the payoff is
usually less than the overhead, so it seems to be a dud.
Setting this unsets P.

-f
Explicitly set the input file format. The argument should be
on of the "suffixes" listed in the "Input Formats" section below.
If this is not given, pbnsolve will first look at the actual
suffix of the input filename. If there is no recognizable suffix,
then it will try to guess the format of the file from it's first
few bytes.

-n
For file formats, like the XML format, in which the file can contain
more than one puzzle, this specifies which puzzle to solve. The
default is 1, which means the first puzzle.

-s
Start solving from one of the "saved" solutions in the input file.
is the number of the input to use, if there is more than one.
If the number is omitted it defaults to 1.

-m or -m
Turns on printing of explanation messages, which try to explain
how pbnsolve solved the puzzle. Explanations are printed for
linesolving (-aL), exhaustive search (-aE) and contradicting (-aC)
but not for heuristic search, because that would typically result
in far too much output. Note that in explanation messages we
refer to rows and columns by one-based numbers, while in debug
messages we use zero-based columns, which is confusing if you turn
both on at once. If a count is given, it tells how often to print
a snapshot of the board. If no count is given, a snapshot will
be printed after every tenth message.

-x
Set a limit on the CPU time used by pbnsolve. If is zero
or omitted, then there is no CPU limit, otherwise pbnsolve will
abruptly terminate if it uses more than that number of CPU seconds.
Normally the default is zero (no limit) but it's possible to build
pbnsolve with a different default CPU limit.

-d
Set a depth limit for the contradiction checking algorithm. Since
contradiction checking is not enabled by default, this has no effect
unless the -aC flag is also given. If this option is omitted, the
default depth is 2.

-t
After run is completed, print out run time and various other
statistics.

-i
If interupted, pause execution, print out statistics, and ask
the user if we should terminate or continue. This gives a way
to monitor performance of long-running solves.

-h
Run in http mode. Output is XML-formatted in a way suitable for
use in an AJAX-application. This doesn't work right with the
-t, -v or -b flags.

-v
Write debugging messages to standard output. The are
flags that indicate what kinds of debugging messages should be output.
Some or all of these flags may be disabled by compile time options.
Recognized flags include:

A - Top level messages.
B - Backtracking messages.
C - Contradiction search message (with -aC)
E - Messages from exhaustive check (with -aE)
G - Messages from guessing.
H - Messages from hashing.
J - Job management messages.
L - Linesolver messages. (disabled by default)
M - Merging Messages.
P - Probing Messages.
Q - Probing Statistics.
U - Messages from Undo code used when backtracking.
S - Cell State change messages.
V - Extraverbosity when used with any of the above.

-w
Print debugging messages relevant to a particular row or column.
You can say -wR12 for row twelve or -wC0 for column zero. This option
is normally disabled by a compile time option, since constantly
checking if the current row needs logging slows things down. You
can watch as many as 20 rows or columns.

CGI Mode:
---------

If the program is renamed "pbnsolve.cgi" then it will work as an AJAX
puzzle validator. It ignores command line arguments and run as if the options
-hcx1 were set, so it checks if the given goal is unique, returns results in
xml format, and times out after one second of CPU time is used. (The actual
timeout value is configurable at compile time).

A CGI variable named "image" should contain the puzzle description (basically
the contents of the input file). The CGI variable "format" may contain the
one of the suffix strings specified below to indicate the format of the input
file. If the format is not given explicitly, a half-hearted attempt will be
made to guess it from the contents of the file.

In CGI mode, output will be something like this for a puzzle with a unique
solution:

Content-type: application/xml


OK
1
1
510

or like this for a puzzle with multiple solutions:

Content-type: application/xml


OK
0

X.
.X

300

The will be "FAIL:" followed by an error message if the
solver was for any reason unable to run, or "TIMEOUT" if the solver
exceeded it's runtime limit, or "OK" otherwise. is 1 if
the puzzle has only one solution, 0 if otherwise. If there are
multiple solutions, it will give one that differs from the goal
solution in the tag. is 1 if it was solvable by line
and color logic alone, 2 if two-level lookahead was needed. If the
solver had to guess, no line appears. is a
measure of how hard the solver had to work to solve the puzzle.
It's -1 if the puzzle was blank, 100 if the puzzle had so little
white space that it was trivial to solve, and a larger number if
it was harder.

Input Formats:
--------------

Pbnsolve can read several input formats. Some input formats include only
the clues, not the goal image, and some include only the goal image but not
the clues, and some can contain both. Pbnsolve will construct clues from a
goal image if the clues are not given.

Note that the -c option (which is always on in CGI mode) requires a goal
image, so only formats that can include a goal image can be used with that.
Uniqueness checking on other formats needs to be done with the -u option.

Some formats can include saved solutions. These are images of incompletely
solved puzzles which pbnsolve can take as a starting point instead of starting
with a blank grid.

Recognized formats are listed below:

PBNSOLVE XML FORMAT
Suffix: "xml"
Contains: clues, goal, saved
Color: any number of colors
Documentation: http://webpbn.com/pbn_fmt.html

This is our native format, but if libxml2 is not available and you built
pbnsolve with the NOXML flag, then it isn't available and pbnsolve is
crippled because none of the other formats currently available support
all of pbnsolve's features.

STEVE SIMPSON'S .NON FORMAT (EXTENDED)
Suffix: "non"
Contains: clues, goal, saved
Color: black & white only
Documentation: http://www.comp.lancs.ac.uk/~ss/nonogram/fmt2

Pbnsolve is slightly incompatible with the spec in that it will treat a
blank line in the clues as a blank clue. The official way to mark a blank
clue is with a '0'. Pbnsolve accepts those as blank clues too.

The official format allows arbitrary additional properties to be defined,
and we use two of these for goal and saved puzzle images.

The "goal" keyword is followed by a image of the puzzle goal, row-by-row
with 1's for blacks and 0's for whites. Any other characters (usually
whitespace) are ignored. If prefer human readability to adherence to
Simpson's spec, you could enter the grid for a 5x10 puzzle like:

goal
01100
01101
00101
01110
10100
10100
00110
01010
01011
11000

or you can use the less readable, but more compliant format of:

goal 01100011010010101110101001010000110010100101111000
or
goal "01100011010010101110101001010000110010100101111000"

Saved grids can be entered in the same way, using the keyword "saved"
instead of "goal" and with '?' characters marking cells that are unknown,
'0' for white and '1' for black. There can be multiple saved solutions
in a file. The -s flag can be used to select one to start from.

These files lack any sort of "magic number" at the beginning, and so
pbnsolve is not generally able to identify them by sniffing the file
content.

This is the best option to use if you can't use the xml format, but
it doesn't support color puzzles.

OLSAK .MK FORMAT
Suffix: "mk"
Contains: clues
Color: black & white only

This is a very simple format. Here's an example:

10 5
2
2 1
1 1
3
1 1
1 1
2
1 1
1 2
2
#
2 1
2 1 3
7
1 3
2 1

The first two numbers are height and width. The next numbers, up to the
'#' are row clues. The rest are column clues. As in the .non files, a
blank clue is indicated by a line with just a zero on it, but pbnsolve
will also treat blank lines as blank clues.

OLSAK .G FORMAT
Suffix: "g"
Contains: clues
Color: multicolor

This rather complex format is documented inside the sample files that
come with the olsak solver. The format can support triddlers and triangle
puzzles, but pbnsolve can't read those variations.

Pbnsolve will not be able to automatically identify these files unless
they have a .g suffix or the -fg flag is given.

JAKUB WILK'S .NIN FORMAT
Suffix: "non"
Contains: clues
Color: black & white only
Documentation: http://jwilk.nfshost.com/software/nonogram.html

Pretty much the same as the .mk format, except the height and width
values are given in the opposite order, and there is no '#' between the
row clues and the column clues.

Pbnsolve cannot "sniff" the format of these files. If you give it one
without specifying the format, it will probably guess that it is an "mk"
file and fail miserably.

MESHCHERYAKOV AND SUKHOY'S .CWD FORMAT
Suffix: "cwd"
Contains: clues
Color: black & white only

Alexander Meshcheryakov's QNonograms program and Vladimir Sukhoy's C++
solver both use the "CWD" format, so I suppose it is something of a
Russian standard. Unfortunately, the ".cwd" filename suffix is a poor
choice since a standard file format for crossword puzzles uses that
suffix too.

This is another variation on the MK format, differing only in that the
height and width are on separate lines, and there is always a blank
line between the row clues and the column clues.

ROBERT BOSCH'S LP SOLVER FORMAT
Suffix: "lp"
Contains: clues
Color: black & white only
Documentation: http://www.oberlin.edu/math/faculty/bosch/pbn-page.html

This is a file format with little to recommend it. Basically a wordier
version of "mk" or "nin". So why'd we bother to support? Dunno.

NETPBM-STYLE PBM FILES
Suffix: "pbm"
Contains: goal, clues computed by pbnsolve
Color: black & white only
Documentation: http://netpbm.sourceforge.net/doc/pbm.html

This is a 2-color bitmap image format used by the netpbm package, which
includes tools to convert just about any other image format (GIF, PNG,
JPEG, etc) to this one. We should probably use libnetpbm to read it,
but we don't. We can take plain or raw files. The plain files are
pretty danged easy to generate from other programs, especially since
pbnsolve doesn't care about the 70 characters-per-line limit.

Some PBM files can contain multiple images. It would be good if the
-n flag could be used to select which one to solve, but this hasn't
been implemented. We always solve the first one.