
The way this thing works in general:

First off, it creates the puzzle in it's start position in memory,
compresses the puzzle and stores that position on disk as the first
generation.

From then on the solver iterates through each position of the
previous generation.  With each position the solver tests all
actions that can be considered a single move (move one piece one or
more times orthogonally but not necessarily in the same direction each
time (ie right then up) to a different position).  For each new
position generated it tests to see if it has not been previously
generated (if you store _all_ positions of the move tree you only need
to search the current and previous 2 generations) and if it is found
to be unique then it is stored on disk.  Repeat this until the
solution is found.  

The path through the move tree can be found by starting at the solved
position and searching the previous generation for any position that
can get to the solution in 1 move.  Repeat this step with this new
position and the generation before that until you reach the puzzle's
start position.


The BIG optimization:

This is the way most solvers work but I've optimized this process by
doing things in batches.  Most importantly the duplicate testing is
performed in batches.  This is where bigmem and smallmem parameters
come into play.  The solver reads in a batch of positions off of disk
from the previous generation into a smallmem memory block.  These
positions are compressed individually and then additionally compressed
as a batch.  Optionally I could have Deflate'd them also (and yes all
three work well in concert better than any or or two individually,
more on the compression algorythms later).  The solver undoes the
batch compression (in place!) so it can access each position in a
random access fashion.  The (now only individually) compressed
positions are all the same length, already sorted, and take up to
smallmem of memory in total.

Now the solver walks through the list of positions in the first
smallmem block of memory and finds new positions, dropping them
(compressed) onto a binary tree in the bigmem sized memory block.
Obviously we wish to discard anything that would be duplicated on the
binary tree, but beyond that we don't do any other dup checking until
either we've gotten to the end of the positions in the previous
generation or we've filled up the bigmem memory block.  Because the
new positions are put onto a binary tree and the new positions are
typically very similar to their previous position, it reads through
the previous generations positions in memory in a non-linear
(bifurcating?) pattern to give the binary tree better shape.  

When it's time to dup check the new positions (either the end of the
generation or bigmem is full), the binary tree is converted in place
into a sorted doubly linked list.  The solver then goes through each
batch of positions from the current generation and the previous two
generations (loading them in turn into the second smallmem sized
memory block) and removes from the linked list any "new" positions
that duplicate a position from the previously saved batches of
positions.  This is made extremely efficient by the fact that each
batch of positions is sorted (remember?) as is the linked list of
"new" positions.

Once all duplicates have been removed we create a one or more smallmem
sized batches (in the second smallmem sized memory block) which are
then batch compressed and saved on disk.

Once the generation is completely gone through we do the same on the
newly created generation until we find the solution.


Finding new positions:

To find all of the possible child positions for a given position is
done by iterating through each movable piece, and if it can move...
record it's current position, and perform a depth first search for
every possible move that one piece can make, recording each new
position and stop recursing at any given point if the piece has gotten
here before.  I do a few precalculations to optimize this some but
that's not going to be covered here.


Compression:

Since the full move tree of a puzzle is being stored on disk and
things run faster the more positions that can be loaded into memory at
any given time, and because comparing two small structures is faster
than comparing to larger structures, most of the work on any given
position is done while it is compressed (everything but finding new
positions).

The compression algorythm I use goes something like this:  Each piece
_type_ is assigned a distinct value.  An empty space also has it's own
value for this purpose.  Walk sequentially across all of the locations
on the map of the puzzle ignoring walls.  As you run across a puzzle
piece (or empty space) that you have not previously recorded, record
the value for that type of piece.  When you have walked the entire
puzzle you will have one entry recorded for each piece and each space
on the puzzle.  Store this in a string with as few bits as necessary
per piece.  Ideally you can use Huffman coding for this, but I didn't
which typically costs me zero to two bytes per puzzle position over
the ideal.

Since each position is sorted in it's batch, I can take advantage of
this to save space in a batch compression.  Very simply all I do is
see how many of the initial bytes of the current compressed position
match those of the previous compressed position and store that number
followed by the non duplicated bytes.  I've found that this can be
gzip'd/deflate'd for additional gain but I've never implemented it.
