##========================================================================
## POD DOCUMENTATION, auto-generated by podextract.perl

=pod

=cut

##========================================================================
## NAME
=pod

=head1 NAME

Tie::File::Indexed - fast tied array access to indexed data files

=cut

##========================================================================
## SYNOPSIS
=pod

=head1 SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use Tie::File::Indexed;
 
 ##========================================================================
 ## Tied Array access
 
 ##-- tie an array (uses files "data", "data.idx", and "data.hdr")
 my $filename = "data";
 tie(my @data, 'Tie::File::Indexed', $filename, %options) or die ...
 
 ##-- add some data
 $data[42] = 'blah';      # set an item
 $data[42] = 'blip';      # overwrite an item (really appends a new record)
 $data[24] = 'bonk';      # out-of-order storage
 print $data[42];         # retrieve & print a stored value
 
 ##-- tweak array size
 $n_items = @data;        # get number of stored records
 $#data -= 2;             # chop two records off the end
 
 #... push(), pop(), shift(), unshift(), and splice() should do What You Expect
 
 ##-- get underlying object (needed for all method calls below)
 my $tied = tied(@data);
 
 ##-- buffering and consolidation
 $tied->flush();          # flush underlying filehandles
 $tied->reopen();         # close and re-open underlying filehandles
 $tied->consolidate();    # remove gaps and stale values
 
 ##-- advisory locking
 $tied->flock();                # get an advisory lock on $filename
 $tied->funlock();              # release our lock on $filename
 
 ##-- file operations
 $tied->rename($newname);       # rename underlying files using CORE::rename()
 $tied->move($newname);         # move underlying files using File::Copy::move()
 $copy = $tied->copy($newname); # copy underlying files using File::Copy::copy()
 $tied->unlink();               # unlink underlying files (tied access won't work after this!)
 
 ##-- all done
 undef $tied;
 untie(@data);

=cut

##========================================================================
## DESCRIPTION
=pod

=head1 DESCRIPTION

The Tie::File::Indexed class provides fast tied array access to raw data-files.
It uses an auxiliary packed index-file to store and retrieve the offsets and
lengths of the corresponding raw data strings, and an additional header-file
to store administrative data, resulting in a constant and very small memory footprint.
Random-access storage and retrieval should both be very fast, and even
pop(), shift() and splice() operations on large arrays should be tolerably efficient,
since these only need to modify the (comparatively small) index-file.
No disk-space optimization is performed by default, so
frequent overwrites will cause the data-file to grow monotonically:
see L</consolidate> for a workaround.

The Tie::File::Indexed distribution also comes with several pre-defined
subclasses for transparent encoding/decoding of UTF8-encoded strings,
and complex data structures encoded via the L<JSON|JSON> or L<Storable|Storable>
modules. See L</SUBCLASSES> for details.

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Constructors etc.
=pod

=head2 Constructors etc.

=over 4

=item new

 $tied = CLASS->new(%opts);
 $tied = CLASS->new($file,%opts);
 $tied = tie(@array, CLASS, $file, %opts);

Creates and returns a new Tie::File::Indexed object; the third form also ties it to
the Perl array C<@array>.  Currently accepted C<%opts>:

 file   => $file,    ##-- file basename; uses files "${file}", "${file}.idx", "${file}.hdr"
 mode   => $mode,    ##-- open mode (fcntl flags or perl-style; default='rwa')
 perms  => $perms,   ##-- default: 0666 & ~umask
 pack_o => $pack_o,  ##-- file offset pack template (default='J')
 pack_l => $pack_l,  ##-- string-length pack template (default='J')
 bsize  => $bsize,   ##-- buffer size in bytes for index batch-operations (default=2**21 = 2MB)

When opening an existing file, administrative header-data is read from the header-file F<$file.hdr>,
which is written on close() if opened in write-mode.
Raw data records are read/written from/to the data-file F<$file>,
and their offsets and lengths are stored as packed integers in the index-file F<$file.idx>.
The options C<pack_o> and C<pack_l> control the pack-templates to use for the index-file;
the default pack templates use the 'J' pack format to pack offsets and lengths as Perl internal unsigned
integer values. See the entry for "pack" in L<perlfunc> for details.

=item defaults

 %defaults = CLASS_OR_OBJECT->defaults()

Default attributes for constructor; can be overridden by subclasses.

=item DESTROY

 undef = $tied->DESTROY();

Destructor implicitly calls close().

=back
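
The effect of C<pack_o> and C<pack_l> on the size of an index record can be checked with plain C<pack()>.
The following is only a sketch: it assumes that an index record is simply a packed offset
followed by a packed length, which the description of C<new> suggests but does not spell out,
and the template C<'N'> is chosen purely for illustration.

 use strict;
 use warnings;

 ##-- bytes needed for one (offset,length) pair under a given pair of templates
 sub index_record_size {
   my ($pack_o, $pack_l) = @_;
   return length(pack($pack_o.$pack_l, 0, 0));
 }

 my $size_native = index_record_size('J','J');  ##-- platform-dependent (perl's unsigned integer width)
 my $size_net32  = index_record_size('N','N');  ##-- always 8 bytes; values limited to 2**32-1

 printf("J/J: %d bytes per record\n", $size_native);
 printf("N/N: %d bytes per record\n", $size_net32);

A fixed-width template such as C<'N'> gives the same record size on every machine, at the cost of
a smaller maximum offset; C<'J'> follows whatever integer width the local perl was built with.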

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Subclass API: Data I/O
=pod

=head2 Subclass API: Data I/O

=over 4

=item writeData

 $bool = $tied->writeData($data);

Write item $data to C<$tied-E<gt>{datfh}> at its current position.
After writing, C<$tied-E<gt>{datfh}> should be positioned to the first byte following the written item.
The object is assumed to be opened in write-mode.
The default implementation just writes C<$data> as a byte-string (C<undef> is written as the empty string).
Can be overridden by subclasses to perform transparent encoding of complex data.

=item readData

 $data_or_undef = $tied->readData($length);

Read item data of length C<$length> from C<$tied-E<gt>{datfh}> at its current position.
Can be overridden by subclasses to perform transparent decoding of complex data.

=back
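
A subclass only needs to override these two methods to change the on-disk record format.
The sketch below stores every record Base64-encoded. The package name C<My::Tie::Base64>
is hypothetical, and the code assumes that the inherited methods behave as described above
(in particular, that the inherited C<readData()> returns the raw bytes or C<undef>).

 package My::Tie::Base64;  ##-- hypothetical subclass, not part of the distribution
 use strict;
 use warnings;
 use parent 'Tie::File::Indexed';
 use MIME::Base64 qw(encode_base64 decode_base64);

 ##-- encode the record, then hand it to the inherited byte-string writer
 sub writeData {
   my ($tied,$data) = @_;
   return $tied->SUPER::writeData(defined($data) ? encode_base64($data,'') : undef);
 }

 ##-- read the raw bytes via the inherited reader, then decode them
 sub readData {
   my ($tied,$length) = @_;
   my $raw = $tied->SUPER::readData($length);
   return defined($raw) ? decode_base64($raw) : undef;
 }

 1;

Such a class would then be tied like the base class, e.g. C<tie(my @a, 'My::Tie::Base64', $file)>.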

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Subclass API: Index I/O
=pod

=head2 Subclass API: Index I/O

=over 4

=item readIndex

 ($off,$len) = $tied->readIndex($index);
 ($off,$len) = $tied->readIndex(undef);

Reads an index-record from C<$tied-E<gt>{idxfh}>.
If C<$index> is C<undef>, read from the current position of C<$tied-E<gt>{idxfh}>,
otherwise reads the index record for item at logical index C<$index>,
which is assumed to exist in the array.
Returns offset and length in C<$tied-E<gt>{datfh}> of the item data,
or the empty list on error.

=item writeIndex

 $tied_or_undef = $tied->writeIndex($index,$off,$len);
 $tied_or_undef = $tied->writeIndex(undef, $off,$len);

Writes an index-record for a logical item to C<$tied-E<gt>{idxfh}>.
If C<$index> is C<undef>, writes at the current position of C<$tied-E<gt>{idxfh}>,
otherwise writes a record for the logical index C<$index>, creating one if it
doesn't already exist.
Returns the tied object on success, undef on error.

=item shiftIndex

 $tied_or_undef = $tied->shiftIndex($start,$n,$shift);

Moves C<$n> index records starting from C<$start> by C<$shift> positions (may be negative).
Operates directly on C<$tied-E<gt>{idxfh}>.
Doesn't change unaffected values.
Used by SPLICE() method.

=back
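
The random-access lookup performed by C<readIndex()> can be pictured with an in-memory
filehandle. This is a simplified sketch rather than the module's actual code: it assumes
fixed-width records consisting of a packed offset followed by a packed length, and the
helper C<read_index_record()> is hypothetical.

 use strict;
 use warnings;

 my $template = 'JJ';                          ##-- assumed layout: offset, then length
 my $reclen   = length(pack($template, 0, 0)); ##-- bytes per index record

 ##-- build a small index in memory: three records describing the data "barfoobaz"
 my $idxbuf = join('', map { pack($template, @$_) } ([0,3],[3,3],[6,3]));
 open(my $idxfh, '<', \$idxbuf) or die "open failed: $!";
 binmode($idxfh);

 ##-- fetch (offset,length) for a logical index by seeking to $index * $reclen
 sub read_index_record {
   my ($fh,$index) = @_;
   seek($fh, $index * $reclen, 0) or return;
   my $got = read($fh, my $buf, $reclen);
   return if !defined($got) || $got != $reclen;  ##-- short read: no such record
   return unpack($template, $buf);
 }

 my ($off,$len) = read_index_record($idxfh, 2);
 print "record 2: offset=$off, length=$len\n";

Because every record has the same width, the cost of a lookup does not depend on the
number of items in the array.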

=cut


##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Object API: header
=pod

=head2 Object API: header

=over 4

=item headerKeys

 @keys = $tied->headerKeys();

Keys to save as header.

=item headerData

 \%header = $tied->headerData();

Data to save as header.

=item loadHeader

 $tied_or_undef = $tied->loadHeader();
 $tied_or_undef = $tied->loadHeader($headerFile,%opts);

Loads header from C<$headerFile>, by default C<"$tied-E<gt>{file}.hdr">.

=item saveHeader

 $tied_or_undef = $tied->saveHeader();
 $tied_or_undef = $tied->saveHeader($headerFile);

Saves header data to C<$headerFile>, by default C<"$tied-E<gt>{file}.hdr">.

=back

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Object API: open/close
=pod

=head2 Object API: open/close

=over 4

=item open

 $tied_or_undef = $tied->open($file,$mode);
 $tied_or_undef = $tied->open($file);
 $tied_or_undef = $tied->open();

Opens underlying file(s) for use.

=item close

 $tied_or_undef = $tied->close();

Closes any opened files and writes the header if opened in write-mode.

=item opened

 $bool = $tied->opened();

Returns true iff object is opened.

=item reopen

 $bool = $tied->reopen();

Closes and re-opens underlying filehandles.
Should cause a "real" flush even on systems without a working L<IO::Handle::flush()|IO::Handle/"$io-E<gt>flush"> method.

=item flush

 $tied_or_undef = $tied->flush();
 $tied_or_undef = $tied->flush($flushHeader);

Attempts to flush underlying filehandles using their C<flush()> method if available,
otherwise calls L<reopen()|/reopen>.
If C<$flushHeader> is specified and true, also writes header file.

=back

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Object API: file operations
=pod

=head2 Object API: file operations

=over 4

=item unlink

 $tied_or_undef = $tied->unlink();
 $tied_or_undef = CLASS_OR_OBJECT->unlink($file);

Attempts to unlink any underlying file(s) for the data-file $file.
Implicitly calls close() before unlinking.

=item rename

 $tied_or_undef = $tied->rename($newname);

Renames underlying files using L<CORE::rename()|perlfunc/"rename">.
Implicitly close()s and re-open()s C<$tied>,
which must be opened in write-mode.

=item copy

 $dst_object_or_undef = $tied_src->copy($dst_filename, %dst_opts);
 $dst_object_or_undef = $tied_src->copy($dst_object);

Copies underlying files using L<File::Copy::copy()|File::Copy/"copy">.
Source object must be opened.
Implicitly calls flush() on both source and destination objects
before and after the copy operation, respectively.
If a destination object is specified (2nd form), it must be opened in write-mode,
otherwise a new destination object will be created and returned.
You can B<NOT> use this method to convert between incompatible file formats
(e.g. Storable and JSON), but it should be faster than array assignment:

 tie(my @a, 'Tie::File::Indexed::JSON',     'a.tfx');
 tie(my @b, 'Tie::File::Indexed::Storable', 'b.tfx');
 tied(@a)->copy(tied(@b));                                # this won't work!
 @b = @a;                                                 # ... but this ought to

 tie(my @a2, 'Tie::File::Indexed::JSON', 'a2.tfx');
 @a2 = @a;                                                # slow element-wise copy
 tied(@a)->copy(tied(@a2));                               # ... fast bulk copy

=item move

 $tied_or_undef = $tied->move($newname);

Moves underlying files using L<File::Copy::move()|File::Copy/"move">.
Implicitly close()s and re-open()s C<$tied>,
which must be opened in write-mode.

=back

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Object API: advisory locking
=pod

=head2 Object API: advisory locking

=over 4

=item flock

 $bool = $tied->flock();
 $bool = $tied->flock($lock);

Get an advisory lock of type C<$lock> (default=C<Fcntl::LOCK_EX>) on C<$tied-E<gt>{datfh}>, using perl's flock() function.
Implicitly calls flush() prior to locking.

=item funlock

 $bool = $tied->funlock();
 $bool = $tied->funlock($lock);

Unlock C<$tied-E<gt>{datfh}> using perl's flock() function; C<$lock> defaults to C<Fcntl::LOCK_UN>.

=back
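
A typical pattern is to hold the lock only for the duration of an update. The sketch below
uses only the methods documented above plus the standard C<Fcntl> constants; the file name
C<counter.dat> is arbitrary, and whether other processes see the update without re-opening
their own tied objects is not addressed here.

 use strict;
 use warnings;
 use Fcntl qw(:flock);
 use Tie::File::Indexed;

 tie(my @data, 'Tie::File::Indexed', 'counter.dat') or die "tie failed: $!";
 my $tied = tied(@data);

 $tied->flock(LOCK_EX) or die "cannot lock: $!";  ##-- exclusive lock (the default type)
 my $count = @data ? $data[0] : 0;                ##-- read the current value, if any
 $data[0]  = $count + 1;                          ##-- write the new value under the lock
 $tied->flush();                                  ##-- flush buffered writes before unlocking
 $tied->funlock() or die "cannot unlock: $!";

 undef $tied;
 untie(@data);

Since the locks are advisory, they only protect against other processes which also call
C<flock()> on the same data-file.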

=cut

##----------------------------------------------------------------
## DESCRIPTION: Tie::File::Indexed: Object API: buffering and consolidation
=pod

=head2 Object API: buffering and consolidation

=over 4

=item consolidate

 $tied_or_undef = $tied->consolidate();
 $tied_or_undef = $tied->consolidate($tmpfile);

Consolidates file data: ensures that data in C<$tied-E<gt>{datfh}> are in index-order and contain no gaps or unused blocks.
The object must be opened in write-mode.
Uses C<$tmpfile> as a temporary file for consolidation (default=C<"$tied-E<gt>{file}.tmp">).

If you never overwrite data in your tied arrays, you probably won't need this method.
It can be useful to reduce the size of the associated data-file and/or optimize index-ordered access operations, since
(over)writing any existing array item causes a new record to be appended to the data-file.
Consider the following code:

 tie(my @data, 'Tie::File::Indexed', "data") or die ...  ##-- tie the array; data-file is empty
 $data[1] = 'bar';                                       ##-- data-file is now "bar"
 $data[0] = 'foo';                                       ##-- data-file is now "barfoo"
 $data[1] = 'baz';                                       ##-- data-file is now "barfoobaz"

Here, the element at index 0 ("foo") is stored "out-of-order", since its physical location in
the data-file (2nd record) does not correspond to its logical location in the array (1st element).
Further, the 1st record in the data-file ("bar") is unused, since it was overwritten
by the value C<"baz"> stored in the 3rd data-file record.  The index-file takes care of resolving
the offset and length of the logical array elements (so that e.g. C<$data[1] eq 'baz'> rather
than C<'bar'>), but no effort is made to re-use unreferenced material in the data-file,
so that the original value for C<$data[1]> is effectively orphaned.  Calling C<consolidate()>
at this point ensures that the disk-files are logically sorted and contain no unreferenced
material:

 tied(@data)->consolidate();                             ##-- data-file is now "foobaz"

This method is never implicitly called, so if you need it, you'll have to call it yourself.

=back
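
The effect of consolidation can be reproduced with a toy model in plain Perl. This is only
an illustration of the idea, not the module's implementation: the data-file is modelled as a
string, the index as an array of C<[offset,length]> pairs, and the helpers C<store()> and
C<consolidate_model()> are hypothetical.

 use strict;
 use warnings;

 my $datfile = '';  ##-- stands in for the data-file
 my @index;         ##-- stands in for the index-file: $index[$i] = [$offset,$length]

 ##-- every store appends a new record and re-points the index entry
 sub store {
   my ($i,$value) = @_;
   $index[$i] = [length($datfile), length($value)];
   $datfile  .= $value;
 }

 ##-- rewrite the data in index order, dropping unreferenced bytes
 sub consolidate_model {
   my $new = '';
   foreach my $rec (@index) {
     my ($off,$len) = $rec ? @$rec : (0,0);  ##-- never-stored items become empty records
     $rec  = [length($new), $len];           ##-- re-point the index entry at the new location
     $new .= substr($datfile, $off, $len);
   }
   $datfile = $new;
 }

 store(1,'bar');
 store(0,'foo');
 store(1,'baz');
 print "before: $datfile\n";   ##-- before: barfoobaz
 consolidate_model();
 print "after:  $datfile\n";   ##-- after:  foobaz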

=cut


##========================================================================
## END POD DOCUMENTATION, auto-generated by podextract.perl

=pod

=cut

##========================================================================
## SUBCLASSES
=pod

=head1 SUBCLASSES

The default data storage methods in Tie::File::Indexed are suitable for
simple perl scalars (integers, floating-point numbers, or simple byte-strings).
The Tie::File::Indexed distribution comes with several pre-defined subclasses
for storing other types of data as well.  Currently, the following pre-defined
subclasses are supported:

=over 4

=item L<Tie::File::Indexed::Utf8|Tie::File::Indexed::Utf8>

Stores data records as UTF-8 encoded strings.
Useful if your data strings are expected to be encoded in UTF8.

=item L<Tie::File::Indexed::JSON|Tie::File::Indexed::JSON>

Stores data records as JSON strings using the L<JSON|JSON> module.
Useful if you need to store complex data structures and simple scalars
in the same tied array.

=item L<Tie::File::Indexed::Storable|Tie::File::Indexed::Storable>

Stores data records in native binary format using L<Storable::store_fd()|Storable/"store_fd">.
Useful if you need to store only references (bless()ed or otherwise) to be used on the local machine.
Individual data records can be used directly with L<Storable::retrieve_fd()|Storable/"retrieve_fd">.

=item L<Tie::File::Indexed::StorableN|Tie::File::Indexed::StorableN>

Stores data records in portable "network" binary format using L<Storable::nstore_fd()|Storable/"nstore_fd">.
Useful if you need to store only references (bless()ed or otherwise) to be shared between machine architectures.
Individual data records can be used directly with L<Storable::retrieve_fd()|Storable/"retrieve_fd">.

=item L<Tie::File::Indexed::Freeze|Tie::File::Indexed::Freeze>

Stores data records in native binary format using L<Storable::freeze()|Storable/"freeze">.
Useful if you need to store only references (bless()ed or otherwise) to be used on the local machine.
Data-files are slightly smaller than those produced by L<Tie::File::Indexed::Storable|Tie::File::Indexed::Storable>,
but individual data records cannot be used directly with L<Storable::retrieve_fd()|Storable/"retrieve_fd">.

=item L<Tie::File::Indexed::FreezeN|Tie::File::Indexed::FreezeN>

Stores data records in portable "network" binary format using L<Storable::nfreeze()|Storable/"nfreeze">.
Useful if you need to store only references (bless()ed or otherwise) to be shared between machine architectures.
Data-files are slightly smaller than those produced by L<Tie::File::Indexed::StorableN|Tie::File::Indexed::StorableN>,
but individual data records cannot be used directly with L<Storable::retrieve_fd()|Storable/"retrieve_fd">.

=back
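
As a brief illustration, the JSON subclass can hold nested structures and plain strings side
by side. The file name C<records.tfx> is arbitrary, and the sketch assumes that the subclass
is loaded with its own C<use> line and accepts the same constructor arguments as the base class.

 use strict;
 use warnings;
 use Tie::File::Indexed::JSON;

 tie(my @recs, 'Tie::File::Indexed::JSON', 'records.tfx') or die "tie failed: $!";

 push(@recs, { name => 'alice', tags => ['x','y'] });  ##-- nested structure
 push(@recs, 'just a string');                         ##-- simple scalar in the same array

 my $first = $recs[0];                                 ##-- decoded again on retrieval
 print "name: $first->{name}, tags: @{$first->{tags}}\n";
 print "last: $recs[-1]\n";

 untie(@recs);

The other subclasses are used in the same way; only the class name passed to C<tie()> changes.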

=cut

##========================================================================
## CAVEATS
=pod

=head1 CAVEATS

=head2 Monotonic growth and random access

No disk-space optimization is performed by default, and
frequent overwrites will cause the data-file to grow monotonically:
every time a logical item is written to the array via the C<STORE()> method,
a new physical record is appended to the data-file, and the index-record
for the item is updated to point to the new record.
This is fine if you only insert elements in logical order (e.g. using C<push>)
and never overwrite elements which have already been stored.  Otherwise,
out-of-order elements may degrade performance for logical-order access
(e.g. via C<foreach>), since many random C<seek()> operations often
don't play nicely with perl's buffering strategy and/or the
underlying filesystem cache.  Overwriting elements is a bigger problem,
since overwrites cause the associated data-file to grow ever larger.
The L</consolidate> method is provided as a workaround for these
undesirable effects.  Future versions of this module may perform
some implicit on-the-fly disk-space optimization or consolidation,
but currently none is
performed: if you need to consolidate, do it yourself!

=cut


##======================================================================
## Footer
##======================================================================
=pod

=head1 AUTHOR

Bryan Jurish E<lt>moocow@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2015 by Bryan Jurish

This package is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.20.2 or,
at your option, any later version of Perl 5 you may have available.

=cut