NAME

DirDB - use a directory as a persistence back end for (multi-level) (blessed) hashes (that may contain array references) (and can be advisorily locked)

SYNOPSIS

  use DirDB;
  tie my %session, 'DirDB', "./data/session";
  $session{$sessionID}{email} = get_emailaddress();
  $session{$sessionID}{objectcache}{fribble} ||= new fribble;
  #
  use Tie::File; # see below -- any array-in-a-filesystem representation
                 # is supported
  push @{$session{$sessionID}{events}}, $event;

DESCRIPTION

DirDB is a package that lets you access a directory as a hash. The final component of the path is created if it does not exist, but the directories leading to it are not. It is similar to Tie::Persistent, but differs in that all accesses are immediately reflected in the file system and very little is kept in Perl memory. (Your OS's file caching takes care of that: DirDB only hits the disk heavily on operating systems without file system caches, which is none of them any more.)

The empty string, used as a key, will be translated into ' EMPTY' for purposes of storage and retrieval. File names beginning with a space are reserved for metadata for subclasses, such as object type or array size or whatever. Key names beginning with a space get an additional space prepended to the name for purposes of naming the file to store that value.
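The naming rule above can be sketched in a few lines. The helpers `key_to_filename` and `filename_to_key` are hypothetical names chosen for illustration, not DirDB's own functions; the sketch only assumes the rule as stated in the preceding paragraph.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the naming rule described above;
# these are NOT part of DirDB's interface.
sub key_to_filename {
    my ($key) = @_;
    return ' EMPTY' if $key eq '';      # the empty key gets a reserved name
    return " $key"  if $key =~ /^ /;    # prepend one more space
    return $key;
}

sub filename_to_key {
    my ($name) = @_;
    return ''               if $name eq ' EMPTY';
    return substr($name, 1) if $name =~ /^  /;   # two leading spaces: a real key
    return undef            if $name =~ /^ /;    # one leading space: metadata
    return $name;
}
```

Because every key that starts with a space gains a second one, a file name with exactly one leading space can never be produced by a key, which is what leaves that range free for metadata.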

As of version 0.05, DirDB can store hash references. References to tied hashes are recursively copied; references to plain hashes are first tied to DirDB and then recursively copied. Storing a circular hash reference structure will cause DirDB to croak.

As of version 0.06, DirDB now recursively copies subdirectory contents into an in-memory hash and returns a reference to that hash when a previously stored hash reference is deleted in non-void context.
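The recursive copy from a subdirectory into an in-memory hash might look roughly like the sketch below. `dir_to_hash` is a hypothetical helper, not DirDB's code; it ignores key-name translation, metadata files, and locking, and assumes plain files and directories only.

```perl
use strict;
use warnings;

# Hypothetical helper: read a directory tree into a plain nested hash.
# Subdirectories become nested hash references, files become strings.
sub dir_to_hash {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my %copy;
    for my $name (@names) {
        my $path = "$dir/$name";
        if (-d $path) {
            $copy{$name} = dir_to_hash($path);    # recurse into the subtree
        }
        else {
            open my $fh, '<', $path or die "cannot read $path: $!";
            local $/;                             # slurp the whole file
            my $content = <$fh>;
            close $fh;
            $copy{$name} = defined $content ? $content : '';
        }
    }
    return \%copy;
}
```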

As of version 0.07, non-HASH references are stored using Storable.

As of version 0.08, non-HASH references cause croaking again: the Storable functionality has been moved to DirDB::Storable.

Version 0.10 stores and retrieves blessed hash references, and blesses them back into the package they belonged to when they were stored.

Version 0.12 closes some directory handles which were not being closed automatically on cygwin, interfering with tests passing.

ARRAY tie-time argument

Version 0.11 allows storing and retrieval of references to arrays through an 'ARRAY' tie-time argument, which is an arrayref of the args used to tie the array before returning it. A token that is string-equal to 'DATAPATH' will be replaced with a place in the file system for the array tying implementation to do its thing. At this version, the default array implementation is

     ['Tie::File' => DATAPATH => recsep => "\0"]

but this may change, perhaps when a DirDB::Array package that gracefully handles references is devised. Forwards-compatibility is maintained by storing the array implementation details with each stored arrayref.
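To make the DATAPATH substitution concrete, here is a minimal sketch of how such a spec could be expanded and used. `tie_array_from_spec` is a hypothetical helper, not DirDB's implementation; it assumes the array class named in the spec is already loaded.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: expand an ARRAY-style spec and tie a fresh array.
# Every element string-equal to 'DATAPATH' is replaced by $path.
sub tie_array_from_spec {
    my ($spec, $path) = @_;
    my ($class, @args) = @$spec;
    @args = map { defined $_ && $_ eq 'DATAPATH' ? $path : $_ } @args;
    tie my @array, $class, @args
        or die "cannot tie $class to $path: $!";
    return \@array;
}

# The default spec quoted above, written with a quoted token.
my @default_spec = ('Tie::File', 'DATAPATH', recsep => "\0");
```

A call such as `tie_array_from_spec(\@default_spec, $some_file)` then returns a reference to an array whose elements live in `$some_file`, one NUL-terminated record per element.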

lock method (package DirDB::lock)

Version 0.11 also introduces a lock method that obtains an advisory mkdir lock on either a whole tied hash or on a key in it.

     tie %P, DirDB=>'/home/aurora/persistentdata';
     ...
     my $advisory_lock1 = tied(%P)->lock; # on the whole hash
     my $advisory_lock2 = tied(%P)->lock('birdy'); # on the key 'birdy'
     {
        my $advisory_lock3 = tied(%P)->lock(''); # on the null key 

These locks last until they are DESTROYed by the garbage collector or until the release method is called on them.

        $advisory_lock1->release;
        release $advisory_lock2;
     };
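The idea behind an mkdir lock held by an object can be sketched as follows. `Sketch::MkdirLock` is a hypothetical package written for illustration; it is not DirDB::lock, and the retry count and pause are arbitrary assumptions. It relies only on the fact that mkdir either creates the directory or fails.

```perl
use strict;
use warnings;

package Sketch::MkdirLock;    # hypothetical name, not part of DirDB

# Try to take the lock by creating a directory. Only one caller can
# succeed in creating it, so success means the lock is ours.
sub acquire {
    my ($class, $lockdir, $tries) = @_;
    $tries = 10 unless defined $tries;
    for my $attempt (1 .. $tries) {
        return bless { dir => $lockdir, held => 1 }, $class
            if mkdir $lockdir;
        select undef, undef, undef, 0.05 if $attempt < $tries;   # pause before retrying
    }
    return;    # could not obtain the lock
}

# Give the lock up explicitly; calling it twice is harmless.
sub release {
    my ($self) = @_;
    return unless $self->{held};
    rmdir $self->{dir};
    $self->{held} = 0;
    return 1;
}

# Release automatically when the last reference goes away.
sub DESTROY {
    my ($self) = @_;
    local ($!, $@);
    $self->release;
}

package main;
```

Tying the lock's lifetime to an object means that leaving the enclosing scope releases it. A process that is killed outright never runs DESTROY, so the lock directory stays behind.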

croaking on permissions problems

DirDB will croak if it can't open an existing file system entity.

 tie my %d, DirDB => '/tmp/foodb';
 
 $d{ref1}->{ref2}->{ref3}->{ref4} = 'something'; 
 # 'something' is now stored in /tmp/foodb/ref1/ref2/ref3/ref4
 
 my %e = (1 => 2, 2 => 3);
 $d{e} = \%e;
 # %e is now tied to /tmp/foodb/e, and 
 # /tmp/foodb/e/1 and /tmp/foodb/e/2 now contain 2 and 3, respectively

 $d{f} = \%e;
 # like `cp -R /tmp/foodb/e /tmp/foodb/f`

 $e{destination} = 'Kashmir';
 # sets /tmp/foodb/e/destination
 # leaves /tmp/foodb/f alone
 
 my %g = (1 => 2, 2 => 3);
 $d{g} = {%g};
 # %g has been copied into /tmp/foodb/g/ without tying %g.
 

Pipes and so on are opened for reading and read from on FETCH, and clobbered on STORE.

The underlying object is a scalar containing the path to the directory. Keys are names within the directory, values are the contents of the files.
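A stripped-down tied hash built on that representation could look like the sketch below. `Sketch::DirHash` is a hypothetical package, not DirDB itself: it leaves out iteration, nested hashes, key-name translation, and locking, and exists only to show a scalar holding a path acting as the object.

```perl
use strict;
use warnings;

package Sketch::DirHash;    # hypothetical name, not part of DirDB

sub TIEHASH {
    my ($class, $dir) = @_;
    -d $dir or mkdir $dir or die "cannot create $dir: $!";
    return bless \$dir, $class;     # the object is just the path
}

sub STORE {
    my ($self, $key, $value) = @_;
    open my $fh, '>', "$$self/$key" or die "cannot write $key: $!";
    print {$fh} $value;
    close $fh or die "cannot close $key: $!";
}

sub FETCH {
    my ($self, $key) = @_;
    open my $fh, '<', "$$self/$key" or return undef;
    local $/;                       # slurp the whole file
    my $value = <$fh>;
    close $fh;
    return defined $value ? $value : '';
}

sub EXISTS {
    my ($self, $key) = @_;
    return -e "$$self/$key";
}

sub DELETE {
    my ($self, $key) = @_;
    my $old = $self->FETCH($key);   # return the old value, as delete does
    unlink "$$self/$key";
    return $old;
}

package main;
```

After `tie my %h, 'Sketch::DirHash', $dir`, an assignment such as `$h{color} = 'blue'` writes the file `$dir/color`, and reading `$h{color}` reads it back from disk each time.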

STOREMETA and FETCHMETA methods are provided for subclasses that wish to store and fetch metadata (such as array size) which will not appear in the data returned by NEXTKEY and which cannot be accessed directly through STORE or FETCH. Currently one metadatum, 'BLESS', is used to indicate what package to bless a tied hashref into.

storing and retrieving blessed objects

Blessed objects can now be stored, as long as their underlying representation is a hash. This may change. The root of a DirDB tree will not get blessed, but every blessed hash-reference branch will be blessed on fetch into the package it was in when stored.

storing and retrieving array references

At this version, Tie::File is the default array implementation. A different implementation can be specified with an ARRAY tie-time argument, like so:

        use Array::Virtual;
        use DirDB 0.11;
        tie my %Persistent, DirDB => './data',
                ARRAY => ["Array::Virtual", DATAPATH => 0664];

RISKS

stale lock risk

"mkdir locking" is used to protect incomplete directories from being accessed while they are being written, and is now used as well for advisory locking. It is conceivable that your program might catch a signal and die while inside a critical section. If this happens, a simple

    find /your/data -type d -name '* LOCK*'

at the command line will identify what you need to delete.

Only the very end of the write operation is protected by the locking: during a write, other processes will be able to read the old data. They will also be able to start their own overwrites.

DirDB attempts to guarantee that written data is complete (not partial).

DirDB does not attempt to guarantee atomicity of updates.

unexpected persistence

Untied hash references assigned into a DirDB tied hash will become tied to the file system at the point they are first assigned. This has the potential to cause confusion.

Tied hash references are recursively copied. This includes hash references tied due to being assigned into a DirDB tied hash.

EXPORT

None by default.

AUTHOR

David Nicol, davidnicol@cpan.org

Assistance

version 0.04 QA provided by members of Kansas City Perl Mongers, including Andrew Moore and Craig S. Cottingham.

LICENSE

GPL/Artistic (the same terms as Perl itself)

SEE ALSO

Read perltie before trying to extend this module.

DirDB::Storable uses Storable for storing and retrieving arbitrary types

DirDB::FTP provides complete DirDB function over the FTP protocol

Tie::Dir is concerned with accessing stat information, not file contents