Alberto Attilio Reggiori

NAME

Data::MagicTie - an adaptor-like Perl TIE interface over hashes and arrays that supports BLOBs, delegation, duplicate keys, locking and storage splitting

SYNOPSIS

        use Data::MagicTie;

        my $hash = tie %a,'Data::MagicTie'; #in-memory hash with duplicates and delegation support
        my $array = tie @a,'Data::MagicTie'; #in-memory array with duplicates and delegation support
        my $hash = tie %a,'Data::MagicTie','test',( Style => "DB_File", Split => 7, Mode => 'r'); #query 7 dbs in one
        my $hash = tie %a,'Data::MagicTie','test',( Split => 1 ); #normal hash
        my $hash = tie %a,'Data::MagicTie','test',( Style => "BerkeleyDB"); #sleepycat-ish :-)
        my $hash = tie %a,'Data::MagicTie','test',( Style => "DBMS", Host => 'me.jrc.it'); #cool way

        $a{mykey} = 'myvalue'; #store
        $a{myspecialkey} = [ {},'blaaa',[],[ { a => 'b', c => 'd' }] ]; # ref store
        my $b = $a{mykey}; #fetch
        #iterator
        while (($k,$v) = each %a) {
                my $c = $v;
        };
        #clear
        %a=();

        #basic delegation model - first match %b then %a
        my $hash1 = tie %b,'Data::MagicTie','test1',(Style => "DB_File");
        $hash1->set_parent($hash);
        print $b{mykey}; # looks up in %a :)
        untie %b;
        untie %a;

        #duplicates
        my $hash = tie %a,'Data::MagicTie','test',( Style => "DB_File", Split => 7, Duplicates => 1); #7 dbs + duplicates

        $a{mykey} = 'myvalue'; #store
        $a{mykey} = [ {},'blaaa',[] ]; #ref store

        #iterator
        my $val;
        foreach(keys %a) {
                print $_."=".$a{$_}."\n"; # either scalars or refs
        };

        $hash->del_dup('mykey','myvalue');
        $hash->del_dup('mykey',[ {},'blaaa',[] ]);

DESCRIPTION

Perl provides two basic ways to model pluralities: hashes and arrays. The perltie interface makes it easy to map these data structures onto databases and storages such as BerkeleyDB key/value databases. Most existing implementations do provide duplicate keys, locking and BLOB support, but these features are poorly integrated, and using them is often far from transparent to the end user. In addition, most packages using the DB_File module do not provide a way to split the storage over several files to scale up to large databases (i.e. most DB files get too big and inefficient to process - NOTE: this is no longer true with new generation Sleepycat BerkeleyDB implementations).

The Data::MagicTie module provides an integrated and homogeneous interface over hashes and arrays that supports BLOBs, delegation, duplicate keys, locking and storage splitting; value lists can be stored either as an in-memory data structure or in a local or remote BerkeleyDB file. The module acts as an adaptor over actual implementations of Generic Data Storages (GDSs) such as Data::MagicTie::Array(3), Data::MagicTie::Hash(3), DBMS(3), DB_File(3) and BerkeleyDB(3). By default Data::MagicTie assumes an in-memory data structure model. NOTE: a user might choose the in-memory Data::MagicTie implementation over a normal Perl hash/array to obtain duplicate keys and delegation support :)

The values can be either strings or in-memory data structures (BLOBs) - see Storable(3); each tied database can then be split over several files for efficiency and to reduce the size of the actual database files. Moreover, for query purposes only, tie operations can be "chained" to transparently access different databases; this chaining does not need any additional field in the database, it just uses in-memory Perl OO methods to delegate read operations (FETCH, EXISTS, FIRSTKEY, NEXTKEY). I.e. a lookup for a key or value in a database ends up as a read operation in the current database or in one of its "delegates".
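The BLOB handling described above can be illustrated with a minimal sketch. The FrozenHash package and its one-character "R"/"S" marker are hypothetical and are not taken from the Data::MagicTie source; the sketch only shows how a tied hash can freeze reference values on STORE and thaw them on FETCH using Storable(3).

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical helper package (not part of Data::MagicTie): a tied hash
# that serializes reference values on STORE and rebuilds them on FETCH.
package FrozenHash;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

sub STORE {
    my ($self, $key, $value) = @_;
    # The leading marker records whether the payload is a frozen
    # structure ("R") or a plain string ("S"); this scheme is an assumption.
    $self->{$key} = ref($value) ? 'R' . freeze($value) : 'S' . $value;
    return;
}

sub FETCH {
    my ($self, $key) = @_;
    my $raw = $self->{$key};
    return undef unless defined $raw;
    my $marker  = substr($raw, 0, 1);
    my $payload = substr($raw, 1);
    return $marker eq 'R' ? thaw($payload) : $payload;
}

package main;

tie my %db, 'FrozenHash';
$db{plain}  = 'myvalue';
$db{nested} = [ {}, 'blaaa', [], [ { a => 'b', c => 'd' } ] ];

my $copy = $db{nested};           # a fresh structure rebuilt by thaw()
print $db{plain}, "\n";           # myvalue
print $copy->[3][0]{c}, "\n";     # d
```

Every FETCH of a reference value returns a new copy, which is why in-place changes to a fetched structure are not written back unless the value is stored again.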

Each atomic operation using the Perl operators actually triggers in-memory, local or remote database lookups and freeze/thaw operations on values. Perl iteration constructs such as each, keys and values can then be used to iterate over the content of a tied database; when the database is split over several files the module iterates over the whole set of files. Even when a parent (delegate) is set for a database, these operators scan the whole set of storages (note: this feature might not be efficient over large databases).

With such a Perl TIE model it becomes easy to write simple "cache" systems, for example using the Apache Web server and mod_perl. This could be really important for RDF storages and for cumbersome, CPU-consuming queries - see RDFStore::Model(3)

CONSTRUCTORS

The following methods construct/tie Data::MagicTie databases and objects:

$db_hash = tie %b, 'Data::MagicTie' [, %whateveryoulikeit ];

Tie %b to a MagicTie database. The %whateveryoulikeit hash contains a set of configuration options describing how and where to store the actual data. Possible options are the following:

Name

A string identifying the name of the database; this option makes sense only for persistent storages such as DB_File(3), BerkeleyDB(3) or DBMS(3).

Style

A string identifying whether the database is going to be DB_File(3), BerkeleyDB(3) or DBMS(3). Possible values are 'DB_File', 'BerkeleyDB' or 'DBMS'. With 'DBMS' the database is stored on a remote DBMS(3) server. The default is an in-memory storage using Data::MagicTie::Array(3) or Data::MagicTie::Hash(3).

Split

An integer stating how many files the database is split over. Defaults to 1 (normal perltie behaviour). Please note that setting too high a number here might exceed your operating system's maximum number of file descriptors (see man dbmsd(8) and DBMS(3) if installed on your system). Note that this option is ignored for the default in-memory style.
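As an illustration of storage splitting, the sketch below routes each key to one of several buckets and iterates over all of them. The bucket_for function and its byte-sum checksum are assumptions made for this example; the routing function actually used by Data::MagicTie is not described here, and plain in-memory hashes stand in for the database files.

```perl
use strict;
use warnings;

my $split   = 7;                          # number of "files", as in Split => 7
my @buckets = map { +{} } 1 .. $split;    # in-memory stand-ins for the files

# Hypothetical routing function: byte-sum checksum of the key, modulo
# the number of buckets. The same key always lands in the same bucket.
sub bucket_for {
    my ($key, $count) = @_;
    return unpack('%32C*', $key) % $count;
}

sub store_key {
    my ($key, $value) = @_;
    $buckets[ bucket_for($key, $split) ]{$key} = $value;
    return;
}

sub fetch_key {
    my ($key) = @_;
    return $buckets[ bucket_for($key, $split) ]{$key};
}

# Iteration has to visit every bucket in turn.
sub all_keys {
    return map { keys %$_ } @buckets;
}

store_key(mykey => 'myvalue');
print fetch_key('mykey'), "\n";           # myvalue
```

A lookup touches a single bucket, while each, keys and values style iteration must walk all of them, which is the cost mentioned for large databases.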

Mode

A string whose value can be 'r' (read only), 'w' (write only) or 'rw' (read/write). The default mode is 'rw'. This option only makes sense for persistent databases such as DBMS(3), BerkeleyDB(3) or DB_File(3). Write mode forces the creation of the database. Opening a database that does not exist yet in read-only mode fails. Internally the module maps these strings to low-level Fcntl and BerkeleyDB constants such as O_CREAT, O_RDWR, O_WRONLY, O_RDONLY, DB_CREATE and DB_RDONLY.
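A mapping of this kind could look like the following sketch. The flags_for_mode function and the exact flag combinations are assumptions (the text only says that write mode forces creation and that Fcntl constants are involved); the 'rw' spelling of the read/write mode is used here.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_WRONLY O_RDWR O_CREAT);

# Assumed mapping from mode strings to open(2)-style flags.
my %flags_for = (
    r  => O_RDONLY(),
    w  => O_WRONLY() | O_CREAT(),   # write mode creates the database
    rw => O_RDWR()   | O_CREAT(),
);

# Hypothetical helper: return the flags for a mode, defaulting to 'rw'.
sub flags_for_mode {
    my ($mode) = @_;
    $mode = 'rw' unless defined $mode;
    die "unknown mode '$mode'\n" unless exists $flags_for{$mode};
    return $flags_for{$mode};
}

printf "r  => %d\n", flags_for_mode('r');
printf "rw => %d\n", flags_for_mode();
```

Because 'r' carries no O_CREAT, opening a database that does not exist yet in read-only mode fails, which matches the behaviour described above.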

Shared

This option ties a Data::MagicTie GDS to another existing Data::MagicTie GDS of the same type (hash/array) and delegates all read operations to the underlying object (copy on-read); any write operation calls the copyOnWrite method, which copies the secondary database over the input one (copy on-write) and resets the Shared option. Please note that such a copy on-write can be really expensive in memory consumption and CPU cycles for in-memory databases, so bear in mind what you are copying while doing so! Before the database is copied, the input GDS is actually tied and created using the original options passed by the user. By calling the copyOnWrite method directly the user can break the sharing and copy the data across; if a list of keys is passed to the method, only those specific keys are copied from the first to the second GDS (see below). By default the method copies the whole content across.

Example

        $a = tie %a, "Data::MagicTie",( Name => 'secondary' );
        $a{test}='value';
        $a{'test me please'}='value';
        $a{'another test'}='value';

        $b = tie %b, "Data::MagicTie",( Name => 'primary', Shared => $a);
        print $b{test}; # prints 'value'

        $b{test}='newvalue'; #reset the Shared option, tie %b to a GDS named 'primary' and copy the content across

        #or the user could also...
        $b->copyOnWrite(); # to stop the sharing and copy the whole content across

        # break sharing and copy 'test' and 'another test' across :)
        $b->copyOnWrite('test','another test');

        untie %a;
        untie %b;

Tying a GDS to another one sitting on disk or on a remote DBMS database allows the user to easily share copies of data. As soon as a copy on-write is completed, the Shared option is reset to NULL and the input GDS becomes completely independent from the secondary one.

This option is an alternative way to manage delegates, but a more complicated and tricky one. The canonical delegation model provided by the get/set/reset parent methods below always requires running the operation (method invocation) on the current database before passing through to the underlying model, while with the Shared option the interaction is directly with the underlying layer. In the near future this new way of managing delegates could replace the current model :)
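The copy on-read / copy on-write behaviour of the Shared option can be pictured with a small stand-alone sketch. The SharedHash package, its copy_on_write method and the use of a plain hash reference as the shared source are all hypothetical simplifications; this is not the module's implementation, and the copy made here is shallow.

```perl
use strict;
use warnings;

# Hypothetical tied hash: reads go to the shared source until the first
# write, which copies the content over and drops the sharing.
package SharedHash;

sub TIEHASH {
    my ($class, %opt) = @_;
    return bless { data => {}, shared => $opt{Shared} }, $class;
}

# Where reads are served from: the shared source while it is set.
sub _source {
    my ($self) = @_;
    return $self->{shared} || $self->{data};
}

sub FETCH  { my ($self, $key) = @_; return $self->_source->{$key}; }
sub EXISTS { my ($self, $key) = @_; return exists $self->_source->{$key}; }

# Break the sharing; copy either the listed keys or everything.
sub copy_on_write {
    my ($self, @keys) = @_;
    my $src = delete $self->{shared};
    return unless $src;
    @keys = keys %$src unless @keys;
    $self->{data}{$_} = $src->{$_} for grep { exists $src->{$_} } @keys;
    return;
}

sub STORE {
    my ($self, $key, $value) = @_;
    $self->copy_on_write;             # no-op once the sharing is broken
    $self->{data}{$key} = $value;
    return;
}

package main;

my %secondary = (test => 'value', 'another test' => 'value');
tie my %primary, 'SharedHash', Shared => \%secondary;

print $primary{test}, "\n";       # value, served by the shared hash
$primary{test} = 'newvalue';      # first write copies, then stores
print $secondary{test}, "\n";     # value, the shared hash is untouched
```

Calling `(tied %primary)->copy_on_write('test')` before any write would break the sharing while copying only that key, mirroring the partial copy described for copyOnWrite.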

Duplicates

This is an integer flag that tells the Data::MagicTie module whether or not to use the BerkeleyDB (>1.x) library code to handle duplicate keys. By default no duplicates are used. This option works best for DB_File and BerkeleyDB style hash tables, while in all other cases (arrays and DBMS style) the Storable(3) module is used to mimic duplicate-key behaviour by storing values as arrays. Please note that such a solution requires an additional FETCH operation for each STORE and is not atomic or fault tolerant...yet :)
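The Storable-based emulation mentioned above amounts to a read-modify-write cycle. The sketch below is an assumption about how such an emulation can work, not the module's code; store_dup, fetch_dups and the %store hash standing in for the backend are hypothetical names.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %store;   # stand-in for a backend without native duplicate support

# Append a value under $key: thaw the current list, push, freeze it back.
sub store_dup {
    my ($key, $value) = @_;
    my $list = exists $store{$key} ? thaw($store{$key}) : [];  # the extra FETCH
    push @$list, $value;
    $store{$key} = freeze($list);                              # the STORE
    return scalar @$list;
}

# Return every value stored under $key, in insertion order.
sub fetch_dups {
    my ($key) = @_;
    return exists $store{$key} ? @{ thaw($store{$key}) } : ();
}

store_dup(mykey => 'myvalue');
store_dup(mykey => [ {}, 'blaaa', [] ]);

my @values = fetch_dups('mykey');
print scalar(@values), "\n";      # 2
```

Between the thaw and the freeze another writer could change the same key, which is why this approach is neither atomic nor fault tolerant without additional locking.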

Host

This option is only valid for the DBMS style and tells the system the IP address or machine name of the DBMS(3) server. Default is 'localhost'. See man dbmsd(8)

Port

This option is only valid for the DBMS style and tells the system which TCP/IP port to connect to for the DBMS protocol. Default is '1234'. See man dbmsd(8)

$db_array = tie @b, 'Data::MagicTie' [, %whateveryoulikeit ];

Tie @b to a MagicTie database. The %whateveryoulikeit hash is the same as above.

METHODS

Most of the methods are common to the standard perltie(3) interface (sync, TIEHASH, TIEARRAY, FETCH, STORE, EXISTS, FIRSTKEY, NEXTKEY, CLEAR, DELETE, DESTROY)

get_Options()

Return a hash reference containing all the major options plus the directory and filename of the database. See CONSTRUCTORS

In addition, Data::MagicTie provides methods to manage a simple delegation or pass-through model; delegation happens only for read methods such as FETCH, EXISTS, FIRSTKEY, NEXTKEY.

Canonical delegation model

set_parent($ref)

Set the parent delegate to which read requests are forwarded. $ref must be a valid Data::MagicTie blessed Perl object, otherwise the delegate is not set. After this method call, any FETCH, EXISTS, FIRSTKEY or NEXTKEY invocation (normally automagically called by Perl for you :-) starts a chain of requests to the parents until a result is found or undef is returned.

get_parent()

Return a valid Data::MagicTie blessed Perl object pointing to the parent of a tied database.

reset_parent()

Remove the parent of the database; read operations then go back to using the current database only.
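The parent chain behind set_parent, get_parent and reset_parent can be sketched as a linked list of tied objects. The ChainedHash package below is a hypothetical in-memory stand-in, not the Data::MagicTie implementation; it covers FETCH and EXISTS only, leaves out FIRSTKEY/NEXTKEY, and does not guard against cycles in the chain.

```perl
use strict;
use warnings;

# Hypothetical tied hash that forwards failed reads to a parent object.
package ChainedHash;
use Scalar::Util qw(blessed);

sub TIEHASH {
    my ($class) = @_;
    return bless { data => {}, parent => undef }, $class;
}

sub STORE { my ($self, $key, $value) = @_; $self->{data}{$key} = $value; return; }

# Accept only objects of this class as parents; otherwise do nothing.
sub set_parent {
    my ($self, $ref) = @_;
    return unless blessed($ref) && $ref->isa(__PACKAGE__);
    $self->{parent} = $ref;
    return 1;
}

sub get_parent   { my ($self) = @_; return $self->{parent}; }
sub reset_parent { my ($self) = @_; $self->{parent} = undef; return; }

# Walk the chain until some node holds the key.
sub FETCH {
    my ($self, $key) = @_;
    for (my $node = $self; $node; $node = $node->{parent}) {
        return $node->{data}{$key} if exists $node->{data}{$key};
    }
    return undef;
}

sub EXISTS {
    my ($self, $key) = @_;
    for (my $node = $self; $node; $node = $node->{parent}) {
        return 1 if exists $node->{data}{$key};
    }
    return '';
}

package main;

my $h1 = tie my %first,  'ChainedHash';
my $h2 = tie my %second, 'ChainedHash';
my $h3 = tie my %third,  'ChainedHash';

$second{B3} = 'valueB3';
$third{C9}  = 'valueC9';

$h1->set_parent($h2);
$h2->set_parent($h3);

print $first{B3}, "\n";   # valueB3, found in %second
print $first{C9}, "\n";   # valueC9, found in %third
```

Writes stay local to the hash they are issued on; only reads travel along the chain, in line with the read-only delegation described above.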

Data::MagicTie also provides methods equivalent to those of the DB_File module to manage duplicate keys - see DB_File(3):

Duplicates

get_dup($key)

This method reads the values of a duplicate key. In scalar context the method returns the number of values associated with the key, $key. In list context, it returns all the values which match $key. Note that the values will be returned in an apparently random order. In list context, if the second parameter is present and evaluates TRUE, the method returns an associative array. The keys of the associative array correspond to the values that matched the key $key and the values of the hash are a count of the number of times that particular value occurred.

del_dup($key,$value)

This method deletes a specific key/value pair.

find_dup($key, $value)

This method checks for the existence of a specific key/value pair. If the pair exists, the cursor is left pointing to the pair and the method returns 0. Otherwise the method returns a non-zero value.
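The context-dependent return values of get_dup and the 0-on-success convention of find_dup can be mimicked in plain Perl. The functions below are illustrative stand-ins operating on a hash of array references (an assumption made for this example); they are not the module's methods and know nothing about cursors.

```perl
use strict;
use warnings;

# Stand-in storage: each key maps to the list of its duplicate values.
my %dups = ( mykey => [ 'x', 'y', 'x' ] );

# Scalar context: number of values. List context: the values, or a
# value => count hash when the second argument is true.
sub get_dup {
    my ($key, $as_counts) = @_;
    my @values = @{ $dups{$key} || [] };
    return scalar @values unless wantarray;
    return @values unless $as_counts;
    my %count;
    $count{$_}++ for @values;
    return %count;
}

# Returns 0 when the key/value pair exists, non-zero otherwise.
sub find_dup {
    my ($key, $value) = @_;
    my $found = grep { $_ eq $value } @{ $dups{$key} || [] };
    return $found ? 0 : 1;
}

my $n     = get_dup('mykey');        # 3
my @all   = get_dup('mykey');        # the three stored values
my %count = get_dup('mykey', 1);     # x => 2, y => 1

print "$n values, x occurs $count{x} times\n";
```

Since 0 signals success, a typical check reads `if (find_dup($key, $value) == 0) { ... }` rather than a plain truth test.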

EXAMPLES

Canonical delegation model howto
 use Data::MagicTie;

 my $hash = tie %a,'Data::MagicTie','test',(Style => "DB_File");
 my $hash1 = tie %b,'Data::MagicTie','test1',(Style => "DB_File");
 my $hash2 = tie %c,'Data::MagicTie','test2',(Style => "DB_File");

 for (1..10) {
        $a{"A".$_} = "valueA".$_;
        $b{"B".$_} = "valueB".$_;
        $c{"C".$_} = "valueC".$_;
 };

 #basic delegation model - first match %a then %b then %c
 $hash->set_parent($hash1);
 $hash1->set_parent($hash2);
 print $a{B3}; # looks up in %b
 print $a{C9}; # looks up in %c

 #I think this one is much cooler :->
 my $hash3 = tie %d,'Data::MagicTie','test3',( Style => "DBMS" );
 my $hash4 = tie %e,'Data::MagicTie','test4',( Style => "BerkeleyDB" );

 for (1..10) {
        $d{"D".$_} = "valueD".$_;
        $e{"E".$_} = "valueE".$_;
 };

 #...and then use local or remote databases transparently
 $hash2->set_parent($hash3);
 $hash3->set_parent($hash4);
 print $a{D1}; # really the Perl way of doing ;-)
 print $a{E1},"\n";

 #iterator
 while (($k,$v) = each %a) {
        print $k,"=",$v,"\n";
 };

 undef $hash;
 untie %a;
 undef $hash1;
 untie %b;
 undef $hash2;
 untie %c;
 undef $hash3;
 untie %d;
 undef $hash4;
 untie %e;

BUGS

        - The current implementation of TIE supports only the TIEHASH and TIEARRAY interfaces.
        - DBMS style does not support TIEARRAY yet.
        - Data::MagicTie ARRAY support is not complete (FETCHSIZE at least should be added) - see perltie(3)
        - a well-known problem using BLOBs is the following:
                
                tie %a,"Data::MagicTie","test";
                $a{key1} = sub { print "test"; }; # works
                $a{key2} = { a => [ 1,2,3], b => { tt => [6,7],zz => "test1234" } }; # it works too
                $a{key3}->{this}->{is}->{not} = sub { "working"; }; #does not always work

        The problem seems to be related to the fact that Perl is "automagically" extending/defining
        hashes (or other in-memory structures). As soon as you start to reference a value it
        gets created "automatically" :-( 
        E.g.
                $a = {};
                $a->{a1} = { a2 => [] };

                $b->{a1}->{a2} = []; # this is the same as the two lines above

        In the Data::MagicTie realm this problem affects the Storable freeze/thaw method results.
        Any idea how to fix this?
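A related effect can be reproduced without Data::MagicTie. In the sketch below the ThawOnFetch package is a hypothetical stand-in that freezes values on STORE and thaws them on FETCH; it shows that a nested assignment changes only the temporary thawed copy, and that fetching, modifying and storing the value back is one possible workaround. Whether this is the exact cause of the behaviour described above is an assumption.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for a BLOB-storing tied hash.
package ThawOnFetch;

sub TIEHASH { my ($class) = @_; return bless {}, $class; }
sub STORE   { my ($self, $key, $value) = @_; $self->{$key} = freeze($value); return; }
sub FETCH   { my ($self, $key) = @_; return exists $self->{$key} ? thaw($self->{$key}) : undef; }
sub EXISTS  { my ($self, $key) = @_; return exists $self->{$key}; }

package main;

tie my %db, 'ThawOnFetch';
$db{key} = { a => 1 };

# FETCH hands back a fresh copy; the assignment changes only that copy
# and STORE is never called, so the change is lost.
$db{key}{b} = 2;
my $lost = exists $db{key}{b} ? 0 : 1;
print "nested write lost: $lost\n";       # nested write lost: 1

# Workaround: fetch into a temporary, modify it, store it back.
my $tmp = $db{key};
$tmp->{b} = 2;
$db{key} = $tmp;
my $kept = exists $db{key}{b} ? 1 : 0;
print "after store-back: $kept\n";        # after store-back: 1
```

The same store-back pattern applies to deeper structures: build or modify the whole value in memory first, then assign it to the tied hash in a single STORE.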

SEE ALSO

perltie(3) Storable(3) DBMS(3) DB_File(3) BerkeleyDB(3)

AUTHOR

Alberto Reggiori <areggiori@webweaving.org>

You can send your postcards and bugfixes to

Alberto Reggiori Via Giacomo Puccini 16 - 21014 Laveno Mombello (VA) ITALY
