Genezzo::Block::RDBlock.pm - Row Directory Block tied hash class. A class that lets you treat the contents of a block (byte buffer) as a hash.
Note: This implementation is almost, but not quite, a pushhash. The push hash implementation is Genezzo::Row::RSBlock. It also forms the basis of a tied array in Genezzo::Block::RDBArray.
use Genezzo::Block::RDBlock; use Genezzo::Block::Std; local $Genezzo::Block::Std::DEFBLOCKSIZE = 500; my $buff = "\0" x 500; # construct an empty byte buffer my %tied_hash = (); my $tie_val = tie %tied_hash, 'Genezzo::Block::RDBlock', (refbufstr => \$buff); # pushhash style # (note that the "PUSH" pseudo key is not supported)... my $newkey = $tie_val->HPush("this is a test"); # or array style, your choice my $pushcount = $tie_val->PUSH(qw(push lots of data)); $tied_hash{$newkey} = "update this entry"; # a hash that supports array style FETCHSIZE my $getcount = $tie_val->FETCHSIZE(); # Note: not HCount
RDBlock is the basis for persistent tied hashes, pushhashes, and tied arrays. After the hash is tied to the byte buffer, the buffer can be written to persistent storage. The storage is designed such that inserts/appends/pushes are fairly efficient, and deletes are inexpensive. The pctfree/pctused parameters allow some tuning to reserve space in the buffer for updates that "grow" existing values. Updates that do not change the packed size of data are about as efficient as insert/appends -- just the cost to copy your bytes into the buffer -- but updates that do change the size of stored values can require a large amount of byte shifting to open up storage space. Also, the buffer does not grow to accomodate large values. Wrapper classes are necessary to specify mechanisms for packing complex data structures and techniques to split objects across multiple buffers.
The structure and techniques for the Row Directory Block are described in Chapter 14, "The Tuple-Oriented File System", of "Transaction Processing: Concepts and Techniques" by Jim Gray and Andreas Reuter, 1993.
A tuple is a collection of values -- in the standard vernacular you would call it a "row" in a database. The refbufstr argument to the hash constructor is a "block", a fixed-size contiguous buffer of bytes. When you write (STORE) a value into the RDBlock hash, it writes an entry into the block as a byte string, and reads (FETCH) work in an analogous fashion.
STORE
FETCH
The RDBlock data structures refer to stored values as "rows", but the basic STORE and FETCH only understand how to store and retrieve individual byte strings. Wrapper classes for RDBlock must marshall/unmarshall (Freeze/Thaw) between simple strings and more complex data structures.
The block has some header and footer information, plus a row directory, a data structure that records the offsets, extents, and status information of the stored row data. While the physical location of row data in a block may change as other rows are added, deleted or modified, the row keeps the same hash key.
Each row has an associated "status" bitfield, which is some combination of the following values:
set if row is deleted. Deleted rows are simply marked as deleted, but the physical storaged is not immediately recouped.
set for data rows, unset for metadata. All information stored via the standard public interfaces is data. You can manipulate the private interfaces to store "metadata", additional rows that describe, for example, block contents, transaction information, or data relationships, but are invisible to the public interfaces. By convention, row 0 is always a metadata row.
set if row is locked. Not used in this base class -- provided for subclasses that must supply and maintain the appropriate metadata to identify locker and transaction information.
set if the stored value is the very first part of a row.
set if the stored value is the very last part of a row. If STORE writes a complete value it sets both head and tail to true. The base class only writes rows that fit in a single block, so both head and tail are always set.
These flags are useful if you wish to write subclasses with rows that span multiple blocks. Neither head nor tail is set if only the middle section of a multi-part row is stored.
If you supply STORE with a value of undef, it writes a marker for a zero-length string and sets this flag. FETCH will correctly return an undef.
Note: When packing more complex data structures, make sure to use an encoding that distinguishes between undefs and zero-length strings. A simple scheme for packing an array of strings is to prefix the packed array with a bitstring that specifies which entries are null.
RDBlock support all standard hash operations, with the exception that you cannot create or insert a user key -- you must push new entries and use the generated key or basic iteration to retrieve your data.
It also supports three additional public methods: an array style PUSH and FETCHSIZE, plus a PushHash style HPush. Note that these methods are associated with the tie value (i.e. the blessed ref for the RDBlock class), not the tied hash. Finally, it has five "private" methods that may be of use in constructing subclasses: push_one, packdeleted, offset2hkey, lastkey, prevkey
PUSH
FETCHSIZE
PUSH appends the list to the end of the hash and returns the number of items it pushed.
Returns the total number of valid, undeleted data items in the hash.
HPush returns the new key for each pushed value. It only accepts a single argument, not a list.
my $newkey = $tie_val->HPush("this is a test");
Note that there is not a corresponding "pop" operation.
DATAROW, RowStat
The storage mechanism uses network longs (32 bits?) to describe the lengths of rows and offsets within the block. (That seems pretty large -- maybe it should use shorts to restrict blocksize and row piece length to 64K? Or init should take an optional module name for block type that lets us vary the row directory, header and footer sizing).
Jeffrey I. Cohen, jcohen@genezzo.com
perl(1).
Copyright (c) 2003, 2004 Jeffrey I Cohen. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Address bug reports and comments to: jcohen@genezzo.com
For more information, please visit the Genezzo homepage at http://www.genezzo.com
To install Genezzo::Util, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Genezzo::Util
CPAN shell
perl -MCPAN -e shell install Genezzo::Util
For more information on module installation, please visit the detailed CPAN module installation guide.