Jeffrey Cohen

Genezzo::Row::RSTab - Row Source TABle tied hash class.


 use Genezzo::Row::RSTab;

 # see -- implementation and usage is tightly tied
 # to genezzo engine...

 # make a factory for rsfile
 my $fac2 = make_fac2('Genezzo::Row::RSFile');
 my %args = (
             factory   => $fac2,
             # need tablename, bufcache, etc...
             tablename => ...
             tso       => ...
             bufcache  => ...
            );

 my %td_hash;
 my $tie_val =
   tie %td_hash, 'Genezzo::Row::RSTab', %args;

 # pushhash style 
 my @rowarr = ("this is a test", "and this is too");
 my $newkey = $tie_val->HPush(\@rowarr);

 @rowarr = ("update this entry", "and this is too");
 $td_hash{$newkey} = \@rowarr;

 my $getcount = $tie_val->HCount();


RSTab is a hierarchical pushhash (see Genezzo::PushHash::hph) class that stores Perl arrays as rows in a table, writing them into a block (byte buffer) via Genezzo::Row::RSFile and Genezzo::Block::RDBlock.


tablename (Required) - the name of the table
tso (Required) - tablespace object from Genezzo::Tablespace
bufcache (Required) - buffer cache object from Genezzo::BufCa::BCFile


Logically, a table is made of rows, and rows are vectors of columns. Physically (at least from an OS implementation viewpoint), a table is made up of blocks stored in files. The RSTab hierarchical pushhash (hph) uses an RSFile factory, though it could be constructed as an hph of arbitrary depth. The basic HPush mechanism takes an array, flattens it into a string, and pushes the string into one of the underlying blocks.
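The flatten-and-push step described above can be sketched in plain Perl. This is a minimal illustration, not the Genezzo implementation: the helper names (pack_row, push_row, fetch_row), the length-prefixed packing format, the in-memory block layout, and the "block/slot" key format are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not part of the Genezzo API.

# Flatten a row (array of scalars) into one string. Each column is
# stored as a 4-byte big-endian length followed by the column bytes.
sub pack_row {
    my @cols = @_;
    return join '', map { pack('N/a*', $_) } @cols;
}

# Push a row into the first block with enough free space, adding a
# new block when none has room. A block here is just an array of
# packed strings. Returns an assumed "block/slot" key.
sub push_row {
    my ($blocks, $capacity, $row) = @_;
    my $packed = pack_row(@$row);
    die "row too large for one block\n" if length($packed) > $capacity;

    for my $i (0 .. $#$blocks) {
        my $used = 0;
        $used += length($_) for @{ $blocks->[$i] };
        if ($used + length($packed) <= $capacity) {
            push @{ $blocks->[$i] }, $packed;
            my $slot = $#{ $blocks->[$i] };
            return join '/', $i, $slot;
        }
    }
    push @$blocks, [$packed];
    return join '/', $#$blocks, 0;
}

# Reverse the process: locate the packed string and unflatten it.
sub fetch_row {
    my ($blocks, $key) = @_;
    my ($blk, $slot) = split m{/}, $key;
    return [ unpack('(N/a*)*', $blocks->[$blk][$slot]) ];
}

my @blocks;
my $key = push_row(\@blocks, 64, ["this is a test", "and this is too"]);
my $row = fetch_row(\@blocks, $key);
print "fetched: @$row\n";
```

Note that this sketch refuses a row larger than one block, which matches the limitation of the standard HPush described later in this document.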

While the RSTab api is primarily intended as a row-based interface, it has some extensions to directly manipulate the underlying blocks. These extensions are useful for building specialized index mechanisms (see Genezzo::Index) like B-trees, or for supporting rows that span multiple blocks.

Basic PushHash

You can use RSTab as a persistent hash of arrays of scalars if you like. The arrays and scalars can be of arbitrary length (as long as they fit in your datafiles).
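To make the "hash of arrays" usage concrete without a running Genezzo engine, here is a toy in-memory stand-in. ToyPushHash is a hypothetical class written for this example; it mimics only the HPush, HCount, and tied-hash calls shown in the synopsis, keeps everything in memory (so it is not persistent), and its key numbering is an assumption.

```perl
use strict;
use warnings;

package ToyPushHash;    # hypothetical stand-in, not a Genezzo class

sub TIEHASH { my ($class) = @_; return bless { rows => {}, next => 1 }, $class }
sub FETCH   { $_[0]{rows}{ $_[1] } }
sub STORE   { $_[0]{rows}{ $_[1] } = [ @{ $_[2] } ] }    # store a copy of the row
sub DELETE  { delete $_[0]{rows}{ $_[1] } }
sub EXISTS  { exists $_[0]{rows}{ $_[1] } }
sub CLEAR   { %{ $_[0]{rows} } = () }

# Reset the internal iterator, then hand back the first key.
sub FIRSTKEY { my $h = $_[0]{rows}; my $reset = keys %$h; each %$h }
sub NEXTKEY  { each %{ $_[0]{rows} } }

# Push a row (array ref) and return the key chosen for it.
sub HPush {
    my ($self, $row) = @_;
    my $key = $self->{next}++;
    $self->{rows}{$key} = [@$row];
    return $key;
}

# Number of rows currently stored.
sub HCount { scalar keys %{ $_[0]{rows} } }

package main;

my %td_hash;
my $tie_val = tie %td_hash, 'ToyPushHash';

my @rowarr = ("this is a test", "and this is too");
my $newkey = $tie_val->HPush(\@rowarr);

@rowarr = ("update this entry", "and this is too");
$td_hash{$newkey} = \@rowarr;

my $getcount = $tie_val->HCount();
print "rows stored: $getcount\n";
```

With the real class, the same calls would go through the tie to 'Genezzo::Row::RSTab' with the arguments listed above.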

SQL DBI-style interface

RSTab is designed to efficiently support prepare/execute/fetch operations against tables. What distinguishes this API from a standard hash is that the "prepare" operation generates a custom, stateful iterator that understands filters and range selection. A filter is simply a predicate which is applied to every row -- rows which pass are returned to the caller, and rows which fail are "filtered out". Range selection is somewhat similar, with the notion of start and stop keys -- the iterator only returns the rows which are restricted to a certain range of values. In general, range selection is driven off a separate indexing mechanism that positions the fetch to specifically retrieve the range in an efficient manner, versus fetching all rows and filtering rows outside the range.
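The idea of a prepared, stateful iterator with a filter and start/stop keys can be sketched with a closure. This is only an illustration under assumptions: prepare_scan and its option names (filter, start, stop) are hypothetical, rows live in an ordinary hash keyed by numeric row id, and the range is applied to those ids by a simple scan rather than by a separate index.

```perl
use strict;
use warnings;

# Hypothetical "prepare" step: build a stateful fetch closure over a
# hash of rows. Not the RSTab API; names are illustrative only.
sub prepare_scan {
    my ($rows, %opt) = @_;

    my @keys = sort { $a <=> $b } keys %$rows;

    # Range selection. A real index would position the scan directly
    # on the start key; this sketch just trims the key list up front.
    @keys = grep { $_ >= $opt{start} } @keys if defined $opt{start};
    @keys = grep { $_ <= $opt{stop} }  @keys if defined $opt{stop};

    my $filter = $opt{filter};
    my $pos    = 0;              # iterator state kept between fetches

    return sub {
        while ($pos < @keys) {
            my $key = $keys[ $pos++ ];
            my $row = $rows->{$key};
            # Rows that fail the predicate are filtered out.
            next if $filter && !$filter->($key, $row);
            return ($key, $row);
        }
        return;                  # empty list: scan is exhausted
    };
}

my %table = (
    1 => [ 'a', 10 ],
    2 => [ 'b', 20 ],
    3 => [ 'c', 30 ],
    4 => [ 'd', 40 ],
);

my $fetch = prepare_scan(
    \%table,
    start  => 2,
    stop   => 4,
    filter => sub { my ($key, $row) = @_; $row->[1] != 30 },
);

while (my ($key, $row) = $fetch->()) {
    print "$key: @$row\n";
}
```

Keeping the position inside the closure is what makes the iterator stateful: each call resumes where the previous fetch stopped.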

HPHRowBlk - Row and Block operations

HPHRowBlk is a special pushhash subclass with certain direct block manipulation methods. One very useful function is HSuck, which provides support for rows that span multiple blocks. While the standard HPush fails if a row exceeds the space in a single block, the HSuck api lets the underlying blocks consume the rows in pieces -- each block "sucks up" as much of the row as it can. The RSTab HPush is re-implemented on top of HSuck to support large rows.
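The piecewise consumption that HSuck provides can be illustrated with a small sketch. Everything here is an assumption made for the example: suck_row and fetch_pieces are hypothetical names, a block is modeled as a hash with a byte capacity and a data string, and a row is described afterwards by a list of (block, offset, length) fragments. The real block format and fragment bookkeeping are not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical sketch; not the HPHRowBlk API.
# Each block is modeled as { cap => bytes, data => string }.

# Let each block take as much of the packed row as it has room for.
# Returns a list of [block index, offset, length] fragments.
sub suck_row {
    my ($blocks, $packed) = @_;

    # Check total free space first so a failure writes nothing.
    my $free_total = 0;
    $free_total += $_->{cap} - length($_->{data}) for @$blocks;
    die "not enough free space\n" if $free_total < length($packed);

    my @pieces;
    my $offset = 0;                     # bytes of the row already placed
    for my $i (0 .. $#$blocks) {
        last if $offset >= length($packed);
        my $blk  = $blocks->[$i];
        my $free = $blk->{cap} - length($blk->{data});
        next if $free <= 0;

        my $piece = substr($packed, $offset, $free);
        push @pieces, [ $i, length($blk->{data}), length($piece) ];
        $blk->{data} .= $piece;
        $offset += length($piece);
    }
    return \@pieces;
}

# Reassemble the row by concatenating its fragments in order.
sub fetch_pieces {
    my ($blocks, $pieces) = @_;
    return join '',
        map { substr($blocks->[ $_->[0] ]{data}, $_->[1], $_->[2]) } @$pieces;
}

my @blocks = map { { cap => 8, data => '' } } 1 .. 3;
my $pieces = suck_row(\@blocks, 'abcdefghijklmnopqrst');
print scalar(@$pieces), " fragments: ", fetch_pieces(\@blocks, $pieces), "\n";
```

A single-block push is then just the special case where the first block with free space can hold the whole row.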

Counting, Estimation, Approximation

RSTab has some support for count estimation, inspired by some of Peter Haas' work (Sequential Sampling Procedures for Query Size Estimation, ACM SIGMOD 1992; Online Aggregation (with J. Hellerstein and H. Wang), ACM SIGMOD 1997; Ripple Joins for Online Aggregation (with J. Hellerstein), ACM SIGMOD 1999). It could use support for confidence intervals, so drop me a line if you understand the Central Limit Theorem and the Hoeffding and Chebyshev inequalities. Knowledge of change-points and time-series is also a plus.
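As a rough illustration of count estimation by extrapolation, the sketch below scales the average row count of a few examined blocks up to the whole table. This is not RSTab's estimator: estimate_count is a hypothetical function, the per-block row counts are made-up inputs, and it simply takes the first blocks as its sample, whereas a sampling-based estimator would normally choose blocks at random and report a confidence interval.

```perl
use strict;
use warnings;

# Hypothetical sketch; not the RSTab estimator.
# $rows_per_block: array ref of row counts, one entry per block.
# $sample_size:    how many blocks to examine before extrapolating.
sub estimate_count {
    my ($rows_per_block, $sample_size) = @_;

    my $total_blocks = @$rows_per_block;
    return 0 unless $total_blocks && $sample_size > 0;
    $sample_size = $total_blocks if $sample_size > $total_blocks;

    # Count rows in the examined blocks only.
    my $seen = 0;
    $seen += $rows_per_block->[$_] for 0 .. $sample_size - 1;

    # Scale the sample mean up to the full number of blocks.
    return $seen / $sample_size * $total_blocks;
}

my @counts = (10, 12, 8, 10, 10, 10);    # assumed example data
printf "estimated rows: %d\n", estimate_count(\@counts, 3);
```

Examining every block makes the estimate exact, which is a useful sanity check for any such scheme.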


RSTab supports all standard hph hierarchical pushhash operations, with the extension that it manipulates arrays of scalars, not individual scalars.





rownum filter support to move to separate package
$href: remove - need a dict function to return allfileused via tso
HSuck: need a way to specify packing method
HSuck: fix trailing zero replacement
NextCount: fix quitloop
localPush/Store: qualify length packstr as percentage of blocksize (1/3?)
localStore: race condition on rowstat
localFetchDelete: frag flag info, delete status. Could express this function as a generalized "RowSplice" (as distinct from RDBlkA::HSplice, which is a block splice operator). Would need to be able to splice based upon column number/array offset, as well as substring byte offset -- the inverse functionality of PackRow2/HSuck
DBI - support Bind and projection (returning only certain specified columns, versus all columns)
_init: change to use TSTableAFU support versus href->{filesused}
need support for constraints that "mutate" supplied values, e.g. manipulate numeric precision or supply default values for columns. Also need support for foreign keys in delete.


Jeffrey I. Cohen,


Genezzo::PushHash::HPHRowBlk, Genezzo::PushHash::hph, Genezzo::PushHash::PushHash, Genezzo::Tablespace, Genezzo::Row::RSFile, Genezzo::Row::RSBlock, Genezzo::Block::RDBlock, Genezzo::BufCa::BCFile, Genezzo::BufCa::BufCaElt, perl(1).

Copyright (c) 2003, 2004, 2005 Jeffrey I Cohen. All rights reserved.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

Address bug reports and comments to:

For more information, please visit the Genezzo homepage at