construct comparison/equality callbacks
my $cmp1 = sub { my ($k1, $k2) = @_; # NOTE: use "spaceship" (-1,0,1) comparison with # short-circuit OR (which returns 0 or VALUE, not 0 or 1) # to perform multi-column key comparison # a la Schwartzian Transform return ( ( ($k1->[0] <=> $k2->[0]) || ($k1->[1] <=> $k2->[1])) == -1 ); }; my $eq1 = sub { my ($k1, $k2) = @_; return (($k1->[0] == $k2->[0]) && ($k1->[1] == $k2->[1]) ); };
Genezzo::Index::bt2 - basic btree
A btree built of row directory blocks.
use Genezzo::Index::bt?; my $tt = Genezzo::Index::btree->new(); $tt->insert(1, "hi"); $tt->insert(7, "there");
This btree algorithm is a bottom-up implementation based upon ideas from Chapter 16 of "Algorithms in C++ (third edition)", by Robert Sedgewick, 1998 and Chapter 15, "Access Paths", of "Transaction Processing: Concepts and Techniques" by Jim Gray and Andreas Reuter, 1993. The pedagogical examples use a fixed number of entries per node, or fixed-size keys in each block, but this implementation has significant extensions to support variable numbers of variably-sized keys in fixed-size disk blocks, with the associated error handling, plus support for reverse scans.
This package supports a constructor "new", plus standard b-tree methods like insert, delete, search.
The "new" constructor takes many arguments, but they are all optional. If none are specified, the constructor will allocate 100 blocks of the default size for a b-tree. The default assumption is to support scalar string keys with a scalar string values. The tree will have a maximum of 50 entries per node.
The maximum number of entries in a node. If set to zero, the insert will pack as many entries as space allows in each node
The constructor will allocate a private buffer cache for the b-tree of up to the number of blocks specified. If numblocks=0, no cache is created. In this case, the user must create a subclass to overload the make_new_block and getarr methods.
The size of each block in the b-tree
The key type is either a single scalar "c" (for char) or "n" (for number), or a ref to an array of "c" and/or "n" values. If key_type is specified, bt2 finds or constructs the appropriate compare/equals and pack/unpack functions, overriding any user-supplied arguments. If key type is not specified, bt2 processes the insert keys as a scalar strings.
Supply methods to compare your key. This package contains special comparison methods for numeric and multi-column keys, and their associated packing functions.
"Packing" functions convert key/value pairs to and from a byte representation which gets stored in the nodes of the b-tree. The b-tree package supports scalar keys and values by default. It also contains methods for multi-column keys with a single value.
special flag for Index Organized Tables, which means the "value" can be an array, not a scalar. This approach requires a couple extra checks in the branch nodes, since branches contain (key, nodeid) pairs, and leaves contain (key, array of values). Normally, indexes only have a scalar value: a nodeid or a rid.
Enforce uniqueness (no duplicates) at insertion time
Special case for building non-unique indexes where the "value" is null because it is already part of the key vector. In this usage, we construct a unique index (unique_key=1) where the key vector is the key columns *plus* the table rid, and the value is null. The key columns might be duplicates, but the addition of the rid guarantees uniqueness. The fetch is asymmetric: the table rid is returned as both the last key column and the value.
Q: Why not just have a non-unique index and store the rids as regular values? A: This approach clusters related rids, so index scans are more efficient and deletes are easier. Note that the basic index row physical storage is unaffected. Only the unpack function needs an extra argument to describe the number of key columns. Q: But doesn't the extra comparison for the rid column make inserts more expensive? A: Yes, but we're trading off insert performance against index scan performance. The workload of most database applications is typically dominated by selects, not inserts.
none
Jeffrey I. Cohen, jcohen@genezzo.com
perl(1).
Copyright (c) 2003, 2004 Jeffrey I Cohen. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Address bug reports and comments to: jcohen@genezzo.com
For more information, please visit the Genezzo homepage at http://www.genezzo.com
To install Genezzo, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Genezzo
CPAN shell
perl -MCPAN -e shell install Genezzo
For more information on module installation, please visit the detailed CPAN module installation guide.