Tree::RB::XS - Red/Black Tree implemented in C, with similar API to Tree::RB
NOTICE: This module is very new and you should give it some thorough testing before trusting it with production data.
use Tree::RB::XS qw/ :cmp :get /; my $tree= Tree::RB::XS->new; $tree->put($_ => $_) for 'a'..'z'; # store key/value, overwrite say $tree->get('a'); # get value by key, or undef $tree->delete('a'); # delete by key say $tree->get('a', GET_GT); # find relative to the key, finds 'b' $tree->delete('a','z'); # delete a range, inclusive $tree->delete($tree->min_node); # delete using nodes or iterators $tree->clear; # efficiently delete all nodes $tree= Tree::RB::XS->new(CMP_INT); # optimize for integer comparisons $tree->put(1 => "x"); $tree= Tree::RB::XS->new( compare_fn => 'float', # optimize for floating-point allow_duplicates => 1 ); $tree->put(rand() => 1) for 1..1000; $tree->delete(0, $tree->get_node_lt(.5)); $tree= Tree::RB::XS->new(CMP_MEMCMP); # optimize for byte strings $tree->put("test" => "x"); say $tree->min->key; # inspect the node objects say $tree->nth(0)->key; my $node= $tree->min; my $next= $node->next; $node->prune; # iterators for (my $i= $tree->iter; !$i->done; $i->step) { if ($i->key ...) { $i->value .... } } my $i= $tree->rev_iter; while (my $node= $i->next) { $node->prune if $node->key =~ ...; } my $i= $tree->iter; while (my @batch= $i->next_values(100)) { ... }
This module is a wrapper for a Red/Black Tree implemented in C. It's primary features over other search trees on CPAN are optimized comparisons of keys (speed), O(log N) node-by-index lookup (which allows the tree to act as an array), and the option to allow duplicate keys while preserving insertion order.
O(log N)
The API is almost a complete superset of Tree::RB with a few differences:
The get method in this module is not affected by array context, unless you request "compat_list_get".
get
resort is not implemented (would be lots of effort, and unlikely to be needed)
resort
Tree structure is not mutable via the attributes of Node, nor can nodes be created independent from a tree.
Many methods have official names changed, but aliases are provided for compatibility.
my $tree= Tree::RB::XS->new( %OPTIONS ); ...->new( $compare_fn );
If new is given a single parameter, it is assumed to be the compare_fn.
new
compare_fn
Options:
Choose a custom key-compare function. This can be the ID of an optimized function, a coderef, or the name of one of the optimized IDs like "int" for CMP_INT. See below for details.
CMP_INT
Whether to allow two nodes with the same key. Defaults to false.
Whether to enable full compatibility with Tree::RB's list-context behavior for "get". Defaults to false.
This is an internal detail that probably shouldn't have been exposed, and might be removed in the future. Choose one of the optimized compare_fn instead.
Specifies the function that compares keys. Read-only; pass as an option to the constructor.
This is one of: "CMP_PERL" (default), "CMP_INT", "CMP_FLOAT", "CMP_MEMCMP", "CMP_UTF8", "CMP_NUMSPLIT", or a coderef. CMP_INT and CMP_FLOAT are the most efficient, and internally store the key as a number. CMP_MEMCMP and CMP_UTF8 copy the key into an internal buffer, and offer moderate speed gains over CMP_PERL. CMP_PERL is Perl's own cmp operator.
CMP_FLOAT
CMP_MEMCMP
CMP_UTF8
CMP_PERL
cmp
If set to a coderef, it should take two parameters and return an integer indicating their order in the same manner as Perl's cmp. Beware that using a custom coderef throws away most of the speed gains from using this XS variant over plain Tree::RB. If speed is important, try pre-processing your keys in a way that allows you to use one of the built-in ones.
Patches welcome, for anyone who wants to expand the list of optimized built-in comparison functions.
Boolean, read/write. Controls whether "insert" will allow additional nodes with keys that already exist in the tree. This does not change the behavior of "put", only "insert". If you set this to false, it does not remove duplicates that already existed. The initial value is false.
Boolean, read/write. Controls whether "get" returns multiple values in list context. I wanted to match the API of Tree::RB, but I can't bring myself to make an innocent-named method like 'get' change its behavior in list context. So, by deault, this attribute is false and get always returns one value. But if you set this to true, get changes in list context to also return the Node, like is done in "lookup" in Tree::RB.
Tree::RB
The key-storage strategy used by the tree. Read-only; pass as an option to the constructor. This is an implementation detail that may be removed in a future version.
Returns the number of elements in the tree.
Get the root node of the tree, or undef if the tree is empty.
undef
Alias: root
root
Get the tree node with minimum key. Returns undef if the tree is empty.
Alias: min
min
Get the tree node with maximum key. Returns undef if the tree is empty.
Alias: max
max
Get the Nth node in the sequence from min to max. N is a zero-based index. You may use negative numbers to count down form max.
Alias: nth
nth
my $val= $tree->get($key); ...->get($key, $mode);
Fetch a value from the tree, by its key. Unlike "get" in Tree::RB, this returns a single value, regardless of list context. But, you can set compat_list_get to make get an alias for lookup.
lookup
Mode can be used to indicate something other than an exact match: "GET_EQ", "GET_EQ_LAST", "GET_LE", "GET_LE_LAST", "GET_LT", "GET_GE", "GET_GT". (described below)
Same as "get", but returns the node instead of the value. In trees with duplicate keys, this returns the first matching node. (nodes with identical keys are preserved in the order they were added)
Aliases with built-in mode constants:
my @values= $tree->get_all($key);
In trees with duplicate keys, this method is useful to return the values of all nodes that match the key. This can be more efficient than stepping node-to-node for small numbers of duplicates, but beware that large numbers of duplicate could have an adverse affect on Perl's stack.
Provided for compatibility with Tree::RB. Same as "get" in scalar context, but if called in list context it returns both the value and the node from "get_node". You can also use Tree::RB's lookup-mode constants of "LUEQUAL", etc.
my $old_val= $tree->put($key, $new_val);
Associate the key with a new value. If the key previously existed, this returns the old value, and updates the tree to reference the new value. If the tree allows duplicate keys, this will remove all but one node having this key and then set its value. Only the first old value will be returned.
Insert a new node into the tree, and return the index at which it was inserted. If "allow_duplicates" is not enabled, and the node already existed, this returns -1 and does not change the tree. If allow_duplicates is enabled, this adds the new node after all nodes of the same key, preserving the insertion order.
allow_duplicates
my $count= $tree->delete($key); ...->delete($key1, $key2); ...->delete($node1, $node2); ...->delete($start, $tree->get_node_lt($limit));
Delete any node with a key identical to $key, and return the number of nodes removed. If you supply two keys (or two nodes) this will delete those nodes and all nodes inbetween; $key1 is searched with mode GET_GE and $key2 is searched with mode GET_LE_LAST, so the keys themselves do not need to be found in the tree. The keys (or nodes) must be given in ascending order, else no nodes are deleted.
$key
$key1
GET_GE
$key2
GET_LE_LAST
If you want to delete a range *exclusive* of one or both ends of the range, just use the "get_node" method with the desired mode to look up each end of the nodes that you do want removed.
my $count= $tree->clear();
This is the fastest way to remove all nodes from the tree. It gets to destroy all the nodes without worrying about the tree structure or shifting iterators aside.
my $iter= $tree->iter; # from min_node ...->iter($from_key, $get_mode=GET_GE); # from get_node ...->iter($from_node); # from existing node
Return an iterator object that traverses the tree from min to max, or from the key or node you provide up to max.
Like iter, but the ->next and ->step methods walk backward to smaller key values, and the default $get_mode is "GET_LE_LAST".
iter
->next
->step
$get_mode
The sort key. Read-only, but if you supplied a reference and you modify what it points to, you will break the sorting of the tree.
The data associated with the node. Read/Write.
The previous node in the sequence of keys. Alias predecessor for Tree::RB::Node compat.
predecessor
Tree::RB::Node
The next node in the sequence of keys. Alias successor for Tree::RB::Node compat.
successor
The tree this node belongs to. This becomes undef if the tree is freed or if the node is pruned from the tree.
The left sub-tree.
The left-most leaf of the sub-tree. Alias min for Tree::RB::Node compat.
The right sub-tree.
The right-most child of the sub-tree. Alias max for Tree::RB::Node compat.
The parent node, if any.
0 = black, 1 = red.
The number of items in the tree rooted at this node (inclusive)
Remove this single node from the tree. The node will still have its key and value, but all attributes linking to other nodes will become undef.
Remove all children of this node, optionally calling a callback for each. For compat with "strip" in Tree::RB::Node.
Return sub-tree as list-of-lists. (array of arrays rather?) For compat with "as_lol" in Tree::RB::Node.
Shortcut for $node->tree->iter($node).
$node->tree->iter($node)
Shortcut for $node->tree->rev_iter($node).
$node->tree->rev_iter($node)
Iterators are similar to Nodes, but they hold a strong reference to the tree, and if a node they point to is removed from the tree they advance to the next node. (and obviously they iterate, where node objects do not)
The iterator references a "current node" which you can inspect the key and value of. You can call 'step' to move to a new current node, and you can call 'next' which returns the current node while switching the reference to the next node.
Note that if you avoid referencing the Node, and stick to the attributes and methods of the iterator, the tree can avoid allocating the Perl object to represent the Node. This gives a bit of a performance boost for large tree operations.
The key of the current node.
The value of the current node. Note that this returns the actual value, which in an aliased context allows you to modify the value stored in the tree.
$_++ for $iter->value;
The index of the current node.
A reference back to the Tree. Note that each iterator holds a strong reference to the tree.
True if the iterator has reached the end of its sequence, and no longer references a current Node.
my $nodes= $iter->next; my @nodes= $iter->next($count); my @nodes= $iter->next('*');
Return the current node (as a node object) and advance to the following node in the sequence. After the end of the sequence, calls to next return undef. If you pass the optional $count, it will return up to that many nodes, as a list. It will also return an empty list at the end of the sequence instead of returning undef. You can use the string '*' for the count to indicate all the rest of the nodes in the sequence.
next
$count
'*'
Same as next, but return the keys of the nodes.
Same as next, but return the values of the nodes. Like "value", these are also aliases, and can be modified.
$_++ for $iter->next_values('*');
my %x= $iter->next_kv('*');
Same as next, but return pairs of key and value for each node. This is useful for dumping them into a hash. (unless you have duplicate keys enabled, then don't dump them into a hash or you would lose elements)
$iter->step; # advance by one node $iter->step(10); # advance by 10 nodes $iter->step(-4); # back up 4 nodes
This moves the iterator by one or more nodes in the forward or backward direction. For a reverse-iterator, positive numbers move toward the minimum key and negative numbers move toward the maximum key. If the offset would take the iterator beyond the last node, the current node becomes undef. If the offset would take the iterator beyond the first node, the first node becomes the current node.
Delete the current node, return its value, and advance to the next node.
for (my $i= $tree->iter; !$i->done;) { if ($i->key =~ ...) { say "Removing ".$i->key." = ".$i->delete; } else { $i->step; } }
This is useful when combined with the key and value attributes of the iterator, but not so much when you are looping using next, because next has already moved to the next node beyond the one it returned to you. When using next, call delete on the node, not the iterator.
key
value
delete
Returns a new iterator of the same direction pointing at the same node.
This class implements the required methods needed for tie:
tie
my %hash my $tree= tie %hash, 'Tree::RB::XS'; $hash{$_}= $_ for 1..10; delete $hash{3}; $_ += 1 for values %hash; tied(%hash)->hseek(5); say each %hash; # 5
But you get better performance by using the tree's API directly. This should only be used when you need to integrate with code that isn't aware of the tree.
tied(%hash)->hseek( $key ); ->hseek({ -reverse => $bool }); ->hseek( $key, { -reverse => $bool });
This is a method of the tree, but only relevant to the tied hash interface. It controls the behavior of the next call to each %hash or keys %hash, causing the first element to be the node at or after the $key. (or before, if you change to a reverse iterator)
each %hash
keys %hash
This method differs from "hseek" in Tree::RB in that Tree::RB will change the logical first node of the iteration *indefinitely* such that repeated calls to keys do not see any element less than $key. This hseek only applies to the next iteration. (which I'm guessing was th intent in Tree::RB?)
keys
hseek
Export all with ':cmp'
Use Perl's cmp function. This forces the keys of the nodes to be stored as Perl Scalars.
Compare keys directly as whatever integer type Perl was compiled with. (i.e. 32-bit or 64-bit) This is the fastest option.
Compare the keys directly as whatever floating-point type Perl was compiled with. (i.e. 64-bit double or 80-bit long double)
Compare the keys as UTF8 byte strings, using Perl's internal bytes_cmp_utf8 function.
bytes_cmp_utf8
Compare the keys using C's memcmp function.
memcmp
Compare using the equivalent of this coderef:
sub { my @a_parts= split /([0-9]+)/, $_[0]; my @b_parts= split /([0-9]+)/, $_[1]; for (my $i= 0; $i < @a_parts || $i < @b_parts; $i++) { no warnings 'uninitialized'; my $cmp= ($i & 1)? ($a_parts[$i] <=> $b_parts[$i]) : ($a_parts[$i] cmp $b_parts[$i]); return $cmp if $cmp; } return 0; }
except the XS implementation is not limited by the integer size of perl, and operates directly on the strings without splitting anything. (i.e. much faster)
This results in a sort where integer portions of a string are sorted numerically, and any non-digit segment is compared as a string. This produces sort-orders like the following:
2020-01-01 2020-4-7 2020-10-12
or
14.4.2 14.14.0
If the key_type is KEY_TYPE_BSTR this will sort the string portions using memcmp, else they are sorted with Perl's unicode-aware sort.
key_type
KEY_TYPE_BSTR
Export all with ':key_type';
This key_type causes the tree to store whole Perl scalars for each node. Its default comparison function is Perl's own cmp operator.
This key_type causes the tree to store keys as Perl's integers, which are either 32-bit or 64-bit depending on how Perl was compiled. Its default comparison function puts the numbers in non-decreasing order.
This key_type causes the tree to store keys as Perl's floating point type, which are either 64-bit doubles or 80-bit long-doubles. Its default comparison function puts the numbers in non-decreasing order.
This key_type causes the tree to store keys as byte strings. The default comparison function is the standard Libc memcmp.
Same as KEY_TYPE_BSTR but reads the bytes from the supplied key as UTF-8 bytes. The default comparison function is also memcmp even though this does not sort Unicode correctly. (for correct unicode, use KEY_TYPE_ANY, but it's slower...)
KEY_TYPE_ANY
Export all with ':get'
This specifies a node with a key equal to the search key. If duplicate keys are enabled, this specifies the left-most match (least recently added). Has alias LUEQUAL to match Tree::RB.
LUEQUAL
Same as GET_EQ, but if duplicate keys are enabled, this specifies the right-most match (most recently inserted).
GET_EQ
This specifies the same node of GET_EQ, unless there are no matches, then it falls back to the left-most node with a key greater than the search key. Has alias LUGTEQ to match Tree:RB.
LUGTEQ
This specifies the same node of GET_EQ, unless there are no matches, then it falls back to the right-most node with a key less than the search key. Has alias LULTEQ to match Tree::RB.
LULTEQ
This specifies the same node of GET_EQ_LAST, unless there are no matches, then it falls back to the right-most node with a key less than the search key.
GET_EQ_LAST
Return the first node greater than the key, or undef if the key is greater than any node. Has alias LUGREAT to match Tree::RB.
LUGREAT
Return the right-most node less than the key, or undef if the key is less than any node. Has alias LULESS to match Tree::RB.
LULESS
Look for the last node matching the specified key (returning undef if not found) then return $node->next. This is the same as GET_GT except it ensures the key existed. Has alias LUNEXT to match Tree::RB.
$node->next
GET_GT
LUNEXT
Look for the first node matching the specified key (returning undef if not found) then return $node->prev. This is the same as GET_LT except it ensures the key existed. Has alias LUPREV to match Tree::RB.
$node->prev
GET_LT
LUPREV
The Red/Black module this one used as API inspiration. The fastest pure-perl tree module on CPAN. Implemented as blessed arrayrefs.
Another XS-based tree module. About 6%-70% slower than this one depending on whether you use coderef comparisons or optimized comparisons.
An AVL tree implementation in pure perl. The API is perhaps more convenient, with the ability to add your object to the tree with a callback that derives the key from that object. However, it runs significantly slower than Tree::RB.
version 0.07
Michael Conrad <mike@nrdvana.net>
This software is copyright (c) 2022 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install Tree::RB::XS, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Tree::RB::XS
CPAN shell
perl -MCPAN -e shell install Tree::RB::XS
For more information on module installation, please visit the detailed CPAN module installation guide.