Algorithm::Tree::NCA - Constant time retrieval of Nearest Common Ancestor
use Algorithm::Tree::NCA; my $tree = ...; my $nca = new Algorithm::Tree::NCA(-tree => $tree); my $x = $tree->get_node(...); my $y = $tree->get_node(...); my $z = $nca->nca($x,$y);
This package provides constant-time retrieval of the Nearest Common Ancestor (NCA) of nodes in a tree. The implementation is based on the algorithm by Harel and which can, after linear-time preprocessing, retrieve the nearest common ancestor of two nodes in constant time.
To implement the algorithm it is necessary to store some data for each node in the tree.
All data above, with the exception of the node number, is stored in an array inside the Algorithm::Tree::NCA object.
Algorithm::Tree::NCA
The node number has to be stored in the actual tree node in some manner (alternative solutions would be to slow to give constant-time retrieval), which requires a set method and a get method for the nodes. Since the most common case is using hashes to represent nodes, there are default implementations of the set and get methods.
The default set method is:
sub _set_method { my($node,$value) = @_; $node->{'_nca_number'} = $value; }
and the default get method is:
sub _get_method { my($node) = @_; return $node->{'_nca_number'}; }
If have chosen another representation of your nodes, you can provide alternative set and get methods by using the -set and -get options when creating the Algorithm::Tree::NCA object.
This section describes the algorithm used for preprocessing and for nearest common ancestor retrieval. It does not provide any intuition to why the algorithm works, just a description how it works. For the algorithm description, it is assumed that the nodes themself contain all necessary information. The algorithm is described in a Pascal-like fashion. For detailed information about the algorithm, please have a look in [1] or [2].
The height of a non-zero integer is the number of zeros at the right end of the integer. The least significant set bit (LSSB) of a non-zero number is the the number with only the least significant bit set (surprise). For instance, here is the LSSB and the height of some numbers:
Number LSSB Height -------- -------- ------ 01001101 00000001 0 01001100 00000100 2
Important to note here is that for numbers i and j, height(i) < height(j) if and only if LSSB(i) < LSSB(j), which means that we can replace a test of height(i) < height(j) with LSSB(i) < LSSB(j). Since LSSB(i) is easier to compute, this will speed up the computation.
Preprocessing the tree requires the computation of three numbers: the node number, the run, and a magic number. It also requires computation of the leader of each run. These computations are done in two recursive descents and ascents of the tree.
Procedure Preprocess(root:Node) Var x,y : Integer; (* Dummy variables *) Begin (x,y) := Enumerate(root,nil,1); ComputeMagic(root,root,0); End;
In the first phase, we enumerate the tree, compute the run for each node, the max number assigned to a node in the subtree, and also the parent of each node. If the parent is already available through other means, that part is redundant. The run of a node is the number of the node in the subtree with the largest height.
Function Enumerate(node,parent:Node; num:Integer) : (Integer,Integer) Var run : Integer; Begin node.parent := parent; node.number := num; node.run := num; run := num; num := num + 1; Foreach child in node.children Do (num,run) := Enumerate(child,node,num); If height(run) > height(node.run) Then node.run := run; Done node.max := num; Return (num,node.run) End;
In the second phase, we compute the leader for each run (which we can since we know the run for each node) and the magic number. The leader has to be stored so that we can access is through a node number, so we store it in an array.
VAR Leader : Array [1..NODE_COUNT] of NodePtr;
The leader for each run can either be stored for each node (which we assume here), or only stored in the node where the node.run == node.number. We can then compute the leader of any node through Leader(node.run), which requires less storage if Leader is implemented as a spare array.
node.run == node.number
node
Leader(node.run)
Leader
The magic number of a node is the bitwise or of all run:s of nodes in the path leading from the root node to the node.
Procedure ComputeMagic(node, current_leader:Node; magic:Integer) Begin node.magic = magic | LSSB(node.run); If node.run != leader.run Then Leader(node.number) = node Else Leader(node.number) = current_leader Foreach child in node.children Do ComputeMagic(child, Leader(node.number), node.magic) Done End;
To find the NCA of two nodes, we map the nodes to a binary tree and find the NCA b there (which is easy). We then do some bitwise arithmetics to find the bit j where the magic numbers of the two nodes have a 1 in common and that has a greater height than b.
1
Function NCA(x,y:Node) : Node; Begin b := BinNCA(x,y); (* b = 10111000 *) m := (NOT b) AND (b - 1); (* m = 00000111 *) c := x.magic AND y.magic; u := c AND (NOT m); j := LSSB(u); x1 := Closest(x,j); y1 := Closest(y,j); If x1.number < y1.number Then Return x1 Else Return y1 End;
Retrieving the nearest common ancestor in a complete binary tree is easy assuming you have a special numbering of the nodes. We number each node with a path number such that the root node is numbered 10000000 and for each choice down the tree, we use (for example) 0 for left and 1 for right. Assuming that the nodes are not on the same path from the root, we can then:
10000000
0
The value returned is the path number of the node that is the nearest common ancestor. (In this implementation, the mapping from node number to run is a mapping to a binary tree where the run is the path number.)
Function BinNCA(x,y:Node) : Integer Var k,m,r : Integer; Begin (* Check that neither is the ancestor of the other *) If x.number <= y.number and y.number < x.max Then Return x.run If y.number <= x.number and x.number < y.max Then Return y.run (* Suppose x.run = 10110--- and y.run = 10111--- *) (* Then x.run XOR y.run = 00001---, and further: *) k := MSSB(x.run XOR y.run); (* k = 00001000 *) m := k XOR (k - 1); (* m = 00001111 *) r := (NOT m) AND x.run; (* r = 10110000 *) Return r OR k; (* result: 10111000 *) End;
To find the node closest to a node n but on the same run as the NCA z we need the j supplied by the NCA function above.
NCA
Function Closest(n:Node; j:Integer) : Node; Begin l := LSSB(n.magic); If l = j Then Return x k := MSSB((j - 1) AND x.magic); u := ((NOT ((k - 1) OR k)) AND x.run) OR k w := Leader(u); Return w.parent; (* z = w.parent *) End;
Mats Kindahl <matkin@acm.org>
perl.
To install Algorithm::Tree::NCA::Data, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Algorithm::Tree::NCA::Data
CPAN shell
perl -MCPAN -e shell install Algorithm::Tree::NCA::Data
For more information on module installation, please visit the detailed CPAN module installation guide.