The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Math::Vector::Real::kdTree - kd-Tree implementation on top of Math::Vector::Real

SYNOPSIS

  use Math::Vector::Real::kdTree;

  use Math::Vector::Real;
  use Math::Vector::Real::Random;

  my @v = map Math::Vector::Real->random_normal(4), 1..1000;

  my $tree = Math::Vector::Real::kdTree->new(@v);

  my $ix = $tree->find_nearest_vector(V(0, 0, 0, 0));

  say "nearest vector is $ix, $v[$ix]";

DESCRIPTION

This module implements a kd-Tree data structure in Perl and common algorithms on top of it.

Methods

The following methods are provided:

$t = Math::Vector::Real::kdTree->new(@points)

Creates a new kd-Tree containing the given points.

$t2 = $t->clone

Creates a duplicate of the tree. The two trees will share internal read only data so this method is more efficient in terms of memory usage than others performing a deep copy.

my $ix = $t->insert($p0, $p1, ...)

Inserts the given points into the kd-Tree.

Returns the index assigned to the first point inserted.

$s = $t->size

Returns the number of points inside the tree.

$p = $t->at($ix)

Returns the point at the given index inside the tree.

$t->move($ix, $p)

Moves the point at index $ix to the new given position readjusting the tree structure accordingly.

($ix, $d) = $t->find_nearest_vector($p, $max_d, @but_ix)
($ix, $d) = $t->find_nearest_vector($p, $max_d, \%but_ix)

Find the nearest vector for the given point $p and returns its index and the distance between the two points (in scalar context the index is returned).

If $max_d is defined, the search is limited to the points within that distance

Optionally, a list of point indexes to be excluded from the search can be passed or, alternatively, a reference to a hash containing the indexes of the points to be excluded.

@ix = $t->find_nearest_vector_all_internal

Returns the index of the nearest vector from the tree.

It is equivalent to the following code (though, it uses a better algorithm):

  @ix = map {
            scalar $t->nearest_vector($t->at($_), undef, $_)
        } 0..($t->size - 1);
$ix = $t->find_farthest_vector($p, $min_d, @but_ix)

Find the point from the tree farthest from the given $p.

The optional argument $min_d specifies a minimal distance. Undef is returned when not point farthest that it is found.

@but_ix specifies points that should not be considered when looking for the farthest point.

$ix = $t->find_farthest_vector_internal($ix, $min_d, @but_ix)

Given the index of a point on the tree this method returns the index of the farthest vector also from the tree.

@k = $t->k_means_start($n)

This method uses the internal tree structure to generate a set of point that can be used as seeds for other k_means methods.

There isn't any guarantee on the quality of the generated seeds, but the used algorithm seems to perform well in practice.

@k = $t->k_means_step(@k)

Performs a step of the Lloyd's algorithm for k-means calculation.

@k = $t->k_means_loop(@k)

Iterates until the Lloyd's algorithm converges and returns the final means.

@ix = $t->k_means_assign(@k)

Returns for every point in the three the index of the cluster it belongs to.

@ix = $t->find_in_ball($z, $d, $but)
$n = $t->find_in_ball($z, $d, $but)

Finds the points inside the tree contained in the hypersphere with center $z and radius $d.

In scalar context returns the number of points found. In list context returns the indexes of the points.

If the extra argument $but is provided. The point with that index is ignored.

@ix = $t->ordered_by_proximity

Returns the indexes of the points in an ordered where is likely that the indexes of near vectors are also in near positions in the list.

k-means

The module can be used to calculate the k-means of a set of vectors as follows:

  # inputs
  my @v = ...; my $k = ...;
  
  # k-mean calculation
  my $t = Math::Vector::Real::kdTree->new(@v);
  my @means = $t->k_means_start($k);
  @means = $t->k_means_loop(@means);
  @assign = $t->k_means_assign(@means);
  my @cluster = map [], 1..$k;
  for (0..$#assign) {
    my $cluster_ix = $assign[$_];
    my $cluster = $cluster[$cluster_ix];
    push @$cluster, $t->at($_);
  }
  
  use Data::Dumper;
  print Dumper \@cluster;

SEE ALSO

Wikipedia k-d Tree entry.

K-means filtering algorithm.

Math::Vector::Real

COPYRIGHT AND LICENSE

Copyright (C) 2011-2014 by Salvador Fandiño <sfandino@yahoo.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.3 or, at your option, any later version of Perl 5 you may have available.