Tree::M - implement M-trees for efficient "metric/multimedia-searches"
use Tree::M; $M = new Tree::M
Ever had the problem of managing multi-dimensional (spatial) data but your database only had one-dimensional indices (b-tree etc.)? Queries like
select data from table where latitude > 40 and latitude < 50 and longitude> 50 and longitude< 60;
are quite inefficient, unless longitude and latitude are part of the same spatial index (e.g. an R-tree).
An M-tree is an index tree that does not directly look at the stored keys but rather requires a distance (a metric, e.g. a vector norm) function to be defined that sorts keys according to their distance. In the example above the distance function could be the maximum norm (
max(x1-x2, y1-y2)). The lookup above would then be something like this:
my $res = $M->range([45,55], 5);
This module implements an M-tree. Although the data structure and the distance function is arbitrary, the current version only implements n-dimensional discrete vectors and hardwires the distance function to the suared euclidean metric (i.e.
(x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 + ...). Evolution towards more freedom is expected ;)
Creates a new M-Tree. Before it can be used you have to call one of the
open methods below.
ndims => integer the number of dimensions each vector has range => [min, max, steps] min the lowest allowable scalar value in each dimension max the maximum allowable number steps the number of discrete steps (used when stored externally) pagesize => integer the size of one page on underlying storage. usually 4096, but large objects (ndims > 20 or so) might want to increase this
Example: create an M-Tree that stores 8-bit rgb-values:
$M = new Tree::M ndims => 3, range => [0, 255, 256];
Example: create an M-Tree that stores coordinates from -1..1 with 100 different steps:
$M = new Tree::M ndims => 2, range => [-1, 1, 100];
Open or create the external storage file
$path and associate it with the tree.
[this braindamaged API will go away ;)]
Insert a vector (given by an array reference) into the index and associate it with the value
$data (a 32-bit integer).
Synchronize the data file with memory. Useful after calling
insert to ensure the data actually reaches stable storage.
Search all entries not farther away from
$radius and return an arrayref containing the searchresults.
Each result is again anarrayref composed like this:
e.g. the same as given to the
$n "nearest neighbours". The results arrayref (see
range) contains the
$n index values nearest to
@v, sorted for distance.
Calculcate the distance between two vectors, just as they databse engine would do it.
Return the maximum height of the tree (usually a small integer specifying the length of the path from the root to the farthest leaf)
Inserting too many duplicate keys into the tree cause the C++ library to die, so don't do that.
Marc Lehmann <email@example.com>.