Algorithm::SpatialIndex::Strategy::MedianQuadTree - QuadTree splitting on bucket medians
use Algorithm::SpatialIndex; my $idx = Algorithm::SpatialIndex->new( strategy => 'MedianQuadTree', );
A modified quad tree implementation that I'll call Median Quad Tree (MQT) in this document. (Not sure if this data structure has a different name elsewhere.) See
For a description of the public interface, see Algorithm::SpatialIndex. This spatial index strategy uses the default
Algorithm::SpatialIndex::Strategy::QuadTree as a base class and gets away with containing very little code because of that. This also means that most of the interface and behaviour is inherited.
For a basic discussion of quad trees, take a look at the documentation of the Algorithm::QuadTree module or look it up on Wikipedia. The following describes how the MQT differs from a normal quad tree and some implementation details.
Any x/y coordinate pair can be used to divide a rectangular area into four sub-rectangles. When splitting up a node of an ordinary quad tree, the center of the quad tree is chosen to split the node into four sub-nodes. This can be done either when the tree is created (before it is populated) with a static depth of the tree, or dynamically whenever the number of items associated with a node becomes too large.
For the MQT, the point which splits a given node into four is chosen to be the median of all contained item coordinates in each dimension.
This has several consequences:
Due to the dynamic nature of the coordinate choice when splitting a node in four, the MQT cannot be of a fixed depth and preallocated. It needs to grow dynamically as it is filled.
When the data in the tree is a reasonably general sample of the underlying distribution of data, then this algorithm will create a tree of evenly filled buckets, but not necessarily a well balanced tree. To obtain a reasonable sample of the underlying data distribution, it is prudent to insert items in random order.
Due to this behaviour, the tree will create very small nodes/bins where the most data is concentrated and bins of gradually increasing size as one moves away from the highest concentrations. A normal quad tree will have a similar behaviour, but due to the fixed size of the bins, will "converge" much more slowly.
If the data is inserted in order of the item coordinates, an ordinary quad tree is a more efficient. The MQT will make bad choices as the median of a contiguous subset of the ordered data will not reflect the overall distribution and the property of having fairly evenly filled buckets vanishes.
It is likely that polling a spatial index using an MQT is faster than an ordinary quad tree if the distribution of data is very different from uniformity. If in doubt, benchmark.
If the data is uniform but inserted in random order, the MQT will at best be equal in performance to a quad tree.
Filling a dynamically growing MQT has slightly more overhead than filling a dynamically growing quad tree due to the median calculation, but it is neither algorithmically slower nor necessarily slower in practice. Algorithmically, splitting a quad tree node will be O(n) in the bucket size. Splitting an MQT node will be O(n*log(n)) if a naive median calculation is used or also O(n) with a linear-time median calculation (like this implementation).
If the MQT is filled in random order and the data is not uniformly distributed, it will, on average, have more evenly filled buckets and thus less nodes than a quad tree, thus reducing the amount of tree walking required while filling and later polling the index.
Steffen Mueller, <email@example.com>
Copyright (C) 2010-2011 by Steffen Mueller
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.