Math::SimpleHisto::XS - Simple histogramming, but kinda fast
    use Math::SimpleHisto::XS;
    my $hist = Math::SimpleHisto::XS->new(
      min => 10, max => 20, nbins => 1000,
    );

    $hist->fill($x);
    $hist->fill($x, $weight);
    $hist->fill(\@xs);
    $hist->fill(\@xs, \@ws);

    my $data_bins   = $hist->all_bin_contents; # get bin contents as array ref
    my $bin_centers = $hist->bin_centers;      # ditto for the bin centers
This module implements simple 1D histograms with fixed bin size. The implementation is mostly in C with a thin Perl layer on top.
If this module isn't powerful enough for your histogramming needs, have a look at the powerful-but-experimental SOOT module or submit a patch.
The lower bin boundary is considered part of the bin. The upper bin boundary is considered part of the next bin or overflow.
Bin numbering starts at 0.
Nothing is exported by this module into the calling namespace by default. You can choose to export several constants:
Or you can use the import tag ':all' to import all.
Constructor, takes named arguments. Mandatory parameters:
- min: The lower boundary of the histogram.
- max: The upper boundary of the histogram.
- nbins: The number of bins in the histogram.
$hist->clone() clones the object entirely.
$hist->new_alike() clones the parameters of the object, but resets the contents of the clone.
Fill data into the histogram. Takes one or two arguments. The first must be the coordinate that determines where data is to be added to the histogram. The second is optional and can be a weight for the data to be added. It defaults to 1.
If the coordinate is a reference to an array, it is assumed to contain many data points that are to be filled into the histogram. In this case, if the weight is used, it must also be a reference to an array of weights.
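The calling conventions described above can be sketched in pure Perl. This is only an illustration of the documented behavior, not the module's C implementation: `make_histo` and `fill_sketch` are hypothetical helpers, the hash layout is invented for the example, and the default weight of 1 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for a histogram, for illustration only.
sub make_histo {
    my ($min, $max, $nbins) = @_;
    return {
        min       => $min,
        max       => $max,
        nbins     => $nbins,
        binsize   => ($max - $min) / $nbins,
        data      => [ (0) x $nbins ],
        underflow => 0,
        overflow  => 0,
    };
}

# Accepts a scalar coordinate or an array ref of coordinates. The weight
# (scalar or array ref, matching the coordinate) is assumed to default to 1.
sub fill_sketch {
    my ($h, $x, $w) = @_;
    my @xs = ref($x) eq 'ARRAY' ? @$x : ($x);
    my @ws = !defined $w         ? (1) x @xs
           : ref($w) eq 'ARRAY'  ? @$w
           :                       ($w);
    die "need one weight per coordinate" if @ws != @xs;

    for my $i (0 .. $#xs) {
        if ($xs[$i] < $h->{min}) {
            $h->{underflow} += $ws[$i];      # range (-inf, min)
        }
        elsif ($xs[$i] >= $h->{max}) {
            $h->{overflow} += $ws[$i];       # range [max, inf)
        }
        else {
            my $ibin = int( ($xs[$i] - $h->{min}) / $h->{binsize} );
            $h->{data}[$ibin] += $ws[$i];
        }
    }
    return;
}

my $h = make_histo(0, 10, 5);
fill_sketch($h, 1);                    # one point, weight 1
fill_sketch($h, 3, 2.5);               # one point, weight 2.5
fill_sketch($h, [4, 9.9, 10, -1]);     # many points, weight 1 each
fill_sketch($h, [0, 2], [0.5, 0.5]);   # many points with per-point weights

print join(", ", @{ $h->{data} }), "\n";
print "underflow=$h->{underflow} overflow=$h->{overflow}\n";
```

Note that the coordinate 10 equals the upper boundary and therefore lands in the overflow, while the coordinate 2 is the lower boundary of the second bin and is counted there.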
Return static histogram attributes: minimum coordinate, maximum coordinate, number of bins, total width of the histogram, and the size of each bin.
Return the accumulated contents of the under- and overflow bins, which cover the ranges (-inf, min) and [max, inf) respectively.
The total sum of weights that have been filled into the histogram, excluding under- and overflow.
The total number of fill operations (currently including fills that fill into under- and overflow, but this is subject to change).
BIN ACCESS METHODS
$hist->all_bin_contents() returns the contents of all histogram bins as a reference to an array. This is not the internal storage but a copy.
$hist->bin_content($ibin) returns the content of a single bin.
$hist->bin_centers() returns a reference to an array containing the coordinates of all bin centers.
$hist->bin_center($ibin) returns the coordinate of the center of a single bin.
Same as bin_centers and bin_center respectively, but for the lower boundary coordinate(s) of the bin(s). Note that this lower boundary is considered part of the bin.
Same as bin_centers and bin_center respectively, but for the upper boundary coordinate(s) of the bin(s). Note that this upper boundary is not considered part of the bin.
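For a histogram with fixed bin size, the bin centers and boundaries follow from min, max and the number of bins. The following pure-Perl sketch shows the assumed arithmetic with illustrative values; it does not call the module, and the variable names are chosen for the example only.

```perl
use strict;
use warnings;

# Illustrative values, not taken from the module.
my ($min, $max, $nbins) = (10, 20, 4);
my $width   = $max - $min;        # total width of the histogram
my $binsize = $width / $nbins;    # size of each bin

my (@lower, @upper, @center);
for my $ibin (0 .. $nbins - 1) {
    $lower[$ibin]  = $min + $ibin * $binsize;       # part of bin $ibin
    $upper[$ibin]  = $lower[$ibin] + $binsize;      # part of the next bin or overflow
    $center[$ibin] = $lower[$ibin] + $binsize / 2;
    printf "bin %d: [%g, %g) center %g\n",
        $ibin, $lower[$ibin], $upper[$ibin], $center[$ibin];
}
```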
$hist->find_bin($x) returns the bin number of the bin in which the given coordinate falls. Returns undef if the coordinate is outside the histogram range.
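A minimal pure-Perl sketch of this lookup, assuming zero-based bin numbers and the half-open bin ranges described earlier. `find_bin_sketch` is a hypothetical helper written for illustration and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: map a coordinate to a zero-based bin number, or
# return undef when the coordinate lies outside [min, max).
sub find_bin_sketch {
    my ($min, $max, $nbins, $x) = @_;
    return if $x < $min or $x >= $max;     # underflow or overflow
    my $binsize = ($max - $min) / $nbins;
    return int( ($x - $min) / $binsize );
}

for my $x (-0.1, 0, 1.999, 2, 9.5, 10) {
    my $ibin = find_bin_sketch(0, 10, 5, $x);
    printf "%6s -> %s\n", $x, defined $ibin ? $ibin : 'undef';
}
```

The coordinate 2 sits exactly on a bin boundary and is assigned to the bin that starts there, while 10 equals the upper histogram boundary and is therefore out of range.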
$hist->set_bin_content($ibin, $content) sets the content of a single bin.
$hist->set_underflow($content) sets the content of the underflow bin.
$hist->set_overflow($content) sets the content of the overflow bin.
$hist->set_nfills($n) sets the number of fills.
Given a reference to an array containing numbers, sets the contents of each bin in the histogram to the number in the respective array element. The number of elements must match the number of bins in the histogram.
Returns the integral over the histogram. Very limited at this point. Usage:
my $integral = $hist->integral($from, $to, TYPE);
$from and $to are the integration limits and the optional TYPE is a constant indicating the method to use for integration. Currently, only INTEGRAL_CONSTANT is implemented (and assumed as the default). This means that the bins will be treated as rectangles, but fractional bins are treated correctly.
If the integration limits are outside the histogram boundaries, no warning is issued; the integration is silently performed within the range of the histogram.
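The rectangle treatment with fractional bins and silent clamping can be sketched in pure Perl. `integral_sketch` is a hypothetical helper, and it rests on an assumption that the text does not settle: the integral is taken to be the sum of bin contents, with a partially covered bin contributing in proportion to the covered fraction. Whether the module additionally scales by the bin width is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: the result is the sum of bin contents,
# where partially covered bins count in proportion to the covered fraction.
sub integral_sketch {
    my ($min, $max, $contents, $from, $to) = @_;
    my $nbins   = @$contents;
    my $binsize = ($max - $min) / $nbins;

    # Silently clamp the limits to the histogram range.
    $from = $min if $from < $min;
    $to   = $max if $to   > $max;
    return 0 if $to <= $from;

    my $sum = 0;
    for my $ibin (0 .. $nbins - 1) {
        my $lower = $min + $ibin * $binsize;
        my $upper = $lower + $binsize;
        my $lo = $from > $lower ? $from : $lower;   # overlap start
        my $hi = $to   < $upper ? $to   : $upper;   # overlap end
        next if $hi <= $lo;                         # bin not covered at all
        $sum += $contents->[$ibin] * ($hi - $lo) / $binsize;
    }
    return $sum;
}

my @contents = (4, 2, 6, 8);    # four bins of size 2.5 on [0, 10)
print integral_sketch(0, 10, \@contents, 0, 10), "\n";       # whole range
print integral_sketch(0, 10, \@contents, 1.25, 5), "\n";     # half of bin 0 plus bin 1
print integral_sketch(0, 10, \@contents, -100, 100), "\n";   # limits clamped
```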
Calculates the (weighted) mean of the histogram contents.
Note that the result is not usually the same as if you calculated the mean of the input data directly due to the effect of the binning.
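The binning effect can be made concrete with a small pure-Perl sketch. It assumes that each bin contributes its content at the coordinate of its center, which is a common definition but is not confirmed by the text above; the data values are made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my ($min, $max, $nbins) = (0, 10, 5);
my $binsize = ($max - $min) / $nbins;
my @data = (0.2, 0.4, 3.9, 8.1);    # all points lie inside [min, max)

# Bin the data with unit weights.
my @contents = (0) x $nbins;
$contents[ int( ($_ - $min) / $binsize ) ]++ for @data;

# Assumed definition: each bin contributes its content at its center.
my ($weighted, $total) = (0, 0);
for my $ibin (0 .. $nbins - 1) {
    my $center = $min + ($ibin + 0.5) * $binsize;
    $weighted += $contents[$ibin] * $center;
    $total    += $contents[$ibin];
}

my $histo_mean = $weighted / $total;
my $data_mean  = sum(@data) / @data;
printf "binned mean: %g, direct mean: %g\n", $histo_mean, $data_mean;
```

The two values differ because every data point is effectively moved to the center of its bin before averaging.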
Normalizes the histogram to the parameter of the $hist->normalize($total) call. Normalization defaults to 1.
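As a rough pure-Perl illustration, normalization can be thought of as rescaling every bin so that the bin contents add up to the requested total. `normalize_sketch` is a hypothetical helper, and both the rescaling rule and the default total of 1 are assumptions made for this sketch; how the module treats under- and overflow is not covered.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper. Assumption: every bin is rescaled so that the sum of
# the bin contents equals the requested total (assumed default: 1).
sub normalize_sketch {
    my ($contents, $total) = @_;
    $total = 1 unless defined $total;
    my $sum = sum(@$contents);
    die "cannot normalize an empty histogram" unless $sum;
    return [ map { $_ * $total / $sum } @$contents ];
}

my $unit = normalize_sketch([1, 3, 4]);         # contents now sum to 1
my $pct  = normalize_sketch([1, 3, 4], 100);    # contents now sum to 100
print join(", ", @$unit), "\n";
print join(", ", @$pct), "\n";
```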
This class defines serialization hooks for the Storable module. Therefore, you can simply serialize objects using the usual Storable calls:
    use Storable;
    my $string = Storable::nfreeze($histogram);
    # ... later ...
    my $histo_object = Storable::thaw($string);
Currently, this mechanism hardcodes the use of the simple dump format. This is subject to change!
The various serialization formats that this module supports (see the dump documentation below) all have various pros and cons. For example, the native_pack format is by far the fastest, but is not portable. The simple format is a very simple-minded text format, but it is portable and performs well (comparable to the JSON format when using JSON::XS, other JSON modules will be MUCH slower). Of all formats, the YAML format is the slowest. See xt/bench_dumping.pl for a simple benchmark script.
None of the serialization formats currently supports compression, but the native_pack format produces the smallest output at about half the size of the JSON output. The simple format is close to JSON for all but the smallest histograms, where it produces slightly smaller dumps. The YAML produced is a bit bigger than the JSON.
This module has fairly simple serialization methods. Just call the dump method on an object of this class and provide the type of serialization desired. Currently valid serializations are simple, JSON, YAML, and native_pack. Case doesn't matter.
For YAML support, you need to have the YAML::Tiny module available. For JSON support, you need any of JSON::XS, JSON::PP, or JSON. The three modules are tried in order at compile time. The chosen implementation can be polled by looking at the $Math::SimpleHisto::XS::JSON_Implementation variable. It contains the module name. Setting this variable has no effect.
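The "try in order" selection can be sketched as follows. This is an illustration of the general technique and not the module's actual code: the variable `$json_implementation` is a hypothetical name, and the sketch performs the check at run time, whereas wrapping it in a BEGIN block would move it to compile time.

```perl
use strict;
use warnings;

# Hypothetical variable name; mirrors the "try in order" idea only.
my $json_implementation;
for my $module (qw(JSON::XS JSON::PP JSON)) {
    # String eval so that the module name is resolved at run time.
    if (eval "require $module; 1") {
        $json_implementation = $module;
        last;
    }
}

print defined $json_implementation
    ? "using $json_implementation\n"
    : "no JSON module available\n";
```

The first module that loads successfully wins, so an installed JSON::XS is preferred over the slower pure-Perl alternatives.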
The simple serialization format is a home-grown text format that is subject to change, but in all likelihood, there will be some form of version migration code in the deserializer for backwards compatibility.
All of the serialization formats except for native_pack are text-based and thus portable and endianness-neutral. native_pack should not be used when the serialized data is transferred to another machine.
Given the type of the dump (simple, JSON, YAML, native_pack) and the actual dump string, creates a new histogram object from the contained data and returns it. Deserializing JSON and YAML dumps requires the respective support modules to be available. See above.
SOOT is a dynamic wrapper around the ROOT C++ library which does histogramming and much more. Beware, it is experimental software.
Serialization can make use of the JSON::XS, JSON::PP, JSON or YAML::Tiny modules. You may want to use the convenient Storable module for transparent serialization of nested data structures containing objects of this class.
Steffen Mueller, <firstname.lastname@example.org>
COPYRIGHT AND LICENSE
Copyright (C) 2011 by Steffen Mueller
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.1 or, at your option, any later version of Perl 5 you may have available.