Math::SimpleHisto::XS - Simple histogramming, but kinda fast
  use Math::SimpleHisto::XS;
  my $hist = Math::SimpleHisto::XS->new(
    min => 10, max => 20, nbins => 1000,
  );

  $hist->fill($x);
  $hist->fill($x, $weight);
  $hist->fill(\@xs);
  $hist->fill(\@xs, \@ws);

  my $data_bins = $hist->all_bin_contents; # get bin contents as array ref
  my $bin_centers = $hist->bin_centers; # ditto for the bin centers
This module implements simple 1D histograms with fixed bin size. The implementation is mostly in C with a thin Perl layer on top.
If this module isn't powerful enough for your histogramming needs, have a look at the powerful-but-experimental SOOT module or submit a patch.
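The lower bin boundary is considered part of the bin, while the upper bin boundary is considered part of the next bin (or of the overflow, for the last bin). In other words, each bin covers the half-open interval [lower, upper).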
Bin numbering starts at 0.
Nothing is exported by this module into the calling namespace by default. You can choose to export several constants:
INTEGRAL_CONSTANT
Or you can use the import tag ':all' to import all.
new
Constructor, takes named arguments. Mandatory parameters:
min: The lower boundary of the histogram.
max: The upper boundary of the histogram.
nbins: The number of bins in the histogram.
clone
new_alike
$hist->clone() clones the object entirely.
$hist->new_alike() clones the parameters of the object, but resets the contents of the clone.
fill
Fill data into the histogram. Takes one or two arguments. The first must be the coordinate that determines where data is to be added to the histogram. The second is optional and can be a weight for the data to be added. It defaults to 1.
If the coordinate is a reference to an array, it is assumed to contain many data points that are to be filled into the histogram. In this case, if the weight is used, it must also be a reference to an array of weights.
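The following pure-Perl sketch models the fill calling conventions described above: a scalar coordinate with an optional weight that defaults to 1, or parallel array references of coordinates and weights. It is a toy for illustration only, not the module's C implementation; toy_new, toy_fill, and the hash layout are hypothetical. It also applies the bin boundary convention stated at the top, so a coordinate equal to max goes to the overflow.

```perl
use strict;
use warnings;

# Toy histogram (hypothetical names, not part of Math::SimpleHisto::XS).
sub toy_new {
    my (%args) = @_;
    return {
        min       => $args{min},
        max       => $args{max},
        nbins     => $args{nbins},
        binsize   => ($args{max} - $args{min}) / $args{nbins},
        bins      => [ (0) x $args{nbins} ],
        underflow => 0,
        overflow  => 0,
    };
}

# Mirrors the documented fill() calling conventions.
sub toy_fill {
    my ($h, $x, $w) = @_;
    if (ref $x eq 'ARRAY') {
        # Many data points: pair each coordinate with the weight at the
        # same index, or use weight 1 when no weights were given.
        for my $i (0 .. $#$x) {
            toy_fill($h, $x->[$i], defined $w ? $w->[$i] : 1);
        }
        return;
    }
    $w = 1 unless defined $w;
    if ($x < $h->{min}) {
        $h->{underflow} += $w;
    }
    elsif ($x >= $h->{max}) {
        $h->{overflow} += $w;    # the upper edge belongs to the overflow
    }
    else {
        my $ibin = int(($x - $h->{min}) / $h->{binsize});
        # Guard against rounding pushing x just below max past the last bin.
        $ibin = $h->{nbins} - 1 if $ibin >= $h->{nbins};
        $h->{bins}[$ibin] += $w;
    }
    return;
}

my $h = toy_new(min => 10, max => 20, nbins => 10);
toy_fill($h, 10);                          # lower edge of bin 0
toy_fill($h, 12.5, 2);                     # bin 2 with weight 2
toy_fill($h, [9, 20, 15], [1, 3, 0.5]);    # underflow, overflow, bin 5
```

After these calls the toy holds 1 in bin 0, 2 in bin 2, 0.5 in bin 5, 1 in the underflow and 3 in the overflow.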
min
max
nbins
width
binsize
Return static histogram attributes: minimum coordinate, maximum coordinate, number of bins, total width of the histogram, and the size of each bin.
underflow
overflow
Return the accumulated contents of the underflow and overflow bins, which cover the ranges (-inf, min) and [max, inf), respectively.
total
The total sum of weights that have been filled into the histogram, excluding under- and overflow.
nfills
The total number of fill operations (currently including fills that fill into under- and overflow, but this is subject to change).
all_bin_contents
bin_content
$hist->all_bin_contents() returns the contents of all histogram bins as a reference to an array. This is not the internal storage but a copy.
$hist->bin_content($ibin) returns the content of a single bin.
bin_centers
bin_center
$hist->bin_centers() returns a reference to an array containing the coordinates of all bin centers.
$hist->bin_center($ibin) returns the coordinate of the center of a single bin.
bin_lower_boundaries
bin_lower_boundary
Same as bin_centers and bin_center respectively, but for the lower boundary coordinate(s) of the bin(s). Note that this lower boundary is considered part of the bin.
bin_upper_boundaries
bin_upper_boundary
Same as bin_centers and bin_center respectively, but for the upper boundary coordinate(s) of the bin(s). Note that this upper boundary is not considered part of the bin.
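For fixed-size bins, these coordinates follow directly from min and binsize, where width is max - min and binsize is width / nbins. A minimal pure-Perl sketch of that arithmetic, using hypothetical helper names (lower_edge, upper_edge, center_of) that are not part of the module's API:

```perl
use strict;
use warnings;

# Hypothetical helpers; $i is a 0-based bin number.
sub lower_edge { my ($min, $binsize, $i) = @_; return $min + $i * $binsize }
sub upper_edge { my ($min, $binsize, $i) = @_; return $min + ($i + 1) * $binsize }
sub center_of  { my ($min, $binsize, $i) = @_; return $min + ($i + 0.5) * $binsize }

my ($min, $max, $nbins) = (10, 20, 4);
my $binsize = ($max - $min) / $nbins;    # width / nbins = 2.5

my @lower   = map { lower_edge($min, $binsize, $_) } 0 .. $nbins - 1;  # 10 12.5 15 17.5
my @centers = map { center_of($min, $binsize, $_) }  0 .. $nbins - 1;  # 11.25 13.75 16.25 18.75
my @upper   = map { upper_edge($min, $binsize, $_) } 0 .. $nbins - 1;  # 12.5 15 17.5 20
```

Each bin's upper edge coincides with the next bin's lower edge; by the boundary convention, that shared coordinate belongs to the next bin.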
find_bin
$hist->find_bin($x) returns the bin number of the bin in which the given coordinate falls. Returns undef if the coordinate is outside the histogram range.
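Under the boundary convention stated at the top (half-open bins numbered from 0, with max itself belonging to the overflow), finding a bin amounts to the arithmetic in this sketch. find_bin_sketch is a hypothetical pure-Perl stand-in, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $hist->find_bin($x).
sub find_bin_sketch {
    my ($min, $max, $nbins, $x) = @_;
    # Outside [min, max): underflow or overflow, so there is no bin number.
    return undef if $x < $min || $x >= $max;
    my $ibin = int(($x - $min) * $nbins / ($max - $min));
    # Guard against rounding pushing x just below max past the last bin.
    $ibin = $nbins - 1 if $ibin >= $nbins;
    return $ibin;
}
```

For example, with min 10, max 20 and 10 bins, the coordinate 12 is the lower edge of bin 2 and therefore maps to 2, while 20 maps to undef.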
set_bin_content
$hist->set_bin_content($ibin, $content) sets the content of a single bin.
set_underflow
set_overflow
$hist->set_underflow($content) sets the content of the underflow bin. $hist->set_overflow($content) does the same for the overflow bin.
set_nfills
$hist->set_nfills($n) sets the number of fills.
set_all_bin_contents
Given a reference to an array of numbers, sets the content of each bin in the histogram to the number in the respective array element. The number of elements must match the number of bins in the histogram.
integral
Returns the integral over the histogram. Very limited at this point. Usage:
  my $integral = $hist->integral($from, $to, TYPE);
Here, $from and $to are the integration limits, and the optional TYPE is a constant indicating the integration method. Currently, only INTEGRAL_CONSTANT is implemented (and assumed as the default): the bins are treated as rectangles, and bins that are only partially covered by the integration range contribute the corresponding fraction of their content.
If the integration limits lie outside the histogram boundaries, no warning is issued; the integration is silently restricted to the range of the histogram.
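The following pure-Perl sketch illustrates the rectangle (INTEGRAL_CONSTANT) approach, including the fractional treatment of partially covered bins and the silent clamping of the limits to the histogram range. It rests on assumptions not stated above: integral_sketch is a hypothetical name, it expects $from <= $to, and it reports the integral in units of bin content (a fully covered bin contributes exactly its content, not its content times the bin size).

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical sketch of rectangle integration; assumes $from <= $to.
# $contents holds the bin contents of a histogram on [$hmin, $hmax).
sub integral_sketch {
    my ($hmin, $hmax, $contents, $from, $to) = @_;
    my $nbins   = scalar @$contents;
    my $binsize = ($hmax - $hmin) / $nbins;

    # Limits outside the histogram are silently clamped to its range.
    $from = max($from, $hmin);
    $to   = min($to,   $hmax);
    return 0 if $from >= $to;

    my $sum = 0;
    for my $i (0 .. $nbins - 1) {
        my $lo = $hmin + $i * $binsize;
        my $hi = $lo + $binsize;
        # Length of the overlap between this bin and [$from, $to].
        my $overlap = min($hi, $to) - max($lo, $from);
        # Each bin's content is spread uniformly over its width, so a
        # partially covered bin contributes the covered fraction.
        $sum += $contents->[$i] * $overlap / $binsize if $overlap > 0;
    }
    return $sum;
}

my @contents = (1, 2, 3, 4);    # four bins of size 2.5 on [10, 20)
my $all  = integral_sketch(10, 20, \@contents, 0, 100);       # 10 (limits clamped)
my $part = integral_sketch(10, 20, \@contents, 11.25, 15);    # 0.5 + 2 = 2.5
```

In the second call, the range starts halfway through bin 0, so that bin contributes half of its content, while bin 1 is covered completely.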
mean
Calculates the (weighted) mean of the histogram contents.
Note that, due to the effect of the binning, the result usually differs from the mean you would calculate directly from the input data.
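To see why binning changes the result, here is a pure-Perl sketch of a binned weighted mean. It assumes, as an illustration rather than a statement about the module's internals, that each bin's content is placed at the bin center and that under- and overflow are ignored; mean_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical binned mean: sum(c_i * center_i) / sum(c_i).
sub mean_sketch {
    my ($hmin, $hmax, $contents) = @_;
    my $binsize = ($hmax - $hmin) / scalar(@$contents);
    my ($sumw, $sumwx) = (0, 0);
    for my $i (0 .. $#$contents) {
        my $center = $hmin + ($i + 0.5) * $binsize;
        $sumw  += $contents->[$i];
        $sumwx += $contents->[$i] * $center;
    }
    return $sumw ? $sumwx / $sumw : undef;    # undef for an empty histogram
}

# The raw points 10.1 and 10.9 (raw mean 10.5) both fall into bin 0 of a
# 2-bin histogram on [10, 20), whose center is 12.5.
my $binned = mean_sketch(10, 20, [2, 0]);    # 12.5
```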
normalize
$hist->normalize($total) scales the bin contents so that the histogram total (see total) equals $total. If omitted, $total defaults to 1.
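A pure-Perl sketch of the scaling involved, under stated assumptions: normalize_sketch is a hypothetical name, it rescales only the bin contents (whether the module also rescales under- and overflow is not documented here), and it refuses to normalize bins that sum to zero.

```perl
use strict;
use warnings;

# Hypothetical sketch: scale the bins in place so that they sum to $total.
sub normalize_sketch {
    my ($contents, $total) = @_;
    $total = 1 unless defined $total;    # default normalization
    my $current = 0;
    $current += $_ for @$contents;
    die "cannot normalize a histogram with zero total\n" if $current == 0;
    my $factor = $total / $current;
    $_ *= $factor for @$contents;        # modifies the array elements in place
    return $contents;
}

my @bins = (1, 3, 4);
normalize_sketch(\@bins);          # (0.125, 0.375, 0.5), sums to 1
my @scaled = (1, 3, 4);
normalize_sketch(\@scaled, 16);    # (2, 6, 8)
```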
This class defines serialization hooks for the Storable module. Therefore, you can simply serialize objects using the usual Storable functions:

  use Storable;
  my $string = Storable::nfreeze($histogram);
  # ... later ...
  my $histo_object = Storable::thaw($string);
Currently, this mechanism hardcodes the use of the simple dump format. This is subject to change!
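Storable finds such hooks by name: a class that defines STORABLE_freeze and STORABLE_thaw controls how its objects are serialized. The toy class below is a minimal sketch of that mechanism; ToyHisto and its semicolon-separated text format are hypothetical and unrelated to the module's actual simple dump format.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

package ToyHisto;

sub new {
    my ($class, %args) = @_;
    my $self = { min => $args{min}, max => $args{max},
                 bins => [ (0) x $args{nbins} ] };
    return bless $self, $class;
}

# Called by Storable when freezing: return a string encoding the object.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return join ';', $self->{min}, $self->{max}, join(',', @{ $self->{bins} });
}

# Called by Storable when thawing: $self is a new, empty hash already
# blessed into this class; fill it from the serialized string.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    my ($min, $max, $bins) = split /;/, $serialized;
    @$self{qw(min max)} = ($min, $max);
    $self->{bins} = [ split /,/, $bins ];
    return;
}

package main;

my $h = ToyHisto->new(min => 10, max => 20, nbins => 3);
$h->{bins}[1] = 7;
my $copy = thaw(nfreeze($h));    # round trip through the hooks
```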
The various serialization formats that this module supports (see the dump documentation below) each have their pros and cons. For example, the native_pack format is by far the fastest, but it is not portable. The simple format is a very simple-minded text format, but it is portable and performs well: it is comparable to the JSON format when using JSON::XS (other JSON modules are MUCH slower). Of all formats, YAML is the slowest. See xt/bench_dumping.pl for a simple benchmark script.
None of the serialization formats currently supports compression, but the native_pack format produces the smallest output, at about half the size of the JSON output. The simple format produces output of similar size to JSON, except for the smallest histograms, where its dumps are slightly smaller. The YAML output is a bit bigger than the JSON output.
dump
This module has fairly simple serialization methods. Just call the dump method on an object of this class and provide the desired type of serialization. Currently valid serialization types are simple, JSON, YAML, and native_pack. The type name is case-insensitive.
For YAML support, you need to have the YAML::Tiny module available. For JSON support, you need any of JSON::XS, JSON::PP, or JSON; the three modules are tried in that order at compile time. The name of the chosen implementation module is available in the $Math::SimpleHisto::XS::JSON_Implementation variable. Setting this variable has no effect.
The simple serialization format is a home-grown text format that is subject to change, but in all likelihood, there will be some form of version migration code in the deserializer for backwards compatibility.
All of the serialization formats except for native_pack are text-based and thus portable and endianness-neutral.
native_pack should not be used when the serialized data is transferred to another machine.
new_from_dump
Given the type of the dump (simple, JSON, YAML, native_pack) and the actual dump string, creates a new histogram object from the contained data and returns it.
Deserializing JSON and YAML dumps requires the respective support modules to be available. See above.
SOOT is a dynamic wrapper around the ROOT C++ library which does histogramming and much more. Beware, it is experimental software.
Serialization can make use of the JSON::XS, JSON::PP, JSON or YAML::Tiny modules. You may want to use the convenient Storable module for transparent serialization of nested data structures containing objects of this class.
Steffen Mueller, <smueller@cpan.org>
Copyright (C) 2011 by Steffen Mueller
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.1 or, at your option, any later version of Perl 5 you may have available.