NAME

Math::SimpleHisto::XS - Simple histogramming, but kinda fast

SYNOPSIS

  use Math::SimpleHisto::XS;
  my $hist = Math::SimpleHisto::XS->new(
    min => 10, max => 20, nbins => 1000,
  );
  
  $hist->fill($x);
  $hist->fill($x, $weight);
  $hist->fill(\@xs);
  $hist->fill(\@xs, \@ws);
  
  my $data_bins = $hist->all_bin_contents; # get bin contents as array ref
  my $bin_centers = $hist->bin_centers; # ditto for the bin centers

DESCRIPTION

This module implements simple 1D histograms with fixed bin size. The implementation is mostly in C with a thin Perl layer on top.

If this module isn't powerful enough for your histogramming needs, have a look at the powerful-but-experimental SOOT module or submit a patch.

The lower bin boundary is considered part of the bin. The upper bin boundary is considered part of the next bin or overflow.

Bin numbering starts at 0.

EXPORT

Nothing is exported by this module into the calling namespace by default. You can choose to export several constants:

  INTEGRAL_CONSTANT

Or you can use the import tag ':all' to import all.

BASIC METHODS

new

Constructor, takes named arguments. Mandatory parameters:

min

The lower boundary of the histogram.

max

The upper boundary of the histogram.

nbins

The number of bins in the histogram.

clone, new_alike

$hist->clone() clones the object entirely.

$hist->new_alike() clones the parameters of the object, but resets the contents of the clone.

fill

Fill data into the histogram. Takes one or two arguments. The first must be the coordinate that determines where data is to be added to the histogram. The second is optional and can be a weight for the data to be added. It defaults to 1.

If the coordinate is a reference to an array, it is assumed to contain many data points that are to be filled into the histogram. In this case, if the weight is used, it must also be a reference to an array of weights.

min, max, nbins, width, binsize

Return static histogram attributes: minimum coordinate, maximum coordinate, number of bins, total width of the histogram, and the size of each bin.

underflow, overflow

Return the accumulated contents of the underflow and overflow bins, which cover the ranges (-inf, min) and [max, inf), respectively.

total

The total sum of weights that have been filled into the histogram, excluding under- and overflow.

nfills

The total number of fill operations (currently including fills that fill into under- and overflow, but this is subject to change).
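The accounting described above can be illustrated with a minimal pure-Perl sketch. This is not the module's C implementation: the hash layout and the helper name sketch_fill are hypothetical, and the sketch assumes that every data point in an array counts as one fill. It only mirrors the documented rules: the lower boundary belongs to the bin, max goes to overflow, and total excludes underflow and overflow.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a histogram: 5 bins covering [0, 10).
my %h = (
    min => 0, max => 10, nbins => 5,
    bins => [ (0) x 5 ],
    underflow => 0, overflow => 0, total => 0, nfills => 0,
);

# Mimics the documented fill() semantics for a scalar or an array reference.
sub sketch_fill {
    my ($h, $x, $w) = @_;
    my @xs = ref $x ? @$x : ($x);
    # Weights default to 1 for every data point.
    my @ws = defined $w ? (ref $w ? @$w : ($w)) : ((1) x @xs);
    my $binsize = ($h->{max} - $h->{min}) / $h->{nbins};
    for my $i (0 .. $#xs) {
        $h->{nfills}++;    # assumption: every data point counts as one fill
        if ($xs[$i] < $h->{min}) {
            $h->{underflow} += $ws[$i];    # range (-inf, min)
        }
        elsif ($xs[$i] >= $h->{max}) {
            $h->{overflow} += $ws[$i];     # range [max, inf)
        }
        else {
            my $ibin = int(($xs[$i] - $h->{min}) / $binsize);
            $h->{bins}[$ibin] += $ws[$i];
            $h->{total}       += $ws[$i];  # in-range weights only
        }
    }
    return;
}

sketch_fill(\%h, 2);                   # exactly on the lower boundary of bin 1
sketch_fill(\%h, 3.5, 2);              # weight 2 into bin 1
sketch_fill(\%h, [-1, 9.9, 10]);       # underflow, bin 4, overflow
sketch_fill(\%h, [0, 4], [0.5, 1.5]);  # one weight per data point

print "bins: ", join(", ", @{ $h{bins} }), "\n";
print "total: $h{total}, nfills: $h{nfills}\n";
```

With the real module, the corresponding values would be read back through all_bin_contents, underflow, overflow, total, and nfills.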

BIN ACCESS METHODS

all_bin_contents, bin_content

$hist->all_bin_contents() returns the contents of all histogram bins as a reference to an array. This is not the internal storage but a copy.

$hist->bin_content($ibin) returns the content of a single bin.

bin_centers, bin_center

$hist->bin_centers() returns a reference to an array containing the coordinates of all bin centers.

$hist->bin_center($ibin) returns the coordinate of the center of a single bin.

bin_lower_boundaries, bin_lower_boundary

Same as bin_centers and bin_center respectively, but for the lower boundary coordinate(s) of the bin(s). Note that this lower boundary is considered part of the bin.

bin_upper_boundaries, bin_upper_boundary

Same as bin_centers and bin_center respectively, but for the upper boundary coordinate(s) of the bin(s). Note that this upper boundary is not considered part of the bin.
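For a histogram with fixed bin size, the three kinds of coordinates follow directly from min, max, and nbins. The following sketch shows the arithmetic in plain Perl; the variable names and the parameter values are illustrative only, and the code does not call the module.

```perl
use strict;
use warnings;

# Illustrative parameters, matching the constructor arguments min, max, nbins.
my ($min, $max, $nbins) = (10, 20, 4);
my $binsize = ($max - $min) / $nbins;    # 2.5

# Bin numbering starts at 0, so bin $i begins $i bin sizes above min.
my @lower  = map { $min + $_ * $binsize } 0 .. $nbins - 1;
my @center = map { $_ + $binsize / 2 } @lower;
my @upper  = map { $_ + $binsize } @lower;

# The half-open interval [lower, upper) reflects the boundary convention.
printf "bin %d: [%g, %g), center %g\n", $_, $lower[$_], $upper[$_], $center[$_]
    for 0 .. $nbins - 1;
```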

find_bin

$hist->find_bin($x) returns the bin number of the bin in which the given coordinate falls. Returns undef if the coordinate is outside the histogram range.

SETTERS

set_bin_content

$hist->set_bin_content($ibin, $content) sets the content of a single bin.

set_underflow, set_overflow

$hist->set_underflow($content) sets the content of the underflow bin. $hist->set_overflow($content) does the same for the overflow bin.

set_nfills

$hist->set_nfills($n) sets the number of fills.

set_all_bin_contents

Given a reference to an array containing numbers, sets the contents of each bin in the histogram to the number in the respective array element. The number of elements must match the number of bins in the histogram.

CALCULATIONS

integral

Returns the integral over the histogram. Very limited at this point. Usage:

  my $integral = $hist->integral($from, $to, TYPE);

Where $from and $to are the integration limits and the optional TYPE is a constant indicating the method to use for integration. Currently, only INTEGRAL_CONSTANT is implemented (and assumed as the default). This means that the bins will be treated as rectangles, but fractional bins are treated correctly.

If the integration limits are outside the histogram boundaries, no warning is issued; the integration is silently restricted to the range of the histogram.
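To make the handling of fractional bins and out-of-range limits concrete, here is a minimal pure-Perl sketch. The helper sketch_integral is hypothetical and is not the module's implementation. It assumes that a partially covered bin contributes its content multiplied by the covered fraction of the bin; whether the module additionally scales the result by the bin width is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: rectangle-rule integral over [$from, $to] for a
# histogram given as min, max, and an array reference of bin contents.
sub sketch_integral {
    my ($min, $max, $bins, $from, $to) = @_;
    my $binsize = ($max - $min) / @$bins;
    # Limits outside the histogram are silently clamped to its range.
    $from = $min if $from < $min;
    $to   = $max if $to > $max;
    my $sum = 0;
    for my $i (0 .. $#$bins) {
        my $lo = $min + $i * $binsize;
        my $hi = $lo + $binsize;
        # Length of the overlap between this bin and [$from, $to].
        my $overlap = ($to < $hi ? $to : $hi) - ($from > $lo ? $from : $lo);
        next if $overlap <= 0;
        # Assumption: a bin contributes its content times the covered fraction.
        $sum += $bins->[$i] * $overlap / $binsize;
    }
    return $sum;
}

my @bins = (4, 2, 6);    # three bins of size 1 covering [0, 3)
print sketch_integral(0, 3, \@bins, 0.5, 2.25), "\n";    # half of bin 0, bin 1, a quarter of bin 2
print sketch_integral(0, 3, \@bins, -100, 100), "\n";    # limits clamped to [0, 3)
```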

mean

Calculates the (weighted) mean of the histogram contents.

Note that the result is not usually the same as if you calculated the mean of the input data directly due to the effect of the binning.
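The effect of the binning can be seen in a small pure-Perl sketch. It assumes that the binned mean weights each bin center by the bin content, which is the usual definition but is not spelled out here; the data and the two-bin layout are made up for illustration.

```perl
use strict;
use warnings;

my @data = (0.1, 0.2, 0.3, 1.9);

# Mean of the raw input data.
my $raw_mean = 0;
$raw_mean += $_ / @data for @data;

# Illustrative two-bin histogram over [0, 2): bin size 1, centers 0.5 and 1.5.
my @centers = (0.5, 1.5);
my @bins    = (0, 0);
$bins[ int($_) ]++ for @data;    # for this layout the bin index is int($x)

# Assumption: the binned mean weights each bin center by the bin content.
my ($sum, $total) = (0, 0);
for my $i (0 .. $#bins) {
    $sum   += $bins[$i] * $centers[$i];
    $total += $bins[$i];
}
my $binned_mean = $sum / $total;

printf "raw mean: %.3f, binned mean: %.3f\n", $raw_mean, $binned_mean;
```

The three points near 0.2 are all represented by the bin center 0.5, which pulls the binned mean above the raw mean. Narrower bins reduce this discrepancy.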

normalize

$hist->normalize($total) normalizes the histogram to the given $total. If $total is omitted, it defaults to 1.
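As a rough illustration of what normalization means, the sketch below rescales a list of bin contents so that they sum to a target value. The helper sketch_normalize is hypothetical: it returns a new array instead of modifying a histogram object, it ignores underflow and overflow, and dying on an all-zero histogram is a choice of this sketch, not documented module behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: scale bin contents so that they sum to $target.
sub sketch_normalize {
    my ($bins, $target) = @_;
    $target = 1 unless defined $target;    # documented default
    my $total = 0;
    $total += $_ for @$bins;
    die "cannot normalize a histogram with zero total\n" if $total == 0;
    return [ map { $_ * $target / $total } @$bins ];
}

my $unit = sketch_normalize([1, 3, 4]);         # contents now sum to 1
my $pct  = sketch_normalize([1, 3, 4], 100);    # contents now sum to 100
print join(", ", @$unit), "\n";
print join(", ", @$pct), "\n";
```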

SERIALIZATION

This class defines serialization hooks for the Storable module. Therefore, you can simply serialize objects using the usual idiom:

  use Storable;
  my $string = Storable::nfreeze($histogram);
  # ... later ...
  my $histo_object = Storable::thaw($string);

Currently, this mechanism hardcodes the use of the simple dump format. This is subject to change!

The serialization formats that this module supports (see the dump documentation below) each have pros and cons. For example, the native_pack format is by far the fastest, but is not portable. The simple format is a very simple-minded text format, but it is portable and performs well (comparable to the JSON format when using JSON::XS; other JSON modules will be MUCH slower). Of all formats, the YAML format is the slowest. See xt/bench_dumping.pl for a simple benchmark script.

None of the serialization formats currently supports compression, but the native_pack format produces the smallest output at about half the size of the JSON output. The simple format is close to JSON for all but the smallest histograms, where it produces slightly smaller dumps. The YAML produced is a bit bigger than the JSON.

dump

This module has fairly simple serialization methods. Just call the dump method on an object of this class and provide the type of serialization desired. Currently valid serializations are simple, JSON, YAML, and native_pack. Case doesn't matter.

For YAML support, you need to have the YAML::Tiny module available. For JSON support, you need any of JSON::XS, JSON::PP, or JSON. The three modules are tried in order at compile time. The chosen implementation can be queried by inspecting the $Math::SimpleHisto::XS::JSON_Implementation variable, which contains the module name. Setting this variable has no effect.

The simple serialization format is a home-grown text format that is subject to change, but in all likelihood, there will be some form of version migration code in the deserializer for backwards compatibility.

All of the serialization formats except for native_pack are text-based and thus portable and endianness-neutral.

native_pack should not be used when the serialized data is transferred to another machine.

new_from_dump

Given the type of the dump (simple, JSON, YAML, native_pack) and the actual dump string, creates a new histogram object from the contained data and returns it.

Deserializing JSON and YAML dumps requires the respective support modules to be available. See above.
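A round trip through dump and new_from_dump might look like the following sketch. It uses the simple format because that needs no additional modules. Calling new_from_dump as a class method is an assumption of this example, and the histogram parameters and data are made up.

```perl
use strict;
use warnings;
use Math::SimpleHisto::XS;

my $hist = Math::SimpleHisto::XS->new(min => 0, max => 10, nbins => 5);
$hist->fill([1, 2, 3, 7.5]);

# Serialize with the text-based "simple" format.
my $dump = $hist->dump('simple');

# Assumption: new_from_dump is invoked as a class method.
my $copy = Math::SimpleHisto::XS->new_from_dump('simple', $dump);

print "total before: ", $hist->total, ", after: ", $copy->total, "\n";
print "bins after:   ", join(", ", @{ $copy->all_bin_contents }), "\n";
```

To move histograms between machines, prefer one of the text-based formats over native_pack, as noted above.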

SEE ALSO

SOOT is a dynamic wrapper around the ROOT C++ library which does histogramming and much more. Beware, it is experimental software.

Serialization can make use of the JSON::XS, JSON::PP, JSON or YAML::Tiny modules. You may want to use the convenient Storable module for transparent serialization of nested data structures containing objects of this class.

AUTHOR

Steffen Mueller, <smueller@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Steffen Mueller

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.1 or, at your option, any later version of Perl 5 you may have available.