NAME

histify - generate simple histograms from streamed data

SYNOPSIS

  generator | histify [--nbins=X] [--min=X] [--max=X] \
                      [--cumulative] \
                      [--desc=<center|left|right|number|range|none>] \
                      [--xw] [--dump-as-input] [--dump] [--pipe] \
                      [--rebin=X] [--random=X]

Reads whitespace-separated numbers from STDIN and generates a histogram. If no histogram boundaries are specified using options, the number of bins defaults to 10 and the minimum and maximum are taken from the data, which requires reading all of the data into memory. If you specify --min and --max, the program works with constant memory overhead.
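
The two modes can be sketched in Python. This is a minimal illustration of equal-width binning, not histify's actual implementation (which is Perl built on Math::SimpleHisto::XS). All function names are hypothetical, and placing a value exactly equal to the maximum in the last bin is an assumption of this sketch; histify's own edge handling may differ.

```python
import io

def read_tokens(stream):
    """Yield whitespace-separated tokens one line at a time (no full read)."""
    for line in stream:
        yield from line.split()

def bin_index(x, lo, hi, nbins):
    """Map x to a 0-based equal-width bin over [lo, hi], or None if outside.

    Assumption: x == hi is counted in the last bin rather than dropped.
    """
    if x < lo or x > hi:
        return None
    # min() also guards against float rounding pushing a value into bin nbins
    return min(int((x - lo) / (hi - lo) * nbins), nbins - 1)

def histogram_fixed(tokens, lo, hi, nbins=10):
    """--min/--max given: each value is binned as it arrives (constant memory)."""
    counts = [0] * nbins
    for tok in tokens:
        i = bin_index(float(tok), lo, hi, nbins)
        if i is not None:
            counts[i] += 1
    return counts

def histogram_auto(tokens, nbins=10):
    """No --min/--max: every value is kept so the range can come from the data."""
    values = [float(t) for t in tokens]
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal; cannot derive a bin width")
    return histogram_fixed(values, lo, hi, nbins)

if __name__ == "__main__":
    data = io.StringIO("0.05 0.15\n0.15 0.95 1.0 2.0\n")
    print(histogram_fixed(read_tokens(data), 0.0, 1.0))
```

Reading from a real pipe works the same way with read_tokens(sys.stdin): histogram_fixed only ever holds the bin counts, whereas histogram_auto has to materialize the full list of values before it can choose the range.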

Prints the resulting histogram contents, one bin per line.

Using --desc=<type> adds an extra column to the output before the histogram content, separated from it by a tab. The type can be one of: "number" (the bin number), "center" (the bin center), "left" (the left bin boundary), "right" (the right bin boundary), "range" (the lower and upper bin boundaries separated by a comma), or "none" (no extra column).
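
To make the description types concrete, here is a hedged Python sketch that derives each column from equal-width bins between min and max. It assumes 0-based bin numbers and %g-style number formatting; histify's actual numbering and formatting may differ, and the function names are hypothetical.

```python
def bin_edges(lo, hi, nbins, i):
    """Left and right boundary of bin i for nbins equal-width bins."""
    width = (hi - lo) / nbins
    left = lo + i * width
    return left, left + width

def describe(kind, i, lo, hi, nbins):
    """Text of the extra --desc column for bin i (0-based numbering assumed)."""
    left, right = bin_edges(lo, hi, nbins, i)
    if kind == "number":
        return str(i)
    if kind == "center":
        return f"{(left + right) / 2:g}"
    if kind == "left":
        return f"{left:g}"
    if kind == "right":
        return f"{right:g}"
    if kind == "range":
        return f"{left:g},{right:g}"
    raise ValueError(f"unknown description type: {kind}")

def format_rows(counts, lo, hi, kind="none"):
    """One output line per bin: optional description, a tab, then the content."""
    rows = []
    for i, c in enumerate(counts):
        if kind == "none":
            rows.append(str(c))
        else:
            rows.append(f"{describe(kind, i, lo, hi, len(counts))}\t{c}")
    return rows

if __name__ == "__main__":
    print("\n".join(format_rows([3, 5, 2, 0], 0.0, 8.0, kind="range")))
```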

The --xw option will cause histify to read alternating X values and weights instead of just X values from STDIN. This is useful for re-binning partially aggregated input data.
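
A minimal Python sketch of weighted filling, assuming the input alternates value and weight exactly as described and treating an odd number of tokens as an error (histify's behavior in that case is not documented here). The function names are hypothetical. The example shows the re-binning use case: the bin centers and contents of a 4-bin histogram are fed back in and collapsed into 2 bins.

```python
def read_xw(tokens):
    """Pair up alternating X values and weights from a token stream."""
    it = iter(tokens)
    for x in it:
        try:
            w = next(it)
        except StopIteration:
            raise ValueError(f"value {x!r} has no weight") from None
        yield float(x), float(w)

def fill_weighted(pairs, lo, hi, nbins):
    """Add each weight to the bin containing its X value; out-of-range X is dropped."""
    counts = [0.0] * nbins
    for x, w in pairs:
        if lo <= x <= hi:
            i = min(int((x - lo) / (hi - lo) * nbins), nbins - 1)
            counts[i] += w
    return counts

if __name__ == "__main__":
    # Centers and contents of a 4-bin histogram over 0..4, re-binned into 2 bins.
    partial = "0.5 3  1.5 5  2.5 2  3.5 0".split()
    print(fill_weighted(read_xw(partial), 0.0, 4.0, 2))
```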

The --dump-as-input (or -d) option indicates that the input is not a list of numbers as described above, but a dump of a Math::SimpleHisto::XS histogram in any format supported by that module. Each line on STDIN may contain one histogram dump; if there is more than one, histify attempts to add the histograms, which must have identical binning. At this time, this option is not compatible with the --xw, --max, --min, and --nbins options.

The --dump option changes the output from TSV to a JSON dump that can be read back with --dump-as-input. The --pipe option enables both --dump-as-input and --dump.
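
Adding several dumps amounts to a bin-by-bin sum after checking that the binning matches. The Python sketch below assumes a simplified, hypothetical JSON layout with min, max, nbins and data keys, one histogram per line; the real Math::SimpleHisto::XS dump formats contain more fields and may name them differently.

```python
import json

def add_dumps(lines):
    """Sum histograms given one JSON dump per line; binning must be identical."""
    total = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        h = json.loads(line)
        binning = (h["min"], h["max"], h["nbins"])
        if total is None:
            total = {"min": h["min"], "max": h["max"], "nbins": h["nbins"],
                     "data": list(h["data"])}
        elif binning != (total["min"], total["max"], total["nbins"]):
            raise ValueError(f"incompatible binning: {binning}")
        else:
            total["data"] = [a + b for a, b in zip(total["data"], h["data"])]
    if total is None:
        raise ValueError("no histogram dumps found")
    return total

if __name__ == "__main__":
    dumps = [
        '{"min": 0, "max": 1, "nbins": 3, "data": [1, 0, 2]}',
        '{"min": 0, "max": 1, "nbins": 3, "data": [4, 1, 0]}',
    ]
    # Emit the sum as a single JSON line, in the spirit of --dump output.
    print(json.dumps(add_dumps(dumps)))
```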

The --cumulative option causes histify to calculate the cumulative histogram of the input.
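
Assuming the cumulative histogram is the running sum of bin contents from the lowest bin upward (the usual definition; this page does not state the direction or any normalization), it can be sketched in Python as:

```python
from itertools import accumulate

def cumulative(counts):
    """Bin i of the result holds the sum of bins 0..i of the input."""
    return list(accumulate(counts))

if __name__ == "__main__":
    print(cumulative([1, 2, 0, 3]))
```

The last bin of the result therefore equals the total number of (in-range) entries.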

The --rebin option causes histify to rebin the histogram after filling it, merging adjacent bins by the given factor, which must evenly divide the original number of bins.
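
Rebinning by a factor k merges each run of k adjacent bins into one, so the overall range stays the same and each bin becomes k times wider. A Python sketch (the function name is hypothetical):

```python
def rebin(counts, factor):
    """Merge every `factor` adjacent bins; factor must divide len(counts)."""
    if factor < 1 or len(counts) % factor != 0:
        raise ValueError(f"{factor} does not divide {len(counts)} bins")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

if __name__ == "__main__":
    print(rebin([1, 2, 3, 4, 5, 6], 3))
```

For example, a 10-bin histogram between 0 and 1 rebinned by 5 ends up with 2 bins, each 0.5 wide.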

The --random option makes histify create a new histogram with the supplied parameters (default: 10 bins between 0 and 1) and the provided number of random fills (default: 1000).
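
A sketch of the same idea in Python, assuming the random values are drawn uniformly between min and max (the distribution is not specified on this page). The defaults mirror the ones above; the function name and the seed parameter are hypothetical, with the seed only there to make the sketch reproducible.

```python
import random

def random_histogram(nfills=1000, lo=0.0, hi=1.0, nbins=10, seed=None):
    """Fill an equal-width histogram with `nfills` uniform random values."""
    rng = random.Random(seed)
    counts = [0] * nbins
    for _ in range(nfills):
        x = rng.uniform(lo, hi)
        # uniform() may return hi itself due to rounding; keep it in the last bin
        i = min(int((x - lo) / (hi - lo) * nbins), nbins - 1)
        counts[i] += 1
    return counts

if __name__ == "__main__":
    print(random_histogram(seed=42))
```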