Catmandu::Stat - Catmandu modules for working with statistical data
    # Calculate statistics on the availability of the ISBN fields in the dataset
    cat data.json | catmandu convert JSON to Stat --fields isbn

    # Preprocess data and calculate statistics
    catmandu convert MARC to Stat --fix 'marc_map(020a,isbn)' --fields isbn < data.mrc

    # Or in fix files

    # Calculate the mean of foo. E.g. foo => [1,2,3,4]
    stat_mean(foo)      # foo => '2.5'

    # Calculate the median of foo. E.g. foo => [1,2,3,4]
    stat_median(foo)    # foo => '2.5'

    # Calculate the standard deviation of foo. E.g. foo => [1,2,3,4]
    stat_stddev(foo)    # foo => '1.12'

    # Calculate the variance of foo. E.g. foo => [1,2,3,4]
    stat_variance(foo)  # foo => '1.25'
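The example values above can be checked by hand. The following is a minimal Python sketch, not part of Catmandu, that reproduces the arithmetic for foo => [1,2,3,4]. It assumes the fixes use the population formulas (dividing by N rather than N-1), which is inferred only from the values 1.25 and 1.12 shown above.

```python
import statistics

foo = [1, 2, 3, 4]

mean = statistics.mean(foo)
median = statistics.median(foo)

# Population variance and standard deviation (divide by N).
# Assumption: this matches stat_variance and stat_stddev, judging
# from the example values in the synopsis.
variance = statistics.pvariance(foo)
stddev = statistics.pstdev(foo)

# Sample variance (divide by N-1) is shown for contrast; it does
# not match the documented value.
sample_variance = statistics.variance(foo)

print(mean, median, round(stddev, 2), variance)
```

The population variance is ((1.5)^2 + (0.5)^2 + (0.5)^2 + (1.5)^2) / 4 = 1.25, and its square root rounds to 1.12, which agrees with the synopsis. The sample variance would be 5/3, so the N-1 formula does not appear to be the one in use.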
Catmandu::Exporter::Stat
Catmandu::Fix::stat_mean
Catmandu::Fix::stat_median
Catmandu::Fix::stat_stddev
Catmandu::Fix::stat_variance
The Catmandu::Stat distribution includes a CSV file on the Sacramento crime rate in January 2006, "t/SacramentocrimeJanuary2006.csv" also available at http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv
To view statistics on the fields available in this file type:
    $ catmandu convert CSV to Stat < t/SacramentocrimeJanuary2006.csv
    | name          | count | zeros | zeros% | min | max | mean | variance | stdev | uniq~ | uniq% | entropy   |
    |---------------|-------|-------|--------|-----|-----|------|----------|-------|-------|-------|-----------|
    | #             | 7584  |       |        |     |     |      |          |       |       |       |           |
    | address       | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5425  | 71.5  | 12.4/12.4 |
    | beat          | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 20    | 0.3   | 4.3/12.9  |
    | cdatetime     | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5071  | 66.9  | 12.3/12.3 |
    | crimedescr    | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 305   | 4.0   | 5.6/12.6  |
    | district      | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 6     | 0.1   | 2.6/12.9  |
    | grid          | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 537   | 7.1   | 7.8/9.9   |
    | latitude      | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5288  | 69.7  | 12.4/12.4 |
    | longitude     | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5295  | 69.8  | 12.4/12.4 |
    | ucr_ncic_code | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 88    | 1.2   | 4.1/12.9  |
The file has 7584 rows, and all the fields from address to ucr_ncic_code contain values. Each field holds exactly one value per row (no arrays are available in a CSV file). The table reports about 5425 unique addresses in the CSV file (the uniq~ column). The district field has the lowest entropy: most of its values are shared among many rows.
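To make the uniq% and entropy columns more concrete, here is a small Python sketch of how such figures can be computed for a single field. It is an illustration only, not the exporter's code: `shannon_entropy` and `uniq_percent` are hypothetical helper names, the sample values are made up, and the assumption that the entropy column is a Shannon entropy in bits is inferred from the table (6 evenly used districts would give log2(6), about 2.6).

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the value distribution (hypothetical helper)."""
    total = len(values)
    counts = Counter(values)
    # Sum of p * log2(1/p) over the distinct values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def uniq_percent(values):
    """Share of distinct values relative to the number of rows, in percent."""
    return 100.0 * len(set(values)) / len(values)


# Made-up sample: a field with few distinct values, like district.
district = ["1", "2", "3", "4", "1", "2", "3", "4"]

print(round(shannon_entropy(district), 1), round(uniq_percent(district), 1))
```

A field whose values are all different reaches the largest possible entropy for its row count, which is why address, latitude and longitude score high in the table, while a field drawn from a handful of values, such as district, scores low.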
Catmandu, Catmandu::Breaker
Patrick Hochstenbach, <patrick.hochstenbach at ugent.be>
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.