NAME

Statistics::Zed - Data-handling and calculations for ratio of observed to standard deviation (zscore)

VERSION

Version 0.10

SYNOPSIS

use Statistics::Zed 0.10;

# new() with optional args:
$zed = Statistics::Zed->new(
   ccorr    => 1,
   tails    => 2,
   precision_s => 3,
   precision_p => 7,
);

# optionally pre-load one or more values with these names:
$zed->load(observed => [5, 6, 3], expected => [2.5, 3, 3], variance => [8, 8, 9]);
$zed->add(observed => [3, 6], expected => [2.7, 2.5], variance => [7, 8]); # update loaded arrays
$z_value = $zed->score(); # calc z_value from pre-loaded data

# alternatively, call zscore() - alias score() - with the required args (with arefs or single values):
$z_value = $zed->zscore(
   observed => 5,
   expected => 2.5,
   variance => 8,
);

# as either of above, but call in array context for more results:
($z_value, $p_value, $observed_deviation, $standard_deviation) = $zed->zscore();

# as either of above but with optional args:
$z_value = $zed->zscore(ccorr => 1, precision_s => 3);

# get the normal distribution p_value only - alias z2p():
$p_value = $zed->p_value(); # using pre-loaded data
$p_value = $zed->p_value(observed => 5, expected => 2.5, variance => 8); # from given data
$p_value = $zed->p_value(tails => 2, ccorr => 1, precision_p => 5); # same as either with optional args

# "inverse phi" (wraps to Math::Cephes::ndtri):
$z_value = $zed->p2z(value => $p_value, tails => 1|2);

DESCRIPTION

Methods are provided to:

+ calculate a z-score: ratio of an observed deviation to a standard deviation, with optional continuity correction

+ convert z-value to normal p-value, and convert p-value to normal-equiv z-value

+ load, add, save & retrieve observed, expected and variance values to compute z_score across samples

+ support z-testing in Statistics::Sequences and other modules.

Optionally, load/add observed, expected and variance values (named as such) and compute a z-score between/after updates. The module uses Statistics::Data to cache the observed, expected and variance values, to provide the load/add methods, and to save/retrieve these values between class calls (not documented here; see Statistics::Data). Alternatively, simply call zscore and pvalue, passing them the values by these labels in a hash (or hashref), with either single numerical values or referenced arrays of the same. Optionally, specify tails, where relevant, and set the precision of the returned z-values and p-values as required.

SUBROUTINES/METHODS

new

$zed = Statistics::Zed->new();
$zed = Statistics::Zed->new(ccorr => NUM, tails => 1|2, precision_s => INT, precision_p => INT);

Returns a Statistics::Zed object. Accepts setting of any of the OPTIONS.

load

$zed->load(observed => [NUMs], expected => [NUMs], variance => [NUMs]); # labelled list of each required series
$zed->load({ observed => [NUMs], expected => [NUMs], variance => [NUMs] }); # same but as referenced hash

Optionally load data for each of observed, expected and variance series as arefs (reference to list of numbers), using load in Statistics::Data. Returns 1 if successful but croaks if data cannot be loaded; see DIAGNOSTICS.

add

$zed->add(observed => [NUMs], expected => [NUMs], variance => [NUMs]); # labelled list of each required series
$zed->add({ observed => [NUMs], expected => [NUMs], variance => [NUMs] }); # same but as referenced hash

Update any existing, previously loaded data, via add in Statistics::Data. Returns 1 if successful but croaks if data cannot be added; see DIAGNOSTICS.

zscore

$zval = $zed->zscore(); # assuming observed, expected and variance values already loaded/added, as above
$zval = $zed->zscore(observed => NUM, expected => NUM, variance => NUM);
$zval = $zed->zscore(observed => [NUMs], expected => [NUMs], variance => [NUMs]);
($zval, $pval, $obs_dev, $stdev) = $zed->zscore(); # same but array context call for more info
$zscore = $zed->zscore(observed => [12], expected => [5], variance => [16], ccorr => 1); # same but with continuity correction

Returns the z-value for the values of observed, expected and variance sent to load and/or add, or as sent in a call to this method itself as a hash (or hashref). If called wanting an array, then the z-value, its probability, the observed deviation and the standard deviation are returned.

Alias: score, z_value

As described in OPTIONS, optionally specify a numerical value for ccorr for performing the continuity-correction to the observed deviation, and a value of either 1 or 2 to specify the tails for reading off the normal distribution.

The basic formula is:

   Z = ( x - X ) / SD

where x is the observed value, X is the expected value (mean, etc.), and SD is the standard deviation (the square root of the variance). If supplying an array of values for each of the required arguments, then the z-score is based on summing their values, i.e., (sum of observeds less sum of expecteds) divided by square-root of the sum of the variances.
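The summed form of the formula can be sketched in plain Perl. This is only an illustration of the arithmetic described above, not the module's own code; the helper name summed_zscore is hypothetical, and it applies no continuity correction or precision setting.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Statistics::Zed): z-score from parallel
# series of observed, expected and variance values, each given as an aref.
sub summed_zscore {
    my ($observed, $expected, $variance) = @_;
    my $obs_dev = sum(@$observed) - sum(@$expected);    # observed deviation
    my $std_dev = sqrt( sum(@$variance) );              # standard deviation
    return $obs_dev / $std_dev;
}

# Same data as the load() call in the SYNOPSIS:
my $z = summed_zscore([5, 6, 3], [2.5, 3, 3], [8, 8, 9]);
printf "Z = %.3f\n", $z;    # (14 - 8.5) / sqrt(25), prints "Z = 1.100"
```

A single observation is just the one-element case, e.g. summed_zscore([5], [2.5], [8]).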

p_value

$p_value = $zed->p_value($z); # assumes 2-tailed
$p_value = $zed->p_value(value => $z); # assumes 2-tailed
$p_value = $zed->p_value(value => $z, tails => 1);
$p_value = $zed->p_value(); # assuming observed, expected and variance values already loaded/added, as above
$p_value = $zed->p_value(observed => NUM, expected => NUM, variance => NUM);
$p_value = $zed->p_value(observed => [NUMs], expected => [NUMs], variance => [NUMs]);

Alias: pvalue, z2p

Send a z-value, get its associated p-value, 2-tailed by default, or depending on the value of the optional argument tails. If you pass in just one value (unkeyed), it is taken as the z-value. Alternatively, it can be passed the same arguments as for zscore so that it will calculate the zscore itself but return only the p-value.

Uses Math::Cephes ndtr normal probability function, which returns 0 if the z-value is greater than or equal to 38.

The optional argument precision_p renders the returned p-value to so many decimal places (simply by sprintf).
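For readers who want to see the z-to-p conversion spelled out, here is a minimal sketch. It does not use Math::Cephes; instead it assumes the complementary error function erfc from the core POSIX module (Perl 5.22 or later) as a stand-in for ndtr. The helper name normal_p is hypothetical, and taking the magnitude of z before reading off the upper tail is an assumption of this sketch rather than a statement of how the module behaves.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # C99 erfc, exported on request from Perl 5.22

# Hypothetical helper (not part of Statistics::Zed): normal p-value for a
# z-value, 2-tailed unless tails => 1 is requested.
sub normal_p {
    my ($z, $tails) = @_;
    $tails = 2 unless defined $tails;
    die "tails must be 1 or 2\n" unless $tails == 1 || $tails == 2;
    my $upper = 0.5 * erfc( abs($z) / sqrt(2) );    # P(Z >= |z|)
    return $tails * $upper;
}

printf "%.5f\n", normal_p(1.96);       # 2-tailed, prints 0.05000
printf "%.5f\n", normal_p(1.96, 1);    # 1-tailed, prints 0.02500
```

The sprintf-style rounding here plays the role that precision_p plays in the module.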

p2z

$z_value = $zed->p2z($p_value); # the p-value is assumed to be 2-tailed
$z_value = $zed->p2z(value => $p_value); # the p-value is assumed to be 2-tailed
$z_value = $zed->p2z(value => $p_value, tails => 1); # specify 1-tailed probability

Returns the z-value associated with a p-value using the inverse phi function ndtri in Math::Cephes. The p-value is assumed to be two-tailed, and so is firstly (before conversion) divided by 2, e.g., .05 becomes .025 so you get z = 1.96. As a one-tailed probability, it is then assumed to be a probability of being greater than a certain amount, i.e., of getting a z-value greater than or equal to that observed. So the inverse phi function is actually given (1 - p-value) to work on. So .055 comes back as 1.598 (speaking of the top-end of the distribution), and .991 comes back as -2.349 (now going from right to left across the distribution). This is not the same as found in inversion methods in common spreadsheet packages but seems to be expected by humans.
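The tail handling described above can be illustrated without Math::Cephes. The sketch below swaps ndtri for a plain bisection search over an erfc-based upper-tail probability (POSIX erfc, Perl 5.22 or later); that substitution, the helper names upper_tail and p_to_z, and the search bounds are all assumptions made for illustration. Unlike the module, it rejects p-values of exactly 0 or 1.

```perl
use strict;
use warnings;
use POSIX qw(erfc);

# Upper-tail normal probability, P(Z >= z).
sub upper_tail { return 0.5 * erfc( $_[0] / sqrt(2) ) }

# Hypothetical helper (not part of Statistics::Zed): the z-value whose
# upper-tail probability equals p / tails.
sub p_to_z {
    my ($p, $tails) = @_;
    $tails = 2 unless defined $tails;
    die "p must lie strictly between 0 and 1\n" if $p <= 0 || $p >= 1;
    my $target = $p / $tails;    # a 2-tailed p is halved before conversion
    my ($lo, $hi) = (-10, 10);   # assumed search bounds
    for (1 .. 100) {
        my $mid = ($lo + $hi) / 2;
        # upper_tail() falls as z rises, so a too-large tail means z is too small
        if (upper_tail($mid) > $target) { $lo = $mid } else { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

printf "%.2f\n", p_to_z(0.05);        # 2-tailed .05 becomes .025, prints 1.96
printf "%.3f\n", p_to_z(0.055, 1);    # 1-tailed
```

Because the search works on the upper tail directly, it corresponds to handing (1 - p-value) to an inverse phi function, as the text describes.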

obsdev

$obsdev = $zed->obsdev(); # assuming observed and expected values already loaded/added, as above
$obsdev = $zed->obsdev(observed => NUM, expected => NUM);
$obsdev = $zed->obsdev(observed => [NUMs], expected => [NUMs]);

Returns the observed deviation (only), as would be returned as the third value if calling zscore in array context. This is simply the (sum of) the observed value(s) less the (sum of) the expected value(s), with the (sum of) the latter given the continuity correction if this is (optionally) also given as an argument, named ccorr; see OPTIONS.

ccorr

$zed->ccorr(value => 1); # will be used in all methods, unless they are given a ccorr value to use
$val = $zed->ccorr(); # returns any value set in new() or previously here

Set the value of the optional ccorr argument to be used for all statistics methods, or, without a value, return the current value. This might be undef if it has not previously been explicitly set in new or via this method. To quash any set value, specify value => 0. When sending a value for ccorr to any other method, this value takes precedence over any previously set, but it does not "re-set" the cached value that is set here or in new. See OPTIONS for how this value is used. It is assumed that the value sent is a valid numerical value.

tails

$zed->tails(value => 1); # will be used in all methods, unless they are given a tails value to use
$val = $zed->tails(); # returns any value set in new() or previously here

Set the value of the optional tails argument to be used for all statistics methods, or, without a value, return the current value. The default is 2, and this can be overridden by setting its value in new, by this method, or as an explicit argument in any method. When sending a value for tails to any other method, this value takes precedence over any previously set, but it does not "re-set" the cached value that is set here or in new. See p_value, p2z and OPTIONS for how this value is used. The value set must be either 1 or 2; a croak is heard otherwise.

string

$str = $zed->string(); # assuming observed, expected and variance values already loaded/added, as above
$str = $zed->string(observed => NUM, expected => NUM, variance => NUM);
$str = $zed->string(observed => [NUMs], expected => [NUMs], variance => [NUMs]);

Returns a string giving the zscore and p-value. Takes the same arguments as for zscore, which it calls itself, taking its returned values to make up a string in the form Z = 0.141, 1p = 0.44377. Accepts the optional arguments tails, ccorr, precision_s and precision_p; see OPTIONS. In the example, precision_s has been specified as 3, precision_p has been set to 5, and tails has been set to 1.

dump

$zed->dump(); # assuming observed, expected and variance values already loaded/added, as above
$zed->dump(observed => NUM, expected => NUM, variance => NUM);
$zed->dump(observed => [NUMs], expected => [NUMs], variance => [NUMs]);

Prints to STDOUT a line giving the zscore and p-value, being what would be returned by string but with a new-line "\n" character appended.

OPTIONS

The following can be set in calls to the above methods, including new, where relevant.

ccorr

Apply the continuity correction. Default = 0. Otherwise, specify a correcting difference value (not necessarily 1), and the procedure is to calculate the observed difference as its absolute value less half of this correcting value, returning the observed difference with its original sign. To clarify for Germans, this is the Stetigkeitskorrektur.
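The correction procedure just described can be sketched as follows. The helper name corrected_deviation is hypothetical and the code is an illustration of the stated rule, not the module's implementation; in particular, it makes no attempt to handle the case where half the correcting value exceeds the absolute deviation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Zed): apply a continuity
# correction of size $ccorr to an observed deviation, keeping its sign.
sub corrected_deviation {
    my ($obs_dev, $ccorr) = @_;
    return $obs_dev unless $ccorr;                 # 0 or undef: no correction
    my $magnitude = abs($obs_dev) - $ccorr / 2;    # shrink by half of ccorr
    return $obs_dev < 0 ? -$magnitude : $magnitude;
}

printf "%.2f\n", corrected_deviation( 7, 1);    # prints 6.50
printf "%.2f\n", corrected_deviation(-7, 1);    # prints -6.50
```

With observed => 12, expected => 5, variance => 16 and ccorr => 1, this arithmetic gives a corrected deviation of 6.5 and so a z-value of 6.5 / 4 = 1.625.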

tails

Tails from which to assess the associated p-value (1 or 2). Default = 2.

precision_s

Precision of the z-value (the statistic). Default is undefined - you get all decimal values available.

precision_p

Precision of the associated p-value. Default is undefined - you get all decimal values available.

Deprecated methods

Methods for "series testing" are deprecated. Use load and add instead to manage keeping a cache of the observed, expected and variance values; the z- and p-methods will look them up, if available. See dump_vals in Statistics::Data for dumping series data using the present class object, which uses Statistics::Data as a base.

DIAGNOSTICS

Data for deviation ratio are incomplete

Croaked when loading or adding data. As the croak goes on to say, loading and adding (updating) needs arefs of data labelled observed, expected and variance. Also, if any one of them is loaded/updated at one time, it's expected that all three are loaded/updated. For more info about loading data, see Statistics::Data.

Cannot compute z-value: No defined or numerical '$_' value(s)

Croaked via zscore if the three required observed, expected and variance values were not defined in the present call (each to a reference to an array of values, or with a single numerical value), or could not be accessed from a previous load/add. See access in Statistics::Data for any error that might have resulted from a bad load/add. See looks_like_number in Scalar::Util for any error that might have resulted from supplying a single value for observed, expected or variance.

Cannot compute p-value: Argument 'tails' should have value of either 1 or 2, not '$tails'

Croaked when calling p_value directly or via zscore, or when calling p2z, and any given value for tails is not appropriate.

Cannot compute z-value from p-value

Croaked by p2z if its value attribute is not defined, is empty string, is not numeric, or, if numeric, is greater than 1 or less than zero, as per all_proportions in Statistics::Data.

Cannot set tails() option: value must be numeric and equal either 1 or 2, not '$_'

Croaked from tails method; self-explanatory.

Could not print statistical values

Croaked by the internal dump method if, for some reason, printing to STDOUT is not available.

DEPENDENCIES

Math::Cephes - ndtr and ndtri normal distribution functions

Statistics::Lite - sum method

String::Util - hascontent and nocontent methods

Scalar::Util - looks_like_number method

Statistics::Data - this module uses it as a base, for its loading/adding data methods (if required), and a p2z validity check.

SEE ALSO

Statistics::Sequences : for application of this module.

AUTHOR

Roderick Garton, <rgarton at cpan.org>

LICENSE AND COPYRIGHT

Copyright 2006-2014 Roderick Garton.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License. See perl.org for more information.