NAME

Statistics::ANOVA::KW - Kruskal-Wallis statistics and test (nonparametric independent analysis of variance by ranks for nominally grouped data)

VERSION

This is documentation for Version 0.01 of Statistics::ANOVA::KW.

SYNOPSIS

 use Statistics::ANOVA::KW;
 my $kw = Statistics::ANOVA::KW->new();
 $kw->load({1 => [2, 4, 6], 2 => [3, 3, 12], 3 => [5, 7, 11, 16]});
 my $h_value = $kw->h_value(); # ties are corrected for by default
 my $p_value = $kw->chiprob_test(); # H taken as chi^2 distributed
 my ($h, $df, $count, $p_value_by_chi, $phi) = $kw->chiprob_test(); # same test, called in list context
 my ($f_value, $df_b, $df_w, $p_value_by_f, $omega_sq) = $kw->fprob_test(); # F-equivalent value tests

 # or without pre-loading, and specify correct_ties as well:
 $h_value = $kw->h_value(data => {1 => [2, 4, 6], 2 => [5, 3, 12]}, correct_ties => 1);
 # or test only a subset of the loaded data:
 $h_value = $kw->h_value(lab => [1, 3]);

DESCRIPTION

Performs calculations for the Kruskal-Wallis one-way nonparametric analysis of variance by ranks. The test applies to (at least) ordinal-level measurements from two or more independent samples of a nominal/categorical variable, and assumes equality of variances across the samples. It is unreliable when the number of observations per sample is small (conventionally, every sample should have more than five observations). See REFERENCES for more information, and see laerd statistics (pro) and biostathandbook (con) for discussions of the assumptions, interpretations, and pros and cons of the test. The Kruskal-Wallis test is often described as a test for three or more samples, in contrast to the Mann-Whitney test, which is restricted to two samples, but it can also be used with only two samples: the absolute value of the z-value from a Mann-Whitney test (without continuity correction) then equals the square root of the KW statistic.
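The two-sample equivalence can be checked numerically. The following is a minimal sketch in plain Perl, independent of this module, using the two samples from the SYNOPSIS. It assumes that no values are tied (so no tie correction is needed) and computes the Mann-Whitney z without a continuity correction; all variable names are illustrative only.

```perl
use strict;
use warnings;

my @x = (2, 4, 6);
my @y = (5, 3, 12);
my ($n1, $n2) = (scalar @x, scalar @y);
my $n = $n1 + $n2;

# Rank the pooled values (1 = smallest); valid here because no values are tied.
my @sorted = sort { $a <=> $b } @x, @y;
my %rank;
$rank{ $sorted[$_] } = $_ + 1 for 0 .. $#sorted;

# Sum of ranks per sample.
my ($r1, $r2) = (0, 0);
$r1 += $rank{$_} for @x;
$r2 += $rank{$_} for @y;

# Kruskal-Wallis H (no tie correction) for the two samples.
my $h = 12 / ($n * ($n + 1)) * ($r1**2 / $n1 + $r2**2 / $n2) - 3 * ($n + 1);

# Mann-Whitney U for the first sample, and its normal approximation.
my $u = $r1 - $n1 * ($n1 + 1) / 2;
my $z = ($u - $n1 * $n2 / 2) / sqrt($n1 * $n2 * ($n + 1) / 12);

printf "H = %.6f, z = %.6f, z^2 = %.6f\n", $h, $z, $z**2;
```

The printed H and z^2 agree, which is the relation described above.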

Data-loading and retrieval are as provided in Statistics::Data, on which this module's class object is based, so its other methods are available here.

Return values are tested on installation against published examples and output from other software (e.g., SPSS).

new

 $kw = Statistics::ANOVA::KW->new();

Returns a new object for accessing methods and storing results. The object "isa" Statistics::Data object.

load, add, unload

 $kw->load('a' => [1, 4, 3.2], 'b' => [6.5, 6.5, 9], 'c' => [3, 7, 4.4]);

The given data can now be used by any of the following methods. The data-handling methods are inherited from Statistics::Data, and all its other methods are available here via the class object. Only data passed as a hash of arrays (HOA) are currently supported. Once loaded, subsets of the data can be tested by passing their names (or labels) in a referenced array to the argument lab in the following methods (as supported by Statistics::Data). Any non-numeric values in the loaded samples are culled before the following methods are run.
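As an illustration of what culling non-numeric values from a HOA can look like, here is a minimal sketch using looks_like_number from the core module Scalar::Util. The helper name cull_non_numeric is hypothetical, and the criterion that the module (via Statistics::Data) actually applies may differ from this one.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: return a copy of a hash of arrays that keeps
# only those values Perl would treat as numbers.
sub cull_non_numeric {
    my ($data) = @_;
    my %clean;
    for my $name (keys %$data) {
        $clean{$name} = [ grep { looks_like_number($_) } @{ $data->{$name} } ];
    }
    return \%clean;
}

my $clean = cull_non_numeric({ a => [1, 'x', 3.2, undef, '4e2'], b => ['', 7] });
printf "%s: %s\n", $_, join(', ', @{ $clean->{$_} }) for sort keys %$clean;
```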

Alternatively, without pre-loading the data, directly give the following methods the HOA of data as the value for the optional named argument data.

h_value

 $h_value = $kw->h_value(data => \%data, correct_ties => 1);
 $h_value = $kw->h_value(); # assuming data have already been loaded, & default of TRUE for correct_ties

Returns the Kruskal-Wallis H statistic.
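For reference, the textbook definition is H = 12 / (N(N + 1)) * sum(R_j^2 / n_j) - 3(N + 1), where R_j is the sum of the pooled ranks in sample j, n_j its size, and N the total number of observations. With tie correction, H is divided by 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each group of tied values. The following is a minimal sketch of that computation in plain Perl; it is not this module's own code (which relies on Statistics::Data::Rank), the helper name kw_h is hypothetical, and it assumes that every sample is non-empty and that the values are not all identical.

```perl
use strict;
use warnings;

# Hypothetical helper: Kruskal-Wallis H for a hash of arrays,
# optionally corrected for ties.
sub kw_h {
    my ($data, $correct_ties) = @_;

    # Pool all observations, remembering the sample each came from.
    my @pooled = sort { $a->[0] <=> $b->[0] }
                 map { my $g = $_; map { [$_, $g] } @{ $data->{$g} } } keys %$data;
    my $n = @pooled;

    my %rank_sum;
    my $tie_term = 0;
    my $i = 0;
    while ($i < $n) {
        # Find the end of the run of values tied with element $i.
        my $j = $i;
        $j++ while $j + 1 < $n && $pooled[$j + 1][0] == $pooled[$i][0];
        my $t = $j - $i + 1;                 # size of this run of tied values
        my $mid_rank = ($i + $j) / 2 + 1;    # mean of the 1-based ranks in the run
        $rank_sum{ $pooled[$_][1] } += $mid_rank for $i .. $j;
        $tie_term += $t**3 - $t;
        $i = $j + 1;
    }

    my $sum = 0;
    $sum += $rank_sum{$_}**2 / scalar @{ $data->{$_} } for keys %$data;
    my $h = 12 / ($n * ($n + 1)) * $sum - 3 * ($n + 1);
    $h /= 1 - $tie_term / ($n**3 - $n) if $correct_ties;
    return $h;
}

my %data = (1 => [2, 4, 6], 2 => [3, 3, 12], 3 => [5, 7, 11, 16]);
printf "H = %.4f (uncorrected), %.4f (tie-corrected)\n",
    kw_h(\%data, 0), kw_h(\%data, 1);
```

For the SYNOPSIS data, this gives H of about 3.07 without and 3.09 with the tie correction; the difference is small because only one pair of values is tied.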

chiprob_test

 ($chi_value, $df, $count, $p_value, $phi) = $kw->chiprob_test(data => HOA, correct_ties => 1); # H as chi-square
 $p_value = $kw->chiprob_test(data => HOA, correct_ties => 1);
 $p_value = $kw->chiprob_test(); # assuming data have already been loaded, & default of TRUE for correct_ties

Performs the ANOVA and, assuming that the Kruskal-Wallis H value is chi-square distributed, returns a list of the H value, its degrees of freedom, the total number of observations (N), its chi-square probability value, and the phi-coefficient as an estimate of effect size (the square root of chi-square divided by N). Returns only the p-value if called in scalar context. The default value of the optional argument correct_ties is 1.
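To make the returned quantities concrete, the following sketch derives the degrees of freedom, p-value and phi from a given H value. The module itself obtains probabilities from Math::Cephes (see DEPENDENCIES); here, so that the example needs only core Perl, a hypothetical helper chisq_upper_p computes the upper-tail chi-square probability from the closed forms for 1 and 2 degrees of freedom plus a standard recurrence. It assumes Perl 5.22 or later, where POSIX provides erfc and tgamma.

```perl
use strict;
use warnings;
use POSIX qw(erfc tgamma);    # C99 functions; provided by POSIX from Perl 5.22

# Hypothetical helper: upper-tail probability of a chi-square value $x
# on $df degrees of freedom.
sub chisq_upper_p {
    my ($x, $df) = @_;
    my $half = $x / 2;
    # Closed forms for the two base cases, df = 1 and df = 2.
    my $nu = $df % 2 ? 1 : 2;
    my $q  = $nu == 1 ? erfc(sqrt($half)) : exp(-$half);
    # Step up two degrees of freedom at a time:
    # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while ($nu < $df) {
        $q += $half**($nu / 2) * exp(-$half) / tgamma($nu / 2 + 1);
        $nu += 2;
    }
    return $q;
}

my ($h, $k, $n) = (3.0915, 3, 10);    # H, number of samples, total observations
my $df  = $k - 1;                     # degrees of freedom
my $p   = chisq_upper_p($h, $df);     # H treated as chi-square distributed
my $phi = sqrt($h / $n);              # effect-size estimate
printf "chi^2(%d, N = %d) = %.3f, p = %.3f, phi = %.3f\n", $df, $n, $h, $p, $phi;
```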

chiprob_str

 $str = $kw->chiprob_str(data => HOA, correct_ties => 1);
 $str = $kw->chiprob_str(); # assuming data have already been loaded, & default of TRUE for correct_ties

Performs the same test as for chiprob_test but returns a string in the conventional reporting form rather than a list, e.g., chi^2(df, N = total observations) = chi_value, p = p_value.

fprob_test

 ($f_value, $df_b, $df_w, $p_value, $es_omega) = $kw->fprob_test(data => HOA, correct_ties => BOOL);
 $p_value = $kw->fprob_test(data => HOA, correct_ties => BOOL);
 $p_value = $kw->fprob_test(); # assuming data have already been loaded, & default of TRUE for correct_ties

Performs the same test as above but transforms the chi-square value into an F-distributed value, returning a list comprising (1) this F-estimate value, its (2) between- and (3) within-groups degrees of freedom, (4) the associated probability of the value per the F-distribution, and (5) an estimate of the effect-size statistic, (partial) omega-squared. The effect-size estimate is returned only if Statistics::ANOVA::EffectSize is installed and available. Called in scalar context, only the F-estimated p-value is returned. The default value of the optional argument correct_ties is 1. This method has not been tested against sample or published data, as the F-equivalent is not provided by the usual software packages.
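This documentation does not state which transformation is applied. One commonly cited form, equivalent to a one-way ANOVA computed on the ranks, is F = (H / (k - 1)) / ((N - 1 - H) / (N - k)), with k - 1 and N - k degrees of freedom for k samples and N observations in total. The sketch below assumes that form; the helper name h_to_f is hypothetical and the module's own computation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: F-equivalent of H for $k samples and $n total
# observations, assuming the rank-transform relation described above.
sub h_to_f {
    my ($h, $k, $n) = @_;
    my $df_b = $k - 1;    # between-groups degrees of freedom
    my $df_w = $n - $k;   # within-groups degrees of freedom
    my $f = ($h / $df_b) / (($n - 1 - $h) / $df_w);
    return ($f, $df_b, $df_w);
}

my ($f, $df_b, $df_w) = h_to_f(3.0915, 3, 10);
printf "F(%d, %d) = %.3f\n", $df_b, $df_w, $f;
```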

fprob_str

 $str = $kw->fprob_str(data => HOA, correct_ties => BOOL);
 $str = $kw->fprob_str(); # assuming data have already been loaded, using default of TRUE for correct_ties

Performs the same test as for fprob_test but returns a string in the conventional reporting form rather than a list, e.g., F(df_b, df_w) = f_value, p = p_value, followed by an estimate of partial omega-squared if available (see fprob_test).

DEPENDENCIES

List::AllUtils : used for summing.

Math::Cephes : used for probability functions.

Statistics::Data : used as base.

Statistics::Data::Rank : used to calculate between-group sum-of-ranks.

REFERENCES

Hollander, M., & Wolfe, D. A. (1999). Nonparametric statistical methods. New York, NY, US: Wiley.

Rice, J. A. (1995). Mathematical statistics and data analysis. Belmont, CA, US: Duxbury.

Sarantakos, S. (1993). Social research. Melbourne, Australia: MacMillan.

Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York, NY, US: McGraw-Hill.

SEE ALSO

Statistics::ANOVA::JT : Also a nonparametric ANOVA by ranks for independent samples, but where the ordinality of the numerical labels of the sample names (the order of the groups) is taken into account.

Statistics::KruskallWallis : Returns the H-value and its chi-square p-value (only), and implements the Newman-Keuls test for pairwise comparison.

AUTHOR

Roderick Garton, <rgarton at cpan.org>

BUGS

Please report any bugs or feature requests to bug-statistics-anova-kw-0.01 at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-ANOVA-KW-0.01. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Statistics::ANOVA::KW

LICENSE AND COPYRIGHT

Copyright 2015 Roderick Garton.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.