NAME

Statistics::Data::Rank - Utilities for ranking data

VERSION

This is documentation for Version 0.02, released February 2015.

SYNOPSIS

 use Statistics::Data::Rank;
 my $rankd = Statistics::Data::Rank->new();
 my %vars = ('nodrug' => [174, 224, 260], 'placebo' => [261, 213, 231], 'morphine' => [199, 143, 113]);
 my $ranks_href = $rankd->ranks_between(data => \%vars);
 # or pre-load the data:
 $rankd->load(\%vars);
 $ranks_href = $rankd->ranks_within();
 my $sor = $rankd->sum_of_ranks_within(); # or _between()
 # or specify which vars to rank/sum-rank:
 $sor = $rankd->sum_of_ranks_within(lab => [qw/placebo morphine/]);

DESCRIPTION

Ranks named data, either by an independent, between-variable method (as in the Kruskal-Wallis test), or by a dependent, cross-variable method (as in the Friedman test). Methods return a hash of ranks or of sums-of-ranks. Data must either be pre-loaded (as per Statistics::Data) or be sent to the methods with the named argument data as a hash-ref of array-refs. Output is tested ahead of installation to ensure it matches published data (Siegel, 1956).

SUBROUTINES/METHODS

new

 $rankd = Statistics::Data::Rank->new();

Constructor, expecting/accepting no args. Inherited from Statistics::Data.

load, add, unload

 $rankd->load('a' => [1, 4], 'b' => [3, 7]);

The loaded data can then be used by any of the following methods. These methods are inherited from Statistics::Data, and all its other methods are also available via the class object. Only data passed as a hash of arrays (HOA) are supported for now. Alternatively, pass the HOA directly to any of the following methods as the value of the optional named argument data.

ranks_between

 $ranks_href = $rankd->ranks_between(data => $values_href);
 $ranks_href = $rankd->ranks_between(lab => [qw/fez bop/]); # two, say, of previously loaded data
 $ranks_href = $rankd->ranks_between(); # all of any previously loaded data
 ($ranks_href, $ties_aref, $nties) = $rankd->ranks_between(data => $values_href);

Given a hash of arefs where the keys are names (groups, treatments) of the sample data (each as an aref), returns a hashref of the ranks of each value under each name, after pooling all the data and ranking them while keeping track of their names. Ties are resolved by giving each tied score the mean of the ranks for which it is tied (see Siegel, 1956, p. 188ff). If called in list context, a reference to an array of the number of variables having the same value per rank, and a scalar giving the number of ties, are also returned. Before ranking, data are checked for numeracy, and any non-numeric or empty values are culled.

Used, e.g., by Kruskal-Wallis ANOVA, Jonckheere-Terpstra ANOVA, Dwass-Steel comparison, and Worsley-cluster tests.
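
The pooling and mean-rank tie handling described above can be sketched in plain Perl. This is only an illustration of the technique, not the module's own code: the helper name pooled_ranks is hypothetical, and no culling of non-numeric values is attempted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Data::Rank): pool all values,
# sort them ascending, and give tied values the mean of the ranks they span.
sub pooled_ranks {
    my ($data) = @_;    # hashref of arrayrefs, keyed by variable name

    # Pool every value, remembering its variable name and its index there
    my @pool;
    for my $name (keys %{$data}) {
        my $values = $data->{$name};
        push @pool, map { [$name, $_, $values->[$_]] } 0 .. $#{$values};
    }
    my @sorted = sort { $a->[2] <=> $b->[2] } @pool;

    my %ranks;
    my $i = 0;
    while ($i < @sorted) {
        # Find the end of the run of values tied with the one at position $i
        my $j = $i;
        $j++ while $j < $#sorted && $sorted[$j + 1][2] == $sorted[$i][2];

        # Ranks are 1-based, so positions $i..$j hold ranks $i+1 .. $j+1
        my $mean_rank = (($i + 1) + ($j + 1)) / 2;
        $ranks{ $_->[0] }[ $_->[1] ] = $mean_rank for @sorted[$i .. $j];
        $i = $j + 1;
    }
    return \%ranks;
}

my %vars = (
    nodrug   => [174, 224, 260],
    placebo  => [261, 213, 231],
    morphine => [199, 143, 113],
);
my $ranks = pooled_ranks(\%vars);
print "$_: @{ $ranks->{$_} }\n" for sort keys %{$ranks};
# morphine: 4 2 1
# nodrug: 3 6 8
# placebo: 9 5 7
```

Each rank keeps the position of its original value, so the returned structure mirrors the input hash of arrays.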

ranks_within

 $ranks_href = $rankd->ranks_within(data => $values_href); # pass data now
 $ranks_href = $rankd->ranks_within(); # using all of any previously loaded data
 ($ranks_href, $ties_href) = $rankd->ranks_within();

Given a hash of arefs where the keys are variable names, and the values are their actual sample data (each as an aref), returns a hashref of the ranks of each value under each name, calculated dependently (per the values across individual indices). So if 'a' => [1, 3, 7] and 'b' => [4, 5, 6], the values are ranked across 'a' and 'b' separately at each index, and the ranks returned will be 'a' => [1, 1, 2] and 'b' => [2, 2, 1]. Ties are resolved by giving each tied score the mean of the ranks for which it is tied (see Siegel, 1956, p. 188ff). If called in list context, a reference to a hash of arefs is also returned, giving the number of variables having the same value at each index for a rank. Before ranking, data are checked for numeracy, and any non-numeric or empty values are culled.

Used, e.g., by Friedman and Page tests.
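
As a rough sketch of dependent ranking, the following plain-Perl helper ranks the values across variables separately at each index, again giving ties their mean rank. The name within_ranks is hypothetical, the code is not taken from the module, and it assumes that every variable holds the same number of numeric values.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's own code): at each index, rank the
# values across the variables, giving tied values their mean rank.
sub within_ranks {
    my ($data) = @_;    # hashref of arrayrefs, keyed by variable name
    my @names = sort keys %{$data};
    my $n     = scalar @{ $data->{ $names[0] } };    # assumes equal lengths

    my %ranks;
    for my $i (0 .. $n - 1) {
        # Order the variable names by their value at this index
        my @order = sort { $data->{$a}[$i] <=> $data->{$b}[$i] } @names;

        my $k = 0;
        while ($k < @order) {
            # Extend $m to the end of the run of values tied at this index
            my $m = $k;
            $m++ while $m < $#order
                && $data->{ $order[$m + 1] }[$i] == $data->{ $order[$k] }[$i];

            # Positions $k..$m hold the 1-based ranks $k+1 .. $m+1
            my $mean_rank = (($k + 1) + ($m + 1)) / 2;
            $ranks{$_}[$i] = $mean_rank for @order[$k .. $m];
            $k = $m + 1;
        }
    }
    return \%ranks;
}

my %vars = (
    nodrug   => [174, 224, 260],
    placebo  => [261, 213, 231],
    morphine => [199, 143, 113],
);
my $ranks = within_ranks(\%vars);
print "$_: @{ $ranks->{$_} }\n" for sort keys %{$ranks};
# morphine: 2 1 1
# nodrug: 1 3 3
# placebo: 3 2 2
```

With three variables, the ranks at every index are a permutation of 1, 2, 3 (or their means where values tie), which is what distinguishes this from pooled ranking.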

sum_of_ranks_between

 $sor = $rankd->sum_of_ranks_between(); # all pre-loaded data
 $sor = $rankd->sum_of_ranks_between(data => HASHREF); # or using these data
 $sor = $rankd->sum_of_ranks_between(lab => STRING); # or for a particular load

Returns the sum of ranks either (1) for each variable in the entire dataset, whether given in the argument data or taken from all pre-loaded variables; or (2) for a particular pre-loaded variable, as named in the argument lab. In both cases (assuming more than one variable), all values are first pooled and ranked together, and the ranks are then summed per variable.
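
To illustrate what is being summed, here is a minimal sketch that adds up the ranks under each name. The helper rank_sums is hypothetical (it is not a method of this module), and the ranks shown are the pooled ranks of the SYNOPSIS data, worked out by hand.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: sum the ranks held under each variable name
sub rank_sums {
    my ($ranks) = @_;    # hashref of arrayrefs of ranks
    my %sums;
    $sums{$_} = sum(@{ $ranks->{$_} }) for keys %{$ranks};
    return \%sums;
}

# Pooled (between-variable) ranks of the SYNOPSIS data, worked by hand
my %between = (
    nodrug   => [3, 6, 8],
    placebo  => [9, 5, 7],
    morphine => [4, 2, 1],
);
my $sums = rank_sums(\%between);
printf "%s: %d\n", $_, $sums->{$_} for sort keys %{$sums};
# morphine: 7
# nodrug: 17
# placebo: 21

# Sanity check: with N pooled values the rank sums total N(N+1)/2
my $n_total = sum(map { scalar @{$_} } values %between);
print "total: ", sum(values %{$sums}), " (expected ", $n_total * ($n_total + 1) / 2, ")\n";
# total: 45 (expected 45)
```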

sum_of_ranks_within

 $sor = $rankd->sum_of_ranks_within(); # all pre-loaded data
 $sor = $rankd->sum_of_ranks_within(data => HASHREF); # or using these data
 $sor = $rankd->sum_of_ranks_within(lab => STRING); # or for a particular load

Returns a hashref of the summed ranks, keyed by variable name, where the ranks have been calculated dependently (as in ranks_within). If called in list context, this hashref is followed by the hashref of ties (useful for some statistics). The sum for a particular named variable can also be returned by naming it in the argument lab.

sumsq_ranks_within

Returns the sum of the squared sums-of-ranks calculated dependently (per the values across individual indices). Used in Friedman ANOVA. Expects a hashref of the variables, keyed by name. Called in list context, also returns a hash of the tied ranks.
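
As a sketch of the quantity described, the following squares each variable's sum of within-index ranks and adds the squares. The helper sumsq_of_rank_sums is hypothetical, and the ranks are the dependent ranks of the SYNOPSIS data, worked out by hand. The last lines plug the result into the textbook Friedman chi-square formula without tie correction, which is an assumption about how the value would typically be used rather than something taken from this module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: square each variable's rank sum, then add the squares
sub sumsq_of_rank_sums {
    my ($ranks) = @_;    # hashref of arrayrefs of ranks
    return sum( map { sum(@{$_}) ** 2 } values %{$ranks} );
}

# Dependent (within-index) ranks of the SYNOPSIS data, worked by hand
my %within = (
    nodrug   => [1, 3, 3],    # rank sum 7
    placebo  => [3, 2, 2],    # rank sum 7
    morphine => [2, 1, 1],    # rank sum 4
);
my $sumsq = sumsq_of_rank_sums(\%within);
printf "sum of squared rank sums: %d\n", $sumsq;    # 49 + 49 + 16 = 114

# Assumed usage: Friedman chi-square for n indices and k variables,
# chi_sq = 12 * sumsq / (n * k * (k + 1)) - 3 * n * (k + 1), no tie correction
my ($n, $k) = (3, 3);
my $chi_sq = 12 * $sumsq / ($n * $k * ($k + 1)) - 3 * $n * ($k + 1);
printf "Friedman chi-square: %.2f\n", $chi_sq;      # 2.00
```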

DEPENDENCIES

List::AllUtils : used for summing.

Statistics::Data : used as base.

Statistics::Lite : for basic descriptives.

String::Util : string content checking.

DIAGNOSTICS

Variable data must be numeric and not empty

croaked ahead of calculating (sum of) ranks between or within if there is no hashref of data available.

Named variable does not exist

croaked by sum_of_ranks_between and sum_of_ranks_within if the value of the optional argument lab does not exist as pre-loaded data; either in a call to load or add, or as data in the present method.

REFERENCES

Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York, NY, US: McGraw-Hill.

AUTHOR

Roderick Garton, <rgarton at cpan.org>

BUGS AND LIMITATIONS

Please report any bugs or feature requests to bug-statistics-data-rank-0.02 at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Data-Rank-0.02. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Statistics::Data::Rank

ACKNOWLEDGEMENTS

Statistics::RankCorrelation : loop for dealing with ties in calculating "ranks within" adapted from Boggs' "rank" function.

LICENSE AND COPYRIGHT

Copyright 2015 Roderick Garton.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.