 NAME
 SYNOPSIS
 DESCRIPTION
 METHODS
 EXAMPLE
 REFERENCES
 SEE ALSO
 LIMITATIONS/TO DO
 REVISION HISTORY
 AUTHOR/LICENSE
NAME
Statistics::FisherPitman - Randomization-based alternative to one-way independent groups ANOVA; unequal variances okay
SYNOPSIS
use Statistics::FisherPitman 0.034;
my @dat1 = (qw/12 12 14 15 12 11 15/);
my @dat2 = (qw/13 14 18 19 22 21 26/);
my $fishpit = Statistics::FisherPitman->new();
$fishpit->load({d1 => \@dat1, d2 => \@dat2});
# Oh, more data just came in:
my @dat3 = (qw/11 7 7 2 19 19/);
$fishpit->add({d3 => \@dat3});
my $T = $fishpit->t_value();
# now go to monte carlo to get a p for your T
# or get a t_value and p_value in one by randomization test:
$fishpit->p_value(resamplings => 1000)->dump(title => "A test");
DESCRIPTION
Tests for a difference between independent samples. It is commonly recommended as an alternative to the one-way independent groups ANOVA when variances are unequal, as its test-statistic, T, is not dependent on an estimate of variance. As a randomization test, it is "distribution-free", with the probability of obtaining the observed value of T being derived from the data themselves.
METHODS
new
$fishpit = Statistics::FisherPitman->new()
Class constructor; expects nothing.
load
$fishpit->load('aname', @data1)
$fishpit->load('aname', \@data1)
$fishpit->load({'aname' => \@data1, 'another_name' => \@data2})
Alias: load_data
Accepts either (1) a single name => value pair of a sample name, and a list (referenced or not) of data; or (2) a hash reference of named array references of data. The data are loaded into the class object by name, within a hash named data, as Statistics::Descriptive::Full objects. So you can easily get at any descriptives for the groups you've loaded (e.g., $fishpit->{'data'}->{'aname'}->mean()), or you could get at the data again by going $fishpit->{'data'}->{'aname'}->get_data(); and so on. The names of the data are up to you.
Each call unloads any previous loads.
Returns the Statistics::FisherPitman object.
add
$fishpit->add('another_name', @data2)
$fishpit->add('another_name', \@data2)
$fishpit->add({'another_name' => \@data2})
Alias: add_data
Same as load except that any previous loads are not unloaded.
unload
$fishpit->unload();
Empties all cached data and calculations upon them, ensuring these will not be used for testing. This is called automatically with each new load, but, to guard against any changes in later development, it could be good practice to call it yourself whenever switching from one dataset for testing to another.
t_value
$fishpit->t_value()
Returns a Fisher-Pitman T-value for the loaded data, and stores the value in the class object under the key t_value.
T is calculated as follows:

    T = \sum_{i=1}^{g} n_{i} \bar{x}_{i}^{2}

where n_{i} is the number of observations in each i of g samples, and

    \bar{x}_{i} = \frac{1}{n_{i}} \sum_{j=1}^{n_{i}} x_{ij}

is the mean of sample i (summing over each observation j in sample i).
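As a rough illustration of the formula for T, here is a minimal sketch in plain Perl. The helper name fisher_pitman_t is hypothetical and not part of the module's interface; it assumes a hash reference of named array references, as accepted by load.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not the module's own code): takes a hash reference
# of named array references and returns T = SUM n_i * mean_i**2.
sub fisher_pitman_t {
    my ($groups) = @_;
    my $t = 0;
    for my $name (keys %{$groups}) {
        my @obs = @{ $groups->{$name} };
        my $n   = scalar @obs;
        next unless $n;    # skip empty samples
        my $mean = sum(@obs) / $n;
        $t += $n * $mean**2;
    }
    return $t;
}

my $t = fisher_pitman_t({ a => [1, 2, 3], b => [4, 5, 6] });
print "T = $t\n";    # means are 2 and 5, so T = 3*4 + 3*25 = 87
```

Note that T involves only the sample sizes and means; no variance estimate enters the calculation.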
p_value
$fishpit->p_value(resamplings => 'non-negative number')
Alias: test
With a positive value for resamplings, the loaded data are shuffled that many times, and the T-value is calculated for each resampling. The proportion of T-values in these resamplings that are greater than or equal to the T-value of the original data, as loaded, is the p_value on which to base significance considerations.
The randomization test simply pools all the data and, for each resampling, gives them a Fisher-Yates shuffle and redistributes them into as many groups, with the same sample-sizes, as in the original dataset.
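The procedure just described might be sketched as follows, using only core Perl. This is an illustration under assumptions, not the module's own code: the helper names t_stat and randomization_p are hypothetical, groups are assumed to be non-empty, and List::Util's shuffle stands in for the shuffling step. A small tolerance is added to the comparison to allow for floating-point rounding.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# T for a list of array references (one per group; groups assumed non-empty).
sub t_stat {
    my @groups = @_;
    my $t = 0;
    for my $g (@groups) {
        my $n = scalar @{$g};
        $t += $n * (sum(@{$g}) / $n)**2;
    }
    return $t;
}

# Hypothetical helper: estimated p-value from $resamplings random shuffles.
sub randomization_p {
    my ($resamplings, @groups) = @_;
    my $t_obs = t_stat(@groups);
    my @sizes = map { scalar @{$_} } @groups;    # original sample sizes
    my @pool  = map { @{$_} } @groups;           # all observations pooled
    my $tol   = 1e-12 * abs($t_obs);             # allowance for rounding
    my $n_ge  = 0;
    for (1 .. $resamplings) {
        my @shuffled  = shuffle(@pool);
        # deal the shuffled pool back out into groups of the original sizes
        my @resampled = map { [ splice(@shuffled, 0, $_) ] } @sizes;
        $n_ge++ if t_stat(@resampled) >= $t_obs - $tol;
    }
    return $n_ge / $resamplings;
}

my $p = randomization_p(1000, [qw/12 12 14 15 12 11 15/], [qw/13 14 18 19 22 21 26/]);
printf "p = %.3f\n", $p;
```

Because the shuffles are random, the estimated p varies a little from run to run; more resamplings narrow that variation.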
The class object is fed the values for t_value and p_value. A 95% confidence interval for the true proportion (p-value) is also calculated and stored as a two-element array named conf_int. The method returns only the object itself. So you can get at these values thus:
print "T = $fishpit->{'t_value'}, p = $fishpit->{'p_value'}\n";
print '95% confidence interval for the proportion of Ts greater than or equal to the observed value ranges from ';
print "$fishpit->{'conf_int'}->[0] to $fishpit->{'conf_int'}->[1].\n";
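How the module derives its confidence interval is not stated here. Purely as an assumption for illustration, one common choice for an estimated proportion is the normal-approximation interval, p ± 1.96 * sqrt(p(1 - p)/N), where N is the number of resamplings. The helper name wald_ci_95 below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumed method, not necessarily the module's):
# normal-approximation 95% interval for a proportion $p estimated from
# $n resamplings, clipped to the range [0, 1].
sub wald_ci_95 {
    my ($p, $n) = @_;
    my $half = 1.96 * sqrt($p * (1 - $p) / $n);
    my $lo = $p - $half;
    $lo = 0 if $lo < 0;
    my $hi = $p + $half;
    $hi = 1 if $hi > 1;
    return [$lo, $hi];
}

my $ci = wald_ci_95(0.5, 100);
printf "95%% CI: %.3f, %.3f\n", @{$ci};    # 0.402, 0.598
```

The interval narrows roughly with the square root of the number of resamplings, which is one reason to use many of them when p lies near a decision threshold.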
dump
$fishpit->dump(title => 'A test of something', conf_int => 1|0, precision_p => integer)
Prints a line to STDOUT of the form T = t_value, p = p_value. Above this string, a title can also be printed, by giving a value to the optional title argument. The 95% confidence interval, and the precision of the p-value(s), can also be optionally dumped, as above. Ends with a line-break, i.e., "\n".
string
$fishpit->string(conf_int => 1|0, precision_p => integer)
Returns a line of the form T = t_value, p = p_value, to the precision specified (if any), and, optionally, with the confidence-interval for the p-value appended.
EXAMPLE
This example is taken from Berry & Mielke (2002); see ex/fishpit.pl in the installation dist for implementation. The following (real) data are lead (Pb) values (in mg/kg) of soil samples from two districts in New Orleans, one from school grounds, another from surrounding streets. Was there a significant difference in lead levels between the samples? The variances were determined to be unequal, and the Fisher-Pitman test put to the question. As there were over 100 billion possible permutations of the data, a large number of resamplings was used: 10 million.
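The count of possible permutations can be checked with a short calculation, assuming "permutations" here means distinct allocations of the 40 pooled observations to two groups of 20, i.e., the multinomial coefficient N!/(n_1! ... n_g!). The helper names choose and count_allocations are hypothetical.

```perl
use strict;
use warnings;

# Binomial coefficient C(n, k), built up so each intermediate value is an integer.
sub choose {
    my ($n, $k) = @_;
    $k = $n - $k if $k > $n - $k;    # use the smaller of k and n - k
    my $c = 1;
    $c = $c * ($n - $k + $_) / $_ for 1 .. $k;
    return $c;
}

# Hypothetical helper: number of distinct ways to allocate the pooled
# observations to groups of the given sizes (a multinomial coefficient).
sub count_allocations {
    my @sizes = @_;
    my $remaining = 0;
    $remaining += $_ for @sizes;
    my $count = 1;
    for my $n (@sizes) {
        $count *= choose($remaining, $n);
        $remaining -= $n;
    }
    return $count;
}

printf "%.0f\n", count_allocations(20, 20);    # 137846528820
```

For larger or more numerous groups this count quickly exceeds what can be enumerated, or even represented exactly in floating point, which is why resampling is used in place of exhaustive permutation.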
The following shows how the test would be performed with the present module; using a smaller number of resamplings produces much the same result. A test of equality of variances is also shown.
my $data = {
dist1 => [qw/16.0 34.3 34.6 57.6 63.1 88.2 94.2 111.8 112.1 139.0 165.6 176.7 216.2 221.1 276.7 362.8 373.4 387.1 442.2 706.0/],
dist2 => [qw/4.7 10.8 35.7 53.1 75.6 105.5 200.4 212.8 212.9 215.2 257.6 347.4 461.9 566.0 984.0 1040.0 1306.0 1908.0 3559.0 21679.0/],
};
# First test equality of variances:
require Statistics::ANOVA;
my $anova = Statistics::ANOVA->new();
$anova->load_data($data);
$anova->levene_test()->dump();
# This prints: F(1, 38) = 4.87100593921132, p = 0.0344251996755789
# As this suggests significantly different variances ...
require Statistics::FisherPitman;
my $fishpit = Statistics::FisherPitman->new();
$fishpit->load_data($data);
$fishpit->test(resamplings => 10000)->dump(conf_int => 1, precision_p => 3);
# This prints, e.g.: T = 56062045.0525, p = 0.014 (95% CI: 0.011, 0.016)
Hence a difference is indicated, which can be identified from the means. The data being cached as Statistics::Descriptive objects (see load), the means can be got at thus:
print "District 1 mean = ", $fishpit->{'data'}->{'dist1'}->mean(), "\n"; # 203.935
print "District 2 mean = ", $fishpit->{'data'}->{'dist2'}->mean(), "\n"; # 1661.78
So beware District 2, it seems. The module naturally produces the same T-value as reported by Berry and Mielke, and they obtained p = .0148 from their 10 million resamplings.
Pointing to the value of the test, Berry and Mielke also showed that common alternatives for the unequal variances situation (such as the pooled variance t-test for independent samples, and one-way ANOVA with logarithmic transformation of the data) failed to detect a significant difference between the samples; not a negligible failure given the social health implications.
REFERENCES
Berry, K. J., & Mielke, P. W., Jr. (2002). The Fisher-Pitman permutation test: An attractive alternative to the F test. Psychological Reports, 90, 495-502.
SEE ALSO
Statistics::ANOVA - First test your independent groups data with the Levene's or O'Brien's equality of variances test in this package to see if they satisfy assumptions of the ANOVA; if not, happily use Fisher-Pitman instead.
LIMITATIONS/TO DO
Optimisation welcomed.
Automatically set the number of resamplings based on the number of possible permutations.
Randomization procedure can always be improved.
REVISION HISTORY
See CHANGES in installation distribution.
AUTHOR/LICENSE
 Copyright (c) 2006-2009 Roderick Garton

rgarton AT cpan DOT org
This program is free software. It may be used, redistributed and/or modified under the same terms as Perl 5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).
 Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
This ends documentation for a Perl implementation of the Fisher-Pitman permutation test alternative to one-way ANOVA.