The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Statistics::Table::F - Perl module for computing the statistical F-ratio

SYNOPSIS

  use Statistics::Table::F;

  if (($F = anova($list_of_lists)) >= F(@$list_of_lists - 1,
                                        count_elements($list_of_lists - @$list_of_lists),
                                        0.05)) {
      print "F is $F; the difference between your data sets is significant.\n";
  } else {
      print "F is $F; the difference between your data sets is not significant.\n";
  }
                                                       

DESCRIPTION

See Orwant, Hietaniemi, and Macdonald, Mastering Algorithms in Perl, O'Reilly 1999. From Chapter 15:

The significance tests covered so far can only pit one group against another. Sure, we could do a t-test of every possible pair of web design firms, but we'd have trouble integrating the results.

An analysis of variance, or ANOVA, is necessary when you need to consider not just the variance of one data set but the variance between data sets. The sign, ChiSquare, and t-tests all involved computing intrasample descriptive statistics; we'd speak of the means and variances of individual samples. Now we can jump up a level of abstraction and start thinking of entire data sets as elements in a larger data set -- a data set of data sets.

For our test of web designs, our null hypothesis is that the design has no effect on the size of the average sale. Our alternative is simply that some design is different from the rest. This isn't a very strong statement; we'd like a little matrix that show us how each design compares to one another and to no design at all. Unfortunately, ANOVA can't do that.

The key to the particular analysis of variance we'll study here, a one-way ANOVA, is computing the F-ratio. The F-ratio is defined as the mean square between (the variance between the means of each data set) divided by the mean square within (the mean of the variance estimates). This is the most complex significance test we've seen so far. Here's a Perl program that computes the analysis of variance for all four designs. Note that since ANOVA is ideal for multiple data sets with varying numbers of elements, we choose a data structure to reflect that: $designs, a list of lists.

AUTHOR

Jon Orwant, orwant@media.mit.edu

SEE ALSO

Statistics::ChiSquare, Statistics::Table::t