The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Statistics::Multtest - Control false discovery rate in multiple test problem

SYNOPSIS

  use Statistics::Multtest qw(bonferroni holm hommel hochberg BH BY qvalue);
  use Statistics::Multtest qw(:all);
  use strict;
  
  my $p;
  # p-values can be stored in an array by reference
  $p = [0.01, 0.02, 0.05,0.41,0.16,0.51];
  # @$res has the same order as @$p
  my $res = BH($p);
  print join "\n", @$res;
  
  # p-values can also be stored in a hash by reference
  $p = {"a" => 0.01,
        "b" => 0.02,
        "c" => 0.05,
        "d" => 0.41,
        "e" => 0.16,
        "f" => 0.51 };
  # $res is also a hash reference which is the same as $p
  $res = holm($p);
  foreach (sort {$res->{a} <=> $res->{$b}} keys %$res) {
      print "$_ => $res->{$_}\n";
  }
  
  # since qvalue does not always run successfully,
  # it should be embeded in 'eval'
  $res = eval 'qvalue($p)';
  if($@) {
      print $@;
  }
  else {
      print join "\n", @$res;
  }

DESCRIPTION

For statistical test, p-value is the probability of false positives. While there are many hypothesis for testing simultaneously, the probability of getting at least one false positive would be large. Therefore the origin p-values should be adjusted to decrease the false discovery rate.

Seven procedures to controlling false positive rates is provided. The names of the methods are derived from p.adjust in stat package and qvalue in qvalue package (http://www.bioconductor.org/packages/release/bioc/html/qvalue.html) in R. Code is translated directly from R to Perl using List::Vectorize module.

All seven subroutines receive one argument which can either be an array reference or a hash reference, and return the adjusted p-values in corresponding data structure. The order of items in the array does not change after the adjustment.

Subroutines

bonferroni($pvalue)

Bonferroni single-step process.

hommel($pvalue)

Hommel singlewise process.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383¨C386.

holm($pvalue)

Holm step-down process.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65¨C70.

hochberg($pvalue)

Hochberg step-up process.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800¨C803.

BH($pvalue)

Benjamini and Hochberg, controlling the FDR.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289¨C300.

BY($pvalue)

Use Benjamini and Yekutieli.

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165¨C1188.

qvalue($pvalue, %setup)

Storey and Tibshirani.

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide experiments. Proceedings of the National Academy of Sciences, 100: 9440-9445.

The default method for estimating pi0 in the origin qvalue package is to utilize cubic spline interpolation. However, there is no suitable perl module to do such work (external libraries should be installed if using Math::GSL::Spline and there seems to be some mistakes when I using Math::Spline). Therefore, in this module, we only provide 'bootstrap' method to estimate pi0, which is also the second pi0 method in qvalue package. Some arguments which are the same in qvalue package can be set in %setup as follows.

  lambda => multiply(seq(0, 90, 5), # The value of the tuning parameter
                     0.01),         # to estimate pi_0. Must be in [0,1).
                                    # It should be an array reference
  robust => 0, # An indicator of whether it is desired to make the estimate 
               # more robust for small p-values and a direct finite sample
               # estimate of pFDR

For details, please see the Storey (2003) and the qvalue document in R.

NOTE: The results of this subroutine are not always exactly consistent to the qvalue package due to the floating point number calculation.

In some circumstance, the estimated pi0 <= 0, and the subroutine would throw an error. (try p-value list: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]). So you should embeded this subroutine in 'eval' such as:

  my $qvalue;
  eval '$qvalue = qvalue($pvalue, %setup)';
  if($@) {
    # do something
  }
  else {
      print join ", ", @$qvalue;
  }
               

AUTHOR

Zuguang Gu <jokergoo@gmail.com>

COPYRIGHT AND LICENSE

Copyright 2012 by Zuguang Gu

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

List::Vectorize

1 POD Error

The following errors were encountered while parsing the POD:

Around line 460:

Non-ASCII character seen before =encoding in '383¨C386.'. Assuming CP1252