Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jokergoo/statistics-multtest

Control false discovery rate in multiple test
https://github.com/jokergoo/statistics-multtest

Last synced: 23 days ago
JSON representation

Control false discovery rate in multiple test

Awesome Lists containing this project

README

        

## Control false discovery rate in multiple test

### install

```
cpan -i Statistics::Multtest
```

### usage

```perl
use Statistics::Multtest qw(bonferroni holm hommel hochberg BH BY qvalue);
use Statistics::Multtest qw(:all);
use strict;

my $p;
# p-values can be stored in an array by reference
$p = [0.01, 0.02, 0.05,0.41,0.16,0.51];
# @$res has the same order as @$p
my $res = BH($p);
print join "\n", @$res;

# p-values can also be stored in a hash by reference
$p = {"a" => 0.01,
"b" => 0.02,
"c" => 0.05,
"d" => 0.41,
"e" => 0.16,
"f" => 0.51 };
# $res is also a hash reference which is the same as $p
$res = holm($p);
foreach (sort {$res->{a} <=> $res->{$b}} keys %$res) {
print "$_ => $res->{$_}\n";
}

# since qvalue does not always run successfully,
# it should be embeded in 'eval'
$res = eval 'qvalue($p)';
if($@) {
print $@;
}
else {
print join "\n", @$res;
}
```

### description

For statistical test, p-value is the probability of false positives. While there are many hypothesis for testing simultaneously, the probability of getting at least one false positive would be large. Therefore the origin p-values should be adjusted to decrease the false discovery rate.

Seven procedures to controlling false positive rates is provided. The names of the methods are derived from `p.adjust` in `stat` package and `qvalue` in `qvalue` package (http://www.bioconductor.org/packages/release/bioc/html/qvalue.html) in R. Code is translated directly from R to Perl using `List::Vectorize` module.

All seven subroutines receive one argument which can either be an array reference or a hash reference, and return the adjusted p-values in corresponding data structure. The order of items in the array does not change after the adjustment.

### subroutines

- `bonferroni($pvalue)`

Bonferroni single-step process.

- `hommel($pvalue)`

Hommel singlewise process.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383¨C386.

- `holm($pvalue)`

Holm step-down process.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65¨C70.

- `hochberg($pvalue)`

Hochberg step-up process.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800¨C803.

- `BH($pvalue)`

Benjamini and Hochberg, controlling the FDR.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289¨C300.

- `BY($pvalue)`

Use Benjamini and Yekutieli.

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165¨C1188.

- `qvalue($pvalue, %setup)`

Storey and Tibshirani.

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide experiments. Proceedings of the National Academy of Sciences, 100: 9440-9445.

The default method for estimating `pi0` in the origin `qvalue` package is to utilize cubic spline interpolation. However, there is no suitable perl module to do such work (external libraries should be installed if using `Math::GSL::Spline` and there seems to be some mistakes when I using `Math::Spline`). Therefore, in this module, we only provide 'bootstrap' method to estimate `pi0`, which is also the second `pi0` method in `qvalue` package. Some arguments which are the same in `qvalue` package can be set in `%setup` as follows.

```perl
lambda => multiply(seq(0, 90, 5), # The value of the tuning parameter
0.01), # to estimate pi_0. Must be in [0,1).
# It should be an array reference
robust => 0, # An indicator of whether it is desired to make the estimate
# more robust for small p-values and a direct finite sample
# estimate of pFDR
```

For details, please see the Storey (2003) and the qvalue document in R.

NOTE: The results of this subroutine are not always exactly consistent to the `qvalue` package due to the floating point number calculation.

In some circumstance, the estimated `pi0 <= 0`, and the subroutine would throw an error. (try p-value list: `[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]`). So you should embeded this subroutine in 'eval' such as:

```perl
my $qvalue;
eval '$qvalue = qvalue($pvalue, %setup)';
if($@) {
# do something
}
else {
print join ", ", @$qvalue;
}
```