Endre Sebestyen
and 1 contributors

NAME

Bio::DOOP::Util::Run::GeneMerge - GeneMerge based GO analyzer

VERSION

Version 0.02

SYNOPSIS

  #!/usr/bin/perl -w

  use strict;
  use Bio::DOOP::DOOP;

  my $test = Bio::DOOP::Util::Run::GeneMerge->new();

  # In this example a negative return value is treated as an error.
  if ($test->getDescFile("GO/use/GO.BP.use") < 0) {
     print "Desc error\n";
  }

  if ($test->getAssocFile("GO/assoc/A_thaliana.converted.BP") < 0) {
     print "Assoc error\n";
  }

  if ($test->getPopFile("GO/pop.500") < 0) {
     print "Pop error\n";
  }

  if ($test->getStudyFile("GO/study.500/combined1314.list") < 0) {
     print "Study error\n";
  }

  my $results = $test->getResults();

  foreach my $res (@{$results}) {
     print $$res{'GOterm'}, " ", $$res{'RawEs'}, "\n";
  }

DESCRIPTION

This is a module based on GeneMerge v1.2.

Original program described in:

Cristian I. Castillo-Davis and Daniel L. Hartl: GeneMerge - post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, Vol. 19, no. 7, 2003, pages 891-892.

The original program is poorly suited to large-scale analysis because its design relies on a lot of file I/O. This version loads all input data into memory before the analysis.

AUTHORS

Tibor Nagy, Godollo; Endre Sebestyen, Martonvasar

METHODS

new

Creates a new GeneMerge object.

   $genemerge = Bio::DOOP::Util::Run::GeneMerge->new;

getAssocFile

The method loads the GO association file and stores it in memory. The file format is the following. Each line starts with a cluster id, and after some whitespace the associated GO ids are enumerated, separated by semicolons.

   81001020 GO:0016020;GO:0003674;GO:0008150
   81001110 GO:0005739;GO:0003674

   $genemerge->getAssocFile('/tmp/assoc.txt');
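The format described above can be read with a few lines of Perl. The following is only a minimal sketch of such a parser, not the module's actual code; the helper name `parse_assoc_lines` and the returned data structure are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DOOP): turn association lines
# into a hash reference mapping cluster id => array ref of GO ids.
sub parse_assoc_lines {
    my @lines = @_;                      # copy, so the input is not modified
    my %assoc;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;         # trim whitespace and the newline
        next if $line eq '';             # skip blank lines
        my ($cluster, $terms) = split /\s+/, $line, 2;
        next unless defined $terms;      # cluster without any GO ids
        $assoc{$cluster} = [ grep { length } split /;/, $terms ];
    }
    return \%assoc;
}

my $assoc = parse_assoc_lines(
    "81001020 GO:0016020;GO:0003674;GO:0008150\n",
    "81001110 GO:0005739;GO:0003674\n",
);
print join(",", @{ $assoc->{81001110} }), "\n";   # GO:0005739,GO:0003674
```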

getPopFile

The method loads the population file and stores it in memory. The file format is the following. Each line contains one and only one cluster id.

   81001020
   81001110

   $genemerge->getPopFile('/tmp/pop.txt');

popFreq

The method calculates the population frequency. Do not use it directly.
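For orientation, a population frequency can be understood as the number of population clusters annotated with each GO id. The sketch below illustrates that idea under stated assumptions; `count_pop_freq` and its argument layout are hypothetical and do not reflect how the module stores its data internally.

```perl
use strict;
use warnings;

# Hypothetical helper: for each GO id, count how many clusters of the
# population carry it. $assoc maps cluster id => array ref of GO ids,
# $pop is an array ref of cluster ids.
sub count_pop_freq {
    my ($assoc, $pop) = @_;
    my %count;
    for my $cluster (@$pop) {
        my $terms = $assoc->{$cluster} or next;   # cluster has no annotation
        my %seen;                                 # count each GO id once per cluster
        $count{$_}++ for grep { !$seen{$_}++ } @$terms;
    }
    return \%count;
}

my %assoc = (
    81001020 => [ 'GO:0016020', 'GO:0003674', 'GO:0008150' ],
    81001110 => [ 'GO:0005739', 'GO:0003674' ],
);
my @pop  = (81001020, 81001110);
my $freq = count_pop_freq(\%assoc, \@pop);
printf "%s %d/%d\n", $_, $freq->{$_}, scalar @pop for sort keys %$freq;
```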

getDescFile

The method loads the GO description file. The file format is the following. Each line starts with a GO id, followed by a tab and the description of that GO id.

   GO:0000007	low-affinity zinc ion transporter activity
   GO:0000008	thioredoxin

   $genemerge->getDescFile('/tmp/desc.txt');
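Because descriptions contain spaces, such a file has to be split on the first tab rather than on arbitrary whitespace. A minimal sketch, assuming the tab-separated layout described above (the helper `parse_desc_lines` is hypothetical and not part of the module):

```perl
use strict;
use warnings;

# Hypothetical helper: map GO id => description from tab-separated lines.
sub parse_desc_lines {
    my @lines = @_;                      # copy, so the input is not modified
    my %desc;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;        # skip blank lines
        # Split on the first tab only; the description may contain spaces.
        my ($go_id, $text) = split /\t/, $line, 2;
        $desc{$go_id} = defined $text ? $text : '';
    }
    return \%desc;
}

my $desc = parse_desc_lines(
    "GO:0000007\tlow-affinity zinc ion transporter activity\n",
    "GO:0000008\tthioredoxin\n",
);
print $desc->{'GO:0000008'}, "\n";       # thioredoxin
```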

getStudyFile

The method loads the study data set, counts GO frequencies, calculates P values based on the hypergeometric distribution, and corrects P values, based on the Bonferroni method.

The file format of the study file is the following. Each line contains one and only one cluster id.

   81001020
   81001110

   $genemerge->getStudyFile('/tmp/study.txt');
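The Bonferroni step mentioned above multiplies each raw P value by the number of tests performed. The following sketch shows only that step; the helper `bonferroni`, the choice of the number of tests, and the capping at 1 are assumptions for illustration, not a description of the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: Bonferroni-correct a hash of raw P values.
# $raw maps GO id => raw P value; $n_tests is the number of tests
# (assumed here to be the number of GO ids tested).
sub bonferroni {
    my ($raw, $n_tests) = @_;
    my %corrected;
    for my $term (keys %$raw) {
        my $p = $raw->{$term} * $n_tests;
        $corrected{$term} = $p > 1 ? 1 : $p;   # assumption: cap at 1
    }
    return \%corrected;
}

# Made-up raw P values, used only to show the call.
my %raw = ( TERM_A => 0.125, TERM_B => 0.5 );
my $corrected = bonferroni(\%raw, scalar keys %raw);
printf "%s %.3f\n", $_, $corrected->{$_} for sort keys %$corrected;
```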

getResults

The method returns all results as a reference to an array of hashes.

  $results = $genemerge->getResults();
  foreach $result (@{$results}) {
    $goterm       = $$result{'GOterm'};
    $popfreq      = $$result{'PopFreq'};
    $popfrac      = $$result{'PopFrac'};
    $studyfrac    = $$result{'StudyFrac'};
    $studyfracall = $$result{'StudyFracAll'};
    $raw_escore   = $$result{'RawEs'};
    $escore       = $$result{'EScore'};
    $desc         = $$result{'Desc'};
    @contrib      = @{$$result{'Contrib'}};
  }
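A common follow-up is to keep only the significant terms. The sketch below works on any array reference shaped like the one above; it assumes that `EScore` holds the corrected score and may be non-numeric for some terms, and the helper `significant_results`, the cutoff, and the mock data are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: keep results whose EScore is numeric and below
# $cutoff, sorted from the smallest score to the largest.
sub significant_results {
    my ($results, $cutoff) = @_;
    return [
        sort { $a->{EScore} <=> $b->{EScore} }
        grep { looks_like_number($_->{EScore}) && $_->{EScore} < $cutoff }
        @$results
    ];
}

# Mock data with made-up values, shaped like the getResults() output.
my $mock = [
    { GOterm => 'TERM_A', EScore => 0.2,   Desc => 'example term A' },
    { GOterm => 'TERM_B', EScore => 0.001, Desc => 'example term B' },
    { GOterm => 'TERM_C', EScore => 'NA',  Desc => 'example term C' },
];
for my $hit (@{ significant_results($mock, 0.05) }) {
    print join("\t", $hit->{GOterm}, $hit->{EScore}, $hit->{Desc}), "\n";
}
```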

hypergeometric

This is an internal function to calculate the hypergeometric distribution. Do not use it directly.

logNchooseK

Another internal function; as its name suggests, it calculates the logarithm of the binomial coefficient (n choose k) needed for the statistics. Do not use it directly.

lFactorial

Factorial calculating function. Do not use it directly.
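To show how helpers of this kind usually fit together, here is a minimal sketch of a hypergeometric upper-tail probability computed in log space, which avoids overflow from large factorials. It is not the module's code: the function names and the argument order are assumptions chosen for this example.

```perl
use strict;
use warnings;

# log(n!) as a sum of logarithms.
sub log_factorial {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 2 .. $n;
    return $sum;
}

# log of the binomial coefficient "n choose k".
sub log_choose {
    my ($n, $k) = @_;
    return log_factorial($n) - log_factorial($k) - log_factorial($n - $k);
}

# P(X >= k) when drawing $n clusters from a population of $N clusters,
# $K of which carry the GO id of interest.
sub hypergeom_tail {
    my ($N, $K, $n, $k) = @_;
    my $lo = $n - ($N - $K);             # fewest annotated draws possible
    $lo = $k if $k > $lo;
    $lo = 0  if $lo < 0;
    my $hi = $n < $K ? $n : $K;          # most annotated draws possible
    my $p  = 0;
    for my $i ($lo .. $hi) {
        $p += exp( log_choose($K, $i)
                 + log_choose($N - $K, $n - $i)
                 - log_choose($N, $n) );
    }
    return $p;
}

# Example call with made-up counts.
printf "%.6g\n", hypergeom_tail(500, 40, 25, 6);
```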