analyze.pl - batch processor to find terms for lists of genes in various files
This program takes a list of files, each of which contains a list of genes, one gene per line. For each file it finds GO terms for the listed genes in each of the three GO aspects (P, C and F), and writes the results to a file named for the original file, but with a .terms extension. Only terms with a corrected P-value of <= 0.05 are output.
The first argument is the annotation file, the second is the expected number of genes in the organism, the third is the name of the obo file, and all subsequent arguments are files containing lists of genes.
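To make the role of the gene count concrete, here is a minimal sketch of how an uncorrected P-value of this kind can be computed. It assumes the value is a hypergeometric tail probability, which is consistent with the numbers in the example output further down but is not stated in this section. The function name hypergeom_tail and its parameter names are hypothetical, and the sketch does not reproduce the multiple-testing correction.

```python
from math import comb  # requires Python 3.8 or later


def hypergeom_tail(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes,
    when K of the N genes in the genome carry the annotation.

    Assumption: the uncorrected P-value is this hypergeometric tail.
    """
    total = comb(N, n)  # number of ways to draw any n genes from the genome
    upper = min(n, K)   # cannot see more hits than the list size or K
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)  # exactly i annotated genes in the list
        for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Numbers taken from the first term in the example output:
    # 2 of 12 in the list, vs 31 of 7272 in the genome.
    print(hypergeom_tail(2, 12, 31, 7272))
```

For the first term in the example output (2 of 12 in the list, vs 31 of 7272 in the genome) this gives about 0.00113, in line with the reported uncorrected P-value. Note that the example output reports 7272 genes in the genome, so that figure is used here rather than the 7200 passed on the example command line.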
Usage:
analyze.pl <annotation_file> <numGenes> <obofile> <file1> <file2> <file3> ... <fileN>
e.g.
analyze.pl ../t/gene_association.sgd 7200 ../t/gene_ontology_edit.obo genes.txt genes2.txt
An example output file might look like this:
    The following gene(s) will be considered:

    YDL235C YPD1
    YDL224C WHI4
    YDL225W SHS1
    YDL226C GCS1
    YDL227C HO
    YDL228C YDL228C
    YDL229W SSB1
    YDL230W PTP1
    YDL231C BRE4
    YDL232W OST4
    YDL233W YDL233W
    YDL234C GYP7

    Finding terms for P

    Finding terms for C

    Finding terms for F

    -- 1 of 15--
    GOID    GO:0005096
    TERM    GTPase activator activity
    CORRECTED P-VALUE       0.0113038452336839
    UNCORRECTED P-VALUE     0.00113038452336839
    NUM_ANNOTATIONS 2 of 12 in the list, vs 31 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C

    -- 2 of 15--
    GOID    GO:0008047
    TERM    enzyme activator activity
    CORRECTED P-VALUE       0.0316194107645226
    UNCORRECTED P-VALUE     0.00316194107645226
    NUM_ANNOTATIONS 2 of 12 in the list, vs 52 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C

    -- 3 of 15--
    GOID    GO:0005083
    TERM    small GTPase regulatory/interacting protein activity
    CORRECTED P-VALUE       0.0340606972468798
    UNCORRECTED P-VALUE     0.00340606972468798
    NUM_ANNOTATIONS 2 of 12 in the list, vs 54 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C

    -- 4 of 15--
    GOID    GO:0030695
    TERM    GTPase regulator activity
    CORRECTED P-VALUE       0.0475469908576535
    UNCORRECTED P-VALUE     0.00475469908576535
    NUM_ANNOTATIONS 2 of 12 in the list, vs 64 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C
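If the .terms files need to be processed further, the report layout shown above can be read back with a small parser. The following is a rough sketch only: it assumes the field labels appear exactly as in the example (GOID, TERM, CORRECTED P-VALUE, UNCORRECTED P-VALUE, NUM_ANNOTATIONS), it tolerates any whitespace between label and value, and the names RECORD and parse_terms are hypothetical rather than part of GO::TermFinder.

```python
import re

# One report entry, from GOID through the NUM_ANNOTATIONS line.
# \s+ between label and value accepts spaces, tabs or newlines.
RECORD = re.compile(
    r"GOID\s+(?P<goid>GO:\d+)\s+"
    r"TERM\s+(?P<term>.+?)\s+"
    r"CORRECTED P-VALUE\s+(?P<corrected>\S+)\s+"
    r"UNCORRECTED P-VALUE\s+(?P<uncorrected>\S+)\s+"
    r"NUM_ANNOTATIONS\s+(?P<hits>\d+) of (?P<list_size>\d+) in the list, "
    r"vs (?P<annotated>\d+) of (?P<genome>\d+) in the genome",
    re.DOTALL,
)


def parse_terms(text):
    """Return one dict per GO term found in the text of a .terms report."""
    records = []
    for match in RECORD.finditer(text):
        records.append({
            "goid": match.group("goid"),
            "term": match.group("term"),
            "corrected_p": float(match.group("corrected")),
            "uncorrected_p": float(match.group("uncorrected")),
            "hits": int(match.group("hits")),
            "list_size": int(match.group("list_size")),
            "annotated": int(match.group("annotated")),
            "genome": int(match.group("genome")),
        })
    return records
```

A typical use would be to read a report with open("genes.terms").read(), pass the text to parse_terms, and sort or filter the resulting records by corrected_p. The file name genes.terms is only an illustration; use whatever name the program actually wrote.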
Gavin Sherlock, sherlock@genome.stanford.edu