NAME

obogaf::parser

SYNOPSIS

    use obogaf::parser;

    my ($graph, $subonto, $stat, $res);

    $graph = build_edges(obofile);

    $subonto = build_subonto(edgesfile, namespace);

    $stat = make_stat(edgesfile, parentIndex, childIndex);

    ($res, $stat) = gene2biofun(annfile, geneIndex, classIndex);

    ($res, $stat) = map_OBOterm_between_release(obofile, annfile, classIndex);

DESCRIPTION

obogaf::parser is a Perl 5 module designed to handle OBO and gene association files.

1. build_edges: extract edges from an obo file.
    my $graph= build_edges(obofile);

obofile: any OBO file listed in the OBO Foundry. The file extension must be ".obo".

output: the graph is returned as a tuple of the form subdomain <tab> source <tab> destination <tab> relationship. That is, the graph is a list of edges, where each edge is a pair of vertices in the form source <tab> destination. For each pair of nodes, the subdomain (if any) and the relationships along which it is safe to group annotations (i.e. is_a and part_of) are returned as well. The graph is stored as an anonymous scalar.
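As a minimal sketch of how the four-column output described above could be consumed, the following splits each edge line into named fields. The sample text stands in for the dereferenced result of build_edges; the namespace names, term IDs and the helper parse_edges are hypothetical and not part of obogaf::parser.

```perl
use strict;
use warnings;

# Illustrative stand-in for the text produced by build_edges;
# the namespaces and term IDs below are made up.
my $edges_text = join "\n",
    "ns_a\tTERM:0000001\tTERM:0000002\tis_a",
    "ns_a\tTERM:0000001\tTERM:0000003\tpart_of",
    "ns_b\tTERM:0000010\tTERM:0000011\tis_a";

# Hypothetical helper: split each line into its four tab-separated fields.
sub parse_edges {
    my ($text) = @_;
    my @edges;
    for my $line (split /\n/, $text) {
        next unless length $line;    # skip empty lines
        my ($subdomain, $source, $destination, $relationship) = split /\t/, $line;
        push @edges, {
            subdomain    => $subdomain,
            source       => $source,
            destination  => $destination,
            relationship => $relationship,
        };
    }
    return \@edges;
}

my $edges = parse_edges($edges_text);
for my $edge (@$edges) {
    print "$edge->{source} -> $edge->{destination} ($edge->{relationship})\n";
}
```

With real data, the text would presumably come from dereferencing the value returned by build_edges (assuming it is a scalar reference) or from reading the saved edges file.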

2. build_subonto: extract edges of a specified sub-ontology domain.
    my $subonto= build_subonto(edgesfile, namespace);

edgesfile: a graph in the form: subdomain <tab> source <tab> destination <tab> relationship. This file can be obtained by calling the subroutine build_edges.

namespace: name of the subontology for which the edges must be extracted.

output: the graph is returned as a tuple of the form source <tab> destination <tab> relationship. In other words, the graph is a list of edges, where each edge is a pair of vertices in the form source <tab> destination. For each pair of nodes, the relationships is_a and part_of are also returned. The graph is stored as an anonymous scalar.
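The effect described above can be pictured as a filter on the first column. The sketch below is not the module's implementation: filter_subonto is a hypothetical helper, and the input is made-up text in the four-column layout. It keeps the rows whose subdomain equals the requested namespace and drops that column.

```perl
use strict;
use warnings;

# Hypothetical helper: keep rows whose first column equals $namespace
# and return the remaining three columns (source, destination, relationship).
sub filter_subonto {
    my ($edges_text, $namespace) = @_;
    my @kept;
    for my $line (split /\n/, $edges_text) {
        my ($subdomain, @rest) = split /\t/, $line;
        next unless defined $subdomain && $subdomain eq $namespace;
        push @kept, join("\t", @rest);
    }
    return join("\n", @kept);
}

# Made-up input in the subdomain/source/destination/relationship layout.
my $edges_text = join "\n",
    "ns_a\tTERM:0000001\tTERM:0000002\tis_a",
    "ns_b\tTERM:0000010\tTERM:0000011\tis_a",
    "ns_a\tTERM:0000001\tTERM:0000003\tpart_of";

my $sub_text = filter_subonto($edges_text, "ns_a");
print "$sub_text\n";
```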

3. make_stat: compute basic statistics on a graph.
    my $stat= make_stat(edgesfile, parentIndex, childIndex);

edgesfile: a graph represented as a list of edges, where each edge is stored as a tab-separated pair of vertices. This file can be obtained by calling the subroutine build_edges.

parentIndex: index of the column containing the parent (source) vertices in the edgesfile.

childIndex: index of the column containing the child (destination) vertices in the edgesfile.

output: statistics about the graph are printed to the shell. More precisely, the degree, in-degree and out-degree of each vertex are printed. The vertices are sorted in decreasing order of degree. Finally, the following statistics are returned as well: 1) number of nodes and edges of the graph; 2) maximum and minimum degree; 3) average and median degree; 4) density of the graph.
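To make the listed statistics concrete, here is a rough sketch that computes them from a small made-up edge list. The helper degree_stats is hypothetical, and the density formula E / (V * (V - 1)) for a directed graph is an assumption of this sketch; make_stat may define density differently.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [parent, child] pairs and returns
# per-vertex degrees plus summary statistics.
sub degree_stats {
    my (@edges) = @_;
    return unless @edges;
    my (%in, %out, %degree);
    for my $edge (@edges) {
        my ($parent, $child) = @$edge;
        $out{$parent}++;      # edge leaves the parent
        $in{$child}++;        # edge enters the child
        $degree{$parent}++;   # degree = in-degree + out-degree
        $degree{$child}++;
    }
    my @sorted = sort { $a <=> $b } values %degree;
    my $n   = scalar @sorted;    # number of vertices
    my $m   = scalar @edges;     # number of edges
    my $sum = 0;
    $sum += $_ for @sorted;
    my $median = $n % 2
        ? $sorted[ ($n - 1) / 2 ]
        : ($sorted[ $n / 2 - 1 ] + $sorted[ $n / 2 ]) / 2;
    return {
        nodes   => $n,
        edges   => $m,
        min     => $sorted[0],
        max     => $sorted[-1],
        average => $sum / $n,
        median  => $median,
        # Assumed definition of density for a directed graph.
        density => $n > 1 ? $m / ($n * ($n - 1)) : 0,
        degree  => \%degree,
        in      => \%in,
        out     => \%out,
    };
}

# Made-up graph: A -> B, A -> C, B -> D.
my $stats = degree_stats([ "A", "B" ], [ "A", "C" ], [ "B", "D" ]);

# Print vertices from the highest degree to the lowest.
my $deg = $stats->{degree};
for my $v (sort { $deg->{$b} <=> $deg->{$a} || $a cmp $b } keys %$deg) {
    printf "%s\tdegree=%d\tin=%d\tout=%d\n",
        $v, $deg->{$v}, $stats->{in}{$v} // 0, $stats->{out}{$v} // 0;
}
```

The sketch takes in-memory pairs rather than a file with column indices, purely to keep the example self-contained.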

4. gene2biofun: build an annotation adjacency list.
    my ($res, $stat)= gene2biofun(annfile, geneIndex, classIndex);

annfile: an annotation file, either plain text (".txt") or compressed (".gz"). An example of the format can be taken from the GOA website (files with the ".gaf.gz" extension) or the HPO website. More generally, any file structured like those can be used (basically a ".csv" file using <tab> as separator).

geneIndex: index referring to the column containing the samples (genes/proteins).

classIndex: index referring to the column containing the ontology terms.

output: a list of two anonymous references. The first is an anonymous hash storing, for each gene (or protein), all the associated ontology terms (pipe separated). The second is an anonymous scalar containing basic statistics, such as the total number of unique genes/proteins and of annotated ontology terms.
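The shape of the first returned value (one pipe-separated string of terms per gene) can be illustrated with a short sketch. The helper build_adjacency, the zero-based column indices and the sample rows are all assumptions made for illustration; they do not describe how gene2biofun itself is written.

```perl
use strict;
use warnings;

# Hypothetical helper: build gene => "term1|term2|..." from tab-separated
# annotation lines. Column indices are zero-based in this sketch.
sub build_adjacency {
    my ($lines, $gene_index, $class_index) = @_;
    my %terms_of;
    for my $line (@$lines) {
        next if $line =~ /^!/;    # GAF files mark comment lines with "!"
        my @fields = split /\t/, $line;
        my ($gene, $term) = @fields[ $gene_index, $class_index ];
        next unless defined $gene && defined $term;
        $terms_of{$gene}{$term} = 1;    # inner hash removes duplicate pairs
    }
    my %adjacency;
    for my $gene (keys %terms_of) {
        $adjacency{$gene} = join "|", sort keys %{ $terms_of{$gene} };
    }
    return \%adjacency;
}

# Made-up annotation rows: database, gene, ontology term.
my @annotations = (
    "!illustrative comment line",
    "db\tGENE1\tTERM:0000002",
    "db\tGENE1\tTERM:0000003",
    "db\tGENE2\tTERM:0000002",
    "db\tGENE1\tTERM:0000002",
);

my $adjacency = build_adjacency(\@annotations, 1, 2);
print "$_\t$adjacency->{$_}\n" for sort keys %$adjacency;
```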

5. map_OBOterm_between_release: map ontology terms between different releases.
    my ($res, $stat)= map_OBOterm_between_release(obofile, annfile, classIndex);

obofile: an obo file (a new release). This file is used to make the alt_id - id pairing, by using alt_id as key. The file extension must be ".obo".

annfile: an annotation file (an old release). The file extension can be either plain format (".txt") or compressed (".gz").

classIndex: index referring to the column of the annfile containing the ontology terms to be mapped.

output: a list of two anonymous references. The first is an anonymous scalar storing the annotation file in the same format as the input file, but with the obsolete ontology terms replaced by the updated ones. The second is an anonymous scalar containing some basic statistics, such as the total number of unique ontology terms and the total number of mapped and unmapped alt_id ontology terms. Finally, all the alt_id - id pairs found are returned (if any).
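The alt_id - id pairing described above can be sketched in two steps: collect the pairs from the OBO stanzas, then substitute them in the term column of the annotation rows. The helpers alt_id_map and remap_terms, the zero-based column index and all sample identifiers are hypothetical; this is an illustration of the idea, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: collect alt_id => id pairs. Within a stanza, the
# "id:" tag gives the primary identifier and each "alt_id:" tag an alias.
sub alt_id_map {
    my ($obo_text) = @_;
    my (%primary_of, $current_id);
    for my $line (split /\n/, $obo_text) {
        if ($line =~ /^\[/) {
            undef $current_id;    # a new stanza starts
        }
        elsif ($line =~ /^id:\s*(\S+)/) {
            $current_id = $1;
        }
        elsif ($line =~ /^alt_id:\s*(\S+)/ && defined $current_id) {
            $primary_of{$1} = $current_id;
        }
    }
    return \%primary_of;
}

# Hypothetical helper: replace the term in column $class_index (zero-based)
# whenever it is a known alt_id, and count the replacements.
sub remap_terms {
    my ($lines, $class_index, $primary_of) = @_;
    my $mapped = 0;
    my @out;
    for my $line (@$lines) {
        my @fields = split /\t/, $line, -1;
        my $term = $fields[$class_index];
        if (defined $term && exists $primary_of->{$term}) {
            $fields[$class_index] = $primary_of->{$term};
            $mapped++;
        }
        push @out, join("\t", @fields);
    }
    return (\@out, $mapped);
}

# Made-up OBO fragment and annotation rows.
my $obo_text = join "\n",
    "[Term]",
    "id: TERM:0000002",
    "name: example term",
    "alt_id: TERM:0000099",
    "",
    "[Term]",
    "id: TERM:0000003";

my @old_annotations = ("GENE1\tTERM:0000099", "GENE2\tTERM:0000003");

my $pairs = alt_id_map($obo_text);
my ($remapped, $count) = remap_terms(\@old_annotations, 1, $pairs);
print "$_\n" for @$remapped;
print "mapped alt_id terms: $count\n";
```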

BUGS

Please report any bugs here.

COPYRIGHT

Copyright (C) 2019 Marco Notaro, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR

Marco Notaro (https://marconotaro.github.io)

SEE ALSO

A step-by-step tutorial showing how to apply obogaf::parser to real case studies in Computational Biology and Precision Medicine is available at https://github.com/marconotaro/obogaf-parser.