The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

go-db-reasoner.pl

SYNOPSIS

  go-db-reasoner.pl -d mygo -h 127.0.0.1

DESCRIPTION

This script builds the "graph_path" table in the GO Database.

Previously, graph_path was constructed as a closure of the GO graph, ignoring edge labels (relations)

This script takes into account the formal semantics of the properties of relations. Pairs of relations are not traversed unless they explictly compose together. This removes erroneous inferences.

This script works in two steps

First a relation composition table is constructed
Then a forward chaining reasoner is executed, iteratively finding all inferred relations

Completion of relation composition table

As a first step, the script completes the "relation_composition" table. This table may already be pre-populated by normal GO loading if the ontology contains "transitive_over" or "holds_over_chain_tags".

For example, in the Gene Ontology, the "regulates" relation has the property of being transitive_over part_of. This means the relation_composition table will be pre-populated with:

  R1        | R2       | INFERRED
  ----------+----------+----------
  regulates . part_of -> regulates

Transitivity

If R is_transitive, then the following composition is added:

  R . R -> R

Transitivity over and under is_a

If R is an all-some relation, then the following compositions are added:

  R . is_a -> R
  is_a . R -> R

Sub-relations

If Ra is a sub-relation (direct or transitive) of Rb then:

Add a composition Ra . R2 -> R for every composition Rb . R2 -> R
Add a composition R1 . Ra -> R for every composition R1 . Rb -> R

Forward chaining

After the relation_composition table is fully populated, the reasoner will attempt to apply compositions to derived new inferred relations (i.e. entries in graph_path). As a first step, the term2term table is copied in to graph_path

For example, given the following asserted links in term2term:

 A regulates B
 B is_a C
 C is_a D
 D part_of E
 E regulates F

And the relation compositions:

 1. regulates . is_a -> regulates
 2. is_a . part_of -> part_of

The reasoner will infer a graph_path for "A" as follows:

pass 1 (using regulates . is_a)
 A regulates B
 B is_a C
 ---
 A regulates C
pass 2 (using regulates . is_a)
 A regulates C
 C is_a D
 ---
 A regulates D
 
pass 2 (using regulates . part_of)
 A regulates D
 D part_of E
 ---
 A regulates E
 

Note that in this example regulates is not declared transitive, so no path is inferred between A and F

Compositions are applied repeatedly until graph_path is saturated, and no new inferences can be made

Distances

Using the old population method "graph_path.distance" was a count of the number of "hops" along the asserted graph the path takes. This meaning is retained with go-db-reasoner.pl (note that now redundant paths are not calculated)

Now, in addition there is a new table "graph_path.relation_distance". This is the number of steps using the specified relationship type. For example, in the example above, [A regulates E] has distance=4 and relation_distance=1

SEE ALSO

http://wiki.geneontology.org/index.php/Transitive_closure

http://wiki.geneontology.org/index.php/Relation_composition

http://www.geneontology.org/GO.database.schema.shtml#go-optimisations.table.graph-path