# NAME

Bio::ConnectDots::SimpleGraph

# SYNOPSIS

```
use SimpleGraph;
my $graph=new Bio::ConnectDots::SimpleGraph;
# read pairs of nodes from STDIN
while (<>) {
my($node1,$node2)=split;
$graph->add_edge($node1,$node2);
}
my @nodes=graph->nodes; # get list of nodes
my @edges=graph->edges; # get list of edges
for each $node (@nodes) {
my @neighbors=$node->neighbors; # get list of neighboring nodes
}
```

# DESCRIPTION

This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken. The package implements a small and eclectic assortment of standard graph algorithms that we happened to need for our applications.

This module is a subclass of Class::AutoClass (available at CPAN). AutoClass auotgenerates simple accessor and mutator methods (aka get and set methods). It also automates class initialization.

Nodes can be any Perl values, including object references. Edges are pairs of nodes.

(Caveat: be careful with values that contain embedded instances of $; (the character Perl uses to separate components of multi-dimensional subscripts), because we use this in the text representation of edges.

The main data structures are:

```
An edge (x,y) is represented canonically as a two element list in
which the lexically smaller value is first. Eg, the node ('b','a')
is represented as ['a','b'].
The graph contains
1) A hash mapping the text representation of a node to the node
itself. This is mostly relevant when the node is a reference.
2) A hash mapping the text representation of a node to a list of the node's neighbors.
3) A hash mapping the text representation of an edge to the edge itself.
```

# KNOWN BUGS AND CAVEATS

This is still a work in progress.

# AUTHOR - Nat Goodman

Email natg@shore.net

# COPYRIGHT

Copyright (c) 2003 Institute for Systems Biology (ISB). All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

# APPENDIX

## Conventions for nodes and edges

A node can be any Perl values, including an object, ARRAY, or HASH reference. When nodes are references, the software often works with the text representaion of the reference, ie, what you get if you print the reference. This can be confusing. Sorry. For example if a node is the HASH

` {name=>'caspase-9',symbol=>'CASP9'}`

The text representation would be something like

` HASH(0x804c830)`

When nodes are scalar values, eg, a string, the value and the text representation are the same. This is a common case in test programs and examples, but less common in real applications.

An edge is represented internally as an ARRAY ref of two nodes, in which the lexically smaller value is first. Actually, the first node is the one whose text representation is lexically smaller.

When passing edges as arguments to SimpleGraph methods, the edge can be represented in several ways.

```
1) An ARRAY ref of the nodes, eg, ['a','b'].
2) A list of the two nodes, eg, ('a','b')
3) Form (1) or (2) using the text represention of the node instead of the node itself
```

You needn't worry about which node is lexically smaller. SimpleGraph performs this calculation internally.

When SimpleGraph returns edges as results, they are always in form (1), ie, as ARRAY refs of nodes in correct lexical order.

## General conventions for methods

When methods return lists, we generally check the context (via wantarray) and return an ARRSY or ARRAY ref as appropriate. We're not 100% consistent in this (sorry), so check the code if you have doubts.

We often define singular and plural forms of methods, eg, node and nodes. These differ in how they behave in a scalar context. The singular form assumes you want one answer and returns that, while the plural form assumes you want a list of answers are returns it as an ARRAY ref. We're not 100% consistent in this (sorry), so check the code if you have doubts.

The rest of the documentation describes the methods.

## Constructors

```
Title : new (inherited from Class::AutoClass)
Usage : my $graph=new Bio::ConnectDots::SimpleGraph;
Function: Create new Bio::ConnectDots::SimpleGraph object
Returns : Newly created object
Args : (optional)
nodes=>ARRAY of nodes, eg, ['a','b','c']
edges=>ARRAY of edges, see add_edges for details
```

## Basic node and edge operations

```
Title : add_nodes, add_node
Usage : $graph->add_nodes('a','b');
$graph->add_node('a')) {
Function: Add nodes to graph. Nodes that are already in graph
are ignored.
Args : ARRAY of nodes.
Returns : Nothing useful
Note : Singular and plural forms are synonymous
Title : add_edges, add_edge
Usage : $graph->add_edges('a','b',['b','c']);
$graph->add_edge('c','d')) {
Function: Add edges to graph.
Edges that are already in graph are not added again, but
are placed in a separate 'duplicate edges' list.
Automatically adds any nodes that are not yet in the graph.
Args : ARRAY of edges in any of the forms described in the
previous section. The forms can be mixed as shown in
the Usage here.
Returns : Nothing useful
Note : Singular and plural forms are synonymous
Title : nodes
Usage : my @nodes=$graph->nodes;
if (@{$graph->nodes('a','b')}==2) {
print "a, b are both nodes\n";
}
Function: Return all nodes or the given ones.
With no args returns all nodes.
With args, returns the nodes corresponding to each arg, or
undef if the arg is not a node. Useful for testing whether
a given value is a node in the graph.
Args : (optional)
ARRAY of nodes or text representations of nodes
Returns : ARRAY or ARRAY ref of nodes (for args that correspond to
nodes), or undef (for args that are not nodes)
Title : edges
Usage : my @edges=$graph->edges;
if (@{$graph->edges('a','b',['b','c'])}==2) {
print "[a,b] and [b,c] are both edges\n";
}
Function: Return all edges or the given ones.
With no args returns all edges.
With args, returns the edges corresponding to each arg, or
undef if the arg is not a edge. Useful for testing whether
a given value is a edge in the graph.
Args : (optional)
One or more edges in any of the forms described in the
previous section. The forms can be mixed as shown in
the Usage here.
Returns : ARRAY or ARRAY ref of edges for args that correspond to
edges), or undef (for args that are not edges)
Title : node
Usage : if ($graph->node('a')) {
print "a is a node\n";
}
Function: Test whether a value is a node in the graph, or map the
text representation of a node to the node itself. The
method can also be fed a list of values (like the 'nodes'
method) and it will test all of them.
Args : Usually, a single node.
The function also accepts a list of nodes.
Returns : In scalar context (the usual case): the node corresponding
to the arg (if there's just one), or the node corresponding
to the first arg (if a list of args were provided, which is
kind of dumb in this case), or undef if the arg is not a
node.
In array context, it behaves just like 'nodes', returning
an ARRAY of nodes (for args that correspond to nodes), or
undef (for args that are not nodes)
Title : edge
Usage : if ($graph->edge('a','b')) {
print "a,b is a edge\n";
}
if ($graph->edge(['a','b'])) {
print "[a,b] is a edge\n";
}
Function: Test whether a value is a edge in the graph, or map the
text representation of a edge to the edge itself. The
method can also be fed a list of edges (like the 'edges'
method) and it will test all of them.
Args : Usually, a single edge. Same format as 'edges'
The function also accepts a list of edges, exactly like
'edges'
Returns : In scalar context (the usual case): the edge corresponding
to the arg (if there's just one), or the or the edge
corresponding to the first arg (if a list of args were
provided, which is kind of dumb in this case), or undef if
the arg is not a edge.
In array context, it behaves just like 'edge's, returning
an ARRAY of edges (for args that correspond to edges), or
undef (for args that are not edges)
Title : has_nodes, has_node
Usage : if ($graph->has_nodes('a','b')) {
print "a, b are both nodes\n";
}
if ($graph->has_node('a')) {
print "a is a node\n";
}
Function: Return true is all args are nodes.
Args : ARRAY of nodes or text representations of nodes
Returns : Boolean
Note : Singular and plural forms are synonymous
Title : has_edges
Usage : if ($graph->has_edges('a','b',['b','c'])) {
print "[a,b] and [b,c] are both edges\n";
}
if ($graph->has_edge('a','b')) {
print "[a,b] is an edge\n";
}
Function: Return true is all args are edges.
Args : ARRAY of edges in the forms described in the section above
Returns : Boolean
Note : Singular and plural forms are synonymous
Title : neighbors, neighbor
Usage : my @nodes=$graph->neighbors($node)
my @nodes=$graph->neighbors($node,'node')
my @edges=$graph->neighbors($edge,'edge');
Function: Return the node or edge neighbors of a given node or edge.
Args : (mandatory)
$source: node or edge whose neighbors are sought
(optional)
$what: the word 'node' or 'edge' (actually, anything starting
with 'n' or 'e' will do)
default: 'node'
Returns : ARRAY or ARRAY ref of nodes or edges
Note : Singular and plural forms are synonymous. This may not be
right.
Title : dup_edges
Usage : my @dups=$graph->dup_edges;
Function: Return duplicate edges
Args : None
Returns : ARRAY or ARRAY ref of edges that have been added more than
once.
```

## Graph properties

```
Title : is_connected
Usage : if ($graph->is_connected) {
print "graph has only one connected component\n";
}
Function: Return true if the graph is connected
Args : None
Returns : Boolean
Title : is_empty
Usage : if ($graph->is_empty) {
print "graph has no nodes or edges\n";
}
Function: Return true if the graph is empty, ie, has no nodes or edges
Args : None
Returns : Boolean
Title : is_tree
Usage : if ($graph->is_tree) {
print "graph is a tree\n";
}
Function: Return true if the graph is a tree, ie, it's connected and
has no cycles
Args : None
Returns : Boolean
Title : is_forest
Usage : if ($graph->is_forest) {
print "graph is a forest\n";
}
Function: Return true if the graph is a forest, ie, it has no cycles
but may not be connected
Args : None
Returns : Boolean
Title : is_cyclic
Usage : if ($graph->is_cyclic) {
print "graph contains at least one cycle\n";
}
Function: Return true if the graph is a cyclic.
Args : None
Returns : Boolean
Title : density
Usage : my $density=$graph->density
Function: Compute graph 'density' which is the number of edges
divided by the maximum possible number of edges
Args : None
Returns : Number
```

## Graph operations

```
Title : subgraph
Usage : my $subgraph=$graph->subgraph('a','b','c');
Function: Compute node subgraph. Constructs a new graph whose nodes
are the arguments, and whose edges are the edges of the
original graph that only involve the given nodes.
Args : ARRAY of nodes or text representations of nodes
Returns : New graph
Title : neighbor_subgraph
Usage : my $subgraph=$graph->subgraph('a');
Function: Construct node subgraph graph whose nodes are the given
node and its neighbors. are the arguments, and whose edges
are the edges of the original graph that only involve the
given nodes.
Args : Node or text representations of node
Returns : New graph
Title : union
Usage : my $union=$graph->union($other_graph);
Function: Construct new graph whose nodes are the union of the nodes
of the current graph and $other_graph, and whose edges are
the union of the edges of the current graph and
$other_graph.
Args : $other_graph: a graph
Returns : New graph
Title : intersection
Usage : my $intersection=$graph->intersection($other_graph);
Function: Construct new graph whose nodes are the intersection of the
nodes of the current graph and $other_graph, and whose
edges are the intersection of the edges of the current
graph and $other_graph.
Args : $other_graph: a graph
Returns : New graph
```

## Graph algorithms

```
Title : traversal
Usage : my $traversal=$graph->traversal('a','depth first','node');
my @nodes;
while (my $node=$traversal->get_next) {
push(@nodes,$node);
}
my $traversal=$graph->traversal('a','depth first','node');
my @nodes=$traversal->get_all;
Function: Do node or edge traversal in depth or breadth first order.
Args : (optional)
$start: starting node or edge for traversal
default: software picks arbitrary start
$order: 'depth first' or 'breadth first' (actually,
anything starting with 'd' or 'b' will do)
default: 'depth first'
$what: 'node' or 'edge' (actually, anything starting
with 'n' or 'e' will do)
default: 'node'
Returns : SimpleGraph::Traversal object
This is an iterator with the following methods:
get_next: get next item in traversal or undef if
traversal is exhausted
get_this: get current item in traversal
get_all : get all remaining items in traversal as
ARRAY (in array context) or ARRAY ref
has_next: return true if there are more items in
traversal, else undef
reset : restart traversal
Note : It's also possible, and perhaps easier, to perform a
traversal by creating a SimpleGraph::Traversal object
directly. The constructor is
new Bio::ConnectDots::SimpleGraph::Traversal(-graph=>$graph,-start=>$start,
-order=>$order,-what=>$what)
Title : node_traversal
Usage : my $traversal=$graph->node_traversal('a','depth first');
my @nodes;
while (my $node=$traversal->get_next) {
push(@nodes,$node);
}
my $traversal=$graph->node_traversal('a','depth first');
my @nodes=$traversal->get_all;
Function: Do node traversal in depth or breadth first order.
Wrapper for 'traversal' method. See above.
Args : (optional)
$start: starting node for traversal
default: software picks arbitrary start
$order: 'depth first' or 'breadth first' (actually,
anything starting with 'd' or 'b' will do)
default: 'depth first'
Returns : SimpleGraph::Traversal object
Title : edge_traversal
Usage : my $traversal=$graph->edge_traversal(['a','b'],'depth first');
my @edges;
while (my $edge=$traversal->get_next) {
push(@edges,$edge);
}
my $traversal=$graph->edge_traversal(['a','b'],'depth first');
my @edges=$traversal->get_all;
Function: Do edge traversal in depth or breadth first order.
Wrapper for 'traversal' method. See above.
Args : (optional)
$start: starting edge for traversal
default: software picks arbitrary start
$order: 'depth first' or 'breadth first' (actually,
anything starting with 'd' or 'b' will do)
default: 'depth first'
Returns : SimpleGraph::Traversal object
Title : components
Usage : my @components=$graph->components;
for my $component (@components) {
my @nodes=$component->nodes;
my @edges=$component->edges;
}
Function: Compute the connected components of the graph. A connected
component is a maximal connected subgraph. 'Connected'
means you can get from any node of the component to any
other by following a path. 'Maximal' means that every node
you can reach from the component is in the component.
Args : None
Returns : ARRAY or ARRAY ref of SimpleGraphs
Note : The software caches the components once computed, so it's efficient
to call this repeatedly.
Title : shortest_paths
Usage : my $paths=$graph->shortest_paths;
while(my($endpoints,$path)=each %$paths) {
my($start,$stop)=split($;,$endpoints);
my @nodes_on_path=@$path;
print "Path from $start to $stop: @$nodes_on_path\n";
}
-- OR --
my ($paths,$distances)=$graph->shortest_paths(-want_distances=>1);
while(my($endpoints,$path)=each %$paths) {
my $distance=$distances->{$endpoints};
my($start,$stop)=split($;,$endpoints);
my @nodes_on_path=@$path;
print "Path from $start to $stop, distance $distance: @$nodes_on_path\n";
}
Function: Compute shortest path between each pair of nodes.
Args : (optional)
-max=>Maximum path length to compute
Paths longer than this are not found
-want_distances=>Boolean. If true, distances are also returned
Returns : HASH whose keys are pairs of nodes (encoded in the standard
Perl manner as two strings joined by $;) and whose values
are paths. Each path is an ARRAY ref of nodes starting at
the first endpoint and ending at the second.
The first endpoint is always lexically smaller (lt) than
the second.
If -want_distances is specified, the result is a pair of
HASHes, one containing paths and the other containing
distances.
Title : distances
Usage : my $distances=$graph->distances;
while(my($endpoints,$distance)=each %$distances) {
my($start,$stop)=split($;,$endpoints);
print "Path from $start to $stop has distance $distance\n";
}
-- OR --
my ($paths,$distances)=$graph->distances(-want_paths=>1);
while(my($endpoints,$distance)=each %$distances) {
my $path=$paths->{$endpoints};
my($start,$stop)=split($;,$endpoints);
my @nodes_on_path=@$path;
print "Path from $start to $stop, distance $distance: @$nodes_on_path\n";
}
Function: Compute distance between each pair of nodes. This is the
length of the shortest path
Args : (optional)
-max=>Maximum path length to compute
Distances longer than this are not found
-want_paths=>Boolean. If true, paths are also returned
Returns : HASH whose keys are pairs of nodes (encoded in the standard
Perl manner as two strings joined by $;) and whose values
are distances.
The first endpoint is always lexically smaller (lt) than
the second.
If -want_paths is specified, the result is a pair of
HASHes, one containing paths and the other containing
distances.
Title : connected_nodesets
Usage : my @nodesets=$graph->connected_nodesets;
for my $nodeset (@nodesets) {
my @nodes=@$nodeset;
}
Function: Compute all sets of nodes that form connected subgraphs.
A connected nodeset is a set of nodes such that it's
possible to get from any node to any other by following a
path that only includes nodes in the set.
Args : None
Returns : ARRAY or ARRAY ref of nodeset, where each nodeset is an ARRAY
ref of nodes.
Note : Use with caution. The number of nodesets is very
large for graphs that are highly connected.
Title : connected_subgraphs
Usage : my @subgraphs=$graph->connected_subgraphs;
Function: Compute all connected subgraphs of the current graph.
Args : None
Returns : ARRAY or ARRAY ref of subgraphs
Note : Use with caution. The number of connected subgraphs is
very large for graphs that are highly connected.
```