
NAME

Bio::GMOD::StandardURLs - Discover and fetch Standard URLs from MODs

SYNOPSIS

  my $mod = Bio::GMOD::StandardURLs->new(-mod => 'WormBase');
  my @species  = $mod->available_species;

This module provides a programmatic interface to the common URLs provided by Model Organism Databases (MODs). These standard URLs simplify the retrieval of common datasets. The full specification is described at the end of this document.

PUBLIC METHODS

$mod->available_species();

Fetch a list of the species available through the Standard URL mechanism at the current MOD. Called in list context, returns a list of species in the form "G_species" (e.g. C_elegans). These abbreviated binomial names conform to the specification for subsequent requests. Called in scalar context, this method returns the number of species available. If passed the optional "-expanded" parameter, this method returns a hash reference mapping full binomial names to their abbreviated names.

This method is a programmatic equivalent to accessing the standard URL:

    http://your.site/genome
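
As a rough illustration of the return conventions above, the following self-contained sketch (not the module's actual code) derives the abbreviated "G_species" form from a binomial name and mimics the list-context, scalar-context, and "-expanded" behaviors. The helpers abbreviate_binomial() and species_summary() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: "Caenorhabditis elegans" (or the URL form
# "Caenorhabditis_elegans") becomes the abbreviated "C_elegans".
sub abbreviate_binomial {
    my ($binomial) = @_;
    (my $name = $binomial) =~ s/^\s+|\s+$//g;
    my ($genus, $species) = split /[\s_]+/, $name;
    die "Not a binomial name: '$binomial'\n"
        unless defined $genus && defined $species;
    return substr($genus, 0, 1) . '_' . $species;
}

# Hypothetical sketch of the calling conventions described above:
# list context gives abbreviated names, scalar context gives the count,
# and -expanded => 1 gives a hash ref of full name => abbreviated name.
sub species_summary {
    my ($binomials, %args) = @_;
    if ($args{-expanded}) {
        my %expanded = map { $_ => abbreviate_binomial($_) } @$binomials;
        return \%expanded;
    }
    my @short = map { abbreviate_binomial($_) } @$binomials;
    return wantarray ? @short : scalar @short;
}
```

For example, species_summary(['Caenorhabditis elegans']) returns ('C_elegans') in list context and 1 in scalar context.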

$mod->releases(-species=>'Caenorhabditis elegans',-status=>'available');

Fetch all of the available releases for the given species. Called in list context, releases() returns a list of all available releases for that species. The species can be given either as the full binomial name (e.g. "Caenorhabditis elegans") or as the abbreviated short form (e.g. "C_elegans").

Given the optional '-expanded' parameter, this method returns an array of arrays containing the version, release date, and availability of each release. The optional '-status' parameter filters the returned releases: 'available' returns only the releases that are currently available, and 'unavailable' returns those that are no longer available. If '-status' is not supplied, all known releases are returned.

This method is a programmatic equivalent to accessing the standard URL:

    http://your.site/genome/Binomial_name
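
The following sketch shows one way the '-status' filter could work on expanded release records. It assumes each record is an array reference of [version, date, availability], with availability stored as the string 'available' or 'unavailable'; that representation, the filter_releases() helper, and the sample records are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch of the -status filter. Each expanded record is
# assumed to be [version, release_date, availability], where availability
# is the string 'available' or 'unavailable'.
sub filter_releases {
    my ($records, %args) = @_;
    my $status = $args{-status};
    return @$records unless defined $status;    # no filter: all known releases
    die "Unknown -status '$status'\n"
        unless $status eq 'available' || $status eq 'unavailable';
    return grep { $_->[2] eq $status } @$records;
}

# Illustrative records only; real values would come from the MOD.
my @records = (
    [ 'R1', '2004-01-15', 'unavailable' ],
    [ 'R2', '2004-03-01', 'available' ],
    [ 'R3', '2004-05-10', 'available' ],
);
my @current = filter_releases(\@records, -status => 'available');
print join(', ', map { $_->[0] } @current), "\n";    # prints "R2, R3"
```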

$mod->data_sets(-species=>$species,-release=>$release);

Fetch all of the available URLs for a given species and data release. If the release is not provided, it defaults to the current release (you may also request "current" explicitly). Returns a hash reference whose keys are the symbolic names of datasets and whose values are the URLs of those datasets.

This method is a programmatic equivalent to accessing the standard URL:

    http://your.site/genome/Binomial_name/release_name
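
Because the release index is an HTML page containing links to each dataset, a data_sets()-style lookup amounts to collecting those links. The sketch below does this with a simple regular expression to stay dependency-free; the dataset_links() helper is hypothetical, and a real implementation would be better served by a proper HTML parser. Passing a .../current URL as the base corresponds to the default "current" release.

```perl
use strict;
use warnings;

# Dataset names defined by the Phase I specification.
my %SUPPORTED = map { $_ => 1 } qw(dna mrna ncrna protein feature);

# Hypothetical helper: collect dataset links from a release index page.
# Returns a hash ref of dataset name => absolute URL.
sub dataset_links {
    my ($html, $base_url) = @_;
    $base_url =~ s{/?$}{/};                           # ensure a trailing slash
    my ($root) = $base_url =~ m{^(https?://[^/]+)}i;  # scheme and host only
    my %datasets;
    while ($html =~ /href\s*=\s*["']([^"']+)["']/gi) {
        my $href = $1;
        # The last path segment, ignoring any query string or fragment.
        my ($name) = $href =~ m{([^/?#]+)/?(?:[?#].*)?$};
        next unless defined $name && $SUPPORTED{$name};
        my $url = $href =~ m{^https?://}i ? $href
                : $href =~ m{^/}          ? $root . $href
                :                           $base_url . $href;
        $datasets{$name} = $url;
    }
    return \%datasets;
}
```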

$mod->fetch(@options);

Fetch the specified dataset. Note: this could be a very large file!

 Options:
 -url      The full URL to the dataset
   OR specify a dataset with species, release, and dataset:
 -species  The binomial name or abbreviated form of the species
 -release  The version to fetch
 -dataset  The symbolic name of the dataset (dna, mrna, etc)

This method is a programmatic equivalent to accessing the standard URL:

    http://your.site/genome/Binomial_name/release_name/[dataset]
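
A minimal sketch of the option handling above, assuming -species is given as the full binomial name (resolving the abbreviated form would need the mapping from available_species). Both resolve_fetch_url() and fetch_to_file() are hypothetical helpers; the download uses the core HTTP::Tiny module's mirror() method, which writes the response body to a file instead of holding it in memory.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Hypothetical helper: turn fetch()-style options into a dataset URL.
# Either -url is given, or all of -species, -release and -dataset are.
# -species is assumed to be the full binomial name in this sketch.
sub resolve_fetch_url {
    my ($site, %opt) = @_;
    return $opt{-url} if defined $opt{-url};
    for my $key (qw(-species -release -dataset)) {
        die "fetch: $key is required when -url is not given\n"
            unless defined $opt{$key};
    }
    (my $binomial = $opt{-species}) =~ s/\s+/_/g;
    my ($release, $dataset) = @opt{qw(-release -dataset)};
    return "http://$site/genome/$binomial/$release/$dataset";
}

# Save to disk rather than memory, since datasets can be very large.
# mirror() reports success for 2xx responses and for 304 Not Modified.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $response = HTTP::Tiny->new->mirror($url, $file);
    die "fetch failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $file;
}
```

A call such as fetch_to_file(resolve_fetch_url('www.wormbase.org', -species => 'Caenorhabditis elegans', -release => 'current', -dataset => 'protein'), 'protein.fa') would then save the protein set locally.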

$mod->supported_datasets();

Fetch a list of the symbolic names of supported datasets. This will typically be a list such as "dna", "mrna", "ncrna", "protein", and "feature".

Standard URL Specification

PHASE I

Substitutions:

        your.site       Host address, e.g. www.wormbase.org
        Binomial_name   NCBI Taxonomy scientific name, e.g.
                        Caenorhabditis_elegans
        release_name    Data release, in whatever is the local
                        format (e.g. release date, release number)
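
The substitutions compose level by level, so a single helper can build every Phase I URL. This is a sketch only; standard_url() is a hypothetical name, and it assumes the only transformation needed on the NCBI scientific name is replacing spaces with underscores.

```perl
use strict;
use warnings;

# Hypothetical helper for the Phase I layout: each optional component
# extends the path one level, from the species index down to a dataset.
sub standard_url {
    my ($site, $binomial, $release, $dataset) = @_;
    my $url = "http://$site/genome/";
    return $url unless defined $binomial;
    (my $name = $binomial) =~ s/\s+/_/g;    # "Genus species" -> "Genus_species"
    $url .= "$name/";
    return $url unless defined $release;
    $url .= "$release/";
    return $url unless defined $dataset;
    return $url . $dataset;                 # datasets have no trailing slash
}
```

For example, standard_url('www.wormbase.org', 'Caenorhabditis elegans', 'current', 'dna') gives http://www.wormbase.org/genome/Caenorhabditis_elegans/current/dna.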

http://your.site/genome/

Leads to the species index page. This should be an HTML-format page containing links to each species whose genome is available for download.

http://your.site/genome/Binomial_name/

Leads to the index of releases for species Binomial_name. This will be an HTML-format page containing links to each of the genome releases.

http://your.site/genome/Binomial_name/release_name/

Leads to the index for the named release. It should be an HTML-format page containing links to each of the data sets described below.

http://your.site/genome/Binomial_name/current/

Leads to the index for the most recent release, symbolic link style.

http://your.site/genome/Binomial_name/current/dna

Returns a FASTA file containing big DNA fragments (e.g. chromosomes). MIME type is application/x-fasta.

http://your.site/genome/Binomial_name/current/mrna

Returns a FASTA file containing spliced mRNA transcript sequences. MIME type is application/x-fasta.

http://your.site/genome/Binomial_name/current/ncrna

Returns a FASTA file containing non-coding RNA sequences. MIME type is application/x-fasta.

http://your.site/genome/Binomial_name/current/protein

Returns a FASTA file containing all the protein sequences known to be encoded by the genome. MIME type is application/x-fasta.

http://your.site/genome/Binomial_name/current/feature

Returns a GFF3 file describing genome annotations. MIME type is application/x-gff3.
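
Since each dataset has a declared MIME type, a client can sanity-check a response before saving it. The sketch below encodes the types listed above; content_type_ok() is a hypothetical helper, and ignoring parameters such as "; charset=..." is a design choice of this sketch rather than part of the specification.

```perl
use strict;
use warnings;

# Expected MIME types for the Phase I datasets, as listed above.
my %MIME_TYPE = (
    dna     => 'application/x-fasta',
    mrna    => 'application/x-fasta',
    ncrna   => 'application/x-fasta',
    protein => 'application/x-fasta',
    feature => 'application/x-gff3',
);

# Hypothetical helper: compare a Content-Type header with the expected
# type, ignoring case and any parameters such as "; charset=...".
sub content_type_ok {
    my ($dataset, $content_type) = @_;
    my $expected = $MIME_TYPE{$dataset}
        or die "Unknown dataset '$dataset'\n";
    return 0 unless defined $content_type;
    my ($type) = split /;/, $content_type;
    $type = '' unless defined $type;
    $type =~ s/^\s+|\s+$//g;
    return lc($type) eq $expected ? 1 : 0;
}
```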

PHASE II

In the Phase II URL scheme, we'll be able to append ?format=XXXX to each of the URLs:

http://your.site/genome/?format=HTML

Same as the default for Phase I.

http://your.site/genome/?format=RSS

Return RSS feed indicating what species are available.

http://your.site/genome/Binomial_name/?format=RSS

Return RSS feed indicating what releases are available.

http://your.site/genome/Binomial_name/release_name/?format=RSS

Return RSS feed indicating what data sets are available.

http://your.site/genome/Binomial_name/current/protein?format=XXX

Alternative formats for sequence data. For example, XXX could be FASTA, RAW, or another format (still open for discussion).
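
Under the proposed Phase II convention, a client only needs to add a format query parameter to an existing standard URL. A minimal sketch, assuming the format name is passed through verbatim apart from percent-encoding; with_format() is a hypothetical helper, and which format values a MOD accepts remains undecided.

```perl
use strict;
use warnings;

# Hypothetical helper for the proposed Phase II scheme: append
# format=XXXX to a standard URL, percent-encoding the value defensively.
sub with_format {
    my ($url, $format) = @_;
    (my $encoded = $format) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    my $sep = index($url, '?') >= 0 ? '&' : '?';    # respect an existing query
    return "$url${sep}format=$encoded";
}
```

For example, with_format('http://your.site/genome/', 'RSS') gives http://your.site/genome/?format=RSS.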
