NAME

dbfetch - generic CGI program to retrieve biological database entries in various formats and styles (using SRS)

SYNOPSIS

  # URL examples:

  # prints the interactive page with the HTML form
  http://www.ebi.ac.uk/cgi-bin/dbfetch

  # for backward compatibility, implements <ISINDEX>
  # single entry queries defaulting to EMBL sequence database
  http://www.ebi.ac.uk/cgi-bin/dbfetch?J00231

  # retrieves one or more entries in default format
  # and default style (html)
  # returns nothing for IDs which are not valid
  http://www.ebi.ac.uk/cgi-bin/dbfetch?id=J00231.1,hsfos,bum

  # retrieve entries in fasta format without html tags
  http://www.ebi.ac.uk/cgi-bin/dbfetch?format=fasta&style=raw&id=J00231,hsfos,bum

  # retrieve a raw Ensembl entry
  http://www.ebi.ac.uk/cgi-bin/dbfetch?db=ensembl&style=raw&id=AL122059

DESCRIPTION

This program generates a page allowing a web user to retrieve database entries from a local SRS in two styles: html and raw. Other database engines can be used to implement the same interface.

At this stage, only unique identifier queries are supported. Free text searches returning more than one entry per query term are not in these specs.

In its default setup, type one or more EMBL accession numbers (e.g. J00231), entry names (e.g. BUM) or sequence versions into the search dialog to retrieve hypertext-linked entries.

Note that for practical reasons only the first 50 identifiers submitted are processed.
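The query parameters shown in the SYNOPSIS (db, format, style, id) can be assembled programmatically. The following is a minimal sketch, not part of dbfetch itself: the helper names dbfetch_url and escape are hypothetical, and the client-side truncation to 50 identifiers simply mirrors the documented server-side limit.

```perl
use strict;
use warnings;

# Base URL taken from the SYNOPSIS; $MAX_IDS mirrors the documented limit.
my $BASE    = 'http://www.ebi.ac.uk/cgi-bin/dbfetch';
my $MAX_IDS = 50;

# Hypothetical helper: build a dbfetch URL from named arguments.
sub dbfetch_url {
    my (%arg) = @_;
    my @ids = @{ $arg{ids} || [] };

    # Keep only the first 50 identifiers; the rest would not be processed.
    @ids = @ids[0 .. $MAX_IDS - 1] if @ids > $MAX_IDS;

    my @pairs;
    for my $key (qw(db format style)) {
        push @pairs, "$key=" . escape($arg{$key}) if defined $arg{$key};
    }
    push @pairs, 'id=' . join(',', map { escape($_) } @ids);
    return "$BASE?" . join('&', @pairs);
}

# Hypothetical helper: percent-encode everything except unreserved characters.
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

print dbfetch_url(
    format => 'fasta',
    style  => 'raw',
    ids    => [qw(J00231 hsfos bum)],
), "\n";
```

Parameters that are left undefined are omitted from the URL, so the server falls back to its defaults.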

Additional input is needed to change the sequence format or suppress the HTML tags. The styles are html and raw. In future there might be additional styles (e.g. xml). Currently XML is a 'raw' format used by Medline. Each style is implemented as a separate subroutine.

MAINTENANCE

A new database can be added simply by adding a new entry to the global hash %IDS. Additionally, if the database defines new formats, add an entry for each of them to the hash %IDMATCH. After modifying the hashes, run this script from the command line with the parameter debug set to true (e.g. dbfetch debug=1) for some sanity checks.

Finally, the user interface needs to be updated in the print_prompt subroutine.

VERSIONS

Version 3 uses EBI SRS server 6.1.3. That server is able to merge release and update libraries automatically which makes this script simpler. The other significant change is the way sequence versions are indexed. They used to be indexed together with the string accession (e.g. 'J00231.1'). Now they are indexed as integers (e.g. '1').
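To illustrate the indexing change, a versioned identifier such as 'J00231.1' has to be separated into its accession and an integer version before each part can be queried on its own. This is a minimal sketch under that assumption; split_version is a hypothetical helper and not a subroutine of this script.

```perl
use strict;
use warnings;

# Hypothetical helper: split 'J00231.1' into ('J00231', 1).
# Identifiers without a '.<digits>' suffix are returned with an undefined version.
sub split_version {
    my ($id) = @_;
    if ($id =~ /^([^.]+)\.(\d+)$/) {
        return ($1, $2 + 0);    # "+ 0" forces the version to a number
    }
    return ($id, undef);
}

for my $id (qw(J00231.1 hsfos)) {
    my ($acc, $version) = split_version($id);
    printf "%s: accession=%s version=%s\n",
        $id, $acc, defined $version ? $version : 'none';
}
```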

Version 3.1 changes the command line interface. To get the debug information, set the 'debug' parameter to true. It also uses the File::Temp module to create temporary files securely.
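For readers unfamiliar with File::Temp, the sketch below shows the general pattern of creating a temporary file safely. It is not taken from the dbfetch source: the template name 'dbfetchXXXXXX' and the placeholder FASTA content are assumptions made for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() opens a uniquely named file and returns its handle and name.
# TMPDIR => 1 places it in the system temporary directory;
# UNLINK => 1 asks File::Temp to remove it when the program exits.
my ($fh, $filename) = tempfile('dbfetchXXXXXX', TMPDIR => 1, UNLINK => 1);

# Placeholder content, not a real database entry.
print {$fh} ">J00231\nACGT\n";
close $fh or die "Cannot close $filename: $!";

# Read the first line back to show the file is usable.
open my $in, '<', $filename or die "Cannot read $filename: $!";
my $first = <$in>;
close $in;
print "First line: $first";
```

Because the file name is generated and opened in one step, another process cannot predict or pre-create it, which is the main advantage over building a name by hand.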

Version 3.2 fixes fasta format parsing to get the entry id.

Version 3.3 adds RefSeq to the database list.

Version 3.4 makes the script compliant with the BioFetch specs.

AUTHOR - Heikki Lehvaslaiho

Email: heikki@ebi.ac.uk

Address:

     EMBL Outstation, European Bioinformatics Institute
     Wellcome Trust Genome Campus, Hinxton
     Cambs. CB10 1SD, United Kingdom

print_prompt

 Title   : print_prompt
 Usage   :
 Function: Prints the default page with the query form
           to STDOUT (Web page)
 Args    :
 Returns :

protect

 Title   : protect
 Usage   : $value = protect($q->param('id'));
 Function:

           Removes potentially dangerous characters from the input
           string.  At the same time, converts word separators into a
           single space character.

 Args    : scalar, string with one or more IDs or accession numbers
 Returns : scalar
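
The behaviour described for protect can be sketched as follows. This is an illustration only, not the script's actual implementation: the name protect_sketch, the set of separators and the whitelist of allowed characters are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch of input sanitising; the real protect() may differ.
sub protect_sketch {
    my ($value) = @_;
    return '' unless defined $value;

    # Assumption: whitespace, commas and semicolons act as word separators.
    $value =~ s/[\s,;]+/ /g;

    # Assumption: only these characters are legal in IDs; drop the rest.
    $value =~ s/[^A-Za-z0-9_.: -]//g;

    # Collapse any runs of spaces left behind and trim both ends.
    $value =~ s/ +/ /g;
    $value =~ s/^ | $//g;
    return $value;
}

print protect_sketch("J00231.1,  hsfos\n bum|`"), "\n";
```

A whitelist is used rather than a blacklist so that any character not explicitly allowed is removed.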

input_error

 Title   : input_error
 Usage   : input_error($q, 'html', "Error message");
 Function: Standard error message behaviour
 Args    : reference to the CGI object
           scalar, style (e.g. 'html')
           scalar, string to display on input error.
 Returns : scalar

no_entries

 Title   : no_entries
 Usage   : no_entries($q, "Message");
 Function: Standard behaviour when no entries found
 Args    : reference to the CGI object
           scalar, string to display when no entries are found.
 Returns : scalar

raw

 Title   : raw
 Usage   :
 Function: Retrieves a single database entry in plain text
 Args    : scalar, an ID
           scalar, format
 Returns : scalar

html

 Title   : html
 Usage   :
 Function: Retrieves a single database entry with HTML
           hypertext links in place. Limits retrieved entries to
           ones with correct version if the string has '.' in it.
 Args    : scalar, a UID
           scalar, format
 Returns : scalar

xml

 Title   : xml
 Usage   : 
 Function: Retrieves an entry formatted as XML
 Args    : array, UID
           scalar, format 
 Returns : scalar

debugging

 Title   : debugging
 Usage   : 'perl dbfetch'
 Function:

           Performs sanity checks on global hash %IDS when this script
           is run from command line. %IDS holds the description of
           formats and other crucial info for each database accessible
           through the program.

           Note that hash key 'version' is not tested as it should 
           only be in sequence databases.

 Args    : none
 Returns : error messages to STDOUT
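
The kind of sanity check described above can be sketched as follows. The layout of %IDS used here is hypothetical: the keys 'ids' and 'format', the example values and the helper check_ids are assumptions for illustration, and only the 'version' key is mentioned in this documentation.

```perl
use strict;
use warnings;

# Hypothetical layout; the keys and values in the real %IDS may differ.
my %IDS = (
    embl => {
        ids     => [qw(id acc)],
        format  => { default => 'embl', fasta => 'FastaSeqs' },
        version => 'sv',
    },
    broken => { ids => [] },    # deliberately incomplete entry
);

# Hypothetical checker: returns one message per problem found.
sub check_ids {
    my ($ids) = @_;
    my @errors;
    for my $db (sort keys %$ids) {
        my $entry = $ids->{$db};
        push @errors, "$db: no searchable fields in 'ids'"
            unless ref($entry->{ids}) eq 'ARRAY' && @{ $entry->{ids} };
        push @errors, "$db: no 'format' hash"
            unless ref($entry->{format}) eq 'HASH' && %{ $entry->{format} };
        # 'version' is not checked: only sequence databases should have it.
    }
    return @errors;
}

print "$_\n" for check_ids(\%IDS);
```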