Ewan Birney


Bio::DB::NCBIHelper - A collection of routines useful for queries to NCBI databases.


 # Do not use this module directly.
 # get a Bio::DB::NCBIHelper object somehow
 my $seqio = $db->get_Stream_by_acc(['MUSIGHBA1']);
 while ( my $seq = $seqio->next_seq ) {
  # process seq
 }


Provides a single place to set up common methods for querying NCBI web databases. This module centralizes the methods for constructing URLs to query NCBI GenBank and NCBI GenPept, along with the common HTML stripping done in postprocess_data().

The NCBI query URLs used are http://www.ncbi.nlm.nih.gov as the base URL, /cgi-bin/Entrez/qserver.cgi as the query interface for batch mode, and /entrez/utils/qmap.cgi for single-query mode.
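
As a rough illustration of how those pieces fit together, the sketch below joins the base URL to the path for the chosen mode and appends a query string. The helper name build_query_url and the parameter names (db, uid) are hypothetical and chosen only for illustration; they are not taken from this module, and only core Perl is used.

```perl
use strict;
use warnings;

# URLs as given in the description above.
my $HOSTBASE = 'http://www.ncbi.nlm.nih.gov';
my %CGILOCATION = (
    batch  => '/cgi-bin/Entrez/qserver.cgi',
    single => '/entrez/utils/qmap.cgi',
);

# Hypothetical helper: build a GET URL for the given mode and parameters.
sub build_query_url {
    my ($mode, %params) = @_;
    my $path = $CGILOCATION{$mode}
        or die "unknown mode '$mode' (expected 'single' or 'batch')\n";

    # Percent-encode everything outside the unreserved character set
    # (assumes plain ASCII keys and values).
    my $escape = sub {
        my $s = shift;
        $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
        return $s;
    };

    # Sort the keys so the resulting URL is deterministic.
    my $query = join '&',
        map { $escape->($_) . '=' . $escape->($params{$_}) } sort keys %params;

    return $HOSTBASE . $path . ($query ? "?$query" : '');
}

# The parameter names here are placeholders, not real NCBI parameters.
print build_query_url('single', db => 'n', uid => 'MUSIGHBA1'), "\n";
```

Keeping the mode-to-path table separate from the parameter handling means a derived module would only need to supply its own parameters.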


AUTHOR - Jason Stajich

Email jason@bioperl.org


The rest of the documentation details each of the object methods. Internal methods are usually prefixed with an underscore (_).


 Title   : get_params
 Usage   : my %params = $self->get_params($mode)
 Function: Returns key,value pairs to be passed to NCBI database
           for either 'batch' or 'single' sequence retrieval method
 Returns : a key,value pair hash
 Args    : 'single' or 'batch' mode for retrieval


 Title   : default_format
 Usage   : my $format = $self->default_format
 Function: Returns default sequence format for this module
 Returns : string
 Args    : none


 Title   : get_request
 Usage   : my $request = $self->get_request(%qualifiers)
 Function: Builds the HTTP request for a query
 Returns : an HTTP::Request object
 Args    : %qualifiers = a hash of qualifiers (ids, format, etc)


  Title   : get_Stream_by_batch
  Usage   : $seqio = $db->get_Stream_by_batch($ref);
  Function: Retrieves Seq objects from Entrez 'en masse', rather than one
            at a time.  For large numbers of sequences, this is far superior
            to get_Stream_by_[id/acc]().
  Example :
  Returns : a Bio::SeqIO stream object
  Args    : $ref : either an array reference, a filename, or a filehandle
            from which to get the list of unique ids/accession numbers.
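
To make the three accepted argument forms concrete, here is a minimal sketch of how such an argument could be normalized into a list of unique ids. The helper ids_from_ref is hypothetical and is not the module's own code; it assumes one id per line when reading from a file or filehandle.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical helper: turn an array reference, a filename, or a filehandle
# into a list of unique ids, preserving first-seen order.
sub ids_from_ref {
    my ($ref) = @_;
    my @ids;

    if (ref $ref eq 'ARRAY') {
        @ids = @$ref;
    }
    else {
        # Anything that is not already an open handle is treated as a filename.
        my $fh = openhandle($ref);
        unless ($fh) {
            open $fh, '<', $ref or die "cannot open '$ref': $!\n";
        }
        while (my $line = <$fh>) {
            $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
            push @ids, $line if length $line;
        }
    }

    # Drop duplicates while keeping the original order.
    my %seen;
    return grep { !$seen{$_}++ } @ids;
}

print join(',', ids_from_ref(['MUSIGHBA1', 'X77802', 'MUSIGHBA1'])), "\n";
```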


 Title   : postprocess_data
 Usage   : $self->postprocess_data ( 'type' => 'string',
                                     'location' => \$datastr);
 Function: process downloaded data before loading into a Bio::SeqIO
 Returns : void
 Args    : hash with two keys - 'type' can be 'string' or 'file'
                              - 'location' either file location or string 
                                           reference containing data
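
The calling convention above can be sketched as follows. This is only an illustration of dispatching on 'type' and treating 'location' as either a string reference or a file path; the names postprocess_sketch and strip_html are hypothetical, and the regular expressions are stand-ins rather than the module's actual stripping rules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the HTML stripping step.
sub strip_html {
    my ($text) = @_;
    $text =~ s/<[^>]+>//g;    # drop anything that looks like a tag
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;     # decode &amp; last to avoid double-decoding
    return $text;
}

sub postprocess_sketch {
    my (%args) = @_;
    my ($type, $location) = @args{qw(type location)};

    if ($type eq 'string') {
        # 'location' is a reference to the string; modify it in place.
        $$location = strip_html($$location);
    }
    elsif ($type eq 'file') {
        # 'location' is a path; read the file, clean it, and write it back.
        open my $in, '<', $location or die "cannot read '$location': $!\n";
        my $data = do { local $/; <$in> };
        close $in;
        open my $out, '>', $location or die "cannot write '$location': $!\n";
        print {$out} strip_html($data);
        close $out or die "cannot close '$location': $!\n";
    }
    else {
        die "unknown type '$type' (expected 'string' or 'file')\n";
    }
    return;
}

my $datastr = "<pre>LOCUS  MUSIGHBA1</pre>\n";
postprocess_sketch('type' => 'string', 'location' => \$datastr);
print $datastr;
```

Passing a reference for the 'string' case lets the caller's buffer be cleaned in place, which matches the void return documented above.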


 Title   : request_format
 Usage   : my ($req_format, $ioformat) = $self->request_format;
 Function: Get/Set sequence format retrieval. The get-form will normally not
           be used outside of this and derived modules.
 Returns : Array of two strings, the first representing the format for
           retrieval, and the second specifying the corresponding SeqIO format.
 Args    : $format = sequence format
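
A combined get/set accessor that returns a two-element list might look like the sketch below. The class FormatDemo, its default value, and the format mapping are all hypothetical and exist only to show the calling pattern; the module's real mapping is not described in this section.

```perl
use strict;
use warnings;

package FormatDemo;    # hypothetical class, for illustration only

# Hypothetical mapping from a retrieval format to a SeqIO format name.
my %FORMATMAP = (
    gb    => 'genbank',
    gp    => 'genbank',
    fasta => 'fasta',
);

sub new { return bless { _format => [ 'gb', 'genbank' ] }, shift }

# With an argument, record the new format pair; in either case,
# return (retrieval format, SeqIO format).
sub request_format {
    my ($self, $format) = @_;
    if (defined $format) {
        $format = lc $format;
        # Fall back to the retrieval format itself when no mapping is known.
        my $ioformat = exists $FORMATMAP{$format} ? $FORMATMAP{$format} : $format;
        $self->{_format} = [ $format, $ioformat ];
    }
    return @{ $self->{_format} };
}

package main;

my $db = FormatDemo->new;
$db->request_format('fasta');
my ($req_format, $ioformat) = $db->request_format;
print "$req_format $ioformat\n";
```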


 Title   : get_Seq_by_version
 Usage   : $seq = $db->get_Seq_by_version('X77802.1');
 Function: Gets a Bio::Seq object by sequence version
 Returns : A Bio::Seq object
 Args    : accession.version (as a string)
 Throws  : "acc.version does not exist" exception


  Title   : get_Stream_by_version
  Usage   : 
  Function: DO NOT USE. HACK.
            Reuses the method defined by the interface file to retrieve
            an HTML table with all GIs (versions) for an accession number.
  Returns : an HTTP::Request object
  Args    : $ref : a reference to an array of accession.version strings for
                   the desired sequence entries

Bio::DB::WebDBSeqI methods

Overrides a WebDBSeqI method to make sequence retrieval easier for newcomers.


  Title   : get_Stream_by_acc
  Usage   : $seqio = $db->get_Stream_by_acc([$acc1, $acc2]);
  Function: Gets a series of Seq objects by accession numbers
  Returns : a Bio::SeqIO stream object
  Args    : $ref : a reference to an array of accession numbers for
                   the desired sequence entries
  Note    : For GenBank, this just calls the same code for get_Stream_by_id()


  Title   : _check_id
  Usage   : 
  Returns : A Bio::DB::RefSeq reference or throws
  Args    : $id(s), $string