
NAME

Bio::SeqIO::GenBank - GenBank sequence input/output stream

SYNOPSIS

It is probably best not to use this object directly, but rather to go through the SeqIO handler system, like this:

    use Bio::SeqIO;

    my $stream = Bio::SeqIO->new(-file => $filename, -format => 'GenBank');

    while ( my $seq = $stream->next_seq() ) {
        # do something with $seq
    }

DESCRIPTION

This object can transform Bio::Seq objects to and from GenBank flat file databases.

There is a lot of flexibility here in how records are written out, which I still need to document fully.

Mapping of record properties to object properties

This section documents which sections and properties of a GenBank database record end up where in the Bioperl object model. It is far from complete and currently focuses only on mappings that may be non-obvious. $seq in the text refers to the Bio::Seq::RichSeqI-implementing object returned by the parser for each record.

GI number

$seq->primary_id

Optional functions

_show_dna()

(output only) controls whether the DNA sequence is written

_post_sort()

(output only) provides a sort function that is applied to the FTHelper objects before they are printed

_id_generation_func()

This is a function which is called as

   print "ID   ", $func->($seq), "\n";

to generate the ID line. If it is not set, a sensible ID line is generated automatically from the sequence object.
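A minimal sketch of such a function follows. It assumes only that the sequence object provides display_id() and length(); MockSeq is a hypothetical stand-in so the sketch runs without BioPerl, and the "display_id length bp" layout is an arbitrary illustrative choice, not a GenBank requirement.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::Seq object so the sketch runs without
# BioPerl installed. Only the two accessors used below are provided.
package MockSeq;
sub new        { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub display_id { $_[0]{display_id} }
sub length     { $_[0]{length} }

package main;

# Custom ID generator: receives the sequence object and returns the text
# printed after the "ID" header, following the calling convention above.
my $id_func = sub {
    my ($seq) = @_;
    return sprintf('%s %d bp', $seq->display_id, $seq->length);
};

my $seq = MockSeq->new(display_id => 'AB000001', length => 1234);
print "ID   ", $id_func->($seq), "\n";    # prints "ID   AB000001 1234 bp"

# With a real stream (requires BioPerl) the code reference would be
# installed through the accessor of the same name:
#   $stream->_id_generation_func($id_func);
```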

If you want to output annotations in GenBank format, they need to be stored in a Bio::Annotation::Collection object that is accessible through the Bio::SeqI interface method annotation().

The following are the names of the keys which are polled from a Bio::Annotation::Collection object.

  reference - should contain Bio::Annotation::Reference objects
  comment   - should contain Bio::Annotation::Comment objects
  segment   - should contain a Bio::Annotation::SimpleValue object
  origin    - should contain a Bio::Annotation::SimpleValue object

Where does the data go?

Data parsed by Bio::SeqIO::genbank is stored in a variety of data fields in the returned sequence object. The HOWTOs explain in more detail exactly what each field means and where it goes. Here is a partial list of fields.

Items listed as RichSeq, Seq, or PrimarySeq followed by NAME() name the top-level object that defines the method NAME(), which stores this information.

Items listed as Annotation 'NAME' are stored in the Bio::Annotation::Collection object associated with the Bio::Seq object. If you explicitly request that no annotations be stored when parsing a record, they will of course not be available when you try to retrieve them. If you run into this problem, look at the type of SeqBuilder that is being used to construct your sequence object.

  Comments              Annotation 'comment'
  References            Annotation 'reference'
  Segment               Annotation 'segment'
  Origin                Annotation 'origin'

  Accessions            PrimarySeq accession_number()
  Secondary accessions  RichSeq get_secondary_accessions()
  Keywords              RichSeq keywords()
  Dates                 RichSeq get_dates()
  Molecule              RichSeq molecule()
  Seq Version           RichSeq seq_version()
  PID                   RichSeq pid()
  Division              RichSeq division()
  Features              Seq get_SeqFeatures()
  Alphabet              PrimarySeq alphabet()
  Definition            PrimarySeq description() or desc()
  Version               PrimarySeq version()

  Sequence              PrimarySeq seq()
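Because each entry in the table is just a method name, several fields can be read generically by calling the method whose name is held in a variable. The sketch below does that for a few single-valued accessors. MockRichSeq is a hypothetical stand-in for the object returned by next_seq(), so the block runs without BioPerl; the field values are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Bio::Seq::RichSeq object returned by
# next_seq(), so this sketch runs without BioPerl installed.
package MockRichSeq;
sub new         { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub desc        { $_[0]{desc} }
sub division    { $_[0]{division} }
sub molecule    { $_[0]{molecule} }
sub seq_version { $_[0]{seq_version} }

package main;

# GenBank field => accessor name, taken from the table above.
my %accessor_for = (
    'Definition'  => 'desc',
    'Division'    => 'division',
    'Molecule'    => 'molecule',
    'Seq Version' => 'seq_version',
);

my $seq = MockRichSeq->new(
    desc        => 'Example record',
    division    => 'BCT',
    molecule    => 'DNA',
    seq_version => 1,
);

# Call each accessor by name; the method name is held in a scalar.
my %report;
for my $field (sort keys %accessor_for) {
    my $method = $accessor_for{$field};
    $report{$field} = $seq->$method();
    print "$field: $report{$field}\n";
}
```

On a real sequence object the same loop should work unchanged, since it relies only on the accessor names listed in the table.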

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://www.bioperl.org/MailList.shtml  - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

  bioperl-bugs@bio.perl.org
  http://bugzilla.bioperl.org/

AUTHOR - Elia Stupka

Email elia@tll.org.sg

CONTRIBUTORS

Ewan Birney, birney@ebi.ac.uk
Jason Stajich, jason@bioperl.org
Chris Mungall, cjm@fruitfly.bdgp.berkeley.edu
Lincoln Stein, lstein@cshl.org
Heikki Lehvaslaiho, heikki@ebi.ac.uk
Hilmar Lapp, hlapp@gmx.net
Donald G. Jackson, donald.jackson@bms.com

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with an underscore (_).

next_seq

 Title   : next_seq
 Usage   : $seq = $stream->next_seq()
 Function: returns the next sequence in the stream
 Returns : Bio::Seq object
 Args    :

write_seq

 Title   : write_seq
 Usage   : $stream->write_seq($seq)
 Function: writes the $seq object (must be a Bio::SeqI object) to the stream
 Returns : 1 for success and 0 for error
 Args    : array of 1 to n Bio::SeqI objects

_print_GenBank_FTHelper

 Title   : _print_GenBank_FTHelper
 Usage   :
 Function:
 Example :
 Returns : 
 Args    :

_read_GenBank_References

 Title   : _read_GenBank_References
 Usage   :
 Function: Reads references from GenBank format. Intended for internal use.
 Returns : 
 Args    :
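For orientation, a GenBank REFERENCE block consists of a numbered REFERENCE line followed by indented sub-keywords (AUTHORS, TITLE, JOURNAL, PUBMED and so on), with long values continued on lines indented by 12 spaces. The hypothetical helper parse_reference_block() below sketches how such a block can be collected into a plain hash; it is not the code used by _read_GenBank_References, whose results end up as Bio::Annotation::Reference objects under the 'reference' annotation key.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's implementation): gather the fields of
# one REFERENCE block into a hash keyed by sub-keyword (AUTHORS, TITLE, ...).
sub parse_reference_block {
    my (@lines) = @_;
    my (%ref, $current);
    for my $line (@lines) {
        if ($line =~ /^REFERENCE\s+(\d+)\s*(.*?)\s*$/) {
            $ref{number} = $1;
            $ref{range}  = $2 if length($2);
        }
        elsif ($line =~ /^ {2,3}([A-Z]+)\s+(.*?)\s*$/) {
            # Sub-keyword line, e.g. '  AUTHORS   ...' or '   PUBMED   ...'.
            $current = $1;
            $ref{$current} = $2;
        }
        elsif (defined $current && $line =~ /^ {12}(\S.*?)\s*$/) {
            # Continuation line: append to the preceding sub-keyword.
            $ref{$current} .= " $1";
        }
    }
    return \%ref;
}

my $ref = parse_reference_block(
    'REFERENCE   1  (bases 1 to 120)',
    '  AUTHORS   Doe,J. and Roe,R.',
    '  TITLE     An example title that is long enough',
    '            to wrap onto a second line',
    '  JOURNAL   Unpublished',
);
print "$_: $ref->{$_}\n" for sort keys %$ref;
```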

_read_GenBank_Species

 Title   : _read_GenBank_Species
 Usage   :
 Function: Reads the GenBank Organism species and classification
           lines.
 Example :
 Returns : A Bio::Species object
 Args    : a reference to the current line buffer
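In a GenBank record the ORGANISM line gives the organism name, and the following lines, indented 12 spaces, give the semicolon-separated lineage ending in a period. As a rough sketch of what this step has to extract, the hypothetical helper parse_organism_block() below returns the name and lineage as plain Perl data; the real method returns a Bio::Species object instead and may handle more cases (for example organism names that wrap onto a second line, which this sketch assumes never happens).

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's implementation): extract the organism
# name and the lineage from the SOURCE/ORGANISM block of a GenBank record.
# Assumes the organism name fits on the ORGANISM line itself.
sub parse_organism_block {
    my (@lines) = @_;
    my ($organism, @lineage_text);
    for my $line (@lines) {
        if (!defined $organism) {
            $organism = $1 if $line =~ /^  ORGANISM\s+(.+?)\s*$/;
            next;
        }
        # Lineage lines are indented 12 spaces; anything else ends the block.
        last unless $line =~ /^ {12}(\S.*?)\s*$/;
        push @lineage_text, $1;
    }
    my $lineage = join ' ', @lineage_text;
    $lineage =~ s/\.$//;                      # drop the terminating period
    return ($organism, [ split /;\s*/, $lineage ]);
}

my @block = (
    'SOURCE      Escherichia coli',
    '  ORGANISM  Escherichia coli',
    '            Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales;',
    '            Enterobacteriaceae; Escherichia.',
    'REFERENCE   1  (bases 1 to 100)',
);
my ($organism, $classification) = parse_organism_block(@block);
print "$organism: ", join(' > ', @$classification), "\n";
```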

_read_FTHelper_GenBank

 Title   : _read_FTHelper_GenBank
 Usage   : _read_FTHelper_GenBank($buffer)
 Function: reads the next FT key line
 Example :
 Returns : Bio::SeqIO::FTHelper object 
 Args    : filehandle and reference to a scalar
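In the GenBank feature table, each entry starts with a key line (the feature key starting in column 6, followed by the location starting in column 22) and continues with qualifier lines beginning with '/'. The hypothetical helper parse_feature_entry() below sketches splitting one such entry into key, location and qualifiers. It is not how Bio::SeqIO::FTHelper objects are actually built, and it deliberately ignores wrapped locations and multi-line qualifier values.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's implementation): split one feature-table
# entry into its key, location and qualifiers. For brevity it ignores
# locations and qualifier values that wrap onto continuation lines.
sub parse_feature_entry {
    my ($first, @rest) = @_;
    # Key line: five spaces, the feature key, then the location.
    my ($key, $location) = $first =~ /^ {5}(\S+)\s+(\S.*?)\s*$/
        or die "not a feature key line: $first\n";
    my %qualifiers;
    for my $line (@rest) {
        # Qualifier lines look like '/name="value"', '/name=value' or '/name'.
        next unless $line =~ m{^\s+/(\w+)(?:=(.*?))?\s*$};
        my ($name, $value) = ($1, $2 // '');
        $value =~ s/^"(.*)"$/$1/;             # strip surrounding quotes
        push @{ $qualifiers{$name} }, $value;
    }
    return ($key, $location, \%qualifiers);
}

my @entry = (
    '     CDS             join(12..78,134..202)',
    '                     /gene="abc"',
    '                     /codon_start=1',
);
my ($key, $location, $qualifiers) = parse_feature_entry(@entry);
print "$key at $location\n";
print "  /$_ = $qualifiers->{$_}[0]\n" for sort keys %$qualifiers;
```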

_write_line_GenBank

 Title   : _write_line_GenBank
 Usage   :
 Function: internal function
 Example :
 Returns : 
 Args    :

_write_line_GenBank_regex

 Title   : _write_line_GenBank_regex
 Usage   :
 Function: internal function for writing lines of a specified
           length, with one left-hand header for the first line
           and another for subsequent lines, splitting the text
           at specific points
 Example :
 Returns : nothing
 Args    : file handle, first header, second header, text-line, regex for line breaks, total line length
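The wrapping behaviour described above can be sketched with a small stand-alone function. write_wrapped() is hypothetical: it only mirrors the documented argument list (file handle, first header, second header, text, break regex, total line length) and is not the module's code. In the usage below the break regex matches whitespace, so lines are split between words, and output goes to an in-memory file handle.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the idea (not BioPerl's code): print
# $text so that no line exceeds $length characters, using $pre1 as the
# left-hand header of the first line and $pre2 for every following line,
# and breaking only after a match of $break_re where possible.
sub write_wrapped {
    my ($fh, $pre1, $pre2, $text, $break_re, $length) = @_;
    my $header = $pre1;
    $text =~ s/^\s+//;
    while (length($text)) {
        my $room = $length - length($header);
        die "header leaves no room for text\n" if $room < 1;
        my $chunk = $text;
        if (length($text) > $room) {
            # Longest prefix within $room that ends in a break match;
            # fall back to a hard cut when there is none.
            ($chunk) = substr($text, 0, $room) =~ /^(.*$break_re)/s;
            $chunk = substr($text, 0, $room)
                unless defined($chunk) && length($chunk);
        }
        substr($text, 0, length($chunk)) = '';
        $text =~ s/^\s+//;
        (my $line = $chunk) =~ s/\s+$//;      # no trailing whitespace
        print {$fh} $header, $line, "\n";
        $header = $pre2;
    }
    return;
}

open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";
write_wrapped($fh, 'DEFINITION  ', ' ' x 12,
    'Escherichia coli K-12 substr. MG1655, complete genome.', qr/\s/, 40);
close $fh;
print $buffer;
```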

_post_sort

 Title   : _post_sort
 Usage   : $obj->_post_sort($newval)
 Function: get/set the sort function that is applied to the
           FTHelper objects before printing (output only)
 Returns : value of _post_sort
 Args    : newvalue (optional)
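As a sketch of a function that could be passed to _post_sort(), the comparator below puts 'source' features first and orders the rest by feature key. It assumes the writer calls the function with two FTHelper objects as arguments and expects a sort-style result (-1, 0 or 1); that calling convention and the MockFTHelper stand-in are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqIO::FTHelper; only key() is provided.
package MockFTHelper;
sub new { my ($class, $key) = @_; return bless { key => $key }, $class }
sub key { $_[0]{key} }

package main;

# Comparator: 'source' first, then the remaining features ordered by key.
# Assumption: it is called with two FTHelper objects as arguments and must
# return -1, 0 or 1, like a sort block.
my %rank = (source => 0);
my $post_sort = sub {
    my ($x, $y) = @_;
    my $rx = $rank{ $x->key() } // 1;
    my $ry = $rank{ $y->key() } // 1;
    return $rx <=> $ry || $x->key() cmp $y->key();
};

my @helpers = map { MockFTHelper->new($_) } qw(gene CDS source mRNA);
my @sorted  = sort { $post_sort->($a, $b) } @helpers;
print join(' ', map { $_->key() } @sorted), "\n";    # source CDS gene mRNA

# With a real stream (requires BioPerl):
#   $stream->_post_sort($post_sort);
```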

_show_dna

 Title   : _show_dna
 Usage   : $obj->_show_dna($newval)
 Function: get/set whether the DNA sequence is written (output only)
 Returns : value of _show_dna
 Args    : newvalue (optional)

_id_generation_func

 Title   : _id_generation_func
 Usage   : $obj->_id_generation_func($newval)
 Function: get/set the function called to generate the ID line
 Returns : value of _id_generation_func
 Args    : newvalue (optional)

_ac_generation_func

 Title   : _ac_generation_func
 Usage   : $obj->_ac_generation_func($newval)
 Function: 
 Returns : value of _ac_generation_func
 Args    : newvalue (optional)

_sv_generation_func

 Title   : _sv_generation_func
 Usage   : $obj->_sv_generation_func($newval)
 Function: 
 Returns : value of _sv_generation_func
 Args    : newvalue (optional)

_kw_generation_func

 Title   : _kw_generation_func
 Usage   : $obj->_kw_generation_func($newval)
 Function: 
 Returns : value of _kw_generation_func
 Args    : newvalue (optional)