NAME

Bio::Seq - bioperl sequence object

SYNOPSIS

Object Creation

 $seq = Bio::Seq->new;
 
 $seq = Bio::Seq->new(-seq=>'ACTGTGGCGTCAACTG');
 
 $seq = Bio::Seq->new(-seq=>$sequence_string);
 
 $seq = Bio::Seq->new(-seq=>@character_list);
 
 
 $seq = Bio::Seq->new($file,$seq,$id,$desc,$names,
                     $numbering,$type,$ffmt,$descffmt);

Object Creation from files

There are two ways to create Bio::Seq objects from files. One is using internal Sequence reading routines in this object, which can handle a few formats. The second is to use the newer SeqIO system, which can handle slightly more formats, can handle multiple sequences in one file, and can be easily extended to new formats.

Try to use the new style. It does give you more flexibility and stability.

  # old-style and deprecated,

  $seq = Bio::Seq->new($filename); # guesses Fasta format 

  $seq = Bio::Seq->new(-file=>'seqfile.aa',
                      -desc=>'Sample Bio::Seq sequence',
                      -start=>'1',
                      -ffmt=> 'Fasta',
                      -type=>'Amino',
                      );

  # new style, better, but somewhat more wordy
  # notice this loops over multiple sequences

  $stream = Bio::SeqIO->new(-file => 'myfile', -format => 'Fasta');

  while ( $seq = $stream->next_seq() ) {
       # $seq is a Bio::Seq object
  }

Object Manipulation

 $seq->[METHOD];

 $result = $seq->[METHOD];
 
 
 
 Accessors
 --------------------------------------------------------
 There are a wide variety of methods designed to give easy
 and flexible access to the contents of sequence objects
 
 The following accessors can be invoked upon a sequence object

 ary()        - access sequence (or slice of sequence) as an array
 str()        - access sequence (or slice of sequence) as a string
 getseq()     - access sequence (or slice) as string or array
 seq_len()    - access sequence length
 id()         - access/change object id 
 desc()       - access/change object description
 names()      - access/change object names
 start()      - access/change start point of the sequence (see note below) 
 end()        - access/change end point of the sequence (see note below)
 numbering()  - access/change sequence numbering offset (deprecated)
 origin()     - access/change sequence origin
 type()       - access/change sequence type
 setseq()     - change sequence

 Deprecated format changes.

 ffmt()       - access/change default output format
 descffmt()   - access/change description format
 

 Methods
 --------------------------------------------------------
 The following methods can be invoked upon a sequence object

 copy()        - returns an exact copy of an object
 alphabet_ok() - check sequence against genetic alphabet  
 alphabet()    - returns the genetic alphabet currently in use
 layout()      - sequence formatter for output
 revcom()      - reverse complement of sequence
 complement()  - complement of sequence  
 reverse()     - reverse of sequence
 Dna_to_Rna()  - convert Dna seq to Rna
 Rna_to_Dna()  - convert Rna seq to Dna
 translate()   - protein translation of Dna/Rna sequence

  
 copy, revcom and translate all return new Bio::Seq objects. This
 makes it easy to use these objects in other Bioperl modules and/or
 use all the new SeqIO system for format dumping.

 complement, reverse, Dna_to_Rna and Rna_to_Dna all return strings,
 as it is less likely that you want these things as real Seq objects

OBJECT IN TRANSITION

The Bio::Seq object is by far the oldest object in the bioperl set of modules, and it shows, with around 4/5 people developing methods and much of the documentation focused on general bioperl issues. The bioperl core group have a commitment to eventually rewrite the Bio::Seq object with some more sensible design principles, but this rewrite will

    a) be heavily tested against old uses of the code
    b) aim to be as backwardly compatible as possible
    c) be well signposted that it is occurring.

For more information read the bioperl web page, projects, sequence object,

     http://bio.perl.org/Projects/Sequence/

INSTALLATION

This module is included with the central Bioperl distribution:

   http://bio.perl.org/Core/Latest
   ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION

This module is the generic sequence object which lies at the core of the bioperl project. It stores Dna, Rna, or Protein sequence information and annotation. It has associated methods to perform various manipulations of sequences and support for reading and writing sequence data in a variety of file formats.

Bio::Seq has completely superseded Bio::PreSeq.pm.

The older PreSeq.pm code can be found at Chris Dagdigian's site: http://www.sonsorol.org/dag/bioperl/top.html

Sequence Types

Currently the following sequence types are recognized:

 Dna
 Rna
 Amino

Alphabets

This module uses the standard extended single-letter genetic alphabets to represent nucleotide and amino acid sequences.

In addition to the standard alphabet, the following symbols are also acceptable in a biosequence:

 ?  (a missing nucleotide or amino acid)
 -  (gap in sequence)

Extended Dna / Rna alphabet

 (includes symbols for nucleotide ambiguity)
 ------------------------------------------
 Symbol       Meaning      Nucleic Acid
 ------------------------------------------
  A            A           Adenine
  C            C           Cytosine
  G            G           Guanine
  T            T           Thymine
  U            U           Uracil
  M          A or C  
  R          A or G   
  W          A or T    
  S          C or G     
  Y          C or T     
  K          G or T     
  V        A or C or G  
  H        A or C or T  
  D        A or G or T  
  B        C or G or T   
  X      G or A or T or C 
  N      G or A or T or C 

 
 IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE:
   Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.
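
The ambiguity codes in the table above can be read as sets of concrete bases. As a minimal sketch (not part of Bio::Seq), the following turns a pattern written with these codes into a regular expression. The helper name iupac_to_regex and the %IUPAC hash are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes from the table above, mapped to the bases they
# stand for. This hash is an assumption made for the sketch, not module data.
my %IUPAC = (
    A => 'A',    C => 'C',    G => 'G',   T => 'T',   U => 'U',
    M => 'AC',   R => 'AG',   W => 'AT',  S => 'CG',  Y => 'CT',  K => 'GT',
    V => 'ACG',  H => 'ACT',  D => 'AGT', B => 'CGT',
    X => 'ACGT', N => 'ACGT',
);

# Hypothetical helper: build a regular expression that matches every
# concrete sequence described by a pattern of IUPAC symbols.
sub iupac_to_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $symbol (split //, uc $pattern) {
        my $bases = $IUPAC{$symbol};
        die "Unknown nucleotide symbol '$symbol'\n" unless defined $bases;
        # A single base is used literally; several become a character class.
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return qr/\A$re\z/;
}

my $re = iupac_to_regex('ACRYN');
print "ACGTA fits the pattern\n"         if 'ACGTA' =~ $re;
print "ACCTA does not fit the pattern\n" if 'ACCTA' !~ $re;
```

Here R stands for A or G, so a C in the third position is rejected.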

Amino Acid alphabet

 ------------------------------------------
 Symbol           Meaning   
 ------------------------------------------
 A        Alanine
 B        Aspartic Acid, Asparagine
 C        Cysteine
 D        Aspartic Acid
 E        Glutamic Acid
 F        Phenylalanine
 G        Glycine
 H        Histidine
 I        Isoleucine
 K        Lysine
 L        Leucine
 M        Methionine
 N        Asparagine
 P        Proline
 Q        Glutamine
 R        Arginine
 S        Serine
 T        Threonine
 V        Valine
 W        Tryptophan
 X        Unknown
 Y        Tyrosine
 Z        Glutamic Acid, Glutamine
 *        Terminator

 
 IUPAC-IUB AMINO ACID SYMBOLS:
   Biochem J. 1984 Apr 15; 219(2): 345-373
   Eur J Biochem. 1993 Apr 1; 213(1): 2
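
To illustrate how a sequence might be checked against the alphabets listed above, here is a small sketch on plain strings. The alphabet strings are assembled from the tables in this section and are an assumption; the module's internal alphabets may differ. The function name alphabet_ok_sketch is hypothetical.

```perl
use strict;
use warnings;

# Alphabets assembled from the tables above (an assumption for this sketch).
my %ALPHABET = (
    dna_strict => 'ACGT',
    dna        => 'ACGTUMRWSYKVHDBXN',
    amino      => 'ABCDEFGHIKLMNPQRSTVWXYZ*',
);

# Hypothetical helper: returns 1 if every character of $sequence belongs
# to the chosen alphabet, 0 otherwise. Case is ignored.
sub alphabet_ok_sketch {
    my ($sequence, $type) = @_;
    my $letters = $ALPHABET{$type};
    die "Unknown sequence type '$type'\n" unless defined $letters;
    # '?' (missing residue) and '-' (gap) are accepted in every alphabet here.
    my $class = quotemeta($letters . '?-');
    return $sequence =~ /\A[$class]*\z/i ? 1 : 0;
}

for my $case ( ['ACGTN-', 'dna'], ['ACGTN', 'dna_strict'], ['MKV*', 'amino'] ) {
    my ($sequence, $type) = @$case;
    printf "%-8s as %-10s : %s\n", $sequence, $type,
        alphabet_ok_sketch($sequence, $type) ? 'ok' : 'not ok';
}
```

Whether a strict alphabet should still accept '?' and '-' is a design choice this sketch does not settle; it simply allows them everywhere.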

Sequence IO Formats

You are encouraged to use the SeqIO system of IO, which in essence looks like:

   use Bio::SeqIO;

   $instream = Bio::SeqIO->new( -file => 'my.file', -format => 'Fasta' );
   $outstream = Bio::SeqIO->new( -fh => \*STDOUT, -format => 'Raw' );

   while ( $seq = $instream->next_seq ) {
      $outstream->write_seq($seq);
   }

The available formats can be found by listing the SeqIO directory in the distribution that this comes with (as new SeqIO formats are very easy to add, it is better to go to the directory than to try to list them here).

Notice that the SeqIO system will only convert information which the Seq object stores. The Seq object is a lightweight object, and does not contain annotation or feature table information. This information is stored in a development object, called AnnSeq, which will be available in the 0.06 releases and later.

USAGE

Using Bio::Seq in your perl programs

Seq.pm is invoked via the perl 'use' command

   use Bio::Seq;

Creating a biosequence object

The "constructor" method in Bio::Seq.pm is the new() function.

The proper syntax for accessing the new() function in Seq.pm is as follows:

   $myseq = Bio::Seq->new;

Of course, objects are only useful if they have something in them, so you would probably want to pass along some additional information or arguments to the constructor. The foundation of any biosequence object is of course the sequence itself.

You can address new() with a sequence directly:

   $myseq = Bio::Seq->new(-seq=>'AACTGGCGTTCGTG');

Or you can pass in a string or a list:

   $myseq = Bio::Seq->new(-seq=>$sequence_string);
   $myseq = Bio::Seq->new(-seq=>@sequence_list);

It is also possible to create a new sequence object based on a sequence contained in a file. You can tell the constructor where to find the sequence file by passing in the 'file' parameter:

   $myseq  = Bio::Seq->new(-file=>'seqfile.gcg');

Because there are so many different conventions or formats for storing sequence information in files, it would be polite (although not absolutely necessary) to tell the constructor what format the sequence file is in. We can provide that information via the file-format or 'ffmt' field. To create a sequence object based upon a GCG-formatted sequence file:

   $myseq  = Bio::Seq->new(-file=>'seqfile.gcg',-ffmt=>'GCG');

We've already introduced three different object attributes or arguments that can be passed to the new() object constructor ('seq','file' and 'ffmt') so now would be a good time to introduce them all:

BioSeq Constructor Arguments

file: The "file" argument should be a string value containing path and filename information for a sequence file that is to be read into an object.

seq: The "seq" argument is for passing in sequence directly instead of reading in a sequence file. The sequence should consist of RAW info (no whitespace, newlines or formatting) and can be passed in as either an array/list or string.

id: The "id" argument should be a ONE-WORD string value giving a short name for the sequence.

desc: The "desc" argument should be a string containing a description of the sequence. This field is not limited to one word.

names: The "names" argument should be a hash or reference to a hash that contains any number of user generated key-value pairs. Various bits of identifying information can be stored here including name(s), database locations, accession numbers, URLs, etc.

type: The "type" argument should be a string value describing the sequence type, e.g. "Dna", "Rna" or "Amino".

origin: The "origin" argument should be a string value describing the sequence origin (source organism).

start: The start point of the sequence, in biological coordinates.

end: The end point of the sequence, in biological coordinates (the position of the last residue).

The start/end attributes are not strongly tied to what is actually in the sequence (i.e., $seq->start()+length($seq->getseq())-1 doesn't necessarily equal $seq->end(), although most of the time it should).

This is to allow some oddities to be stored in the Seq object sensibly.

The numbering convention is 'biological' coordinates, i.e. the sequence ATG would start at 1 (A) and finish at 3 (G). (NB - this is different from how perl represents ranges in sequences, where offsets count from 0).
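
The difference between biological coordinates and Perl's zero-based offsets can be sketched on a plain string. The helper bio_slice below is hypothetical and not part of Bio::Seq; its $first parameter plays the role of the "start" attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: extract residues $from..$to (inclusive, biological
# coordinates) from a plain string whose first residue is numbered $first.
# If $to is omitted, the slice runs to the end of the sequence.
sub bio_slice {
    my ($sequence, $first, $from, $to) = @_;
    my $last = $first + length($sequence) - 1;
    $to = $last unless defined $to;
    die "Range $from-$to is outside $first-$last\n"
        if $from < $first || $to < $from || $to > $last;
    # substr() takes a zero-based offset and a length, not two end points.
    return substr($sequence, $from - $first, $to - $from + 1);
}

# The same string, once numbered from 1 and once numbered from 101.
print bio_slice('ATGCCGTA', 1,   1,   3),   "\n";
print bio_slice('ATGCCGTA', 101, 102, 104), "\n";
```

The only arithmetic involved is subtracting the number of the first residue to get an offset, and adding 1 to the difference of the end points to get a length.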

numbering() is equivalent to start() (old version). Eventually it will be removed. numbering() accesses the same attribute as start()

numbering: (Deprecated) The "numbering" argument should be an integer value containing the sequence numbering offset value. By default all sequences are numbered starting with 1.

ffmt:

This documentation describes the old format system: you are encouraged to use the newer SeqIO system described separately in the SeqIO documentation.

The "ffmt" argument should be a string describing sequence file-format. If a sequence is being read from a file via the "file" argument, "ffmt" is used to invoke the proper parsing code. "ffmt" is also the default format for sequence output when the layout method is called. See elsewhere in this documentation for info regarding recognized sequence file-formats.

If most of these arguments were used at once to create a sequence object, it would look something like this:

   #Set up the name hash
   %names = (
   'CloneID','DB1',
   'Isolate','5',
   'Tissue','Xenopus',
   'Location','/usr2/users/dag/bioperl/sample.tfa'
   );

   $name_ref = \%names;

   #Create the object
   $myseq = Bio::Seq->new(-file=>'sample.tfa',
                          -names=>$name_ref,
                          -type=>'Dna',
                          -origin=>'Xenopus mesoderm',
                          -start=>'1',
                          -desc=>'Sample Bio::Seq sequence',
                          -ffmt=>'Fasta');

Methods

Once an object has been created, there are defined ways to go about accessing the information. Users are encouraged to poke around "under the hood" of Seq.pm to see what is going on, but it is considered bad form to bypass the defined accessor methods and mess around with the internal code. Bypassing the defined methods "voids the warranty" of the module and can lead to problems down the road. The implied agreement between module creators and users is that the creators will strive to keep the interface standard and backwards-compatible while the users will avoid becoming dependent on bits of internal code that may change or disappear in future revisions.

Detailed information about each method described here can be found in the Appendix.

Accessing information

For each defined way to access information from a biosequence object, there is a corresponding "method" that is invoked. What follows is a brief description of each accessor method. For more detailed information see the individual annotations for each method near the end of this document.

  • Sequence

    The sequence can be accessed in several ways via the getseq() method. Depending on how it is invoked, it can return either a string or a list value.

    Both examples are appropriate:

       @sequence_list   = $myseq->getseq;
       $sequence_string = $myseq->getseq;

    Sequence "slices" can be accessed by passing start and stop integer position arguments to getseq():

       @slice = $myseq->getseq($start,$stop);
       @slice = $myseq->getseq(1,50);
       @slice = $myseq->getseq(100);

    If no stop value is passed in, getseq() will return a slice from the start position to the end of the sequence. Slices are returned in the context of the object "start" attribute, not absolute position, so be aware of the object's numbering scheme.

    Sequences can also be accessed with the ary() and str() methods. The ary() method will always return a list value and str() will always return a string. Otherwise they are functionally identical to the getseq() method.

       $sequence = $myseq->str;
       @sequence = $myseq->ary;
     
       @slice = $myseq->ary($start,$stop);
       $slice = $myseq->str($start,$stop);
  • Sequence length

    The sequence length can be accessed using the seq_len() method

       $len = $myseq->seq_len;
  • Sequence ID

    The ID field can be accessed using the id() method

       $ID = $myseq->id;
  • Description

    The object description field can be accessed using the desc() method

       $description = $myseq->desc;
  • Names

    The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be accessed by

       %name_hash = $myseq->names;
  • Sequence start

    The biological position of the first residue in the sequence can be accessed via start()

       $start = $myseq->start;
  • Sequence end

    The biological position of the last residue in the sequence can be accessed via end()

       $end = $myseq->end;
  • Sequence Origin

    The object origin (source organism) field can be accessed via origin()

      $seq_origin = $myseq->origin;
  • File input format / default output format

    The object format field can be accessed using the ffmt() method

       $format = $myseq->ffmt;

Changing Information in Sequence Objects

In the previous section it was shown how object attributes and values could be retrieved from a sequence object by calling upon various methods. Many of the above methods will also allow the user to CHANGE object attributes by passing in additional arguments. Detailed information on each method can be found in the Appendix.

  • Changing the sequence

    The sequence information for an object can be changed by passing a string or list value to the setseq() method. Here are some ways that sequence information can be changed:

       $myseq->setseq($new_sequence_string);
       $myseq->setseq(@new_sequence_list);
       $myseq->setseq("aaccttgcctgc");

    The setseq() method checks sequence elements and warns if it finds non-standard characters. Because of this, arbitrary sequence compositions are not supported at this time. This method is considered slightly 'insecure' because the 'id','desc' and 'type' fields are not updated along with the sequence. If necessary, the user must make the appropriate changes to these fields whenever sequence information is updated or changed.

  • Changing the sequence ID

    The ID field can be changed by passing in a new ID argument to id()

       $myseq->id($new_id);
  • Changing the object description

    The object description field can be changed by passing in a new argument to desc()

       $myseq->desc($new_desc);
  • Changing the object names hash

    The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be changed by passing in a reference to a new hash to names()

       $hash_ref = \%name_hash;
       $myseq->names($hash_ref);
  • Changing the sequence start or end

    The default numbering offset for the sequence can be changed by passing in a new value to start() or end()

       $myseq->start(1);
       $myseq->start($new_value);
  • Sequence Origin

    The object origin field can be changed by passing in a new string value to origin()

      $myseq->origin("mitochondrial");
      $myseq->origin($origin_string);
  • File input format / default output format

    The object format field can be changed by passing in a new value to ffmt()

       $myseq->ffmt("GCG"); 

Manipulating sequences

Creating, accessing and changing biosequence objects and fields is all well and good, but eventually you are going to want to actually do some work.

Included with Seq.pm are some commonly used utility methods for manipulating sequence data. So far Seq.pm contains methods for:

  • Copying a biosequence object

    using copy()

        # NB - new_obj is a Bio::Seq object
    
        $new_obj = $myseq->copy;
  • Reversing a sequence

    using reverse()

        $reversed_seq = $myseq->reverse;
  • Complementing a sequence

    The 2nd strand, or "complement" of a biosequence can be obtained by calling upon the complement() method.

        $comp_seq = $myseq->complement;
  • Reverse complementing a sequence

    using revcom()

        # NB - rev_comp is a Bio::Seq object
     
        $rev_comp = $myseq->revcom;
  • Converting Dna to Rna

    using Dna_to_Rna()

        $rna_seq = $myseq->Dna_to_Rna;
  • Converting Rna to Dna

    using Rna_to_Dna()

        $dna_seq = $myseq->Rna_to_Dna;
  • Translating Dna or Rna to protein

    using translate()

        # NB - peptide_seq is a Bio::Seq object
    
        $peptide_seq = $myseq->translate;
  • Checking the sequence alphabet

    To check if any nonstandard characters are present in a biosequence, an alphabet_ok() method is provided. The method returns "1" if everything is OK, otherwise it returns a "0".

       if($myseq->alphabet_ok) { print "OK!!\n"; }
        else { print "Not OK! \n"; }

    To get the alphabet itself, use the alphabet() method, which will return a string containing all characters in the current alphabet.

        $alph = $myseq->alphabet;

    To use restrictive alphabets that do not permit ambiguity codes, include '-strict => 1' in the parameters sent to new(). Or, for any existing sequence object, try:

        $myseq->strict(1); 
        $myseq->alphabet_ok() or die "alphabet not okay.\n";
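
As a rough illustration of what the string-returning methods in the list above compute, the following sketch works on plain Perl strings rather than Bio::Seq objects. The function names are hypothetical, and the treatment of ambiguity codes in the complement is an assumption based on the extended alphabet table; the module's own behaviour may differ.

```perl
use strict;
use warnings;

# cf. reverse(): the same residues in the opposite order.
sub reverse_str {
    my ($s) = @_;
    return scalar reverse $s;
}

# cf. complement(): swap each base for its pairing partner. Ambiguity
# codes are swapped for the code that describes the complementary set.
sub complement_str {
    my ($s) = @_;
    (my $c = $s) =~ tr/ACGTMRWSYKVHDBNacgtmrwsykvhdbn/TGCAKYWSRMBDHVNtgcakywsrmbdhvn/;
    return $c;
}

# cf. revcom(): the complement read in the opposite direction.
sub revcom_str {
    my ($s) = @_;
    return reverse_str(complement_str($s));
}

# cf. Dna_to_Rna() and Rna_to_Dna(): exchange T and U.
sub dna_to_rna_str { my ($s) = @_; (my $r = $s) =~ tr/Tt/Uu/; return $r; }
sub rna_to_dna_str { my ($s) = @_; (my $d = $s) =~ tr/Uu/Tt/; return $d; }

my $dna = 'ATGCCR';
print "sequence   : $dna\n";
print "reverse    : ", reverse_str($dna),    "\n";
print "complement : ", complement_str($dna), "\n";
print "revcom     : ", revcom_str($dna),     "\n";
print "as Rna     : ", dna_to_rna_str($dna), "\n";
```

Each function copies its argument before applying tr///, so the caller's string is left unchanged.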

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

    vsns-bcd-perl@lists.uni-bielefeld.de          - General discussion
    vsns-bcd-perl-guts@lists.uni-bielefeld.de     - Technically-oriented discussion
    http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

    bioperl-bugs@bio.perl.org                   
    http://bio.perl.org/bioperl-bugs/           

ACKNOWLEDGEMENTS

Some pieces of the code were contributed by Steven E. Brenner, Steve Chervitz, Ewan Birney, Tim Dudgeon, David Curiel, and other Bioperlers. Thanks !!!!

REFERENCES

BioPerl Project Page http://bio.perl.org/

VERSION

Bio::Seq.pm, beta 0.051

COPYRIGHT

 Copyright (c) 1996-1998 Chris Dagdigian, Georg Fuellen, Richard
 Resnick, and others. All Rights Reserved. This module is free
 software; you can redistribute it and/or modify it under the same
 terms as Perl itself.

Appendix

The following documentation describes the various functions contained in this module. Some functions are for internal use and are not meant to be called by the user; they are preceded by an underscore ("_").

new

 Title     : new
 Usage     : $mySeq = Bio::Seq->new($file,$seq,$id,$desc,$names,
                         $start,$end,$type,$ffmt,$descffmt);
           :                - or -
           : $mySeq = Bio::Seq->new(-file=>$file,
                                   -seq=>$seq,
                                   -id=>$id,
                                   -desc=>$desc,
                                   -names=>$names,
                                   -start=>$start,
                                   -end=>$end,
                                   -type=>$type,
                                   -origin=>$origin,
                                   -ffmt=>$ffmt,
                                   -descffmt=>$descffmt);
 Function  : The constructor for this class, returns a new object.
 Example   : See usage
 Returns   : Bio::Seq object
 Argument  : $file: file from which the sequence data can be read; all
               the other arguments will overwrite the data read in.
               "_nofile" is recommended if no file is given.
             $seq: String or array of characters
             $id: String describing the ID the user wishes to assign.
             $desc: String giving a description of the sequence
             $names: A reference to a hash which stores {loc,name}
                     pairs of other database locations and corresponding names
                     where the sequence is located.
             $start: The offset of the sequence, as an integer
             $end: The end point of the sequence, as an integer
             $type: The type of the sequence, see type()
             $origin: The sequence origin
             $ffmt: Sequence format, see ffmt()
             $descffmt: format of $desc, see descffmt()
    

## Internal methods ##

_initialize

 Title     : _initialize
 Usage     : n/a (internal function)
 Function  : Assigns initial parameters to a blessed object.
 Example   : 
 Returns   : 
 Argument  : As Bio::Seq->new, allows for named or listed parameters.
             See ->new for the legal types of these values.
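
How named and listed parameters could be told apart is sketched below. This is an illustration only and not the module's actual _initialize code: the helper parse_args is hypothetical, and the positional order is taken from the Usage line of new().

```perl
use strict;
use warnings;

# Positional order as shown in the Usage line of new(). 'origin' has no
# positional slot there, so it is only reachable by name in this sketch.
my @ORDER = qw(file seq id desc names start end type ffmt descffmt);

# Hypothetical helper: normalise either calling style into a hash reference.
sub parse_args {
    my @args = @_;
    my %field;
    if (@args && defined $args[0] && $args[0] =~ /^-/) {
        # Named style: a flat list of -key => value pairs.
        my %named = @args;
        for my $key (keys %named) {
            (my $name = lc $key) =~ s/^-//;
            $field{$name} = $named{$key};
        }
    }
    else {
        # Listed style: values given in the documented order.
        die "Too many arguments\n" if @args > @ORDER;
        @field{ @ORDER[0 .. $#args] } = @args;
    }
    return \%field;
}

my $named  = parse_args(-seq => 'ACGT', -id => 'sample1');
my $listed = parse_args('_nofile', 'ACGT', 'sample1');
print "named  : id=$named->{id} seq=$named->{seq}\n";
print "listed : id=$listed->{id} seq=$listed->{seq}\n";
```

A weakness of this kind of detection is that a listed first argument beginning with a dash (for example an unusual file name) would be mistaken for a named parameter.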

_seq

 Title     : _seq()
 Usage     : n/a, internal function
 Function  : called by new() to set sequence field. Checks
           : alphabet before setting.
           :
 Returns   : n/a
 Argument  : sequence string

_monomer

 Title     : _monomer()
 Usage     : n/a, internal function
 Function  : Returns the internal monomer that represents
           : sequence type.
           :
           : Sequence type is treated internally as a monomer
           : defined by the %SeqAlph hash. The type field
           : is a list of format [monomer,origin]. For any
           : output outside the module, the monomer is resolved
           : back into string form via the %TypeSeq hash.
           :
 Returns   : original type setting [as monomer]
 Argument  : none

_file_read

 Title     : _file_read()
 Usage     : n/a (Internal Function)
 Function  : _file_read is called whenever the constructor is called 
           : with the name of a sequence to be read from disk.
           :
           : This function is now DEPRECATED. you should use the SeqIO
           : system
           :
 Example   : n/a, only called upon by _initialize()
 Returns   : 
 Argument  : 

## ACCESSORS ##

seq_len

 Title       : seq_len()
 Usage       : $len = $myseq->seq_len;
 Function    : Returns a value representing the sequence
             : length
             :
 Example     : see above
 Arguments   : none
 Returns     : integer

ary

 Title     : ary
 Usage     : ary([$start,[$end]])
 Function  : Returns the sequence of the object as an array, or a substring
             of the sequence if $start/$end are defined. If $start is
             defined and $end isn't, the substring is from $start to the
             end of the sequence.
 Example   : @slice = $myObject->ary(3,9);
 Returns   : array of characters
 Argument  : $start,$end (both integers). They are interpreted w.r.t. the
             specific numeration of the sequence!! ($self->{start})

str

 Title     : str
 Usage     : str([$start,[$end]])
 Function  : Returns the sequence of the object as a string, or a slice
             of the sequence if $start/$end are defined. If $start is
             defined and $end isn't, the slice is from $start to the
             end of the sequence.
 Example   : $slice = $myObject->str(3,9);
 Returns   : string scalar
 Argument  : $start,$end (both integers). They are interpreted w.r.t. the
             specific numeration of the sequence!! ($self->{start})

seq

 Title     : seq
 Usage     : seq([$start,[$end]])
 Function  : Returns the sequence of the object as an array or a char
             string, depending on the value of wantarray. Will return a slice
             of the sequence if $start/$end are defined. If $start is
             defined and $end isn't, the slice is from $start to the
             end of the sequence.
 Example   : @slice = $myObject->seq(3,9);
 Returns   : regular array of characters, or a scalar string
 Argument  : $start,$end (both integers). They are interpreted w.r.t. the
             specific numeration of the sequence!! ($self->{start})
 Comments  : 
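
The context-dependent return value described above relies on Perl's built-in wantarray. A minimal sketch on a plain string, with the hypothetical function name seq_sketch, might look like this:

```perl
use strict;
use warnings;

# Hypothetical function: returns a list of single characters in list
# context and the unchanged string in scalar context.
sub seq_sketch {
    my ($sequence) = @_;
    return wantarray ? split(//, $sequence) : $sequence;
}

my @residues = seq_sketch('ACGT');   # list context
my $string   = seq_sketch('ACGT');   # scalar context
print scalar(@residues), " residues; string form is $string\n";
```

The caller chooses the form simply by assigning to an array or to a scalar, which is why both getseq() examples earlier in this document are valid.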

getseq

 Title     : getseq
 Usage     : getseq([$start,[$end]])
 Function  : Returns the sequence of the object as an array or a char
             string, depending on the value of wantarray. Will return a slice
             of the sequence if $start/$end are defined. If $start is
             defined and $end isn't, the slice is from $start to the
             end of the sequence.
 Example   : @slice = $myObject->getseq(3,9);
 Returns   : regular array of characters, or a scalar string
 Throws    : Warning about deprecated method.
 Argument  : $start,$end (both integers). They are interpreted w.r.t. the
             specific numeration of the sequence!! ($self->{start})

id

 Title     : id()
 Usage     : $seq_id = $myseq->id; 
           : $myseq->id($id_string);
           :
 Function  : Sets field if an ID argument string is
           : passed in. If no arguments, returns ID value for
           : object.
           :
 Returns   : original ID value
 Argument  : ID string

desc

 Title     : desc()
 Usage     : $description = $myseq->desc; 
           : $myseq->desc($desc_string);
           :
 Function  : Sets field if an argument string is
           : passed in. If no arguments, returns original value for
           : object description field.
           :
 Returns   : original value for description
 Argument  : description string

names

 Title     : names()
 Usage     : %names = $myseq->names; 
           : $myseq->names($hash_ref);
           :
 Function  : Sets field if a name hash reference is
           : passed in. If no arguments, returns original 
           : names hash.
           :
 Returns   : hash reference (associative array)
 Argument  : reference to a hash (associative array)

numbering

 Title     : numbering()
 Usage     : $num_start = $myseq->numbering; 
           : $myseq->numbering($value);
           :
 Function  : Sets field if an argument is
           : passed in. If no arguments, returns original value.
           :
           : (Deprecated - should switch to start())
 Returns   : original value 
 Argument  : new value

start

 Title     : start
 Usage     : $start = $myseq->start(); #get
           : $myseq->start($value); #set
 Function  : the set/get for the start position
 Example   :
 Returns   : start value 
 Arguments : new value

end

 Title     : end
 Usage     : $end = $myseq->end(); #get
           : $myseq->end($value); #set
 Function  : The set/get for the end position
 Example   :
 Returns   : end value 
 Arguments : new value

get_nse

 Title    : get_nse
 Usage    : $tag = $myseq->get_nse() #
 Function : gets a string like "name/start-end". This is likely
          : to be unique in an alignment/database
          : Used a lot by SimpleAlign
 Example  :
 Returns  : A string
 Arguments: Two optional arguments - first being the name/ separator, second the
            start-end separator
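
The "name/start-end" tag is easy to picture with a stand-alone sketch. The function make_nse below is hypothetical and takes the id, start and end as plain arguments; the default separators '/' and '-' follow the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the "name/start-end" tag.
# Both separators are optional, mirroring the two optional arguments.
sub make_nse {
    my ($id, $start, $end, $name_sep, $range_sep) = @_;
    $name_sep  = '/' unless defined $name_sep;
    $range_sep = '-' unless defined $range_sep;
    return $id . $name_sep . $start . $range_sep . $end;
}

print make_nse('seqA', 1, 120), "\n";
print make_nse('seqA', 1, 120, ':', '..'), "\n";
```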

origin

 Title     : origin()
 Usage     : $myseq->origin($value) 
 Function  : Sets the origin field which is actually the second
           : field of the Type list. The {type} field is a 2 value list
           : with a format of ["Monomer","Origin"]
           :
 Returns   : Original value
 Argument  : string
 Comments  : SAC: Consider renaming this method to "organism()" or "species()". 
           : "origin" is ambiguous and can be easily confused with 
           : coordinate data (0,0).

type

 Title     : type()
 Usage     : $myseq->type($value) 
 Function  : Sets the type field which is the first
           : field of the Type list. The {type} field is a 2 value list
           : with a format of ["Monomer","Origin"]
           :
 Returns   : String containing one of the recognized sequence types:
           : 'unknown', 'dna', 'rna', 'amino', 'otherseq', 'aligned'
           : See the %Seq::SeqAlph hash for the current types.
 Argument  : string containing a valid sequence type
           : SAC: case of user-supplied argument does not matter

ffmt

 Title     : ffmt()
 Usage     : $format = $myseq->ffmt;
           : $myseq->ffmt("Fasta");
           : 
 Function  : The file format field is used by the internal
           : sequence parsing code when trying to read 
           : in a sequence file. It is also what is used
           : as a default output format if the layout
           : method is called without an argument.
           :
           : If a sequence object is created without
           : reading in a file, or if the file is read
           : in with the use of the ReadSeq package then
           : the ffmt field can be set to indicate any default
           : output-format preference.
           :
           : If a sequence is read from a file and parsed
           : by internal code (ReadSeq not used) then the ffmt
           : field should describe the format of the sequence
           : file. The ffmt field is used to send the sequence
           : to the correct internal parsing code.
           :
 Returns   : original ffmt value
 Argument  : recognized ffmt string value (see list of recognized 
           : formats) # SAC: What are they?! This list should be obvious.
           : Valid strings: 
           :    RAW, FASTA, GCG, IG, GENBANK, NBRF, EMBL, 
           :    MSF, PIR, GCG_SEQ, GCG_REF, STRIDER, ZUKER,
           : SAC: case of user-supplied argument does not matter

descffmt

 Title     : descffmt()
 Usage     : $desc = $myseq->descffmt;
           : $myseq->descffmt($new_value); 
 Function  : 
           :
 Returns   : original value
 Argument  : $new_value (one of the formats as defined in $SeqForm).
           : SAC: case of $new_value argument does not matter.

setseq

 Title     : setseq()
 Usage     : $self->setseq($new_sequence);
 Function  : Changes the sequence inside a bioseq object
           :
 Returns   : sequence string 
 Argument  : sequence string

parse

 Title     : parse
 Usage     : parse($ent,[$ffmt]);
 Function  : Invokes the proper parsing code depending on
           : the value of the object 'ffmt' field.
 Example   : $self->parse;
 Returns   : n/a
 Argument  : the prospective sequence to be parsed, 
           : and optionally its format so that it doesn't need to
           : be estimated
           : SAC: case of $ffmt argument does not matter.
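
The dispatch on the 'ffmt' value described above could be sketched as a lookup table keyed on the upper-cased format name. This is only an illustration, not the module's own code: %PARSER, pick_parser() and the stub parsers are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stub parsers; each just returns the name of its format.
my %PARSER = (
    RAW     => sub { 'raw' },
    FASTA   => sub { 'fasta' },
    GENBANK => sub { 'genbank' },
    GCG     => sub { 'gcg' },
);

# Look up a parser by format name, ignoring case.
# Returns undef when the format is not recognized.
sub pick_parser {
    my ($ffmt) = @_;
    return $PARSER{ uc $ffmt };
}

my $parser = pick_parser('Fasta');
print $parser->(), "\n" if $parser;
```

Upper-casing the key once is one simple way to make the case of the user-supplied argument irrelevant.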

parse_raw

 Title     : parse_raw
 Usage     : parse_raw;
 Function  : parses $ent into the $self->{"seq"} field, using Raw
           : file format.
 Example   : $self->parse_raw;
 Returns   : n/a
 Argument  : n/a

parse_genbank

 Title    : parse_genbank

 sub parse_genbank {
   my ($self, $ent) = @_;
   my $seqstart = 0;   # true while inside the ORIGIN block
   my $defstart = 0;   # true while inside a multi-line DEFINITION

   for ( split /\n/, $ent ) {
     # the LOCUS line carries the sequence id
     m/LOCUS\s*(\S+)/ and $self->{"id"} = $1;

     # the first DEFINITION line starts the description
     if ( m/DEFINITION\s*(.+)/ ) {
       $self->{"desc"} = $1;
       $defstart = 1;
       next;
     }
     # indented continuation lines extend it; anything else ends it
     if ( $defstart ) {
       if ( m/^ {11}( .+)/ ) { $self->{"desc"} .= $1; next; }
       $defstart = 0;
     }

     m/^ORIGIN/ and do { $seqstart = 1; next; };
     m!^//!     and $seqstart = 0;
     $seqstart  and do { s/[\s\d]//g; $self->{"seq"} .= $_; };
   }

   return 1;
 }

parse_fasta

 Title     : parse_fasta
 Usage     : parse_fasta;
 Function  : parses $ent into the "seq" field, using Fasta
           : file format.
           :
 To-do     : use benchmark module to find best/fastest parse
           : method
           :
 Example   : $self->parse_fasta;
 Returns   : n/a
 Argument  : n/a
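
As a rough illustration of what Fasta parsing involves, the sketch below splits a single Fasta entry into id, description and sequence. It is not the module's implementation: parse_fasta_entry() is a hypothetical standalone helper, and it assumes exactly one record whose first line starts with ">".

```perl
use strict;
use warnings;

# Split one Fasta entry into (id, description, sequence).
# Returns an empty list if the entry has no ">" header line.
sub parse_fasta_entry {
    my ($ent) = @_;
    my ($header, @lines) = split /\n/, $ent;
    return unless defined $header;
    my ($id, $desc) = $header =~ /^>\s*(\S+)\s*(.*)$/
        or return;
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;    # drop any embedded whitespace
    return ($id, $desc, $seq);
}

my ($id, $desc, $seq) = parse_fasta_entry(">sample test entry\nACTG\nTGGC\n");
print "$id | $desc | $seq\n";
```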

parse_gcg

 Title    : parse_gcg
 Usage    : used by internal code
 Function : Parses the sequence out of a gcg-format string and
          : sets the object sequence field accordingly. This is
          : a simple, inefficient method for grabbing JUST the
          : sequence.
          :
 To-do    : - parse out more info than just sequence 
          : - implement alphabet checking
          : - better regular expressions/efficiency
          : - carp on unexpected / wrong-format situations
          :
 Version  : .01 / 16 Jan 1997 
 Returns  : 1
 Argument : gcg-formatted sequence string

## METHODS FOR FILE FORMAT AND OUTPUT ##

layout

 Title     : layout()
 Usage     : layout([$format]);
 Function  : Returns the sequence in whichever format the user specifies,
             or in the "ffmt" field if the user does not specify a format.
 Example   : $fastaFormattedSeq = $myObj->layout("Fasta");
 Returns   : varies
 Argument  : $format (one of the formats as defined in $SeqForm).
           : SAC: case of $ffmt argument does not matter.

out_raw

 Title     : out_raw
 Usage     : out_raw;
 Function  : Returns the sequence in Raw format.
 Example   : $self->out_raw;
 Returns   : string sequence, in raw format
 Argument  : n/a

out_fasta

 Title     : out_fasta
 Usage     : out_fasta;
 Function  : Returns the sequence as a string in FASTA format.
 Example   : $self->out_fasta;
           :
 To-do     : benchmark code / find fastest method
           :
 Returns   : string sequence in Fasta format
 Argument  : n/a
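
A minimal sketch of Fasta output, for orientation only. format_fasta() is a hypothetical helper rather than the method itself, and the default line width of 60 residues is an assumption.

```perl
use strict;
use warnings;

# Format an id, description and sequence as a Fasta string.
# The default width of 60 residues per line is an assumption.
sub format_fasta {
    my ($id, $desc, $seq, $width) = @_;
    $width ||= 60;
    my $out = ">$id";
    $out .= " $desc" if defined $desc && length $desc;
    $out .= "\n";
    # Emit the sequence in fixed-width lines.
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        $out .= substr($seq, $pos, $width) . "\n";
    }
    return $out;
}

print format_fasta('sample', 'Sample Bio::Seq sequence', 'ACTGTGGCGTCAACTG', 10);
```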

alphabet_ok

 Title     : alphabet_ok
 Usage     : $myseq->alphabet_ok;
 Function  : Checks the sequence for presence of any characters
           : that are not considered valid members of the genetic
           : alphabet. In addition to the standard genetic alphabet
           : (see documentation), "?" and "-" characters are
           :  considered valid.
           :
 Example   : if($myseq->alphabet_ok) { print "OK!!\n"; }
           :     else { print "Not OK! \n"; }
           :
 Note      : Does not handle '\' characters in sequence robustly
           :
 Returns   : 1 if OK / 0 if not OK
 Argument  : none
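
The check described above could be sketched as a single character-class match. This is an illustration under stated assumptions: seq_alphabet_ok() is a hypothetical helper, the alphabet is passed in as a plain string, and matching is made case-insensitive.

```perl
use strict;
use warnings;

# Return 1 if every character of $seq is in $alphabet, or is "?" or "-";
# return 0 otherwise. Matching ignores case (an assumption).
sub seq_alphabet_ok {
    my ($seq, $alphabet) = @_;
    my $class = quotemeta($alphabet . '?-');
    return $seq =~ /\A[$class]*\z/i ? 1 : 0;
}

print seq_alphabet_ok('ACTG-TG?N', 'ACGTN') ? "OK\n" : "Not OK\n";
```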

alphabet

 Title     : alphabet
 Usage     : $myseq->alphabet;
 Function  : Returns the characters in the alphabet in use for the sequence.
 Example   : print "Alphabet: ".$myseq->alphabet;
 Returns   : string containing alphabet characters
 Argument  : none

GCG_checksum

 Title     : GCG_checksum
 Usage     : $myseq->GCG_checksum;
 Function  : returns a gcg checksum for the sequence
 Example   : 
 Returns   : 
 Argument  : none
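
The GCG checksum is commonly described as a position-weighted sum of character codes, with the weight cycling from 1 to 57, taken modulo 10000. The sketch below follows that description; whether this method matches it exactly is an assumption, gcg_checksum() is a hypothetical standalone helper, and upper-casing the sequence first is also an assumption.

```perl
use strict;
use warnings;

# Position-weighted checksum: weights run 1..57 and then repeat.
sub gcg_checksum {
    my ($seq) = @_;
    my ($index, $sum) = (0, 0);
    for my $char (split //, uc $seq) {
        $index++;
        $sum += $index * ord($char);
        $index = 0 if $index == 57;
    }
    return $sum % 10000;
}

print gcg_checksum('ACTGTGGCGTCAACTG'), "\n";
```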

trunc

 Title     : trunc
 Usage     : $trunc_seq = $mySeq->trunc(12,20);
 Function  : Returns a truncated part of the sequence. The truncation
             is done by the ->str() call, so this is just a convenience
             call for this object.

 Returns   : Bio::Seq object ref.
 Argument  : start point, end point in biological coordinates
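
To illustrate the coordinate handling, the sketch below extracts a subsequence from a plain string. It assumes that "biological coordinates" means 1-based positions with an inclusive end point; subseq() is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Return residues $start..$end of $seq (1-based, inclusive).
sub subseq {
    my ($seq, $start, $end) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length($seq);
    # substr() is 0-based, so shift the start down by one.
    return substr($seq, $start - 1, $end - $start + 1);
}

print subseq('ACTGTGGCGTCAACTGAAAACC', 12, 20), "\n";
```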

copy

 Title     : copy
 Usage     : $copyOfObj = $mySeq->copy;
 Function  : Returns an identical copy of the object.
 Example   :
 Returns   : Bio::Seq object ref.
 Argument  : n/a

revcom

 Title       : revcom
 Usage       : $reverse_complemented_seq = $mySeq->revcom;
 Function    : Returns a Bio::Seq object with the reverse
             : complement of a nucleotide object sequence
 Example     : $reverse_complemented_seq = $mySeq->revcom;
 Source      : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
             : library of molbio perl routines
 Note        :
             : The letter codes and complement translations
             : are those proposed by IUB (Nomenclature Committee,
             : 1985, Eur. J. Biochem. 150; 1-5) and are also
             : used by the GCG package. The IUB/GCG letter codes
             : for nucleotide ambiguity are compatible with
             : EMBL, GenBank and PIR database formats but are
             : *NOT* compatible with Staden/Sanger ambiguity
             : symbols. Staden/Sanger use different symbols to
             : represent uncertainty and frame ambiguity.
             :
             : Currently Staden/Sanger are not recognized
             : sequence types.
             :
             : GCG Documentation on sequence symbols:
 URL         : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
             :
 Translation :
             : GCG/IUB    Meaning        Complement
             : ------------------------------------
             :  A            A                T
             :  C            C                G
             :  G            G                C
             :  T            T                A
             :  U            U                A
             :  M          A or C             K
             :  R          A or G             Y
             :  W          A or T             W
             :  S          C or G             S
             :  Y          C or T             R
             :  K          G or T             M
             :  V        A or C or G          B
             :  H        A or C or T          D
             :  D        A or G or T          H
             :  B        C or G or T          V
             :  X      G or A or T or C       X
             :  N      G or A or T or C       N
             :--------------------------------------
 Revision    : 0.01 / 3 Jun 1997
 Returns     : A new sequence object
               to get the actual sequence go
               $actual_reversed_sequence = $seq->revcom()->str()
 Argument    : n/a
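
The translation table above maps directly onto Perl's tr/// operator. The sketch below works on plain strings rather than Bio::Seq objects; complement_str() and revcom_str() are hypothetical helper names, and preserving the case of the input is an assumption.

```perl
use strict;
use warnings;

# Complement each IUB/GCG symbol according to the table above,
# leaving the order of the residues unchanged.
sub complement_str {
    my ($seq) = @_;
    (my $comp = $seq) =~
        tr/ACGTUMRWSYKVHDBXNacgtumrwsykvhdbxn/TGCAAKYWSRMBDHVXNtgcaakywsrmbdhvxn/;
    return $comp;
}

# Reverse complement: complement, then reverse the string.
sub revcom_str {
    my ($seq) = @_;
    return scalar reverse complement_str($seq);
}

print revcom_str('AAGC'), "\n";
```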

complement

 Title       : complement
 Usage       : $complemented_seq = $mySeq->complement;
 Function    : Returns a char string containing 
             : the complementary sequence (eg; other strand)
             : of the original sequence. The translation method
             : is identical to revcom() but the nucleotide order
             : is not reversed. 
             :
             : To be honest *most* of the time you will want
             : to use revcom not this. Be careful!
             :
 Example     :  $complemented_seq = $mySeq->complement;
             :
 Source      : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
             : library of molbio perl routines
 Note        :
             : The letter codes and complement translations
             : are those proposed by IUB (Nomenclature Committee,
             : 1985, Eur. J. Biochem. 150; 1-5) and are also
             : used by the GCG package. The IUB/GCG letter codes
             : for nucleotide ambiguity are compatible with
             : EMBL, GenBank and PIR database formats but are
             : *NOT* compatible with Staden/Sanger ambiguity
             : symbols. Staden/Sanger use different symbols to
             : represent uncertainty and frame ambiguity.
             :
             : Currently Staden/Sanger are not recognized
             : sequence types.
             :
             : GCG Documentation on sequence symbols:
 URL         : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
             :
 Translation :
             : GCG/IUB    Meaning        Complement
             : ------------------------------------
             :  A            A                T
             :  C            C                G
             :  G            G                C
             :  T            T                A
             :  U            U                A
             :  M          A or C             K
             :  R          A or G             Y
             :  W          A or T             W
             :  S          C or G             S
             :  Y          C or T             R
             :  K          G or T             M
             :  V        A or C or G          B
             :  H        A or C or T          D
             :  D        A or G or T          H
             :  B        C or G or T          V
             :  X      G or A or T or C       X
             :  N      G or A or T or C       N
             :--------------------------------------
             :
 Revision    : 0.01 / 6 Dec 1996
 Returns     : char string
 Argument    : n/a

reverse

 Title     : reverse
 Usage     : $reversed_seq = $mySeq->reverse;
 Function  : Returns a char string containing the
           : reverse of the object sequence
           :
           : Does *NOT* complement it. If you want
           : the other strand, use $mySeq->revcom()
           : 
 Example   :  $reversed_seq = $mySeq->reverse;
           :
 Revision  : 0.01 / 6 Dec 1996
 Returns   : char string
 Argument  : n/a

Dna_to_Rna

 Title     : Dna_to_Rna
 Usage     : $translated_seq = $mySeq->Dna_to_Rna;
 Function  : Returns a char string containing the
           : Rna translation of the Dna nucleotide sequence
           : (Replaces T with U)
           : 
 Example   : $translated_seq = $mySeq->Dna_to_Rna;
           :
 Source    : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
           : library of molbio perl routines
           :
 Revision  : 0.01 / 6 Dec 1996
 Returns   : char string
 Argument  : n/a

Rna_to_Dna

 Title     : Rna_to_Dna
 Usage     : $translated_seq = $mySeq->Rna_to_Dna;
 Function  : Returns a char string containing the
           : Dna translation of the Rna nucleotide sequence
           : (Replaces U with T)
           : 
 Example   : $translated_seq = $mySeq->Rna_to_Dna;
           :
 Revision  : 0.01 / 16 MAR 1997
 Returns   : char string
 Argument  : n/a
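
Both conversions amount to a single character substitution. A minimal sketch on plain strings follows; dna_to_rna() and rna_to_dna() are hypothetical helpers, and preserving the case of the input is an assumption.

```perl
use strict;
use warnings;

# Replace T with U (and t with u).
sub dna_to_rna {
    my ($seq) = @_;
    (my $rna = $seq) =~ tr/Tt/Uu/;
    return $rna;
}

# Replace U with T (and u with t).
sub rna_to_dna {
    my ($seq) = @_;
    (my $dna = $seq) =~ tr/Uu/Tt/;
    return $dna;
}

print dna_to_rna('GATTACA'), "\n";
```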

translate

 Title     : translate
 Usage     : $translation = $mySeq->translate;
 Function  : Returns a new Bio::Seq object with the protein
           : translation from this sequence
           :
           : "*" is the default symbol for a stop codon
           : "X" is the default symbol for an unknown codon
           :
 Example   : $translation = $mySeq->translate;
           :   -or- with user defined stop/unknown codon symbols:
           : $translation = $mySeq->translate($stop_symbol,$unknown_symbol);
           : 
 Source    : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
           : library of molbio perl routines
           :
 To-do     : - allow named parameters (just like new and out_GCG )
           : - allow "frame" parameter to pick translation frame
           :
 Revision  : 0.01 / 6 Dec 1996
 Returns   : new Sequence object. Its id is the original id.trans
 Argument  : n/a
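
To make the stop and unknown symbols concrete, here is a sketch of translation on a plain string. It is not the module's code: translate_str() is a hypothetical helper, it uses the standard genetic code, reads frame 1 only, and silently ignores a trailing partial codon (all assumptions).

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
my @BASES = qw(T C A G);
my $AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON;
my $n = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            $CODON{"$b1$b2$b3"} = substr($AMINO, $n++, 1);
        }
    }
}

# Translate frame 1 of a nucleotide string into a protein string.
sub translate_str {
    my ($nuc, $stop, $unknown) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    (my $seq = uc $nuc) =~ tr/U/T/;    # accept RNA input as well
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length($seq); $pos += 3) {
        my $aa = $CODON{ substr($seq, $pos, 3) };
        $protein .= !defined $aa ? $unknown
                  : $aa eq '*'   ? $stop
                  :                $aa;
    }
    return $protein;
}

print translate_str('ATGGCCTAA'), "\n";
```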

dump

 Title     : dump
 Usage     : @results = $mySeq->dump; -or- 
           : $results = $mySeq->dump;
           :
 Function  : Returns a formatted array or string (depending on how it
           : is invoked) containing the contents of a 
           : Bio::Seq object. Useful for debugging
           :
           : ***This is used by Chris Dagdigian for debugging ***
           : ***Probably should be removed before distribution***
           :
 Example   :  @results = $mySeq->dump;
           :  foreach(@results){print;}
           :     -or-
           :  print $myseq->dump;
           :
 Returns   : Array or string depending on value of wantarray
 Argument  : n/a
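
Returning an array or a string "depending on how it is invoked" relies on Perl's wantarray. A minimal sketch of the idiom, using a hypothetical dump_fields() helper that takes plain key/value pairs:

```perl
use strict;
use warnings;

# Return one "key: value" line per field: a list of lines in list
# context, or the lines joined into one string in scalar context.
sub dump_fields {
    my (%obj) = @_;
    my @lines = map { "$_: $obj{$_}\n" } sort keys %obj;
    return wantarray ? @lines : join('', @lines);
}

my @list = dump_fields(id => 'sample', seq => 'ACTG');
my $text = dump_fields(id => 'sample', seq => 'ACTG');
print scalar(@list), " lines\n", $text;
```

The same idiom applies to the out_* methods documented as returning a "string or list, depending on how it is invoked".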

out_bad

 Title     : out_bad()
 Usage     : out_bad;
 Function  : Throws a fatal error if we don't know the output format.
 Example   : $self->out_bad;
 Returns   : n/a
 Argument  : n/a

out_primer

 Title     : out_primer()
 Usage     : $formatted_seq = $myseq->out_primer;
           : @formatted_seq = $myseq->out_primer;
           :
           : print $myseq->out_primer(-id=>'New ID',
           :                          -header=>'This is my header');
           :
 Function  : outputs a sequence in primer format
           :
 Note      : Not a supported output type (cannot be invoked via layout).
           : Use at your own risk :)
           : 
 Example   : see usage
           :
 Revision  : 0.01 / 20 Dec 1996
 Returns   : string or list, depending on how it is invoked
 Argument  : named list parameters for "id" and "header" are allowed

out_pir

 Title     : out_pir()
 Usage     : $formatted_seq = $myseq->layout("PIR");
           : $formatted_seq = $myseq->out_pir;
           : @formatted_seq = $myseq->out_pir;
           :
           : print $myseq->out_pir(-title=>'New TITLE',
           :                       -entry=>'New ENTRY',
           :                       -acc=>'User defined accession',
           :                       -date=>'User defined date',
           :                       -reference=>'User defined ref info');
           :
 Function  : Returns a string or an array depending on how it
           : is invoked. Can be easily accessed via the layout()
           : method, or if more output control is desired it can
           : be called directly with the following named parameters:
           :
           :  -entry      PIR entry
           :  -title      PIR title
           :  -acc        user defined accession number
           :  -reference  user defined reference
           :  -date       user defined date/time info
           :
           : All named parameters will take precedence over any
           : default behavior. When there are no user arguments,
           : the default output is as follows:
           :
           : PIR 'ENTRY'     = sequence object "id" field
           : PIR 'TITLE'     = sequence object "desc" field
           : PIR 'DATE'      = current date/time
           : PIR 'ACC'       = not used in default output
           : PIR 'REFERENCE' = not used in default output
           :
 Note      : Not tested stringently.
           :
 WARNING   : Does not deal with numbering issue
           :
 To-do     : - Allow user to pass in hash of additional fields/values
           : - Deal with numbering issue
           :
 Example   : see usage
           :
 Revision  : 0.02 / 12 Jan 1997
 Returns   : string or list, depending on how it is invoked
 Argument  : named list parameters are allowed, see above

out_genbank

 Title     : out_genbank()
 Usage     : $formatted_seq = $myseq->out_genbank;
           : @formatted_seq = $myseq->out_genbank;
           : print $myseq->out_genbank(-id=>'New ID',
           :                           -def=>'User defined definition',
           :                           -acc=>'User defined accession',
           :                           -origin=>'User defined origin info',
           :                           -spacing=>'single',
           :                           -caps=>'up',
           :                           -date=>'DATE GOES HERE',
           :                           -type=>'mRna');
           :   
 Function  : Returns a GenBank formatted sequence array or string
           : depending on the value of wantarray when invoked via layout(). 
           : If more control is desired over output format, out_genbank() 
           : can be addressed directly with the following named parameters:
           :
           : def          - Sequence definition information
           : acc          - Sequence accession number
           : origin       - Sequence origin information
           : id           - short name 
           : date         - new date info
           : type         - sequence type (Dna, mRna, Amino, etc.)
           : spacing      - "single" or "double" sequence line spacing
           : caps         - "up" or "down" sequence capitalization
           :
           : When invoked via layout() or called directly with no 
           : arguments, the following default behaviours apply:
           :  DATE = Current date and time
           :  DEFINITION = object's description field
           :  ID = object's ID field
           :  SPACING = single
           :
           : All named parameters must be strings. Passed in parameters will
           : always take precedence over any fields with default settings.
           :
 Note      : Format not stringently tested for accuracy. Sequence is numbered
           : according to the integer specified in the object 'start' field
           : but the implementation has not been robustly tested.
           :
 To-do     : - allow user hash reference for additional format fields
           :
 Example   : see usage
           :
 Revision  : 0.02 / 12 Jan 1997
 Returns   : string or list, depending on how it is invoked
 Argument  : named list parameters are allowed, see above
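
The rule that passed-in parameters take precedence over defaults can be sketched as a simple hash merge. merge_params() is a hypothetical helper; stripping the leading "-" and lower-casing the parameter name are assumptions about how names are normalized.

```perl
use strict;
use warnings;

# Overlay "-name => value" pairs on top of a hash of defaults.
sub merge_params {
    my ($defaults, @args) = @_;
    my %param = %$defaults;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;
        $param{$name} = $value;    # user value wins over the default
    }
    return %param;
}

my %p = merge_params(
    { id => 'sample', spacing => 'single', date => scalar localtime },
    -id   => 'New ID',
    -caps => 'up',
);
print "$_ = $p{$_}\n" for sort keys %p;
```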

out_GCG

 Title    : out_GCG
 Usage    : $formatted_seq = $mySeq->layout("GCG"); 
          : @formatted_seq = $mySeq->layout("GCG");
          : 
          : print $myseq->out_GCG(-id=>'New ID',
          :                      -spacing=>'single',
          :                      -caps=>'up',
          :                      -date=>'DATE GOES HERE',
          :                      -header=>'This is a user submitted header',
          :                      -type=>'n');
          :   
 Function : Returns a GCG formatted sequence array or string
          : depending on the value of wantarray when invoked via layout(). 
          : If more control is desired over output format, out_GCG() 
          : can be addressed directly with the following named parameters:
          :
          : header       - first line(s) of formatted sequence
          : id           - short name that appears before 'Length:' field
          : date         - overwrite default date info
          : type         - can be "N" or "P", for nucleotide/protein
          : spacing      - "single" or "double" sequence line spacing
          : caps         - "up" or "down" sequence capitalization
          :
          : When invoked via layout() or called directly with no 
          : arguments, the following default behaviours apply:
          :  DATE = Current date and time
          :  DEFINITION = object's description field
          :  ID = object's ID field
          :  SPACING = single
          :         
          : All named parameters must be strings. Passed in parameters will
          : always take precedence over any fields with default settings.
          :
 Example  :  
 Output   :
          :Sample Bio::Seq sequence
          : sample Length: 240  Wed Nov 27 13:24:28 EST 1996  Type: N Check: 5371  ..
          :
          :       1  aaaacctatg gggtgggctc tcaagctgag accctgtgtg cacagccctc
          :      51  tggctggtgg cagtggagac gggatnnnat gacaagcctg ggggacatga
          :     101  ccccagagaa ggaacgggaa caggatgagt gagaggaggt tctaaattat
          :     151  ccattagcac aggctgccag tggtccttgc ataaatgtat agagcacaca
          :     201  ggtgggggga aagggagaga gagaagaagc cagggtataa
          :
          :
 Note     : GCG formatted sequences contain a "Type:" field.
          : If Type cannot be internally determined and no
          : Type name-parameter is passed in then the Type: 
          : field is not printed.
          :
 Warning  : Unconventional numbering offsets may not
          : be robustly handled
          :
 Revision : 0.06 / 12 Jan 1997
 Source   : Found guts of this code on bionet.gcg, unknown author
 Returns  : Array or String
 Argument : n/a
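
The numbered sequence lines in the sample output above can be reproduced with a short loop. This sketch covers only the sequence body, not the header line; gcg_lines() is a hypothetical helper, and the column widths (an 8-character position field, two spaces, five blocks of ten residues) are read off the sample rather than taken from a format specification.

```perl
use strict;
use warnings;

# Lay out a sequence 50 residues per line, in blocks of 10,
# each line prefixed with the 1-based position of its first residue.
sub gcg_lines {
    my ($seq) = @_;
    my @lines;
    for (my $pos = 0; $pos < length($seq); $pos += 50) {
        my @blocks = unpack '(A10)*', substr($seq, $pos, 50);
        push @lines, sprintf("%8d  %s", $pos + 1, join(' ', @blocks));
    }
    return @lines;
}

print "$_\n" for gcg_lines('aaaacctatggggtgggctctcaagctgagaccctgtgtgcacagccctctggctggtgg');
```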

out_nbrf

 Title     : out_nbrf()
 Usage     : $self->layout("NBRF") or $self->out_nbrf
           :
 Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
           :
           : If the ReadSeq wrapper Parse.pm appears
           : to be configured properly it is used
           : to generate the output. 
           :
           : If Parse.pm cannot be used then this code
           : carps out with an error message.
           :
 To-do     : write internal output code
           :
 Version   : 1.0 /  16 MAR 1997
 Example   : see Usage
 Returns   : FORMATTED STRING (wantarray is not used here!)
 Argument  : 

out_gcgseq

 Title     : out_gcgseq
 Usage     : out_gcgseq;
 Function  : Returns the sequence as a string in GCG_SEQ format.
 Example   : $self->out_gcgseq;
           :
 Returns   : string sequence in GCG_SEQ format
 Argument  : n/a
 Comments  : SAC: Derived from out_fasta().
           : GCG_SEQ is a format that looks a lot like Fasta and is used
           : for building GCG sequence datasets (.seq files).
           : It also has some similarities to NBRF format.

out_gcgref

 Title     : out_gcgref
 Usage     : out_gcgref;
 Function  : Returns the sequence as a string in GCG_REF format.
 Example   : $self->out_gcgref;
           :
 Returns   : string sequence in GCG_REF format
 Argument  : n/a
 Comments  : SAC: Derived from out_gcgseq().
           : GCG_REF is a companion format for GCG_SEQ that is used
           : for building GCG sequence datasets (.ref files).
           : The .ref file is identical to .seq file but without the sequence.

out_ig

 Title     : out_ig()
 Usage     : $self->layout("IG") or $self->out_ig
           :
 Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
           :
           : If the ReadSeq wrapper Parse.pm appears
           : to be configured properly it is used
           : to generate the output. 
           :
           : If Parse.pm cannot be used then this code
           : carps out with an error message.
           :
 To-do     : write internal output code
           :
 Version   : 1.0 /  16 MAR 1997
 Example   : see Usage
 Returns   : FORMATTED STRING (wantarray is not used here!)
 Argument  : 

out_strider

 Title     : out_strider()
 Usage     : $self->layout("Strider") or $self->out_strider
           :
 Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
           :
           : If the ReadSeq wrapper Parse.pm appears
           : to be configured properly it is used
           : to generate the output. 
           :
           : If Parse.pm cannot be used then this code
           : carps out with an error message.
           :
 To-do     : write internal output code
           :
 Version   : 1.0 /  16 MAR 1997
 Example   : see Usage
 Returns   : FORMATTED STRING (wantarray is not used here!)
 Argument  : 

out_zuker

 Title     : out_zuker()
 Usage     : $self->layout("Zuker") or $self->out_zuker
           :
 Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
           :
           : If the ReadSeq wrapper Parse.pm appears
           : to be configured properly it is used
           : to generate the output. 
           :
           : If Parse.pm cannot be used then this code
           : carps out with an error message.
           :
 To-do     : write internal output code
           :
 Version   : 1.0 /  16 MAR 1997
 Example   : see Usage
 Returns   : FORMATTED STRING (wantarray is not used here!)
 Argument  : 

out_msf

 Title     : out_msf()
 Usage     : $self->layout("MSF") or $self->out_msf
           :
 Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
           :
           : If the ReadSeq wrapper Parse.pm appears
           : to be configured properly it is used
           : to generate the output. 
           :
           : If Parse.pm cannot be used then this code
           : carps out with an error message.
           :
 To-do     : write internal output code
           :
 Version   : 1.0 /  16 MAR 1997
 Example   : see Usage
 Returns   : FORMATTED STRING (wantarray is not used here!)
 Argument  : 

parse_unknown

 Title     : parse_unknown
 Usage     : parse_unknown($ent);
 Function  : tries to figure out the format of $ent and then
           : calls the appropriate function to parse it into $self->{"seq"}.
 Example   : $self->parse_unknown;
 Returns   : n/a
 Argument  : $ent : the rough multi-line string to be parsed
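
How the format is actually guessed is not documented here. Purely as an illustration, a guesser could test a few tell-tale patterns in order; guess_format() and every heuristic in it are hypothetical, not the module's rules.

```perl
use strict;
use warnings;

# Guess a format name from the text of an entry (illustrative heuristics).
sub guess_format {
    my ($ent) = @_;
    return 'FASTA'   if $ent =~ /^\s*>/;           # ">" header line
    return 'GENBANK' if $ent =~ /^LOCUS\s/m;       # LOCUS record line
    return 'GCG'     if $ent =~ /\.\.\s*$/m;       # header line ending in ".."
    return 'RAW'     if $ent =~ /^[A-Za-z\s\d*?-]+$/;
    return 'UNKNOWN';
}

print guess_format(">sample\nACTG\n"), "\n";
```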

parse_bad

 Title     : parse_bad
 Usage     : parse_bad;
 Function  : complains of un-parsable sequence, last-ditch attempt via
           : Parse.pm if sequence is being read from a file.
           :
 Example   : $self->parse_bad;
 Returns   : n/a
 Argument  : n/a

version

 Title     : version()
 Usage     : $myseq->version;
 Function  : prints Bio::Seq current version number

Bio::Seq Guts

Sequence Object

 The sequence object is merely a reference to a hash containing
 all or some of the following fields...

 Field         Value
 --------------------------------------------------------------
 seq           the sequence
 
 id            a short identifier for the sequence
 
 desc          a description of the sequence, in descffmt file-format
 
 names         a hash of identifiers that relate to the sequence..
               these could be database IDs, accession numbers, URLs,
               pathnames, etc. Currently there is no set format
               for the names hash and no formal definition of databases 
               or names
 
 start         start in bio-coords of the first residue of the sequence

 end           end in bio-coords of the last residue of the sequence
 
 type          the sequence type. Is actually a 2 value list of format
               ["monomer","origin"] where monomer is one of the
               recognized sequence types and origin is a string
               description of the sequence's origin (mitochondrial, etc)
 
 ffmt          file-format for the sequence
 
 descffmt      file-format of the description string
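
For orientation, the fields listed above might look like this as a plain Perl hash reference. The values are illustrative only (the accession and the descffmt value are placeholders), and a real object would also be blessed into the Bio::Seq package.

```perl
use strict;
use warnings;

# A plain hash reference with the documented fields (values are made up).
my $seq = {
    seq      => 'ACTGTGGCGTCAACTG',
    id       => 'sample',
    desc     => 'Sample Bio::Seq sequence',
    names    => { accession => 'PLACEHOLDER' },    # no set format
    start    => 1,
    end      => 16,
    type     => [ 'dna', 'mitochondrial' ],        # [monomer, origin]
    ffmt     => 'Fasta',
    descffmt => 'Raw',                             # placeholder value
};

my $length = $seq->{end} - $seq->{start} + 1;
print "$seq->{id}: $seq->{type}[0], length $length\n";
```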