
NAME

Seq::Parse - The Bioperl ReadSeq interface

SYNOPSIS

Simple Perl interface/wrapper to D.G. Gilbert's ReadSeq program. Formerly used by Seq.pm when its internal parsing/formatting code failed.

**NOTE** Not currently used by any of the core bioperl modules. It can be used as a standalone interface to the readseq package, but manual editing of Parse.pm is required. See the first few lines of the .pm file for details.

DESCRIPTION

This package was called upon by Seq.pm when its internal attempts to format or parse a sequence failed. It is currently not used by any bioperl core module, because we decided to deal with sequence formatting in a different way.

Parse.pm is a simple interface to D.G. Gilbert's ReadSeq program. It is not meant to be particularly elegant or efficient. The interface should be abstract enough to allow future versions to seamlessly access other sequence conversion programs besides ReadSeq.

At this time the interface methods have not been fully thought out or implemented. Suggestions are welcome.

If ReadSeq is not on the local system, or this package is not properly configured, Seq.pm will (hopefully) realize this and not attempt to use this code.

USAGE

The ReadSeq executable needs to be installed on your system.

Readseq is freely distributed and is available in shell archive (.shar) form via FTP from ftp.bio.indiana.edu (129.79.224.25) in the molbio/readseq directory. (URL) ftp://ftp.bio.indiana.edu/molbio/readseq/

Standalone

 use Parse;

With Seq.pm

If properly configured, Seq.pm will automatically use this module when its internal attempts at parsing or formatting fail.

The correct path to the readseq executable is configured into this module during the 'perl Makefile.PL' phase of installation.

Manual edits needed in Parse.pm if auto-configuration does not happen:

- Change the value of $READSEQ_PATH so that it defines a path to the ReadSeq executable on your system.

- Uncomment the line(s) containing $OK = "Y"
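
As a rough illustration of the two manual edits, the relevant lines near the top of Parse.pm might end up looking something like the sketch below. Only the names $READSEQ_PATH and $OK come from this documentation; the path is a placeholder, the exact declarations in the real file may differ, and the final check is a suggested addition rather than part of the module.

```perl
# Hypothetical excerpt; the real declarations in Parse.pm may differ.
our $READSEQ_PATH = '/usr/local/bin/readseq';  # placeholder: adjust for your system
our $OK           = "Y";                       # uncommented to enable the module

# Optional sanity check (an assumption, not part of the original module).
warn "readseq not found or not executable at $READSEQ_PATH\n"
    unless -x $READSEQ_PATH;
```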

As a standalone module

Parse.pm should be usable as a standalone module. See the usage instructions.

Sequence Conversion/Formatting

ReadSeq has trouble with raw sequences so an explicit convert_from_raw() method has been written. The following code will return the sequence "GAATTCGTT" as a GCG formatted string.

 $reply  = &Parse::convert_from_raw(-sequence=>'GAATTCGTT',
                                    -fmt=>'gcg'); 

The "fmt" named-parameter field can be set for the following formats:

 IG        (or 'Stanford')
 GenBank   (or 'GB')
 NBRF
 EMBL
 GCG
 Strider
 Fitch
 Fasta
 Zuker
 Phylip3.2 (use 'Phylip3')
 Phylip
 Plain     (or 'Raw')
 PIR       (or 'CODATA')
 MSF
 ASN.1     (use 'ASN1')
 PAUP
 Pretty

The "options" named-parameter field can be used to pass switches directly to the ReadSeq executable. This option should only be used by people familiar with operating ReadSeq on the command-line. Use at your own risk as this has not been fully tested.

As an example, the ReadSeq switch -c will cause all of the characters in the formatted sequence to be returned in lowercase.

 $reply  = &Parse::convert_from_raw(-sequence=>"$seq_string",
                                    -options=>'-c', 
                                    -fmt=>'gcg'); 

Appendix

The following documentation describes the various functions contained in this package. Some functions are for internal use and are not meant to be called by the user; they are preceded by an underscore ("_").

## Internal methods ##

_rearrange()

 Title     : _rearrange
 Usage     : n/a (internal function)
 Function  : Rearranges named parameters to requested order.
 Example   : &_rearrange([SEQUENCE,ID,DESC],@p);
 Returns   : @params - an array of parameters in the requested order.
 Argument  : $order : a reference to an array which describes the desired
                      order of the named parameters.
             @param : an array of parameters, either as a list (in
                      which case the function simply returns the list),
                      or as an associative array (in which case the
                      function sorts the values according to @{$order}
                      and returns that new array).
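
The behaviour described for _rearrange() can be sketched roughly as follows. This is an illustrative re-implementation under stated assumptions, not the module's actual code: the name rearrange_sketch is hypothetical, and treating a leading "-" on the first argument as the sign of a named-parameter call is an assumption borrowed from common Perl practice.

```perl
use strict;
use warnings;

# Illustrative re-implementation; NOT the module's actual _rearrange().
sub rearrange_sketch {
    my ($order, @param) = @_;

    # Positional call: the first argument does not look like "-name",
    # so hand the list back unchanged.
    return @param unless @param && defined $param[0] && $param[0] =~ /^-/;

    # Named call: normalise keys ("-fmt" becomes "FMT") into a hash.
    my %named;
    while (@param) {
        my $key   = shift @param;
        my $value = shift @param;
        $key =~ s/^-//;
        $named{ uc $key } = $value;
    }

    # Return the values in the requested order; missing names yield undef.
    return map { $named{ uc $_ } } @$order;
}

my ($seq, $id, $desc) = rearrange_sketch(
    [qw(SEQUENCE ID DESC)],
    -id => 'test1', -sequence => 'GAATTCGTT',
);
print "$id: $seq\n";   # prints "test1: GAATTCGTT"
```

Because the caller receives values in a fixed order, each public function can unpack its arguments with a single list assignment regardless of how the user ordered the named parameters.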

_write_tmp_file()

 Title     : _write_tmp_file
 Usage     : n/a (internal function)
 Function  : Writes a temporary file to disk. Uses
           : the POSIX tmpnam() call to get path &
           : filename. Should be more portable than
           : just writing to /tmp. 
           :
 Example   : &_write_tmp_file("$formatted_sequence");
 Returns   : string containing the temp file path 
 Argument  : string that is to be written to disk
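
Note that tmpnam() was removed from the POSIX module in Perl 5.26, so code relying on it will not run on recent interpreters. A minimal sketch of the same idea using the core File::Temp module is shown below. It is not the module's actual code: the function name write_tmp_file_sketch and the "seqparse_XXXXXX" template are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative only; NOT the module's actual _write_tmp_file().
sub write_tmp_file_sketch {
    my ($content) = @_;

    # tempfile() creates and opens the file in one step; UNLINK => 0
    # leaves it on disk so that the caller decides when to delete it.
    my ($fh, $path) = tempfile('seqparse_XXXXXX', TMPDIR => 1, UNLINK => 0);
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";

    return $path;
}

my $path = write_tmp_file_sketch(">tmp\nGAATTCGTT\n");
print "wrote $path\n";
unlink $path;    # clean up once the file is no longer needed
```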
 

version()

 Title     : version
 Usage     : &Parse::version;
 Function  : Prints current package version 
 Example   : &Parse::version;
 Returns   : none
 Argument  : none
           :

convert_from_raw()

 Title     : convert_from_raw()
 Usage     : print &Parse::convert_from_raw(-sequence=>$raw_seq,
           :                                -fmt=>'asn1');
           :
           : $reply  = &Parse::convert_from_raw(-sequence=>'GAATTCGTT',
           :                                    -options=>'-c',
           :                                    -fmt=>'gcg'); 
           :
 Function  : ReadSeq does not function well when called upon 
           : to read or convert "raw" or unformatted sequence 
           : strings or files. This code will take a given 
           : raw sequence and manipulate it into FASTA
           : format before invoking ReadSeq.
           :
           : The following named parameters may be used as
           : arguments:
           :
           :  -sequence=>     Sequence string.
           :  -fmt=>          Format sequence will be converted to. 
           :  -options=>      String containing command-line
           :                  switches for ReadSeq. Passed
           :                  directly.
           :
 Example   : see usage
 Returns   : Formatted sequence string 
 Argument  : named parameters, see function
           :
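
The raw-to-FASTA step described above might look roughly like the following sketch. It is an assumption-laden illustration rather than the module's code: the function name raw_to_fasta_sketch, the default header text "raw_sequence", and the 60-column line width are all hypothetical choices.

```perl
use strict;
use warnings;

# Illustrative only; NOT the module's actual code. The header text
# and the 60-column line width are assumptions.
sub raw_to_fasta_sketch {
    my ($raw, $id) = @_;
    $id = 'raw_sequence' unless defined $id;

    (my $seq = $raw) =~ s/\s+//g;          # drop whitespace and newlines
    my @lines = $seq =~ /(.{1,60})/g;      # wrap at 60 residues per line

    return join "\n", ">$id", @lines, '';  # trailing '' adds a final newline
}

print raw_to_fasta_sketch('GAATTCGTT');
# prints:
# >raw_sequence
# GAATTCGTT
```

Giving ReadSeq a header line presumably lets it recognise the input format instead of having to guess from a bare string of residues.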

convert()

 Title     : convert
           :
 Usage     : print &Parse::convert(-sequence=>$raw_seq,
           :                       -fmt=>'asn1');
           :
           : $reply  = &Parse::convert(-sequence=>'GAATTCGTT',
           :                           -options=>'-c',
           :                           -fmt=>'gcg'); 
           :
           : $reply  = &Parse::convert(-location=>'/tmp/a.seq',
           :                           -fmt=>'raw'); 
           :
 Note      : ReadSeq does not function well when called upon 
           : to read or convert "raw" or unformatted sequence 
           : strings or files. User beware.
           : 
 Function  : Will read/parse a given sequence string *OR* a given
           : sequence file.
           :
           : If a sequence string AND a sequence file path are
           : both passed in, the file path will be used with no
           : complaint.
           :
           : The following named parameters may be used as
           : arguments:
           : 
           :  -sequence=>     Sequence string.
           :  -location=>     Sequence file path.
           :  -fmt=>          Format sequence will be converted to. 
           :  -options=>      String containing command-line
           :                  switches for ReadSeq. Passed
           :                  directly.
           :
 Example   : see usage
 Returns   : Formatted sequence string 
 Argument  : named parameters, see function
           :
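
The precedence rule for -location over -sequence can be sketched as below. This is only an illustration of the documented behaviour, not the module's implementation: the name choose_input_sketch is hypothetical, and dying when neither parameter is supplied is an assumption, since the documentation does not describe that case.

```perl
use strict;
use warnings;

# Illustrative only; NOT the module's actual convert() logic.
sub choose_input_sketch {
    my (%arg) = @_;

    # A file path wins silently when both parameters are supplied.
    return ( 'file',   $arg{-location} ) if defined $arg{-location};
    return ( 'string', $arg{-sequence} ) if defined $arg{-sequence};

    # Assumption: the documentation does not say what happens here.
    die "Either -sequence or -location is required\n";
}

my ($kind, $value) = choose_input_sketch(
    -sequence => 'GAATTCGTT',
    -location => '/tmp/a.seq',
);
print "$kind: $value\n";   # prints "file: /tmp/a.seq"
```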

ACKNOWLEDGEMENTS

SEE ALSO

 Core bioperl modules

REFERENCES

Bioperl Project http://bio.perl.org

COPYRIGHT

Copyright (c) 1997-1998 Chris Dagdigian, Georg Fuellen, Steven E. Brenner and others. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.