NAME

FASTAParse - A lightweight parsing module for handling FASTA-formatted sequences within larger Perl applications.

VERSION

This document describes FASTAParse version 0.0.2

SYNOPSIS

    # Manually creating a FASTA object:
    use FASTAParse;
    my $fasta = FASTAParse->new();
    $fasta->format_FASTA(
                         id          => 'example_0.0.1',
                         sequence    => 'ACGTCTCTCTCGAGAGGAGAGCTTCTCTCTAGGAGAG',
                         descriptors => ['Fake example sequence.', 'nucleotide'],
                         comments    => ['sequence is for illustration only'],
                         );
    $fasta->print();

    # Loading a FASTA object from a block of captured text:
    use FASTAParse;
    my $text = "
    >gi|55416189|gb|AAV50056.1| NADH dehydrogenase subunit 1 [Dasyurus hallucatus]   
    ;Taken from nr GenBank                                                           
    MFTINLLIYIIPILLAVAFLTLIERKMLGYMQFRKGPNIVGPYGLLQPFADAVKLFTKEPLRPLTSSISIFIIAPILALT 
    IALTIWTPLPMPNTLLDLNLGLIFILSLSGLSVYSILWSGWASNSKYALIGALRAVAQTISYEVSLAIILLSIMLINGSF 
    TLKTLSITQENLWLIITTWPLAMMWYISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFAMFFLAEYANIIAMNAITTI 
    LFLGPSLTPNLSHLNTLSFMLKTLLLTMVFLWVRASYPRFRYDQLMHLLWKNFLPMTLAM
    ";
    my $fasta = FASTAParse->new();
    $fasta->load_FASTA( fasta => $text );
    my $id          = $fasta->id();
    my $sequence    = $fasta->sequence(); # Flat sequence.
    my @descriptors = @{ $fasta->descriptors() };

DESCRIPTION

FASTAParse does one of two things: 1) loads a FASTA object from a chunk of text; 2) formats a FASTA object from explicit user input. See SYNOPSIS for example code for both. Once the object is populated, individual sections of the FASTA entry may be pulled from it. For further information on the FASTA format, please see:

 http://en.wikipedia.org/wiki/Fasta_format
 http://blast.wustl.edu/doc/FAQ-Indexing.html

INTERFACE

new

new: Class constructor for FASTA.

    use FASTAParse;
    my $fasta = FASTAParse->new();

load_FASTA

load_FASTA: Method to populate the FASTA class with information. The "fasta" attribute passed to this method should be a chunk of FASTA text for a single entry. The text should retain all of the FASTA formatting, including the > header tag, line returns, ^A separators, etc.

    $fasta->load_FASTA( fasta => $text );

format_FASTA

format_FASTA: Method to manually populate the FASTA class. Only ID and SEQUENCE are required. The SEQUENCE attribute should be a single, non-gapped line of text. The COLS attribute may be set to alter the column at which line-wraps occur; the default is 60, 0 indicates no wrapping, and >80 is not recommended as a general practice. The COMMENTS attribute supplies one or more comments to be placed after the header line, each distinguished by a semi-colon at the beginning of the line. Most databases and bioinformatics applications do not recognize these comments so their use is discouraged, but they are part of the official format.

    $fasta->format_FASTA(
                         id          => 'example_0.0.1',
                         sequence    => 'ACGTCTCTCTCGAGAGGAGAGCTTCTCTCTAGGAGAG',
                         descriptors => ['Fake example sequence.', 'nucleotide'],
                         comments    => ['sequence is for illustration only'],
                         cols        => '75',
                         );
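
As an illustration of the COLS behavior described above, here is a minimal sketch in plain Perl. It assumes wrapping simply breaks the flat sequence every COLS characters; wrap_sequence is a hypothetical helper, not part of FASTAParse, and the module's internals may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FASTAParse: break a flat sequence
# into lines of at most $cols characters.
sub wrap_sequence {
    my ($sequence, $cols) = @_;
    $cols = 60 unless defined $cols;      # default documented for COLS
    return ($sequence) if $cols == 0;     # 0 indicates no wrapping
    return $sequence =~ /(.{1,$cols})/g;  # greedy chunks of up to $cols
}

my @lines = wrap_sequence('ACGTCTCTCTCGAGAGGAGAGCTTCTCTCTAGGAGAG', 10);
print "$_\n" for @lines;
```

Call the helper in list context; every line except possibly the last has exactly COLS characters.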

dump_FASTA

dump_FASTA: Method to dump the FASTA object back into a text chunk, retaining formatting. Returns a scalar.

    my $dumped = $fasta->dump_FASTA();

save_FASTA

save_FASTA: Method to save the FASTA entry to a specified file, retaining formatting. Multiple calls to the same file will append entries to the file.

    $fasta->save_FASTA( save => '/tmp/revised.fa' );
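
The effect of repeated calls against one file can be pictured with ordinary append-mode file handling. The sketch below is plain Perl and only assumes that entries are added to the end of the file; append_entry is a hypothetical helper, not part of FASTAParse, and a temporary file stands in for a real path.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not part of FASTAParse: append one entry to a file.
sub append_entry {
    my ($path, $entry) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    print {$fh} $entry;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Two calls against the same (temporary) file leave both entries in it.
my (undef, $path) = tempfile(UNLINK => 1);
append_entry($path, ">seq_1\nACGT\n");
append_entry($path, ">seq_2\nTTGA\n");
```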

print

print: Method to print the object's contents in standard FASTA format to STDOUT.

    $fasta->print();

id

id: Accessor method to return the (scalar) FASTA ID.

    my $id = $fasta->id();

sequence

sequence: Accessor method to return the (scalar) FASTA sequence. The sequence is returned as a single, non-gapped string.

    my $sequence = $fasta->sequence();

descriptors

descriptors: Accessor method to return an array reference to the list of descriptors in the FASTA object. Incoming FASTA text should have multi-part descriptors separated by the ^A character on a single header line. As described at http://blast.wustl.edu/doc/FAQ-Indexing.html:

 A compound definition is a concatenation of multiple component
 definitions, each separated from the next by a single Control-A
 character (sometimes symbolized ^A; hex 0x01; or ASCII SOH [start of
 header]). Compound definitions are frequently seen
 (quasi-non-redundant) databases, where multiple instances of the
 exact same sequence are replaced by a single instance of the sequence
 with a concatenated definition line.

    my $descriptors_aref = $fasta->descriptors();
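
To make the ^A convention concrete, the following plain-Perl sketch splits a compound header line into an ID and its component definitions. It assumes the ID is the first whitespace-delimited token after the > tag; split_header is a hypothetical helper, not part of FASTAParse, and the second definition in the sample header is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FASTAParse. Assumes the ID is the
# first whitespace-delimited token after '>'.
sub split_header {
    my ($header) = @_;
    (my $line = $header) =~ s/^>//;          # drop the header tag
    my ($id, $rest) = split /\s+/, $line, 2; # ID, then everything else
    $rest = '' unless defined $rest;
    my @descriptors = split /\x01/, $rest;   # ^A is hex 0x01
    return ($id, \@descriptors);
}

# Illustrative header; the second definition is invented for the example.
my $header = ">gi|55416189|gb|AAV50056.1| NADH dehydrogenase subunit 1"
           . "\x01" . "made-up second definition";
my ($id, $descriptors_aref) = split_header($header);
print "ID: $id\n";
print "Descriptor: $_\n" for @{$descriptors_aref};
```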

comments

comments: Accessor method to return an array reference to the list of comments in the FASTA object. In incoming FASTA text, comments follow the header line, each on its own line beginning with the ; character. Most databases and bioinformatics applications do not recognize these comments so their use is discouraged, but they are part of the official format.

    my $comments_aref = $fasta->comments();
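
The layout of a commented entry can be sketched in plain Perl as follows. This is an illustration of the format only, not the module's parser; split_entry is a hypothetical helper, and it assumes one entry per text chunk with comments on lines beginning with ; after the header.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FASTAParse: separate the header,
# comment lines, and sequence lines of a single FASTA entry.
sub split_entry {
    my ($text) = @_;
    my ($header, @comments, @sequence);
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        next if $line eq '';                  # skip blank lines
        if    ($line =~ /^>(.*)/)    { $header = $1 }
        elsif ($line =~ /^;\s*(.*)/) { push @comments, $1 }
        else                         { push @sequence, $line }
    }
    return ($header, \@comments, join '', @sequence);
}

my $text = ">example_0.0.1 Fake example sequence.\n"
         . ";sequence is for illustration only\n"
         . "ACGTCTCTCTCGAGAGGAGAG\n"
         . "CTTCTCTCTAGGAGAG\n";
my ($header, $comments_aref, $sequence) = split_entry($text);
print "Comment: $_\n" for @{$comments_aref};
print "Sequence: $sequence\n";
```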

CONFIGURATION AND ENVIRONMENT

FASTAParse requires no configuration files or environment variables.

DEPENDENCIES

None.

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

No bugs have been reported.

Please report any bugs or feature requests to bug-fastaparse@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Todd Wylie

<perldev@monkeybytes.org>

http://www.monkeybytes.org

LICENSE AND COPYRIGHT

Copyright (c) 2006, Todd Wylie <perldev@monkeybytes.org>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

NOTE

This software was written using the latest version of GNU Emacs, the extensible, real-time text editor. Please see http://www.gnu.org/software/emacs for more information and download sources.