NAME

BioX::Seq - a (very) basic biological sequence object

SYNOPSIS

    use BioX::Seq;

    my $seq = BioX::Seq->new();

    for (qw/AATG TAGG CCAT TTGA/) {
        $seq .= $_;
    }

    $seq->id( 'test_seq' );

    my $rc = $seq->rev_com(); # original untouched
    print $seq->as_fasta();

    # >test_seq
    # AATGTAGGCCATTTGA

    $seq->rev_com(); # original modified in-place
    print $seq->as_fastq(22);

    # @test_seq
    # TCAAATGGCCTACATT
    # +
    # 7777777777777777

    print $seq->range(3,6)->as_fasta();

    # >test_seq
    # AAAT

DESCRIPTION

BioX::Seq is a simple sequence class that can be used to represent biological sequences. It was designed as a compromise between using simple strings and hashes to hold sequences and using the rather bloated objects of BioPerl. Features (or, depending on your viewpoint, bugs) include auto-stringification and context-dependent transformations. It is meant to be used primarily as the return object of the BioX::Seq::Stream and BioX::Seq::Fetch parsers, but there may be occasions where it is useful in its own right.

BioX::Seq currently implements a small subset of the transformations most commonly used by the author (reverse complement, translate, subrange). More methods may be added in the future as use suggests and time permits, but the core object will be kept as simple as possible and should be limited to the four current properties (sequence, ID, description, and quality) that satisfy 99% of the author's needs.

Some design decisions favor speed over ease of use. For instance, there is no sanity-checking of the object properties upon creation of a new object or use of the accessor methods. Parameters to the constructor are positional rather than named (testing indicates that this reduces execution times by roughly 40%).

METHODS

new
new SEQUENCE
new SEQUENCE ID
new SEQUENCE ID DESCRIPTION
new SEQUENCE ID DESCRIPTION QUALITY

Create a new BioX::Seq object (empty by default). All arguments are optional but are positional and, if provided, must be given in order.

    $seq = BioX::Seq->new( SEQ, ID, DESC, QUALITY );

Returns a new BioX::Seq object.

seq, id, desc, qual

Accessors to the object properties named accordingly. Properties can also be accessed directly as hash keys. This is probably frowned upon by some, but can be useful at times, e.g. to perform a substitution on a property in place:

    $seq->{id} =~ s/^Unnecessary_prefix//;

Takes zero or one arguments. If an argument is given, assigns that value to the property in question. Returns the current value of the property.

range START END

Extract a subsequence from START to END. Coordinates are 1-based.

Returns a new BioX::Seq object, or undef if the coordinates are outside the limits of the parent sequence.
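
The 1-based, inclusive coordinate convention can be illustrated with a minimal pure-Perl sketch. The helper subseq_1based is hypothetical and not part of BioX::Seq; it only mirrors the documented behavior on plain strings. Treating START greater than END as out of range is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Seq): extract a 1-based, inclusive
# subsequence from a plain string, returning undef for invalid coordinates.
sub subseq_1based {
    my ($seq, $start, $end) = @_;

    # Assumption: START > END is treated like out-of-range coordinates
    return undef
        if $start < 1 || $end > length($seq) || $start > $end;

    # substr() is 0-based and takes a length rather than an end position
    return substr($seq, $start - 1, $end - $start + 1);
}

my $sub = subseq_1based('TCAAATGGCCTACATT', 3, 6);
print defined $sub ? "$sub\n" : "out of range\n";   # prints "AAAT"
```

Because an out-of-range request yields undef rather than an exception, callers that chain a method onto the result of range may want to check that it is defined first.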

rev_com

Reverse complement the sequence.

Behavior is context-dependent. In scalar or list context, returns a new BioX::Seq object containing the reverse-complemented sequence, leaving the original sequence untouched. In void context, updates the original sequence in-place and returns TRUE if successful.
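
For readers unfamiliar with context-dependent methods, the following sketch shows one way such behavior can be built in Perl using the built-in wantarray. Toy::Seq is a hypothetical stand-in class written for this example and is not the BioX::Seq implementation; it stores only a sequence string.

```perl
use strict;
use warnings;

# Toy class for illustration only; this is NOT the BioX::Seq implementation.
package Toy::Seq;

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq // '' }, $class;
}

sub rev_com {
    my ($self) = @_;

    # Reverse the string, then complement each base
    my $rc = scalar reverse $self->{seq};
    $rc =~ tr/ACGTacgt/TGCAtgca/;

    # wantarray is undef in void context, defined in scalar or list context
    if (defined wantarray) {
        return Toy::Seq->new($rc);   # caller wants a value: leave $self alone
    }

    $self->{seq} = $rc;              # void context: modify in place
    return 1;
}

package main;

my $seq = Toy::Seq->new('AATGTAGGCCATTTGA');

my $rc = $seq->rev_com();    # scalar context: a new object is returned
print "$seq->{seq}\n";       # prints "AATGTAGGCCATTTGA" (unchanged)
print "$rc->{seq}\n";        # prints "TCAAATGGCCTACATT"

$seq->rev_com();             # void context: original modified in place
print "$seq->{seq}\n";       # prints "TCAAATGGCCTACATT"
```

The practical consequence is that forgetting to capture the return value silently changes the original object, so it is worth being deliberate about which form is intended.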

translate
translate FRAME

Translate a nucleic acid sequence to a peptide sequence.

FRAME specifies the starting point of the translation. The default is zero. A FRAME value of 0-2 will return the translation of each of the three forward reading frames, respectively, while a value of 3-5 will return the translation of each of the three reverse reading frames, respectively.
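
The frame numbering can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the module's code: translate_frame is a hypothetical helper, frames 3-5 are assumed to correspond to offsets 0-2 on the reverse complement, the standard genetic code is assumed, trailing partial codons are dropped, and any codon outside the table (for example one containing N or U) is rendered as X.

```perl
use strict;
use warnings;

# Standard genetic code with codons enumerated in TCAG order (assumption:
# the module's own codon table and ambiguity handling may differ)
my @bases = qw/T C A G/;
my @aa    = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            $codon_table{"$x$y$z"} = shift @aa;
        }
    }
}

# Hypothetical helper (not part of BioX::Seq) operating on plain strings
sub translate_frame {
    my ($dna, $frame) = @_;
    $frame //= 0;
    die "FRAME must be an integer from 0 to 5\n"
        if $frame !~ /^[0-5]$/;

    $dna = uc $dna;
    if ($frame > 2) {
        # Assumption: frames 3-5 are offsets 0-2 on the reverse complement
        $dna = scalar reverse $dna;
        $dna =~ tr/ACGT/TGCA/;
        $frame -= 3;
    }

    # Walk complete codons only; a trailing partial codon is ignored
    my $pep = '';
    for (my $i = $frame; $i + 3 <= length($dna); $i += 3) {
        $pep .= $codon_table{ substr($dna, $i, 3) } // 'X';
    }
    return $pep;
}

# Print the translation of all six frames of a short example sequence
print "$_\t", translate_frame('ATGGCCTAA', $_), "\n" for 0 .. 5;
```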

as_fasta
as_fasta LINE_LENGTH

Returns a string representation of the sequence in FASTA format. Requires that, at a minimum, the <seq> and <id> properties be defined. LINE_LENGTH, if given, specifies the line length for wrapping purposes (default: 60).
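
The effect of LINE_LENGTH can be seen in a small stand-alone sketch. The helper fasta_record is hypothetical and works on plain strings rather than BioX::Seq objects; it omits the description field for brevity, which is a simplification of this example rather than a statement about the module's header format.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Seq): build a FASTA record from
# plain strings, wrapping the sequence at $line_len characters per line.
sub fasta_record {
    my ($id, $seq, $line_len) = @_;
    $line_len //= 60;    # same default as documented for as_fasta

    # Split the sequence into chunks of at most $line_len characters
    my @lines = $seq =~ /(.{1,$line_len})/g;

    # The trailing empty string makes the record end with a newline
    return join "\n", ">$id", @lines, '';
}

print fasta_record('test_seq', 'AATGTAGGCCATTTGA', 10);
# >test_seq
# AATGTAGGCC
# ATTTGA
```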

as_fastq
as_fastq DEFAULT_QUALITY

Returns a string representation of the sequence in FASTQ format. Requires that, at a minimum, the <seq> and <id> properties be defined. DEFAULT_QUALITY, if given, specifies the default Phred quality score to be assigned to each base if missing - for instance, if converting from FASTA to FASTQ (default: 20).
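
To show how a default Phred score turns into a quality string, here is a minimal sketch using a hypothetical helper, fastq_record, on plain strings. It assumes the Phred+33 (Sanger) encoding, which matches the SYNOPSIS output, where a score of 22 is rendered as the character 7.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Seq): build a FASTQ record from
# plain strings, filling in a uniform quality string when none is given.
sub fastq_record {
    my ($id, $seq, $qual, $default_q) = @_;
    $default_q //= 20;   # same default as documented for as_fastq

    # Assumption: Phred+33 (Sanger) encoding, i.e. character = chr(Q + 33)
    $qual //= chr($default_q + 33) x length($seq);

    return "\@$id\n$seq\n+\n$qual\n";
}

print fastq_record('test_seq', 'TCAAATGGCCTACATT', undef, 22);
# @test_seq
# TCAAATGGCCTACATT
# +
# 7777777777777777
```

An existing quality string takes precedence in this sketch; the default score is only used when the quality is undefined, as when converting from FASTA.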

as_input
as_input ARGUMENT

If the sequence object comes from a BioX::Seq::Stream instance, this method will format the sequence to match the input format, calling either BioX::Seq::as_fasta or BioX::Seq::as_fastq as appropriate. The optional argument, if given, will be passed on to the appropriate method and evaluated in that context. Throws an error if the input format cannot be deduced (probably because the object was not created by a BioX::Seq::Stream parser).

CAVEATS AND BUGS

No input validation is performed during construction or modification of the object properties.

Performing certain operations (for instance, s///) on a BioX::Seq object while relying on auto-stringification may convert the object into a simple unblessed scalar containing the sequence string. You will likely know if this happens (you are using strict and warnings, right?) because your script will throw an error if you try to call a method on the (now) unblessed scalar.
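
The following sketch reproduces this pitfall with a toy class so that it can be run without the module installed. Toy::Seq is hypothetical and is not the BioX::Seq implementation; it merely overloads stringification and the .= operator in a similar spirit. It also shows the safer alternative of editing the property through its hash key.

```perl
use strict;
use warnings;

# Toy class for illustration only; this is NOT the BioX::Seq implementation.
package Toy::Seq;

use overload
    '""' => sub { $_[0]{seq} },                  # auto-stringification
    '.=' => sub { $_[0]{seq} .= $_[1]; $_[0] };  # append, keep the object

sub new { my ($class, $seq) = @_; return bless { seq => $seq // '' }, $class }
sub seq { return $_[0]{seq} }

package main;

my $safe = Toy::Seq->new('AATG');
$safe .= 'TAGG';                 # overloaded: $safe is still an object
$safe->{seq} =~ s/^AA/CC/;       # edits the property; object stays blessed
print ref($safe), " $safe\n";    # prints "Toy::Seq CCTGTAGG"

my $lost = Toy::Seq->new('AATG');
$lost =~ s/^AA/CC/;              # a successful s/// replaces the object
                                 # with the modified plain string
print ref($lost) ? "still an object\n" : "plain string: $lost\n";

# Method calls on the plain string now fail at run time
my $ok = eval { $lost->seq; 1 };
print "error: $@" unless $ok;
```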

Please report bugs or feature requests through the issue tracker at https://github.com/jvolkening/p5-BioX-Seq/issues.

AUTHOR

Jeremy Volkening <jeremy.volkening *at* base2bio.com>

COPYRIGHT AND LICENSE

Copyright 2014-2022 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.