NAME
BioX::SeqUtils::RandomSequence - Creates a random nucleotide or protein sequence with given nucleotide frequencies
VERSION
This document describes BioX::SeqUtils::RandomSequence version 0.9.3
SYNOPSIS
The package includes scripts for generating random nucleotide, dinucleotide, protein, and protein-set sequences. The length and frequency parameters should always be integers.
To create a nucleotide:
./random-nucleotide.pp # Defaults: length 60, all frequencies .25
./random-nucleotide.pp -l2200 -a23 -c27 -g27 -t23 # Enrich GC content with length 2200
To create a dinucleotide:
./random-dinucleotide.pp # Defaults: length 2, all frequencies .25
./random-dinucleotide.pp -a225 -c275 -g275 -t225 # Enrich GC content ~ more
To create a protein:
./random-protein.pp # Defaults: length 60, all frequencies .25
./random-protein.pp -l2200 -a23 -c27 -g27 -t23 # Enrich underlying GC content, aa length 2200
To create a protein set (with common DNA shifted by one base):
./random-protein-set.pp # Defaults: length 60, all frequencies .25
./random-protein-set.pp -l2200 -a23 -c27 -g27 -t23 # Enrich underlying GC content
Additionally, a "master script" takes a tYpe parameter (-y) that selects any of the above:
./random-sequence.pp -yn -l100 # Type n nucleotide
./random-sequence.pp -yd # Type d dinucleotide
./random-sequence.pp -yp -l100 # Type p protein
./random-sequence.pp -ys -l100 # Type s protein set
This module uses Bio::Tools::CodonTable for translation. The s parameter selects a codon table other than the default, 1 (Standard):
./random-protein.pp -l2200 -s2 # Non-standard codon table
In a script, each sequence type can be selected by passing the "y" (tYpe) parameter to rand_seq(). The default type is "nucleotide". The type may be set in new() or in any of the rand_X() methods. All four frequencies are set to "1" by default (so that the probability of each of A, C, G, and T is 0.25).
use BioX::SeqUtils::RandomSequence;
my $randomizer = BioX::SeqUtils::RandomSequence->new({ l => $length,
s => 1,
y => "nucleotide",
a => $a_frequency,
c => $c_frequency,
g => $g_frequency,
t => $t_frequency });
print $randomizer->rand_seq(), "\n";
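The relationship between integer frequencies and per-base probabilities can be illustrated with a minimal sketch. This is not the module's implementation; weighted_nuc() is a hypothetical helper that assumes each frequency acts as a relative weight, so that equal weights give each base a probability of 0.25.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): build a random
# nucleotide sequence from integer weights for A, C, G and T.
sub weighted_nuc {
    my ($length, %weight) = @_;
    my @bases = qw(A C G T);
    my $total = 0;
    $total += $weight{$_} for @bases;

    my $seq = '';
    for (1 .. $length) {
        my $pick = int(rand($total));    # integer in 0 .. $total - 1
        for my $base (@bases) {
            if ($pick < $weight{$base}) {
                $seq .= $base;           # $pick fell inside this base's share
                last;
            }
            $pick -= $weight{$base};     # move on to the next base's share
        }
    }
    return $seq;
}

# Equal weights: each base has probability 1/4.
print weighted_nuc(60, A => 1, C => 1, G => 1, T => 1), "\n";

# GC-enriched: C and G each have probability 27/100.
print weighted_nuc(60, A => 23, C => 27, G => 27, T => 23), "\n";
```

Under this assumption only the ratios matter: weights of 23/27/27/23 and 230/270/270/230 describe the same distribution.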
You can use the same randomizer object to create all types of sequences by passing only the parameters that change with each call; parameters set in earlier calls stay in effect.
my $nuc_short = $randomizer->rand_seq({ y => 'n', l => 21 });
my $nuc_long = $randomizer->rand_seq({ l => 2200 }); # Still nucleotide
my $nuc_richer = $randomizer->rand_seq({ a => 225,
c => 275,
g => 275,
t => 225 }); # Still length 2200
my $protein_now = $randomizer->rand_seq({ y => 'p' }); # Still richer GC
my $dinuc_for_fun = $randomizer->rand_seq({ y => 'd',
a => 1 }); # If any base is missing, all frequencies reset to 1
my $protein_new = $randomizer->rand_seq({ y => 'p',
s => 3 }); # Use codon table 'Yeast Mitochondrial'
Type "protein" creates a protein of the given length l by creating a random nucleotide sequence with the given nucleotide frequencies of length l * 3, which is translated into a protein. The default length is 60.
my $randomizer = BioX::SeqUtils::RandomSequence->new();
print $randomizer->rand_seq({ y => "protein" }), "\n";
Type "set" creates a test set of two proteins, each of the given length l. It creates a random nucleotide sequence of length l * 3 + 1 with the given nucleotide frequencies, removes the first base for sequence 1 and the last base for sequence 2, then translates both into proteins.
print join( " ", @{ $randomizer->rand_seq({ y => "set" }) } ), "\n";
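The frame shift behind the "set" type can be sketched in a few lines. shifted_frames() is a hypothetical helper, not a method of this module, and the translation step (which the module delegates to Bio::Tools::CodonTable) is left out so that the example runs without BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: from one DNA string of length l * 3 + 1, derive
# two strings of length l * 3 that are offset from each other by one base.
sub shifted_frames {
    my ($dna) = @_;
    my $frame1 = substr($dna, 1);        # remove the first base
    my $frame2 = substr($dna, 0, -1);    # remove the last base
    return ($frame1, $frame2);
}

my $dna = 'ACGTACGTACGTA';               # l = 4, so length is 4 * 3 + 1 = 13
my ($frame1, $frame2) = shifted_frames($dna);

print "$frame1\n";                       # CGTACGTACGTA
print "$frame2\n";                       # ACGTACGTACGT
```

Both results are 12 bases long, so each translates to a protein of length 4, and the two share all but one base of the underlying DNA.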
The individual methods may be preferred:
my $nucleotide = $randomizer->rand_nuc();
my $dinucleotide = $randomizer->rand_nuc({ l => 2 });
my $protein = $randomizer->rand_pro();
The rand_pro_set() method uses wantarray() and returns either a list (in list context) or an array reference (in scalar context):
my ($pro1, $pro2) = $randomizer->rand_pro_set();
my $protein_set = $randomizer->rand_pro_set();
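For readers unfamiliar with wantarray(), the following sketch shows the general pattern. pair_by_context() and its fixed return values are hypothetical stand-ins and are not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that produces two proteins.
sub pair_by_context {
    my @pair = ('MKV', 'RSL');
    # wantarray is true when the caller is in list context.
    return wantarray ? @pair : \@pair;
}

my ($first, $second) = pair_by_context();    # list context: two strings
my $pair_ref         = pair_by_context();    # scalar context: array reference

print "$first $second\n";
print scalar(@$pair_ref), " proteins: @$pair_ref\n";
```

In scalar context the caller dereferences the result, for example with @$protein_set, to reach the individual sequences.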
DESCRIPTION
Create random nucleotide and protein sequences.
CONFIGURATION AND ENVIRONMENT
None.
DEPENDENCIES
Class::Std; Class::Std::Utils; Bio::Tools::CodonTable;
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
No bugs have been reported.
Please report any bugs or feature requests to bug-biox-sequtils-randomsequence@rt.cpan.org, or through the web interface at http://rt.cpan.org.
AUTHOR
Roger A Hall <rogerhall@cpan.org>
LICENSE AND COPYRIGHT
Copyleft (c) 2009, Roger A Hall <rogerhall@cpan.org>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.