NAME

BioX::SeqUtils::RandomSequence - Creates a random nuc or prot sequence with given nuc frequencies

VERSION

This document describes BioX::SeqUtils::RandomSequence version 0.9.2

SYNOPSIS

The package includes scripts that generate a random nucleotide sequence, dinucleotide, protein, or protein set. The length and frequency parameters should always be integers.

To create a nucleotide sequence:

    ./random-nucleotide.pp                               # Defaults: length 60, all frequencies .25
    ./random-nucleotide.pp -l2200 -a23 -c27 -g27 -t23    # Enrich GC content with length 2200

To create a dinucleotide:

    ./random-dinucleotide.pp                             # Defaults: length 2, all frequencies .25
    ./random-dinucleotide.pp -a225 -c275 -g275 -t225     # Enrich GC content slightly more (55% vs. 54%)

To create a protein:

    ./random-protein.pp                                  # Defaults: length 60, all frequencies .25
    ./random-protein.pp -l2200 -a23 -c27 -g27 -t23       # Enrich underlying GC content, aa length 2200

To create a protein set (with common DNA shifted by one base):

    ./random-protein-set.pp                              # Defaults: length 60, all frequencies .25
    ./random-protein-set.pp -l2200 -a23 -c27 -g27 -t23   # Enrich underlying GC content 

Additionally, a "master script" takes a tYpe parameter (-y) to select any of the above:

    ./random-sequence.pp -yn -l100                       # Type n nucleotide
    ./random-sequence.pp -yd                             # Type d dinucleotide
    ./random-sequence.pp -yp -l100                       # Type p protein
    ./random-sequence.pp -ys -l100                       # Type s protein set

In a script, each sequence type can be accessed using the "y" (tYpe) parameter with rand_seq(). The default is "nucleotide". The type may be set in new() or in any of the rand_X() methods. All four frequencies are set to "1" by default (so that the probability of each of A, C, G, and T is 0.25).

    use BioX::SeqUtils::RandomSequence;

    my $randomizer = BioX::SeqUtils::RandomSequence->new({ l => $length, 
                                                           y => "nucleotide",
                                                           a => $a_frequency,
                                                           c => $c_frequency,
                                                           g => $g_frequency,
                                                           t => $t_frequency });
    print $randomizer->rand_seq(), "\n";
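
The relationship between integer frequencies and base probabilities can be illustrated with a small standalone sketch. It assumes each frequency acts as a relative weight, so a base's probability is its weight divided by the sum of all four; weighted_nuc() is a hypothetical helper written for this example, not part of the module, and the module's internal sampling may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::SeqUtils::RandomSequence):
# draw a random nucleotide sequence from integer weights for A, C, G, T.
sub weighted_nuc {
    my ($length, %weight) = @_;
    my @bases = qw(A C G T);

    my $total = 0;
    $total += $weight{$_} for @bases;
    die "weights must sum to a positive number\n" if $total <= 0;

    my $seq = '';
    for (1 .. $length) {
        my $pick = rand($total);            # uniform in [0, $total)
        for my $base (@bases) {
            if ($pick < $weight{$base}) {   # pick falls in this base's slice
                $seq .= $base;
                last;
            }
            $pick -= $weight{$base};        # move on to the next slice
        }
    }
    return $seq;
}

# Same weights as the "-a23 -c27 -g27 -t23" command-line example:
# expected GC fraction is (27 + 27) / 100 = 0.54.
my $seq = weighted_nuc(2200, A => 23, C => 27, G => 27, T => 23);
my $gc  = ($seq =~ tr/GC//);
printf "length %d, observed GC fraction %.3f\n", length($seq), $gc / length($seq);
```

With all four weights equal (for example, all 1), each base is drawn with probability 0.25, matching the default described above.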

You can use the same randomizer object to create all types of sequences by passing only the parameters that change with each call.

    my $nuc_short     = $randomizer->rand_seq({ y => 'n', l => 21 });
    my $nuc_long      = $randomizer->rand_seq({ l => 2200 });          # Still nucleotide
    my $nuc_richer    = $randomizer->rand_seq({ a => 225, 
                                                c => 275, 
                                                g => 275, 
                                                t => 225 });           # Still length 2200
    my $protein_now   = $randomizer->rand_seq({ y => 'p' });           # Still richer GC
    my $dinuc_for_fun = $randomizer->rand_seq({ y => 'd',
                                                a => 1 });             # Passing only some bases resets all frequencies to 1

Type "protein" creates a protein of the given length l by creating a random nucleotide sequence with the given nucleotide frequencies of length l * 3, which is translated into a protein. The default length is 60.

    my $randomizer = BioX::SeqUtils::RandomSequence->new();
    print $randomizer->rand_seq({ y => "protein" }), "\n";
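
The "l * 3 nucleotides, then translate" step can be sketched without the module. This is only an illustration: it builds the standard genetic code by hand to stay dependency-free (the module itself lists Bio::Tools::CodonTable as a dependency), uses equal base frequencies for brevity, and translate_dna() is a hypothetical helper. Stop codons are shown as "*" here; how the module treats them is not stated in this document.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G base order.
my @order = qw(T C A G);
my @amino = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_to_aa;
for my $first (@order) {
    for my $second (@order) {
        for my $third (@order) {
            $codon_to_aa{"$first$second$third"} = shift @amino;
        }
    }
}

# Hypothetical helper: translate DNA in frame 0, ignoring a trailing
# partial codon; unknown codons become 'X'.
sub translate_dna {
    my ($dna) = @_;
    return join '', map { $codon_to_aa{$_} // 'X' } $dna =~ /(...)/g;
}

my $aa_length = 60;                        # the default protein length
my @bases     = qw(A C G T);
my $dna       = join '', map { $bases[ int rand @bases ] } 1 .. $aa_length * 3;
my $protein   = translate_dna($dna);

print "$protein\n";
```

Because every codon yields exactly one symbol, a DNA sequence of length l * 3 always translates to l symbols.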
    

Type "set" creates a test set of two proteins, each of the given length l. It generates a random nucleotide sequence of length l * 3 + 1 with the given nucleotide frequencies, removes the first base to form sequence 1 and the last base to form sequence 2, then translates both into proteins.

    print join( " ", @{ $randomizer->rand_seq({ y => "set" }) } ), "\n";
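
The framing behind a protein set can be shown on the DNA alone. The sketch below is a standalone illustration with equal base frequencies, not the module's code; the variable names are made up for the example. It derives the two coding sequences from one parent sequence of length l * 3 + 1 and confirms that they share all but one base, offset by a single position.

```perl
use strict;
use warnings;

my $aa_length = 60;
my @bases     = qw(A C G T);

# Parent sequence: one base longer than a whole number of codons.
my $dna = join '', map { $bases[ int rand @bases ] } 1 .. $aa_length * 3 + 1;

my $dna1 = substr $dna, 1;       # sequence 1: first base removed
my $dna2 = substr $dna, 0, -1;   # sequence 2: last base removed

# Both are exactly l * 3 bases long, so each translates to l amino acids.
printf "lengths: %d and %d\n", length($dna1), length($dna2);

# The shared region is the parent minus both end bases: sequence 1 without
# its last base is identical to sequence 2 without its first base.
my $shared_ok = substr($dna1, 0, -1) eq substr($dna2, 1);
print "shared region matches: ", ( $shared_ok ? "yes" : "no" ), "\n";
```

Since the two sequences are read one base apart, the same underlying DNA is translated in two different reading frames.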
  

The individual methods may be preferred:

    my $nucleotide    = $randomizer->rand_nuc();
    my $dinucleotide  = $randomizer->rand_nuc({ l => 2 });
    my $protein       = $randomizer->rand_pro();

    my ($pro1, $pro2) = @{ $randomizer->rand_pro_set() };

DESCRIPTION

Create random nucleotide and protein sequences.

CONFIGURATION AND ENVIRONMENT

None.

DEPENDENCIES

Class::Std; Class::Std::Utils; Bio::Tools::CodonTable;

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

No bugs have been reported.

Please report any bugs or feature requests to bug-biox-sequtils-randomsequence@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Roger A Hall <rogerhall@cpan.org>

LICENSE AND COPYRIGHT

Copyleft (c) 2009, Roger A Hall <rogerhall@cpan.org>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.