NAME

Bio::CUA::CUB::Calculator -- A module to calculate codon usage bias (CUB) indices for protein-coding sequences

SYNOPSIS

        use Bio::CUA::CUB::Calculator;
        use Bio::CUA::SeqIO;

        my $calc = Bio::CUA::CUB::Calculator->new(
                -codon_table => 1,
                -tAI_values  => 'tai.out', # from Bio::CUA::CUB::Builder
        );

        # calculate tAI for each sequence
        my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa");
        # or, specifying the format explicitly:
        # my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa", -format => 'fasta');

        while(my $seq = $io->next_seq)
        {
                my $tai = $calc->tai($seq);
                printf("%10s: %.7f\n", $seq->id, $tai);
        }

DESCRIPTION

Codon usage bias (CUB) can be represented at two levels, codon and sequence. The latter is often computed as the geometric mean of the values of the sequence's codons. This module calculates CUB metrics at the sequence level.
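
As an illustration of the sequence-level calculation, here is a minimal sketch of a geometric mean over codon-level values. It does not use this module; the function name geometric_mean_score and the weights in %weight are hypothetical and chosen only for the example.

```perl
use strict;
use warnings;

# Hypothetical codon-level values, for illustration only.
my %weight = (GCT => 1.0, GCC => 0.5, AAA => 0.8, AAG => 1.0);

# Sequence-level score as the geometric mean of the codon-level values.
# Codons without a positive value are skipped (an assumption of this sketch).
sub geometric_mean_score {
    my ($seq, $weights) = @_;
    my ($log_sum, $n) = (0, 0);
    for my $codon (uc($seq) =~ /(...)/g) {
        my $w = $weights->{$codon};
        next unless defined $w && $w > 0;
        $log_sum += log($w);    # sum logs to avoid underflow on long sequences
        $n++;
    }
    return $n ? exp($log_sum / $n) : undef;
}

printf "%.4f\n", geometric_mean_score('GCTGCCAAAAAG', \%weight);
```

Summing logarithms instead of multiplying the values directly keeps the computation stable for long sequences.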

Supported CUB metrics include CAI (codon adaptation index), tAI (tRNA adaptation index), Fop (Frequency of optimal codons), ENC (Effective Number of Codons) and their variants. See the methods below for details.

METHODS

new

 Title   : new
 Usage   : my $calc=Bio::CUA::CUB::Calculator->new(@args);
 Function: initialize the calculator
 Returns : an object of this class
 Args    : a hash with the following acceptable keys:
 
 B<Mandatory options>:
-codon_table
 the genetic code table applied to the subsequent sequence analyses. It
 can be specified by an integer (genetic code table id), an object of
 L<Bio::CUA::CodonTable>, or a map-file. See the method
 L<Bio::CUA::Summarizer/new> for details.
 B<options needed by FOP method>
-optimal_codons
 a file containing all the optimal codons, one codon per line, or a
 hashref whose keys are the optimal codons.
 B<options needed by CAI method>
-CAI_values
 a file containing the CAI value of each codon, excluding the 3
 stop codons, so 61 lines, each containing a codon and its
 value separated by a space or tab,
 or
 a hashref with each key being a codon and each value being the CAI
 value of that codon.
 B<options needed by tAI method>
-tAI_values
 similar to C<-CAI_values>, a file or a hashref containing the tAI
 value of each codon.
 B<options needed by ENC method>
-base_background
 optional.
 an arrayref containing the frequencies of the 4 bases (in the order
 A,T,C, and G) derived from background data such as introns,
 or one of the values 'seq' and 'seq3', which lead to
 estimating base frequencies from each analyzed sequence as a whole or
 from its 3rd codon positions, respectively.

 It can also be specified for each analyzed sequence with the methods
 L</encp> and L</encp_r>

sequence input

All the following methods accept one of the following formats as sequence input:

  1.  a string of nucleotide sequence with a length of 3N,
  2.  a sequence object which has a method I<seq> to get the sequence string,
  3.  a sequence file in fasta format,
  4.  a reference to a codon count hash, like
       $codons = {
               AGC => 50,
               GTC => 124,
               ...    ...
       }.
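
To show how the first and fourth formats relate, the following sketch turns a nucleotide string of length 3N into a codon count hash. The helper name count_codons is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Build a codon count hash (format 4) from a sequence string (format 1).
sub count_codons {
    my ($seq) = @_;
    $seq = uc $seq;
    die "sequence length is not a multiple of 3\n" if length($seq) % 3;
    my %count;
    $count{$_}++ for $seq =~ /(...)/g;    # split into consecutive triplets
    return \%count;
}

my $codons = count_codons('AGCGTCAGCGTCGTC');
print "$_ => $codons->{$_}\n" for sort keys %$codons;
```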

cai

 Title   : cai
 Usage   : $caiValue = $self->cai($seq);
 Function: calculate the CAI value for the sequence
 Returns : a number, or undef if failed
 Args    : see L</"sequence input">
 Note: codons without synonymous competitors are excluded from the
 calculation.

fop

 Title   : fop
 Usage   : $fopValue = $self->fop($seq[,$withNonDegenerate]);
 Function: calculate the fraction of optimal codons in the sequence
 Returns : a number, or undef if failed
 Args    : for sequence see L</"sequence input">.
 If the optional argument '$withNonDegenerate' is true,
 non-degenerate codons (those that do not have synonymous partners) are
 included in the calculation. By default these codons are excluded.
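
 The calculation described above can be sketched as follows, working
 from a codon count hash. This is an illustration, not the module's
 implementation: the name fop_sketch is hypothetical, stop codons are
 assumed to be absent from the counts, and the non-degenerate codons
 are assumed to be ATG and TGG as in the standard genetic code.

```perl
use strict;
use warnings;

# Assumption: standard genetic code, where ATG (Met) and TGG (Trp)
# are the only codons without synonymous partners.
my %non_degenerate = map { $_ => 1 } qw(ATG TGG);

# Fraction of optimal codons among the counted codons.
sub fop_sketch {
    my ($counts, $optimal, $with_non_degenerate) = @_;
    my ($opt, $total) = (0, 0);
    for my $codon (keys %$counts) {
        next if !$with_non_degenerate && $non_degenerate{$codon};
        $total += $counts->{$codon};
        $opt   += $counts->{$codon} if $optimal->{$codon};
    }
    return $total ? $opt / $total : undef;
}

my %counts  = (GCT => 6, GCC => 2, ATG => 2);
my %optimal = (GCT => 1);
printf "%.2f %.2f\n", fop_sketch(\%counts, \%optimal),
                      fop_sketch(\%counts, \%optimal, 1);
```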

tai

 Title   : tai
 Usage   : $taiValue = $self->tai($seq);
 Function: calculate the tAI value for the sequence
 Returns : a number, or undef if failed
 Args    : for sequence see L</"sequence input">.

 Note: codons in the input sequence that have no tAI value are
 ignored.

enc

 Title   : enc
 Usage   : $encValue = $self->enc($seq,[$minTotal]);
 Function: calculate ENC for the sequence using the original method 
 I<Wright, 1990, Gene>
 Returns : a number, or undef if failed
 Args    : for sequence see L</"sequence input">.
 Optional argument I<minTotal> specifies the minimal count
 for an amino acid; if the observed count is smaller than this, the
 amino acid's F will not be calculated but inferred. Default is 5.

 Note: when the F of a redundancy group is unavailable due to lack of
 sufficient data, it will be estimated from other groups following Wright's
 method, that is, F3=(F2+F4)/2, and for others, F=1/r where r is the
 degeneracy degree of that group.
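
 A minimal sketch of Wright's calculation for the standard genetic
 code, including the fallback rules in the note above. The function
 names homozygosity and enc_from_class_f are hypothetical; the sketch
 takes the class-average F values as given, treats non-positive F
 values as unavailable (an assumption), and omits the I<minTotal>
 filtering.

```perl
use strict;
use warnings;

# F of one amino acid from the counts of its synonymous codons:
# F = (n * sum(p_i^2) - 1) / (n - 1), with n the total count.
sub homozygosity {
    my @counts = @_;
    my $n = 0;
    $n += $_ for @counts;
    return undef if $n < 2;
    my $sum_p2 = 0;
    $sum_p2 += ($_ / $n) ** 2 for @counts;
    return ($n * $sum_p2 - 1) / ($n - 1);
}

# Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 for the standard genetic code.
# %f maps degeneracy (2, 3, 4, 6) to the class-average F.
sub enc_from_class_f {
    my (%f) = @_;
    for my $r (keys %f) {
        delete $f{$r} unless defined $f{$r} && $f{$r} > 0;
    }
    # Fallbacks: F3 = (F2 + F4) / 2, otherwise F = 1/r.
    $f{3} = ($f{2} + $f{4}) / 2
        if !exists $f{3} && exists $f{2} && exists $f{4};
    $f{$_} //= 1 / $_ for (2, 3, 4, 6);
    return 2 + 9 / $f{2} + 1 / $f{3} + 5 / $f{4} + 3 / $f{6};
}

printf "%.2f\n", enc_from_class_f(2 => 1, 3 => 1, 4 => 1, 6 => 1);
```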

enc_r

 Title   : enc_r
 Usage   : $encValue = $self->enc_r($seq,[$minTotal]);
 Function: similar to the method L</enc>, except that missing F values
 are estimated in a different way.
 Returns : a number, or undef if failed
 Args    : for sequence see L</"sequence input">.
 Optional argument I<minTotal> specifies the minimal count
 for an amino acid; if the observed count is smaller than this, the
 amino acid's F will not be calculated but inferred. Default is 5.

 Note: for a missing Fx of degeneracy class 'x', we first estimate the
 ratio (1/Fx-1)/(x-1) by averaging the ratios of the other degeneracy
 classes with known F values. Fx is then obtained by solving this
 equation for Fx.
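
 Solving ratio = (1/Fx-1)/(x-1) for Fx gives Fx = 1/(1 + ratio*(x-1)).
 The estimation step can be sketched as below; the names f_ratio and
 infer_missing_f are hypothetical and the sketch is not taken from the
 module's source.

```perl
use strict;
use warnings;

# Ratio (1/F - 1)/(x - 1) for a class of degeneracy x with known F.
sub f_ratio {
    my ($f, $x) = @_;
    return (1 / $f - 1) / ($x - 1);
}

# Infer the missing F of degeneracy class $x from the known classes.
# %known maps degeneracy => F.
sub infer_missing_f {
    my ($x, %known) = @_;
    my @ratios = map { f_ratio($known{$_}, $_) } keys %known;
    return unless @ratios;
    my $mean = 0;
    $mean += $_ for @ratios;
    $mean /= @ratios;
    # Solve mean = (1/Fx - 1)/(x - 1) for Fx.
    return 1 / (1 + $mean * ($x - 1));
}

printf "%.3f\n", infer_missing_f(3, 2 => 0.75, 4 => 0.5);
```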

encp

 Title   : encp
 Usage   : $encpValue = $self->encp($seq,[$minTotal,[$A,$T,$C,$G]]);
 Function: calculate ENC for the sequence using the updated method
 by Novembre I<2002, MBE>, which corrects for the background nucleotide
 composition.
 Returns : a number, or undef if failed
 Args    : for sequence see L</"sequence input">.
 
 Optional argument I<minTotal> specifies the minimal count
 for an amino acid; if the observed count is smaller than this, the
 amino acid's F will not be calculated but inferred. Default is 5.

 Another optional argument gives the background nucleotide composition
 as an array in the order of A,T,C,G. If not provided, the default one
 given when calling the method L</new> is used. If that is also
 unavailable, an error occurs.

encp_r

 Title   : encp_r
 Usage   : $encpValue = $self->encp_r($seq,[$minTotal,[$A,$T,$C,$G]]);
 Function: similar to the method L</encp>, except that missing F values
 are estimated in a different way.
 Returns : a number, or undef if failed
 Args    : for sequence see L</"sequence input">.
 
 Optional argument I<minTotal> specifies the minimal count
 for an amino acid; if the observed count is smaller than this, the
 amino acid's F will not be calculated but inferred. Default is 5.

 Another optional argument gives the background nucleotide composition
 as an array in the order of A,T,C,G. If not provided, the default one
 given when calling the method L</new> is used. If that is also
 unavailable, an error occurs.

 Note: for a missing Fx of degeneracy class 'x', we first estimate the
 ratio (1/Fx-1)/(x-1) by averaging the ratios of the other degeneracy
 classes with known F values. Fx is then obtained by solving this
 equation for Fx.

estimate_base_composition

 Title   : estimate_base_composition
 Usage   : @baseComp = $self->estimate_base_composition($seq,[$pos])
 Function: estimate base compositions in the sequence
 Returns : an array of numbers in the order of A,T,C,G, or a
 reference to it in scalar context
 Args    : a sequence string or a reference to a hash containing codons
 and their counts (e.g., AGG => 30), and optionally an integer; the integer
 specifies which codon position's nucleotides will be used instead of
 all three codon positions.
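
 A rough sketch of this kind of estimate from a sequence string. The
 name base_composition_sketch is hypothetical, and returning fractions
 (rather than counts) is an assumption of the sketch, not a statement
 about what the method returns.

```perl
use strict;
use warnings;

# Base fractions in the order A, T, C, G, over all codon positions or,
# if $pos is given (1, 2 or 3), over that codon position only.
sub base_composition_sketch {
    my ($seq, $pos) = @_;
    my %n = map { $_ => 0 } qw(A T C G);
    for my $codon (uc($seq) =~ /(...)/g) {
        my @b = split //, $codon;
        @b = ($b[$pos - 1]) if $pos;
        for my $base (@b) {
            $n{$base}++ if exists $n{$base};    # skip ambiguous bases
        }
    }
    my $total = 0;
    $total += $_ for values %n;
    return unless $total;
    my @comp = map { $n{$_} / $total } qw(A T C G);
    return wantarray ? @comp : \@comp;
}

my @all   = base_composition_sketch('ATGGCC');
my $third = base_composition_sketch('ATGGCC', 3);
printf "all: %s\n3rd: %s\n", join(',', @all), join(',', @$third);
```

 With fractions in this order, the GC fraction is the sum of the last
 two elements.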

gc_fraction

 Title   : gc_fraction
 Usage   : $frac = $self->gc_fraction($seq,[$pos])
 Function: get fraction of GC content in the sequence
 Returns : a floating-point number between 0 and 1.
 Args    : a sequence string or a reference to a hash containing codons
 and their counts (e.g., AGG => 30), and optionally an integer; the integer
 specifies which codon position's nucleotides will be used for the
 calculation (i.e., 1, 2, or 3), instead of all three positions.

expect_codon_freq

 Title   : expect_codon_freq
 Usage   : $codonFreq = $self->expect_codon_freq($base_composition)
 Function: return the expected frequency of codons
 Returns : reference to a hash in which codon is hash key, and
 fraction is hash value
 Args    : reference to an array of base compositions in the order of
 [A, T, C, G], represented as either counts or fractions
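
 One simple way to derive expected codon frequencies from a base
 composition is to multiply the frequencies of the three bases, which
 assumes the positions are independent. The sketch below does this for
 all 64 codons; the name expected_codon_freq_sketch is hypothetical,
 and the method itself may treat stop codons or normalization
 differently.

```perl
use strict;
use warnings;

# Expected frequency of each codon as the product of its base frequencies.
# $comp is [A, T, C, G], given as counts or fractions.
sub expected_codon_freq_sketch {
    my ($comp) = @_;
    my @bases = qw(A T C G);
    my $total = 0;
    $total += $_ for @$comp;
    die "base composition sums to zero\n" unless $total > 0;
    my %p;
    @p{@bases} = map { $_ / $total } @$comp;    # normalize to fractions
    my %freq;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $freq{"$b1$b2$b3"} = $p{$b1} * $p{$b2} * $p{$b3};
            }
        }
    }
    return \%freq;
}

my $freq = expected_codon_freq_sketch([3, 3, 2, 2]);
printf "AAA: %.3f  GCG: %.3f\n", $freq->{AAA}, $freq->{GCG};
```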

AUTHOR

Zhenguo Zhang, <zhangz.sci at gmail.com>

BUGS

Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Bio::CUA::CUB::Calculator

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2015 Zhenguo Zhang.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.