Bio::CUA::CUB::Calculator -- A module to calculate codon usage bias (CUB) indices for protein-coding sequences
    use Bio::CUA::CUB::Calculator;
    use Bio::CUA::SeqIO;

    my $calc = Bio::CUA::CUB::Calculator->new(
        -codon_table => 1,
        -tAI_values  => 'tai.out',  # from Bio::CUA::CUB::Builder
    );

    # calculate tAI for each sequence
    my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa");
    # or, with an explicit format:
    # my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa", -format => 'fasta');
    while (my $seq = $io->next_seq) {
        my $tai = $calc->tai($seq);
        printf("%10s: %.7f\n", $seq->id, $tai);
    }
Codon usage bias (CUB) can be represented at two levels: codon and sequence. Sequence-level values are often computed as the geometric mean of the codon-level values of the sequence's codons. This module calculates CUB metrics at the sequence level.
Supported CUB metrics include CAI (codon adaptation index), tAI (tRNA adaptation index), Fop (Frequency of optimal codons), ENC (Effective Number of Codons) and their variants. See the methods below for details.
 Title   : new
 Usage   : my $calc = Bio::CUA::CUB::Calculator->new(@args);
 Function: initialize the calculator
 Returns : an object of this class
 Args    : a hash with the following acceptable keys:

B<Mandatory options>:
-codon_table
The genetic code table used in subsequent sequence analyses. It can be specified as an integer (a genetic code table ID), an object of L<Bio::CUA::CodonTable>, or a map file. See the method L<Bio::CUA::Summarizer/new> for details.
B<options needed by FOP method>
-optimal_codons
A file containing all the optimal codons, one codon per line, or a hashref whose keys are the optimal codons.
B<options needed by CAI method>
-CAI_values
A file containing the CAI value of each codon, excluding the 3 stop codons (hence 61 lines), with each line holding a codon and its value separated by a space or tab; or a hashref with each key being a codon and each value being the CAI value for that codon.
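The value files described above map directly onto the equivalent hashref form. The sketch below is illustrative only: C<read_codon_values> is a hypothetical helper, not part of Bio::CUA, and it assumes one whitespace-separated codon/value pair per line with blank lines skipped; the module's own parser may be stricter (for example about the 61-codon count).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::CUA): read "CODON value" lines,
# separated by spaces or tabs, into a hashref of codon => value.
sub read_codon_values {
    my ($fh) = @_;
    my %value;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;           # skip blank lines
        my ($codon, $v) = split ' ', $line;  # split on any whitespace
        $value{ uc $codon } = $v;
    }
    return \%value;
}

# Usage: read from an in-memory "file" holding two example lines
my $text = "GCC\t1\nGCT 0.25\n";
open my $fh, '<', \$text or die "cannot open: $!";
my $w = read_codon_values($fh);
close $fh;
printf "%s => %s\n", $_, $w->{$_} for sort keys %$w;
```

The resulting hashref has the same shape as the hashref alternative accepted by C<-CAI_values> and C<-tAI_values>.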
B<options needed by tAI method>
-tAI_values
Similar to C<-CAI_values>: a file or a hashref containing the tAI value of each codon.
B<options needed by ENC method>
-base_background
Optional. An arrayref containing the frequencies of the 4 bases (in the order A, T, C, G) derived from background data such as introns, or one of the values 'seq' and 'seq3', which estimate the base frequencies from each analyzed sequence as a whole or from its 3rd codon positions, respectively. The background can also be specified for each analyzed sequence with the methods L</encp> and L</encp_r>.
All the following methods accept sequence input in one of these formats:
a nucleotide sequence string of length 3N;
a sequence object with a method I<seq> that returns the sequence string;
a sequence file in FASTA format;
a reference to a hash of codon counts, e.g., $codons = { AGC => 50, GTC => 124, ... }.
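To make the last input format concrete, here is a minimal sketch that turns a nucleotide string into a codon-count hashref. The helper C<count_codons> is hypothetical; it upper-cases the input, strips whitespace and silently ignores a trailing partial codon, which is an assumption (the module expects a length of 3N).

```perl
use strict;
use warnings;

# Hypothetical helper: nucleotide string -> { codon => count }.
# A trailing partial codon is ignored (assumption).
sub count_codons {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s+//g;    # drop whitespace and newlines
    my %counts;
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        $counts{ substr($seq, $i, 3) }++;
    }
    return \%counts;
}

my $codons = count_codons("atg gcc ATG");
print "$_ => $codons->{$_}\n" for sort keys %$codons;
```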
 Title   : cai
 Usage   : $caiValue = $self->cai($seq);
 Function: calculate the CAI value for the sequence
 Returns : a number, or undef if failed
 Args    : see L</"sequence input">
 Note    : codons without synonymous competitors are excluded from the
           calculation.
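CAI, like tAI, is a sequence-level geometric mean of codon-level values. The sketch below, under the hypothetical name C<cai_sketch>, computes the count-weighted geometric mean and skips codons without synonymous competitors. It hard-codes ATG and TGG as those codons, which holds only for the standard genetic code, and it also skips codons without a positive weight. It is not the module's implementation.

```perl
use strict;
use warnings;

# Count-weighted geometric mean of codon weights:
#   CAI = exp( sum_c n_c * ln(w_c) / sum_c n_c )
# ATG/TGG as the codons without synonyms is a standard-code assumption.
sub cai_sketch {
    my ($counts, $w) = @_;
    my %no_synonym = map { $_ => 1 } qw(ATG TGG);
    my ($log_sum, $n) = (0, 0);
    for my $codon (keys %$counts) {
        next if $no_synonym{$codon};
        my $wc = $w->{$codon};
        next unless defined $wc && $wc > 0;   # log(0) is undefined
        $log_sum += $counts->{$codon} * log($wc);
        $n       += $counts->{$codon};
    }
    return $n ? exp($log_sum / $n) : undef;
}

my %w   = (GCC => 1, GCT => 0.25, ATG => 1);
my $cai = cai_sketch({ ATG => 5, GCC => 2, GCT => 2 }, \%w);
printf "CAI = %.4f\n", $cai;
```

The same loop, without the non-degenerate filter, gives a tAI-style value when the weights are tAI values and codons lacking a value are skipped.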
 Title   : fop
 Usage   : $fopValue = $self->fop($seq[,$withNonDegenerate]);
 Function: calculate the fraction of optimal codons in the sequence
 Returns : a number, or undef if failed
 Args    : for the sequence, see L</"sequence input">. If the optional
           argument $withNonDegenerate is true, non-degenerate codons
           (those without synonymous partners) are included in the
           calculation; by default they are excluded.
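Fop reduces to a ratio of codon counts. In this sketch (hypothetical C<fop_sketch>), stop codons are always skipped and ATG/TGG are treated as the non-degenerate codons. Both choices are assumptions tied to the standard genetic code. The third argument plays the role of C<$withNonDegenerate>.

```perl
use strict;
use warnings;

# Fop = (count of optimal codons) / (count of all considered codons).
# Codon sets below assume the standard genetic code.
sub fop_sketch {
    my ($counts, $optimal, $with_non_degenerate) = @_;
    my %skip = map { $_ => 1 } qw(TAA TAG TGA);          # stop codons
    unless ($with_non_degenerate) { $skip{$_} = 1 for qw(ATG TGG) }
    my ($opt, $total) = (0, 0);
    for my $codon (keys %$counts) {
        next if $skip{$codon};
        $total += $counts->{$codon};
        $opt   += $counts->{$codon} if $optimal->{$codon};
    }
    return $total ? $opt / $total : undef;
}

my %counts  = (ATG => 2, GCC => 3, GCT => 1);
my %optimal = (GCC => 1);
printf "Fop = %.2f (excluding ATG), %.2f (including ATG)\n",
    fop_sketch(\%counts, \%optimal), fop_sketch(\%counts, \%optimal, 1);
```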
 Title   : tai
 Usage   : $taiValue = $self->tai($seq);
 Function: calculate the tAI value for the sequence
 Returns : a number, or undef if failed
 Args    : for the sequence, see L</"sequence input">
 Note    : codons without tAI values in the input sequence are ignored.
 Title   : enc
 Usage   : $encValue = $self->enc($seq,[$minTotal]);
 Function: calculate ENC for the sequence using the original method
           I<Wright, 1990, Gene>
 Returns : a number, or undef if failed
 Args    : for the sequence, see L</"sequence input">. The optional
           argument I<minTotal> specifies the minimal count for an amino
           acid; if the observed count is smaller than this, the amino
           acid's F is not calculated but inferred. Default is 5.
 Note    : when the F of a redundancy group is unavailable due to
           insufficient data, it is estimated from the other groups
           following Wright's method, that is, F3=(F2+F4)/2; for the
           other groups, F=1/r, where r is the degeneracy degree of that
           group.
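The following sketch spells out Wright's procedure for the standard genetic code. It computes F = (n*sum(p_i^2) - 1)/(n - 1) per amino acid, averages F within each degeneracy class, and returns Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. Six-fold amino acids are treated as single families, as in Wright's paper. The input layout (amino acid => arrayref of counts for all its codons) and the names C<wright_F> and C<enc_sketch> are hypothetical. The guard against F = 0 for tiny samples is an added assumption.

```perl
use strict;
use warnings;

# Wright's homozygosity for one amino acid: F = (n*sum(p_i^2) - 1)/(n - 1)
sub wright_F {
    my ($counts) = @_;
    my $n = 0;
    $n += $_ for @$counts;
    return undef if $n < 2;
    my $sum_p2 = 0;
    $sum_p2 += ($_ / $n) ** 2 for @$counts;
    return ($n * $sum_p2 - 1) / ($n - 1);
}

# $families: amino acid => arrayref of counts for ALL its synonymous codons,
# so the array length is the degeneracy class (standard code assumed).
sub enc_sketch {
    my ($families, $min_total) = @_;
    $min_total //= 5;
    my %n_aa = (2 => 9, 3 => 1, 4 => 5, 6 => 3);  # amino acids per class
    my (%f_sum, %f_num);
    for my $counts (values %$families) {
        my $k = @$counts;
        next unless $n_aa{$k};               # Met/Trp contribute the constant 2
        my $n = 0;
        $n += $_ for @$counts;
        next if $n < $min_total;             # too few codons: infer later
        my $F = wright_F($counts);
        next unless defined $F && $F > 0;    # guard against F = 0
        $f_sum{$k} += $F;
        $f_num{$k}++;
    }
    my %F = map { $_ => $f_sum{$_} / $f_num{$_} } keys %f_num;
    # Missing F3 = (F2 + F4)/2 when both are known ...
    $F{3} //= ($F{2} + $F{4}) / 2 if defined $F{2} && defined $F{4};
    # ... and any other missing class falls back to F = 1/r.
    $F{$_} //= 1 / $_ for keys %n_aa;
    my $enc = 2;
    $enc += $n_aa{$_} / $F{$_} for keys %n_aa;
    return $enc;
}

# Every amino acid uses a single codon (maximal bias)
my %biased = (
    (map {; "two$_"  => [10, 0] } 1 .. 9),
    Ile => [10, 0, 0],
    (map {; "four$_" => [10, 0, 0, 0] } 1 .. 5),
    (map {; "six$_"  => [10, 0, 0, 0, 0, 0] } 1 .. 3),
);
printf "Nc (fully biased) = %.1f\n", enc_sketch(\%biased);
printf "Nc (no data)      = %.1f\n", enc_sketch({});
```

A fully biased input gives Nc = 20, and an input with no usable data falls back to F = 1/r everywhere, giving Nc = 61.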
 Title   : enc_r
 Usage   : $encValue = $self->enc_r($seq,[$minTotal]);
 Function: similar to the method L</enc>, except that missing F values
           are estimated in a different way
 Returns : a number, or undef if failed
 Args    : for the sequence, see L</"sequence input">. The optional
           argument I<minTotal> specifies the minimal count for an amino
           acid; if the observed count is smaller than this, the amino
           acid's F is not calculated but inferred. Default is 5.
 Note    : for a missing Fx of degeneracy class 'x', the ratio
           (1/Fx-1)/(x-1) is first estimated by averaging the same ratio
           over the other degeneracy classes with known F values; Fx is
           then obtained by solving this equation, i.e.,
           Fx = 1/(1 + ratio*(x-1)).
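The alternative estimate used by C<enc_r> (and C<encp_r>) fits in a small function. C<estimate_missing_F> is a hypothetical name. Returning 1/x when no class is known is an added assumption. It matches an average ratio of 1, which is what unbiased classes (F = 1/k) produce.

```perl
use strict;
use warnings;

# For a missing class x, average r_k = (1/F_k - 1)/(k - 1) over the known
# classes k (each k > 1), then solve (1/Fx - 1)/(x - 1) = r for Fx.
sub estimate_missing_F {
    my ($known, $x) = @_;    # $known: class => average F
    my @ratios = map { (1 / $known->{$_} - 1) / ($_ - 1) } keys %$known;
    return 1 / $x unless @ratios;    # assumption: no information, no bias
    my $r = 0;
    $r += $_ for @ratios;
    $r /= @ratios;
    return 1 / (1 + $r * ($x - 1));
}

printf "F3 = %.4f\n", estimate_missing_F({ 2 => 0.5, 4 => 0.25 }, 3);
```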
 Title   : encp
 Usage   : $encpValue = $self->encp($seq,[$minTotal,[$A,$T,$C,$G]]);
 Function: calculate ENC for the sequence using the updated method of
           Novembre I<2002, MBE>, which corrects for background
           nucleotide composition
 Returns : a number, or undef if failed
 Args    : for the sequence, see L</"sequence input">. The optional
           argument I<minTotal> specifies the minimal count for an amino
           acid; if the observed count is smaller than this, the amino
           acid's F is not calculated but inferred. Default is 5.
           Another optional argument gives the background nucleotide
           composition as an array in the order A, T, C, G; if it is not
           provided, the default given to the method L</new> is used. If
           that is also unavailable, an error occurs.
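Novembre's correction replaces Wright's homozygosity with a chi-square statistic against expected codon frequencies derived from the background composition. This sketch (hypothetical C<novembre_F>) computes the per-amino-acid F as commonly stated for Novembre (2002), F = (X^2 + n - k)/(k(n - 1)). With uniform expectations it reduces algebraically to Wright's F. Averaging within degeneracy classes and summing to Nc are not shown, and the module's exact weighting may differ.

```perl
use strict;
use warnings;

# Per-amino-acid F with expected codon frequencies e_i (renormalised
# within the family; every e_i must be > 0):
#   X^2 = sum_i n (p_i - e_i)^2 / e_i,   F = (X^2 + n - k) / (k (n - 1))
sub novembre_F {
    my ($counts, $expected) = @_;    # parallel arrayrefs, one entry per codon
    my $k = @$counts;
    my ($n, $esum) = (0, 0);
    $n    += $_ for @$counts;
    $esum += $_ for @$expected;
    return undef if $n < 2;
    my $chi2 = 0;
    for my $i (0 .. $k - 1) {
        my $e = $expected->[$i] / $esum;
        my $p = $counts->[$i] / $n;
        $chi2 += $n * ($p - $e) ** 2 / $e;
    }
    return ($chi2 + $n - $k) / ($k * ($n - 1));
}

printf "uniform background:  %.4f\n", novembre_F([3, 1], [1, 1]);
printf "matching background: %.4f\n", novembre_F([3, 1], [3, 1]);
```

With a uniform background the result equals Wright's F for the same counts (0.5 here). When the observed usage exactly matches the background, X^2 is 0 and F drops to (n - k)/(k(n - 1)).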
 Title   : encp_r
 Usage   : $encpValue = $self->encp_r($seq,[$minTotal,[$A,$T,$C,$G]]);
 Function: similar to the method L</encp>, except that missing F values
           are estimated in a different way
 Returns : a number, or undef if failed
 Args    : for the sequence, see L</"sequence input">. The optional
           argument I<minTotal> specifies the minimal count for an amino
           acid; if the observed count is smaller than this, the amino
           acid's F is not calculated but inferred. Default is 5.
           Another optional argument gives the background nucleotide
           composition as an array in the order A, T, C, G; if it is not
           provided, the default given to the method L</new> is used. If
           that is also unavailable, an error occurs.
 Note    : for a missing Fx of degeneracy class 'x', the ratio
           (1/Fx-1)/(x-1) is first estimated by averaging the same ratio
           over the other degeneracy classes with known F values; Fx is
           then obtained by solving this equation, i.e.,
           Fx = 1/(1 + ratio*(x-1)).
 Title   : estimate_base_composition
 Usage   : @baseComp = $self->estimate_base_composition($seq,[$pos])
 Function: estimate the base composition of the sequence
 Returns : an array of numbers in the order A, T, C, G, or a reference
           to it in scalar context
 Args    : a sequence string or a reference to a hash of codons and
           their counts (e.g., AGG => 30), and optionally an integer
           specifying which codon position (1, 2, or 3) to use instead
           of all three positions.
 Title   : gc_fraction
 Usage   : $frac = $self->gc_fraction($seq,[$pos])
 Function: get the GC fraction of the sequence
 Returns : a floating-point number between 0 and 1
 Args    : a sequence string or a reference to a hash of codons and
           their counts (e.g., AGG => 30), and optionally an integer
           specifying which codon position (1, 2, or 3) to use for the
           calculation instead of all three positions.
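Both base-composition estimation and the GC fraction come down to counting bases, optionally at a single codon position. The sketch below handles only the sequence-string input, not the codon-count hash. C<base_composition> and this standalone C<gc_fraction> are hypothetical stand-ins for the methods above, and ignoring non-ACGT characters is an assumption.

```perl
use strict;
use warnings;

# Fractions of A, T, C, G in a sequence string, optionally restricted to
# codon position 1, 2 or 3. Non-ACGT characters are ignored (assumption).
sub base_composition {
    my ($seq, $pos) = @_;
    $seq = uc $seq;
    my %count = map { $_ => 0 } qw(A T C G);
    my $start = $pos ? $pos - 1 : 0;
    my $step  = $pos ? 3 : 1;
    for (my $i = $start; $i < length($seq); $i += $step) {
        my $base = substr($seq, $i, 1);
        $count{$base}++ if exists $count{$base};
    }
    my $total = 0;
    $total += $count{$_} for qw(A T C G);
    return map { $total ? $count{$_} / $total : 0 } qw(A T C G);
}

# GC fraction = fraction of C plus fraction of G
sub gc_fraction {
    my (undef, undef, $c, $g) = base_composition(@_);
    return $c + $g;
}

printf "A=%.2f T=%.2f C=%.2f G=%.2f\n", base_composition("ATGCCC");
printf "GC (all) = %.3f, GC3 = %.3f\n",
    gc_fraction("ATGCCC"), gc_fraction("ATGCCC", 3);
```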
 Title   : expect_codon_freq
 Usage   : $codonFreq = $self->expect_codon_freq($base_composition)
 Function: return the expected frequencies of codons
 Returns : a reference to a hash with codons as keys and their expected
           fractions as values
 Args    : a reference to an array of base compositions in the order
           [A, T, C, G], given as either counts or fractions
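The expected codon frequencies can be derived by assuming the three codon positions are independent draws from the base composition. This is a minimal sketch under that assumption. The name C<expected_codon_freq> is hypothetical, and returning all 64 codons (stop codons included) without renormalising is a choice that may differ from the module's method.

```perl
use strict;
use warnings;

# Expected codon frequencies under independent bases:
#   f(XYZ) = fX * fY * fZ, with [A, T, C, G] given as counts or fractions.
# All 64 codons are returned (assumption; the module may differ).
sub expected_codon_freq {
    my ($comp) = @_;
    my @bases = qw(A T C G);
    my $total = 0;
    $total += $_ for @$comp;
    my %f;
    @f{@bases} = map { $_ / $total } @$comp;
    my %freq;
    for my $x (@bases) {
        for my $y (@bases) {
            for my $z (@bases) {
                $freq{"$x$y$z"} = $f{$x} * $f{$y} * $f{$z};
            }
        }
    }
    return \%freq;
}

my $freq = expected_codon_freq([2, 1, 1, 0]);    # A=0.5, T=C=0.25, G=0
printf "AAA=%.4f ATC=%.4f GCC=%.4f\n", @{$freq}{qw(AAA ATC GCC)};
```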
Zhenguo Zhang, <zhangz.sci at gmail.com>
Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Bio::CUA::CUB::Calculator
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Bio-CUA
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Bio-CUA
CPAN Ratings
http://cpanratings.perl.org/d/Bio-CUA
Search CPAN
http://search.cpan.org/dist/Bio-CUA/
Copyright 2015 Zhenguo Zhang.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.