NAME

cub_seq.pl - a program to calculate sequence codon usage bias metrics and other sequence parameters.

VERSION

VERSION: 0.12

SYNOPSIS

This program computes CUB metrics for each sequence; the types of computed CUB metrics depend on the provided options (see below).

In addition to CUB metrics, the program also computes other sequence features, such as amino acid counts and the GC content of the whole sequence and of the 3rd codon positions.
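
To illustrate what the two GC-content features mean, here is a minimal shell sketch that computes them for a single coding sequence. The function name gc_stats is hypothetical and not part of cub_seq.pl; the sketch assumes the sequence starts in frame, counts every position (including any stop codon), and treats only G and C as GC bases. The program itself may handle these details differently.

```shell
# gc_stats SEQ: print "<GC fraction><TAB><GC fraction at 3rd codon positions>"
# Hypothetical helper for illustration only; not part of cub_seq.pl.
gc_stats() {
  printf '%s\n' "$1" | awk '{
    seq = toupper($0)
    n = length(seq)
    gc = 0; gc3 = 0; n3 = 0
    for (i = 1; i <= n; i++) {
      b = substr(seq, i, 1)
      is_gc = (b == "G" || b == "C")
      if (is_gc) gc++
      # positions 3, 6, 9, ... are the 3rd positions of in-frame codons
      if (i % 3 == 0) { n3++; if (is_gc) gc3++ }
    }
    # sequences shorter than one codon have no 3rd position
    if (n3 == 0) { print "NA\tNA"; next }
    printf "%.3f\t%.3f\n", gc / n, gc3 / n3
  }'
}
```

For example, gc_stats ATGGCCTAA prints 0.444 and 0.667: four of the nine bases are G or C, and two of the three 3rd codon positions (G, C, A) are.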

  # compute ENC, ENC_r, CAI, and tAI for each sequence in file cds.fa
  cub_seq.pl --cai CAI_codon.top_200 --tai tAI_codon \
  --enc enc,enc_r --seq cds.fa -o CUB_seq.tsv

  # the same as above, but without outputting GC content, AA counts, or
  # protein lengths
  cub_seq.pl --cai CAI_codon.top_200 --tai tAI_codon \
  --enc enc,enc_r --seq cds.fa -o CUB_seq.tsv --lite

OPTIONS

Mandatory options

-s/--seq-file

file containing sequences in FASTA format; CUB metrics are computed for each sequence in this file.

Auxiliary options

-g/--gc-id

ID of the genetic code table used to identify the amino acid encoded by each codon. Default is 1, i.e., the standard code. See NCBI Genetic Code for valid IDs.

-t/--tai-param

file containing the tAI value of each codon in the format 'codon<tab>tAI_value', which can be produced by tai_codon.pl. If not given, tAI values will not be computed.

-c/--cai-param

similar to --tai-param, except that the file provides CAI values, in the same format. This file can be produced by cai_codon.pl. If not given, CAI values will not be computed.
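
As a quick sanity check before running the program, the 'codon<tab>value' layout described for --tai-param and --cai-param can be verified with a few lines of awk. This is only a sketch: the function name check_codon_table is hypothetical, and it assumes the file has no header or comment lines and that each codon is written with three A/C/G/T/U letters. Whether cub_seq.pl accepts anything beyond that is not stated here.

```shell
# check_codon_table FILE: succeed only if every line is "codon<TAB>number"
# Hypothetical helper for illustration only; not part of cub_seq.pl.
check_codon_table() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NF != 2 ||
    $1 !~ /^[ACGTUacgtu][ACGTUacgtu][ACGTUacgtu]$/ ||
    $2 !~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ {
      # report the offending line and remember that the check failed
      printf "line %d: unexpected format: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

The function returns a non-zero exit status and lists the offending lines when a line does not match the assumed layout, so it can be chained with && before the cub_seq.pl call.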

-f/--fop-param

a file containing pre-defined optimal codons, one codon per line. Optimal codons can be selected in different ways, such as choosing codons with high tAI values or codons preferred in highly expressed genes.

-e/--enc-methods

methods for ENC calculation. Available values are enc, enc_r, encp, and encp_r. The encp* versions correct for background GC content in the calculation. The *_r versions use a new method to estimate missing F values. See the module Bio::CUA::CUB::Calculator for details of these methods. Default is enc. Multiple methods can be specified as a comma-separated string, such as 'enc,encp,enc_r'.

-b/--base-comp

This option is needed when computing the encp* versions of ENC. The given base compositions serve as the background for calculating the expected codon frequencies in the sequences, which helps exclude the effect of mutational bias on codon usage. This option has no effect unless an encp* method is specified in --enc-methods.

Acceptable values are either a file or four comma-separated numbers. When a file is provided, it is assumed to give sequence-specific background base compositions in the following format:

        seq_id1 #A      #T      #C      #G
        seq_id2 #A      #T      #C      #G
        ...   ...

where #A/#T/#C/#G are counts or fractions of each base type in background data (e.g., introns) for each sequence. For sequences without background base composition information, 'NA' will be returned from encp* methods.

When numbers are provided, they should look like

        0.2,0.3,0.3,0.2

giving the frequencies of A, T, C, and G, in that order.
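
One way to prepare the file form of this option is to count bases in a FASTA file of background sequences (e.g., introns). The following is a minimal sketch, not part of the distribution: the function name base_comp is hypothetical, and it assumes that columns are tab-separated, that the first word of each FASTA header is the sequence ID, and that these IDs match the IDs in the file given to --seq-file.

```shell
# base_comp FASTA: print "id<TAB>#A<TAB>#T<TAB>#C<TAB>#G" for each record
# Hypothetical helper for illustration only; not part of cub_seq.pl.
base_comp() {
  awk '
    # print the counts collected for the current record, if any
    function emit() {
      if (id != "")
        printf "%s\t%d\t%d\t%d\t%d\n", id, n["A"], n["T"], n["C"], n["G"]
    }
    /^>/ {
      emit()
      id = substr($1, 2)            # header word without the leading ">"
      n["A"] = n["T"] = n["C"] = n["G"] = 0
      next
    }
    {
      seq = toupper($0)
      for (i = 1; i <= length(seq); i++) n[substr(seq, i, 1)]++
    }
    END { emit() }
  ' "$1"
}
```

Redirecting the output to a file gives one line of A/T/C/G counts per background sequence, in the column order shown above; characters other than A, T, C, and G (such as N) are ignored in the output.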

-l/--lite

A switch option. By default, the program outputs amino acid counts, GC content, and protein lengths. If this option is set, these parameters are not output.

-o/--out-file

the file in which to store the results. Default is standard output, usually the screen.

-h/--help

show the brief help message.

AUTHOR

Zhenguo Zhang, <zhangz.sci at gmail.com>

BUGS

Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this class with the perldoc command.

        perldoc Bio::CUA

ACKNOWLEDGEMENTS

UPDATES

0.12 - Thu Jun 4 11:31:03 EDT 2015

           1. update documentation

0.11 - Thu May 21 16:00:28 EDT 2015

           1. modify/add options --base-comp and --lite.

LICENSE AND COPYRIGHT

Copyright 2015 Zhenguo Zhang.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.