NAME

Bio::Tools::Run::Alignment::TCoffee - Object for the calculation of a multiple sequence alignment from a set of unaligned sequences or alignments using the TCoffee program

SYNOPSIS

  # Build a tcoffee alignment factory
  @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
  $factory = new Bio::Tools::Run::Alignment::TCoffee (@params);

  # Pass the factory a list of sequences to be aligned.
  $inputfilename = 't/cysprot.fa';
  # $aln is a SimpleAlign object.
  $aln = $factory->align($inputfilename);

  # or where @seq_array is an array of Bio::Seq objects
  $seq_array_ref = \@seq_array;
  $aln = $factory->align($seq_array_ref);

  # Or one can pass the factory a pair of (sub)alignments
  #to be aligned against each other, e.g.:

  # where $aln1 and $aln2 are Bio::SimpleAlign objects.
  $aln = $factory->profile_align($aln1,$aln2);

  # Or one can pass the factory an alignment and one or more
  # unaligned sequences to be added to the alignment. For example:

  # $seq is a Bio::Seq object.
  $aln = $factory->profile_align($aln1,$seq);

There are various additional options and input formats available. See the DESCRIPTION section that follows for additional details.

DESCRIPTION

Note: this DESCRIPTION only documents the (Bio)perl interface to TCoffee.

There are a number of possible parameters one can pass in TCoffee. One should really read the online manual for the best explaination of all the features. See http://igs-server.cnrs-mrs.fr/~cnotred/Documentation/t_coffee/t_coffee_doc.html

These can be specified as parameters when instantiating a new TCoffee object, or through get/set methods of the same name (lowercase).

PARAMETERS FOR ALIGNMENT COMPUTATION

IN

 Title       : IN
 Description : (optional) input filename, this is specified when
               align so should not use this directly unless one
               understand TCoffee program very well.

TYPE

 Title       : TYPE
 Args        : [string] DNA, PROTEIN
 Description : (optional) set the sequence type, guessed automatically
               so should not use this directly

PARAMETERS

 Title       : PARAMETERS
 Description : (optional) Indicates a file containing extra parameters

EXTEND

 Title       : EXTEND
 Args        : 0, 1, or positive value
 Default     : 1
 Description : Flag indicating that library extension should be
               carried out when performing multiple alignments, if set
               to 0 then extension is not made, if set to 1 extension
               is made on all pairs in the library.  If extension is
               set to another positive value, the extension is only
               carried out on pairs having a weigth value superior to
               the specified limit.

DP_NORMALISE

 Title       : DP_NORMALISE
 Args        : 0 or positive value
 Default     : 1000
 Description : When using a value different from 0, this flag sets the
               score of the highest scoring pair to 1000.

DP_MODE

 Title       : DP_MODE
 Args        : [string] gotoh_pair_wise, myers_miller_pair_wise,
               fasta_pair_wise cfasta_pair_wise
 Default     : cfast_fair_wise
 Description : Indicates the type of dynamic programming used by
               the program

    gotoh_pair_wise : implementation of the gotoh algorithm
    (quadratic in memory and time)

    myers_miller_pair_wise : implementation of the Myers and Miller
    dynamic programming algorithm ( quadratic in time and linear in
    space). This algorithm is recommended for very long sequences. It
    is about 2 time slower than gotoh. It only accepts tg_mode=1.

    fasta_pair_wise: implementation of the fasta algorithm. The
    sequence is hashed, looking for ktuples words. Dynamic programming
    is only carried out on the ndiag best scoring diagonals. This is
    much faster but less accurate than the two previous.

    cfasta_pair_wise : c stands for checked. It is the same
    algorithm. The dynamic programming is made on the ndiag best
    diagonals, and then on the 2*ndiags, and so on until the scores
    converge. Complexity will depend on the level of divergence of the
    sequences, but will usually be L*log(L), with an accuracy
    comparable to the two first mode ( this was checked on BaliBase).

KTUPLE

 Title       : KTUPLE
 Args        : numeric value
 Default     : 1 or 2 (1 for protein, 2 for DNA )

 Description : Indicates the ktuple size for cfasta_pair_wise dp_mode
               and fasta_pair_wise. It is set to 1 for proteins, and 2
               for DNA. The alphabet used for protein is not the 20
               letter code, but a mildly degenerated version, where
               some residues are grouped under one letter, based on
               physicochemical properties:
               rk, de, qh, vilm, fy (the other residues are
               not degenerated).

NDIAGS

 Title       : NDIAGS
 Args        : numeric value
 Default     : 0
 Description : Indicates the number of diagonals used by the
               fasta_pair_wise algorithm. When set to 0,
               n_diag=Log (length of the smallest sequence)

DIAG_MODE

 Title       : DIAG_MODE
 Args        : numeric value
 Default     : 0


 Description : Indicates the manner in which diagonals are scored
              during the fasta hashing.

              0 indicates that the score of a diagonal is equal to the
              sum of the scores of the exact matches it contains.


              1 indicates that this score is set equal to the score of
              the best uninterrupted segment

              1 can be useful when dealing with fragments of sequences.

SIM_MATRIX

 Title       : SIM_MATRIX
 Args        : string
 Default     : vasiliky
 Description : Indicates the manner in which the amino acid is being
               degenerated when hashing. All the substitution matrix
               are acceptable. Categories will be defined as sub-group
               of residues all having a positive substitution score
               (they can overlap).

               If you wish to keep the non degenerated amino acid
               alphabet, use 'idmat'

MATRIX

 Title       : MATRIX
 Args        :
 Default     :
 Description : This flag is provided for compatibility with
               ClustalW. Setting matrix = 'blosum' is equivalent to
               -in=Xblosum62mt , -matrix=pam is equivalent to
               in=Xpam250mt . Apart from this, the rules are similar
               to those applying when declaring a matrix with the
               -in=X fl

GAPOPEN

 Title       : GAPOPEN
 Args        : numeric
 Default     : 0
 Description : Indicates the penalty applied for opening a gap. The
               penalty must be negative. If you provide a positive
               value, it will automatically be turned into a negative
               number. We recommend a value of 10 with pam matrices,
               and a value of 0 when a library is used.

GAPEXT

 Title       : GAPEXT
 Args        : numeric
 Default     : 0
 Description : Indicates the penalty applied for extending a gap.

COSMETIC_PENALTY

 Title       : COSMETIC_PENALTY
 Args        : numeric
 Default     : 100
 Description : Indicates the penalty applied for opening a gap. This
               penalty is set to a very low value. It will only have
               an influence on the portions of the alignment that are
               unalignable. It will not make them more correct, but
               only more pleasing to the eye ( i.e. Avoid stretches of
               lonely residues).

               The cosmetic penalty is automatically turned off if a
               substitution matrix is used rather than a library.

TG_MODE

 Title       : TG_MODE
 Args        : 0,1,2
 Default     : 1
 Description : (Terminal Gaps)
               0: indicates that terminal gaps must be panelized with
                  a gapopen and a gapext penalty.
               1: indicates that terminal gaps must be penalized only
                  with a gapext penalty
               2: indicates that terminal gaps must not be penalized.

WEIGHT

 Title       : WEIGHT
 Args        : sim or sim_<matrix_name or matrix_file> or integer value
 Default     : sim


 Description : Weight defines the way alignments are weighted when
               turned into a library.

               sim indicates that the weight equals the average
                   identity within the match residues.

               sim_matrix_name indicates the average identity with two
                   residues regarded as identical when their
                   substitution value is positive. The valid matrices
                   names are in matrices.h (pam250mt) . Matrices not
                   found in this header are considered to be
                   filenames. See the format section for matrices. For
                   instance, -weight=sim_pam250mt indicates that the
                   grouping used for similarity will be the set of
                   classes with positive substitutions. Other groups
                   include

                       sim_clustalw_col ( categories of clustalw
                       marked with :)

                       sim_clustalw_dot ( categories of clustalw
                       marked with .)


               Value indicates that all the pairs found in the
               alignments must be given the same weight equal to
               value. This is useful when the alignment one wishes to
               turn into a library must be given a pre-specified score
               (for instance if they come from a structure
               super-imposition program). Value is an integer:

                       -weight=1000

  Note       : Weight only affects methods that return an alignment to
               T-Coffee, such as ClustalW. On the contrary, the
               version of Lalign we use here returns a library where
               weights have already been applied and are therefore
               insensitive to the -weight flag.

SEQ_TO_ALIGN

 Title       : SEQ_TO_ALIGN
 Args        : filename
 Default     : no file - align all the sequences

 Description : You may not wish to align all the sequences brought in
               by the -in flag. Supplying the seq_to_align flag allows
               for this, the file is simply a list of names in Fasta
               format.

               However, note that library extension will be carried out
               on all the sequences.

PARAMETERS FOR TREE COMPUTATION AND OUTPUT

NEWTREE

 Title       : NEWTREE
 Args        : treefile
 Default     : no file
 Description : Indicates the name of the new tree to compute. The
               default will be <sequence_name>.dnd, or <run_name.dnd>.
               Format is Phylips tree format

USETREE

 Title       : USETREE
 Args        : treefile
 Default     : no file specified
 Description : This flag indicates that rather than computing a new
               dendrogram, t_coffee can use a pre-computed one. The
               tree files are in phylips format and compatible with
               ClustalW. In most cases, using a pre-computed tree will
               halve the computation time required by t_coffee. It is
               also possible to use trees output by ClustalW or
               Phylips. Format is Phylips tree format

TREE_MODE

 Title       : TREE_MODE
 Args        : slow, fast, very_fast
 Default     : very_fast
 Description : This flag indicates the method used for computing the
               dendrogram.
               slow : the chosen dp_mode using the extended library,
               fast : The fasta dp_mode using the extended library.
               very_fast: The fasta dp_mode using pam250mt.

QUICKTREE

 Title       : QUICKTREE
 Args        :
 Default     :
 Description : This flag is kept for compatibility with ClustalW.
               It indicates that:  -tree_mode=very_fast

PARAMETERS FOR ALIGNMENT OUTPUT

OUTFILE

 Title       : OUTFILE
 Args        : out_aln file, default, no
 Default     : default ( yourseqfile.aln)
 Description : indicates name of output alignment file

OUTPUT

 Title       : OUTPUT
 Args        : format1, format2
 Default     : clustalw
 Description : Indicated format for outputting outputfile
               Supported formats are:

               clustalw_aln, clustalw: ClustalW format.
               gcg, msf_aln : Msf alignment.
               pir_aln : pir alignment.
               fasta_aln : fasta alignment.
               phylip : Phylip format.
               pir_seq : pir sequences (no gap).
               fasta_seq : fasta sequences (no gap).
    As well as:
                score_html : causes the output to be a reliability
                             plot in HTML
                score_pdf : idem in PDF.
                score_ps : idem in postscript.

    More than one format can be indicated:
                -output=clustalw,gcg, score_html

CASE

 Title       : CASE
 Args        : upper, lower
 Default     : upper
 Description : triggers choice of the case for output

CPU

 Title       : CPU
 Args        : value
 Default     : 0
 Description : Indicates the cpu time (micro seconds) that must be
               added to the t_coffee computation time.

OUT_LIB

 Title       : OUT_LIB
 Args        : name of library, default, no
 Default     : default
 Description : Sets the name of the library output. Default implies
               <run_name>.tc_lib

OUTORDER

 Title       : OUTORDER
 Args        : input or aligned
 Default     : input
 Description : Sets the name of the library output. Default implies
               <run_name>.tc_lib

SEQNOS

 Title       : SEQNOS
 Args        : on or off
 Default     : off
 Description : Causes the output alignment to contain residue numbers
               at the end of each line:

PARAMETERS FOR GENERIC OUTPUT

RUN_NAME

 Title       : RUN_NAME
 Args        : your run name
 Default     :
 Description : This flag causes the prefix <your sequences> to be
               replaced by <your run name> when renaming the default
               files.

ALIGN

 Title       : ALIGN
 Args        :
 Default     :
 Description : Indicates that the program must produce the
               alignment. This flag is here for compatibility with
               ClustalW

QUIET

 Title       : QUIET
 Args        : stderr, stdout, or filename, or nothing
 Default     : stderr
 Description : Redirects the standard output to either a file.
              -quiet on its own redirect the output to /dev/null.

CONVERT

 Title       : CONVERT
 Args        :
 Default     :
 Description : Indicates that the program must not compute the
               alignment but simply convert all the sequences,
               alignments and libraries into the format indicated with
               -output. This flag can also be used if you simply want
               to compute a library ( i.e. You have an alignment and
               you want to turn it into a library).

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org          - General discussion
  http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via email or the web:

  bioperl-bugs@bio.perl.org
  http://bio.perl.org/bioperl-bugs/

AUTHOR - Jason Stajich, Peter Schattner

Email jason@chg.mc.duke.edu, schattner@alum.mit.edu

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

exists_tcoffee

 Title   : exists_tcoffee
 Usage   : $coffeefound = Bio::Tools::Run::Alignment::TCoffee->exists_tcoffee()
 Function: Determine whether tcoffee program can be found on current host
 Example :
 Returns : 1 if tcoffee program found at expected location, 0 otherwise.
 Args    :  none

version

 Title   : version
 Usage   : exit if $prog->version() < 1.8
 Function: Determine the version number of the program
 Example :
 Returns : float or undef
 Args    : none

align

 Title   : align
 Usage   :
        $inputfilename = 't/cysprot.fa';
        $aln = $factory->align($inputfilename);
or
        $seq_array_ref = \@seq_array; @seq_array is array of Seq objs
        $aln = $factory->align($seq_array_ref);
 Function: Perform a multiple sequence alignment
 Example :
 Returns : Reference to a SimpleAlign object containing the
           sequence alignment.
 Args    : Name of a file containing a set of unaligned fasta sequences
           or else an array of references to Bio::Seq objects.

 Throws an exception if argument is not either a string (eg a
 filename) or a reference to an array of Bio::Seq objects.  If
 argument is string, throws exception if file corresponding to string
 name can not be found. If argument is Bio::Seq array, throws
 exception if less than two sequence objects are in array.

profile_align

 Title   : profile_align
 Usage   :
 Function: Perform an alignment of 2 (sub)alignments
 Example :
 Returns : Reference to a SimpleAlign object containing the (super)alignment.
 Args    : Names of 2 files containing the subalignments
         or references to 2 Bio::SimpleAlign objects.

Throws an exception if arguments are not either strings (eg filenames) or references to SimpleAlign objects.

_run

 Title   :  _run
 Usage   :  Internal function, not to be called directly        
 Function:  makes actual system call to tcoffee program
 Example :
 Returns : nothing; tcoffee output is written to a
           temporary file $TMPOUTFILE
 Args    : Name of a file containing a set of unaligned fasta sequences
           and hash of parameters to be passed to tcoffee

_setinput

 Title   :  _setinput
 Usage   :  Internal function, not to be called directly        
 Function:  Create input file for tcoffee program
 Example :
 Returns : name of file containing tcoffee data input
 Args    : Seq or Align object reference or input file name

_setparams

 Title   :  _setparams
 Usage   :  Internal function, not to be called directly        
 Function:  Create parameter inputs for tcoffee program
 Example :
 Returns : parameter string to be passed to tcoffee
           during align or profile_align
 Args    : name of calling object

To install Bio::Seq, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Bio::Seq

CPAN shell

perl -MCPAN -e shell
install Bio::Seq

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)