NAME

SimpleAlign - Multiple alignments held as a set of sequences

SYNOPSIS

    use Bio::SimpleAlign;

    # Read an MSF alignment from STDIN and write it back out as FASTA
    $aln = Bio::SimpleAlign->new();
    $aln->read_MSF(\*STDIN);
    $aln->write_fasta(\*STDOUT);

INSTALLATION

This module is included with the central Bioperl distribution:

   http://bio.perl.org/Core/Latest
   ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION

SimpleAlign handles multiple alignments of sequences. It is very permissive about what it accepts (for example, it won't insist that all sequences have the same length): in effect it is a SequenceSet held explicitly in memory, with a series of built-in manipulations and, above all, file format routines for reading and writing alignments.

SimpleAlign essentially views an alignment as an immutable block of text. SimpleAlign *is not* the object to use if you want to manipulate an alignment (e.g. truncate it or remove columns that are all gaps). These operations are much better done by UnivAln, by Georg Fuellen.

However, for lightweight display and formatting, this is the one to use.

Tricky concepts: SimpleAlign expects the combination name,start,end to be unique within the alignment, and uses it as the key for its internal hashes (name,start,end is abbreviated nse in the code). However, in many cases people don't want name/start-end to be displayed: an alignment may contain multiple names, or names specific to the alignment (ROA1_HUMAN_1, ROA1_HUMAN_2, etc.). Such a name is called a 'displayname', and it is generally what is used when the alignment is printed. Display names default to name/start-end.
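The relationship between the nse key and the display name can be pictured with plain hashes. This is only a sketch and not the module's internal code: the nse_key helper, the record layout, and the %displayname hash are all hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "name/start-end" key for one sequence.
sub nse_key {
    my ($name, $start, $end) = @_;
    return "$name/$start-$end";
}

# Two fragments of the same protein share a name but differ in range,
# so their nse keys stay unique.
my @seqs = (
    { name => 'ROA1_HUMAN', start => 1,   end => 90  },
    { name => 'ROA1_HUMAN', start => 100, end => 180 },
);

my %displayname;
for my $seq (@seqs) {
    my $nse = nse_key(@{$seq}{qw(name start end)});
    $displayname{$nse} = $nse;    # the display name defaults to the nse key
}

# Override one display name with an alignment-specific label.
$displayname{'ROA1_HUMAN/1-90'} = 'ROA1_HUMAN_1';

print "$_ => $displayname{$_}\n" for sort keys %displayname;
```

The key stays fixed while the label shown to the user can change, which is why the two are stored separately in this sketch.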

The SimpleAlign module came from Ewan Birney's Align module.

PROGRESS

SimpleAlign is being slowly converted to bioperl coding standards, mainly by Ewan.

Use Bio::Root::Object - done
Use proper exceptions - done
Use hashed constructor - not done!

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

    vsns-bcd-perl@lists.uni-bielefeld.de          - General discussion
    vsns-bcd-perl-guts@lists.uni-bielefeld.de     - Technically-oriented discussion
    http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

    bioperl-bugs@bio.perl.org                   
    http://bio.perl.org/bioperl-bugs/           

AUTHOR

Ewan Birney, birney@sanger.ac.uk

SEE ALSO

 Bio::Seq.pm - The biosequence object

 http://bio.perl.org/Projects/modules.html  - Online module documentation
 http://bio.perl.org/Projects/SeqAlign/     - Bioperl sequence alignment project
 http://bio.perl.org/                       - Bioperl Project Homepage

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

id

 Title     : id
 Usage     : $myalign->id("Ig")
 Function  : Gets/sets the id field of the alignment
           :
 Returns   : An id string
 Argument  : An id string (optional)

addSeq

 Title     : addSeq
 Usage     : $myalign->addSeq($newseq);
           : 
           :
 Function  : Adds another sequence to the alignment
           : *doesn't* align it - just adds it to the
           : hashes
           :
 Returns   : nothing
 Argument  : 

removeSeq

 Title     : removeSeq
 Usage     : $aln->removeSeq($seq);
 Function  : removes a single sequence from an alignment

eachSeq

 Title     : eachSeq
 Usage     : foreach $seq ( $align->eachSeq() ) 
           : 
           :
 Function  : gets an array of Seq objects from the
           : alignment
           : 
           :
 Returns   : an array
 Argument  : nothing

consensus_string

 Title     : consensus_string
 Usage     : $str = $ali->consensus_string()
           : 
           :
 Function  : Makes a consensus
           : 
           : 
           :
 Returns   : 
 Argument  : 

read_MSF

 Title   : read_MSF
 Usage   : $al->read_MSF(\*STDIN);
 Function: Reads MSF formatted files. Tries to read *all* MSF files:
           it reads all non-whitespace characters in the alignment
           area. For MSFs with unusual gap characters (e.g. ~~~), map
           them using $al->map_chars('~','-');
 Example :
 Returns : 
 Args    : filehandle

write_MSF

 Title     : write_MSF
 Usage     : $ali->write_MSF(\*FH)
           : 
           :
 Function  : writes MSF format output
           : 
           : 
           :
 Returns   : 
 Argument  : 

length_aln

 Title     : length_aln()
 Usage     : $len = $ali->length_aln() 
           : 
           :
 Function  : returns the maximum length of the alignment.
           : To be sure the alignment is a block, use is_flush
           : 
           :
 Returns   : 
 Argument  : 

is_flush

 Title     : is_flush
 Usage     : if( $ali->is_flush() )  
           : 
           :
 Function  : Tells you whether the alignment 
           : is flush, ie all of the same length
           : 
           :
 Returns   : 1 or 0
 Argument  : 
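The distinction between length_aln and is_flush can be illustrated on plain strings. The following is a sketch under the assumption that each aligned sequence is just a string; max_length and flush are hypothetical helpers, not methods of this module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical stand-ins for the aligned sequence strings.
my @rows = ('ACG-T', 'AC--T', 'ACGTT');

# Maximum row length, which is what length_aln is documented to report.
sub max_length {
    return max(map { length($_) } @_);
}

# Flush means every row has the same length as the first one.
sub flush {
    my $len = length $_[0];
    return (grep { length($_) != $len } @_) ? 0 : 1;
}

print max_length(@rows), "\n";
print flush(@rows), "\n";
print flush(@rows, 'ACG'), "\n";
```

A ragged set of rows still has a well-defined maximum length, so the length alone cannot tell you whether the alignment is a proper block.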

read_fasta

 Title     : read_fasta
 Usage     : $ali->read_fasta(\*INPUT)
           : 
           :
 Function  : reads in a fasta formatted
           : file for an alignment
           : 
           :
 Returns   : 
 Argument  : 

read_selex

 Title     : read_selex
 Usage     : $ali->read_selex(\*INPUT) 
           : 
           :
 Function  : reads selex (hmmer) format
           : alignments
           : 
           :
 Returns   : 
 Argument  : 

read_mase

 Title     : read_mase
 Usage     : $ali->read_mase(\*INPUT)
           : 
           :
 Function  : reads mase (seaview) 
           : formatted alignments
           : 
           :
 Returns   : 
 Argument  : 

read_Pfam_file

 Title     : read_Pfam_file
 Usage     : $ali->read_Pfam_file("thisfile");
           : 
 Function  : opens a filename, reads
           : a Pfam (mul) formatted alignment
           :
           : 
           :
 Returns   : 
 Argument  : 

read_Pfam

 Title     : read_Pfam
 Usage     : $ali->read_Pfam(\*INPUT)
           : 
           :
 Function  : reads a Pfam formatted
           : Alignment (Mul format).
           : - this is the format used by Belvu
           :
 Returns   : 
 Argument  : 

write_Pfam

 Title     : write_Pfam
 Usage     : $ali->write_Pfam(\*OUTPUT) 
           : 
           :
 Function  : writes a Pfam/Mul formatted
           : file
           : 
           :
 Returns   : 
 Argument  : 

write_clustalw

 Title     : write_clustalw
 Usage     : $ali->write_clustalw 
           : 
           :
 Function  : writes a clustalw formatted
           : (.aln) file
           : 
           :
 Returns   : 
 Argument  : 

write_fasta

 Title     : write_fasta
 Usage     : $ali->write_fasta(\*OUTPUT) 
           : 
           :
 Function  : writes a fasta formatted alignment
           : 
 Returns   : 
 Argument  : reference-to-glob to file or filehandle object 

set_displayname_flat

 Title     : set_displayname_flat
 Usage     : $ali->set_displayname_flat() 
           : 
           :
 Function  : Makes all the sequences be displayed
           : as just their name, not name/start-end
           : 
           :
 Returns   : 
 Argument  : 

set_displayname_normal

 Title     : set_displayname_normal
 Usage     : $ali->set_displayname_normal() 
           : 
           :
 Function  : Makes all the sequences be displayed
           : as name/start-end
           : 
           :
 Returns   : 
 Argument  : 

set_displayname_count

 Title     : set_displayname_count
 Usage     : $ali->set_displayname_count 
           : 
           :
 Function  : sets the names to be name_#
           : where # is the number of times this
           : name has been used. 
           :
 Returns   : 
 Argument  : 
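The name_# scheme can be sketched with a counting hash. This is an illustration only; count_names is a hypothetical helper operating on bare name strings, and the module's own bookkeeping may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: label each name with a running count of its uses.
sub count_names {
    my @names = @_;
    my %seen;
    return map { $_ . '_' . ++$seen{$_} } @names;
}

my @labels = count_names(qw(ROA1_HUMAN ROA1_HUMAN ROA2_HUMAN));
print join(', ', @labels), "\n";   # ROA1_HUMAN_1, ROA1_HUMAN_2, ROA2_HUMAN_1
```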

each_alphabetically

 Title     : each_alphabetically
 Usage     : foreach $seq ( $ali->each_alphabetically() )
           : 
           :
 Function  : returns an array of sequence object sorted
           : alphabetically by name and then by start point
           : 
           : Does not change the order of the alignment
 Returns   : 
 Argument  : 
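A two-level sort of this kind might look as follows. The hash records and the by_name_then_start helper are assumptions made for the sketch; the real method works on the alignment's sequence objects.

```perl
use strict;
use warnings;

# Hypothetical records standing in for the alignment's sequence objects.
my @seqs = (
    { name => 'ROA2_HUMAN', start => 5   },
    { name => 'ROA1_HUMAN', start => 100 },
    { name => 'ROA1_HUMAN', start => 1   },
);

# Sort by name as a string, then by start as a number. sort returns a
# new list, so @seqs keeps its original order.
sub by_name_then_start {
    return sort {
        $a->{name} cmp $b->{name} or $a->{start} <=> $b->{start}
    } @_;
}

my @sorted = by_name_then_start(@seqs);
print "$_->{name}/$_->{start}\n" for @sorted;
```

Comparing start with <=> rather than cmp matters here: as strings, 100 would sort before 5.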

sort_alphabetically

 Title     : sort_alphabetically
 Usage     : $ali->sort_alphabetically
           : 
           :
 Function  : changes the order of the alignment
           : to alphabetical by name, then
           : numerical by start point
           :
 Returns   : 
 Argument  : 

map_chars

 Title     : map_chars
 Usage     : $ali->map_chars('\.','-')
           : 
           :
 Function  : does a s/$arg1/$arg2/ on 
           : the sequences. Useful for
           : gap characters
           :
           : Note that the from argument (arg1) is interpreted
           : as a regex, so be careful about quoting
           : metacharacters (e.g. $ali->map_chars('.','-') won't
           : do what you want)
 Returns   : 
 Argument  : 
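The quoting pitfall is easy to see on a single string. The map_chars function below is a hypothetical stand-in written for this sketch, not the method itself, and it assumes the substitution is applied globally (s///g), which the description above does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for map_chars applied to one sequence string.
# Assumption: the substitution is global (s///g).
sub map_chars {
    my ($seq, $from, $to) = @_;
    $seq =~ s/$from/$to/g;
    return $seq;
}

my $row = 'AC..GT';
print map_chars($row, '\.', '-'), "\n";             # AC--GT
print map_chars($row, '.',  '-'), "\n";             # ------
print map_chars($row, quotemeta('.'), '-'), "\n";   # AC--GT
```

An unescaped '.' matches any character, so the second call overwrites the residues as well as the gaps. Perl's built-in quotemeta is one way to escape a literal gap character without writing the backslash by hand.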

uppercase

 Title     : uppercase()
 Usage     : $ali->uppercase()
           : 
           :
 Function  : Sets all the sequences
           : to uppercase
           : 
           :
 Returns   : 
 Argument  : 

no_sequences

 Title     : no_sequences
 Usage     : $depth = $ali->no_sequences
           : 
           :
 Function  : number of sequences in the
           : sequence alignment
           : 
           :
 Returns   : 
 Argument  : 

no_residues

 Title     : no_residues
 Usage     : $no = $ali->no_residues
           : 
           :
 Function  : number of residues in total
           : in the alignment
           : 
           :
 Returns   : 
 Argument  : 

purge

 Title   : purge
 Usage   : $aln->purge(0.7);
 Function: removes sequences above the given %id threshold
 Example :
 Returns : An array of the removed sequences
 Args    : the %id threshold (0.7 in the usage above)

 This function will grind on large alignments. Beware!

 (perhaps not ideally implemented)
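The warning about large alignments is what one would expect from any filter that compares sequences pairwise. The sketch below is not the module's implementation: pair_identity and purge_rows are hypothetical helpers, rows are assumed to be equal-length strings, columns with a '-' gap in either row are skipped, and a row is dropped when it exceeds the threshold against any row already kept.

```perl
use strict;
use warnings;

# Fraction of compared columns where two equal-length rows carry the
# same residue. Columns with a gap in either row are skipped
# (an assumption of this sketch).
sub pair_identity {
    my ($x, $y) = @_;
    my ($same, $compared) = (0, 0);
    for my $i (0 .. length($x) - 1) {
        my ($p, $q) = (substr($x, $i, 1), substr($y, $i, 1));
        next if $p eq '-' or $q eq '-';
        $compared++;
        $same++ if $p eq $q;
    }
    return $compared ? $same / $compared : 0;
}

# Keep a row only if it is not above the threshold against any row
# already kept; return references to the kept and removed rows.
sub purge_rows {
    my ($threshold, @rows) = @_;
    my (@kept, @removed);
    ROW: for my $row (@rows) {
        for my $k (@kept) {
            if (pair_identity($row, $k) > $threshold) {
                push @removed, $row;
                next ROW;
            }
        }
        push @kept, $row;
    }
    return (\@kept, \@removed);
}

my ($kept, $removed) =
    purge_rows(0.7, 'ACGTACGTAC', 'ACGTACGTAA', 'TTTTACGGGG');
print "kept: @$kept\nremoved: @$removed\n";
```

In the worst case every row is compared against every other row over the full alignment length, so the cost grows roughly with the square of the number of sequences.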

percentage_identity

 Title   : percentage_identity
 Usage   : $id = $align->percentage_identity
 Function: Uses a fast method to calculate the average percentage
           identity of the alignment
 Returns : The average percentage identity of the alignment
 Args    : None

read_Prodom

 Title   : read_Prodom
 Usage   : $ali->read_Prodom( $file )
 Function: Reads in a Prodom format alignment
 Returns : 
 Args    : A filehandle glob or ref. to a filehandle object