SimpleAlign - Multiple alignments held as a set of sequences
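    use Bio::SimpleAlign;

    $aln = new Bio::SimpleAlign;     # create an empty alignment
    $aln->read_MSF(\*STDIN);         # read an MSF alignment from standard input
    $aln->write_fasta(\*STDOUT);     # write it back out in fasta format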
This module is included with the central Bioperl distribution:
    http://bio.perl.org/Core/Latest
    ftp://bio.perl.org/pub/DIST
Follow the installation instructions included in the README file.
SimpleAlign handles multiple alignments of sequences. It is very permissive about what it holds (it won't insist on all sequences being the same length, for example): really it is a SequenceSet explicitly held in memory, with a whole series of built-in manipulations and, especially, file format systems for reading and writing alignments.
SimpleAlign basically views an alignment as an immutable block of text. SimpleAlign *is not* the object to use if you want to manipulate an alignment (e.g. truncate an alignment or remove columns that are all gaps). These operations are much better done by UnivAln by Georg Fuellen.
However, for lightweight display and formatting, this is the one to use.
Tricky concepts: SimpleAlign expects the combination of name, start and end to be unique within the alignment, and this combination is the key for the internal hashes (name, start, end is abbreviated to nse in the code). However, in many cases people don't want name/start-end to be displayed: they want either multiple names in an alignment or names specific to the alignment (ROA1_HUMAN_1, ROA1_HUMAN_2, etc.). These names are called 'displayname' and are generally what is used to print out the alignment. They default to name/start-end.
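The nse idea can be sketched in plain Perl. This is only an illustration of the keying scheme described above, not the module's internal code: the helper nse_key, the %displayname hash and the coordinates are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the name/start-end ("nse") key for one sequence.
sub nse_key {
    my ($name, $start, $end) = @_;
    return "$name/$start-$end";
}

# Two fragments of the same protein share a name but get distinct nse keys.
# The coordinates are made up for the example.
my @fragments = (
    [ 'ROA1_HUMAN', 12,  85  ],
    [ 'ROA1_HUMAN', 103, 180 ],
);

# By default the display name is the nse key itself.
my %displayname;
for my $frag (@fragments) {
    my $key = nse_key(@$frag);
    $displayname{$key} = $key;
}

print "$_ => $displayname{$_}\n" for sort keys %displayname;
```

Because the key includes start and end, the same sequence name can appear more than once in an alignment without colliding in the hash.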
The SimpleAlign Module came from Ewan Birney's Align module
SimpleAlign is being slowly converted to bioperl coding standards, mainly by Ewan.
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
    vsns-bcd-perl@lists.uni-bielefeld.de       - General discussion
    vsns-bcd-perl-guts@lists.uni-bielefeld.de  - Technically-oriented discussion
    http://bio.perl.org/MailList.html          - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:
    bioperl-bugs@bio.perl.org
    http://bio.perl.org/bioperl-bugs/
Ewan Birney, birney@sanger.ac.uk
    Bio::Seq.pm                             - The biosequence object
    http://bio.perl.org/Projects/modules.html  - Online module documentation
    http://bio.perl.org/Projects/SeqAlign/     - Bioperl sequence alignment project
    http://bio.perl.org/                       - Bioperl Project Homepage
The rest of the documentation details each of the object methods. Internal methods are usually prefixed with an underscore (_).
 Title    : id
 Usage    : $myalign->id("Ig")
 Function : Gets/sets the id field of the alignment
 Returns  : An id string
 Argument : An id string (optional)
 Title    : addSeq
 Usage    : $myalign->addSeq($newseq);
 Function : Adds another sequence to the alignment
            *doesn't* align it - just adds it to the hashes
 Returns  : nothing
 Argument :
 Title    : removeSeq
 Usage    : $aln->removeSeq($seq);
 Function : removes a single sequence from an alignment
 Title    : eachSeq
 Usage    : foreach $seq ( $align->eachSeq() )
 Function : gets an array of Seq objects from the alignment
 Returns  : an array
 Argument : nothing
 Title    : consensus_string
 Usage    : $str = $ali->consensus_string()
 Function : Makes a consensus
 Returns  :
 Argument :
 Title    : read_MSF
 Usage    : $al->read_MSF(\*STDIN);
 Function : reads MSF formatted files. Tries to read *all* MSF files:
            it reads all non-whitespace characters in the alignment
            area. For MSFs with weird gaps (e.g. ~~~), map them by using
            $al->map_chars('~','-');
 Example  :
 Returns  :
 Args     : filehandle
 Title    : write_MSF
 Usage    : $ali->write_MSF(\*FH)
 Function : writes MSF format output
 Returns  :
 Argument :
 Title    : length_aln()
 Usage    : $len = $ali->length_aln()
 Function : returns the maximum length of the alignment.
            To be sure the alignment is a block, use is_flush
 Returns  :
 Argument :
 Title    : is_flush
 Usage    : if( $ali->is_flush() )
 Function : Tells you whether the alignment is flush,
            i.e. all of the same length
 Returns  : 1 or 0
 Argument :
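The relationship between length_aln and is_flush can be illustrated with plain strings. The sketch below is not the module's implementation; max_length and rows_are_flush are hypothetical stand-ins that work on raw alignment rows rather than Seq objects.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical stand-in for length_aln: the longest row in the alignment.
sub max_length {
    my @rows = @_;
    return 0 unless @rows;
    return max(map { length($_) } @rows);
}

# Hypothetical stand-in for is_flush: 1 if every row has the same length.
sub rows_are_flush {
    my @rows = @_;
    my %lengths_seen;
    $lengths_seen{ length($_) } = 1 for @rows;
    return scalar(keys %lengths_seen) <= 1 ? 1 : 0;
}

my @ragged = ('AC-GT', 'AC-G');
printf "max length %d, flush %d\n", max_length(@ragged), rows_are_flush(@ragged);
```

A ragged alignment still has a well-defined maximum length, which is why the flush check is the one to use when a true rectangular block is required.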
 Title    : read_fasta
 Usage    : $ali->read_fasta(\*INPUT)
 Function : reads in a fasta formatted file for an alignment
 Returns  :
 Argument :
 Title    : read_selex
 Usage    : $ali->read_selex(\*INPUT)
 Function : reads selex (hmmer) format alignments
 Returns  :
 Argument :
 Title    : read_mase
 Usage    : $ali->read_mase(\*INPUT)
 Function : reads mase (seaview) formatted alignments
 Returns  :
 Argument :
 Title    : read_Pfam_file
 Usage    : $ali->read_Pfam_file("thisfile");
 Function : opens a filename, reads a Pfam (mul) formatted alignment
 Returns  :
 Argument :
 Title    : read_Pfam
 Usage    : $ali->read_Pfam(\*INPUT)
 Function : reads a Pfam formatted alignment (Mul format).
            This is the format used by Belvu
 Returns  :
 Argument :
 Title    : write_Pfam
 Usage    : $ali->write_Pfam(\*OUTPUT)
 Function : writes a Pfam/Mul formatted file
 Returns  :
 Argument :
 Title    : write_clustalw
 Usage    : $ali->write_clustalw
 Function : writes a clustalw formatted (.aln) file
 Returns  :
 Argument :
 Title    : write_fasta
 Usage    : $ali->write_fasta(\*OUTPUT)
 Function : writes a fasta formatted alignment
 Returns  :
 Argument : reference-to-glob to file or filehandle object
 Title    : set_displayname_flat
 Usage    : $ali->set_displayname_flat()
 Function : Makes all the sequences be displayed as just their name,
            not name/start-end
 Returns  :
 Argument :
 Title    : set_displayname_normal
 Usage    : $ali->set_displayname_normal()
 Function : Makes all the sequences be displayed as name/start-end
 Returns  :
 Argument :
 Title    : set_displayname_count
 Usage    : $ali->set_displayname_count
 Function : sets the names to be name_#, where # is the number of
            times this name has been used.
 Returns  :
 Argument :
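The name_# scheme can be sketched as a simple running count per name. This is a minimal illustration of the naming rule only; count_displaynames is a hypothetical helper, and the real method works on the alignment's sequences rather than on a list of strings.

```perl
use strict;
use warnings;

# Hypothetical helper: append a per-name running count to each name,
# in the order the names are encountered.
sub count_displaynames {
    my @names = @_;
    my %times_seen;
    my @display;
    for my $name (@names) {
        $times_seen{$name}++;
        push @display, $name . '_' . $times_seen{$name};
    }
    return @display;
}

print join(' ', count_displaynames(qw(ROA1_HUMAN ROA1_HUMAN ROA2_HUMAN))), "\n";
# prints: ROA1_HUMAN_1 ROA1_HUMAN_2 ROA2_HUMAN_1
```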
 Title    : each_alphabetically
 Usage    : foreach $seq ( $ali->each_alphabetically() )
 Function : returns an array of sequence objects sorted
            alphabetically by name and then by start point.
            Does not change the order of the alignment
 Returns  :
 Argument :
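The ordering described here (name first, start point second) corresponds to a two-key sort. The sketch below uses plain hash references with hypothetical name and start fields instead of real sequence objects, and returns a new list so the input order is left alone.

```perl
use strict;
use warnings;

# Hypothetical records standing in for sequence objects.
my @records = (
    { name => 'ROA2_HUMAN', start => 5   },
    { name => 'ROA1_HUMAN', start => 103 },
    { name => 'ROA1_HUMAN', start => 12  },
);

# Sort alphabetically by name, then numerically by start.
# Returns a new list; the caller's array is not reordered.
sub by_name_then_start {
    my @items = @_;
    return sort {
        $a->{name} cmp $b->{name}
            or $a->{start} <=> $b->{start}
    } @items;
}

print "$_->{name}/$_->{start}\n" for by_name_then_start(@records);
```

Note that the start comparison is numeric (<=>), so a start of 12 sorts before 103; a string comparison would put them the other way round.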
 Title    : sort_alphabetically
 Usage    : $ali->sort_alphabetically
 Function : changes the order of the alignment to alphabetical
            on name, followed by numerical on start point
 Returns  :
 Argument :
 Title    : map_chars
 Usage    : $ali->map_chars('\.','-')
 Function : does a s/$arg1/$arg2/ on the sequences.
            Useful for gap characters.
            Notice that the from (arg1) is interpreted as a regex,
            so be careful about quoting meta characters
            (e.g. $ali->map_chars('.','-') won't do what you want)
 Returns  :
 Argument :
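The quoting pitfall is easy to reproduce on a plain string. The sketch below is not the module's code: map_chars_sketch is a hypothetical helper, and applying the substitution globally (the /g flag) is an assumption made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a regex substitution to a copy of one row.
# The /g flag is an assumption; the "from" argument is used as a regex.
sub map_chars_sketch {
    my ($row, $from, $to) = @_;
    (my $copy = $row) =~ s/$from/$to/g;
    return $copy;
}

print map_chars_sketch('AC..GT', '\.', '-'), "\n";   # AC--GT : only the dots change
print map_chars_sketch('AC..GT', '.',  '-'), "\n";   # ------ : "." matches every character

# quotemeta() is a safe way to escape a literal character before passing it on.
print map_chars_sketch('AC..GT', quotemeta('.'), '-'), "\n";   # AC--GT
```

An unescaped dot matches any character, so the second call wipes out the residues along with the gaps.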
 Title    : uppercase()
 Usage    : $ali->uppercase()
 Function : Sets all the sequences to uppercase
 Returns  :
 Argument :
 Title    : no_sequences
 Usage    : $depth = $ali->no_sequences
 Function : number of sequences in the sequence alignment
 Returns  :
 Argument :
 Title    : no_residues
 Usage    : $no = $ali->no_residues
 Function : number of residues in total in the alignment
 Returns  :
 Argument :
 Title    : purge
 Usage    : $aln->purge(0.7);
 Function : removes sequences above the given percentage identity (%id)
 Example  :
 Returns  : An array of the removed sequences
 Arguments:
This function will grind on large alignments. Beware! (perhaps not ideally implemented)
 Title    : percentage_identity
 Usage    : $id = $align->percentage_identity
 Function : The function uses a fast method to calculate the average
            percentage identity of the alignment
 Returns  : The average percentage identity of the alignment
 Args     : None
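To make "average percentage identity" concrete, here is a naive all-pairs sketch on plain strings. It is not the fast method the module uses, and the identity definition is an assumption: columns where either row has a '-' gap are skipped, and case is ignored. The helper names pair_identity and average_identity are hypothetical.

```perl
use strict;
use warnings;

# Percentage of identical residues between two rows, counting only
# columns where neither row has a gap ('-'). Definition assumed here.
sub pair_identity {
    my ($x, $y) = @_;
    my ($same, $compared) = (0, 0);
    my $columns = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $columns - 1) {
        my ($p, $q) = (substr($x, $i, 1), substr($y, $i, 1));
        next if $p eq '-' or $q eq '-';
        $compared++;
        $same++ if uc($p) eq uc($q);
    }
    return $compared ? 100 * $same / $compared : 0;
}

# Naive average over every pair of rows (quadratic in the number of rows).
sub average_identity {
    my @rows = @_;
    my ($sum, $pairs) = (0, 0);
    for my $i (0 .. $#rows - 1) {
        for my $j ($i + 1 .. $#rows) {
            $sum += pair_identity($rows[$i], $rows[$j]);
            $pairs++;
        }
    }
    return $pairs ? $sum / $pairs : 0;
}

printf "%.2f\n", average_identity('ACGT', 'ACGT', 'ACGA');
```

The quadratic pair loop is the part a faster method would avoid, and it also hints at why operations such as purge can be slow on large alignments.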
 Title    : read_Prodom
 Usage    : $ali->read_Prodom( $file )
 Function : Reads in a Prodom format alignment
 Returns  :
 Args     : A filehandle glob or ref. to a filehandle object