NAME
Bio::Phylo::Matrices::MatrixRole - Extra behaviours for a character state matrix
SYNOPSIS
 use Bio::Phylo::Factory;
 my $fac = Bio::Phylo::Factory->new;

 # instantiate taxa object
 my $taxa = $fac->create_taxa;
 for ( 'Homo sapiens', 'Pan paniscus', 'Pan troglodytes' ) {
     $taxa->insert( $fac->create_taxon( '-name' => $_ ) );
 }

 # instantiate matrix object, 'standard' data type. All categorical
 # data types follow semantics like this, though with different
 # symbols in lookup table and matrix
 my $standard_matrix = $fac->create_matrix(
     '-type'   => 'STANDARD',
     '-taxa'   => $taxa,
     '-lookup' => {
         '-' => [],
         '0' => [ '0' ],
         '1' => [ '1' ],
         '?' => [ '0', '1' ],
     },
     '-labels' => [ 'Opposable big toes', 'Opposable thumbs', 'Not a pygmy' ],
     '-matrix' => [
         [ 'Homo sapiens'    => '0', '1', '1' ],
         [ 'Pan paniscus'    => '1', '1', '0' ],
         [ 'Pan troglodytes' => '1', '1', '1' ],
     ],
 );

 # note: complicated constructor for mixed data!
 my $mixed_matrix = Bio::Phylo::Matrices::Matrix->new(

     # if you want to create 'mixed', value for '-type' is array ref...
     '-type' => [

         # ...with first field 'mixed'...
         'mixed',

         # ...second field is an array ref...
         [

             # ...with _ordered_ key/value pairs...
             'dna'      => 10,    # value is length of type range
             'standard' => 10,    # value is length of type range

             # ... or, more complicated, value is a hash ref...
             'rna' => {
                 '-length' => 10,    # value is length of type range

                 # ...value for '-args' is an array ref with args
                 # as can be passed to 'unmixed' datatype constructors,
                 # for example, here we modify the lookup table for
                 # rna to allow both 'U' (default) and 'T'
                 '-args' => [
                     '-lookup' => {
                         'A' => [ 'A' ],
                         'C' => [ 'C' ],
                         'G' => [ 'G' ],
                         'U' => [ 'U' ],
                         'T' => [ 'T' ],
                         'M' => [ 'A', 'C' ],
                         'R' => [ 'A', 'G' ],
                         'S' => [ 'C', 'G' ],
                         'W' => [ 'A', 'U', 'T' ],
                         'Y' => [ 'C', 'U', 'T' ],
                         'K' => [ 'G', 'U', 'T' ],
                         'V' => [ 'A', 'C', 'G' ],
                         'H' => [ 'A', 'C', 'U', 'T' ],
                         'D' => [ 'A', 'G', 'U', 'T' ],
                         'B' => [ 'C', 'G', 'U', 'T' ],
                         'X' => [ 'G', 'A', 'U', 'T', 'C' ],
                         'N' => [ 'G', 'A', 'U', 'T', 'C' ],
                     },
                 ],
             },
         ],
     ],
 );

 # prints 'mixed(Dna:1-10, Standard:11-20, Rna:21-30)'
 print $mixed_matrix->get_type;
DESCRIPTION
This module defines a container object that holds Bio::Phylo::Matrices::Datum objects. The matrix object inherits from Bio::Phylo::Listable, so the methods defined there apply here.
METHODS
CONSTRUCTOR
 new()

Matrix constructor.
 Type    : Constructor
 Title   : new
 Usage   : my $matrix = Bio::Phylo::Matrices::Matrix->new;
 Function: Instantiates a Bio::Phylo::Matrices::Matrix object.
 Returns : A Bio::Phylo::Matrices::Matrix object.
 Args    : -type   => optional, but if used must be FIRST argument,
                      defines datatype, one of dna|rna|protein|
                      continuous|standard|restriction|[ mixed => [] ]
           -taxa   => optional, link to taxa object
           -lookup => character state lookup hash ref
           -labels => array ref of character labels
           -matrix => two-dimensional array, first element of every
                      row is label, subsequent are characters
 new_from_bioperl()

Matrix constructor from Bio::Align::AlignI argument.
 Type    : Constructor
 Title   : new_from_bioperl
 Usage   : my $matrix = Bio::Phylo::Matrices::Matrix->new_from_bioperl( $aln );
 Function: Instantiates a Bio::Phylo::Matrices::Matrix object.
 Returns : A Bio::Phylo::Matrices::Matrix object.
 Args    : An alignment that implements Bio::Align::AlignI
MUTATORS
 set_special_symbols

Sets three special symbols in one call
 Type    : Mutator
 Title   : set_special_symbols
 Usage   : $matrix->set_special_symbols(
               -missing   => '?',
               -gap       => '-',
               -matchchar => '.'
           );
 Function: Assigns state labels.
 Returns : $self
 Args    : Three args (with distinct $x, $y and $z):
               -missing   => $x,
               -gap       => $y,
               -matchchar => $z
 Notes   : This method is here to ensure you don't accidentally
           use the same symbol for missing AND gap
 set_charlabels()

Sets argument character labels.
 Type    : Mutator
 Title   : set_charlabels
 Usage   : $matrix->set_charlabels( [ 'char1', 'char2', 'char3' ] );
 Function: Assigns character labels.
 Returns : $self
 Args    : ARRAY, or nothing (to reset)
 set_raw()

Sets contents using a two-dimensional array argument.
 Type    : Mutator
 Title   : set_raw
 Usage   : $matrix->set_raw( [ [ 'taxon1' => 'acgt' ], [ 'taxon2' => 'acgt' ] ] );
 Function: Syntax sugar to define $matrix data contents.
 Returns : $self
 Args    : A two-dimensional array; first dimension contains matrix rows,
           second dimension contains taxon name / character string pair.
ACCESSORS
 get_special_symbols()

Retrieves hash ref for missing, gap and matchchar symbols
 Type    : Accessor
 Title   : get_special_symbols
 Usage   : my %syms = %{ $matrix->get_special_symbols };
 Function: Retrieves special symbols
 Returns : HASH ref, e.g. { -missing => '?', -gap => '-', -matchchar => '.' }
 Args    : None.
 get_charlabels()

Retrieves character labels.
 Type    : Accessor
 Title   : get_charlabels
 Usage   : my @charlabels = @{ $matrix->get_charlabels };
 Function: Retrieves character labels.
 Returns : ARRAY
 Args    : None.
 get_nchar()

Calculates number of characters.
 Type    : Accessor
 Title   : get_nchar
 Usage   : my $nchar = $matrix->get_nchar;
 Function: Calculates number of characters (columns) in matrix (if the matrix
           is non-rectangular, returns the length of the longest row).
 Returns : INT
 Args    : none
 get_ntax()

Calculates number of taxa (rows) in matrix.
 Type    : Accessor
 Title   : get_ntax
 Usage   : my $ntax = $matrix->get_ntax;
 Function: Calculates number of taxa (rows) in matrix
 Returns : INT
 Args    : none
 get_raw()

Retrieves a 'raw' (two-dimensional array) representation of the matrix's contents.
 Type    : Accessor
 Title   : get_raw
 Usage   : my $rawmatrix = $matrix->get_raw;
 Function: Retrieves a 'raw' (two-dimensional array) representation
           of the matrix's contents.
 Returns : A two-dimensional array; first dimension contains matrix rows,
           second dimension contains taxon name and characters.
 Args    : NONE
 get_ungapped_columns()

 Type    : Accessor
 Title   : get_ungapped_columns
 Usage   : my @ungapped = @{ $matrix->get_ungapped_columns };
 Function: Retrieves the zero-based column indices of columns without gaps
 Returns : An array reference with zero or more indices (i.e. integers)
 Args    : NONE
 get_invariant_columns()

 Type    : Accessor
 Title   : get_invariant_columns
 Usage   : my @invariant = @{ $matrix->get_invariant_columns };
 Function: Retrieves the zero-based column indices of invariant columns
 Returns : An array reference with zero or more indices (i.e. integers)
 Args    : Optional:
           -gap     => if true, counts the gap symbol (probably '-') as a variant
           -missing => if true, counts the missing symbol (probably '?') as a variant
CALCULATIONS
 calc_indel_sizes()

Calculates size distribution of insertions or deletions
 Type    : Calculation
 Title   : calc_indel_sizes
 Usage   : my %sizes = %{ $matrix->calc_indel_sizes };
 Function: Calculates the size distribution of indels.
 Returns : HASH
 Args    : Optional:
           -trim       => if true, disregards indels at start and end
           -insertions => if true, counts insertions, if false, counts deletions
 calc_prop_invar()

Calculates proportion of invariant sites.
 Type    : Calculation
 Title   : calc_prop_invar
 Usage   : my $pinvar = $matrix->calc_prop_invar;
 Function: Calculates proportion of invariant sites.
 Returns : Scalar: a number
 Args    : Optional:
           # if true, counts missing (usually the '?' symbol) as a state
           # in the final tallies. Otherwise, missing states are ignored
           -missing => 1
           # if true, counts gaps (usually the '-' symbol) as a state
           # in the final tallies. Otherwise, gap states are ignored
           -gap => 1
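The effect of the missing and gap switches described above can be illustrated with a small stand-alone sketch. It does not call Bio::Phylo: prop_invariant is a hypothetical helper that works on plain strings, its argument keys (missing, gap) are its own, and treating a column with no countable states as invariant is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Phylo implementation): computes the
# proportion of columns in which at most one distinct state is observed.
# Rows are assumed to be equal-length strings.
sub prop_invariant {
    my ( $rows, %args ) = @_;
    my $nchar = length $rows->[0];
    return 0 unless $nchar;
    my $invariant = 0;
    for my $i ( 0 .. $nchar - 1 ) {
        my %seen;
        for my $row (@$rows) {
            my $state = substr $row, $i, 1;

            # missing and gap symbols are skipped unless requested
            next if $state eq '?' && !$args{missing};
            next if $state eq '-' && !$args{gap};
            $seen{$state}++;
        }

        # assumption: a column with no countable states is invariant
        $invariant++ if keys(%seen) <= 1;
    }
    return $invariant / $nchar;
}

my @rows = ( 'AC?T', 'AG-T' );
print prop_invariant( \@rows ), "\n";                            # 0.75
print prop_invariant( \@rows, missing => 1, gap => 1 ), "\n";    # 0.5
```

With both switches on, the third column holds two distinct symbols ('?' and '-') and is therefore no longer counted as invariant.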
 calc_state_counts()

Calculates occurrences of states.
 Type    : Calculation
 Title   : calc_state_counts
 Usage   : my %counts = %{ $matrix->calc_state_counts };
 Function: Calculates occurrences of states.
 Returns : Hashref: keys are states, values are counts
 Args    : Optional - one or more states to focus on
 calc_state_frequencies()

Calculates the frequencies of the states observed in the matrix.
 Type    : Calculation
 Title   : calc_state_frequencies
 Usage   : my %freq = %{ $object->calc_state_frequencies() };
 Function: Calculates state frequencies
 Returns : A hash, keys are state symbols, values are frequencies
 Args    : Optional:
           # if true, counts missing (usually the '?' symbol) as a state
           # in the final tallies. Otherwise, missing states are ignored
           -missing => 1
           # if true, counts gaps (usually the '-' symbol) as a state
           # in the final tallies. Otherwise, gap states are ignored
           -gap => 1
 Comments: Throws exception if matrix holds continuous values
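As a rough illustration of the tallying described above, the following stand-alone sketch computes state frequencies over plain strings. The helper name state_frequencies and its argument keys are hypothetical; the actual method may differ in details such as how ambiguity codes are handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Phylo implementation): returns a hash
# ref mapping each observed state symbol to its relative frequency.
sub state_frequencies {
    my ( $rows, %args ) = @_;
    my %count;
    my $total = 0;
    for my $row (@$rows) {
        for my $state ( split //, $row ) {

            # missing and gap symbols are ignored unless requested
            next if $state eq '?' && !$args{missing};
            next if $state eq '-' && !$args{gap};
            $count{$state}++;
            $total++;
        }
    }
    my %freq;
    if ($total) {
        %freq = map { $_ => $count{$_} / $total } keys %count;
    }
    return \%freq;
}

my $freq = state_frequencies( [ 'AACG', 'AC-?' ] );
printf "%s => %.2f\n", $_, $freq->{$_} for sort keys %$freq;
```

Here six symbols are counted (the gap and the missing symbol are skipped), so the frequencies are 3/6 for A, 2/6 for C and 1/6 for G.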
 calc_distinct_site_patterns()

Identifies the distinct distributions of states for all characters and counts their occurrences. Returns an array-of-arrays, where the first cell of each inner array holds the occurrence count, the second cell holds the pattern, i.e. an array of states. For example, for a matrix like this:
 taxon1 GTGTGTGTGTGTGTGTGTGTGTG
 taxon2 AGAGAGAGAGAGAGAGAGAGAGA
 taxon3 TCTCTCTCTCTCTCTCTCTCTCT
 taxon4 TCTCTCTCTCTCTCTCTCTCTCT
 taxon5 AAAAAAAAAAAAAAAAAAAAAAA
 taxon6 CGCGCGCGCGCGCGCGCGCGCGC
 taxon7 AAAAAAAAAAAAAAAAAAAAAAA
The following data structure will be returned:
 [
     [ 12, [ 'G', 'A', 'T', 'T', 'A', 'C', 'A' ] ],
     [ 11, [ 'T', 'G', 'C', 'C', 'A', 'G', 'A' ] ]
 ]
The patterns are sorted from most to least frequently occurring; the states for each pattern are in the order of the rows in the matrix. (In other words, the original matrix can more or less be reconstructed by inverting the patterns and multiplying them by their occurrence, although the order of the columns will be lost.)
 Type    : Calculation
 Title   : calc_distinct_site_patterns
 Usage   : my $patterns = $object->calc_distinct_site_patterns;
 Function: Calculates distinct site patterns.
 Returns : A multidimensional array, see above.
 Args    : NONE
 Comments:
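The counting described above can be reproduced with a few lines of plain Perl. This is only a sketch of the idea on strings, not the module's implementation: distinct_site_patterns is a hypothetical helper, and the alphabetical tie-break between equally frequent patterns is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Phylo implementation): takes an array
# ref of equal-length strings, one per row, and returns
# [ [ $count, \@states ], ... ] sorted from most to least frequent.
sub distinct_site_patterns {
    my ($rows) = @_;
    my $nchar = length $rows->[0];
    my ( %count, %states );
    for my $i ( 0 .. $nchar - 1 ) {

        # read column $i from top to bottom, i.e. in row order
        my @column = map { substr $_, $i, 1 } @$rows;
        my $key = join '', @column;
        $count{$key}++;
        $states{$key} ||= \@column;
    }
    return [
        map  { [ $count{$_}, $states{$_} ] }
        sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count
    ];
}

# the seven 23-character rows from the example matrix
my @rows = (
    'GT' x 11 . 'G',    # taxon1
    'AG' x 11 . 'A',    # taxon2
    'TC' x 11 . 'T',    # taxon3
    'TC' x 11 . 'T',    # taxon4
    'A' x 23,           # taxon5
    'CG' x 11 . 'C',    # taxon6
    'A' x 23,           # taxon7
);
my $patterns = distinct_site_patterns( \@rows );
printf "%d x %s\n", $_->[0], join( '', @{ $_->[1] } ) for @$patterns;
# prints:
# 12 x GATTACA
# 11 x TGCCAGA
```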
 calc_gc_content()

Calculates the G+C content as a fraction of the total
 Type    : Calculation
 Title   : calc_gc_content
 Usage   : my $fraction = $obj->calc_gc_content;
 Function: Calculates G+C content
 Returns : A number between 0 and 1 (inclusive)
 Args    : Optional:
           # if true, counts missing (usually the '?' symbol) as a state
           # in the final tallies. Otherwise, missing states are ignored
           -missing => 1
           # if true, counts gaps (usually the '-' symbol) as a state
           # in the final tallies. Otherwise, gap states are ignored
           -gap => 1
 Comments: Throws 'BadArgs' exception if matrix holds anything other than DNA
           or RNA. The calculation also takes the IUPAC symbol S (which is C|G)
           into account, but no other symbols (such as V, for A|C|G).
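To make the rule in the comments concrete (G, C and the ambiguity code S count towards G+C, other ambiguity codes do not), here is a minimal sketch on plain strings. The helper gc_fraction and its argument keys are hypothetical, and case-insensitive matching is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Phylo implementation): fraction of
# counted symbols that are G, C or S.
sub gc_fraction {
    my ( $seqs, %args ) = @_;
    my ( $gc, $total ) = ( 0, 0 );
    for my $seq (@$seqs) {
        for my $state ( split //, uc $seq ) {

            # gap and missing symbols only enter the total if requested
            next if $state eq '-' && !$args{gap};
            next if $state eq '?' && !$args{missing};
            $total++;

            # S (C or G) counts; V (A, C or G) and the like do not
            $gc++ if $state =~ /^[GCS]$/;
        }
    }
    return $total ? $gc / $total : 0;
}

print gc_fraction( ['GCAT'] ), "\n";              # 0.5
print gc_fraction( ['GC--'] ), "\n";              # 1
print gc_fraction( ['GC--'], gap => 1 ), "\n";    # 0.5
```

Counting gaps enlarges the denominator without adding to the G+C tally, which is why the last call returns a lower fraction than the one before it.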
 calc_median_sequence()

Calculates the median character sequence of the matrix
 Type    : Calculation
 Title   : calc_median_sequence
 Usage   : my $seq = $obj->calc_median_sequence;
 Function: Calculates median sequence
 Returns : Array in list context, string in scalar context
 Args    : Optional:
           -ambig   => if true, uses ambiguity codes to summarize equally
                       frequent states for a given character. Otherwise
                       picks a random one.
           -missing => if true, keeps the missing symbol (probably '?') if
                       this is the most frequent for a given character.
                       Otherwise strips it.
           -gaps    => if true, keeps the gap symbol (probably '-') if this
                       is the most frequent for a given character.
                       Otherwise strips it.
 Comments: The intent of this method is to provide a crude approximation
           of the most commonly occurring sequences in an alignment, for
           example as a starting sequence for a sequence simulator. This
           gives you something to work with if ancestral sequence
           calculation is too computationally intensive and/or not
           really necessary.
METHODS
 keep_chars()

Creates a cloned matrix that only keeps the characters at the supplied (zero-based) indices.
 Type    : Utility method
 Title   : keep_chars
 Usage   : my $clone = $object->keep_chars([6,3,4,1]);
 Function: Creates spliced clone.
 Returns : A spliced clone of the invocant.
 Args    : Required, an array ref of integers
 Comments: The columns are retained in the order in which they were supplied.
 prune_chars()

Creates a cloned matrix that omits the characters at the supplied (zero-based) indices.
 Type    : Utility method
 Title   : prune_chars
 Usage   : my $clone = $object->prune_chars([6,3,4,1]);
 Function: Creates spliced clone.
 Returns : A spliced clone of the invocant.
 Args    : Required, an array ref of integers
 Comments: The columns are retained in the order in which they were supplied.
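The difference between keeping and pruning columns by index can be seen on a single row of states. The two helpers below are hypothetical string-based stand-ins, not the module's methods; in particular, the assumption that pruning leaves the remaining columns in their original order is mine and is not confirmed by the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only the columns at the given zero-based
# indices, in the order in which the indices were supplied.
sub keep_columns {
    my ( $seq, $indices ) = @_;
    return join '', map { substr $seq, $_, 1 } @$indices;
}

# Hypothetical helper: drops the columns at the given zero-based indices.
# Assumption: the remaining columns stay in their original order.
sub prune_columns {
    my ( $seq, $indices ) = @_;
    my %drop = map { $_ => 1 } @$indices;
    return join '', map { substr $seq, $_, 1 }
        grep { !$drop{$_} } 0 .. length($seq) - 1;
}

print keep_columns( 'ACGTACGT', [ 6, 3, 4, 1 ] ), "\n";     # GTAC
print prune_columns( 'ACGTACGT', [ 6, 3, 4, 1 ] ), "\n";    # AGCT
```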
 prune_invariant()

Creates a cloned matrix that omits the characters for which all taxa have the same state (or missing).
 Type    : Utility method
 Title   : prune_invariant
 Usage   : my $clone = $object->prune_invariant;
 Function: Creates spliced clone.
 Returns : A spliced clone of the invocant.
 Args    : None
 Comments: The columns are retained in the order in which they were supplied.
 prune_uninformative()

Creates a cloned matrix that omits all uninformative characters. Characters are considered uninformative if all non-missing values are either invariant or autapomorphies.
 Type    : Utility method
 Title   : prune_uninformative
 Usage   : my $clone = $object->prune_uninformative;
 Function: Creates spliced clone.
 Returns : A spliced clone of the invocant.
 Args    : None
 Comments: The columns are retained in the order in which they were supplied.
 prune_missing_and_gaps()

Creates a cloned matrix that omits all characters for which the invocant only has missing and/or gap states.
 Type    : Utility method
 Title   : prune_missing_and_gaps
 Usage   : my $clone = $object->prune_missing_and_gaps;
 Function: Creates spliced clone.
 Returns : A spliced clone of the invocant.
 Args    : None
 Comments: The columns are retained in the order in which they were supplied.
 bootstrap()

Creates bootstrapped clone.
 Type    : Utility method
 Title   : bootstrap
 Usage   : my $bootstrap = $object->bootstrap;
 Function: Creates bootstrapped clone.
 Returns : A bootstrapped clone of the invocant.
 Args    : Optional, a subroutine reference that returns a random
           integer between 0 (inclusive) and the argument provided
           to it (exclusive). The default implementation is to use
           sub { int( rand( shift ) ) }; a user might override this
           by providing an implementation with a better random number
           generator.
 Comments: The bootstrapping algorithm uses perl's random number
           generator to create a new series of indices (with
           replacement) of the same length as the original matrix.
           These indices are first sorted, then applied to the
           cloned sequences. Annotations (if present) stay connected
           to the resampled cells.
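The index resampling step and the pluggable generator can be sketched independently of the matrix object. The helper bootstrap_indices is hypothetical; it assumes each index is drawn independently, so the same column may be drawn more than once, and it uses the default generator quoted above when none is supplied.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Phylo implementation): draws $nchar
# column indices using $gen and returns them sorted in ascending order.
sub bootstrap_indices {
    my ( $nchar, $gen ) = @_;

    # default generator: random integer in the range 0 .. $nchar - 1
    $gen ||= sub { int( rand(shift) ) };
    my @indices = map { $gen->($nchar) } 1 .. $nchar;
    return [ sort { $a <=> $b } @indices ];
}

# a deterministic stand-in generator makes the result reproducible
my @draws = ( 4, 0, 4, 2, 0 );
my $fixed = sub { shift @draws };
my $indices = bootstrap_indices( 5, $fixed );
print "@$indices\n";    # 0 0 2 4 4
```

A generator backed by a stronger random source can be passed in the same way, as long as it returns an integer between 0 (inclusive) and its argument (exclusive).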
 jackknife()

Creates jackknifed clone.
 Type    : Utility method
 Title   : jackknife
 Usage   : my $jackknife = $object->jackknife(0.5);
 Function: Creates jackknifed clone.
 Returns : A jackknifed clone of the invocant.
 Args    : * Required, a number between 0 and 1, representing the
             fraction of characters to jackknife.
           * Optional, a subroutine reference that returns a random
             integer between 0 (inclusive) and the argument provided
             to it (exclusive). The default implementation is to use
             sub { int( rand( shift ) ) }; a user might override this
             by providing an implementation with a better random number
             generator.
 Comments: The jackknife algorithm uses perl's random number
           generator to create a new series of indices of cells to keep.
           These indices are first sorted, then applied to the
           cloned sequences. Annotations (if present) stay connected
           to the resampled cells.
 replicate()

Creates simulated replicate.
 Type    : Utility method
 Title   : replicate
 Usage   : my $replicate = $matrix->replicate($tree);
 Function: Creates simulated replicate.
 Returns : A simulated replicate of the invocant.
 Args    : Tree to simulate the characters on.
           Optional:
           -seed           => a random integer seed
           -model          => an object of class
                              Bio::Phylo::Models::Substitution::Dna or
                              Bio::Phylo::Models::Substitution::Binary
           -random_rootseq => start DNA sequence simulation from random
                              ancestral sequence instead of the median
                              sequence in the alignment.
 Comments: Requires Statistics::R, with 'ape', 'phylosim', 'phangorn'
           and 'phytools'. If model is not given as argument, it will
           be estimated.
 insert()

Insert argument in invocant.
 Type    : Listable method
 Title   : insert
 Usage   : $matrix->insert($datum);
 Function: Inserts $datum in $matrix.
 Returns : Modified object
 Args    : A datum object
 Comments: This method re-implements the method by the same
           name in Bio::Phylo::Listable
 compress_lookup()

Removes unused states from lookup table
 Type    : Method
 Title   : compress_lookup
 Usage   : $obj->compress_lookup
 Function: Removes unused states from lookup table
 Returns : $self
 Args    : None
 check_taxa()

Validates taxa associations.
 Type    : Method
 Title   : check_taxa
 Usage   : $obj->check_taxa
 Function: Validates relation between matrix and taxa block
 Returns : Modified object
 Args    : None
 Comments: This method implements the interface method by the
           same name in Bio::Phylo::Taxa::TaxaLinker
 make_taxa()

Creates a taxa block from the object's contents if none exists yet.
 Type    : Method
 Title   : make_taxa
 Usage   : my $taxa = $obj->make_taxa
 Function: Creates a taxa block from the object's contents if none exists yet.
 Returns : $taxa
 Args    : NONE
SERIALIZERS
 to_xml()

Serializes matrix to nexml format.
 Type    : Format convertor
 Title   : to_xml
 Usage   : my $data_block = $matrix->to_xml;
 Function: Converts matrix object into a nexml element structure.
 Returns : Nexml block (SCALAR).
 Args    : Optional:
           -compact => 1 (for compact representation of matrix)
 to_nexus()

Serializes matrix to nexus format.
 Type    : Format convertor
 Title   : to_nexus
 Usage   : my $data_block = $matrix->to_nexus;
 Function: Converts matrix object into a nexus data block.
 Returns : Nexus data block (SCALAR).
 Args    : The following options are available:
           # if set, writes TITLE & LINK tokens
           '-links' => 1
           # if set, writes block as a "data" block (deprecated, but used by mrbayes),
           # otherwise writes "characters" block (default)
           -data_block => 1
           # if set, writes "RESPECTCASE" token
           -respectcase => 1
           # if set, writes "GAPMODE=(NEWSTATE or MISSING)" token
           -gapmode => 1
           # if set, writes "MSTAXA=(POLYMORPH or UNCERTAIN)" token
           -polymorphism => 1
           # if set, writes character labels
           -charlabels => 1
           # if set, writes state labels
           -statelabels => 1
           # if set, writes mesquite-style charstatelabels
           -charstatelabels => 1
           # by default, names for sequences are derived from $datum->get_name, if
           # 'internal' is specified, uses $datum->get_internal_name, if 'taxon'
           # uses $datum->get_taxon->get_name, if 'taxon_internal' uses
           # $datum->get_taxon->get_internal_name, if $key, uses $datum->get_generic($key)
           -seqnames => one of (internal|taxon|taxon_internal|$key)
 to_dom()

Analog to to_xml.
 Type    : Serializer
 Title   : to_dom
 Usage   : $matrix->to_dom
 Function: Generates a DOM subtree from the invocant and
           its contained objects
 Returns : an Element object
 Args    : Optional:
           -compact => 1 : renders characters as sequences,
                           not individual cells
SEE ALSO
There is a mailing list at https://groups.google.com/forum/#!forum/biophylo for any user or developer questions and discussions.
 Bio::Phylo::Taxa::TaxaLinker

This object inherits from Bio::Phylo::Taxa::TaxaLinker, so the methods defined therein are also applicable to Bio::Phylo::Matrices::Matrix objects.
 Bio::Phylo::Matrices::TypeSafeData

This object inherits from Bio::Phylo::Matrices::TypeSafeData, so the methods defined therein are also applicable to Bio::Phylo::Matrices::Matrix objects.
 Bio::Phylo::Manual

Also see the manual: Bio::Phylo::Manual and http://rutgervos.blogspot.com.
CITATION
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63