The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Bio::Phylo::EvolutionaryModels - Evolutionary models for phylogenetic trees and methods to sample these Klaas Hartmann, September 2007

SYNOPSIS

 #For convenience we import the sample routine (so we can write sample(...) instead of
 #Bio::Phylo::EvolutionaryModels::sample(...).
 use Bio::Phylo::EvolutionaryModels qw (sample);
 
 #Example#A######################################################################
 #Simulate a single tree with ten species from the constant rate birth model with parameter 0.5
 my $tree = Bio::Phylo::EvolutionaryModels::constant_rate_birth(birth_rate => .5, tree_size => 10);
 
 #Example#B######################################################################
 #Sample 5 trees with ten species from the constant rate birth model using the b algorithm
 my ($sample,$stats) = sample(sample_size =>5,
                              tree_size => 10,
                              algorithm => 'b',
                              algorithm_options => {rate => 1},
                              model => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth,
                              model_options => {birth_rate=>.5});

                              
 #Print a newick string for the 4th sampled tree                              
 print $sample->[3]->to_newick."\n";            
 
 #Example#C######################################################################
 #Sample 5 trees with ten species from the constant rate birth and death model using 
 #the bd algorithm and two threads (useful for dual core processors)
 #NB: we must specify an nstar here, an appropriate choice will depend on the birth_rate
 #    and death_rate we are giving the model    
               
 my ($sample,$stats) = sample(sample_size =>5,
                              tree_size => 10,
                              threads => 2,
                              algorithm => 'bd',
                              algorithm_options => {rate => 1, nstar => 30},
                              model => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth_death,
                              model_options => {birth_rate=>1,death_rate=>.8});
                               
 #Example#D######################################################################
 #Sample 5 trees with ten species from the constant rate birth and death model using 
 #incomplete taxon sampling
 #
 #sampling_probability is set so that the true tree has 10 species with 50% probability,
 #11 species with 30% probability and 12 species with 20% probability
 #
 #NB: we must specify an mstar here this will depend on the model parameters and the 
 #    incomplete taxon sampling parameters

 my $algorithm_options = {rate => 1, 
                          nstar => 30, 
                          mstar => 12,     
                          sampling_probability => [.5, .3, .2]};
                   
 my ($sample,$stats) = sample(sample_size =>5,
                              tree_size => 10,
                              algorithm => 'incomplete_sampling_bd',
                              algorithm_options => $algorithm_options,
                              model => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth_death,
                              model_options => {birth_rate=>1,death_rate=>.8});

 #Example#E######################################################################
 #Sample 5 trees with ten species from a Yule model using the memoryless_b algorithm
 
 #First we define the random function for the shortest pendant edge for a Yule model
 my $random_pendant_function = sub { 
     %options = @_;
     return -log(rand)/$options{birth_rate}/$options{tree_size};
 };
 
 #Then we produce our sample
 my ($sample,$stats) = sample(sample_size =>5,
                              tree_size => 10,
                              algorithm => 'memoryless_b',
                              algorithm_options => {pendant_dist => $random_pendant_function},
                              model => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth,
                              model_options => {birth_rate=>1});

 #Example#F#######################################################################
 #Sample 5 trees with ten species from a constant birth death rate model using the 
 #constant_rate_bd algorithm
 my ($sample) = sample(sample_size => 5,
                       tree_size => 10,
                       algorithm => 'constant_rate_bd',
                       model_options => {birth_rate=>1,death_rate=>.8});

DESCRIPTION

This package contains evolutionary models for phylogenetic trees and algorithms for sampling from these models. It is a non-OO module that optionally exports the 'sample', 'constant_rate_birth' and 'constant_rate_birth_death' subroutines into the caller's namespace, using the use Bio::Phylo::EvolutionaryModels qw(sample constant_rate_birth constant_rate_birth_death); directive. Alternatively, you can call the subroutines as class methods, as in the synopsis.

The initial set of algorithms available in this package corresponds to those in:

Sampling trees from evolutionary models Klaas Hartmann, Dennis Wong, Tanja Gernhard Systematic Biology, in press

Some comments and code refers back to this paper. Further algorithms and evolutionary are encouraged and welcome.

To make this code as straightforward as possible to read some of the algorithms have been implemented in a less than optimal manner. The code also follows the structure of an earlier version of the manuscript so there is some redundancy (eg. the birth algorithm is just a specific instance of the birth_death algorithm)

SAMPLING

All sampling algorithms should be accessed through the generic sample interface.

Generic sampling interface: sample()

 Type    : Interface
 Title   : sample
 Usage   : see SYNOPSIS
 Function: Samples phylogenetic trees from an evolutionary model
 Returns : A sample of phylogenetic trees and statistics from the
           sampling algorithm
 Args    : Sampling parameters in a hash

This method acts as a gateway to the various sampling algorithms. The argument is a single hash containing the options for the sampling run.

Sampling parameters (* denotes optional parameters):

 sample_size    The number of trees to return (more trees may be returned)  
 tree_size      The size that returned trees should be
 model          The evolutionary model (should be a function reference)
 model_options  A hash pointer for model options (see individual models)
 algorithm      The algorithm to use (omit the preceding sample_)
 algorithm_options A hash pointer for options for the algorithm (see individual algorithms for details)
 threads*       The number of threads to use (default is 1)
 output_format* Set to newick for newick trees (default is Bio::Phylo::Forest::Tree)
 remove_extinct Set to true to remove extinct species

Available algorithms (algorithm names in the paper are given in brackets):

 b                       For all pure birth models (simplified GSA)
 bd                      For all birth and death models (GSA)
 incomplete_sampling_bd  As above, with incomplete taxon sampling (extended GSA)
 memoryless_b            For memoryless pure birth models (PBMSA)
 constant_rate_bd        For birth and death models with constant rates (BDSA)

Model

If you create your own model it must accept an options hash as its input. This options hash can contain any parameters you desire. Your model should simulate a tree until it becomes extinct or the size/age limit as specified in the options has been reached. Respectively these options are tree_size and tree_age.

Multi-threading

Multi-thread support is very simplistic. The number of threads you specify are created and each is assigned the task of finding sample_size/threads samples. I had problems with using Bio::Phylo::Forest::Tree in a multi- threaded setting. Hence the sampled trees are returned as newick strings to the main routine where (if required) Tree objects are recreated from the strings. For most applications this overhead seems negligible in contrast to the sampling times.

From a code perspective this function (sample):

 Checks input arguments
 Handles multi-threading
 Calls the individual algorithms to perform sampling
 Reformats data

Sampling algorithms

These algorithms should be accessed through the sampling interface (sample()). Additional parameters need to be passed to these algorithms as described for each algorithm.

sample_b()

Sample from any birth model

 Type    : Sampling algorithm
 Title   : sample_b
 Usage   : see sample
 Function: Samples trees from a pure birth model
 Returns : see sample
 Args    : %algorithm_options requires the field:
           rate => sampling rate 
sample_bd()

Sample from any birth and death model for which nstar exists

 Type    : Sampling algorithm
 Title   : sample_bd
 Usage   : see sample
 Function: Samples trees from a birth and death model
 Returns : see sample
 Args    : %algorithm_options requires the fields:
           nstar => once a tree has nstar species there should be
           a negligible chance of returning to tree_size species
           rate => sampling rate 
sample_incomplete_sampling_bd()

Sample from any birth and death model with incomplete taxon sampling

 Type    : Sampling algorithm
 Title   : sample_incomplete_sampling_bd
 Usage   : see sample
 Function: Samples trees from a birth and death model with incomplete taxon sampling
 Returns : see sample
 Args    : %algorithm_options requires the fields:
           rate => sampling rate 
           nstar => once a tree has nstar species there should be
           a negligible chance of returning to mstar species
           mstar => trees with more than mstar species form a negligible 
           contribution to the final sample.
           sampling_probability => see below.
           

sampling_probability

 vector: must have length (mstar-tree_size+1) The ith element gives the probability
         of not sampling i species.             
 scalar: the probability of sampling any individual species. Is used to calculate
         a vector as discussed in the paper.
sample_memoryless_b()

Sample from a memoryless birth model

 Type    : Sampling algorithm
 Title   : sample_memoryless_b
 Usage   : see sample
 Function: Samples trees from a memoryless birth model
 Returns : see sample
 Args    : %algorithm_options with fields:
           pendant_dist => function reference for generating random
           shortest pendant edges 

NB: The function pointed to by pendant_dist is given model_options as it's input argument with an added field tree_size. It must return a random value from the probability density for the shortest pendant edges.

sample_constant_rate_bd()

Sample from a constant rate birth and death model

 Type    : Sampling algorithm
 Title   : sample_constant_rate_bd
 Usage   : see sample
 Function: Samples trees from a memoryless birth model
 Returns : see sample
 Args    : no specific algorithm options but see below

NB: This algorithm only applies to constant rate birth and death processes. Consequently a model does not need to be specified (and will be ignored if it is). But birth_rate and death_rate model options must be given.

EVOLUTIONARY MODELS

All evolutionary models take a options hash as their input argument and return a Bio::Phylo::Forest::Tree. This tree may contain extinct lineages (lineages that end prior to the end of the tree).

The options hash contains any model specific parameters (see the individual model descriptions) and one or both terminating conditions: tree_size => the number of extant species at which to terminate the tree tree_age => the age of the tree at which to terminate the process

Note that if the model stops due to the tree_size condition then the tree ends immediately after the speciation event that created the last species.

constant_rate_birth()

A constant rate birth model (Yule/ERM)

 Type    : Evolutionary model
 Title   : constant_rate_birth
 Usage   : $tree = constant_rate_birth(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           birth_rate The birth rate parameter (default 1)
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified           
external_model()

A dummy model that takes as input a set of newick_trees and randomly samples these.

 Type    : Evolutionary model
 Title   : external_model
 Usage   : $tree = $external_model(%options)
 Function: Returns a random tree that was given as input
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           trees      An array of newick strings. One of these is returned at random.

 NB: The usual parameters tree_size and tree_age will be ignored. When sampling 
     using this model the trees array must contain trees adhering to the requirements
     of the sampling algorithm. This is NOT checked automatically.
constant_rate_birth_death()

A constant rate birth and death model

 Type    : Evolutionary model
 Title   : constant_rate_birth_death
 Usage   : $tree = constant_rate_birth_death(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           birth_rate The birth rate parameter (default 1)
           death_rate The death rate parameter (default no extinction)
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified           
diversity_dependent_speciation()

A birth and death model with speciation rate dependent on diversity as per Etienne et. al. 2012

 Type    : Evolutionary model
 Title   : diversity_dependent_speciation
 Usage   : $tree = diversity_dependent_speciation(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           maximal_birth_rate The maximal birth rate parameter (default 1)
           death_rate The death rate parameter (default no extinction)
           K_dash     The modified carrying capacity (no default)
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified           

Reference: Rampal S. Etienne, Bart Haegeman, Tanja Stadler, Tracy Aze, Paul N. Pearson, Andy Purvis and Albert B. Phillimore. "Diversity-dependence brings molecular phylogenies closer to agreement with the fossil record" doi: 10.1098/rspb.2011.1439

constant_rate_birth_death()

A temporal shift birth and death model

 Type    : Evolutionary model
 Title   : temporal_shift_birth_death
 Usage   : $tree = constant_rate_birth_death(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           birth_rates The birth rates 
           death_rates The death rates
           rate_times  The times after which the rates apply (first element must be 0)
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified           
evolving_speciation_rate()

An evolutionary model featuring evolving speciation rates. Each daughter species is assigned its parent's speciation rate multiplied by a normally distributed noise factor.

 Type    : Evolutionary model
 Title   : evolving_speciation_rate
 Usage   : $tree = evolving_speciation_rate(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           birth_rate The initial speciation rate (default 1)
           evolving_std The standard deviation of the normal distribution 
                      from which the rate multiplier is drawn.
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified           
clade_shifts()

A constant rate birth-death model with punctuated changes in the speciation and extinction rates. At each change one lineage receives new pre-specified speciation and extinction rates.

 Type    : Evolutionary model
 Title   : clade_shifts
 Usage   : $tree = clade_shifts(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           birth_rates The speciation rates
           death_rates The death rates
           rate_times  The times at which the rates are introduced to a new
             clade. The first time should be zero. The remaining must be in 
             ascending order.
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified           
beta_binomial()

An evolutionary model featuring evolving speciation rates. From Blum2007

 Type    : Evolutionary model
 Title   : beta_binomial
 Usage   : $tree = beta_binomial(%options)
 Function: Produces a tree from the model terminating at a given size/time
 Returns : Bio::Phylo::Forest::Tree
 Args    : %options with fields:
           birth_rate The initial speciation rate (default 1)
           model_param The parameter as defined in Blum2007
           tree_size  The size of the tree at which to terminate
           tree_age   The age of the tree at which to terminate

 NB: At least one of tree_size and tree_age must be specified