NAME Mashtree

SYNOPSIS

Helps run a mashtree analysis to make rapid trees for genomes. Please see github.com/lskatz/Mashtree for more information.

mashtree executables

This document covers the Mashtree library, but the highlight of the Mashtree package is the executable `mashtree`. See github.com/lskatz/Mashtree for more information.

Fast method:

    mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

More accurate method:

    mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

Bootstrapping and jackknifing

    mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd
    mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.jackknife.dnd

VARIABLES

$VERSION
$MASHTREE_VERSION (same value as $VERSION)
@fastqExt = qw(.fastq.gz .fastq .fq .fq.gz)
@fastaExt = qw(.fasta .fna .faa .mfa .fas .fsa .fa)
@bamExt = qw(.sorted.bam .bam)
@vcfExt = qw(.vcf.gz .vcf)
@mshExt = qw(.msh)
@richseqExt = qw(.gb .gbank .genbank .gbk .gbs .gbf .embl .ebl .emb .dat .swiss .sp)
$fhStick :shared

Marks whether a file is currently being read, so that Mashtree can limit disk I/O.
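
As an illustration of how extension lists like these can be used, the following is a minimal sketch that guesses a file type from its name. It works on local copies of the lists so that it runs without Mashtree installed; `guess_file_type` and `%ext_for` are hypothetical names, not part of the Mashtree API.

```perl
use strict;
use warnings;

# Local copies of the documented extension lists (assumption: copied by hand,
# not imported from Mashtree).
my %ext_for = (
  fastq   => [qw(.fastq.gz .fastq .fq .fq.gz)],
  fasta   => [qw(.fasta .fna .faa .mfa .fas .fsa .fa)],
  bam     => [qw(.sorted.bam .bam)],
  vcf     => [qw(.vcf.gz .vcf)],
  msh     => [qw(.msh)],
  richseq => [qw(.gb .gbank .genbank .gbk .gbs .gbf .embl .ebl .emb .dat .swiss .sp)],
);

# Hypothetical helper: return the type whose extension list contains a
# suffix of $filename, or undef if no extension matches.
sub guess_file_type {
  my ($filename) = @_;
  for my $type (sort keys %ext_for) {
    for my $ext (@{ $ext_for{$type} }) {
      # Compare the tail of the name literally, so dots are not wildcards
      next if length($filename) < length($ext);
      return $type if substr($filename, -length($ext)) eq $ext;
    }
  }
  return;
}

my $type = guess_file_type("sample1.fastq.gz");
print defined $type ? $type : "unknown", "\n";
```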

METHODS

$SIG{'__DIE__'}

Overrides how `die` works, so that the error message references the caller.
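
A handler of this kind can be sketched as follows. This is not the module's code: the message format and the names `failing_step` and `$err_sketch` are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical subroutine that fails.
sub failing_step { die "could not read input\n" }

my $err_sketch;
{
  # Sketch of a __DIE__ handler that prefixes the error with the name of
  # the subroutine in which die() was called.
  local $SIG{__DIE__} = sub {
    my ($msg) = @_;
    my $caller = (caller(1))[3] // 'main';  # sub that called die()
    die "$caller: $msg";                    # the handler is disabled while it runs
  };
  eval { failing_step(); 1 } or $err_sketch = $@;
}
print STDERR $err_sketch;
```

Using `local` limits the handler to the enclosing block, which keeps the sketch from changing `die` for unrelated code.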

logmsg

Prints a message to STDERR with the thread number and the program name, with a trailing newline.

openFastq

Opens a fastq file in a thread-safe way.

_truncateFilename

Removes the fastq extension and the directory name.

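
The truncation described for `_truncateFilename` can be approximated with the core module File::Basename. This is a sketch under assumptions, not the module's implementation: `truncate_filename_sketch` is a hypothetical name and the extension list is a local copy of the documented `@fastqExt`.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Local copy of the documented fastq extensions.
my @fastq_ext = qw(.fastq.gz .fastq .fq .fq.gz);

# Hypothetical stand-in: drop the directory and a trailing fastq extension.
sub truncate_filename_sketch {
  my ($path) = @_;
  # fileparse treats suffixes as regex patterns, so escape the dots
  my @patterns = map { quotemeta } @fastq_ext;
  my ($name) = fileparse($path, @patterns);
  return $name;
}

print truncate_filename_sketch("/data/run1/sample1.fastq.gz"), "\n";
```

Names without a fastq extension keep their suffix and only lose the directory part.
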
distancesToPhylip

1. Read the mash distances
2. Create a phylip file

Arguments: hash of distances, output directory, settings hash
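
For readers unfamiliar with the format, here is a minimal sketch of turning a hash of distances into a square Phylip matrix. The layout of the input hash (`$dist->{$a}{$b}`), the relaxed name padding, and the name `phylip_matrix_sketch` are assumptions; the real subroutine also takes an output directory and a settings hash, which are left out here.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Phylip distance matrix as a string.
# Assumes $dist->{$a}{$b} (or $dist->{$b}{$a}) holds each pairwise distance.
sub phylip_matrix_sketch {
  my ($dist) = @_;
  my @names = sort keys %$dist;

  # Pad names to a common width (relaxed Phylip, not the strict 10 characters)
  my $width = 0;
  for my $name (@names) {
    $width = length($name) if length($name) > $width;
  }

  # First line: number of taxa. Then one row per taxon.
  my $out = sprintf("%d\n", scalar @names);
  for my $row (@names) {
    my @cells;
    for my $col (@names) {
      my $d = $row eq $col ? 0 : ($dist->{$row}{$col} // $dist->{$col}{$row});
      die "No distance for $row vs $col\n" unless defined $d;
      push @cells, sprintf("%.10f", $d);
    }
    $out .= sprintf("%-${width}s %s\n", $row, join(" ", @cells));
  }
  return $out;
}

my %example = (
  genomeA => { genomeB => 0.01, genomeC => 0.02 },
  genomeB => { genomeC => 0.015 },
  genomeC => {},
);
print phylip_matrix_sketch(\%example);
```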

sortNames

Sorts names.

Arguments:

1. $name - array of names
2. $settings - options
   * $$settings{'sort-order'} is either "abc", "random", or "input-order"
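
The three sort orders can be sketched as below. This is an illustration, not the module's code: `sort_names_sketch` is a hypothetical name, the default of "abc" is an assumption, and the sketch returns an array reference, which may differ from what `sortNames` returns.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical stand-in for sortNames: returns a new array reference
# ordered according to $settings->{'sort-order'}.
sub sort_names_sketch {
  my ($names, $settings) = @_;
  my $order = $settings->{'sort-order'} // 'abc';  # default is an assumption

  if ($order eq 'abc')         { return [ sort { $a cmp $b } @$names ]; }
  if ($order eq 'random')      { return [ shuffle @$names ]; }
  if ($order eq 'input-order') { return [ @$names ]; }
  die "Unknown sort-order: $order\n";
}

my $sorted = sort_names_sketch([qw(c.fastq.gz a.fastq.gz b.fastq.gz)],
                               { 'sort-order' => 'abc' });
print join(" ", @$sorted), "\n";
```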

createTreeFromPhylip($phylip, $outdir, $settings)

Creates a tree file with Quicktree, with BioPerl as a backup.

treeDist($treeObj1, $treeObj2)

Lee's implementation of a tree distance. The objective is to return zero if two trees are the same.

mashDist($file1, $file2, $k, $settings)

Finds the distance between two mash sketch files. Alternatively, the two arguments can be hash lists.

mashHashes($sketch)

Returns an array of hashes, the kmer length, and the estimated genome length.

raw_mash_distance_unequal_sizes($hashes1, $hashes2)

Compare unequal sized hashes. Treat the first set of hashes as the reference (denominator) set.

raw_mash_distance($hashes1, $hashes2)

Returns the number of kmers in common and the total number compared. Inspired by https://github.com/onecodex/finch-rs/blob/master/src/distance.rs#L34
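
The counting step can be sketched as a merge over two sorted hash lists, followed by the standard Mash distance formula D = -1/k * ln(2j / (1 + j)), where j is the Jaccard estimate common/total. This is a simplified illustration rather than the module's code; `raw_counts_sketch` and `mash_distance_sketch` are hypothetical names, and capping the total at the size of the first sketch is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: count hashes shared by two sketches.
# Returns ($common, $total); the total is capped at the size of the first sketch.
sub raw_counts_sketch {
  my ($hashes1, $hashes2) = @_;
  my @s1 = sort { $a <=> $b } @$hashes1;
  my @s2 = sort { $a <=> $b } @$hashes2;

  my ($i, $j, $common, $total) = (0, 0, 0, 0);
  while ($total < @s1 && $i < @s1 && $j < @s2) {
    my $cmp = $s1[$i] <=> $s2[$j];
    if    ($cmp < 0) { $i++ }                      # hash only in sketch 1
    elsif ($cmp > 0) { $j++ }                      # hash only in sketch 2
    else             { $i++; $j++; $common++ }     # shared hash
    $total++;
  }

  # One list ran out early: count the leftovers, up to the sketch size
  if ($total < @s1) {
    $total += (@s1 - $i) + (@s2 - $j);
    $total = @s1 if $total > @s1;
  }
  return ($common, $total);
}

# Mash distance from the shared and total counts, for kmer length $k.
sub mash_distance_sketch {
  my ($common, $total, $k) = @_;
  return 1 if $total == 0 || $common == 0;  # nothing shared
  return 0 if $common == $total;            # identical sketches
  my $jaccard = $common / $total;
  return -1 / $k * log(2 * $jaccard / (1 + $jaccard));
}

my ($common, $total) = raw_counts_sketch([1, 2, 3, 4, 5], [2, 3, 4, 6, 7]);
printf "%d/%d shared, distance %.6f\n",
  $common, $total, mash_distance_sketch($common, $total, 21);
```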

transfer_bootstrap_expectation
 Title   : transfer_bootstrap_expectation
 Usage   : my $tree_with_bs = transfer_bootstrap_expectation(\@bs_trees,$guide_tree);
 Function: Calculates the Transfer Bootstrap Expectation (TBE) for internal nodes based on 
           the methods outlined in Lemoine et al, Nature, 2018.
           Currently experimental.
 Returns : L<Bio::Tree::TreeI>
 Args    : Arrayref of L<Bio::Tree::TreeI>s
           Guide tree, L<Bio::Tree::TreeI>s