NAME

biopop - SNP statistics based on BioPerl

SYNOPSIS

biopop [options] <alignment_file>

biopop [-h | --help | -V | --version | --man]

 biopop -s pop.fas            # num of [s]egregating sites
 biopop -p pop.fas            # average [p]airwise nucleotide difference
 biopop -f pop.fas            # [f]our gamete tests
 biopop -c pop.fas            # [c]oding SNPs
 biopop -n pop.fas            # [n]on-coding SNPs
 biopop -m pop.fas            # [m]is-match distribution
 biopop -b pop.fas            # Retain only [b]inary informative sites

DESCRIPTION

biopop is a population-genetics utility based on BioPerl modules, including Bio::PopGen::Utilities, Bio::PopGen::Statistics, and Bio::PopGen::Population. Most of its methods are not part of BioPerl and have not been validated, so use them with caution.

OPTIONS

--bi-part

Prints a Newick tree for each binary-informative SNP site. This can be used to test site compatibility (recombination), similar to the four-gamete test.

--bi-sites, -b

Prints a FASTA alignment consisting of only binary-informative SNPs.

--bi-sites-for-r

Prints the binary-informative SNPs of each individual as pseudo-diploid genotypes, so that the output can be imported into the R package "genetics" for further analysis.

--distance, -d 'jc|k2|uncorrected|f81|t92|f84|tajimanei'

Prints a distance matrix computed with the specified method (JC by default).
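
As a rough illustration of what the default 'jc' method computes, the sketch below applies the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) to the proportion p of differing sites between two aligned sequences. It is not biopop's code (which relies on BioPerl). The function names, and the choice to skip sites that contain gaps or ambiguous bases, are assumptions made for this example.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: positions where either sequence is not A, C, G or T
    (gaps, N, ...) are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)
```

A full distance matrix would call jc_distance on every pair of sequences in the alignment.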

--four-gametes, -f

Performs the four-gamete test of recombination by Hudson & Kaplan (Genetics. 1985. 111:147-164) and a test of epistasis (Wilson??). It identifies all binary-informative SNPs and prints, one pair of SNPs per line, the site coordinates, the counts of the four possible gametes, the Shannon diversity of haplotypes, and whether the pair is compatible. Two SNPs are incompatible if all four possible haplotypes are present, which indicates recombination. The presence of only two of the four possible haplotypes, on the other hand, indicates a possible epistatic interaction.
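
The pairwise logic of the test can be sketched as follows. This is a minimal illustration, not biopop's implementation. It assumes a gap-free alignment given as a list of equal-length strings, and it assumes that "binary-informative" means a site with exactly two states, each carried by at least two sequences. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def binary_informative_sites(seqs):
    """Indices of columns with exactly two states, each seen in >= 2 sequences.

    Assumption: this is what "binary-informative" means here.
    """
    sites = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        if len(counts) == 2 and min(counts.values()) >= 2:
            sites.append(i)
    return sites

def four_gamete_test(seqs):
    """For each pair of binary-informative sites, count the two-site haplotypes.

    Returns tuples (site_i, site_j, gamete_counts, compatible). A pair is
    compatible when fewer than four distinct gametes are observed.
    """
    results = []
    for i, j in combinations(binary_informative_sites(seqs), 2):
        gametes = Counter((s[i], s[j]) for s in seqs)
        results.append((i, j, dict(gametes), len(gametes) < 4))
    return results
```

With the toy alignment ["AAT", "AGT", "GAC", "GGC"], sites 0 and 2 show only two gametes (compatible), while sites 0 and 1 show all four (incompatible).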

--heterozygosity, -H

Print, for each segregating site, the observed heterozygosity [i.e., 1-sum(freq^2)].
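
The bracketed formula can be written out in a few lines. The sketch below is only an illustration of 1-sum(freq^2) for one alignment column, where freq is the frequency of each allelic state. The function name is hypothetical and gaps are not treated specially.

```python
from collections import Counter

def site_heterozygosity(column):
    """1 - sum(freq^2) over the allelic states observed in one column."""
    n = len(column)
    return 1 - sum((count / n) ** 2 for count in Counter(column).values())
```

For example, a column "AAGG" has two states at frequency 0.5 each, giving 1 - (0.25 + 0.25) = 0.5.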

--input, -i <format>

Input file format. The default is 'FASTA'. biopop now tries to guess the format, so setting this flag is no longer necessary.

--mis-match, -m

Prints the number of pairwise mismatches for all pairs of sequences. The distribution of these counts indicates population age.
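
To make the idea concrete, the sketch below tabulates how many sequence pairs differ at 0, 1, 2, ... sites. It is an illustration only, assuming a gap-free alignment held as a list of equal-length strings. The function name is hypothetical and the output layout is not meant to mirror biopop's.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Map each mismatch count to the number of sequence pairs showing it."""
    counts = Counter(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(seqs, 2)
    )
    return dict(sorted(counts.items()))
```

For ["AAAA", "AAAT", "AATT"], two pairs differ at one site and one pair differs at two sites.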

--pi, -p

Prints the nucleotide diversity (pi), i.e., the average pairwise nucleotide difference among the sequences.
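
The quantity is simple to compute by hand for small alignments. The following sketch averages the number of differing sites over all sequence pairs. The per_site switch, which divides by alignment length, is an assumption added for illustration and says nothing about how biopop scales its output.

```python
from itertools import combinations

def pairwise_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def nucleotide_diversity(seqs, per_site=False):
    """Average pairwise difference over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    # Hypothetical option: report the value per alignment column.
    return mean_diff / len(seqs[0]) if per_site else mean_diff
```

For ["AAAA", "AAAT", "AATT"] the pairwise differences are 1, 2 and 1, so the average is 4/3.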

--seg-sites, -s

Prints the number of segregating sites.

--snp-coding, -c

For a coding alignment, identifies each 2-state SNP and prints its codon position, aligned nucleotide position, syn/nonsyn status, the frequency of each allelic state, and the Shannon diversity.

--snp-coding-long, -C

Prints the output of --snp-coding in long format.

--snp-noncoding, -n

Identifies each 2-state SNP and prints its position, its states, the frequency of each allelic state, and the Shannon diversity.
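
The per-SNP values can be sketched as below. This is an illustration rather than biopop's code: the natural logarithm in the Shannon diversity, the 1-based positions, and the function names are all assumptions.

```python
import math
from collections import Counter

def shannon_diversity(freqs):
    """Shannon diversity -sum(p * ln p); natural log is an assumption."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def two_state_snps(seqs):
    """List (position, state frequencies, Shannon diversity) per 2-state site.

    Positions are 1-based here, which is an assumption.
    """
    rows = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        if len(counts) != 2:
            continue
        freqs = {state: c / len(seqs) for state, c in sorted(counts.items())}
        rows.append((pos + 1, freqs, shannon_diversity(freqs.values())))
    return rows
```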

--stats, -t <comma separated list of values>

Specifies the statistics ('pi', 'theta', 'tajima_d', per-site values) to gather from the input data. For example, "theta,pi" calculates the theta and pi values.

The option can also be given multiple times, e.g., biopop --stats=pi --stats=theta
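
This page does not say which estimator 'theta' refers to. As a sketch under the assumption that it is Watterson's estimator, theta can be derived from the number of segregating sites S as S / a_n, where a_n = 1 + 1/2 + ... + 1/(n-1) for n sequences. The function names are hypothetical.

```python
def segregating_sites(seqs):
    """Number of alignment columns with more than one state."""
    return sum(len({s[i] for s in seqs}) > 1 for i in range(len(seqs[0])))

def watterson_theta(seqs):
    """Watterson's theta = S / a_n (assumed meaning of 'theta' here)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n
```

For ["AAAA", "AAAT", "AATT"], S is 2 and a_n is 1.5, so theta is 4/3.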

Common Options

--help, -h

Print a brief help message and exit.

--man

Print the manual page and exit.

--version, -V

Print current release version and exit.

SEE ALSO

CONTRIBUTORS

  • Yözen Hernández <yzhernand at gmail dot com> (initial design & implementation)

  • Weigang Qiu <weigang@genectr.hunter.cuny.edu> (Maintainer)

  • Rocky Bernstein (testing & release)

TO DO

  • Clean up and refactor the PopManipulation code (e.g., factor out shared variables and subroutines)

  • Move dist methods to bioaln

  • Add multiple-loci (pop-genome) capabilities

  • Add outgroup-based statistics, e.g., mk, iHS

  • Add KaKs statistics

TO CITE

  • Hernandez, Bernstein, Qiu, et al (2017). "BpWrappers: Command-line utilities for manipulation of sequences, alignments, and phylogenetic trees based on BioPerl". (In prep).

  • Stajich et al (2002). "The BioPerl Toolkit: Perl Modules for the Life Sciences". Genome Research 12(10):1611-1618.