Bio::Minimizer - minimizer package
Based on the ideas put forth by Roberts et al 2004: https://academic.oup.com/bioinformatics/article/20/18/3363/202143
my $minimizer = Bio::Minimizer->new($sequenceString); my $kmers = $minimizer->{kmers}; # hash of minimizer => kmer my $minimizers= $minimizer->{minimizers};# hash of minimizer => [kmer1,kmer2,...] # hash of minimizer => [start1,start2,...] # Start coordinates are on the fwd strand even when # matched against the rev strand. my $starts = $minimizer->{starts}; # With more options my $minimizer2= Bio::Minimizer->new($sequenceString,{k=>31,l=>21});
Creates a set of minimizers from sequence
example: Sort a fastq file by minimizer, potentially shrinking gzip size.
This is implemented in this package's scripts/sort*.pl scripts.
use Bio::Minimizer # Read fastq file via stdin, in this example while(my $id = <>){ # Grab an entry ($seq,$plus,$qual) = (scalar(<>), scalar(<>), scalar(<>)); chomp($id,$seq,$plus,$qual); # minimizer object $MINIMIZER = Bio::Minimizer->new($seq,{k=>length($seq)}); # The only minimizer in this entry because k==length(seq) $minMinimizer = (values(%{$$MINIMIZER{minimizers}}))[0]; # combine the minimum minimizer with the entry, for # sorting later. # Save the entry as a string so that we don't have to # parse it later. my $entry = [$minMinimizer, "$id\n$seq\n$plus\n$qual\n"]; push(@entry,$entry); } for my $e(sort {$$a[0] cmp $$b[0]} @entry){ print $$e[1]; }
Arguments: sequence A string of ACGT settings A hash k Kmer length l Minimizer length (some might call it lmer) numcpus Number of threads to use. (not used)
To install Bio::Minimizer, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Bio::Minimizer
CPAN shell
perl -MCPAN -e shell install Bio::Minimizer
For more information on module installation, please visit the detailed CPAN module installation guide.