Bio::Kmer - Helper module for Kmer Analysis.


A module for helping with kmer analysis.

  use strict;
  use warnings;
  use Bio::Kmer;
  my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
  my $kmerHash=$kmer->kmers();
  my $countOfCounts=$kmer->histogram();

The BioPerl way

  use strict;
  use warnings;
  use Bio::SeqIO;
  use Bio::Kmer;

  # Load up any Bio::SeqIO object. Quality values will be
  # faked internally to help with compatibility even if
  # a fastq file is given.
  my $seqin = Bio::SeqIO->new(-file=>"input.fasta");
  my $kmer=Bio::Kmer->new($seqin);
  my $kmerHash=$kmer->kmers();
  my $countOfCounts=$kmer->histogram();


A module for helping with kmer analysis. The basic methods count kmers and can produce a count of counts (a histogram of kmer frequencies). Currently this module only supports fastq format. Although this module can count kmers in pure Perl, it is recommended to select a different kmer counter such as Jellyfish with the kmercounter option.
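To make the idea of kmer counting concrete, here is a minimal pure-Perl sketch. It is not the module's implementation: the helper name count_kmers is hypothetical, the input is plain sequence strings rather than a fastq file, and reverse complements and quality values are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Kmer: count every
# overlapping kmer of length $k in a list of sequence strings.
sub count_kmers {
  my ($k, @seqs) = @_;
  my %count;
  for my $seq (@seqs) {
    my $last = length($seq) - $k;   # last valid start position
    for my $i (0 .. $last) {        # empty range if $seq is shorter than $k
      $count{ substr($seq, $i, $k) }++;
    }
  }
  return \%count;
}

my $kmerHash = count_kmers(3, "ACGTACG", "CGTA");
printf "%s\t%d\n", $_, $$kmerHash{$_} for sort keys %$kmerHash;
```

The result has the same shape as the hash reference described for kmers(): kmer strings as keys, counts as values.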



Boolean describing whether the module instance is using threads.


Bio::Kmer->new($filename, \%options)

Create a new instance of the kmer counter. One object per file.

  Filename can be either a file path or a Bio::SeqIO object.

  Applicable arguments for \%options:
  Argument     Default    Description
  kmercounter  perl       What kmer counter software to use.
                          Choices: Perl, Jellyfish.
  kmerlength|k 21         Kmer length
  numcpus      1          Number of CPUs to use. The pure
                          Perl counter uses Perl
                          multithreading; otherwise the
                          value is passed to other software
                          such as Jellyfish.
  gt           1          If a kmer's count is less than
                          this, ignore the kmer. This
                          might help speed analysis if you
                          do not care about low-count kmers.
  sample       1          Retain only a fraction of kmers.
                          1 is 100%; 0 is 0%.
                          Only works with the perl kmer counter.
  verbose      0          Print more messages.

  my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
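As an illustration of what the gt option describes, the sketch below filters a hash of counts by a minimum count. It is one reading of the description above (kmers with a count less than the threshold are dropped), not the module's code; the counts and variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical counts; in practice these would come from a kmer counter.
my %count = (ACGTA => 5, CGTAC => 1, GTACG => 2);
my $gt    = 2;   # stands in for the gt option

# Ignore kmers whose count is less than $gt.
my %kept = map  { ($_ => $count{$_}) }
           grep { $count{$_} >= $gt }
           keys %count;

print join(",", sort keys %kept), "\n";
```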

Returns the number of base pairs counted. In some cases, such as when counting with Jellyfish, that number is not calculated directly; instead it is derived from the total length of the kmers. Internally, this number is stored as $kmer->{_ntcount}.

Note: internally runs $kmer->histogram() if $kmer->{_ntcount} has not already been set.

  Arguments: None
  Returns:   integer

Count kmers. This method is called automatically by new(), so you should never need to call it yourself. Internally caches the kmer counts in RAM.

  Arguments: None
  Returns:   None

Clears kmer counts and histogram counts. You should probably never use this method.

  Arguments: None
  Returns:   None

Query the set of kmers for the count of a single kmer.

  Arguments: query (string)
  Returns:   Count of the queried kmer.
              0 indicates that the kmer was not found.
             -1 indicates an invalid kmer (e.g., invalid length)
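The return convention above can be sketched in a few lines of pure Perl. This is an assumption-laden illustration, not the module's code: the function name query_kmer and its argument order are hypothetical, and the only validity check shown is the kmer length.

```perl
use strict;
use warnings;

# Hypothetical lookup mirroring the documented return values:
# the count if found, 0 if absent, -1 if the query length is wrong.
sub query_kmer {
  my ($counts, $k, $query) = @_;
  return -1 if length($query) != $k;   # invalid kmer
  return $$counts{$query} // 0;        # 0 when the kmer was not seen
}

my %counts = (ACG => 2, CGT => 1);
print query_kmer(\%counts, 3, "ACG"),  "\n";
print query_kmer(\%counts, 3, "TTT"),  "\n";
print query_kmer(\%counts, 3, "ACGT"), "\n";
```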

$kmer->histogram()

Count the frequency of kmers (a count of counts). Internally caches the histogram in RAM.

  Arguments: None
  Returns:   Reference to an array of counts. The array
             index is the frequency and the value is the
             number of kmers with that frequency.
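A count of counts can be built from any hash of kmer counts. The sketch below shows one way to do it in pure Perl; the function name count_of_counts is hypothetical, and filling unused frequencies with 0 is a choice made for this example rather than documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of kmer => count into an
# array where index = frequency and value = number of kmers.
sub count_of_counts {
  my ($counts) = @_;
  my @hist;
  $hist[$_]++ for values %$counts;
  @hist = map { $_ // 0 } @hist;   # frequencies never seen become 0
  return \@hist;
}

my %counts = (ACG => 2, CGT => 2, GTA => 2, TAC => 1);
my $hist   = count_of_counts(\%counts);
print join(",", @$hist), "\n";
```

Here one kmer occurs once and three kmers occur twice, so index 1 holds 1 and index 2 holds 3.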

$kmer->kmers()

Return the actual kmers and their counts.

  Arguments: None
  Returns:   Reference to a hash of kmers and their counts

Finds the union of two sets of kmers.

  Arguments: Another Bio::Kmer object
  Returns:   List of kmers

Finds the intersection of two sets of kmers.

  Arguments: Another Bio::Kmer object
  Returns:   List of kmers

Finds the set of kmers unique to this Bio::Kmer object.

  Arguments: Another Bio::Kmer object
  Returns:   List of kmers
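The three set operations above can be pictured as operations on the keys of two kmer hashes. The following is a minimal sketch under that assumption, using plain hashes with made-up counts instead of Bio::Kmer objects; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical kmer counts for two samples.
my %first  = (ACG => 2, CGT => 1, GTA => 1);
my %second = (CGT => 3, GTA => 1, TTT => 4);

# Union: every kmer present in either set, listed once.
my %seen;
my @union = sort grep { !$seen{$_}++ } (keys %first, keys %second);

# Intersection: kmers present in both sets.
my @intersection = sort grep { exists $second{$_} } keys %first;

# Subtraction: kmers unique to the first set.
my @unique = sort grep { !exists $second{$_} } keys %first;

print "union:        @union\n";
print "intersection: @intersection\n";
print "unique:       @unique\n";
```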

Cleans the temporary directory and removes this object from RAM. Useful when you are counting kmers for many inputs and want to keep your overhead low.

  Arguments: None
  Returns:   1


MIT license. Go nuts.


Author: Lee Katz

For additional help, go to

CPAN module at