

Proch::N50 - a small module to calculate the N50 (along with total size and total number of sequences) of a FASTA or FASTQ file. It is easy to install and has minimal dependencies.


version 1.3.0


  use Data::Dumper;
  use Proch::N50 qw(getStats getN50 jsonStats);
  my $filepath = '/path/to/assembly.fasta';

  # Get N50 only: getN50(file) will return an integer
  print "N50 only:\t", getN50($filepath), "\n";

  # Full stats
  my $seq_stats = getStats($filepath);
  print Data::Dumper->Dump( [ $seq_stats ], [ qw(*FASTA_stats) ] );
  # Will print:
  # %FASTA_stats = (
  #               'N50' => 65,
  #               'N75' => 50,
  #               'N90' => 4,
  #               'min' => 4,
  #               'max' => 65,
  #               'dirname' => 'data',
  #               'auN' => 45.02112,
  #               'size' => 130,
  #               'seqs' => 6,
  #               'filename' => 'test.fa',
  #               'status' => 1
  #             );

  # Also get the stats as a JSON string
  my $seq_stats_with_JSON = getStats($filepath, 'JSON');
  print $seq_stats_with_JSON->{json}, "\n";
  # Will print:
  # {
  #    "status" : 1,
  #    "seqs" : 6,
  #    <...>
  #    "filename" : "small_test.fa",
  #    "N50" : 65
  # }
  # Ask directly for the JSON string only:
  my $json = jsonStats($filepath);
  print $json;



getN50(filepath)

This function returns the N50 of the given FASTA/FASTQ file, or 0 in case of error(s).
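For reference, the usual definition of N50 is: sort the sequence lengths from longest to shortest and take the length at which the running total first reaches half of the total size (N75 and N90 use 0.75 and 0.9 instead of 0.5). The worked example below uses four made-up lengths (10, 8, 5, 2 bp); how Proch::N50 handles edge cases such as empty files is not covered here.

```latex
L_1 \ge L_2 \ge \dots \ge L_n, \qquad S = \sum_{i=1}^{n} L_i

\mathrm{N50} = L_k, \qquad k = \min\Big\{ j : \sum_{i=1}^{j} L_i \ge \tfrac{S}{2} \Big\}

\text{Example: } L = (10, 8, 5, 2),\ S = 25,\ \tfrac{S}{2} = 12.5;\quad
10 < 12.5,\ \ 10 + 8 = 18 \ge 12.5 \ \Rightarrow\ \mathrm{N50} = 8
```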

getStats(filepath, alsoJSON)

Calculates the N50 and basic statistics for <filepath>. If invoked with a second parameter, the result also includes a JSON string. This function returns a hash reference with the following keys:

size (int)

total number of bases (bp) in the file

N50, N75, N90 (int)

the actual N50, N75, and N90 metrics

auN (float)

the area under the Nx curve, as described in https://lh3.github.io/2020/04/08/a-new-metric-on-assembly-contiguity. Returned with 5 decimal digits.

min (int)

minimum sequence length observed in the FASTA/Q file

max (int)

maximum sequence length observed in the FASTA/Q file

seqs (int)

total number of sequences in the file

filename (string)

file basename of the input file

dirname (string)

name of the directory containing the input file (as received)

path (string)

name of the directory containing the input file (resolved to its absolute path)

json (string: JSON pretty printed)

(pretty printed) JSON string of the object (only if a second parameter was passed and JSON::PP is installed)
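The numeric keys listed above can be reproduced from a plain list of sequence lengths. The sketch below is not the module's code: `nx_from_lengths` and `stats_from_lengths` are hypothetical helpers, and auN is assumed to be the sum of squared lengths divided by the total length, as defined in the post linked above, rounded to 5 decimal digits.

```perl
use strict;
use warnings;
use List::Util qw(sum max min);

# Hypothetical helper (not part of Proch::N50): Nx for a list of lengths.
# $x is a percentage, e.g. 50 for N50, 90 for N90.
sub nx_from_lengths {
    my ($x, @lengths) = @_;
    my $total = sum(0, @lengths);
    return 0 unless $total;
    my $threshold  = $total * $x / 100;
    my $cumulative = 0;
    for my $len (sort { $b <=> $a } @lengths) {    # longest first
        $cumulative += $len;
        return $len if $cumulative >= $threshold;
    }
    return 0;    # not reached for non-empty input
}

# Hypothetical helper: builds a hash reference shaped like the keys above
# (file-related keys such as filename/dirname/path are left out).
sub stats_from_lengths {
    my @lengths = @_;
    my $size    = sum(0, @lengths);
    my %stats   = (
        seqs => scalar(@lengths),
        size => $size,
        min  => (@lengths ? min(@lengths) : 0),
        max  => (@lengths ? max(@lengths) : 0),
        N50  => nx_from_lengths(50, @lengths),
        N75  => nx_from_lengths(75, @lengths),
        N90  => nx_from_lengths(90, @lengths),
        # auN = sum(L^2) / sum(L), formatted with 5 decimal digits
        auN  => ($size ? sprintf('%.5f', sum(map { $_ * $_ } @lengths) / $size) : 0),
    );
    return \%stats;
}
```

For example, `stats_from_lengths(10, 8, 5, 2)` gives N50 8, N75 5, N90 5 and auN 7.72000 (193 / 25). Sorting once per Nx value keeps the sketch short; a real implementation would more likely sort once and read all thresholds in a single pass.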


jsonStats(filepath)

Returns the JSON string with basic stats (same as $result->{json} from getStats(filepath, 'JSON')). Requires JSON::PP to be installed.
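A pretty-printed JSON string like the one returned by jsonStats can be produced from a stats hash with JSON::PP, a core module. This is a minimal sketch: the hash values are placeholders, and the use of `canonical` (sorted keys) is an assumption; the sample output in the synopsis suggests the module does not sort keys.

```perl
use strict;
use warnings;
use JSON::PP;

# Placeholder stats hash for illustration; not real module output.
my %stats = (
    N50      => 8,
    seqs     => 4,
    size     => 25,
    filename => 'example.fa',
    status   => 1,
);

# pretty() adds indentation and newlines; canonical() sorts the keys
# so the output is stable between runs.
my $json = JSON::PP->new->pretty->canonical->encode(\%stats);
print $json;
```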

_n50fromHash(hash, totalsize)

This is an internal helper subroutine that performs the actual N50 calculation, which is why it is included in this documentation. It expects a reference to a hash of sizes ($size{SIZE} = COUNT) and the total sum of sizes obtained by parsing the sequence file. It returns the N50, min and max lengths.
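The size-histogram approach described above can be sketched as follows. `n50_from_hash` is a hypothetical stand-in written for illustration, not the module's internal `_n50fromHash`; it assumes the hash maps each length to the number of sequences with that length and returns (N50, min, max) in the order described above.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only.
# $sizes: hash reference, LENGTH => COUNT (sequences with that length)
# $total: sum of all sequence lengths
sub n50_from_hash {
    my ($sizes, $total) = @_;
    my @lengths = sort { $a <=> $b } keys %$sizes;    # ascending lengths
    return (0, 0, 0) unless @lengths && $total;
    my ($min, $max) = ($lengths[0], $lengths[-1]);
    my $cumulative = 0;
    # Walk from the longest length down, adding LENGTH * COUNT at each
    # step, until half of the total size is covered.
    for my $len (reverse @lengths) {
        $cumulative += $len * $sizes->{$len};
        return ($len, $min, $max) if $cumulative >= $total / 2;
    }
    return (0, $min, $max);    # only reached if $total was too large
}
```

Counting lengths in a hash instead of storing every length keeps memory proportional to the number of distinct lengths, which matters for FASTQ files with millions of reads of similar size.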


Module (N50.pm)

FASTX::Reader (required)
JSON::PP, File::Basename (core modules)

Implementation (n50.pl)


(optional) when using --format JSON


(optional) when using --format screen. This might be replaced by a different module in the future.
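Some dependencies of n50.pl are optional and only needed for specific --format values. A common Perl way to handle this, shown here as a sketch and not necessarily what n50.pl does, is to try loading the module at runtime with `require` inside `eval`. `has_module` is a hypothetical helper, and JSON::PP is used only as an example module name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
# Checking at runtime means the script only fails when a feature that
# needs the optional module is actually requested.
sub has_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $format = 'JSON';    # e.g. the value given to --format
if ($format eq 'JSON' && !has_module('JSON::PP')) {
    die "--format JSON requires an optional JSON module\n";
}
```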


Andrea Telatin <andrea@telatin.com>


This software is Copyright (c) 2018-2020 by Andrea Telatin.

This is free software, licensed under:

  The MIT (X11) License