original_n50_algorithm - A program to calculate N50 from FASTA/FASTQ files, used as template for Proch::N50 [n50_algorithm.pl]


Andrea Telatin <andrea.telatin@quadram.ac.uk>


This program parses a list of FASTA/FASTQ files and calculates, for each one, the number of sequences, the sum of sequence lengths, and the N50. Results can be printed in several formats. By default, only the N50 is printed for a single file, and all metrics are printed in TSV format for multiple files.
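
As an illustration of the three metrics, here is a minimal Python sketch of the usual N50 definition: the length of the sequence at which the cumulative sum of lengths, walking from longest to shortest, first reaches half of the total size. The function names n50 and summarize are hypothetical, and the script's own handling of ties and empty files is not documented here, so treat this as an assumption and not as the script's implementation.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest and stop as soon as
    # the cumulative length covers at least half of the total size.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # assumption: an empty input yields 0


def summarize(lengths):
    """Collect the three metrics reported per file."""
    return {"seqs": len(lengths), "size": sum(lengths), "N50": n50(lengths)}
```

For example, summarize([100, 190]) describes a file with 2 sequences and a total size of 290 bp.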


n50_algorithm.pl [options] [FILE1 FILE2 FILE3...]


-f, --format

Output format: default, tsv, json, custom. See below for format-specific switches.

-s, --separator

Separator to be used in 'tsv' output. Default: tab. The 'tsv' format prints a header line, followed by one line per input file containing: the file path (as received), the total number of sequences, the total size in bp, and the N50.
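
The row layout described above can be sketched in Python as follows. This is only an illustration: the function name tsv_lines and the header labels (path, seqs, size, N50) are assumptions, since the exact header text printed by the script is not shown in this document. The header argument mimics the effect of --noheader and separator mimics --separator.

```python
def tsv_lines(records, separator="\t", header=True):
    """Build delimited output lines from (path, seqs, size, n50) tuples."""
    lines = []
    if header:
        # Hypothetical column labels; the script's real header may differ.
        lines.append(separator.join(["path", "seqs", "size", "N50"]))
    for path, seqs, size, n50 in records:
        # Column order follows the description: path, sequences, size, N50.
        lines.append(separator.join([path, str(seqs), str(size), str(n50)]))
    return lines
```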

-b, --basename

Print only the filename of each file instead of its path, stripping any relative or absolute directory component.

-j, --noheader

When used with 'tsv' output format, will suppress header line.

-n, --nonewline

If used with 'default' or 'csv' output format, will NOT print the newline character after the N50. Usually used in bash scripting.

-t, --template

String to be used with 'custom' format. Will be used as template string for each sample, replacing {new} with newlines, {tab} with tab and {N50}, {seqs}, {size}, {path} with sample's N50, number of sequences, total size in bp and file path respectively (the latter will respect --basename if used).
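
The placeholder substitution can be pictured with the Python sketch below. The function render_template and its arguments are hypothetical names used for illustration; the script's actual substitution order and its behaviour on unknown placeholders are not documented here.

```python
import os


def render_template(template, path, seqs, size, n50, basename=False):
    """Expand the documented placeholders in a 'custom' format template."""
    # Mimic --basename: keep only the filename when requested.
    shown_path = os.path.basename(path) if basename else path
    replacements = {
        "{new}": "\n",
        "{tab}": "\t",
        "{N50}": str(n50),
        "{seqs}": str(seqs),
        "{size}": str(size),
        # The path is substituted last so that braces inside a filename
        # are not mistaken for placeholders.
        "{path}": shown_path,
    }
    out = template
    for placeholder, value in replacements.items():
        out = out.replace(placeholder, value)
    return out
```

With the template "{path}{tab}{N50}{new}", each sample yields its path, a tab, its N50, and a trailing newline.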

-p, --pretty

If used with 'json' output format, will format the JSON in pretty print mode. Example:

 {
   "file1.fa" : {
     "size" : 290,
     "N50" : "290",
     "seqs" : 2
   },
   "file2.fa" : {
     "N50" : "456",
     "size" : 456,
     "seqs" : 2
   }
 }

-h, --help

Will display this full help message and quit, even if other arguments are supplied.


