
NAME

Compress::DSRC - Perl bindings to the DSRC compression library

SYNOPSIS

Single-shot (de)compression

    use Compress::DSRC;

    my $engine   = Compress::DSRC::Module->new;
    my $settings = Compress::DSRC::Settings->new;
    my $threads  = 8;

    $settings->set_dna_level(2);
    $settings->set_lossy(1);
    $engine->compress(
        'foo.fq' => 'foo.fq.dsrc',
        $settings,
        $threads,
    ) or die $engine->error;
    $engine->decompress(
        'foo.fq.dsrc' => 'bar.fq',
        $threads,
    ) or die $engine->error;

Per-record (de)compression

    use Compress::DSRC;

    my $reader  = Compress::DSRC::Reader->new;
    my $threads = 8;

    $reader->start( 'foo.fq.dsrc', $threads )
        or die $reader->error;

    # one Record object is reused for every record read
    my $record = Compress::DSRC::Record->new;
    while ($reader->read_record($record)) {
        print $record->get_tag,      "\n";
        print $record->get_sequence, "\n";
        print $record->get_plus,     "\n";
        print $record->get_quality,  "\n";
        # or, more likely, do something else with the record
    }
    $reader->finish;

DESCRIPTION

This module provides Perl bindings to the DSRC compression library, giving basic access to its DsrcModule (one-shot (de)compression) and DsrcArchive (record-by-record (de)compression) APIs.

CLASSES

Compress::DSRC provides the following classes used in compression and decompression:

Compress::DSRC::Module

Objects of this class are used for one-shot compression and decompression (providing an input filename and output filename, along with some other optional parameters).

Compress::DSRC::Reader

Objects of this class are used to read record-by-record from a compressed archive.

Compress::DSRC::Writer

Objects of this class are used to write record-by-record to a compressed archive.

Compress::DSRC::Settings

Objects of this class contain compression settings and are provided as arguments to several methods that write compressed data.

Compress::DSRC::Record

Objects of this class contain a single FASTQ record with accessors to each of the four data slots.

METHODS

Compress::DSRC::Module

new
    my $engine = Compress::DSRC::Module->new;

Create a new one-shot (de)compression engine.

compress
    $engine->compress(
        'foo.fq',
        'foo.fq.dsrc',
        $settings,
        $threads,
    ) or die $engine->error;

Compress a FASTQ file in one shot. Required arguments are (in order) input filename, output filename, and a Compress::DSRC::Settings object. Number of threads to use for compression is an optional fourth argument (default: 1).

decompress
    $engine->decompress(
        'foo.fq.dsrc',
        'foo.fq',
        $threads,
    ) or die $engine->error;

As with compress() but in the other direction. Required arguments are (in order) input filename and output filename. Number of threads to use for decompression is an optional third argument (default: 1).

error

If an error occurs, a description can be retrieved using this method.
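As a minimal sketch of the one-shot API, the hypothetical helper below (compress_all is not part of Compress::DSRC) compresses several FASTQ files with one engine and one settings object, writing each input to the same name plus a .dsrc suffix. It calls only the compress() and error() methods documented above and receives the objects as arguments, so it works with real objects or with stand-ins.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): compress each FASTQ
# file in @files to "<file>.dsrc" with an engine such as
# Compress::DSRC::Module->new and a Compress::DSRC::Settings object.
# Dies on the first failure, including the engine's error() text.
sub compress_all {
    my ($engine, $settings, $threads, @files) = @_;
    my @outputs;
    for my $in (@files) {
        my $out = "$in.dsrc";
        $engine->compress($in, $out, $settings, $threads)
            or die "Compressing $in failed: " . $engine->error . "\n";
        push @outputs, $out;
    }
    return @outputs;    # names of the archives written
}
```

With the real classes, a call might look like compress_all(Compress::DSRC::Module->new, Compress::DSRC::Settings->new, 4, 'a.fq', 'b.fq'). Stopping at the first failure keeps the engine's error message tied to the file that caused it.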

Compress::DSRC::Reader

new
    my $reader = Compress::DSRC::Reader->new;

Create a new Reader object.

start
    $reader->start( 'foo.fq.dsrc', $threads )
        or die $reader->error;

Initialize a decompression session. Arguments are the filename of the compressed archive to read (required) and the number of threads to use (default: 1).

read_record
    while ($reader->read_record( $record )) {
        # do something with $record;
    }

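Read the next record from the archive. A single argument is expected: a Compress::DSRC::Record object whose data slots will be populated from the record read. The method returns true when a record was read and false once the archive is exhausted, so it can drive a while loop as shown.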
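The point of read_record() is that one Record object is refilled on every call. A minimal sketch, using a hypothetical count_bases helper that is not part of the module: it takes a reader on which start() has already been called, plus a Record object to reuse, and totals the number of records and the combined sequence length.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): count records and
# total bases using read_record(), which overwrites the same $record
# object on every call. $reader must already have been start()ed.
sub count_bases {
    my ($reader, $record) = @_;
    my ($n_records, $n_bases) = (0, 0);
    while ($reader->read_record($record)) {
        $n_records++;
        $n_bases += length($record->get_sequence);
    }
    return ($n_records, $n_bases);
}
```

In real use the arguments would be a started Compress::DSRC::Reader and Compress::DSRC::Record->new, with $reader->finish called afterwards.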

next_record
    while (my $record = $reader->next_record()) {
        # do something with $record;
    }

This provides a slightly more Perl-ish alternative to read_record() for those who prefer it, at the cost of roughly 1.5 times longer run times, because a new Compress::DSRC::Record object is created for each call.
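Because next_record() returns a new object on each call, its results can be kept. That is not true of read_record() with a single reused $record: every stored element would be the same object, holding only the last record read. A minimal sketch with a hypothetical head_records helper (not part of the module) that collects the first $n records from a started reader:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): return up to $n
# records from a reader that has already been start()ed. Each call to
# next_record() yields a distinct object, so storing them is safe.
sub head_records {
    my ($reader, $n) = @_;
    my @records;
    while (@records < $n) {
        my $record = $reader->next_record or last;    # false at end of archive
        push @records, $record;
    }
    return @records;
}
```

If the archive holds fewer than $n records, all of them are returned.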

finish
    $reader->finish;

Finalize the session.

error

If an error occurs, a description can be retrieved using this method.

Compress::DSRC::Writer

new
    my $writer = Compress::DSRC::Writer->new;

Create a new Writer object.

start
    $writer->start( 'foo.fq.dsrc', $settings, $threads );

Initialize a compression session. Arguments are the filename of the archive to write and a Compress::DSRC::Settings object (both required), followed by the number of threads to use (default: 1).

write_record
    $writer->write_record( $record );

Write a record to the archive. A single argument is expected: a Compress::DSRC::Record object.

finish
    $writer->finish;

Finalize the session.

error

If an error occurs, a description can be retrieved using this method.
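The SYNOPSIS does not show writing, so here is a minimal sketch of a complete Writer session as a hypothetical write_archive helper (not part of the module). It assumes, by analogy with Reader's start() in the SYNOPSIS, that Writer's start() returns false on failure; this is not documented above, and the return values of write_record() and finish() are left unchecked for the same reason.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): write @records to a
# new archive at $path in one start/write_record/finish session.
# ASSUMPTION: start() returns false on failure, as Reader's start() does
# in the SYNOPSIS; write_record() and finish() results are not checked.
sub write_archive {
    my ($writer, $path, $settings, $threads, @records) = @_;
    $writer->start($path, $settings, $threads)
        or die "Cannot start $path: " . $writer->error . "\n";
    $writer->write_record($_) for @records;
    $writer->finish;
    return scalar @records;    # number of records written
}
```

For example, write_archive(Compress::DSRC::Writer->new, 'out.fq.dsrc', $settings, 4, @records), where @records holds Compress::DSRC::Record objects such as those returned by next_record().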

Compress::DSRC::Record

The underlying class is a C++ struct, so all methods are simple accessors to its member variables, one get/set pair for each of the four lines of a FASTQ record. See the FASTQ format documentation for more information. get_plus and set_plus will rarely be used (this slot of a FASTQ record is generally redundant and usually empty) but are included for completeness.

    my $record = Compress::DSRC::Record->new;
    $record->set_tag( '@read1 other info' );
    $record->set_sequence( 'ATGGCCTA' );
    $record->set_quality( '998398A8' );
    # do something with $record;

get_tag / set_tag
get_sequence / set_sequence
get_plus / set_plus
get_quality / set_quality
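To build records from a plain FASTQ file (for example, to feed a Writer), the four set_* accessors can be filled one entry at a time. This minimal sketch uses a hypothetical read_fastq_entry helper (not part of the module) and assumes unwrapped four-line FASTQ, with each slot storing its line verbatim minus the newline: the tag keeps its leading '@' as in the example above, and the plus slot keeps its '+'. The latter is an assumption about the underlying library, not something stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): read one four-line
# FASTQ entry from $fh into $record. Returns false at end of input.
# ASSUMPTIONS: unwrapped FASTQ; each slot holds its line without the
# newline, tag keeping its '@' and plus keeping its '+'.
sub read_fastq_entry {
    my ($fh, $record) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        $line =~ s/\r?\n\z//;    # strip LF or CRLF
        push @lines, $line;
    }
    return 0 unless @lines;    # clean end of file
    die "Truncated FASTQ entry\n" unless @lines == 4;
    die "Malformed FASTQ entry: $lines[0]\n"
        unless $lines[0] =~ /^\@/ && $lines[2] =~ /^\+/;
    $record->set_tag($lines[0]);
    $record->set_sequence($lines[1]);
    $record->set_plus($lines[2]);
    $record->set_quality($lines[3]);
    return 1;
}
```

Calling read_fastq_entry() with a fresh Compress::DSRC::Record->new for each entry and passing the result to write_record() streams a FASTQ file into an archive.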

Compress::DSRC::Settings

The underlying class is a C++ struct, so all methods are accessors to class member variables. For more information on the meaning of settings, see DSRC documentation.

get_dna_level / set_dna_level

Get/set the DNA compression level.

get_qual_level / set_qual_level

Get/set the quality compression level.

get_lossy / set_lossy

Get/set whether to use lossy (binning) quality compression.

get_calc_crc32 / set_calc_crc32

Get/set whether to calculate CRC32 checksums during compression.

get_buffer_size / set_buffer_size

See DSRC documentation.

get_tag_mask / set_tag_mask

See DSRC documentation.
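Since every setting follows the get_NAME / set_NAME pattern, options can be applied from a list of name/value pairs. A minimal sketch with a hypothetical configure_settings helper (not part of the module); it assumes the bindings install the set_* accessors as ordinary methods, so that can() finds them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): apply name => value
# pairs by calling the matching set_<name> accessor, so dna_level => 2
# calls set_dna_level(2). ASSUMPTION: can() sees the XS accessors.
sub configure_settings {
    my ($settings, %opt) = @_;
    for my $name (sort keys %opt) {
        my $setter = $settings->can("set_$name")
            or die "Unknown DSRC setting '$name'\n";
        $settings->$setter($opt{$name});
    }
    return $settings;
}
```

For example, configure_settings(Compress::DSRC::Settings->new, dna_level => 2, lossy => 1) reproduces the settings used in the SYNOPSIS. Checking can() first turns a misspelled name such as dna_lvl into a clear message instead of a generic "Can't locate object method" failure.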

DEPENDENCIES

Requires a C++ compiler and the Boost system/thread libraries. There are no other external dependencies.

CAVEATS AND BUGS

Currently the underlying C++ library (and thus this module) does not handle the edge case of a FASTQ file containing a single record. A bug report has been filed upstream.
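While this limitation applies, a caller can check for the single-record case before handing a file to compress(). A minimal sketch with a hypothetical has_multiple_records helper (not part of the module), assuming an uncompressed, unwrapped four-line FASTQ input: it reads only until a fifth non-blank line, which must be the start of a second record, has been seen.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::DSRC): return true if the
# plain, unwrapped FASTQ file at $path holds at least two records.
# Reads at most until the fifth non-blank line is found.
sub has_multiple_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $count = 0;
    while ($count < 5 && defined(my $line = <$fh>)) {
        $count++ if $line =~ /\S/;    # ignore blank lines
    }
    close $fh;
    return $count >= 5;
}
```

A file that fails this check can then be reported or handled separately instead of being passed to the library.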

Please report bugs to the author.

AUTHOR

Jeremy Volkening <jdv@base2bio.com>

COPYRIGHT AND LICENSE

Copyright 2015-2016 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.