Compress::DSRC - Perl bindings to the DSRC compression library
Single-shot (de)compression
    use Compress::DSRC;

    my $engine   = Compress::DSRC::Module->new;
    my $settings = Compress::DSRC::Settings->new;
    my $threads  = 8;

    $settings->set_dna_level(2);
    $settings->set_lossy(1);

    $engine->compress(
        'foo.fq' => 'foo.fq.dsrc',
        $settings,
        $threads,
    ) or die $engine->error;

    $engine->decompress(
        'foo.fq.dsrc' => 'bar.fq',
        $threads,
    ) or die $engine->error;
Per-record (de)compression
    use Compress::DSRC;

    my $threads = 8;
    my $reader  = Compress::DSRC::Reader->new;

    $reader->start(
        'bar.fq.dsrc',
        $threads,
    ) or die $reader->error;

    my $record = Compress::DSRC::Record->new;
    while ($reader->read_record($record)) {
        print $record->get_tag,      "\n";
        print $record->get_sequence, "\n";
        print $record->get_plus,     "\n";
        print $record->get_quality,  "\n";
        # or, more likely, do something else with record
    }
    $reader->finish;
This module provides bindings to the DSRC compression library. It provides basic access to the DsrcModule (one-shot (de)compression) and DsrcArchive (record-by-record (de)compression) APIs.
Compress::DSRC provides the following classes used in compression and decompression:
Compress::DSRC
Compress::DSRC::Module
Objects of this class are used for one-shot compression and decompression (providing an input filename and output filename, along with some other optional parameters).
Compress::DSRC::Reader
Objects of this class are used to read record-by-record from a compressed archive.
Compress::DSRC::Writer
Objects of this class are used to write record-by-record to a compressed archive.
Compress::DSRC::Settings
Objects of this class contain compression settings and are provided as arguments to several methods that write compressed data.
Compress::DSRC::Record
Objects of this class contain a single FASTQ record with accessors to each of the four data slots.
my $engine = Compress::DSRC::Module->new;
Creates a new one-shot (de)compression engine
$engine->compress( 'foo.fq', 'foo.fq.dsrc', $settings, $threads, ) or die $engine->error;
Compress a FASTQ file in one shot. Required arguments are (in order) input filename, output filename, and a Compress::DSRC::Settings object. Number of threads to use for compression is an optional fourth argument (default: 1).
$engine->decompress( 'foo.fq.dsrc', 'foo.fq', $threads, ) or die $engine->error;
As with compress() but in the other direction. Required arguments are (in order) input filename and output filename. Number of threads to use for decompression is an optional third argument (default: 1).
compress()
If an error occurs, a description can be retrieved using this method.
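As a rough sketch of how compress(), decompress() and error fit together, the helper below runs a round trip and dies with the engine's error text on failure. The name round_trip is hypothetical and not part of this module. The helper takes the engine and settings objects as arguments, so it only assumes they respond to the methods described above.

```perl
use strict;
use warnings;

# Compress $fastq into $archive, then decompress $archive into $restored.
# $engine is expected to behave like a Compress::DSRC::Module object and
# $settings like a Compress::DSRC::Settings object.
# round_trip() is a hypothetical helper name, not part of Compress::DSRC.
sub round_trip {
    my ($engine, $settings, $fastq, $archive, $restored, $threads) = @_;
    $threads //= 1;    # same default as documented for both methods

    $engine->compress($fastq, $archive, $settings, $threads)
        or die "compress failed: " . $engine->error . "\n";
    $engine->decompress($archive, $restored, $threads)
        or die "decompress failed: " . $engine->error . "\n";
    return 1;
}

# Typical call (requires Compress::DSRC to be installed):
#   round_trip(
#       Compress::DSRC::Module->new, Compress::DSRC::Settings->new,
#       'foo.fq', 'foo.fq.dsrc', 'bar.fq', 4,
#   );
```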
my $reader = Compress::DSRC::Reader->new;
Create a new Reader object
$reader->start( 'foo.fq.dsrc', $threads );
Initialize a decompression session. Arguments are the input filename (required) and the number of threads to use (default: 1).
    while ($reader->read_record( $record )) {
        # do something with $record
    }
Read the next record in the file. A single argument is expected - a Compress::DSRC::Record object whose data slots will be populated from the record read.
    while (my $record = $reader->next_record()) {
        # do something with $record
    }
This provides a slightly more Perl-ish alternative to read_record() for those who prefer it, at the cost of ~ 1.5x longer run times (a new Compress::DSRC::Record object is generated for each call).
read_record()
$reader->finish;
Finalize the session.
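To illustrate the read_record() loop with a single reused record object, here is a minimal sketch that tallies reads and bases from a reader that has already been started. The name tally_reads is hypothetical. It assumes only that the reader responds to read_record() and the record to get_sequence(), as described above.

```perl
use strict;
use warnings;

# Count records and total sequence length from a started reader.
# One record object is reused for every call, which is the pattern the
# documentation describes as faster than next_record().
# tally_reads() is a hypothetical helper name, not part of Compress::DSRC.
sub tally_reads {
    my ($reader, $record) = @_;
    my ($n_reads, $n_bases) = (0, 0);
    while ($reader->read_record($record)) {
        ++$n_reads;
        $n_bases += length $record->get_sequence;
    }
    return ($n_reads, $n_bases);
}

# Typical use (requires Compress::DSRC to be installed):
#   my $reader = Compress::DSRC::Reader->new;
#   $reader->start('foo.fq.dsrc', 4) or die $reader->error;
#   my ($reads, $bases) = tally_reads($reader, Compress::DSRC::Record->new);
#   $reader->finish;
```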
my $writer = Compress::DSRC::Writer->new;
Create a new Writer object
$writer->start( 'foo.fq.dsrc', $settings, $threads );
Initialize a compression session. Arguments are the output filename and a Compress::DSRC::Settings object (both required) and the number of threads to use (optional; default: 1).
$writer->write_record( $record );
Write a record to file. A single argument is expected - a Compress::DSRC::Record object.
$writer->finish;
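The Reader and Writer classes can be combined to filter an archive record by record. The sketch below copies only records whose sequence meets a minimum length. The name copy_records and the length filter are illustrative assumptions. The return value of write_record() is not documented here, so the sketch does not check it.

```perl
use strict;
use warnings;

# Copy records from a started reader to a started writer, keeping only
# those with a sequence of at least $min_len bases. Returns the number
# of records written.
# copy_records() is a hypothetical helper name, not part of Compress::DSRC.
sub copy_records {
    my ($reader, $writer, $record, $min_len) = @_;
    my $kept = 0;
    while ($reader->read_record($record)) {
        next if length($record->get_sequence) < $min_len;
        $writer->write_record($record);
        ++$kept;
    }
    return $kept;
}

# Typical use (requires Compress::DSRC to be installed):
#   my $reader = Compress::DSRC::Reader->new;
#   my $writer = Compress::DSRC::Writer->new;
#   $reader->start('in.fq.dsrc', 4) or die $reader->error;
#   $writer->start('out.fq.dsrc', Compress::DSRC::Settings->new, 4);
#   my $kept = copy_records($reader, $writer, Compress::DSRC::Record->new, 50);
#   $reader->finish;
#   $writer->finish;
```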
The underlying class is a C++ struct, so all methods are accessors to class member variables. See FASTQ documentation for more information. get_plus and set_plus will rarely be used (this slot in the FASTQ format is generally redundant and usually empty) but are included for completeness.
get_plus
set_plus
    my $record = Compress::DSRC::Record->new;
    $record->set_tag( '@read1 other info' );
    $record->set_sequence( 'ATGGCCTA' );
    $record->set_quality( '998398A8' );
    # do something with $record
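Because the setters are plain accessors, any validation has to happen before they are called. The following sketch populates a record from the four FASTQ fields after two basic sanity checks. The name fill_record and the checks are assumptions for illustration. The expected form of the plus value is not specified in this document, so it is passed through unchanged and only when defined.

```perl
use strict;
use warnings;

# Populate a record object from FASTQ fields after basic sanity checks.
# $record is expected to behave like a Compress::DSRC::Record object.
# fill_record() is a hypothetical helper name, not part of Compress::DSRC.
sub fill_record {
    my ($record, $tag, $sequence, $plus, $quality) = @_;

    die "tag line must start with '\@'\n"
        unless index($tag, '@') == 0;
    die "sequence and quality lengths differ\n"
        unless length($sequence) == length($quality);

    $record->set_tag($tag);
    $record->set_sequence($sequence);
    $record->set_plus($plus) if defined $plus;    # usually left unset
    $record->set_quality($quality);
    return $record;
}

# Typical use (requires Compress::DSRC to be installed):
#   my $record = fill_record(
#       Compress::DSRC::Record->new,
#       '@read1 other info', 'ATGGCCTA', undef, '998398A8',
#   );
```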
The underlying class is a C++ struct, so all methods are accessors to class member variables. For more information on the meaning of settings, see DSRC documentation.
Get/set the DNA compression level.
Get/set the quality compression level.
Get/set whether to use lossy (binning) quality compression.
Get/set whether to do CRC32 checking during compression.
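One way to apply several settings at once is a small wrapper that maps option names onto set_* methods. The sketch below is an assumption-laden convenience, not part of the module: configure is a hypothetical name, only set_dna_level and set_lossy are confirmed by the synopsis, and the lookup relies on the setters being visible to can(), which may not hold for every binding style.

```perl
use strict;
use warnings;

# Apply name => value pairs to a settings object by calling the matching
# set_<name> method. Dies on a name with no corresponding setter instead
# of silently ignoring it.
# configure() is a hypothetical helper name, not part of Compress::DSRC.
sub configure {
    my ($settings, %opt) = @_;
    for my $name (sort keys %opt) {
        my $setter = $settings->can("set_$name")
            or die "no such setting: $name\n";
        $settings->$setter($opt{$name});
    }
    return $settings;
}

# Typical use (requires Compress::DSRC to be installed):
#   my $settings = configure(
#       Compress::DSRC::Settings->new,
#       dna_level => 2,
#       lossy     => 1,
#   );
```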
See DSRC documentation.
Requires a C++ compiler and the Boost system/thread libraries. There are no other external dependencies.
Currently the underlying C++ library (and thus this module) does not handle the edge case of a FASTQ file containing a single record. A bug report has been filed upstream.
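Until the upstream issue is resolved, callers may want to detect single-record input before compressing. The following pure-Perl sketch counts records by line count, assuming unwrapped four-line FASTQ records. The names count_fastq_records and safe_to_compress are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Count records in an uncompressed FASTQ file, assuming each record is
# exactly four lines (no wrapped sequence or quality lines).
# Hypothetical helper, not part of Compress::DSRC.
sub count_fastq_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    my $lines = 0;
    while (my $line = <$fh>) {
        ++$lines;
    }
    close $fh;
    die "$path: line count $lines is not a multiple of 4\n" if $lines % 4;
    return $lines / 4;
}

# True unless the file holds exactly one record, the case the
# underlying library is reported not to handle.
sub safe_to_compress {
    my ($path) = @_;
    return count_fastq_records($path) != 1;
}
```

A caller could check safe_to_compress('foo.fq') before calling compress() and fall back to another format or skip the file when it returns false.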
Please report bugs to the author.
Jeremy Volkening <jdv@base2bio.com>
Copyright 2015-2016 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.