Compress::BGZF::Reader - Performs blocked GZIP (BGZF) decompression
use Compress::BGZF::Reader;

# Use as filehandle
my $fh_bgz = Compress::BGZF::Reader->new_filehandle( $bgz_filename );

# you can do this, but it's probably faster just to pipe gunzip
while (my $line = <$fh_bgz>) {
    print $line;
}

# here's the random-access goodness
# fetch 32 bytes from uncompressed offset 1001
seek $fh_bgz, 1001, 0;
read $fh_bgz, my $data, 32;
print $data;

# Use as object
my $reader = Compress::BGZF::Reader->new( $bgz_filename );

# Move to a virtual offset (somehow pre-calculated) and read 32 bytes
$reader->move_to_vo( $virt_offset );
my $data = $reader->read_data(32);
print $data;

$reader->write_index( $fn_idx );
Compress::BGZF::Reader is a module implementing random access to the BGZIP file format. While it can do sequential/streaming reads, there is really no point in using it for this purpose over standard GZIP tools/libraries, since BGZIP is GZIP-compatible.
There are two main modes of construction - as an object (using new()) and as a filehandle glob (using new_filehandle). The filehandle mode is straightforward for general use (emulating seek/read/tell functionality and passing to other classes/methods that expect a filehandle). The object mode has additional features such as seeking to virtual offsets and dumping the offset index to file.
my $fh_bgzf = Compress::BGZF::Reader->new_filehandle( $input_fn );
Create a new Compress::BGZF::Reader engine and tie it to an IO::File handle, which is returned. Takes a single mandatory argument: the name of the file to read from.
my $line = <$fh_bgzf>;
my $line = readline $fh_bgzf;
seek $fh_bgzf, 256, 0;
read $fh_bgzf, my $buffer, 32;
my $loc = tell $fh_bgzf;
print "End of file\n" if eof($fh_bgzf);
These functions emulate the standard Perl functions of the same name.
my $reader = Compress::BGZF::Reader->new( $fn_in );
Create a new Compress::BGZF::Reader engine. Requires a single argument - the name of the BGZIP file to be read from.
$reader->move_to( 493, 0 );
Seeks to the given uncompressed offset. Takes two arguments: the requested offset and the position that the offset is measured from (0: file start, 1: current position, 2: file end).
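The three values listed above are the same numbers used by Perl's built-in seek, for which the core Fcntl module exports named constants. The sketch below demonstrates the three modes on an ordinary in-memory filehandle rather than on a reader object, so it runs without a BGZF file. That move_to accepts these constants in place of the literal numbers is an assumption based on their numeric values, not something the documentation states.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);    # exports SEEK_SET, SEEK_CUR, SEEK_END

# An in-memory filehandle stands in for uncompressed BGZF data here.
my $content = join '', 'a' .. 'z';
open my $fh, '<', \$content or die "cannot open in-memory handle: $!";

seek $fh, 5, SEEK_SET;     # 5 bytes from the start of the data
my $from_start = tell $fh;

seek $fh, 3, SEEK_CUR;     # 3 bytes past the current position
my $from_current = tell $fh;

seek $fh, -4, SEEK_END;    # 4 bytes before the end of the data
my $from_end = tell $fh;

print "$from_start $from_current $from_end\n";
```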
$reader->move_to_vo( $virt_offset );
Like move_to, but takes as a single argument a virtual offset. Virtual offsets are described more in the top-level documentation for Compress::BGZF.
$reader->get_vo();
Returns the virtual offset of the current read position.
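For orientation, the BGZF specification defines a virtual offset as a 64-bit value whose upper 48 bits hold the byte offset of a block's start in the compressed file and whose lower 16 bits hold the position within that block's uncompressed data. The sketch below shows only that arithmetic. The helper names make_vo and split_vo are hypothetical and not part of this module, and a Perl built with 64-bit integers is assumed.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the BGZF virtual offset layout;
# they are not part of Compress::BGZF. Assumes 64-bit integer support.

# Pack a compressed block offset and a within-block offset into one value.
sub make_vo {
    my ($block_offset, $within_block) = @_;
    die "within-block offset must be between 0 and 65535\n"
        if $within_block < 0 || $within_block > 0xFFFF;
    return ( $block_offset << 16 ) | $within_block;
}

# Split a virtual offset back into its two components.
sub split_vo {
    my ($vo) = @_;
    return ( $vo >> 16, $vo & 0xFFFF );
}

my $vo = make_vo( 18_247, 42 );
my ( $block, $within ) = split_vo($vo);
print "vo=$vo block=$block within=$within\n";
```

If the module follows the standard layout, a value built this way should be usable with move_to_vo, and the value returned by get_vo should split into the same two parts.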
my $data = $reader->read_data( 32 );
Read uncompressed data from the current location. Takes a single argument - the number of bytes to be read - and returns the data read or undef if at EOF.
my $line = $reader->getline();
Reads one line of uncompressed data from the current location, shifting the current file offset accordingly. Returns the line read or undef if currently at EOF.
my $size = $reader->usize();
Returns the uncompressed size of the file, as calculated during indexing.
$reader->write_index( $fn_index );
Writes the compressed index to file. The index format (as defined by htslib) consists of little-endian int64-coded values. The first value is the number of offsets in the index. The remaining values are pairs of block offsets: each pair gives a block's start position in the compressed data followed by its start position in the uncompressed data. The first offset (always 0,0) is not included. The index files written by Compress::BGZF should be compatible with those of the htslib bgzip software, and vice versa.
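As a rough illustration of the index layout described above, the following sketch decodes an index from a string of raw bytes. The function name parse_gzi_bytes is hypothetical and not part of the module, and the Q< unpack template assumes a Perl built with 64-bit integer support. Re-adding the implicit (0, 0) entry is a convenience chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::BGZF): decode the raw bytes
# of an index into a list of [compressed_offset, uncompressed_offset] pairs.
sub parse_gzi_bytes {
    my ($bytes) = @_;

    # The first little-endian 64-bit value is the number of entries.
    die "index too short\n" if length($bytes) < 8;
    my $count = unpack 'Q<', $bytes;

    # Each entry is a pair of 64-bit values (16 bytes in total).
    die "index length mismatch\n"
        if length($bytes) != 8 + 16 * $count;

    my @values = unpack 'Q<*', substr( $bytes, 8 );

    # The first block at (0, 0) is not stored, so add it back here.
    my @pairs = ( [ 0, 0 ] );
    while ( my ( $c_off, $u_off ) = splice @values, 0, 2 ) {
        push @pairs, [ $c_off, $u_off ];
    }
    return @pairs;
}

# Build a tiny two-entry index in memory to exercise the parser.
my $sample = pack 'Q<*', 2, 100, 65280, 230, 130560;
for my $pair ( parse_gzi_bytes($sample) ) {
    printf "compressed=%d uncompressed=%d\n", @$pair;
}
```

To decode a real index file, the bytes would first be read from a handle opened with the :raw layer so that no newline translation alters them.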
Note that when using the tied filehandle interface, the behavior of the module will replicate that of a file opened in raw mode. That is, none of the Perl magic concerning platform-specific newline conversions will be performed. It's expected that users of this module will generally be seeking to predetermined byte offsets in a file (such as read from an index), and operations such as seek, read, and <> are not reliable in a cross-platform way on files opened in 'text' mode. In other words, seeking to and reading from a specific offset in 'text' mode may return different results depending on the platform Perl is running on. This isn't an issue specific to this module but to Perl in general. Users should simply be aware that any data read using this module will retain its original line endings, which may not be the same as those of the current platform.
For further discussion, see http://perldoc.perl.org/perlport.html#Newlines.
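Because line endings come through untouched, code that processes lines read with this module may need to strip both Unix and Windows terminators itself. A minimal sketch follows, using a hypothetical helper named strip_eol and an in-memory handle opened in raw mode in place of a BGZF file:

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing LF or CRLF, whichever is present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Data with Windows line endings, read without any newline translation.
my $raw = "chr1\t100\r\nchr2\t200\r\n";
open my $fh, '<:raw', \$raw or die "cannot open in-memory handle: $!";

while ( my $line = <$fh> ) {
    my @fields = split /\t/, strip_eol($line);
    print join( '|', @fields ), "\n";
}
```

Using an explicit pattern rather than chomp avoids depending on the platform's idea of a line terminator, since chomp removes only the current value of $/.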
This code is in the alpha testing stage and the API is not guaranteed to be stable.
Please report bugs to the author.
Jeremy Volkening <jdv *at* base2bio.com>
Copyright 2015-2016 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
To install Compress::BGZF, copy and paste the appropriate command into your terminal.
Using cpanm:
cpanm Compress::BGZF
Using the CPAN shell:
perl -MCPAN -e shell
install Compress::BGZF