NAME

PDL::Compression - compression utilities

DESCRIPTION

These routines generally accept some data as a PDL and compress it into a smaller PDL. Algorithms typically work on a single dimension and thread over other dimensions, producing a threaded table of compressed values if more than one dimension is fed in.

The Rice algorithm, in particular, is designed to be identical to the RICE_1 algorithm used in internal FITS-file compression (see PDL::IO::FITS).

SYNOPSIS

 use PDL::Compression;

 ($b,$asize) = $a->rice_compress();
 $c = $b->rice_expand($asize);

FUNCTIONS

METHODS

rice_compress

  Signature: (in(n); [o]out(m); int[o]len(); lbuf(n); int blocksize)

Squishes an input PDL along dimension 0 by Rice compression. In scalar context, you get back only the compressed PDL; in list context, you also get back ancillary information that is required to uncompress the data with rice_expand.

Multidimensional data are threaded over - each row is compressed separately, and the returned PDL is squished to the maximum compressed size of any row. If any of the streams could not be compressed (the algorithm produced longer output), the corresponding length is set to -1 and the row is treated as if it had length 0.

Rice compression only works on integer data types -- if you have floating point data you must first quantize them.
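One possible way to quantize floating-point data is sketched below, assuming a fixed step size is acceptable for your data. The sample values and the $step variable are hypothetical, and the step must be chosen to suit the precision you need. Quantization is lossy: the restored values differ from the originals by up to about half a step.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical floating-point samples and quantization step (assumptions).
my $data = pdl(1.02, 1.07, 0.98, 1.31, 1.29);
my $step = 0.01;

# Round to the nearest multiple of $step and store as 32-bit integers.
my $quantized = rint($data / $step)->long;

# Approximate reconstruction; the error is at most about $step / 2.
my $restored = $quantized->double * $step;
my $max_err  = abs($restored - $data)->max;

print "quantized: $quantized\n";
print "max error: $max_err\n";
```

The integer-typed $quantized can then be handed to rice_compress; $step has to be kept alongside the compressed data so the values can be rescaled after expansion.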

The underlying algorithm is identical to the Rice compressor used in CFITSIO (and is used by PDL::IO::FITS to load and save compressed FITS images).

The optional blocksize indicates how many samples are to be compressed as a unit; it defaults to 32.

How it works:

Rice compression is a subset of Golomb compression, and works on data sets where variation between adjacent samples is typically small compared to the dynamic range of each sample. In this implementation (originally written by Richard White and contributed to CFITSIO in 1999), the data are divided into blocks of samples (by default 32 samples per block). Each block has a running difference applied, and the difference is bit-folded to make it non-negative. High order bits of the difference stream are discarded, and replaced with a unary representation; low order bits are preserved. Unary representation is very efficient for small numbers, but large jumps could give rise to ludicrously large bins in a plain Golomb code; such large jumps ("high entropy" samples) are simply recorded directly in the output stream.
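The steps above can be illustrated with a small pure-Perl sketch. It is not the code in ricecomp.c: it builds a string of '0' and '1' characters instead of packing bits, takes the number of preserved low-order bits ($k) as a fixed argument instead of choosing it per block, and omits the direct recording of high-entropy samples. The function names are made up for this illustration.

```perl
use strict;
use warnings;

# Fold a signed difference into a non-negative integer:
# 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ...
sub fold   { my ($d) = @_; return $d >= 0 ? 2 * $d : -2 * $d - 1 }
sub unfold { my ($u) = @_; return $u % 2 ? -($u + 1) / 2 : $u / 2 }

# Encode one block as a bit string. Each folded difference is split into
# a unary-coded high part (that many 0s, then a 1) and $k raw low bits.
sub rice_encode_block {
    my ($k, $prev, @samples) = @_;
    my $bits = '';
    for my $s (@samples) {
        my $u = fold($s - $prev);          # running difference, folded
        $prev = $s;
        $bits .= ('0' x ($u >> $k)) . '1'; # high bits in unary
        $bits .= sprintf('%0*b', $k, $u & ((1 << $k) - 1)) if $k > 0;
    }
    return $bits;
}

# Invert rice_encode_block: read $count samples back from the bit string.
sub rice_decode_block {
    my ($k, $prev, $count, $bits) = @_;
    my ($pos, @samples) = (0);
    for (1 .. $count) {
        my $high = 0;
        $high++ while substr($bits, $pos++, 1) eq '0';
        my $low = $k > 0 ? oct('0b' . substr($bits, $pos, $k)) : 0;
        $pos += $k;
        $prev += unfold(($high << $k) | $low);
        push @samples, $prev;
    }
    return @samples;
}

# Hypothetical block; the first sample doubles as the starting reference.
my @block = (100, 101, 99, 99, 103);
my $bits  = rice_encode_block(2, $block[0], @block);
my @back  = rice_decode_block(2, $block[0], scalar @block, $bits);

print length($bits), " bits: $bits\n";
print "round trip ok\n" if "@back" eq "@block";
```

Small differences cost only a few bits each, while a large jump produces a long run of zeros in the unary part, which is why the real compressor falls back to recording such samples directly.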

On astronomical or solar image data, typical compression ratios are 2-3.

  $out = $pdl->rice_compress($blocksize);
  ($out, $len, $blocksize, $dim0) = $pdl->rice_compress;

  $new = $out->rice_expand;

rice_compress ignores the bad-value flag of the input piddles. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

rice_expand

  Signature: (in(n); [o]out(m); lbuf(n); int blocksize)

Unsquishes a PDL that has been squished by rice_compress.

     ($out, $len, $blocksize, $dim0) = $pdl->rice_compress;
     $copy = $out->rice_expand($dim0, $blocksize);

rice_expand ignores the bad-value flag of the input piddles. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

AUTHORS

Copyright (C) 2010 Craig DeForest. All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation under certain conditions. For details, see the file COPYING in the PDL distribution. If this file is separated from the PDL distribution, the copyright notice should be included in the file.

The Rice compression library is derived from the similar library in the CFITSIO 3.24 release, and is licensed under yet more lenient terms than PDL itself; that notice is present in the file "ricecomp.c".

BUGS

  • Currently headers are ignored.

  • Currently there is only one compression algorithm.

TODO

  • Add object encapsulation

  • Add test suite