The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Data::BitStream - A bit stream class including integer coding methods

SYNOPSIS

  use Data::BitStream;
  my $stream = Data::BitStream->new;
  $stream->put_gamma($_) for (1 .. 20);
  $stream->rewind_for_read;
  my @values = $stream->get_gamma(-1);

See the examples for more uses.

DESCRIPTION

A Mouse/Moose class providing read/write access to bit streams. This includes many integer coding methods as well as straightforward ways to implement new codes.

Bit streams are often used in data compression and in embedded products where memory is at a premium. While this Perl implementation may not be appropriate for many of these applications (speed and Perl), it can be very useful for prototyping and experimenting with different codes. A future implementation using XS for internals may resolve some performance concerns.

EXAMPLES

Display bit patterns for some codes

  use Data::BitStream;
  sub string_of { my $stream = Data::BitStream->new;
                  $_[0]->($stream);
                  return $stream->to_string; }
  my @codes = qw(Gamma Delta Omega Fib);
  printf "%5s  " . (" %-11s" x scalar @codes) . "\n", 'N', @codes;
  foreach my $n (0 .. 20) {
    printf "%5d  ", $n;
    printf " %-11s", string_of(sub{shift->put_gamma($n)});
    printf " %-11s", string_of(sub{shift->put_delta($n)});
    printf " %-11s", string_of(sub{shift->put_omega($n)});
    printf " %-11s", string_of(sub{shift->put_fib($n)});
    print "\n";
  }

A simple predictor/encoder compression snippit

  use Data::BitStream;
  my $stream = Data::BitStream->new;
  # Loop over the data: characters, pixels, table entries, etc.
  foreach my $v (@values) {
    # predict the value at this pixel using your subroutine.
    my $p = predict();
    # determine the signed difference.
    my $diff = $v - $p;
    # Turn this into an absolute difference suitable for coding.
    my $error = ($diff < 0)  ?  -2*$diff  :  2*$diff-1;
    # Encode this using gamma encoding (or whichever works best for you).
    $stream->put_gamma($error);
  }
  # Nicely packed up compressed data.
  my $compressed_data = $stream->to_raw;

This is a classic prediction-coding style compression method, used in many applications. Most lossless image compressors use this method, though often with some extra steps further reduce the error term. JPEG-LS, for example, uses a very simple predictor, and puts its effort into relatively complex bias estimations and adaptive determination of the parameter for Rice coding.

Convert Elias Delta encoded strings into Fibonacci encoded strings

  #/usr/bin/perl
  use Data::BitStream;
  my $d = Data::BitStream->new;
  my $f = Data::BitStream->new;
  while (<>) {
    chomp;
    $d->from_string($_);
    $f->erase_for_write;
    $f->put_fib( $d->get_delta(-1) );
    print scalar $f->to_string, "\n";
  }

Using a custom encoding method

  use Data::BitStream;
  use Data::BitStream::Code::Baer;
  use Data::BitStream::Code::BoldiVigna;

  my $stream = Data::BitStream->new;
  Data::BitStream::Code::Baer->meta->apply($stream);
  Data::BitStream::Code::BoldiVigna->meta->apply($stream);

  $stream->put_baer(-1, 14);      # put 14 as a Baer c-1 code
  $stream->put_boldivigna(2, 7);  # put 7 as a Zeta(2) code

  $stream->rewind_for_read;
  my $v1 = $stream->get_baer(-1);
  my $v2 = $stream->get_boldivigna(2);

Not all codes are included by default, including the power-law codes of Michael Baer, the Zeta codes of Boldi and Vigna, and Escape codes. These, and any other codes write or acquire, can be incorporated using Moose::Meta::Role as shown above.

METHODS

CLASS METHODS

maxbits

Returns the number of bits in a word, which is the largest allowed size of the bits argument to read and write. This will be either 32 or 64.

OBJECT METHODS (reading)

These methods are only value while the stream is in reading state.

rewind

Moves the position to the stream beginning.

exhausted

Returns true is the stream is at the end. Rarely used.

read($bits [, 'readahead'])

Reads $bits from the stream and returns the value. $bits must be between 1 and maxbits.

The position is advanced unless the second argument is the string 'readahead'.

skip($bits)

Advances the position $bits bits. Used in conjunction with readahead.

read_string($bits)

Reads $bits bits from the stream and returns them as a binary string, such as '0011011'.

OBJECT METHODS (writing)

These methods are only value while the stream is in writing state.

write($bits, $value)

Writes $value to the stream using $bits bits. $bits must be between 1 and maxbits.

The length is increased by $bits bits.

Regardless of the contents of $value, exactly $bits bits will be used. If $value has more non-zero bits than $bits, the lower bits are written. In other words, $value will be masked before writing.

put_string(@strings)

Takes one or more binary strings, such as '1001101', '001100', etc. and writes them to the stream. The number of bits used for each value is equal to the string length.

put_stream($source_stream)

Writes the contents of $source_stream to the stream. This is a helper method that might be more efficient than doing it in one of the many other possible ways. The default implementation uses:

  $self->put_string( $source_stream->to_string );

OBJECT METHODS (conversion)

These methods may be called at any time, and will adjust the state of the stream.

to_string

Returns the stream as a binary string, e.g. '00110101'.

to_raw

Returns the stream as packed big-endian data. This form is portable to any other implementation on any architecture.

to_store

Returns the stream as some scalar holding the data in some implementation specific way. This may be portable or not, but it can always be read by the same implementation. It might be more efficient than the raw format.

from_string($string)

The stream will be set to the binary string $string.

from_raw($packed [, $bits])

The stream is set to the packed big-endian vector $packed which has $bits bits of data. If $bits is not present, then length($packed) will be used as the byte-length. It is recommended that you include $bits.

from_store($blob [, $bits])

Similar to from_raw, but using the value returned by to_store.

OBJECT METHODS (other)

pos

A read-only non-negative integer indicating the current position in a read stream. It is advanced by read, get, and skip methods, as well as changed by to, from, rewind, and erase methods.

len

A read-only non-negative integer indicating the current length of the stream in bits. It is advanced by write and put methods, as well as changed by from and erase methods.

writing

A read-only boolean indicating whether the stream is open for writing or reading. Methods for read such as read, get, skip, rewind, skip, and exhausted are not allowed while writing. Methods for write such as write and put are not allowed while reading.

The write_open and erase_for_write methods will set writing to true. The write_close and rewind_for_read methods will set writing to false.

The read/write distinction allows implementations more freedom in internal caching of data. For instance, they can gather writes into blocks. It also can be helpful in catching mistakes such as reading from a target stream.

erase

Erases all the data, while the writing state is left unchanged. The position and length will both be 0 after this is finished.

write_open

Changes the state to writing with no other API-visible changes.

write_close

Changes the state to reading, and the position is set to the end of the stream. No other API-visible changes happen.

erase_for_write

A helper function that performs erase followed by write_open.

rewind_for_read

A helper function that performs write_close followed by rewind.

OBJECT METHODS (coding)

All coding methods are biased to 0. This means values from 0 to 2^maxbits-1 (for universal codes) may be encoded, even if the original code as published starts with 1.

All get_ methods take an optional count as the last argument. If $count is 1 or not supplied, a single value will be read. If $count is positive, that many values will be read. If $count is negative, values are read until the end of the stream.

get_ methods called in list context this return a list of all values read. Called in scalar context they return the last value read.

put_ methods take one or more values as input after any optional parameters and write them to the stream. All values must be non-negative integers that do not exceed the maximum encodable value (~0 for universal codes, parameter-specific for others).

get_unary([$count])
put_unary(@values)

Reads/writes one or more values from the stream in 0000...1 unary coding. Unary coding is only appropriate for relatively small numbers, as it uses $value + 1 bits per value.

get_unary1([$count])
put_unary1(@values)

Reads/writes one or more values from the stream in 1111...0 unary coding.

get_binword($bits, [$count])
put_binword($bits, @values)

Reads/writes one or more values from the stream as fixed-length binary numbers, each using $bits bits.

get_gamma([$count])
put_gamma(@values)

Reads/writes one or more values from the stream in Elias Gamma coding.

get_delta([$count])
put_delta(@values)

Reads/writes one or more values from the stream in Elias Delta coding.

get_omega([$count])
put_omega(@values)

Reads/writes one or more values from the stream in Elias Omega coding.

get_levenstein([$count])
put_levenstein(@values)

Reads/writes one or more values from the stream in Levenstein coding (sometimes called Levenshtein or Левенште́йн coding).

get_evenrodeh([$count])
put_evenrodeh(@values)

Reads/writes one or more values from the stream in Even-Rodeh coding.

get_fib([$count])
put_fib(@values)

Reads/writes one or more values from the stream in Fibonacci coding. Specifically, the order m=2 C1 codes of Fraenkel and Klein.

get_fib_c2([$count])
put_fib_c2(@values)

Reads/writes one or more values from the stream in Fibonacci C2 coding. Specifically, the order m=2 C2 codes of Fraenkel and Klein. Note that these codes are not prefix-free, hence they will not mix well with other codes in the same stream.

get_golomb($m [, $count])
put_golomb($m, @values)

Reads/writes one or more values from the stream in Golomb coding.

get_golomb(sub { ... }, $m [, $count])
put_golomb(sub { ... }, $m, @values)

Reads/writes one or more values from the stream in Golomb coding using the supplied subroutine instead of unary coding, which can make them work with large outliers. For example to use Fibonacci coding for the base:

  $stream->put_golomb( sub {shift->put_fib(@_)}, $m, $value);

  $value = $stream->put_golomb( sub {shift->get_fib(@_)}, $m);
get_rice($k [, $count])
put_rice($k, @values)

Reads/writes one or more values from the stream in Rice coding, which is the time efficient case where m = 2^k.

get_rice(sub { ... }, $k [, $count])
put_rice(sub { ... }, $k, @values)

Reads/writes one or more values from the stream in Rice coding using the supplied subroutine instead of unary coding, which can make them work with large outliers. For example to use Omega coding for the base:

  $stream->put_rice( sub {shift->put_omega(@_)}, $k, $value);

  $value = $stream->put_rice( sub {shift->get_omega(@_)}, $k);
get_gammagolomb($m [, $count])
put_gammagolomb($m, @values)

Reads/writes one or more values from the stream in Golomb coding using Elias Gamma codes for the base. This is a convenience since they are common.

get_expgolomb($k [, $count])
put_expgolomb($k, @values)

Reads/writes one or more values from the stream in Rice coding using Elias Gamma codes for the base. This is a convenience since they are common.

get_startstop(\@m [, $count])
put_startstop(\@m, @values)

Reads/writes one or more values using Start/Stop codes. The parameter is an array reference which can be an anonymous array, for example:

  $stream->put_startstop( [0,3,2,0], @array );
  my @array2 = $stream->get_startstop( [0,3,2,0], -1);
get_startstepstop(\@m [, $count])
put_startstepstop(\@m, @values)

Reads/writes one or more values using Start-Step-Stop codes. The parameter is an array reference which can be an anonymous array, for example:

  $stream->put_startstepstop( [3,2,9], @array );
  my @array3 = $stream->get_startstepstop( [3,2,9], -1);

SEE ALSO

Data::BitStream::Base
Data::BitStream::WordVec
Data::BitStream::Code::Gamma
Data::BitStream::Code::Delta
Data::BitStream::Code::Omega
Data::BitStream::Code::Levenstein
Data::BitStream::Code::EvenRodeh
Data::BitStream::Code::Fibonacci
Data::BitStream::Code::Golomb
Data::BitStream::Code::Rice
Data::BitStream::Code::GammaGolomb
Data::BitStream::Code::ExponentialGolomb
Data::BitStream::Code::StartStop

AUTHORS

Dana Jacobsen <dana@acm.org>

COPYRIGHT

Copyright 2011 by Dana Jacobsen <dana@acm.org>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 400:

Non-ASCII character seen before =encoding in 'Левенште́йн'. Assuming UTF-8