# NAME

Data::Bvec - a module to manipulate integer arrays as bit vectors and "compressed bit strings" (using a simple RLE).

# VERSION

VERSION: 1.01

# SYNOPSIS

```
use Data::Bvec;
my $bv = Data::Bvec::->new( nums=>[1,2,3] );
my $vec = $bv->get_bvec(); # 01110000
my $bstr = $bv->get_bstr(); # '-134'
my $nums = $bv->get_nums(); # [1,2,3]
----
use Data::Bvec qw( :all );
my $vec = num2bit( [1,2,3] ); # 01110000
set_bit( $vec, 4, 1 ); # 01111000
my $bstr = compress bit2str $vec; # '-143'
my $nums = bit2num str2bit uncompress $bstr; # [1,2,3,4]
```

# DESCRIPTION/DISCUSSION

This module encapsulates some simple routines for manipulating Perl bit vectors (putting values in; getting values out), but its main goal is to implement a simple run-length encoding scheme for bit vectors that compresses them into relatively human-readable and flat-file-storable strings.

My use case was wanting to prototype a data indexing system, and I wanted to ease debugging by plopping the bitstrings in a flat file that I could examine directly. (Each bit in a vector represents a record in the database -- true or false whether the term is in that record in the field being indexed.) It has worked well enough that I haven't felt the need to change how the bitstrings are stored (just where they're stored).

The initial version of the module used a different set of base-62 digits. In writing Math::Int2Base, I decided to normalize all the bases from 2 to 62 to use 0-9,A-Z,a-z. It makes the numbers sort correctly (ascii-betically == numerically), and it let me say that A base-16 == A base-36 == A base-62.

So now I'm rewriting this module to use those base conversion routines.

# EXPORTS

Nothing is exported by default. The following may be exported individually; all of them may be exported using the `:all`

tag:

```
- set_bit
- howmany
- bit2str
- str2bit
- bit2num
- num2bit
- compress
- uncompress
```

Examples:

```
use Data::Bvec qw( :all );
use Data::Bvec qw( bit2str str2bit compress uncompress );
```

However, if you only use the object methods, nothing would need to be exported. See below.

# SUBROUTINES

## set_bit( $vec, $num, $zero_or_one )

This is a shallow wrapper around Perl's vec() that simply provides the third parameter (1) to that routine that says we're working with a bit vector.

Normally returns $num, if you care.

### Parameters:

#### $vec

A Perl bit vector stored in the scalar.

#### $num

The number whose bit you want to target in the bit vector.

#### $zero_or_one

The value you want to set the bit to: 0 or 1. If not defined, 1 is assumed.

Examples:

```
my $vec = ""; # empty vector
set_bit $vec, 1, 1; # 01000000
set_bit $vec, 2, 1; # 01100000
set_bit $vec, 3; # 01110000
set_bit $vec, 1, 0; # 00110000
```

## bit2str( $vec )

This routine is a shallow wrapper around unpack() that unpacks a bit vector into a string of '0's and '1's, in preparation for compression.

### Parameters:

#### $vec

A Perl bit vector.

Example:

```
my $vec = "";
set_bit $vec, 4, 1; # 00001000
my $str = bit2str $vec; # '00001000'
```

## str2bit( $str )

This routine is a shallow wrapper around pack() that packs a string of '0's and '1's (following uncompression) into a bit vector.

### Parameters:

#### $str

A string of '0's and '1's, e.g., "00001000".

Example:

` my $vec = str2bit '00001000';`

## num2bit( \@integers )

This routine accepts an array ref of integers and returns a bit vector with those integer's bits turned on.

### Parameters:

#### \@integers

A reference to an array of integers.

Examples:

```
my $vec = num2bit [1,2,3]; # 01110000
my $vec = num2bit [3,2,1]; # 01110000
```

The second example is intended to make clear that the order of the integers in the array is not retained (for obvious reasons), and calling bit2num( $vec ) will always return the integers in ascending order (see bit2num() below).

## bit2num( $vec, $beg, $cnt )

This routine accepts a bit vector and returns an array of integers represented by the 1 bits.

The parameters $beg and $cnt are to support retrieving subsets of integers from a large vector -- in essence, to support "paging" through the set.

In scalar context, returns a reference to the array.

### Parameters:

#### $vec

A bit vector.

#### $beg

The first integer (where the bit is 1) to return. Unlike array subscripts, the $beg positions start with 1, not 0.

#### $cnt

The maximum number of integers (including the first) to return.

Examples:

```
# 0----+----1----+----2----+----3-
my $vec = str2bit '01110011110001111100001111110001';
my $set1 = bit2num $vec, 1, 5; # [ 1, 2, 3, 6, 7 ]
my $set2 = bit2num $vec, 6, 5; # [ 8, 9, 13, 14, 15 ]
my $set3 = bit2num $vec, 11, 5; # [ 16, 17, 22, 23, 24 ]
my $set4 = bit2num $vec, 16, 5; # [ 25, 26, 27, 31 ]
```

## compress( $str )

This routine takes a string of '0's and '1's and compresses it using a simple run-length encoding (RLE). It returns this "compressed bit string".

### Parameters:

#### $str

A string of '0's and '1's, e.g., "01110".

Note: the length of the string need not be a multiple of 8.

Example:

```
my $bstr;
$bstr = compress '01110000'; # '-134'
my $str = ('1'x100).('0'x30).('1'x6);
$bstr = compress $str; # '+@1cU6'
```

### Compression Scheme

The compression scheme counts the number of consecutive '0's and '1's and concatenates that count (in base-62) to the compressed bit string.

If the first bit is '0', the compressed bit string begins with '-'. If the first bit is '1', it begins with '+'. The digit following that represents that many of those bits. The next digit represents that many of the "other" bits, and so on. (A "digit" matches /[0-9A-Za-z]/.)

So in the first example, '-134' means 1 '0' bit, then 3 '1' bits, then 4 '0' bits, i.e., '01110000'.

The second example includes a 2-digit number, 1c base-62 (100 decimal, as defined by Math::Int2base).

Any multi-digit number is preceded by a non-digit:

```
'@' for a 2-digit number
'#' for 3 digits
'$' for 4 digits
'%' for 5 digits, and
'^' for 6 digits
```

(Mnemonic: look above the numbers on a qwerty keyboard. A 6-digit number will accommodate 32,590,299,105 consecutive bits. If you need more than that, let me know.)

So '+@1cU6' means 1c (100) '1' bits, then U (30) '0' bits, then 6 '1' bits.

## uncompress( $bstr )

This routine uncompresses a compressed bit string (which would have been compressed by the compress() routine above).

It returns a string of '0's and '1's. This string will (normally) then be converted to a bit vector using str2bit() above.

### Parameters:

#### $bstr

A compressed bit string (see compress() above).

Example:

```
my $bstr = '-134';
my $str = uncompress $bstr; # '01110000'
```

## howmany( $vec, $zero_or_one )

This routine returns a count of the 0 or 1 bits in a bit vector.

### Parameters:

#### $vec

A bit vector.

#### $zero_or_one

The value you want a count of: 0 or 1. Defaults to 1 if not given.

Examples:

```
my $vec = str2bit '01010010';
my $ones_count = howmany $vec; # 3
my $zeros_count = howmany $vec, 0; # 5
```

Note that howmany( $vec, 0 ) will include trailing zero bits.

# METHODS

## new()

This constructs a Data::Bvec object. Each object represents a single array of integers stored either as a bit vector, a compressed bit string, or an array.

### Parameters:

All parameters to new() are named.

#### bvec=>$bit_vector

Stores a Perl bit vector in the object.

```
my $vec = str2bit '01110011110001111100001111110001';
my $bv = Data::Bvec::->new( bvec => $vec );
```

#### bstr=>$compressed_bit_string

Stores a compressed bit string in the object.

```
my $bstr = compress bit2str $vec;
my $bv = Data::Bvec::->new( bstr => $bstr );
```

#### nums=>\@integers

Stores an array of integers in the object. The order of the array is retained when stored.

```
my $nums = bit2num $vec;
my $bv = Data::Bvec::->new( nums => $nums );
```

#### bvec2nums=>$bit_vector

Accepts a bit vector and stores it as an array of integers (as $self->{nums}).

` my $bv = Data::Bvec::->new( bvec2nums => $vec );`

#### nums2bvec=>\@integers

Accepts an array of integers and stores it as a bit vector (as $self->{bvec}). The order of the array is not retained.

` my $bv = Data::Bvec::->new( nums2bvec => $nums );`

#### bvec2bstr=>$bit_vector

Accepts a bit vector and stores it as a compressed bit string (as $self->{bstr}).

` my $bv = Data::Bvec::->new( bvec2bstr => $vec );`

#### bstr2bvec=>$compressed_bit_string

Accepts a compressed bit string and stores it as a bit vector (as $self->{bvec}).

` my $bv = Data::Bvec::->new( bstr2bvec => $bstr );`

#### bstr2nums=>$compressed_bit_string

Accepts a compressed bit string and stores it as an array of integers (as $self->{nums}).

` my $bv = Data::Bvec::->new( bstr2nums => $bstr );`

#### nums2bstr=>\@integers

Accepts an array of integers and stores it as a compressed bit string (as $self->{bstr}). The order of the array is not retained.

` my $bv = Data::Bvec::->new( nums2bstr => $nums );`

## get_bvec()

This routine takes no parameters. It returns a bit vector, regardless how the integers are stored. The object is not changed.

` my $vec = $bv->get_bvec();`

## get_bstr()

This routine takes no parameters. It returns a compressed bit string, regardless how the integers are stored. The object is not changed.

` my $bstr = $bv->get_bstr();`

## get_nums( $beg, $cnt )

This routine returns an array of integers, regardless how the integers are stored. The object is not changed.

Note that if the integers are stored as 'nums' (an array), get_nums() will return them in the same order as the array. If they are stored another way, they will be returned in ascending order.

```
my @integers = $bv->get_nums(); # list returned in list context
my $ints = $bv->get_nums(); # aref returned in scalar context
```

### Parameters:

#### $beg

The first integer to return. Unlike array subscripts, the $beg positions start with 1, not 0. If no $beg is given, 1 is assumed.

#### $cnt

The maximum number of integers (including the first) to return. If no $cnt is given, the rest of the integers are returned.