The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Set::IntSpan - Manages sets of integers

SYNOPSIS

    use Set::IntSpan;

    $set = new Set::IntSpan $set_spec;
    
    copy $set $set_spec;
    
    $Set::IntSpan::Empty_String = $string;
    $run_list   = run_list $set;
    @elements   = elements $set;
    
    $u_set = union      $set $set_spec;
    $i_set = intersect  $set $set_spec;
    $x_set = xor        $set $set_spec;
    $d_set = diff       $set $set_spec;
    $c_set = complement $set;
    
    equal       $set $set_spec;
    equivalent  $set $set_spec;
    superset    $set $set_spec;
    subset      $set $set_spec;
    
    $n = cardinality $set;
    
    empty       $set;
    finite      $set;
    neg_inf     $set;
    pos_inf     $set;
    infinite    $set;
    universal   $set;
    
    member      $set $n;
    insert      $set $n;
    remove      $set $n;

REQUIRES

Perl 5.001

EXPORTS

None

DESCRIPTION

Set::IntSpan manages sets of integers. It is optimized for sets that have long runs of consecutive integers. These arise, for example, in .newsrc files, which maintain lists of articles:

    alt.foo: 1-21,28,31
    alt.bar: 1-14192,14194,14196-14221

Sets are stored internally in a run-length coded form. This provides for both compact storage and efficient computation. In particular, set operations can be performed directly on the encoded representation.

Set::IntSpan is designed to manage finite sets. However, it can also represent some simple infinite sets, such as {x | x>n}. This allows operations involving complements to be carried out consistently, without having to worry about the actual value of MAXINT on your machine.

SET SPECIFICATIONS

Many of the methods take a set specification. There are four kinds of set specifications.

Empty

If a set specification is omitted, then the empty set is assumed. Thus,

    $set = new Set::IntSpan;

creates a new, empty, set. Similarly,

    copy $set;

removes all elements from $set.

Object reference

If an object reference is given, it is taken to be a Set::IntSpan object.

Array reference

If an array reference is given, then the elements of the array are taken to be the elements of the set. The array may contain duplicate elements. The elements of the array may be in any order.

Run list

If a string is given, it is taken to be a run list. A run list specifies a set using a syntax similar to that in .newsrc files.

A run list is a comma-separated list of runs. Each run specifies a set of consecutive integers. The set is the union of all the runs.

Runs may be written in any of several forms.

Finite forms

n

{ n }

a-b

{x | a<=x && x<=b}

Infinite forms

(-n

{x | x<=n}

n-)

{x | x>=n}

(-)

The set of all integers

Empty forms

The empty set is consistently written as '' (the null string). It is also denoted by the special form '-' (a single dash).

Restrictions

The runs in a run list must be disjoint, and must be listed in increasing order.

Valid characters in a run list are 0-9, '(', ')', '-' and ','. White space and underscore (_) are ignored. Other characters are not allowed.

Examples

  1. { 1 }

  2. 1-2

    { 1, 2 }

  3. -5--1

    { -5, -4, -3, -2, -1 }

  4. -

    { }

  5. (-)

    the integers

  6. (--1

    the negative integers

  7. 1-3, 4, 18-21

    { 1, 2, 3, 4, 18, 19, 20, 21 }

METHODS

Creation

new Set::IntSpan $set_spec;

Creates and returns a new set. The initial contents of the set are given by $set_spec.

copy $set $set_spec;

Copies $set_spec into $set. The previous contents of $set are lost. For convenience, copy() returns $set.

$run_list = run_list $set

Returns a run list that represents $set. The run list will not contain white space. $set is not affected.

By default, the empty set is formatted as '-'; a different string may be specified in $Set::IntSpan::Empty_String.

@elements = elements $set;

Returns an array containing the elements of $set. The elements will be sorted in numerical order. In scalar context, returns an array reference. $set is not affected.

Set operations

$u_set = union $set $set_spec;

returns the set of integers in either $set or $set_spec

$i_set = intersect $set $set_spec;

returns the set of integers in both $set and $set_spec

$x_set = xor $set $set_spec;

returns the set of integers in $set or $set_spec, but not both

$d_set = diff $set $set_spec;

returns the set of integers in $set but not in $set_spec

$c_set = complement $set;

returns the complement of $set.

For all set operations, a new Set::IntSpan object is created and returned. The operands are not affected.

Comparison

equal $set $set_spec;

Returns true if $set and $set_spec contain the same elements.

equivalent $set $set_spec;

Returns true if $set and $set_spec contain the same number of elements. All infinite sets are equivalent.

superset $set $set_spec

Returns true if $set is a superset of $set_spec.

subset $set $set_spec

Returns true if $set is a subset of $set_spec.

Cardinality

$n = cardinality $set

Returns the number of elements in $set. Returns -1 for infinite sets.

empty $set;

Returns true if $set is empty.

finite $set

Returns true if $set if finite.

neg_inf $set

Returns true if $set contains {x | x<n} for some n.

pos_inf $set

Returns true if $set contains {x | x>n} for some n.

infinite $set

Returns true if $set is infinite.

universal $set

Returns true if $set contains all integers.

Membership

member $set $n

Returns true if the integer $n is a member of $set.

insert $set $n

Inserts the integer $n into $set. Does nothing if $n is already a member of $set.

remove $set $n

Removes the integer $n from $set. Does nothing if $n is not a member of $set.

CLASS VARIABLES

$Set::IntSpan::Empty_String

$Set::IntSpan::Empty_String contains the string that is returned when run_list() is called on the empty set. $Empty_String is initially '-'; alternatively, it may be set to ''. Other values should be avoided, to ensure that run_list() always returns a valid run list.

run_list() accesses $Empty_String through a reference stored in $set->{empty_string}. Subclasses that wish to override the value of $Empty_String can reassign this reference.

DIAGNOSTICS

Any method will die() if it is passed an invalid run list. Possible messages are:

Bad syntax

$run_list has bad syntax

Bad order

$run_list has overlapping runs or runs that are out of order.

elements $set will die() if $set is infinite.

elements $set can generate an "Out of memory!" message on sufficiently large finite sets.

TESTING

To test IntSpan.pm, run it as a stand-alone perl program:

    %perl IntSpan.pm
    OK
    %

Normal output is "OK"; anything else indicates a problem. Add -v flags for verbose output; the more flags, the more output.

NOTES

Beware of forms like

    union $set [1..5];

This passes a slice of @set to union, which is probably not what you want. To force interpretation of $set and [1..5] as separate arguments, use forms like

    union $set +[1..5];

or

    $set->union([1..5]);
    

Calling elements() on a large, finite set can generate an "Out of memory!" message, which cannot be trapped. Applications that must retain control after an error can use intersect() to protect calls to elements():

    @elements = elements { intersect $set "-1_000_000 - 1_000_000" };

or check the size of $set first:

    cardinality $set < 2_000_000 and @elements = elements $set;

Although Set::IntSpan can represent some infinite sets, it does not perform infinite-precision arithmetic. Therefore, finite elements are restricted to the range of integers on your machine.

The sets implemented here are based on Macintosh data structures called "regions". See Inside Macintosh for more information.

AUTHOR

Steven McDougall <swm@cric.com>

COPYRIGHT

Copyright (c) 1996 Steven McDougall. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

8 POD Errors

The following errors were encountered while parsing the POD:

Around line 134:

You forgot a '=back' before '=head2'

Around line 136:

'=item' outside of any '=over'

Around line 186:

Expected '=item 2'

Around line 190:

Expected '=item 3'

Around line 194:

Expected '=item 4'

Around line 198:

Expected '=item 5'

Around line 202:

Expected '=item 6'

Around line 206:

Expected '=item 7'