Ron Savage

NAME

Set::Array - Arrays as objects with lots of handy methods

SYNOPSIS

my $sao1 = Set::Array->new(1,2,4,"hello",undef);

my $sao2 = Set::Array->new(qw(a b c a b c));

print $sao1->length; # prints 5

$sao2->unique->length->print; # prints 3

PREREQUISITES

Perl 5.6 or later

The 'Want' module by Robin Houston. Available on CPAN.

DESCRIPTION

Set::Array allows you to create arrays as objects and use OO-style methods on them. Many of the convenient methods provided here implement operations that appear in the Perl FAQs, the Perl Cookbook or posts on comp.lang.perl.misc. In addition, there are Set methods with corresponding (overloaded) operators for the purpose of set comparison, e.g. +, ==, etc.

The purpose is to provide built-in methods for operations that people are always asking how to do, and which already exist in languages like Ruby. This should (hopefully) improve code readability and/or maintainability. Another advantage of this module is method chaining, by which any number of methods may be called on a single object in a single statement.

OBJECT BEHAVIOR

The exact behavior of the methods depends largely on the calling context.

Here are the rules:

* If a method is called in void context, the object itself is modified.

* If the method called is not the last method in a chain (i.e. it is called in object context), the object itself is modified by that method regardless of the 'final' context or method call.

* If a method is called in list or scalar context, a list or list reference is returned, respectively. The object itself is NOT modified.

Here is a quick example:

my $sao = Set::Array->new(1,2,3,2,3);

my @uniq = $sao->unique(); # Object unmodified. '@uniq' contains 3 values.

$sao->unique(); # Object modified, now contains 3 values
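The context rules above can be illustrated with a minimal sketch in core Perl. MiniSet is a hypothetical class, not part of Set::Array, and it relies only on the built-in wantarray(). It therefore distinguishes void, scalar and list context, but not the chained (object) context that Set::Array detects with the Want module.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the context rules; NOT Set::Array's code.
# Chained (object) context would need Want's want('OBJECT'), omitted here.
package MiniSet;

sub new {
    my ($class, @items) = @_;
    return bless [@items], $class;
}

sub unique {
    my ($self) = @_;
    my %seen;
    # Keep the first occurrence of each value, preserving order.
    my @uniq = grep { !$seen{$_}++ } @$self;

    if (!defined wantarray) {            # void context: modify the object
        @$self = @uniq;
        return;
    }
    return wantarray ? @uniq : \@uniq;   # list: array; scalar: array ref
}

package main;

my $set  = MiniSet->new(1, 2, 3, 2, 3);
my @uniq = $set->unique;     # list context: object unchanged
my $ref  = $set->unique;     # scalar context: array ref, object unchanged
print scalar(@$set), "\n";   # 5
$set->unique;                # void context: object modified
print scalar(@$set), "\n";   # 3
```

Only the void-context call changes the object, which is why the sketch prints 5 and then 3.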

Here are the exceptions:

* Methods that report a value, such as boolean methods like exists() or other methods such as at() or as_hash(), never modify the object.

* The methods clear(), delete(), delete_at(), and splice() will always modify the object. It seemed much too counterintuitive to call these methods in any context without actually deleting/clearing/substituting the items!

* The methods shift() and pop() will modify the object AND return the value that was shifted or popped from the array. Again, it seemed much too counterintuitive for something like $val = $sao->shift to return a value while leaving the object unchanged. If you really want the first or last value without modifying the object, you can always use the first() or last() method, respectively.

* The methods cshift() and cpop() (for chainable-shift and chainable-pop) will modify the object and return the object. I.e. the value shifted or popped is discarded. See the docs below or the code at the end of t/test.t for examples.

* The join() method always returns a string and is really meant for use in conjunction with the print() method.

BOOLEAN METHODS

In the following sections, the brackets in [val] indicate that val is an optional parameter.

exists([val])

Returns 1 if val exists within the array, 0 otherwise.

If no value (or undef) is passed, then this method will test for the existence of undefined values within the array.

is_empty()

Returns 1 if the array is empty, 0 otherwise. Empty is defined as having a length of 0.

STANDARD METHODS

at(index)

Returns the item at the given index (or undef).

A negative index may be used to count from the end of the array.

If no value (or undef) is specified, it will look for the first item that is not defined.

bag($other_set, $reverse)

Returns the union of both sets, including duplicates (i.e. everything).

Setting $reverse to 1 reverses the sets as the first step in the method.

Note: It does not reverse the contents of the sets.

See "General Notes" for the set of such methods, including a list of overloaded operators.

clear([1])

Empties the array (i.e. length becomes 0).

You may pass a 1 to this method to set each element of the array to undef rather than truly empty it.

compact()

o In scalar context

Returns an array ref of defined items.

The object is not modified.

o In list context

Returns an array of defined items.

The object is not modified.

o In chained context

Returns the object.

The object is modified if it contains undefined items.

count([val])

Returns the number of instances of val within the array.

If val is not specified (or is undef), the method will return the number of undefined values within the array.
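A plain-Perl sketch of the counting rule, including the undef case, follows. count_of is a hypothetical helper rather than Set::Array's implementation, and comparing defined values with eq is this sketch's own assumption.

```perl
use strict;
use warnings;

# Hypothetical helper sketching count(); not Set::Array's code.
# A missing or undef $val counts the undefined elements instead.
sub count_of {
    my ($val, @items) = @_;
    return scalar grep { !defined($_) } @items unless defined $val;
    # Assumption: defined values are compared as strings (eq).
    return scalar grep { defined($_) && $_ eq $val } @items;
}

my @data = (1, undef, 2, 1, undef, 1);
print count_of(1, @data), "\n";       # 3
print count_of(undef, @data), "\n";   # 2
```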

cpop()

The 'c' stands for 'chainable' pop.

Removes and discards the last element of the array.

Returns the object.

        Set::Array -> new(1, 2, 3, 4, 5) -> cpop -> join -> print;

prints 1,2,3,4.

See also cshift(), pop() and shift().

cshift()

The 'c' stands for 'chainable' shift.

Removes and discards the first element of the array.

Returns the object.

        Set::Array -> new(1, 2, 3, 4, 5) -> cshift -> join -> print;

prints 2,3,4,5.

See also cpop(), pop() and shift().

delete(@list)

Deletes all items within the object that match @list.

This method will die if @list is not defined.

If your goal is to delete undefined values from your object, use the "compact()" method instead.

This method always modifies the object, if elements in @list match elements in the object.

o In scalar context

Returns an array ref of unique items.

o In list context

Returns an array of unique items.

o In chained context

Returns the object.

delete_at(index, [index])

Deletes the item at the specified index.

If a second index is specified, a range of items is deleted.

You may use -1 or the string 'end' to refer to the last element of the array.
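The index rules for delete_at() can be sketched in plain Perl with the built-in splice. delete_items_at is hypothetical, and treating every negative index as counting from the end (not just -1) is this sketch's own generalization, not something the method is documented to do.

```perl
use strict;
use warnings;

# Hypothetical helper sketching delete_at(); not Set::Array's code.
# Assumption: any negative index counts from the end, not only -1.
sub delete_items_at {
    my ($aref, $from, $to) = @_;
    $to = $from unless defined $to;           # one index: delete one item
    for my $pos ($from, $to) {                # $pos aliases $from, then $to
        $pos = $#$aref if $pos eq 'end';      # 'end' is the last element
        $pos += @$aref if $pos < 0;           # -1 is the last element
    }
    splice @$aref, $from, $to - $from + 1;    # remove the inclusive range
    return $aref;
}

my @letters = qw(a b c d e);
delete_items_at(\@letters, 1, 3);   # removes b, c and d
print "@letters\n";                 # a e
delete_items_at(\@letters, 'end');
print "@letters\n";                 # a
```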

difference($one, $two, $reverse)

Returns all elements in the left set that are not in the right set.

Setting $reverse to 1 reverses the sets as the first step in the method.

Note: It does not reverse the contents of the sets.

See "General Notes" for the set of such methods, including a list of overloaded operators.

Study the sample code below carefully, since all of $set1, $set8 and $set9 get changed, perhaps when you were not expecting them to be.

There is a problem, however: 2 bugs in the Want module (V 0.20), relating to want('OBJECT') and wantref(), both cause segfaults.

So, I have used Try::Tiny to capture a call to want('OBJECT') in sub difference().

If an error is thrown, I just ignore it. This is horribly tacky, but after waiting 7 years (it is now 2012-03-07) I have given up on expecting patches to Want.

Sample code:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use Set::Array;

        # -------------

        my($set1) = Set::Array -> new(qw(abc def ghi jkl mno) );
        my($set8) = Set::Array -> new(@$set1);           # Duplicate for later.
        my($set9) = Set::Array -> new(@$set1);           # Duplicate for later.
        my($set2) = Set::Array -> new(qw(def jkl pqr));
        my($set3) = $set1 - $set2;                       # Changes $set1. $set3 is a set.
        my($set4) = Set::Array -> new(@{$set8 - $set2}); # Changes $set8. $set4 is a set.
        my(@set5) = $set9 -> difference($set2);          # Changes $set9. @set5 is an array.

        print '1: ', join(', ', @$set3), ". \n";
        print '2: ', join(', ', @{$set4 -> print}), ". \n";
        print '3: ', join(', ', $set4 -> print), ". \n";
        print '4: ', join(', ', @set5), ". \n";

The last 4 lines all produce the same, correct, output, so any of $set3, $set4 or @set5 is what you want.

See t/difference.pl.

duplicates()

Returns a list of N-1 elements for each element which appears N times in the set.

For example, if you have set "X X Y Y Y", this method would return the list "X Y Y".

If you want the output to be "X Y", see "unique()".

o In scalar context

Returns an array ref of duplicated items.

The object is not modified.

o In list context

Returns an array of duplicated items.

The object is not modified.

o In chained context

Returns the object.

The object is modified if it contains duplicated items.
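The "N-1 copies" rule can be seen in a plain-Perl sketch: every occurrence after the first is reported. dupes is a hypothetical helper that illustrates the documented result and is not Set::Array's own code.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating duplicates(); not Set::Array's code.
# $seen{$_}++ is false on the first sighting and true afterwards, so an
# item that appears N times is returned N-1 times.
sub dupes {
    my %seen;
    return grep { $seen{$_}++ } @_;
}

my @dupes = dupes(qw(X X Y Y Y));
print "@dupes\n";   # X Y Y
```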

fill(val, [start], [length])

Sets the selected elements of the array (which may be the entire array) to val.

The default value for start is 0.

If length is not specified, every element from start to the end of the array, however long it may be, will be filled.

A range may also be used for the start parameter. A range must be a quoted string in '0..999' format.

E.g. $sao->fill('x', '3..65535');

The array length/size may not be expanded with this call - it is only meant to fill in already-existing elements.
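The fill() rules above can be sketched in plain Perl. fill_items is a hypothetical helper, not Set::Array's code, and ignoring length when start is a range string is this sketch's own assumption. The key point it shows is that the end of the fill is clamped to the last existing index, so the array never grows.

```perl
use strict;
use warnings;

# Hypothetical helper sketching fill(); not Set::Array's code.
# Assumption: when start is a 'M..N' range string, length is ignored.
sub fill_items {
    my ($aref, $val, $start, $length) = @_;
    $start = 0 unless defined $start;       # documented default
    my $last = $#$aref;
    my ($from, $to);

    if ($start =~ /^(\d+)\.\.(\d+)$/) {     # range form, e.g. '3..65535'
        ($from, $to) = ($1 + 0, $2 + 0);
    } else {
        $from = $start;
        $to   = defined $length ? $start + $length - 1 : $last;
    }
    $to = $last if $to > $last;             # never expand the array
    $aref->[$_] = $val for $from .. $to;
    return $aref;
}

my @row = (1 .. 6);
fill_items(\@row, 'x', '3..65535');
print "@row\n";   # 1 2 3 x x x
```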

first()

Returns the first element of the array (or undef).

flatten()

Causes a one-dimensional flattening of the array, recursively.

That is, for every element that is an array (or hash, or a ref to either an array or hash), extract its elements into the array.

E.g. my $sao = Set::Array->new([1,3,2],{one=>'a',two=>'b'},'x','y','z');

$sao->flatten->join(',')->print; # prints "1,3,2,one,a,two,b,x,y,z" (the order of the hash's key/value pairs may vary)
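A recursive flatten along the lines described above can be sketched in plain Perl. flatten_items is hypothetical, not Set::Array's implementation. It sorts hash keys only to make its own output deterministic, since Perl hashes are unordered.

```perl
use strict;
use warnings;

# Hypothetical helper sketching a recursive flatten; not Set::Array's code.
sub flatten_items {
    my @out;
    for my $item (@_) {
        if (ref($item) eq 'ARRAY') {
            push @out, flatten_items(@$item);      # array: its elements
        }
        elsif (ref($item) eq 'HASH') {
            # Hash: its key/value pairs. Sorting the keys is this sketch's
            # choice, to get a repeatable order.
            push @out, flatten_items(map { ($_, $item->{$_}) } sort keys %$item);
        }
        else {
            push @out, $item;                      # plain scalar
        }
    }
    return @out;
}

my @flat = flatten_items([1, 3, 2], { one => 'a', two => 'b' }, 'x', 'y', 'z');
print join(',', @flat), "\n";   # 1,3,2,one,a,two,b,x,y,z
```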

foreach(sub ref)

Iterates over an array, executing the subroutine for each element in the array.

If you wish to modify or otherwise act directly on the contents of the array, use $_ within your sub reference.

E.g. To increment all elements in the array by one...

$sao->foreach(sub{ ++$_ });
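The reason ++$_ changes the array is Perl's loop aliasing: inside a for loop, $_ is an alias to each element, and a sub called from the loop sees that same $_. The sketch below uses a hypothetical for_each helper to show the mechanism. It assumes, but does not confirm, that Set::Array's foreach() works the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for foreach(); not Set::Array's code.
sub for_each {
    my ($aref, $code) = @_;
    $code->() for @$aref;   # $_ aliases each element in turn
    return $aref;
}

my @nums = (1, 2, 3);
for_each(\@nums, sub { ++$_ });
print "@nums\n";   # 2 3 4
```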

get()

This is an alias for the indices() method.

index(val)

Returns the index of the first element of the array object that contains val.

Returns undef if no value is found.

Note that there is no dereferencing here, so if you are looking for an item nested within a ref, use the flatten method first.

indices(val1, [val2], [valN])

Returns an array consisting of the elements at the specified indices, with undef for any index that is out of range.

A range may also be used for each of the valN parameters. A range must be a quoted string in '0..999' format.
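Mixing plain indices with quoted ranges can be sketched in plain Perl. pick_indices is a hypothetical helper, not Set::Array's implementation. Reading past the end of a Perl array yields undef without growing the array, which matches the documented out-of-range behavior.

```perl
use strict;
use warnings;

# Hypothetical helper sketching indices(); not Set::Array's code.
sub pick_indices {
    my ($aref, @specs) = @_;
    # Expand '0..2' style strings; '+ 0' forces a numeric range.
    my @idx = map { /^(\d+)\.\.(\d+)$/ ? ($1 + 0 .. $2 + 0) : $_ } @specs;
    return map { $aref->[$_] } @idx;   # undef past the end
}

my @letters = ('a' .. 'e');
my @got = pick_indices(\@letters, '0..2', 4, 9);   # a, b, c, e, undef
```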

intersection($other_set)

Returns all elements common to both sets.

Note: It does not eliminate duplicates. Call "unique()" if that is what you want.

You are strongly encouraged to examine line 19 of both t/intersection.1.pl and t/intersection.2.pl.

An optional second parameter, $reverse, may also be passed: setting it to 1 reverses the sets as the first step in the method.

Note: It does not reverse the contents of the sets.

See "General Notes" for the set of such methods, including a list of overloaded operators.

is_equal($other_set)

Tests to see if the 2 sets are equal (regardless of order). Returns 1 for equal and 0 for not equal.

An optional second parameter, $reverse, may also be passed: setting it to 1 reverses the sets as the first step in the method.

Since order is ignored, this parameter is irrelevant.

Note: It does not reverse the contents of the sets.

See "General Notes" for the set of such methods, including a list of overloaded operators.

See also "not_equal($other_set)".

join([string])

Joins the elements of the list into a single string with the elements separated by the value of string.

Useful in conjunction with the print() method.

If no string is specified, then string defaults to a comma.

e.g. $sao->join('-')->print;

last()

Returns the last element of the array (or undef).

length()

Returns the number of elements within the array.

max()

Returns the maximum value of an array.

No effort is made to check for non-numeric data.

new()

This is the constructor.

See "difference($one, $two, $reverse)" for sample code.

See also "flatten()" for converting arrayrefs and hashrefs into lists.

not_equal($other_set)

Tests to see if the 2 sets are not equal (regardless of order). Returns 1 for not equal and 0 for equal.

An optional second parameter, $reverse, may also be passed: setting it to 1 reverses the sets as the first step in the method.

Since order is ignored, this parameter is irrelevant.

Note: It does not reverse the contents of the sets.

See "General Notes" for the set of such methods, including a list of overloaded operators.

See also "is_equal($other_set)".

pack(template)

Packs the contents of the array into a string (in scalar context) or a single array element (in object or void context).

pop()

Removes the last element from the array.

Returns the popped element.

See also cpop(), cshift() and shift().

print([1])

Prints the contents of the array.

If a 1 is provided as an argument, the output will automatically be terminated with a newline.

This also doubles as a 'contents' method, if you just want to make a copy of the array, e.g. my @copy = $sao->print;

Can be called in void or list context, e.g.

$sao->print(); # or... print "Contents of array are: ", $sao->print();

push(list)

Adds list to the end of the array, where list is either a scalar value or a list.

Returns an array or array reference in list or scalar context, respectively.

Note that it does not return the length in scalar context. Use the length method for that.

reverse()

o In scalar context

Returns an array ref of the items in the object, reversed.

The object is not modified.

o In list context

Returns an array of the items in the object, reversed.

The object is not modified.

o In chained context

Returns the object.

The object is modified, with its items being reversed.

rindex(val)

Similar to the index() method, except that it returns the index of the last val found within the array.

Returns undef if no value is found.
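Both index() and rindex() can be sketched in plain Perl as forward and backward scans that return undef when nothing matches. first_pos and last_pos are hypothetical helpers, and the eq comparison is this sketch's assumption rather than documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helpers sketching index() and rindex(); not Set::Array's code.
# Assumption: elements are compared with eq.
sub first_pos {
    my ($val, @items) = @_;
    for my $i (0 .. $#items) {
        return $i if defined $items[$i] && $items[$i] eq $val;
    }
    return undef;   # documented: undef if no value is found
}

sub last_pos {
    my ($val, @items) = @_;
    for my $i (reverse 0 .. $#items) {
        return $i if defined $items[$i] && $items[$i] eq $val;
    }
    return undef;
}

my @list = qw(a b c b a);
print first_pos('b', @list), "\n";   # 1
print last_pos('b', @list), "\n";    # 3
```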

set(index, value)

Sets the element at index to value, replacing whatever may have already been there.

shift()

Shifts off the first element of the array and returns the shifted element.

See also cpop(), cshift() and pop().

sort([coderef])

Sorts the contents of the array in alphabetical order, or in the order specified by the optional coderef.

o In scalar context

Returns an array ref of the items in the object, sorted.

The object is not modified.

o In list context

Returns an array of the items in the object, sorted.

The object is not modified.

o In chained context

Returns the object.

The object is modified by sorting its items.

Use your standard $a and $b variables within your sort sub:

Program:

        #!/usr/bin/env perl

        use Set::Array;

        # -------------

        my $s = Set::Array->new(
                { name => 'Berger', salary => 15000 },
                { name => 'Berger', salary => 20000 },
                { name => 'Vera', salary => 25000 },
        );

        my($subref) = sub{ $b->{name} cmp $a->{name} || $b->{salary} <=> $a->{salary} };
        my(@h)      = $s->sort($subref);

        for my $h (@h)
        {
                print "Name: $$h{name}. Salary: $$h{salary}. \n";
        }

Output (because the sort subref puts $b before $a for name and salary):

        Name: Vera. Salary: 25000.
        Name: Berger. Salary: 20000.
        Name: Berger. Salary: 15000.

splice([offset], [length], [list])

Removes up to length elements from the array, starting at position offset, and replaces them with list.

If no list is provided, the removed elements are simply deleted. Called with no arguments at all, splice() deletes every element.

If length is omitted, everything from offset onward is removed.

Returns an array or array ref in list or scalar context, respectively.

This method always modifies the object, regardless of context.

If your goal is to grab a range of values without modifying the object, use the indices method instead.

unique()

Returns a list of 1 element for each element which appears N times in the set.

For example, if you have set "X X Y Y Y", this method would return the list "X Y".

If you want the output to be "X Y Y", see "duplicates()".

o In scalar context

Returns an array ref of unique items.

The object is not modified.

o In list context

Returns an array of unique items.

The object is not modified.

o In chained context

Returns the object.

The object is modified if it contains duplicated items.
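The "one copy of each" result can be sketched in plain Perl as the mirror image of the duplicates rule. uniq_items is a hypothetical helper, not Set::Array's code. Keeping each item at its first position is this sketch's choice. Recent versions of the core List::Util module also provide a uniq() function.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating unique(); not Set::Array's code.
# !$seen{$_}++ is true only the first time each value is seen.
sub uniq_items {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @uniq = uniq_items(qw(X X Y Y Y));
print "@uniq\n";   # X Y
```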

unshift(list)

Prepends a scalar or list to the array.

Note that this method returns an array or array reference in list or scalar context, respectively.

It does not return the length of the array in scalar context. Use the length method for that.

ODDBALL METHODS

as_hash([$option])

Returns a hash based on the current array, with each even numbered element (including 0) serving as the key, and each odd element serving as the value.

This can be switched by setting $option to 'odd', in which case the odd elements serve as the keys and the even elements serve as the values.

The default value of $option is even.

Of course, if you do not mind the object's element order being reversed (reverse() modifies the object when it is part of a chain), you could just as well do something like $sao->reverse->as_hash;

This method does not actually modify the object itself in any way. It just returns a plain hash in list context or a hash reference in scalar context. The reference is not blessed, therefore if this method is called as part of a chain, it must be the last method called.

$option can be specified in various ways:

undef

When you do not supply a value for this parameter, the default is even.

'odd' or 'even'

The value may be a string.

This possibility was added in V 0.18.

This is now the recommended alternative.

{key_option => 'odd'} or {key_option => 'even'}

The value may be a hash ref, with 'key_option' as the hash key.

This possibility was added in V 0.18.

(key_option => 'odd') or (key_option => 'even')

The value may be a hash, with 'key_option' as the hash key.

This was the original (badly-documented) alternative to undef, and it is still supported in order to make the code backwards-compatible.
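The even/odd key choice can be sketched in plain Perl by walking the list in pairs. pairs_to_hash is a hypothetical helper that accepts only the string forms of $option. Dropping an unpaired trailing element is this sketch's own choice, since the text above does not say what as_hash() does in that case.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the as_hash() pairing; not Set::Array's code.
# 'even' (default): elements 0, 2, 4, ... are keys.
# 'odd': elements 1, 3, 5, ... are keys.
sub pairs_to_hash {
    my ($option, @items) = @_;
    $option = 'even' unless defined $option;
    my %hash;
    for (my $i = 0; $i + 1 < @items; $i += 2) {
        my ($key, $value) = @items[$i, $i + 1];
        ($key, $value) = ($value, $key) if $option eq 'odd';
        $hash{$key} = $value;
    }
    return %hash;   # an unpaired trailing element is ignored here
}

my %by_even = pairs_to_hash(undef, qw(name Dan lang Perl));   # name => 'Dan'
my %by_odd  = pairs_to_hash('odd', qw(name Dan lang Perl));   # Dan => 'name'
```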

impose([append/prepend], string)

Appends or prepends the specified string to each element in the array.

Specify the method with either 'append' or 'prepend'.

The default is 'append'.
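A plain-Perl sketch of the append/prepend choice, with 'append' as the default, follows. impose_on is a hypothetical helper that returns new strings and does not modify its input, so it illustrates the result of impose() rather than its in-object behavior.

```perl
use strict;
use warnings;

# Hypothetical helper sketching impose(); not Set::Array's code.
sub impose_on {
    my ($how, $str, @items) = @_;
    $how = 'append' unless defined $how;   # documented default
    if ($how eq 'prepend') {
        return map { $str . $_ } @items;
    }
    return map { $_ . $str } @items;
}

my @files = impose_on('append', '.txt', qw(a b));    # a.txt b.txt
my @paths = impose_on('prepend', '/tmp/', qw(a b));  # /tmp/a /tmp/b
```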

randomize()

Randomizes the order of the elements within the array.

rotate(direction)

Moves the last item of the list to the front and shifts all other elements one to the right, or vice-versa, depending on what you pass as the direction - 'ftol' (first to last) or 'ltof' (last to first).

The default is 'ltof'.

e.g. my $sao = Set::Array->new(1,2,3);

$sao->rotate(); # order is now 3,1,2

$sao->rotate('ftol'); # order is back to 1,2,3
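Each rotate() direction corresponds to one pair of Perl's built-in array operations, as the plain-Perl sketch below shows. rotate_items is a hypothetical helper operating on an ordinary array, not Set::Array's code.

```perl
use strict;
use warnings;

# Hypothetical helper sketching rotate(); not Set::Array's code.
sub rotate_items {
    my ($aref, $direction) = @_;
    $direction = 'ltof' unless defined $direction;   # documented default
    return $aref unless @$aref;                      # nothing to rotate
    if ($direction eq 'ftol') {
        push @$aref, shift @$aref;       # first element moves to the end
    } else {
        unshift @$aref, pop @$aref;      # last element moves to the front
    }
    return $aref;
}

my @r = (1, 2, 3);
rotate_items(\@r);           # 3 1 2
rotate_items(\@r, 'ftol');   # back to 1 2 3
```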

to_hash()

This is an alias for as_hash().

OVERLOADED (COMPARISON) OPERATORS

General Notes

For overloaded operators you may pass a Set::Array object, or just a normal array reference (blessed or not) in any combination, so long as one is a Set::Array object. You may use either the operator or the equivalent method call.

Warning: You should always experiment with these methods before using them in production. Why? Because you may have unrealistic expectations that they automatically eliminate duplicates, for example. See the "FAQ" for more.

Examples (using the '==' operator or 'is_equal' method):

my $sao1 = Set::Array->new(1,2,3,4,5);

my $sao2 = Set::Array->new(1,2,3,4,5);

my $ref1 = [1,2,3,4,5];

if($sao1 == $sao2)... # valid

if($sao1 == $ref1)... # valid

if($ref1 == $sao2)... # valid

if($sao1->is_equal($sao2))... # valid

if($sao1->is_equal($ref1))... # valid

All of these operations return either a boolean value (for equality operators) or an array (in list context) or array reference (in scalar context).

& or bag - The union of both sets, including duplicates.

- or difference - Returns all elements in the left set that are not in the right set. See "difference($one, $two, $reverse)" for details.

== or is_equal - This tests for equality of the content of the sets, though ignores order. Thus, comparing (1,2,3) and (3,1,2) will yield a true result.

!= or not_equal - Tests for inequality of the content of the sets. Again, order is ignored.

* or intersection - Returns all elements that are common to both sets.

Be warned that that line says 'all elements', not 'unique elements'. You can call "unique()" if you need just the unique elements.

See t/intersection.*.pl for sample code with and without calling unique().

% or symmetric_difference or symm_diff - Returns all elements that are in one set or the other, but not both. Opposite of intersection.

+ or union - Returns the union of both sets. Duplicates excluded.

FAQ

Why does the intersection() method include duplicates in the output?

Because it is documented to do that. The docs above say:

"Returns all elements that are common to both sets.

Be warned that that line says 'all elements', not 'unique elements'. You can call "unique()" if you need just the unique elements."

Those statements mean what they say!

See t/intersection.*.pl for sample code with and without calling unique().

The following section, EXAMPLES, contains other types of FAQ items.

EXAMPLES

For our examples, I will create 3 different objects:

my $sao1 = Set::Array->new(1,2,3,'a','b','c',1,2,3);

my $sao2 = Set::Array->new(1,undef,2,undef,3,undef);

my $sao3 = Set::Array->new(1,2,3,['a','b','c'],{name=>"Dan"});

How do I...

get the number of unique elements within the array?

$sao1->unique()->length();

count the number of non-undef elements within the array?

$sao2->compact()->length();

count the number of unique elements within an array, excluding undef?

$sao2->compact()->unique()->length();

print a range of indices?

$sao1->indices('0..2')->print();

test to see if two Set::Array objects are equal?

if($sao1 == $sao2){ ... }

if($sao1->is_equal($sao2)){ ... } # Same thing

fill an array with a value, but only if it is not empty?

if(!$sao1->is_empty()){ $sao1->fill('x') }

shift an element off the array and return the shifted value?

my $val = $sao1->shift();

shift an element off the array and return the array?

my @array = $sao1->delete_at(0);

flatten an array and return a hash based on the now-flattened array, with odd elements as the keys?

my %hash = $sao3->flatten()->reverse->as_hash();

delete all elements within an array?

$sao3->clear();

$sao3->splice();

modify the object AND assign a value at the same time?

my @unique = $sao1->unique->print;

KNOWN BUGS

There is a bug in the Want-0.05 module that currently prevents the use of most of the overloaded operators, though you can still use the corresponding method names. The equality operators == and != should work, however.

There are still bugs in Want V 0.20. See the discussion of "difference($one, $two, $reverse)" for details.

FUTURE PLANS

Anyone want a built-in 'permute()' method?

I am always on the lookout for faster algorithms. If you have looked at the code for a particular method and you know of a faster way, please email me. Be prepared to back up your claims with benchmarks (and the benchmark code you used). Tests on more than one operating system are preferable. No, map is not always faster; in my experience, foreach loops usually are.

More flexibility with the foreach method (perhaps with iterators?).

More tests.

THANKS

Thanks to all the kind (and sometimes grumpy) folks at comp.lang.perl.misc who helped me with problems and ideas I had.

Thanks also to Robin Houston for the 'Want' module! Where would method chaining be without it?

AUTHOR

Original author: Daniel Berger (djberg96 at hotmail dot com; imperator on IRC, freenode)

Maintainer since V 0.12: Ron Savage <ron@savage.net.au> (in 2005).

Home page: http://savage.net.au/index.html



