NAME

Statistics::WeightedSelection - Select a random object according to its weight.

VERSION

version 0.002

SYNOPSIS

    use Statistics::WeightedSelection;

    my $w = Statistics::WeightedSelection->new();

    # add some objects
    $w->add(
        object => 'string',
        weight => 4,
    );
    $w->add(
        object => {p => 1, q => 2},
        weight => 1,
    );
    $w->add(
        object => $any_scalar,
        weight => 7.5,
    );

    # get a random one based upon the individual weight relative to the
    #   combined weight and remove it from the pool for future selection
    #
    #   4 / 12.5 * 100 percent of the time, you'll get 'string'
    #   1 / 12.5 * 100 percent of the time, you'll get {p => 1, q => 2}
    # 7.5 / 12.5 * 100 percent of the time, you'll get $any_scalar
    my $object = $w->get();

    # because the last one was removed, the remaining objects are the new
    #   pool for calculating weights and probabilities
    my $another_object = $w->get();

    # get the number of objects remaining
    my $remaining_object_count = $w->count();

    # when constructed using replace_after_get and a true value, probabilities
    #   of being selected will remain constant, as after an item is selected,
    #   it is not removed from the pool.
    my $wr = Statistics::WeightedSelection->new(replace_after_get => 1);
    #...
    #...
    my $replaced_object = $wr->get();

DESCRIPTION

A Statistics::WeightedSelection object is intended to hold unordered objects (at least logically from the caller's perspective) that each have a corresponding weight. The objects can be any Perl scalar or object, and the weights can be any positive integer or floating-point number.

At any time, an object can be retrieved from the pool. The probability of any object being selected corresponds to its weight divided by the combined weight of all the objects currently in the container.
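The arithmetic behind that rule can be sketched in a few lines of plain Perl. This is only an illustration using the weights from the SYNOPSIS; the hash names and labels are made up for the example and are not part of the module.

```perl
use strict;
use warnings;

# Weights taken from the SYNOPSIS; the labels are illustrative only.
my %weight_for = (
    string     => 4,
    hashref    => 1,
    any_scalar => 7.5,
);

# Combined weight of everything currently in the pool.
my $total = 0;
$total += $_ for values %weight_for;

# Probability of each object = its weight / combined weight.
my %probability_for = map { $_ => $weight_for{$_} / $total } keys %weight_for;

printf "%-10s %.2f\n", $_, $probability_for{$_} for sort keys %probability_for;
```

With a combined weight of 12.5, the three probabilities come out to 0.32, 0.08, and 0.60, matching the percentages in the SYNOPSIS comments.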

Objects that are no longer wanted in the pool can be removed, and an id can be assigned to any item to make this later removal easier.

CAVEATS

An intentional design decision was to use a simple blessed hash to represent the internals of the object, with few direct accessors. The internal _dump() method could be used to see them (with the understanding that internals can change and should not be relied upon in production code), but individual items should not be directly manipulated, and if they are, there's no guarantee of your success.

Adding and manual deletion should be done through the appropriate methods, add() and remove(), respectively.

I did this partly for speed and partly to protect people from accidental mishaps. Perhaps I could be persuaded to add more direct accessors, given a sufficiently reasonable argument.

METHODS

CONSTRUCTOR - new()

To create a new Statistics::WeightedSelection object, call Statistics::WeightedSelection->new. It takes the single optional argument listed below.

replace_after_get (optional)

When this option is true, the selected object is not removed from the pool after a call to get().

    # replace the object selected with the same object, i.e. don't remove it.
    my $w = Statistics::WeightedSelection->new(replace_after_get => 1);

add()

This method is used to add an object and weight to the objects for possible future selection. Two required and one optional arg are described below.

object (required)

The object. Any scalar will do: string, arrayref, hashref, blessed scalar or otherwise.

weight (required)

The weight. Integer or float/decimal. Must be greater than 0. This arbitrary number, divided by the combined weight of all objects in the pool, is the probability that the object will be selected on the next call to get().

id (optional)

This is an id that can be used to remove() items later, if desired. It is not required, and the value, if not passed, will default to a serialized version of the object passed (see above).

get()

Selects an object from the bucket / pool / container randomly, with probabilities of being picked for each item equal to its weight divided by the combined weights.

By default, the object is removed without replacement. replace_after_get() will be called during the course of get(), and if it returns a true value, the item will not be removed.

Returns the randomly selected object.

Takes no arguments.
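The following is a minimal sketch of the selection behavior described above, written against a plain array of [object, weight] pairs. It is not the module's actual implementation: the function names pick_index() and draw(), and the injectable $rand_fn hook, are hypothetical and exist only to make the idea concrete and testable.

```perl
use strict;
use warnings;

# Pick an index from a list of [object, weight] pairs.
# $rand_fn is a hypothetical hook that returns a number in [0, $total),
# so the example can be driven deterministically.
sub pick_index {
    my ($pool, $rand_fn) = @_;

    my $total = 0;
    $total += $_->[1] for @$pool;
    return unless $total > 0;    # empty pool: nothing to select

    my $target  = $rand_fn->($total);
    my $running = 0;
    for my $i (0 .. $#$pool) {
        $running += $pool->[$i][1];
        return $i if $target < $running;
    }
    return $#$pool;    # guard against floating-point rounding
}

# Select one object; remove it from the pool unless $replace is true.
sub draw {
    my ($pool, $replace, $rand_fn) = @_;
    $rand_fn ||= sub { rand($_[0]) };

    my $i = pick_index($pool, $rand_fn);
    return unless defined $i;

    my $object = $pool->[$i][0];
    splice(@$pool, $i, 1) unless $replace;
    return $object;
}

my @pool = (['string', 4], [{ p => 1, q => 2 }, 1], ['other', 7.5]);

my $first = draw(\@pool, 0);    # removed: two objects remain
my $again = draw(\@pool, 1);    # replaced: still two objects remain
print scalar(@pool), " objects left\n";
```

Walking a running total of the weights is one simple way to honor the weight / combined weight rule. Because the total is recomputed on every draw, removing an object automatically rescales the probabilities of the objects that remain.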

remove()

Items that were previously added using add() can be removed from future selection. Either objects that are equivalent (not necessarily a ref to the same object in the container, but one that after serialization is equivalent), or ones that match an id (which was an optional arg for add()) will all be removed.

Returns the removed scalars.
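To illustrate what "equivalent after serialization" can mean, here is a small sketch that compares two distinct hashrefs by a serialized form. The serializer the module actually uses is not stated in this document; Data::Dumper and the serialize() helper below are assumptions chosen purely for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn any scalar into a stable string key.
sub serialize {
    my ($object) = @_;
    local $Data::Dumper::Sortkeys = 1;    # hash key order must not matter
    local $Data::Dumper::Indent   = 0;    # single line, no whitespace noise
    local $Data::Dumper::Terse    = 1;    # omit the "$VAR1 = " prefix
    return Dumper($object);
}

my $stored    = { p => 1, q => 2 };
my $lookalike = { q => 2, p => 1 };    # a different reference, same contents

print "same reference:  ", ($stored == $lookalike ? "yes" : "no"), "\n";
print "same serialized: ",
    (serialize($stored) eq serialize($lookalike) ? "yes" : "no"), "\n";
```

The two hashrefs are different references, yet they serialize to the same string, which is the sense in which a lookalike object can match one already in the pool.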

clear()

Removes all items from the selection pool. A call to get() immediately afterward will return nothing.

count()

The current count of objects that are in the selection pool. It should be noted that sometimes, the same scalar might have been added multiple times with calls to add(), and that those separate instances are all counted separately.

replace_after_get()

Returns whether or not a future call to get() will replace the object (i.e. not remove it). If true, the object will not be removed. If false, the object will be removed.

The default behavior, if nothing was passed to the constructor, is to have this return false.

replace_after_get(<new_value>)

If replace_after_get() is called with a defined value, this will override the value passed to the constructor (new()), and subsequent calls to replace_after_get() will return this new value.

This sets whether or not an object will be removed from the pool after selection, i.e. after a call to get(). If the value is true, the selected object remains in the pool; if false, it is removed.

ACKNOWLEDGEMENTS

The ideas encapsulated in this module were created while I was working at Rent.com, a RentPath company. Rent.com has supported me the whole way in releasing this module, and they have fostered an openness in not only utilizing open community tools, but contributing to them, as well.

I'd also like to thank an organization and a few individuals for their contributions:

YAPC 2014 in Orlando, Florida

The conference that finally pushed me to finish this module and make it available.

Ripta Pasay

My manager (and brilliant developer) at Rent, who helped ask the appropriate management at our company about releasing this module without specific, formal policies. He also helped me vet the algorithm and test for problems in randomness on initial and subsequent selections.

Aran Deltac

Former Rent.com employee who let me bounce ideas off him for the name and interface of this module, and who also helped me search for modules that might already have been written to accomplish a similar purpose.

Steve Nolte

Head honcho of Milwaukee PM who helped steer me in the direction of how to package and manage this module for release.

Steven Lembark

For discussing namespaces and name ideas with a total stranger. He really is a testament to how helpful people in the Perl community can be.

Sawyer X

More discussion of namespaces, and help in pointing me to the right people to talk to for further ideas.

Adam Dutko

For giving a talk at YAPC to discuss issues about making a module and getting it ready for release on CPAN.

LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

AUTHOR

Alan Voss <alanvoss@hotmail.com>