NAME

Statistics::CaseResampling - Efficient resampling

SYNOPSIS

  use Statistics::CaseResampling ':all';

  my $sample = [1,3,5,7,1,2,9];
  my $resampled = resample($sample);
  # $resampled is now a random set of measurements from $sample,
  # including potential duplicates
  
  my $n_resamples = 1000; # example value: number of resample runs
  my $medians = resample_medians($sample, $n_resamples);
  # $medians is now an array reference containing the medians
  # of $n_resamples resample runs
  # this is vastly more efficient than doing the same thing with
  # repeated resample() calls
  
  # utility function:
  print median([1..5]), "\n"; # prints 3

DESCRIPTION

This is a simple XS module for resampling a set of numbers efficiently. As a convenience (for my use case), it can also calculate the median of each of many resamples (in O(n) per resample, using a selection algorithm) and return those medians instead.

Since this involves drawing many random numbers, the module comes with an embedded Mersenne twister (taken from Math::Random::MT).

If you want to change the seed of the RNG, do this:

  $Statistics::CaseResampling::Rnd
    = Statistics::CaseResampling::RdGen::setup($seed);
 

or

  $Statistics::CaseResampling::Rnd
    = Statistics::CaseResampling::RdGen::setup(@seed);

Do not use the embedded random number generator for other purposes. Use Math::Random::MT instead!

EXPORT

None by default.

Any of the functions documented below can be exported using standard Exporter semantics, including the customary :all group.

FUNCTIONS

resample(ARRAYREF)

Returns a reference to an array containing N elements drawn at random, with replacement, from the input array, where N is the length of the input array. The result may therefore contain duplicates.
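
To make the behaviour concrete, here is a minimal pure-Perl sketch of drawing N elements with replacement. It is an illustration only, not the module's XS implementation: the name resample_sketch is hypothetical, and it uses Perl's built-in rand() rather than the embedded Mersenne twister.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for resample(); not the module's XS code.
# Draws N elements with replacement from an N-element sample.
sub resample_sketch {
    my ($sample) = @_;
    my $n = @$sample;
    # int(rand($n)) yields a uniform index in 0 .. $n-1
    return [ map { $sample->[ int(rand($n)) ] } 1 .. $n ];
}

my $sample    = [1, 3, 5, 7, 1, 2, 9];
my $resampled = resample_sketch($sample);
print "@$resampled\n";   # output varies from run to run
```

Every element of the result comes from the input, and the result always has the same length as the input.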

median(ARRAYREF)

Calculates the median of a sample. Runs in linear time because it uses a selection algorithm instead of a sort. Unfortunately, as implemented, the median of an even number of elements is defined here as the (n/2-1)th largest number and not as the average of the (n/2-1)th and the (n/2)th numbers. This shouldn't matter for nontrivial sample sizes.
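
The following pure-Perl sketch shows one way a selection-based median can work. It is not the module's C code. The names select_kth and median_sketch are hypothetical, and the sketch assumes that the position given above is a zero-based index into the ascending order, so that an even-sized sample yields the lower of its two middle elements.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the element that would sit at zero-based
# index $k if @$data were sorted ascending (quickselect, expected O(n)).
sub select_kth {
    my ($data, $k) = @_;
    my @work = @$data;                  # copy so the caller's array is untouched
    while (1) {
        my $pivot   = $work[ int(rand(@work)) ];
        my @less    = grep { $_ <  $pivot } @work;
        my @equal   = grep { $_ == $pivot } @work;
        my @greater = grep { $_ >  $pivot } @work;
        if ($k < @less) {
            @work = @less;              # answer lies among the smaller elements
        }
        elsif ($k < @less + @equal) {
            return $pivot;              # the pivot itself is the k-th element
        }
        else {
            $k -= @less + @equal;       # skip everything up to the pivot
            @work = @greater;
        }
    }
}

sub median_sketch {
    my ($data) = @_;
    my $n = @$data;
    return unless $n;                   # no median for an empty sample
    # Odd n: the middle element. Even n: the lower of the two middle
    # elements, with no averaging (assumed reading of the convention).
    my $k = $n % 2 ? int($n / 2) : $n / 2 - 1;
    return select_kth($data, $k);
}

print median_sketch([1 .. 5]), "\n";       # prints 3
print median_sketch([1, 2, 3, 4]), "\n";   # prints 2 (an averaged median would be 2.5)
```

Only the partition that can contain the k-th element is processed further, which is why the expected running time is linear, not the O(n log n) of a full sort.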

resample_medians(ARRAYREF, NMEDIANS)

Returns a reference to an array containing the medians of NMEDIANS resamples of the original input sample.
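
As a rough picture of what is computed, the pure-Perl sketch below does the same job the slow way: resample, take the median, repeat. The name resample_medians_sketch is hypothetical, the sample is assumed to be non-empty, and the sort-based median is used only for brevity; the module itself does this in C with a selection algorithm.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl equivalent of the computation; illustrative only.
sub resample_medians_sketch {
    my ($sample, $n_medians) = @_;
    my $n = @$sample;
    my @medians;
    for (1 .. $n_medians) {
        # one resample: N draws with replacement
        my @draw = map { $sample->[ int(rand($n)) ] } 1 .. $n;
        # sort-based median for brevity (O(n log n));
        # picks the lower middle element when N is even
        my @sorted = sort { $a <=> $b } @draw;
        push @medians, $sorted[ int(($n - 1) / 2) ];
    }
    return \@medians;
}

my $medians = resample_medians_sketch([1, 3, 5, 7, 1, 2, 9], 1000);
my @sorted  = sort { $a <=> $b } @$medians;
printf "medians range from %g to %g\n", $sorted[0], $sorted[-1];
```

The spread of the returned medians gives an impression of how much the median varies between resamples.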

TODO

One could calculate statistics other than the median in C for performance.

SEE ALSO

Math::Random::MT

AUTHOR

Steffen Mueller, <smueller@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Steffen Mueller

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.