NAME

Role::TinyCommons::Collection::PickItems::RandomSeekLines - Provide pick_items() that picks items by randomly seeking to lines in a (file)handle

VERSION

This document describes version 0.009 of Role::TinyCommons::Collection::PickItems::RandomSeekLines (from Perl distribution RoleBundle-TinyCommons-Collection), released on 2021-10-07.

DESCRIPTION

This role provides pick_items(), which picks random items by seeking to random positions in a seekable filehandle and reading lines from there. To expose the seekable handle, your class must provide the method fh, and may optionally provide fh_min_offset and fh_max_offset. If your collection does not meet this requirement, there are other choices in Role::TinyCommons::Collection::PickItems::*.
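As a sketch of what a consuming class might look like, the class below exposes a seekable handle through fh, plus the two optional offset methods. The class name My::LineCollection and its constructor arguments (path, min_offset, max_offset) are hypothetical, and treating the offsets as byte positions is an assumption. The with() line that applies the role is shown only as a comment so the sketch runs without the distribution installed.

```perl
package My::LineCollection;    # hypothetical class name
use strict;
use warnings;

# To actually get pick_items(), apply the role, e.g. with Role::Tiny::With:
#     use Role::Tiny::With;
#     with 'Role::TinyCommons::Collection::PickItems::RandomSeekLines';
# It is left out here so this sketch runs on its own.

sub new {
    my ($class, %args) = @_;
    open my $fh, '<', $args{path} or die "Can't open $args{path}: $!";
    return bless { %args, fh => $fh }, $class;
}

# Required: the seekable handle holding one item per line.
sub fh { $_[0]{fh} }

# Optional (assumed to be byte offsets): limit seeking to a range,
# e.g. to skip a header line at the start of the file.
sub fh_min_offset { $_[0]{min_offset} // 0 }
sub fh_max_offset { $_[0]{max_offset} // (stat($_[0]{fh}))[7] }
```

Falling back to stat() for fh_max_offset mirrors what the role does on its own when the offset methods are absent.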

The algorithm is as follows:

  1. If fh_min_offset and fh_max_offset are not available, do a stat() on the handle to find its size ($size).

  2. Seek to a random position in the handle (if fh_min_offset and fh_max_offset are available, seek between these limits; otherwise seek between 0 and $size).

  3. If we seek to the minimum position (0 or fh_min_offset), we read from there up to the next newline and take that line as the random item to pick. Otherwise, since we might have landed in the middle of a line, we first read up to the next newline and discard that partial line, then read the following line as the random item to pick.

  4. Remove duplicates as needed (unless pick_items()'s allow_resampling option is set to true). Repeat steps 2 and 3 until we have the required number of random items.
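The steps above can be sketched in plain Perl roughly as follows. This is an illustration, not the role's own code: the function pick_random_lines() and its options (min_offset, max_offset, allow_resampling, max_tries) are hypothetical names, duplicates are detected by line content, and a seek that lands in the last line simply counts as a failed try. It assumes a regular file handle, since stat() does not report a size for in-memory handles (pass max_offset for those).

```perl
use strict;
use warnings;

# Hypothetical helper: pick up to $n lines from seekable handle $fh.
sub pick_random_lines {
    my ($fh, $n, %opts) = @_;

    # Step 1: use the given limits, else 0 .. file size via stat().
    my $min = $opts{min_offset} // 0;
    my $max = $opts{max_offset} // (stat $fh)[7];
    die "can't determine size of handle\n" unless defined $max;
    return () if $max <= $min;

    # Guard (sketch-only): without resampling there may be fewer than $n
    # distinct lines, so cap the number of seeks.
    my $max_tries = $opts{max_tries} // 100 * $n;

    my (@picked, %seen);
    my $tries = 0;
    while (@picked < $n && $tries++ < $max_tries) {
        # Step 2: seek to a random byte offset in [$min, $max).
        my $pos = $min + int rand($max - $min);
        seek($fh, $pos, 0) or die "seek failed: $!";

        # Step 3: unless at the minimum offset, discard the (partial) line.
        if ($pos > $min) {
            defined(readline $fh) or next;
        }
        next if tell($fh) >= $max;    # next line starts outside the range
        my $line = readline $fh;
        next unless defined $line;    # hit end of file: try again
        chomp $line;

        # Step 4: skip duplicates unless resampling is allowed.
        next if !$opts{allow_resampling} && $seen{$line}++;
        push @picked, $line;
    }
    return @picked;
}
```

The max_tries cap is a design choice of this sketch: without it, asking for more distinct items than the handle contains would loop forever.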

Caveats:

  • Each of your items must be a line in the handle (excluding the newline), because this method bypasses the get_next_item() abstraction.

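  • Lines are not picked with equal probability. Because the line to pick is determined by where a random byte offset lands, each line is weighted by the number of offsets that lead to it, so the selection is biased by line length. Following step 3 as described, an offset inside one line leads to the line after it being picked, so a line's chance grows with the length of the line preceding it.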
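To make the bias concrete, the sketch below enumerates every byte offset of a small in-memory text and records which line steps 2 and 3 would pick, following those steps literally (the role's own implementation may treat edge cases differently). The function name pick_weights() is hypothetical. Offsets that land in the last line lead to end of file and pick nothing.

```perl
use strict;
use warnings;

# For each byte offset in [0, length), apply steps 2-3 and count which line
# would be picked. Returns a hashref: line text => number of offsets.
sub pick_weights {
    my ($text) = @_;
    my %weight;
    open my $fh, '<', \$text or die $!;
    for my $pos (0 .. length($text) - 1) {
        seek($fh, $pos, 0) or die "seek failed: $!";
        scalar readline($fh) if $pos > 0;    # discard the (partial) line
        my $line = readline($fh);
        next unless defined $line;           # landed in the last line
        chomp $line;
        $weight{$line}++;
    }
    return \%weight;
}

my $w = pick_weights("a\nbbbbbbbbb\nc\n");
print "$_: $w->{$_}\n" for sort keys %$w;
# a: 1
# bbbbbbbbb: 1
# c: 10
```

Here the short line "c" receives 10 of the 12 successful offsets because it follows the long line, while the first line can only be reached by landing exactly on offset 0.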

ROLES MIXED IN

Role::TinyCommons::Collection::PickItems

REQUIRED METHODS

get_item_at_pos

get_item_count

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/RoleBundle-TinyCommons-Collection.

SOURCE

Source repository is at https://github.com/perlancar/perl-RoleBundle-TinyCommons-Collection.

SEE ALSO

File::RandomLine

Role::TinyCommons::Collection::PickItems and other Role::TinyCommons::Collection::PickItems::*.

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

 % prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2021 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=RoleBundle-TinyCommons-Collection

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.