NAME

pick-random-lines - Pick one or more random lines from input

VERSION

This document describes version 0.021 of pick-random-lines (from Perl distribution App-PickRandomLines), released on 2023-11-20.

SYNOPSIS

pick-random-lines --help (or -h, -?)

pick-random-lines --version (or -v)

pick-random-lines [--algorithm=str] [(--config-path=path)+|--no-config] [--config-profile=profile] [--format=name|--json] [--(no)naked-res] [--no-env] [--num-lines=int|-n=int] [--page-result[=program]|--view-result[=program]] -- [files] ...

DESCRIPTION

TODO: add an option to allow or disallow duplicates.

OPTIONS

* marks required options.

Main options

--algorithm=s

Default value:

 "scan"

Valid values:

 ["scan","seek"]

scan is the algorithm described in the perlfaq manual (perldoc -q "random line"). This algorithm scans the whole input once and picks one or more lines randomly from it.
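
The single-pass idea behind scan can be sketched as follows. This is a minimal illustration in Python using reservoir sampling, which for one line reduces to the perlfaq approach; it is not the script's actual code, and the name pick_scan and its parameters are made up for the example.

```python
import random
from typing import Iterable, List, Optional


def pick_scan(lines: Iterable[str], n: int = 1,
              rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to n lines uniformly at random in a single pass.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    picked: List[str] = []
    for seen, line in enumerate(lines, start=1):
        if len(picked) < n:
            # The first n lines always go into the reservoir.
            picked.append(line)
        else:
            # Keep the new line with probability n/seen, replacing one
            # of the earlier picks chosen uniformly at random.
            j = rng.randrange(seen)
            if j < n:
                picked[j] = line
    return picked


if __name__ == "__main__":
    sample = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
    print(pick_scan(sample, n=2))
```

Only the current picks are kept in memory, and input shorter than n simply yields all of its lines.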

seek is the algorithm employed by the Perl module File::RandomLine. It seeks to a random position in the file and takes the next line, repeating this n times. It is faster on very large input because it avoids scanning the whole input, but it requires seekable input: a single file only (stdin is not supported, and multiple files are currently not supported either). It might produce duplicate lines.
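
A rough sketch of the seek approach, again in Python and only as an illustration of the idea (it is not File::RandomLine's or this script's code; pick_seek is a made-up name, and wrapping around to the first line at end of file is an assumption of this sketch):

```python
import io
import os
import random
from typing import BinaryIO, List, Optional


def pick_seek(fh: BinaryIO, n: int = 1,
              rng: Optional[random.Random] = None) -> List[bytes]:
    """Pick n lines by jumping to random byte offsets in a seekable file.

    Hypothetical helper for illustration only. May return duplicates.
    """
    rng = rng or random.Random()
    size = fh.seek(0, os.SEEK_END)  # seek() returns the new position
    picked: List[bytes] = []
    if size == 0:
        return picked
    for _ in range(n):
        fh.seek(rng.randrange(size))
        fh.readline()         # discard the rest of the line we landed in
        line = fh.readline()  # the next full line is the pick
        if not line:
            # We landed in the last line: wrap around to the first one.
            fh.seek(0)
            line = fh.readline()
        picked.append(line)
    return picked


if __name__ == "__main__":
    data = io.BytesIO(b"alpha\nbeta\ngamma\n")
    print(pick_seek(data, n=2))
```

In this sketch a line is picked with probability proportional to the length of the line before it, so the result is not uniform when line lengths differ, and the same line can be picked more than once.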

--file=s@

If none is specified, will get input from stdin.

Can also be specified as the 1st command-line argument and onwards.

Can be specified multiple times.

--files-json=s

See --file.

Can also be specified as the 1st command-line argument and onwards.

--num-lines=s, -n

Default value:

 1

If the input contains fewer lines than the requested number, only as many lines as the input contains are returned.

Configuration options

--config-path=s

Set path to configuration file.

Can be specified multiple times, in which case the application reads all the given configuration files and merges them.

--config-profile=s

Set configuration profile to use.

A single configuration file can contain profiles, i.e. alternative sets of values that can be selected. For example:

 [profile=dev]
 username=foo
 pass=beaver
 
 [profile=production]
 username=bar
 pass=honey

When you specify --config-profile=dev, username will be set to foo and pass to beaver. When you specify --config-profile=production, username will be set to bar and pass to honey.
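
To make the selection concrete, here is a small Python sketch that reads the example above with the standard configparser module and returns only the chosen profile's values. It is an approximation: the real format is IOD rather than plain INI, and load_profile is a hypothetical helper, not part of this script.

```python
import configparser
from typing import Dict

CONFIG_TEXT = """\
[profile=dev]
username=foo
pass=beaver

[profile=production]
username=bar
pass=honey
"""


def load_profile(text: str, profile: str) -> Dict[str, str]:
    """Return the key/value pairs of the section named profile=<profile>.

    Hypothetical helper; treats the file as plain INI, which is only an
    approximation of IOD.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = f"profile={profile}"
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))


if __name__ == "__main__":
    print(load_profile(CONFIG_TEXT, "dev"))
```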

--no-config

Do not use any configuration file.

If you specify --no-config, the application will not read any configuration file.

Environment options

--no-env

Do not read environment for default options.

If you specify --no-env, the application will not read any environment variable.

Output options

--format=s

Choose output format, e.g. json, text.

Default value:

 undef

Output can be displayed in multiple formats, and a suitable default format is chosen depending on the application and/or whether the output destination is an interactive terminal (as opposed to being piped). This option explicitly chooses an output format.

--json

Set output format to json.

--naked-res

When outputting as JSON, strip result envelope.

Default value:

 0

By default, when outputting as JSON, the full enveloped result is returned, e.g.:

 [200,"OK",[1,2,3],{"func.extra":4}]

The reason is so you can get the status (1st element), status message (2nd element) as well as result metadata/extra result (4th element) instead of just the result (3rd element). However, sometimes you want just the result, e.g. when you want to pipe the result for more post-processing. In this case you can use --naked-res so you just get:

 [1,2,3]
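
If you consume the enveloped output from another program, unpacking it is straightforward. The following Python sketch assumes the four-element layout described above; unpack_envelope is a hypothetical name.

```python
import json
from typing import Any, Dict, Tuple


def unpack_envelope(text: str) -> Tuple[int, str, Any, Dict[str, Any]]:
    """Split an enveloped JSON result into status, message, result, meta.

    Hypothetical helper. The 3rd and 4th elements are treated as
    optional, which is an assumption of this sketch.
    """
    envelope = json.loads(text)
    status, message = envelope[0], envelope[1]
    result = envelope[2] if len(envelope) > 2 else None
    meta = envelope[3] if len(envelope) > 3 else {}
    return status, message, result, meta


if __name__ == "__main__":
    status, message, result, meta = unpack_envelope(
        '[200,"OK",[1,2,3],{"func.extra":4}]'
    )
    # "result" holds what --naked-res would have printed on its own.
    print(status, message, result, meta)
```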

--page-result

Filter output through a pager.

This option will pipe the output to a specified pager program. If a pager program is not specified, a suitable default (e.g. less) is chosen.

--view-result

View output using a viewer.

This option will first save the output to a temporary file, then open a viewer program to view the temporary file. If a viewer program is not chosen, a suitable default, e.g. the browser, is chosen.

Other options

--help, -h, -?

Display help message and exit.

--version, -v

Display program's version and exit.

COMPLETION

This script comes with a companion shell completer script (_pick-random-lines).

bash

To activate bash completion for this script, put:

 complete -C _pick-random-lines pick-random-lines

in your bash startup (e.g. ~/.bashrc). Your next shell session will then recognize tab completion for the command. Or, you can also directly execute the line above in your shell to activate immediately.

It is recommended, however, that you install modules using cpanm-shcompgen which can activate shell completion for scripts immediately.

tcsh

To activate tcsh completion for this script, put:

 complete pick-random-lines 'p/*/`pick-random-lines`/'

in your tcsh startup (e.g. ~/.tcshrc). Your next shell session will then recognize tab completion for the command. Or, you can also directly execute the line above in your shell to activate immediately.

It is also recommended to install shcompgen (see above).

other shells

For fish and zsh, install shcompgen as described above.

CONFIGURATION FILE

This script can read configuration files. Configuration files are in the format of IOD, which is basically INI with some extra features.

By default, the following locations are searched for configuration files (can be changed using --config-path): ~/.config/pick-random-lines.conf, ~/pick-random-lines.conf, or /etc/pick-random-lines.conf.

All found files will be read and merged.

To disable searching for configuration files, pass --no-config.

You can put multiple profiles in a single file by using section names like [profile=SOMENAME] or [SOMESECTION profile=SOMENAME]. Those sections will only be read if you specify the matching --config-profile SOMENAME.

You can also put configuration for multiple programs inside a single file, and use filter program=NAME in section names, e.g. [program=NAME ...] or [SOMESECTION program=NAME]. The section will then only be used when the reading program matches.

You can also filter a section by environment variable using the filter env=CONDITION in section names.

If you only want a section to be read if a certain environment variable is true: [env=SOMEVAR ...] or [SOMESECTION env=SOMEVAR ...].

If you only want a section to be read when the value of an environment variable equals some string: [env=HOSTNAME=blink ...] or [SOMESECTION env=HOSTNAME=blink ...].

If you only want a section to be read when the value of an environment variable does not equal some string: [env=HOSTNAME!=blink ...] or [SOMESECTION env=HOSTNAME!=blink ...].

If you only want a section to be read when the value of an environment variable includes some string: [env=HOSTNAME*=server ...] or [SOMESECTION env=HOSTNAME*=server ...].

If you only want a section to be read when the value of an environment variable does not include some string: [env=HOSTNAME!*=server ...] or [SOMESECTION env=HOSTNAME!*=server ...].

Note that currently, due to simplistic parsing, there must not be any whitespace in the value being compared, because whitespace marks the beginning of a new section filter or section name.
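
The five forms of CONDITION can be summarized with a small evaluator. This Python sketch only illustrates the semantics described above and is not the actual parser; env_filter_matches is a hypothetical name, and treating an unset, empty, or "0" value as false (Perl-style truthiness) is an assumption.

```python
import os
import re
from typing import Mapping, Optional

# NAME, optionally followed by one of the operators and a value.
# Longer operators are listed first so "!*=" is not read as "!=" or "=".
_CONDITION = re.compile(r"(\w+)(?:(!\*=|\*=|!=|=)(.*))?")


def env_filter_matches(condition: str,
                       env: Optional[Mapping[str, str]] = None) -> bool:
    """Evaluate the CONDITION part of an env=CONDITION section filter.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    match = _CONDITION.fullmatch(condition)
    if match is None:
        raise ValueError(f"unrecognized condition: {condition!r}")
    name, op, wanted = match.groups()
    actual = env.get(name, "")
    if op is None:
        # Bare variable name: assumed Perl-style truthiness.
        return actual not in ("", "0")
    if op == "=":
        return actual == wanted
    if op == "!=":
        return actual != wanted
    if op == "*=":
        return wanted in actual
    return wanted not in actual  # op == "!*="


if __name__ == "__main__":
    fake_env = {"HOSTNAME": "webserver1"}
    print(env_filter_matches("HOSTNAME*=server", fake_env))
```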

To load and configure plugins, you can use either the -plugins parameter (e.g. -plugins=DumpArgs or -plugins=DumpArgs@before_validate_args), or use the [plugin=NAME ...] sections, for example:

 [plugin=DumpArgs]
 -event=before_validate_args
 -prio=99
 
 [plugin=Foo]
 -event=after_validate_args
 arg1=val1
 arg2=val2

which is equivalent to setting -plugins=-DumpArgs@before_validate_args@99,-Foo@after_validate_args,arg1,val1,arg2,val2.

List of available configuration parameters:

 algorithm (see --algorithm)
 files (see --file)
 format (see --format)
 naked_res (see --naked-res)
 num_lines (see --num-lines)

ENVIRONMENT

PICK_RANDOM_LINES_OPT

String. Specify additional command-line options.

FILES

~/.config/pick-random-lines.conf

~/pick-random-lines.conf

/etc/pick-random-lines.conf

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/App-PickRandomLines.

SOURCE

Source repository is at https://github.com/perlancar/perl-App-PickRandomLines.

SEE ALSO

Data::Unixish::pick.

shuf. The venerable Unix utility. shuf -n is the usual Unix idiom for picking one or several lines from an input. pick-random-lines is generally slower than the optimized C-based utility, but offers several pick algorithms, such as scan (which does not need to hold the entire input in memory for shuffling) and seek (which does not need to scan the entire input).

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

 % prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2023, 2020 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-PickRandomLines

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.