NAME
Unixish - Data transformation framework, inspired by Unix toolbox philosophy
SPECIFICATION VERSION
1.0
VERSION
This document describes version 1.0.7 of Unixish (from Perl distribution Unixish), released on 2024-12-10.
ABSTRACT
This document specifies Unixish, a Perl framework for data processing (transformation, conversion, whatever) using the tried-and-true Unix toolbox philosophy. For the implementation, see Data::Unixish.
STATUS
Early draft. The 1.0 series does not guarantee full backward compatibility between revisions, so caveat implementor. However, a major incompatibility will bump the version to 1.1.
PHILOSOPHY
The Unix philosophy says a program should do only one thing and do it well. A problem is solved by sewing or chaining together a sequence of small, specialized programs. From Douglas McIlroy, the original developer of Unix pipelines:
This is the Unix philosophy: Write programs that do one thing and do it well.
Write programs to work together. Write programs to handle text streams, because
that is a universal interface.
In Unixish, programs translate to functions. Unixish is essentially a set of guidelines and tools on how to write such functions.
The goal of the framework is to let users easily create functions that can be used as normal Perl functions operating on arrays and streams (filehandles), as well as functions that can become Unix command-line utilities.
GLOSSARY
Data::Unixish is the Perl implementation.
dux is a short notation for Data::Unixish.
GUIDELINES
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Function should accept a hash argument %args
This future-proofs the function as more and more arguments are added.
Arguments should be described in Rinci metadata
See Rinci and Rinci::function for more details.
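As a rough illustration, Rinci metadata for a dux function might look like the following. This is a sketch under stated assumptions: the function name shout is hypothetical, and the in/out schemas are illustrative, not the exact common argument definitions that Data::Unixish uses.

```perl
use strict;
use warnings;

# Rinci metadata is kept in the package's %SPEC hash, keyed by function
# name. "shout" is a hypothetical function; the schemas are illustrative.
our %SPEC;

$SPEC{shout} = {
    v       => 1.1,
    summary => 'Convert text items to uppercase',
    args    => {
        in  => { summary => 'Input stream',  schema => 'array' },
        out => { summary => 'Output stream', schema => 'any'   },
    },
};
```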
There are some standard arguments: in, out
in and out are analogous to standard input and output streams, explained below.
Arguments should have good defaults
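A minimal sketch of a dux-style function whose optional argument falls back to a sensible default. The function add_prefix and its prefix argument are hypothetical; the [200, "OK"] return value assumes the enveloped result convention of Rinci::function.

```perl
use 5.012;    # each() on arrays needs Perl 5.12+
use warnings;

# Hypothetical dux-style function: prepend a prefix to each item.
# "prefix" is optional and defaults to "> " when omitted.
sub add_prefix {
    my %args   = @_;
    my ($in, $out) = @args{qw(in out)};
    my $prefix = $args{prefix} // "> ";

    while (my ($i, $item) = each @$in) {
        # undef items are passed through unchanged
        push @$out, defined $item ? $prefix . $item : $item;
    }
    return [200, "OK"];
}

my @out;
add_prefix(in => ["foo", "bar"], out => \@out);
print "$_\n" for @out;    # prints "> foo" then "> bar"
```

Callers only pass prefix when the default does not suit them, e.g. add_prefix(in => \@in, out => \@out, prefix => "- ").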
Input data is given in $args{in}
It is a "stream", usually a reference to an array or a tied array. The function can iterate over it as follows:
while (my ($index, $item) = each @{ $args{in} }) { ... }
The function SHOULD NOT slurp it into memory like this, unless necessary:
# CAUTION!
for (@{ $args{in} }) { ... }
Remember that in Perl 5, for() is not lazy, and the stream might contain a very large amount of data or even be infinite.
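The following sketch shows one benefit of iterating with each(): a function can stop consuming the stream as soon as it has what it needs. head_n and its n argument are hypothetical. Leaving an each() loop early keeps the array's iterator where it stopped, so the sketch resets it with keys() (each() and keys() on arrays both require Perl 5.12 or later).

```perl
use 5.012;    # each() and keys() on arrays need Perl 5.12+
use warnings;

# Hypothetical dux-style function: copy the first n items (default 10)
# from the input stream to the output stream.
sub head_n {
    my %args = @_;
    my ($in, $out) = @args{qw(in out)};
    my $n = $args{n} // 10;

    while (my ($i, $item) = each @$in) {
        if ($i >= $n) {
            # Exiting early would leave the iterator mid-array;
            # keys() resets it so the array can be iterated again.
            keys @$in;
            last;
        }
        push @$out, $item;
    }
    return [200, "OK"];
}

my @in = (1 .. 100);
my @out;
head_n(in => \@in, out => \@out, n => 3);
print "@out\n";    # prints "1 2 3"
```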
Output should be written (appended) to $args{out}
It is a "stream", usually a reference to an array or a tied array. The function can append output as follows:
while (my ($index, $item) = each @{ $args{in} }) {
    ...
    push @{ $args{out} }, $res;
}
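Note that assigning a new array directly doesn't work the way you might think: it only replaces the out entry in the function's own %args hash and never touches the caller's stream.
# DOESN'T WORK
$args{out} = [1, 2, 3];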
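To make the difference concrete, here is a small sketch contrasting the two approaches. The function names good and bad are hypothetical.

```perl
use strict;
use warnings;

# Appends through the reference the caller passed in: the caller sees it.
sub good {
    my %args = @_;
    push @{ $args{out} }, 1, 2, 3;
    return [200, "OK"];
}

# Only rebinds the "out" entry of this function's own %args copy;
# the caller's array is left untouched.
sub bad {
    my %args = @_;
    $args{out} = [1, 2, 3];
    return [200, "OK"];
}

my (@out1, @out2);
good(in => [], out => \@out1);
bad(in => [], out => \@out2);
print scalar(@out1), " ", scalar(@out2), "\n";    # prints "3 0"
```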
Error messages can be logged using Log::ger
Standard format for error message will be specified in the future.
When processing, undef/invalid/non-applicable value should generally be skipped (passed unchanged)
For example, the date dux function accepts either an integer (assumed to be a Unix timestamp) or a DateTime object. Other values, like undef, an empty string, or other kinds of unsupported objects, should not be processed and should just be passed to the output stream unchanged. A warning can be logged if needed.
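A sketch of this pass-through rule, using a hypothetical double function that only transforms numeric items. Deciding applicability with looks_like_number() from the core module Scalar::Util is an assumption of this example, not something the specification mandates.

```perl
use 5.012;    # each() on arrays needs Perl 5.12+
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical dux-style function: double numeric items. Anything else
# (undef, empty string, references, non-numeric text) is passed to the
# output stream unchanged instead of being treated as an error.
sub double {
    my %args = @_;
    my ($in, $out) = @args{qw(in out)};
    while (my ($i, $item) = each @$in) {
        if (defined $item && !ref $item && looks_like_number($item)) {
            push @$out, $item * 2;
        }
        else {
            # a warning could be logged here (e.g. via Log::ger) if needed
            push @$out, $item;
        }
    }
    return [200, "OK"];
}

my @out;
double(in => [1, "abc", undef, "", 2.5], out => \@out);
# @out is now (2, "abc", undef, "", 5)
```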
A well-written dux function can be readily transformed into a regular Unix command-line utility.
NAMESPACE ORGANIZATION
Unixish is the specification.
Data::Unixish is the implementation.
Each dux function should have an all-lowercase name and be put in the Data::Unixish::FUNCTION_NAME package. The function itself is defined in that package under the same name. For example, the Data::Unixish::date package contains the Data::Unixish::date::date function.
Further subpackaging is allowed, for example: Data::Unixish::English::count_syllables.
App::dux is a utility to access dux functions via the command-line.
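For illustration, here is how a hypothetical dux function named mirror (reversing each string item) could be laid out according to this naming convention, all in one file. A real dux function would also declare its Rinci metadata in the same package's %SPEC.

```perl
use 5.012;    # each() on arrays needs Perl 5.12+
use warnings;

# Hypothetical dux function "mirror", following the convention:
# package Data::Unixish::FUNCTION_NAME containing sub FUNCTION_NAME.
package Data::Unixish::mirror;

sub mirror {
    my %args = @_;
    while (my ($i, $item) = each @{ $args{in} }) {
        # Reverse plain strings; pass anything else through unchanged.
        push @{ $args{out} },
            (defined($item) && !ref($item)) ? scalar(reverse $item) : $item;
    }
    return [200, "OK"];
}

package main;

my @out;
Data::Unixish::mirror::mirror(in => ["abc", "xy"], out => \@out);
print "@out\n";    # prints "cba yx"
```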
HOMEPAGE
Please visit the project's homepage at https://metacpan.org/release/Unixish.
SOURCE
Source repository is at https://github.com/perlancar/perl-Unixish.
SEE ALSO
Related specifications
Rinci and Rinci::function, another specification to leverage functions.
CellFunc and RowFunc, two other specifications (conventions) for writing, respectively, value functions (functions that operate on a single data item) and row functions (functions that operate on a single "row" or record).
Related data manipulation frameworks
Data::TableData::Object, a common table interface to manipulate several kinds of data structure that are "table-like".
Similar projects on CPAN
Text::Pipe (2007-now)
Similarly inspired by Unix pipes, but object-oriented. I only came across this project after I had started Unixish.
Some of the differences: Text::Pipe is, as the name suggests, more text-oriented. It was created to unify text/template processing. Unixish on the other hand focuses on functions that can accept streams/arrays.
Perl Power Tools (1999-now)
(Search for ppt in CPAN.) Not quite the same thing, but the end result is roughly the same for many text-oriented Unix utilities (like wc, sum, head, tail, etc.).
Others
Log::ger, a logging framework, used by this specification.
AUTHOR
perlancar <perlancar@cpan.org>
CONTRIBUTING
To contribute, you can send patches by email/via RT, or send pull requests on GitHub.
Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:
% prove -l
If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by perlancar <perlancar@cpan.org>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
BUGS
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Unixish
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.