IO::ReadPreProcess - Macro processing built into IO::File replacement
    use IO::ReadPreProcess;

    my $fh = new IO::ReadPreProcess(File => './input.file') or
        die "Startup error: $IO::ReadPreProcess::errstr\n";

    while(<$fh>) {
        print $_;
        # Or other processing of input
    }

    die($IO::ReadPreProcess::errstr . "\n") if($fh->error);
The input file may contain:
    This line will be returned by getline
    .# This is a comment
    .let this := 'that'
    Another line
    .if this eq 'that'
    Another line returned
    .print The variable this has the value \v{this}
    .else
    This line will not be seen
    .fi
    This line returned
    .include another.file
    Line returned after the contents of another.file
Provides an 'intelligent' bottom end read function for scripts: what is read is preprocessed before the script sees it. Your program does not need code to conditionally discard some input, include files or substitute values.
An easy way of reading input where some lines are read conditionally and other files are included. Directives cover conditionals (.if/.elseif/.else/.fi), actions (.include .let .print), loops (.while .for), subroutine definition and call, writing to other streams, and more.
Provides IO::Handle-ish functions and input diamond - thus easy to slot in to existing scripts.
The preprocessing layer has variables that can be set and read by your perl code. In the input files they are set via .let directives, and can be made part of your script's input with .echo and \v{xxx}.
IO::ReadPreProcess returns lines from the input stream. This may have directives that include:
set variables to arithmetic or string expressions
conditionally return lines
include other files
print to stdout or stderr
Conditions are evaluated by Math::Expression.
new returns an IO::ReadPreProcess object, undef on error.
Arguments to new
File
Fd
Arguments File and Fd, see method open. If one of these is not given, method open must be called.
Trim
If this is true (default) then input lines will be trimmed of spaces.
Math
A Math::Expression object that will be used for expression evaluation. If this is not given a new object will be instantiated with PermitLoops => 1, EnablePrintf => 1.
If you share a Math::Expression object between different IO::ReadPreProcess objects then the different files being read will see the same variables.
DirStart
DirStartRE
DirStart is the string at the start of a line that introduces a directive; the default is a full stop (.). If you wish to change this, provide this option. So to use directives like #if:
    new IO::ReadPreProcess(File => 'fred', DirStart => '#')
Before use, the characters that are special in regular expressions have a backslash (\) prepended; the resulting string is stored in DirStartRE. If the option DirStartRE is provided, this transformation is not done and the provided string is used directly, so more complex start sequences can be used.
Eg: allow the start sequence to be either . or %:
    new IO::ReadPreProcess(File => 'fred', DirStartRE => '[.%]')
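The difference between the two options can be seen with plain Perl regular expressions. The sketch below assumes that the escaping applied to DirStart behaves like Perl's quotemeta; that is an assumption about the module's internals, and is_directive is a hypothetical helper written for this illustration, not part of IO::ReadPreProcess.

```perl
use strict;
use warnings;

# Hypothetical helper: does $line start with the directive prefix regex $re?
sub is_directive {
    my ($line, $re) = @_;
    return $line =~ /^(?:$re)/ ? 1 : 0;
}

# Assumption: DirStart is escaped much as quotemeta() would do it.
my $dir_start_re = quotemeta('.');    # a backslash followed by a full stop

for my $line ('.if x', 'xif x') {
    printf "%-6s escaped=%d unescaped=%d\n",
        $line, is_directive($line, $dir_start_re), is_directive($line, '.');
}

# A hand-written DirStartRE is used as given, so it can be a character class:
print "percent prefix accepted\n" if is_directive('%if x', '[.%]');
```

Without the escaping, a bare . would match any first character, so ordinary text such as 'xif x' would be mistaken for a directive.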
Raw
If this is given and true then directives are not processed; they are returned by getline. You may change this property as input is read but take care to avoid errors, eg: a .if is read in Raw mode but its .fi in Cooked mode; a complaint will result as the .fi did not have an .if.
Raw might be set while in an .include. When the end of that file is reached, reading returns to the previous file (the one that had the .include directive) and lines are read from there.
Default: 0
OnError
What should happen when an error happens. Values:
warn
Print a message to STDERR with warn, this is the default.
die
Print a message to STDERR with die which terminates the program.
Do nothing. The application should check the method error and look at $IO::ReadPreProcess::errstr.
PipeOK
Pipes are only allowed with .include if the property PipeOK is true (default false).
MaxLoopCount
Loops (while, until and for) will abort after this number of iterations. The count restarts if the loop is restarted. A value of 0 disables this test.
This may be overridden on an individual loop with the -i option.
Default 50.
OutStreams
This defines output streams that may be written to by .out and .print -o. A stream can be an IO::File, an array, or a reference to a function (which will be passed the line as its only argument).
The members STDOUT and STDERR are added if not passed, given values *STDOUT{IO} and *STDERR{IO}. Names must match the RE /\w+/.
Eg:
    my $lf = IO::File->new('logFile', 'w+');
    my @lines;
    sub func { say "func called '$_[0]'"; }

    OutStreams => { fun => \&func, log => $lf, buf => \@lines }
This provides the ability to write to multiple places, however the file (or function) must be opened by the Perl script. IO::ReadPreProcess does not provide the ability to open new files.
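The three kinds of stream can be pictured with a small dispatcher. This is only a sketch of the idea in plain Perl: send_line is a hypothetical helper, not the module's code, and it assumes that an array stream simply has each line appended to it.

```perl
use strict;
use warnings;

# Hypothetical helper: deliver $line to a stream of any of the three kinds.
sub send_line {
    my ($stream, $line) = @_;
    if (ref $stream eq 'CODE') {
        $stream->($line);          # function: the line is the only argument
    } elsif (ref $stream eq 'ARRAY') {
        push @$stream, $line;      # array: the line is appended (assumption)
    } else {
        print {$stream} $line;     # file handle: the line is printed
    }
    return;
}

my (@lines, @seen);
my %streams = (
    buf    => \@lines,
    fun    => sub { push @seen, "func called '$_[0]'" },
    STDOUT => *STDOUT{IO},
);

send_line($streams{$_}, "hello\n") for qw(buf fun STDOUT);
```

In all three cases the target is created by the Perl script and only named in the input file, which matches the division of labour described above.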
.out can create in-memory streams. These have names like @divert (ie match /@\w+/i). In-memory streams can be written to by .out & .print -o and read by .include & .read.
The properties Trim, OnError, MaxLoopCount, OutStreams, PipeOK and Raw (see new) may be directly assigned to at any time.
    $fh->{Raw} = 1;
Also the following:
Math
Note that there are many useful values that you can get here, some set by IO::ReadPreProcess (see below), others by .let directives. You can thus communicate with the preprocessing layer.
You can set Math variables like this:
$fh->{Math}->VarSetScalar('FirstName', 'Henry');
You can get Math variable values like this:
    $name = $fh->{Math}->ParseToScalar('FirstName');
    $fileName = $fh->{Math}->ParseToScalar('_FileName');
Place
A string that can be used in messages to the user about the current input place. The value will be like:
line 201 of slides/regular-expressions.mod
warn "Something wrong at $fh->{Place}\n";
new
This has been discussed above.
open
The argument is the name of the file to be opened and read from. This method need not be used if the information is given to new. open returns an IO::ReadPreProcess object, undef on error.
File
gives the name of the file to be opened. This is mandatory.
Fd
If this is given it provides a file descriptor (from IO::File) that is already open for reading. In that case File (which must still be given) is a name that is used in error messages. This is useful if you want to read from stdin or a pipe.
If there is an error in opening a file, look at $IO::ReadPreProcess::errstr.
Example:
$fh->open(Fd => \*STDIN, File => 'Standard input', OutStreams => { log => $lf });
close
Closes the current input file. If the current file was opened by a .include, the next line that is read will be the one after the .include directive.
This will not normally be used by applications.
close returns an IO::ReadPreProcess object, undef on error.
** Also used to end a block
getline
Returns a line from input. This line is not necessarily the next one in the input file since directives (see below) may specify that some lines are not returned or that input is taken from another file.
As an alternative, the object (what is returned by new) may be used in the diamond operator which really calls getline.
After all input has been read this returns undef.
while(my $line = $fh->getline) { ... }
Returns undef on error.
getlines
Returns the rest of input as an array.
This must be called in a list context.
putline
The argument list will be put as input on the current frame and these will be 'read' as the very next input. Useful for running a .sub. Eg:
$fh->putline('.show Frodo 35');
binmode
This package is intended to read text files, so setting binary mode is probably not a good idea. binmode also allows different (layer) encodings to be supported, eg:
$fh->binmode(':utf8');
Any binmode settings will be applied to all files subsequently opened, eg: because of .include.
Returns true on success, undef on error.
See perl's binmode function.
eof
Returns 1 if the next read will return End Of File or the file is not open.
error
Returns true if there has been an error. See clearerr.
clearerr
Clears any error indicator.
Input files may contain directives. These all start with a full stop (.) at the start of the line; this may be changed with DirStart. There must not be spaces before the full stop.
Lines starting with directives other than the ones below will be returned to the application.
.#
These lines are treated as comment and are removed from input.
.let
The argument is an expression as understood by Math::Expression. The result is ignored. This may be used to set one or more variables.
    .let count := 0; page := 1
    .let ++count
    .let if(count > 10) { ++page; count := 0 }
.if
.elseif
.else
.unless
The rest of the .if line is evaluated by Math::Expression and if the result is true the following lines will be returned. An optional .else reverses the sense of the .if as regards (not) returning lines. .if may be nested. Every .if must be closed by a matching .fi. .elseif may be used where a .else can be found and must be followed by a condition. .elsif is a synonym for .elseif.
.unless is the same as .if except that the truth of the result is inverted.
Text following .fi or .else will be ignored; you may use it as a comment.
The condition may be a defined subroutine which will be run and the value set by .return used as the boolean. The arguments are processed as if by .print.
    .if .someSub arg1 \v{someVariable}
    Conditional text
    .fi
The condition may also be one of the directives: .include .read .test
.print
The rest of the line will be printed to stdout.
If the line starts -e it will be written to output stream STDERR.
If the line starts -o strm it will be written to output stream strm. strm may be an in-memory stream.
    .print -o log Something interesting has happened!
    .print -o @divert A line to be read back later
The following escapes will be recognised and substitutions performed:
\e
generates the escape character \.
\0
generates the empty string. You might use this if you wanted to .print a line starting with -e.
\v{var}
interpolates variable var or array member array[index] from Math::Expression. var must match the regular expression: /\w+|\w+\[\w+\]/i
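The three escapes can be modelled with a single substitution pass. This is an illustrative sketch only: expand_escapes, escape_value and the %vars hash are hypothetical stand-ins for the module's own code and for the Math::Expression variable store, and the handling of a variable used as an array index is an assumption.

```perl
use strict;
use warnings;

# Hypothetical variable store standing in for Math::Expression.
my %vars = (i => 1, this => 'that', names => ['Frodo', 'Sam']);

# Hypothetical helper: work out the replacement for one escape.
sub escape_value {
    my ($esc, $name, $index) = @_;
    return '\\' if $esc eq 'e';                        # \e gives a backslash
    return ''   if $esc eq '0';                        # \0 gives the empty string
    return $vars{$name} // '' unless defined $index;   # \v{var}
    $index = $vars{$index} unless $index =~ /^\d+$/;   # index may be a variable
    return $vars{$name}[$index] // '';                 # \v{array[index]}
}

# Hypothetical helper: expand every escape in one left-to-right pass.
sub expand_escapes {
    my ($text) = @_;
    $text =~ s/\\(e|0|v\{(\w+)(?:\[(\w+)\])?\})/escape_value($1, $2, $3)/ge;
    return $text;
}

print expand_escapes('Index=\v{i} person=\v{names[i]}'), "\n";
print expand_escapes('\0-e starts this line'), "\n";
```

Doing the work in one pass means that text produced by one escape is not scanned again for further escapes; whether the module behaves the same way is not stated in this document.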
.echo
Escape substitution is performed as with .print and the line returned by getline. This allows variables to be used in the input the application reads without it being aware of what is going on.
.echo Index=\v{i} person=\v{names[i]}
.include
The first argument is a file path that is opened and lines from this returned to the application.
Paths that start with / are absolute and are accepted as given.
Paths that start with # are taken to be relative to the current working directory of the process. The # is removed and the path accepted.
    .# Include a file 'header' from a generic 'snippets' directory:
    .include #snippets/header
Other paths are relative to the file being processed, the directory path is prepended and the result used. If such a path is used in a file opened by Fd an error results.
    .# Include a file in the same directory as the current file:
    .include common_module
If the path starts with $ the next word is a variable name. The value is prepended to the rest of the path and the file tested for existence as above (ie the checks for a leading /, # and so on). If the variable is an array the paths are tried until one is found. Eg:
    .let dirs := split(':', '.:mydir:#builddir:/home/you/yourdir:/usr/local/ourdir')
    .include $dirs/good_file.txt
Words that follow are deemed arguments and made available within the include via the array _ARGS. See .sub.
    .# header can generate different headers, ask for one suitable for a report:
    .include #snippets/header report
The file path and arguments are processed for escapes as .print.
The file path and arguments may contain spaces if they are surrounded by quotes ('").
If the path starts with | the rest of the line is a pipe that will be run and read from. Pipes are only allowed if the property PipeOK is true (default false). WARNING: this will run an arbitrary command; you must be confident of the source and contents of the files being processed.
If the first arguments are -s name the file is opened on a named stream that may be used by .read and should be closed with .close.
If the first argument is -pn the file stream is put n frames below the current one. A new frame is created for every file opened, if, while, sub executed, ... (n is an optional number, default: 1)
If the path starts with an @ (ie matches /@\w+/i) the include reads from the in-memory stream that was created with an earlier .out. Eg:
.include @divert
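The path rules above amount to a decision on the first character of the path. The sketch below restates them in plain Perl; classify_include is a hypothetical helper, the order of the checks is an assumption, and the $variable case is only recognised, not resolved. It does not open anything.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: classify an .include path by its first character.
# $current_file is the file that contains the .include directive.
sub classify_include {
    my ($path, $current_file) = @_;
    return (pipe     => $1)               if $path =~ /^\|\s*(.*)/;
    return (memory   => $path)            if $path =~ /^\@\w+$/;
    return (variable => $path)            if $path =~ /^\$/;    # needs a lookup
    return (absolute => $path)            if $path =~ m{^/};
    return (cwd      => substr($path, 1)) if $path =~ /^#/;
    # Anything else is relative to the directory of the current file.
    return (relative => File::Spec->catfile(dirname($current_file), $path));
}

for my $path ('/etc/motd', '#snippets/header', 'common_module', '@divert') {
    my ($kind, $resolved) =
        classify_include($path, 'slides/regular-expressions.mod');
    printf "%-18s %-9s %s\n", $path, $kind, $resolved;
}
```

The relative case is the one that fails for input opened via Fd, since there is then no directory to prepend.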
.close
This is only needed to close named streams. The -s name option is needed.
.out
This diverts output to the output stream (see: OutStreams) mentioned. Lines generated will be sent there until a .out directive without an argument.
    .out index
    Meals in London
    Times of the last tube trains
    .out
In-memory streams must be created before they are used; this is done with the -c option. -c may be used on an existing stream and will throw away any existing content.
    .out -c @buf
    Text diverted to @buf
    .out
.local
Marks the arguments as variable names that are local to the current block (.include, .while, .sub, ...). When the block returns the previous value will be restored. The values restored are the values of the variables at the time the .local is seen. Note that variable scope is dynamic, not lexical.
This happens automatically for _ARGS on an .include and .sub, and for named .sub arguments.
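Perl's own local gives variables the same kind of dynamic scope, so it offers a familiar comparison. The sketch below is plain Perl and does not use IO::ReadPreProcess; the variable and sub names are made up for the illustration.

```perl
use strict;
use warnings;

our $level = 'outer';

# Sees whatever value is current at the time of the call.
sub show { return "level is $level" }

sub block {
    local $level = 'inner';    # old value saved here, restored on return
    return show();             # the callee sees 'inner': dynamic scope
}

print block(), "\n";    # level is inner
print show(),  "\n";    # level is outer
```

With lexical scope the called sub would not see the changed value; with dynamic scope anything called from inside the block does, which is the behaviour to expect from .local.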
.return
This ends reading a file early; the previous file is picked up on the line after the .include. At the top level (ie the first file) end of file is returned to the application.
Within a .sub this may be used to return a value. The value of the last expression in a .sub is not automatically used as a return value.
.return may be followed by an expression; this will be assigned to the variable _ (underscore). Default undef:
.return count + 1
.exit
The application will be terminated.
Any text that follows on the line will be processed by Math::Expression and if it is a number it is used as an exit code. If none is specified the exit code will be 2.
.eval
The rest of the line is processed for escapes as .print. It is then treated as if it had just been read. The processed line might even start with a command that is recognised by this module, eg this ends up setting variable a to the value 3:
    .let a := 1; b := 2; var := 'a'
    .print a=\v{a} b=\v{b}
    .eval .let \v{var} := 3
    .print a=\v{a} b=\v{b}
Do not use .eval to generate a conditional or loop, eg: .if; .while.
.read
Read the next line of input into a variable. The line is trimmed of the trailing newline (chomped). It is trimmed of white space if Trim. The variable _EOF is assigned 0.
At end of file the variable is assigned the empty string and the variable _EOF is assigned 1. The variable _ is set to 1 on read success.
This will be of most use with a stream opened with a -p or -s option:
    .include -p | hostname
    .read host
    .echo This machine is called \v{host}

    .include -s who | whoami
    .read -s who me
    .echo Logged in as \v{me}
    .close -s who
If the first argument is an in-memory stream (ie a name that starts with @) a line is read from that. Eg:
.read @divert line
.sub
Defines a subroutine on the following lines, ending with .done. The subroutine is called by invoking it as .name.
Arguments may be passed to the subroutine and are available in the array _ARGS. Following the name optional names may be given, these are variables as .local and, when called, any arguments are copied there. Beware: these are copies, ie separate from what is in _ARGS.
    .sub show name age
    Hobbits live in the Shire
    .echo \v{name} is \v{age} years old
    .echo That name again: \v{_ARGS[0]}
    .done
    .show 'Bilbo Baggins' 50
You can get the original argument string with join; beware that this will not give the exact argument string, since if two words are separated by more than one space the extra spaces will be lost.
    .sub manyArgs
    .let allArg := join(' ', _ARGS); na := count(_ARGS)
    .echo All \v{na} arguments as a string '\v{allArg}'
    .done
    .manyArgs all cats have whiskers
.noop
This is a no-operation and does nothing.
.while
This starts a loop that continues as long as the expression (see .if) is true. The loop is terminated by the line .done.
If the option -inn is given, the loop limit is set to nn for this loop. See default MaxLoopCount. There may be spaces between -i and nn.
    .let i := 0
    .while -i 100 i++ < 100
    Part of a .while loop
    .echo i has the value \v{i}
    .done
Loops are buffered in memory. .include within a loop is not buffered, ie read on every iteration.
.until
This is the same as .while except that the loop stops when the expression becomes true.
.for
This starts a loop. The loop is terminated by the line .done.
This has the form:
.for init ;; condition ;; incr
Note that the ;; will be seen even if inside a quoted string.
As with .while and .until you may use the -i option. init is run once before the loop starts; condition is as .while; incr is run after every iteration. init and incr are processed by Math::Expression, ie no subs allowed.
    Count down begins
    .for i := 10 ;; i > 0 ;; i--
    .echo \v{i}
    .done
    Blast off!

    .sub foo num
    .return num > 2
    .done
    .for i := 5 ;; .foo \v{i} ;; i--
    something ...
    .done
.break
.last
Terminate the current loop. These directives are synonyms.
These may be followed by the number of loops to terminate, default 1.
.continue
.next
Abandon the rest of the current loop, start the next iteration. These directives are synonyms.
These may be followed by a number n (default 1): inner loops are terminated and the next iteration of the n-th enclosing loop is started.
.done
Ends blocks: .while .until .for .sub. It may be followed by the type of block that it ends; if so, a consistency check is made.
    .for i := 0 ;; i < 5 ;; i++
    Text output
    .done for
.test
Various tests. This will set _ to 0 or 1.
-f
This returns true if the argument file exists. The file path is as for .include except that pipes are not allowed. This also sets the array _STAT with information about the file (see below) and _TestFile will be the path found - ie after the #, $, ... is resolved.
    .if .test -f $dirs/good_file.txt
    .print -e Including \v{_TestFile}, size \v{_STAT[7]} bytes.
    .include $dirs/good_file.txt
    .fi
-m
This returns true if the argument in-memory stream exists. If it exists, the variable _COUNT is set to the number of lines in the stream.
.error
An error is returned to the application, ie undef is returned. The remaining text on the line is processed, see OnError above.
.set
Permits the setting of run time options. These may also be given as arguments to new:
trace=n
Set the trace level to n. 1 traces directives, 2 traces directives and generated input.
.case
.do
.endswitch
.function
.switch
These are reserved directives that may be used in the future.
Any variable names starting with _ are reserved for future use.
The following variables will be assigned to:
_FileName
The name of the current File.
_LineNumber
The number of the line just read.
_FileNames
Array of files being read. The file last .included is in _FileNames[-1].
_LineNumbers
Array of line numbers as _FileNames.
_IncludeDepth
The number of files that are open for reading. The file passed to new or open is number 1.
_
Value of the last .return.
_ARGS
Arguments provided to a .sub or .include.
_TIME
The current time (seconds), supplied by Math::Expression.
_EOF
Set to 1 if .read finds End Of File, else set to 0.
_CountGen
Count of lines generated.
_CountSkip
Count of lines skipped.
_CountDirect
Count of directives processed.
_CountFrames
Count of frames opened. For every: sub, if, loop.
_CountOpen
Count of files opened.
EmptyArray
EmptyList
Empty arrays supplied by Math::Expression.
_STAT
Array of information about the last file found by .test -f. Members are as for perl's stat function:
     0 device number of filesystem
     1 inode number
     2 file mode (type and permissions)
     3 number of (hard) links to the file
     4 numeric user ID of file's owner
     5 numeric group ID of file's owner
     6 the device identifier (special files only)
     7 total size of file, in bytes
     8 last access time in seconds since the epoch
     9 last modify time in seconds since the epoch
    10 inode change time in seconds since the epoch
    11 preferred block size for file system I/O
    12 actual number of blocks allocated
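The same indices apply to Perl's stat directly, which may help when deciding which member of _STAT to use. A minimal sketch in plain Perl, independent of IO::ReadPreProcess; the temporary file exists only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so there is something to stat.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "hello\n";
close $tmp or die "close: $!";

# stat returns a 13 element list; the indices are those listed above.
my @st = stat $path or die "stat $path: $!";

printf "size %d bytes, %d hard link(s), modified %s\n",
    $st[7], $st[3], scalar localtime $st[9];
```

Index 7 is the size used as \v{_STAT[7]} in the .test -f example earlier in this document.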
_TestFile
The name of the last file found by .test -f.
_Initialised
Internal use, prevent double initialisation of variables.
Most methods return undef if there is an error. There will be a reason in $IO::ReadPreProcess::errstr. The error could be from IO::Handle (where $! might be helpful) or an error in the file format in which case $! will be set to EINVAL.
Beware: getline returns undef on end of file as well as error. Checking the method error will distinguish the two cases.
Note also the property OnError (see above).
The script below sets some variables that are passed on the command line, more from include files and then reads stdin. The variables that are set can be used to control what it reads.
    use feature 'say';
    use IO::ReadPreProcess;
    use Getopt::Long;
    use Math::Expression;

    # One arithmetic instance so that variables are visible in all files:
    my $ArithEnv = new Math::Expression(PermitLoops => 1, EnablePrintf => 1);

    my @let = ();
    my @includes = ();
    my $verbose = 0;
    my $help = 0;

    # Look at command line options ... add other options here:
    GetOptions(help => \$help, 'include=s' => \@includes, 'let=s' => \@let, verbose => \$verbose);
    Usage() if $help;    # Usage() is assumed to be defined elsewhere in the script

    # Evaluate all --let
    # Look like: --let='advanced := 1'
    for (@let) {
        say "Evaluating: $_" if $verbose;
        die "Invalid --let='$_'\n" unless(defined $ArithEnv->ParseToScalar($_));
    }

    # Read all --include
    # These must not yield anything other than blank lines
    # The point is that we evaluate .let, etc.
    for my $file (@includes) {
        say "Including: $file" if $verbose;
        my $inc = IO::ReadPreProcess->new(File => $file, Math => $ArithEnv, OnError => 'die', PipeOK => 1) or
            die "$0: Opening include '$file': $IO::ReadPreProcess::errstr\n";

        # All that is next should be empty lines:
        while (<$inc>) {
            die "Non empty line found via '--include $file' at $inc->{Place}\n" if /\S/;
        }
    }

    # If not stdin, maybe loop over @ARGV:
    my $fh = new IO::ReadPreProcess(Fd => \*STDIN, File => 'Standard input', Math => $ArithEnv) or
        die "Startup error: $IO::ReadPreProcess::errstr\n";

    while(<$fh>) {
        ...
        die "Error ... at: $fh->{Place}\n" if(...);
    }

    # Use pre-processor variable
    print "Sum output " . $ArithEnv->ParseToScalar('sum') . "\n";
Most of the interest lies in the input:
    .let sum := 0
    A line of input
    .# Check to see if this is advanced
    .if advanced
    Complicated stuff
    .let level := 'advanced'
    .if advanced > 1
    .# Bring in an extra file:
    .include extra_files/very_complex
    .let sum := sum + 2
    .fi advanced > 1
    .else
    Simple stuff
    .let level := 'simple'
    .fi
    .# Bring in an extra file where _ARGS[0] is either 'advanced' or 'simple':
    .include extra_files/extra_module \v{level}
    .print Showing material that is \v{level}
For more examples see the test suite.
At the end of the run you might want to do this:
    # Some stats, for fun:
    say STDERR $ArithEnv->ParseToScalar('printf("Preprocessing: lines generated %d, skipped %d. Directives %d, frames opened %d, files opened %d", _CountGen, _CountSkip, _CountDirect, _CountFrames, _CountOpen)');
Do be aware that a .include will open any file for which the process has permissions. So there is scope for an input file to pass the contents of arbitrary files into your program; this also applies to any files that the initial input file may, directly or indirectly, .include.
If a pipe is created: read this section twice.
Summary: be aware of the provenance of all input files.
When used in the diamond operator in a list context only one line will be returned. This is due to a problem in the perl module overload.
Please report any bugs or feature requests to bug-io-readpreprocess at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=IO-ReadPreProcess. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc IO::ReadPreProcess
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=IO-ReadPreProcess
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/IO-ReadPreProcess
CPAN Ratings
http://cpanratings.perl.org/d/IO-ReadPreProcess
Search CPAN
http://search.cpan.org/dist/IO-ReadPreProcess/
Alain Williams, <addw@phcomp.co.uk> April 2015, 2017.
Copyright (C) 2015, 2017 Alain Williams. All Rights Reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://dev.perl.org/licenses/ for more information.