NAME

Text::iPerl - engine for bringing any text documents alive with bits of embedded Perl

SYNOPSIS

use Text::iPerl;
include 'documentname';

or

perl -MText::iPerl -e include <infile >outfile

See iperl for a far more comfortable command-line variant.

DESCRIPTION

This is the engine of an inverse Perl interpreter, which controls normal text with macro invocations and specially marked bits of Perl. This setup of the document is always the same, though details may vary according to the style in effect. (See set_style.) The engine is invoked with include, or its variants include_filehandle and include_string. It treats a given document in two phases, with two or three aspects:

Markup Style

Bits of Perl to be evaluated have to be specially marked up as such. How this is done differs greatly depending on the style in effect. But apart from different syntaxes there are only two fundamental ways in which Perl can be embedded: non-printing and printing. Not all styles provide both ways. The difference between these two ways is to be seen as default-functionality and is not restrictive. Non-printing Perl may very well use the print statement, or system-commands to output something via STDOUT into the output stream. If system-commands are used you should first turn on autoflushing ($| = 1) to ensure that output order is preserved.

Perl

The whole document is actually reinverted or transformed into a Perl-programme, where each bit of normal text gets transformed into a semicolon-terminated Perl-statement. The markup around bits of non-printing Perl simply gets removed and a terminating semicolon added, which almost never hurts. If you want a bit of non-printing Perl to control the preceding or following bit of normal text, you can prevent the semicolon by starting or ending the Perl code with \;. You can delimit the bit of normal text with a bit of non-printing Perl containing only a semicolon.
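
As a rough illustration of this inversion (not Text::iPerl's actual implementation), the following self-contained sketch handles only one case: lines starting with ! are taken as Perl, and every other line becomes a print statement. The helper names invert and run_inverted are hypothetical.

```perl
use strict;
use warnings;

# Toy inverter: turn a document into a Perl programme.
# Lines starting with "!" are Perl; all other lines become print statements.
sub invert {
    my ($doc) = @_;
    my $program = '';
    for my $line (split /^/, $doc) {
        if ($line =~ /^!(.*)/) {
            # Remove the markup and add a terminating semicolon.
            $program .= "$1;\n";
        } else {
            # Quote the plain text so it survives as a single-quoted string.
            (my $quoted = $line) =~ s/([\\'])/\\$1/g;
            $program .= "print '$quoted';\n";
        }
    }
    return $program;
}

# Run the generated programme and collect what it prints.
sub run_inverted {
    my ($doc) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    my $old = select $fh;
    eval invert($doc);
    my $err = $@;
    select $old;
    close $fh;
    die $err if $err;
    return $out;
}

# The dangling braces of the loop control the plain text between them.
print run_inverted("!for my \$i (1..2) {\nHello\n!}\n");
```

The added semicolon after an opening or closing brace is harmless here, which is the sense in which it "almost never hurts".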

Printing bits of Perl, on the other hand, get passed as an argument list to a print statement, or to printf, if it starts with %. If a printing bit of Perl is empty, $_ is printed. If it is a literal integer, $_[n] is printed.
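
The rules for printing bits can be sketched as a small mapping from the marked-up code to the statement it stands for. This is only an illustration: the helper name printing_statement is hypothetical, and both the trimming of surrounding whitespace and the dropping of the leading % marker are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: map the contents of a printing bit of Perl
# to the Perl statement it stands for.
sub printing_statement {
    my ($code) = @_;
    $code =~ s/^\s+|\s+$//g;                            # assumption: blanks are ignored
    return 'print $_;'         if $code eq '';          # empty: print $_
    return "print \$_[$code];" if $code =~ /^\d+$/;     # literal integer n: print $_[n]
    return "printf $code;"     if $code =~ s/^%//;      # leading %: printf (marker dropped)
    return "print $code;";                              # anything else: argument list to print
}

print printing_statement($_), "\n" for '', '2', '$x, $y', q{%'%d', 42};
```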

There are several interesting things you can do with syntactically incomplete bits of Perl. You can seal the fate of the following bit of plain text by preceding it with an expression followed by and or or and terminated with \;. Or you can have dangling curly braces of an if-elsif-else-statement. They might also be of a loop, which will likely contain one or more printing bits of Perl.

Dangling curly braces may even be of a sub, which will then print the contained plain text when called. Likewise they may be of an anonymous sub which could be the second argument to define.

There are no syntactic extensions to Perl, just a couple of variables and functions like include or define.

Macros

Normal text gets output as is, unless we have macro-definitions. If macros are defined, at runtime every bit of normal text gets repeatedly scanned for macros, which are expanded until no more macro invocations are found, i.e. macro expansions occur depth-first. Macros are functions returning a string. If they also print something, that comes in the output stream before the returned string and is not subject to repeated scanning. Scanning starts again where the last macro was found, so if a macro returns what might be the second part of a macro name together with the preceding text, that is not found. (See define, undefine & macro.)
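
The repeated scanning can be pictured with the following minimal sketch. It is not the module's code: the function name expand_macros is hypothetical, macros take no arguments here, and the expansion limit only mimics the idea behind $max_macro_expansions.

```perl
use strict;
use warnings;

# Toy expander: $macros maps names to code references returning a string.
sub expand_macros {
    my ($text, $macros, $max) = @_;
    $max = 1000 unless defined $max;
    my ($start, $expansions) = (0, 0);
    while ($start <= length $text) {
        pos($text) = $start;
        $text =~ /\b(\w+)\b/g or last;          # find the next word
        my ($name, $from, $to) = ($1, $-[0], $+[0]);
        if (my $code = $macros->{$name}) {
            die "too many macro expansions\n" if ++$expansions > $max;
            substr($text, $from, length($name), $code->());
            $start = $from;                     # rescan where the macro was found
        } else {
            $start = $to;                       # not a macro: continue after it
        }
    }
    return $text;
}

my %macros = (
    NAME  => sub { 'FIRST LAST' },
    FIRST => sub { 'Ada' },
    LAST  => sub { 'Lovelace' },
);
print expand_macros('Hello NAME!', \%macros), "\n";
```

Because scanning resumes at the position of the last expansion, the result of NAME is itself scanned, so FIRST and LAST get expanded in turn.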

Macro invocations consist of the macro name, a string of letters, digits and underscores, optionally followed (usually immediately) by a parenthesized Perl parameter list. Note that even if the macro is surrounded by a bit of Perl with a my-variable, that variable will not be visible, since macro invocations are evaluated later and are not seen at compile time. Depending on the style, macro invocations may be surrounded by additional syntactic sugar. (See $macro_start and friends.)

@EXPORT

Text::iPerl exports the following functions by default:

include, include_string, include_filehandle, define, undefine, macro

@EXPORT_OK

Text::iPerl optionally exports the following functions and variables:

set_style, $cache, $comment_level, debug, %debug, $documents, @documents, $joiner, $macro_end, $macro_name_end, $macro_start, $macro_start_dollar1, $max_macro_growth, $max_macro_expansions, $printfer, %trace

FUNCTIONS

debug WHEN, STRING
debug WHEN, CODEREF
define STRING, EXPR
define STRING
define

Defines a macro whose value may be interpolated into bits of plain text in scalar context. STRING or $_ should be a string consisting of letters, digits and underscores, which is the name of the macro. EXPR is the body of the macro. If it is a reference to a function the macro interpolation will call that. If it is a string-reference the macro is an alias to that macro or to that Perl-builtin, which doesn't allow a function reference to be taken. If it is missing, the macro is a soft reference to a Perl-function of the same name.

If the second argument is a string (should be single-quoted), its variables will be interpolated at the moment the macro gets called. The macro arguments may of course be accessed as '... $_[0] ... $_[1] ...', but there is a more comfortable possibility. The first argument to define may contain parameter specifications in parentheses after the name. These are a comma separated list of scalar variables with optionally a list variable at the end. Each of these variables may be assigned to, giving the named parameter a default value.

For styles like cpp which don't allow embedding Perl-expressions into the document, you can use any one of the following to get a Perl-evaluating macro:

define PERL => '@_';
define 'PERL( $eval = $_ )', '$eval';
define PERL => sub { $_[0] };
define PERL => sub { print $_[0]; '' };

The first allows multiple arguments, to be separated by $". The second gives a Perl-typical default argument of $_. The third simply evaluates one argument. The fourth does the same, but because the value is printed rather than returned, it is not reparsed for further macro invocations.

include EXPR, REPEATCOUNT, HUSH
include EXPR, REPEATCOUNT
include EXPR
include

Includes a document, parsing it as iPerl and merging the result into the current output. EXPR works just like in open. If no filename is given, reads from STDIN. If filename is not a full path, then if called from within a known file, the file is searched in the directory of that document, else in the current directory. If it is not found there, the directories in @opt_I followed by those in @include are searched, unless filename starts with ./.
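
The search order described above can be sketched as follows. This is an illustration under assumptions, not the module's code: the function name find_document is hypothetical, and the search lists are passed as array references instead of being read from @opt_I and @include.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);

# Hypothetical resolver: returns the path of the document, or nothing.
sub find_document {
    my ($name, $current_doc, $opt_I, $include) = @_;
    return $name if File::Spec->file_name_is_absolute($name);

    # First the directory of the including document, else the current directory.
    my @dirs = (defined $current_doc ? dirname($current_doc) : File::Spec->curdir);

    # Then the two search lists, unless the name starts with ./
    push @dirs, @{ $opt_I || [] }, @{ $include || [] } unless $name =~ m{^\./};

    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $name);
        return $path if -e $path;
    }
    return;    # not found
}

my $found = find_document('strict.pm', undef, [], \@INC);
print defined $found ? "found $found\n" : "not found\n";
```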

The second argument may be an integer (often 1), meaning to include the file only if that filename hasn't already been included that many times. Since this can be fooled by multiple links to the same file, or if you use chdir, the second argument may also be a reference to an integer (e.g. \1). In that case the physical identity of the file is used, rather than the filename.
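
The difference between counting by filename and counting by physical identity can be sketched like this. The helper name may_include and its two bookkeeping hashes are hypothetical, and using the device and inode numbers from stat as the physical identity is an assumption that holds on Unix-like systems.

```perl
use strict;
use warnings;

my (%seen_by_name, %seen_by_identity);

# Hypothetical guard: true if the file may be included once more.
sub may_include {
    my ($file, $limit) = @_;
    return 1 unless defined $limit;              # no limit given: always include
    if (ref $limit) {                            # \1 style: count the physical file
        my ($dev, $ino) = stat $file or return 0;
        return $seen_by_identity{"$dev:$ino"}++ < $$limit;
    }
    return $seen_by_name{$file}++ < $limit;      # plain integer: count the name
}

# With a plain integer a second name for the same file is counted separately.
print may_include($0, 1)    ? "include\n" : "skip\n";
print may_include($0, 1)    ? "include\n" : "skip\n";
```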

The third argument, when true, means to continue silently when the file was not found.

Note that include is simply a Perl function, thus a run-time affair. This means that if you define any functions within the included document, they are not known within the including one. You can either mark them as such for the compiler (ampersand and/or parens) or you can place the include statement within a BEGIN {} block.

include_filehandle FILEHANDLE

Likewise, but reads from the FILEHANDLE.

#! /usr/local/bin/perl
use Text::iPerl;
include_filehandle DATA;
__END__
Self-parsing iPerl document goes here.

include_string EXPR
include_string

Likewise, but parses EXPR or $_ if none.

macro STRING
macro

Returns undef or the macro-definition of STRING, either as a code-reference, or the name as a string if the macro is a soft reference to a Perl function. Without an argument returns the list of defined macro-names.

if( defined macro 'mymacro' ) { ... }
foreach( macro ) { ... }

return

The normal Perl keyword, returns from a document when used at its top-level, i.e. outside of functions, macros or macro invocations, save for the m4-style pseudo-macros. This means that the rest of the document is neither processed nor output.

set_style STRING[, ARGUMENT ...]
set_style CODEREF

Set one of the following iPerl-styles. The various styles are more or less adapted to various document types. But of course any style can be used anywhere. Sometimes this requires some extra care, for example HTML documents may contain the sequence !< which can lead to startling effects when used with the bang style.

It can sometimes be useful to have two different styles in a document, for example if you want to do some time-consuming offline treatment in a document that will nevertheless later be an active web-document.

The macro invocation style is the only one to be immediately effective, being a runtime affair. The styles for embedded bits of Perl, being a compiletime affair, only become effective for the next iPerl-documents to be included.

The mnemonic for the variously used {...} is a Perl block, though here it is simply a stretch of interpolated Perl code, that does not define a block. The mnemonic for <...> is a Perl input operator, but inverted here, since the document reads from Perl code. STRING may be one of the following:

'bang'
'unix'

Everything on the same line after # is deleted depending on $comment_level.

Lines starting with a ! are bits of Perl. This is reminiscent of interactive Unix programs, which allow a shell escape in this way.

Perl within lines, potentially spanning several lines, is enclosed in !{ and }!. This is reminiscent of Perl blocks, but does not delimit a block. As a special case !}! without whitespace is equivalent to !{}}!, i.e. one closing brace.

Perl values to be printed to the document are enclosed in !< and >!. This is reminiscent of the Perl read operator, inverted here in that the document reads from a Perl expression.

Macros may be optionally preceded by &, useful to set them off from preceding alphanumeric characters.

'control'

Lines starting with a ^A are bits of Perl. This is reminiscent of the beginning of the alphabet, hence of the line.

Perl within lines, potentially spanning several lines, is enclosed in ^B and ^E. This is reminiscent of beginning and end.

Perl values to be printed to the document are enclosed in ^P and ^E. This is reminiscent of print and end.

'cpp'

Everything on the same line after // or from /* up to the next */ is deleted depending on $comment_level.

Lines starting with a # are bits of Perl. They may be continued over several lines, as long as each line ends with a \.

generic => COMMENT, BEFOREPRINT, AFTERPRINT, BEFORE, AFTER

Arguments are 5 regexps, which may not make backreferences via parentheses. This allows you to define your own simple style. Anything matching COMMENT is simply ignored. BEFOREPRINT and AFTERPRINT mark up a printing bit of Perl, and BEFORE and AFTER mark up a plain bit of Perl.

'm4'

Perl within lines, potentially spanning several lines, is enclosed in the pseudo-macro perl({ and }). This is reminiscent of Perl blocks, but does not delimit a block. As a special case perl(}) without whitespace is equivalent to perl({}}), i.e. one closing brace.

Perl values to be printed to the document are enclosed in the pseudo-macro perl(< and >). This is reminiscent of the Perl read operator, inverted here in that the document reads from a Perl expression.

Everything from the pseudo-macro dnl through end of line is deleted.

The customary m4 macros decr, define (iPerl semantics), defn, errprint, eval, ifdef, ifelse, include (iPerl semantics), incr, index, len, maketemp, m4exit, sinclude, substr, syscmd, sysval, traceoff, traceon, translit (with an additional optional 4th argument for the modifiers of tr) and undefine are predefined.

The customary m4 macros changecom, changequote, divert, divnum, dumpdef, m4wrap, popdef, pushdef, shift and undivert are not implemented.

No macro expansion takes place after a #. This could be changed with $macro_start and friends, but note that the above mentioned pseudo-macros are already expanded at compile-time. Changing this within the document would lead to two different comment-styles being used.

Remember that macro arguments are Perl code, not just bits of quoted or unquoted string.

pod => ARG
'pod'

This style can do two things with files containing pod (plain old documentation). For one thing, if ARG is true, it can eliminate any pod from the document. It then does nothing else. This allows pod to reside in any file.

For another, if ARG is missing or false, the pod is extracted from the file, processed with embedded Perl, allowing pods to be dynamic and spread across several files. The Perl embedded within the pod has nothing to do with the programme that contains the pod, even if that is a Perl programme. This is because, from a pod-point-of-view, everything that is not pod is ignored.

Paragraphs starting with =for perl or multiple paragraphs surrounded by =begin perl and =end perl contain plain Perl code that can control the pod.

Perl within paragraphs is enclosed in P<{ and }>. This is reminiscent of Perl blocks, but does not delimit a block. As a special case P<}> without whitespace is equivalent to P<{}}>, i.e. one closing brace.

Perl values to be printed to the document are enclosed in P< and >. This is reminiscent of the Perl read operator, inverted here in that the document reads from a Perl expression.

M< and > delimit a macro call within a paragraph.

'xml'
'sgml'

Everything from <!-- up to the next --> is deleted depending on $comment_level.

Bits of Perl are enclosed in <script runat=server> and </script> or <server> and </server>. Attributes, such as language=Perl, are ignored but recommended to prevent mistreatment by other parsers. More general alternate tags are <perl> and </perl>. As a more convenient (though probably not XML or SGML compliant) alternative, closer to the other iPerl-styles, bits of Perl may be enclosed in <{ and }>. As a special case <}> without whitespace is equivalent to <{}}>, i.e. one closing brace. The alternatives are likely not recognized by WYSIWYG HTML editors, not being proper HTML, and even the server tag might be a Netscape feature, which other editors cannot handle. Even the script tag can be problematic since it may conditionally include one stretch of text or another, which cannot be done with Javascript, thus confusing an editor which unconditionally sees both stretches of text.

Perl values to be printed to the document are enclosed in &< and >;. This is reminiscent of the Perl read operator, inverted here in that the document reads from a Perl expression. Alternately, only within < and > (actually < is not checked for, due to the forward looking nature of the parser, but should anyway be present before any >), Perl values to be printed to the document are enclosed in a pair of `. When this is not followed by a = the result is surrounded with double quotes.

Entities (iPerl macros) are enclosed in & and ;. If the enclosed text is not a defined macro, it is left as an XML entity.

CODEREF

NOTE: Since the parsing of a document has to be made more efficient, the way this CODEREF works will be totally changed in the future.

Sets a function and returns the old one, which may have been a builtin one.

The function gets four arguments, 0) a string containing the yet unparsed rest of the document, 1) a subregexp to match a beginning of line, 2) a subregexp to put before a comment matcher and 3) a subregexp to put after a comment matcher. The regexps are only relevant if your style cares about beginnings of line or comments. The comment regexps are provided depending on $comment_level. Regexps 1) and 2) also depend on whether the last match (optional 5th return value, see below) ended with a newline. Otherwise the beginning of string will not match a beginning of line.

It gets called repeatedly during parsing of a document and should return a list of 4 or 5 elements: 0) leading plain text, 1) printing Perl expr, 2) plain Perl, 3) the rest to be treated next time and optionally 4) the matched string or at least its last character. Those elements not matching anything should be undef, especially 1), since if it is the empty string, $_ will get printed at that point. When it returns undef as the rest, it won't get called again for that document.
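
A function following the calling convention described above might look like the sketch below. It is only an illustration of the argument and return lists as documented here: the function name bracket_style and its [[ ... ]] and {{ ... }} markup are invented for the example, and the three subregexp arguments are ignored because this toy style cares about neither line beginnings nor comments.

```perl
use strict;
use warnings;

# Hypothetical style: [[ expr ]] is printing Perl, {{ code }} is plain Perl.
sub bracket_style {
    my ($rest) = @_;    # the three subregexp arguments are not needed here
    if (my ($lead, $matched, $print, $plain, $tail) =
            $rest =~ /\A(.*?)(\[\[(.*?)\]\]|\{\{(.*?)\}\})(.*)\z/s) {
        $lead = undef unless length $lead;     # nothing matched: undef, not ''
        # 0) leading text, 1) printing Perl, 2) plain Perl, 3) rest, 4) matched string
        return ($lead, $print, $plain, $tail, $matched);
    }
    # No markup left: hand back the remaining text and undef as the rest,
    # so that the function is not called again for this document.
    my $text = length($rest) ? $rest : undef;
    return ($text, undef, undef, undef);
}

my @parts = bracket_style('Hi [[ $name ]]!');
print join('|', map { defined $_ ? $_ : 'undef' } @parts), "\n";
```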

undefine EXPR
undefine

Removes the definition of the macro named EXPR, or $_ if no argument is given.

VARIABLES

@autostyle_by_contents

Hash-like list of regexps to match against document to determine the mode to use when $style starts with 'auto'. Unlike a hash, this list is processed sequentially until a match is found.

@autostyle_by_name

Hash-like list of regexps to match against filenames (actually against $documents[-1]) to determine the mode to use when $style starts with 'auto'. Unlike a hash, this list is processed sequentially until a match is found.
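
Processing such a hash-like list sequentially can be sketched as follows. The function name first_matching_style and the regexps in the example list are made up for illustration and are not the module's defaults.

```perl
use strict;
use warnings;

# Walk a flat list of regexp/style pairs and return the first match.
sub first_matching_style {
    my ($subject, @pairs) = @_;
    while (my ($regexp, $style) = splice @pairs, 0, 2) {
        return $style if $subject =~ $regexp;
    }
    return;    # no match: the caller goes on to the next step
}

# Order matters, which is why a list is used rather than a hash.
my @by_name = (
    qr/\.html?$/ => 'xml',
    qr/\.[ch]$/  => 'cpp',
    qr/./        => 'bang',
);
print first_matching_style('index.html', @by_name), "\n";
```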

$cache

If true, include caches the compiled form of the document for quick reuse when called again for the same file.

Due to a Perl-bug with nested closures, source code, rather than byte code, is cached when it contains the word sub.

$comment_level

What to do with comments in a document when compiling it. Concerns comments in the host part (like /* ... */ in style cpp), not Perl comments. Values are:

0: Do not touch comments in document.

1: Remove comments in document, when they go exactly from a beginning of line to an end of line.

2: Like 1, but there may be whitespace before the comment start or after the comment end.

3: Remove all comments in document.

This may be hairy, since iPerl has no knowledge of the host document's syntax and will remove everything that looks like a comment. In Perl or Korn shell, for example, # does not start a comment in all syntactic contexts. Or a C programme might contain /* ... */ within a string. So this variable defaults to 1, which is fairly safe.
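
The four levels can be pictured with the sketch below, using /* ... */ comments as in style cpp. It is a simplified illustration, not the module's code: the function name strip_comments is hypothetical, and removing the newline together with a whole-line comment at levels 1 and 2 is an assumption.

```perl
use strict;
use warnings;

# A comment runs from /* to the first following */.
my $comment = qr{/\*(?:(?!\*/).)*\*/}s;

sub strip_comments {
    my ($text, $level) = @_;
    if    ($level == 1) { $text =~ s/^${comment}\n//mg }                 # exactly a whole line
    elsif ($level == 2) { $text =~ s/^[ \t]*${comment}[ \t]*\n//mg }     # whitespace allowed around it
    elsif ($level >= 3) { $text =~ s/${comment}//g }                     # every comment
    return $text;                                                        # level 0: untouched
}

my $source = "/* full */\n  /* indented */\nint x; /* trailing */\n";
print "level $_:\n", strip_comments($source, $_) for 0 .. 3;
```

Level 3 also removes the trailing comment on the last line, which is exactly the case that becomes risky when the same characters appear inside a string of the host document.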

%debug

Perform debugging for all flags associated with a true value:

c   generated Perl code
E   show intern evaluations
F   say current input file fullname
f   say current input file basename
i   say calls to include-functions
L   say location where debugger was called internally
p   show searching files in @include
t   trace for all macro calls, not only those in %trace
V   automatically implies any other letter

The following flags are only relevant if t is set or for macros in %trace:

a   show actual arguments
e   show expansion

You can add any other letter if you intend to use it in your own calls to debug.

$documents

Incremented for each document included.

@documents

Contains the list of all nested includes currently active, innermost last. Where a filename is not known for the document, contains the strings '<FILEHANDLE>', '<STDIN>' or '<STRING>'.

@include

Second list of directories where include searches for files not found in the same directory as the file where include was called. Defaults to /usr/include followed by the contents of @INC.

$joiner

Regexp (defaults to \;) to match what must be at the beginning or end of a bit of Perl to suppress the semicolon at that point.

$macro_end
$macro_name_end
$macro_start
$macro_start_dollar1

$macro_start, $macro_name_end and $macro_end are regexps describing the syntactic sugar which is eliminated around macro invocations. If, as in style m4, $macro_start has to look backwards, it should contain one paren-pair matching the portion of text not to discard, and $macro_start_dollar1 should then be true. These change every time a set_style is called explicitly or implicitly.

If $macro_start and $macro_name_end don't contain the regexp \b, macros will be found in the middle of words. Or you can use the latter variable to allow whitespace before the argument list, or prevent it altogether with a negative lookahead for a parenthesis.

$max_macro_growth

One bit of plain text may grow by no more than this factor through macro expansions.

$max_macro_expansions

In one bit of plain text no more than this many macro expansions may occur.

@opt_I

First list of directories where include searches for files not found in the same directory as the file where include was called. This is not set by Text::iPerl but is used if set outside. The strange name comes from the fact that iperl, like the various invokers of the C preprocessor and some m4 implementations, uses the -I option for this.

$preoutput_handler

Not yet implemented.

Coderef called and reset every time iPerl wants to output a bit of plain text. Will normally be set by programmes to offer some initialization that can be overridden by the beginning of a document.

$printfer

Regexp (defaults to %) to match what must be at the beginning of a printing bit of Perl to use printf instead of print.

$style

This is the name of the style currently in effect. If this starts with 'auto', the style used for an included document is determined in three steps as follows. This variable is then set to auto: style.

Style specified in the file

This is identical to Emacs' local variables specification inside a file. There are two possibilities (here shown for style bang): On the first line, or on the second if the first line is a shebang magic number (#! interpreter), with possibly other semicolon separated variables for use by Emacs:

-*- iPerl-style: "bang" -*-

Or, within the last 3000 characters of the document and not followed by a page break (^L), /* and */ being examples of optional comment delimiters, which, if present, must however be identical on all lines, with possibly other specification-lines only used by Emacs:

/* Local Variables: */
/* iPerl-style: "bang" */
/* End: */


The style must be given as a double-quoted literal string. This can appear anywhere, i.e. in a bit of Perl as a comment or string or in the host document. If neither of these appear the next step is tried.
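
Detecting these two forms can be sketched as below. This is a simplified illustration rather than the module's implementation: the function name style_from_file is hypothetical, and the check that comment delimiters are identical on all lines of the Local Variables block is left out.

```perl
use strict;
use warnings;

# Hypothetical detector: returns the style named in the document, or nothing.
sub style_from_file {
    my ($doc) = @_;

    # Form 1: on the first line, or on the second if the first is a shebang.
    my @lines = split /\n/, $doc, 3;
    my $line = @lines && $lines[0] =~ /^#!/ ? $lines[1] : $lines[0];
    return $1
        if defined $line
        && $line =~ /-\*-.*?\biPerl-style:\s*"([^"]*)".*?-\*-/;

    # Form 2: a Local Variables block within the last 3000 characters
    # and after the last page break (^L).
    my $tail = length($doc) > 3000 ? substr($doc, -3000) : $doc;
    $tail =~ s/.*\f//s;
    return $1
        if $tail =~ /Local Variables:.*?\biPerl-style:\s*"([^"]*)".*?\bEnd:/s;

    return;    # neither form found: the next step is tried
}

print style_from_file(qq{#! /usr/bin/perl\n# -*- iPerl-style: "bang" -*-\ntext\n}), "\n";
```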

Document-name matched against @autostyle_by_name

If no match is found, the next step is tried.

Document-contents matched against @autostyle_by_contents

If no match is found, the style of the including document is maintained. If there is none, we die.

@_Text_iPerl

Closure needed for internal purposes visible within your document. The effects of changing this variable are not defined.

%trace

Debug macro operations for all macros whose name is associated with a true value, irrespective of the flags in %debug.

SEE ALSO

iperl, web-iPerl, iPerl.el, perl, http://beam.to/iPerl/
