NAME
Text::Embed - Cleanly separate unwieldy text from your source code
SYNOPSIS
use Text::Embed;
use Text::Embed CODE|REGEX|SCALAR;
use Text::Embed CODE|REGEX|SCALAR, LIST;
ABSTRACT
Often, code needs large chunks of text to operate: not large enough to justify extra file dependencies, but large enough to make quotes and heredocs ugly.
A typical example might be code generators - the text itself is code, and as such is difficult to differentiate and maintain when it is embedded inside more code. Similarly, CGI scripts often include embedded HTML or SQL templates.
Text::Embed gives the programmer a flexible way to store these portions of text in their package's __DATA__ handle, away from the logic, and to access them through the package variable %DATA.
DESCRIPTION
General Usage:
The general usage is expected to be suitable for a majority of cases.
use Text::Embed;
foreach(keys %DATA)
{
    print "$_ = $DATA{$_}\n";
}
print $DATA{foo};
__DATA__
__foo__
yadda yadda yadda...
__bar__
ee-aye ee-aye oh
__baz__
woof woof
Custom Usage:
There are two stages to Text::Embed's execution - corresponding to the first and remaining arguments in its invocation.
use Text::Embed (
sub{ ... }, # parse key/values from DATA
sub{ ... }, # process pairs
... # process pairs
);
...
__DATA__
...
Stage 1: Parsing
By default, Text::Embed uses syntax similar to the __DATA__ token to separate segments: a line consisting of an identifier surrounded by double underscores, such as __foo__.
Of course, what is suitable depends on the text being embedded, so a REGEX or CODE reference can be passed as the first argument to gain finer control over how __DATA__ is parsed:
- REGEX
-
use Text::Embed qr(<<<<<<<<(\w*?)>>>>>>>>);
The regular expression is used in a call to split(). Any leading or trailing empty strings are removed automatically.
- CODE
-
use Text::Embed sub{$_ = shift; ...};
The subroutine is passed a reference to the __DATA__ string. It should return a list of key-value pairs.
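A REGEX parser relies on a property of Perl's split(): capturing groups in the pattern are returned alongside the fields, so each captured identifier lands just before the text that follows it. The standalone sketch below imitates this stage with the separator from the REGEX example above. It is an illustration, not Text::Embed's own code, and the sample text is made up.

```perl
use strict;
use warnings;

# Sample __DATA__ text using the <<<<<<<<key>>>>>>>> separator (made-up content).
my $text = "<<<<<<<<foo>>>>>>>>\nyadda yadda\n<<<<<<<<bar>>>>>>>>\nee-aye\n";

# The capture group makes split() return each key between the values:
# ("", "foo", "\nyadda yadda\n", "bar", "\nee-aye\n")
my @parts = split /<<<<<<<<(\w*?)>>>>>>>>/, $text;

# Drop the empty leading field (and any empty trailing ones), which the
# REGEX option is described as doing automatically.
shift @parts while @parts && $parts[0] eq '';
pop   @parts while @parts && $parts[-1] eq '';

my %data = @parts;
print "$_ => [$data{$_}]\n" for sort keys %data;
```

The values keep the newline that follows each separator; that is the kind of whitespace the Stage 2 callbacks such as :trim and :block are meant to tidy up.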
In the name of laziness, Text::Embed provides several predefined formats:
- :define
-
#define BAZ baz baz baz
#define FOO foo foo foo foo foo foo
- :cdata
-
<![BAZ[baz baz baz]]>
<![FOO[ foo foo foo foo foo foo ]]>
- :default
-
__BAZ__
baz baz baz
__FOO__
foo foo foo foo foo foo
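A CODE parser receives a reference to the whole __DATA__ string and returns key-value pairs. As a hedged sketch of what such a subroutine could look like, here is a hypothetical parse_cdata that handles the :cdata layout shown above with a global match. Text::Embed's built-in :cdata parser may well be implemented differently.

```perl
use strict;
use warnings;

# Hypothetical CODE parser for <![KEY[ ... ]]> segments.
# Takes a reference to the __DATA__ text and returns (key, value, ...).
sub parse_cdata {
    my $ref = shift;
    my @pairs;
    # /s lets .*? span newlines; the non-greedy match stops at the first ]]>
    while ($$ref =~ /<!\[(\w+)\[(.*?)\]\]>/gs) {
        push @pairs, $1, $2;
    }
    return @pairs;
}

my $text = "<![BAZ[baz baz baz]]>\n<![FOO[\nfoo foo foo\nfoo foo foo\n]]>\n";
my %data = parse_cdata(\$text);
print "$_ => [$data{$_}]\n" for sort keys %data;
```

A match-based parser like this simply ignores text between segments, whereas a split()-based parser would turn it into stray fields.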
Stage 2: Processing
After parsing, each key-value pair can be further processed by an arbitrary number of callbacks.
A common usage of this might be controlling how whitespace is represented in each segment. Text::Embed provides some likely defaults which operate on the hash values only:
- :trim
-
Removes leading and trailing whitespace
- :compress
-
Replaces each run of one or more whitespace characters with a single <SPACE>
- :block
-
Removes leading and trailing blank lines, but preserves indentation
- :raw
-
Leaves the text untouched
- :default
-
Same as :raw
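The whitespace options are simple string transformations applied to each value. The sketch below shows one plausible way to express them as in-place substitutions on a scalar reference. The names trim_ref, compress_ref and block_ref are hypothetical, and Text::Embed's own regular expressions may differ in edge cases.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for :trim, :compress and :block.
# Each takes a reference to a value and edits it in place.
sub trim_ref {
    my $v = shift;
    $$v =~ s/\A\s+//;    # leading whitespace
    $$v =~ s/\s+\z//;    # trailing whitespace
}

sub compress_ref {
    my $v = shift;
    $$v =~ s/\s+/ /g;    # each run of whitespace becomes one space
}

sub block_ref {
    my $v = shift;
    $$v =~ s/\A(?:[ \t]*\n)+//;    # leading blank lines; indentation kept
    $$v =~ s/(?:\n[ \t]*)+\z//;    # trailing blank lines and final newline
}

my $t = "  hi there \n";
trim_ref(\$t);

my $c = "a  b\n\tc";
compress_ref(\$c);

my $blk = "\n\n    indented\n      more\n\n";
block_ref(\$blk);

print "[$t]\n[$c]\n[$blk]\n";
```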
If comments would make your segments easier to follow, Text::Embed also provides some defaults for stripping common comment syntax:
- :strip-perl
-
Strips Perl comments
- :strip-c
-
Strips C-like comments: /*...*/
- :strip-cpp
-
Strips both C-like and line-based //... comments
- :strip-xml
-
Strips XML/HTML-like comments: <!-- ... -->
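Comment stripping comes down to two kinds of substitution: line-based comments run from a start marker to the end of the line, and delimited comments run from a start marker to the nearest end marker. The sketch below uses a hypothetical strip_comments helper that mirrors the argument shape given for Text::Embed::strip under Utility Functions (one pattern for line-based comments, two for delimited ones). It is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: strip comments from the string behind $ref.
# One pattern: remove from the start marker to the end of the line.
# Two patterns: remove from the start marker to the nearest end marker.
sub strip_comments {
    my ($ref, $start, $end) = @_;
    if (defined $end) {
        $$ref =~ s/(?:$start).*?(?:$end)//gs;
    }
    else {
        $$ref =~ s/(?:$start)[^\n]*//g;
    }
}

my $cpp = "int x; // counter\nint y; /* spans\n two lines */ int z;";
strip_comments(\$cpp, '//');            # line-based C++ comments
strip_comments(\$cpp, '/\*', '\*/');    # delimited C comments

my $xml = "<p>Hello</p><!-- note\n -->\n<p>World</p>";
strip_comments(\$xml, '<!--', '-->');

print "$cpp\n$xml\n";
```

A purely regex-based approach cannot tell a comment marker from the same characters inside a string literal (a URL containing //, for instance), so it works best on simple embedded segments.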
If you need more control, pass your own CODE references, or the names of subroutines, as processing callbacks.
An Example Callback chain
As an example, consider a module that contains some embedded SQL. We can implement a processing callback that prepares each statement, leaving %DATA full of ready-to-execute DBI statement handles:
package Whatever;
use DBI;
use Text::Embed(':default', ':trim', 'prepare_sql');
my $dbh;
# Processing callback: receives references to the key and the value
sub prepare_sql
{
    my ($k, $v) = @_;
    if(!$dbh)
    {
        $dbh = DBI->connect(...);
    }
    # Replace the SQL text with a prepared DBI statement handle
    $$v = $dbh->prepare($$v);
}
sub get_widget
{
    my $id = shift;
    my $sth = $DATA{select_widget};
    $sth->execute($id);
    # rows() is not reliable for SELECT statements, so fetch instead
    while(my $row = $sth->fetchrow_hashref)
    {
        ...
    }
}
__DATA__
__select_widget__
SELECT * FROM widgets WHERE widget_id = ?;
__create_widget__
INSERT INTO widgets (widget_id, desc, price) VALUES (?,?,?);
...
Notice that each pair is passed by reference. At this point it is safe to rename or modify keys. Undefining a key removes the entry from %DATA.
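To make the by-reference contract concrete, here is a small standalone model of a processing stage. The apply_callbacks function is hypothetical and is not how Text::Embed is written; it only demonstrates the behaviour described above: each callback receives references to the key and the value, may modify either, and an undefined key drops the pair.

```perl
use strict;
use warnings;

# Hypothetical model of Stage 2: run every callback over every pair.
sub apply_callbacks {
    my ($data, @callbacks) = @_;
    my %out;
    for my $key (keys %$data) {
        my ($k, $v) = ($key, $data->{$key});   # work on copies
        $_->(\$k, \$v) for @callbacks;         # callbacks see references
        $out{$k} = $v if defined $k;           # undefined key: drop the pair
    }
    %$data = %out;
}

my %DATA = (select_widget => "  SELECT * FROM widgets  ", scratch => "unused");

apply_callbacks(\%DATA,
    sub { ${$_[1]} =~ s/^\s+|\s+$//g },                       # trim the value
    sub { ${$_[0]} = undef if ${$_[0]} eq 'scratch' },        # remove an entry
    sub { ${$_[0]} = 'sql_' . ${$_[0]} if defined ${$_[0]} }, # rename a key
);

print "$_ => [$DATA{$_}]\n" for sort keys %DATA;
```

Collecting results into a fresh hash avoids modifying %DATA while iterating over it, which is what makes renaming keys safe in this model.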
Utility Functions
Several utility functions are available to aid implementing custom processing handlers.
The first set are equivalent to the default processing options:
- Text::Embed::trim SCALARREF
-
use Text::Embed(':default',':trim');
is equivalent to
use Text::Embed(':default', sub {Text::Embed::trim($_[1]);} );
- Text::Embed::compress SCALARREF
-
use Text::Embed(':default',':compress');
is equivalent to
use Text::Embed(':default', sub {Text::Embed::compress($_[1]);} );
- Text::Embed::block SCALARREF
-
use Text::Embed(':default',':block');
is equivalent to
use Text::Embed(':default', sub {Text::Embed::block($_[1]);} );
Two additional functions are available:
- Text::Embed::strip SCALARREF [REGEX] [REGEX]
-
If a handler needs behaviour similar to comment stripping, this function can remove both line-based and multi-line comments: pass one REGEX for comments that run to the end of the line, or two for comments with a start and an end marker.
For example, C++ comments are stripped using:
Text::Embed::strip(\$my_data, '//');
Text::Embed::strip(\$my_data, '/\*', '\*/');
- Text::Embed::interpolate SCALARREF HASHREF [REGEX]
-
Segments are often some kind of template. This function interpolates values from a hash into the string data. The default variable syntax is of the form $(foo):
my $tmpl = 'Hello $(name)! Your age is $(age)' . "\n";
my %vars = (name => 'World', age => 4.5 * (10 ** 9));
Text::Embed::interpolate(\$tmpl, \%vars);
print $tmpl;
Note the single quotes: inside double quotes, Perl would interpolate $( as its built-in real group ID variable.
Interpolation is done with a simple substitution, so a custom REGEX argument must capture the hash key in $1:
Text::Embed::interpolate(\$tmpl, \%vars, '<%(\w+)%>');
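Since interpolation is a single substitution, its core fits in a few lines. interpolate_sketch below is a hypothetical reimplementation for illustration only: the default pattern for $(name) and the choice to leave unknown placeholders untouched are assumptions, not documented Text::Embed behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Text::Embed::interpolate.
# $re must capture the hash key in $1; the default matches $(name).
sub interpolate_sketch {
    my ($ref, $vars, $re) = @_;
    $re = qr/\$\((\w+)\)/ unless defined $re;
    # /e evaluates the replacement; unknown keys keep the matched text ($&).
    $$ref =~ s/$re/exists $vars->{$1} ? $vars->{$1} : $&/ge;
}

my %vars = (name => 'World', age => 4.5 * (10 ** 9));

# Single quotes stop Perl from interpolating $( itself.
my $tmpl = 'Hello $(name)! Your age is $(age)' . "\n";
interpolate_sketch(\$tmpl, \%vars);

my $alt = 'Hello <%name%>, aged <%age%>';
interpolate_sketch(\$alt, \%vars, '<%(\w+)%>');

my $miss = 'Hi $(nobody)';
interpolate_sketch(\$miss, \%vars);

print $tmpl, "$alt\n$miss\n";
```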
BUGS & CAVEATS
The most likely bugs related to using this module are likely to show up as bad key/value error messages. There are two related causes:
- COMMENTS
-
It is important to realise that Text::Embed does not have its own comment syntax or preprocessor. Comments should appear in the body of a segment, not before it. Any parser that works using split() is likely to fail if comments precede the first segment.
- CUSTOM PARSING
-
If you are defining your own REGEX parser, make sure you understand how it behaves when used with split(), particularly if your syntax wraps your data. Consider using a subroutine for anything non-trivial.
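The COMMENTS caveat can be reproduced with plain split(). This sketch uses an approximation of the default __key__ marker as its separator (the exact pattern Text::Embed uses is an assumption here) and shows how a comment line before the first segment leaves an odd-length list that no longer alternates key, value.

```perl
use strict;
use warnings;

# A comment placed before the first segment marker (made-up text).
my $text = "# widget queries\n__foo__\nyadda yadda\n";

# Split on lines of the form __key__, keeping the captured key.
my @parts = split /^__(\w+)__\n/m, $text;

# @parts is ("# widget queries\n", "foo", "yadda yadda\n"): the comment is a
# non-empty leading field, so it is not discarded and the pairs are shifted.
printf "%d elements: %s\n", scalar(@parts), (@parts % 2 ? 'odd' : 'even');

# Moving the comment inside the segment keeps the list well-formed.
my @fixed = split /^__(\w+)__\n/m, "__foo__\n# widget queries\nyadda yadda\n";
shift @fixed if @fixed && $fixed[0] eq '';   # drop the empty leading field
```

Assigning the odd-length list to a hash is what plain Perl reports as "Odd number of elements in hash assignment" under warnings.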
If you use REGEX parsers, choose separators that are significantly different from your data, and well spaced from it, rather than relying on complicated regular expressions to escape pathological cases.
Bug reports and suggestions are most welcome.
AUTHOR
Copyright (C) 2005 Chris McEwan - All rights reserved.
Chris McEwan <mcewan@cpan.org>
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.