The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

ODF::MailMerge - "Mail Merge" or just substitute tokens in ODF documents

SYNOPSIS

 use ODF::lpOD;
 use ODF::lpOD_Helper;
 use ODF::MailMerge qw/replace_tokens MM_SUBST/;

 my $doc = odf_get_document("/path/to/file.odt");
 my $body = $doc->get_body;

 # Simple replacement of '{who}', '{last words}' and '{zzz}'
 # everywhere in the document.
 my $hash = {
   who => "John Brown",
   'last words' => [
      [color => "#50FFEE", "bold"],
      "I deny everything but...the design on my part to free the slaves."
   ],
   zzz => \&callback,
 };
 replace_tokens($body, $hash);

 # Mail-merge:
 #   1. Find the prototype table containing the token "{mmproto}".
 #   2. Replace tokens in that table using data from a spreadsheet,
 #      replicating the table as many times as necessary for all rows.
 #
 my $engine = ODF::MailMerge::Engine->new(context => $body,
                                          proto_tag => "{mmproto}");

 use Spreadsheet::Edit qw/read_spreadsheet apply %crow/;
 read_spreadsheet "/path/to/data.xlsx!Sheet1";
 apply {
   $engine->add_record(\%crow);  # %crow is a tied hash to current row
 };
 $engine->finish();

 $doc->save(target => "/path/to/output.odt");

DESCRIPTION

This tool uses ODF::lpOD / ODF::lpOD_Helper to patch ODF documents. Token strings of the form "{key}" or "{key:modifiers...}" are replaced with values from a hash indexed by "key".

Optional :modifiers within tokens can change the substituted value or have side-effects such as removing lines when the substituted value is empty.

A "mail merge" function replicates a template object (e.g. table or section) as many times as needed to plug in values from multiple data records.

THE PARADIGM

First, manually create a prototype ODF document using e.g. LibreOffice, containing static content and {tokens} to be interpolated, formatted as desired. To use "mail merge", create a table or other ODF construct which represents a single entry or record, with {token}s where data values should be plugged in.

Substituted values will have the same formatting as the tokens which were replaced. This is quite powerful.

For example, to generate a multi-column "member directory", create a prototype table with tokens like {Name}, {Address}, etc. using any desired styles; place that table in a Section with the desired number of columns.

When processed, the table will be cloned and appended within it's Section, flowing into successive columns and new pages as needed. The prototype table's properties can be set to prevent breaking entries at column/page boundaries, and control borders, inter-entry spacing, etc.

If a {token} is not all the same format, the substituted value will use the format of the first character (i.e. the {).

SIMPLE SUBSTITUTION

$count = replace_tokens($context, $hash);

This function replaces tokens without using the mail-merge mechanism.

$context is the document body or any descendant; $hash maps token names to replacement values as described at "TOKEN REPLACEMENT".

All instances of tokens in $context are replaced if their names exist in %$hash. Token names not in %$hash are left as-is unless the hash contains a '*' wildcard entry.

MAIL MERGE OVERVIEW

  1. A template of some kind specifies how to display data from one database record, with db field references where values should be plugged in. That template is copied as many times as there are database records, plugging in specific values from each reacord.

  2. Some fields may have empty ("") values in a particular record, in which case the containing row, paragraph etc. can be deleted to avoid leaving undesirable blank space. For example a mailing list may allow a secondary addressee line which is not always needed.

  3. Fields may have *multiple* values. In that case the containing row, paragraph, or frame can be replicated to accommodate extra values for the same field. For example a personnel directory may allow each person to have several telephone numbers.

MAIL MERGE API

It does not matter where the data comes from, as long as you can provide a hash table which maps token names to values for a particular record.

The example in the SYNOPSIS reads a spreadsheet using Spreadsheet::Edit, which provides just such a hash via the tied variable "%crow" (current row); this hash maps column titles (among other things) to data values in the row being visited by 'apply'. Therefore tokens {Name} and {Address} would be replaced by appropriate values from the "Name" and "Address" columns.

$engine = ODF::MailMerge::Engine->new(context=>$body, proto_tag=>"{tag}");

$engine = ODF::MailMerge::Engine->new(proto_elt => $elt);

Create a new mail-merge engine which will replicate the indicated protototype element. Currently only table prototypes are supported, but but sections and other ODF text wrappers may be supported later.

In the first form, the string "{tag}" is searched for within $context (e.g. the document body), and the containing Table is used as the prototype element. The tag string may be contained anywhere in the table, and will be deleted (so it has no effect on the final result).

In the second form, the prototype node is specified directly.

$engine->add_record($hashref);

The prototype object is first cloned and appended to any previous copies.

Then all {key} or {key:modifier...} strings in the clone are replaced by looking up "key" in the specified hash as described at "TOKEN REPLACEMENT" below. An exception occurs if an unhandled token is found.

$engine->finish();

This must be called after the last add_record to clean up. It deletes the prototype, leaving behind only the clones with instantiated values.

TOKEN REPLACEMENT

In the hash you provide, keys are token names without the curly brackets or :modifiers. For example, the key "First Name" would be used for token "{First Name}" or "{First Name:...}" .

The hash key '*' is a wildcard, used if there is no entry for a token name.

Tokens may contain internal spaces but leading and trailing spaces around the token name are ignored. Literal : { or } characters must be backslashed i.e. \: \{ or \}.

A hash value may be:

  * "string"                      - a replacement value string
  * [[Style info], "string", ...] - a Styled content value
  * [list of possibly-multiple replacement values]
  * CODE ref                      - a callback (see "CALLBACKS")

[Styled content] values

See ODF::lpOD_Helper for details. In brief, these are refs to arrays containing [style spec] sub-arrays and plain strings, where a [style spec] describes a local style to be applied to the immediately following text string. As used here, the first item must be a [style spec] sub-array.

For example [[color => "red", "bold"], "John Brown"] means substitute "John Brown" in red, bold text, overriding the style of the {token}. Multiple pairs describe adjacent but differently-styled segments.

Styled values are not needed unless you must override the original style of the {token}.

Token :modifiers

:modifiers appended to a token name change the replacement value or have other effects. For example {Address:nb} would be replaced by the value given by $hash->{Address} with all regular spaces replaced by non-breaking spaces.

The standard :modifiers are

  :nb         - Convert spaces to non-breaking

  :unfold     - Convert embedded newlines to spaces

  :breakmulti - Append newline if the value contains embedded newlines.

  :spand      - (only in a table cell) Span the cell down over cells
                below which are empty. To be useful, the cell should have
                Format->align text->Center so it can float.

  :spanr        (only in a table cell) Span the cell rightward over cells
                which are empty.

  :die        - Delete If Empty - the containing row, etc. is deleted
                if all tokens with :die are empty ("") after substitution.

  :rmsb       - Remove shared borders between replicated rows

  :rep_first, etc.  - See below.  Allows advanced control when rows, etc.
                are being replicated to accommodate a multi-valued token.

Eliding Empty Lines (:die)

This modifier deletes the containing row (frame, paragraph, etc.) if all tokens in the row with the :die modifier have an empty value ("").

Note that the row is deleted even if other tokens without :die exist in the row and have non-empty values.

Multi-value tokens

If a token has multiple values, then the containing row, frame or paragraph is replcated.

NOTE: Where the following documentation refers to replicating "rows" it means the appropriate ODF object type.

Replicating rows with more than one token

A row is replicated enough times for the token with the most values. Tokens which have fewer values are instantiated in the initial rows and empty values ("") substituted in later rows. For example, given

  ┌──────────────┬────────────────┬───────────────────────┐
  │{Name}        │ {Phone}        │ {Email}               │
  └──────────────┴────────────────┴───────────────────────┘

if the {Phone} token had four values and {Email} had two, the result would be four copies of the row, looking like this:

  ┌──────────────┬────────────────┬───────────────────────┐
  │John Hancock  │ (415) 555-1212 │ j.hancock@gmail.com   │
  ├──────────────┼────────────────┼───────────────────────┤
  │              │ (650) 555-1212 │ j.hancock@hotmail.com │
  ├──────────────┼────────────────┼───────────────────────┤
  │              │ (800) 555-1212 │                       │
  ├──────────────┼────────────────┼───────────────────────┤
  │              │ (900) 888-7777 │                       │
  └──────────────┴────────────────┴───────────────────────┘

A :rmsb modifier causes shared borders between replicated rows to be eliminated in the current cell.

In the example above, the same template row was instantiated for every replicate. Alternatively a set of template rows may be used where the appropriate template is instantiated in each position of the set (first, last, etc.).

The following eliminates shared row borders similar to what :rmsb would do, but using conditional templates:

  ┌──────────────┬────────────────┬───────────────────────┐
  │{Name}        │ {Phone}        │ {Email}               │
  └──────────────┴────────────────┴───────────────────────┘
  ┌──────────────┬────────────────┬───────────────────────┐
  │{Name}        │ {Phone}        │ {Email:rep_first}     │
  ╵              ╵                ╵                       ╵
  ╷              ╷                ╷                       ╷
  │{Name}        │ {Phone}        │ {Email:rep_mid}       │
  ╵              ╵                ╵                       ╵
  ╷              ╷                ╷                       ╷
  │{Name}        │ {Phone}        │ {Email:rep_last}      │
  └──────────────┴────────────────┴───────────────────────┘
  (extra space between rows just for illustration)

would produce

  ┌──────────────┬────────────────┬───────────────────────┐
  │John Hancock  │ (415) 555-1212 │ j.hancock@gmail.com   │
  │              │ (650) 555-1212 │ j.hancock@hotmail.com │
  │              │ (800) 555-1212 │                       │
  │              │ (900) 888-7777 │                       │
  └──────────────┴────────────────┴───────────────────────┘

The :rep_first modifier indicated that that template without bottom borders should be used for the first row in a multi-value replication set, etc. :rep* conditions must be mutually exclusive.

The first template in a set may be a "regular" template row without conditions, as in the example above. it is used only when none of the conditional templates apply. In the above example that was never because the conditional templates coverd every situation; however the "regular" template would be used if there was only one "replicate", i.e. all {token}s had only a single value.

:rep=EXPR is a general conditional. EXPR is a Perl expression using variables $i and $N, and which evaluates to true when the template should be instantiated. $i will hold the current replicate index (first is zero), and $N the total number of rows in the replication set.

  "Friendly" conditional     Equivalent
       :rep_first              :rep= $i==0 && $N > 1
       :rep_middle             :rep= $i > 0 && $i < $N-1
       :rep_last               :rep= $i==$N-1 && $N > 1
       :rep_only               :rep= $N==1

The above example is not very compelling because the :rmsb modifier provides built-in support for removing shared borders. Here is a more interesting example which formats odd & even replicates distinctively (such as different background colors):

  ┌─────────────────────────────────────────────────────────────┐
  │EVEN (first)   {Token Name:rep_first}                        │
  ╵                                                             ╵
  ╷                                                             ╷
  │EVEN (middle)  {Token Name:rep=$i>0 && $i<$N-1 && ($i%2)==0} │
  ╵                                                             ╵
  ╷                                                             ╷
  │ODD (middle)   {Token Name:rep=$i>0 && $i<$N-1 && ($i%2)==1} │
  ╵                                                             ╵
  ╷                                                             ╷
  │EVEN (last)    {Token Name:rep=$i == $N-1 && ($i % 2)==0}    │
  └─────────────────────────────────────────────────────────────┘
  ╷                                                             ╷
  │ODD (last)     {Token Name:rep=$i == $N-1 && ($i % 2)==1}    │
  └─────────────────────────────────────────────────────────────┘

CALLBACKS

If a hash value is a reference to a sub, the sub is called with args

  ($token_name, $token, $para, $custom_mods)

$token_name is the just the name e.g. "foo" in "{foo:modifiers...}".

$token is the complete "{tokenname...}" string being replaced

$para is the paragraph containing the token.

$custom_mods is a ref to an array of unrecognized :modifier strings (excluding the ':') found in the token. It is up to your code to do what it wants with them. Note: An exception occurs if unrecognized :modifiers are encountered when a callback is not being used.

The callback's return values indicate whether and how to replace the token:

  return(MM_SUBST, <value>)

<value> may be any of the allowed hash values (except for a callback). If a [list of values] is returned and there is actually more than one value, then the containing row will be replicated as described a "Multi-value tokens".

  return(0)

The token is not replaced, but left as-is, and processing continues. This only makes sense if the token will somehow be processed later, for example via a separate call to replace_tokens.

COMPLETE EXAMPLE

A complete example application is included in the distribution. To display the path on your system, run

  perl -MODF::MailMerge=:all -C -E 'say odfmm_example_path'

  # .../site_perl/5.xx.yy/auto/share/dist/ODF-MailMerge/examples/

MISCELLANEOUS

$token_re

A regular expression which matches any {token}. Named captures are set:

  $+{tokname}   # Just the token name, sans leading/trailing spaces or tabs
  $+{mods}      # :mod1:mod2:...

The captured strings will still contain any backslash escapes.

($tokname, $std_mods, $custom_mods) = parse_token("{name:mods...}")

Parses a complete {token}, returning its components with backslash escapes resolved (i.e. the escaping backslash removed). $std_mods and $custom_mods will be refs to arrays containing any :modifiers found without their leading ':'s.

SEE ALSO

ODF::lpOD_Helper

Sreadsheet::Edit

The command-line tool odfedit (which comes with ODF::MailMerge) provides access to many features of ODF::MailMerge without writing Perl code.

AUTHOR

Jim Avera (jim.avera at gmail)

LICENSE

CC0 1.0 / Public Domain. However this requires ODF::lpOD to function so as a practical matter you must comply with ODF::lpOD's license.