NAME

MsOffice::Word::Template - generate Microsoft Word documents from Word templates

SYNOPSIS

  my $template = MsOffice::Word::Template->new($filename);
  my $new_doc  = $template->process(\%data);
  $new_doc->save_as($path_for_new_doc);

DESCRIPTION

Purpose

This module treats a Microsoft Word document as a template for generating other documents. The idea is similar to the "mail merge" functionality in Word, but with much richer possibilities, because the whole power of a Perl templating engine can be exploited, for example for

  • dealing with complex, nested datastructures

  • using control directives for loops, conditionals, subroutines, etc.

Template authors just have to use the highlighting function in MsWord to mark the templating directives :

  • fragments highlighted in yellow are interpreted as data directives, i.e. the template result will be inserted at that point in the document, keeping the current formatting properties (bold, italic, font, etc.).

  • fragments highlighted in green are interpreted as control directives that do not directly generate content, like loops, conditionals, etc. Paragraphs or table rows around such directives are dismissed, in order to avoid empty paragraphs or empty rows in the resulting document.

The syntax of data and control directives depends on the backend templating engine. The default engine is the Perl Template Toolkit; other engines can be specified through parameters to the "new" method -- see the "TEMPLATE ENGINE" section below.

Status

This first release is a proof of concept. Some simple templates have been successfully tried; however it is likely that a number of improvements will have to be made before this system can be used at large scale in production. If you use this module, please keep me informed of your difficulties, tricks, suggestions, etc.

METHODS

new

  my $template = MsOffice::Word::Template->new($docx);
  # or : my $template = MsOffice::Word::Template->new($surgeon);   # an instance of MsOffice::Word::Surgeon
  # or : my $template = MsOffice::Word::Template->new(docx => $docx, %options);

In its simplest form, the constructor takes a single argument which is either a string (path to a docx document), or an instance of MsOffice::Word::Surgeon. Otherwise the constructor takes a list of named parameters, which can be

docx

path to a MsWord document in docx format. This will automatically create an instance of MsOffice::Word::Surgeon and pass it to the constructor through the surgeon keyword.

surgeon

an instance of MsOffice::Word::Surgeon. This is a mandatory parameter, either directly through the surgeon keyword, or indirectly through the docx keyword.

data_color

the Word highlight color for marking data directives (default : yellow)

control_color

the Word highlight color for marking control directives (default : green). Such directives should produce no content. They are treated outside of the regular text flow.

In addition to the attributes above, other attributes can be passed to the constructor for specifying a templating engine different from the default Perl Template Toolkit. These are described in section "TEMPLATE ENGINE" below.

process

  my $new_doc = $template->process(\%data);
  $new_doc->save_as($path_for_new_doc);

Process the template on a given data tree, and return a new document (actually, a new instance of MsOffice::Word::Surgeon). That document can then be saved using "save_as" in MsOffice::Word::Surgeon.

AUTHORING TEMPLATES

A template is just a regular Word document, in which the highlighted fragments represent templating directives.

The data directives, i.e. the "holes" to be filled, must be highlighted in yellow. Such zones must contain the names of variables to fill the holes. If the template engine supports it, names of variables can be paths into a complex datastructure, with dots separating the levels, like foo.3.bar.-1 -- see "GET" in Template::Manual::Directive and Template::Manual::Variables if you are using the Template Toolkit.
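As an illustration of such a path, here is a small sketch in plain Perl. The content of %data is invented for the example; the point is only to show which value a path like foo.3.bar.-1 is expected to reach, written as the equivalent native Perl dereference. It assumes the engine resolves numeric and negative indices as described above.

```perl
use strict;
use warnings;

# Hypothetical data tree, as it could be passed to $template->process(\%data)
my %data = (
  foo => [
    'zero', 'one', 'two',
    { bar => ['first', 'middle', 'last'] },   # element number 3 of foo
  ],
);

# Native Perl equivalent of the template path foo.3.bar.-1 :
# key 'foo', then index 3, then key 'bar', then the last element
my $value = $data{foo}[3]{bar}[-1];
print "$value\n";
```

Within the Word template, the yellow zone would simply contain the path itself; the Perl expression above is only a way to check the data tree before handing it to the template.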

Control directives like IF, FOREACH, etc. must be highlighted in green. When seeing a green zone, the system will remove XML markup for the surrounding text and run nodes. If the directive is the only content of the paragraph, then the paragraph node is also removed. If this occurs within the first cell of a table row, the markup for that row is also removed. This mechanism ensures that the final result will not contain empty paragraphs or empty rows at places corresponding to control directives.

As a consequence of this distinction between yellow and green highlights, templating zones cannot mix data directives with control directives : a data directive within a green zone would generate output outside of the regular XML flow (paragraph nodes, run nodes and text nodes), and therefore MsWord would generate an error when trying to open such content. There is a workaround, however : data directives within a green zone will work if they also generate the appropriate markup for paragraph nodes, run nodes and text nodes; but in that case you must also apply the "none" filter from Template::AutoFilter so that angle brackets in XML markup do not get translated into HTML entities.

TEMPLATE ENGINE

This module invokes a backend templating engine for interpreting the template directives. In order to use an engine different from the default Template Toolkit, you must supply the following parameters to the "new" method :

start_tag

The string for identifying the start of a template directive

end_tag

The string for identifying the end of a template directive

engine

A reference to a method that will perform the templating operation (explained below)

engine_args

An optional list of parameters that may be used by the engine

Given a datatree in $vars, the engine will be called as :

  my $engine  = $self->engine;
  my $new_XML = $self->$engine($vars);

It is up to the engine method to exploit $self->engine_args if needed.

If the engine is called repeatedly, it may need to keep some data between calls, for example a compiled version of the parsed template. To this end, there is an internal hashref attribute called engine_stash. If necessary the stash can be cleared through the clear_stash method.

Here is an example using Template::Mustache :

  use MsOffice::Word::Template;
  use Template::Mustache;

  my $template = MsOffice::Word::Template->new(
    docx      => $template_file,
    start_tag => "{{",
    end_tag   => "}}",
    engine    => sub {
      my ($self, $vars) = @_;

      # at the first invocation, create a Mustache compiled template and store it in the stash.
      # Further invocations will just reuse the object in stash.
      my $stash            = $self->{engine_stash} //= {};
      $stash->{mustache} //= Template::Mustache->new(
        template => $self->{template_text},
        @{$self->engine_args},   # for ex. partials, partial_path, context
                                 # -- see the Template::Mustache documentation
      );

      # generate new XML by invoking the template on $vars
      my $new_XML = $stash->{mustache}->render($vars);

      return $new_XML;
    },
  );

The engine must make sure that ampersand characters and angle brackets are automatically replaced by the corresponding HTML entities (otherwise the resulting XML would be incorrect and could not be opened by Microsoft Word). The Mustache engine does this automatically. The Template Toolkit would normally require an explicit html filter at each directive :

  [% foo.bar | html %]

but thanks to the Template::AutoFilter module, this is performed automatically.
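To make the calling convention and the escaping requirement more concrete, here is a minimal sketch of a hand-written engine with no external dependency. It is only an illustration, not something shipped with this module : the helper xml_escape, the {{name}} placeholder syntax and the $fake_self hashref are assumptions made for the example. The only thing borrowed from the module is the template_text slot, used in the same way as in the Mustache example above.

```perl
use strict;
use warnings;

# Hypothetical helper : escape the characters that would break the XML
sub xml_escape {
  my $text = shift // '';
  $text =~ s/&/&amp;/g;    # ampersands first, to avoid double escaping
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  return $text;
}

# Hypothetical engine : replaces each {{name}} by the escaped value of $vars->{name}
my $engine = sub {
  my ($self, $vars) = @_;
  my $xml = $self->{template_text};
  $xml =~ s/\{\{\s*(\w+)\s*\}\}/xml_escape($vars->{$1})/ge;
  return $xml;
};

# Stand-in for the template object, only for trying the engine in isolation
my $fake_self = { template_text => '<w:t>{{company}}</w:t>' };
my $new_XML   = $engine->($fake_self, { company => 'Smith & Sons <Ltd>' });
print "$new_XML\n";
```

In real use, such a coderef would be passed as the engine parameter to "new", together with start_tag => "{{" and end_tag => "}}". A production engine would probably also want to cache its work in the stash, as the Mustache example does.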

AUTHORING NOTES SPECIFIC TO THE TEMPLATE TOOLKIT

This chapter just gives a few hints for authoring Word templates with the Template Toolkit.

The examples below use [[double square brackets]] to indicate segments that should be highlighted in green within the Word template.

Bookmarks

The template processor is instantiated with a predefined wrapper named bookmark for generating Word bookmarks. Here is an example:

  Here is a paragraph with [[WRAPPER bookmark name="my_bookmark"]]bookmarked text[[END]].

The name argument is automatically truncated to 40 characters, and non-alphanumeric characters are replaced by underscores, in order to comply with the limitations imposed by Word for bookmark names.

Similarly, there is a predefined wrapper named link_to_bookmark for generating hyperlinks to bookmarks. Here is an example:

  Click [[WRAPPER link_to_bookmark name="my_bookmark" tooltip="tip top"]]here[[END]].

The tooltip argument is optional.
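The normalisation of bookmark names described above can be approximated in a few lines of Perl. This is a sketch under assumptions, not the module's actual code : the function name sanitize_bookmark_name is invented, and both the exact character class and the order of the two steps (truncate first, then replace) are guesses based on the description.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the documented constraints on bookmark names
sub sanitize_bookmark_name {
  my ($name) = @_;
  $name = substr($name, 0, 40);     # keep at most 40 characters
  $name =~ s/[^A-Za-z0-9]/_/g;      # anything else than a letter or digit becomes "_"
  return $name;
}

print sanitize_bookmark_name("Section 3.1: results & discussion"), "\n";
```

Since the same transformation is applied to the name argument of both wrappers, a link and its target bookmark should still match as long as they are given the same original name.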

Word fields

A predefined block named field generates XML markup for Word fields, for example :

  Today is [[PROCESS field code="DATE \\@ \"h:mm am/pm, dddd, MMMM d\""]]

Beware that quotes or backslashes must be escaped so that the Template Toolkit parser does not interpret these characters.

The list of Word field codes is documented at https://support.microsoft.com/en-us/office/list-of-field-codes-in-word-1ad6d91a-55a7-4a8d-b535-cf7888659a51.

When used as a wrapper, the field block generates a Word field with alternative text content, displayed before the field gets updated. For example :

  [[WRAPPER field code="TOC \o \"1-3\" \h \z \u"]]Table of contents – press F9 to update[[END]]

TROUBLESHOOTING

If the document generated by this module cannot be opened in Word, it is probably because the XML generated by your template is not balanced and therefore not valid. For example a template like this :

  This paragraph [[ IF condition ]]
     may have problems
  [[END]]

is likely to generate incorrect XML, because the IF statement starts in the middle of a paragraph and closes in a different paragraph -- therefore when the condition evaluates to false, the XML tag for closing the initial paragraph will be missing.

Compound directives like IF .. END, FOREACH .. END, TRY .. CATCH .. END should therefore be balanced, either all within the same paragraph, or each directive in a separate paragraph. Examples like this should be successful :

  This paragraph [[ IF condition ]]has an optional part[[ ELSE ]]or an alternative[[ END ]].
  
  [[ SWITCH result ]]
  [[ CASE 123 ]]
     Not a big deal.
  [[ CASE 789 ]]
     You won the lottery.
  [[ END ]]

AUTHOR

Laurent Dami, <dami AT cpan DOT org>

COPYRIGHT AND LICENSE

Copyright 2020-2022 by Laurent Dami.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.