The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Role::Markup::XML - Moo(se) role for bolt-on lazy XML markup

VERSION

Version 0.07

SYNOPSIS

    package My::MarkupEnabled;

    use Moo;                  # or Moose, or something compatible
    with 'Role::Markup::XML'; # ...and this of course

    # write some other code...

    sub something_useful {
        my $self = shift;

        # put your XML-generating data structure here
        my %spec = (
            -name      => 'my:foo',              # element name
            -content   => { -name => 'my:bar' }, # element content
            hurr       => 'durr',                # attribute
            'my:derp'  => 'lulz',                # namespaced attribute
            'xmlns:my' => 'urn:x-bogus:foo',     # namespaces go inline
        );

        # create a document object to hang on to
        my $doc  = $self->_DOC;

        # returns the last node generated, which is <my:bar/>
        my $stub = $self->_XML(
            doc  => $doc,
            spec => \%spec,
        );

        my @contents = (
            # imagine a bunch of things in here
        );

        # since these nodes will be appended to $stub, we aren't
        # interested in the output this time
        $self->_XML(
            parent => $stub,          # owner document is derived
            spec   => \@contents,     # also accepts ARRAY refs
            args   => $self->cb_args, # some useful state data
        );

        # the rest of the ops come from XML::LibXML
        return $doc->toString(1);
    }

DESCRIPTION

This is indeed yet another module for lazy XML markup generation. It exists because it is different:

  • It converses primarily in reusable, inspectable, and most importantly, inert Perl data structures,

  • It also ingests existing XML::LibXML nodes,

  • It enables you to generate markup incrementally, rather than all at once,

  • It Does the Right Thing™ around a bunch of otherwise tedious boilerplate operations, such as namespaces, XHTML, or flattening token lists in attributes,

  • It has a callback infrastructure to help you create modular templates, or otherwise override behaviour you don't like,

  • It is implemented as a Role, to be more conducive to modern Perl development.

I began by using XML::LibXML::LazyBuilder. It is pretty good, definitely preferable to typing out reams of XML::LibXML DOM-like API any time I wanted to make some (guaranteed well-formed) XML. I even submitted a patch to it to make it better. Nevertheless, I have reservations about the general approach to terse markup-generating libraries, in particular about the profligate use of anonymous subroutines. (You also see this in lxml.etree for Python, Builder::XmlMarkup for Ruby, etc.)

The main issue is that these languages aren't Lisp: it costs something at runtime to gin up a stack of nested anonymous subroutines, run them once, and then immediately throw them away. It likewise costs in legibility to have to write a bunch of imperative code to do what is essentially data declaration. It also costs in sanity to have to write function-generating-function-generating functions just to get the mess under control. What you get for your trouble is an interim product that is impossible to inspect or manipulate. This ostensibly time-saving pattern quickly hits a wall in both development, and at runtime.

The answer? Use (in this case) Perl's elementary data structures to convey the requisite information: data structures which can be built up from bits and pieces, referenced multiple times, sliced, diced, spliced, frozen, thawed, inspected, and otherwise operated on by ordinary Perl routines. Provide mix-and-match capability with vanilla XML::LibXML, callbacks, and make the whole thing an unobtrusive mix-in that you can bolt onto your existing code.

METHODS

Methods in this module are named such as to stay out of the way of your module's interface.

_DOC [$VERSION,] [$ENCODING]

Generate a document node. Shorthand for "new" in XML::LibXML::Document.

_ELEM $TAG [, $DOC, \%NSMAP ]

Generate a single XML element. Generates a new document unless $DOC is specified. Defaults to XHTML if no namespace map is provided.

_XPC [ %NS | \%NS ]

Return an XPath context with the given (optional) namespaces registered.The XHTML namespace is registered by default with the prefix html. This context will persist if called from an instance.

_XML $SPEC [, $PARENT, $DOC, \@ARGS | @ARGS ] | %PARAMS

Generate an XML tree according to the specification format. Returns the last node generated by the process. Parameters are as follows:

spec

The node specification. Strictly speaking this is optional, but there isn't much of a point of running this method if there is no spec to run it over.

doc

The XML::LibXML::Document object intended to own the contents. Optional, however it is often desirable to supply a document object along with the initial call to this method, so as not to have to fish it out later.

parent

The XML::LibXML::Element (or, redundantly, Document) object which is intended to be the parent node of the spec. Optional; defaults to the document.

replace

Suppose we're doing surgery to an existing XML document. Instead of supplying a "parent", we can supply a node in said document which we want to replace. Note that this parameter is incompatible with "parent", is meaningless for some node types (e.g. -doctype), and may fail in some contexts (e.g. when the node to be replaced is the document).

before, after

Why stop at replacing nodes? Sometimes we need to snuggle a new set of nodes up to one side or the other of a sibling at the same level. Will fail if the sibling node has no parent. Will also fail if you do things like try to add a second root element. Optional of course. Once again, all these parameters, "parent", "replace", before and after, are mutually conflicting.

args

An ARRAY reference of arguments to be passed into CODE references embedded in the spec. Optional.

Specification Format

The building blocks of the spec are, unsurprisingly, HASH and ARRAY references. The former correspond to elements and other things, while the latter correspond to lists thereof. Literals become text nodes, and blessed objects will be treated like strings, so it helps if they have a string overload. CODE references may be used just about anywhere, and will be dereferenced recursively using the supplied "args" until there is nothing left to dereference. It is up to you to keep these data structures free of cycles.

Elements

Special keys designate the name and content of an element spec. These are, unimaginitively, -name and -content. They work like so:

    { -name => 'body', -content => 'hurr' }

    # produces <body>hurr</body>

Note that -content can take any primitive: literal, HASH, ARRAY or CODE reference, XML::LibXML::Node object, etc.

Attributes

Any key is not -name or -content will be interpreted as an attribute.

    { -name => 'body', -content => 'hurr', class => 'lolwut' }

    # produces <body class="lolwut">hurr</body>

When references are values of attributes, they are flattened into strings:

    { -name => 'body', -content => 'hurr', class => [qw(one two three)] }

    # produces <body class="one two three">hurr</body>
Namespaces

If there is a colon in either the -name key value or any of the attribute keys, the processor will expect a namespace that corresponds to that prefix. These are specified exactly as one would with ordinary XML, with the use of an xmlns:foo attribute>. (Prefix-free xmlns attributes likewise work as expected.)

    { -name => 'svg',
      xmlns => 'http://www.w3.org/2000/svg',
      'xmlns:xlink' => 'http://www.w3.org/1999/xlink',
      -content => [
          { -name => 'a', 'xlink:href' => 'http://some.host/' },
      ],
    }

    # produces:
    # <svg xmlns="http://www.w3.org/2000/svg"
    #      xmlns:xlink="http://www.w3.org/1999/xlink">
    #   <a xlink:href="http://some.host/"/>
    # </svg>
Other Nodes
-pi

Processing instructions are designated by the special key -pi and accept arbitrary pseudo-attributes:

    { -pi => 'xml-stylesheet', type => 'text/xsl', href => '/my.xsl' }

    # produces <?xml-stylesheet type="text/xsl" href="/my.xsl"?>
-doctype

Document type declarations are designated by the special key -doctype and accept values for the keys public and system:

    { -doctype => 'html' }

    # produces <!DOCTYPE html>
-comment

Comments are designated by the special key -comment and whatever is in the value of that key:

    { -comment => 'hey you guyyyys' }

    # produces <!-- hey you guyyyys -->
Callbacks

Just about any part of a markup spec can be replaced by a CODE reference, which can return any single value, including another CODE reference. These are called in the context of $self, i.e., as if they were a method of the object that does the role. The "args" in the original method call form the subsequent input:

    sub callback {
        my ($self, @args) = @_;

        my %node = (-name => 'section', id => $self->generate_id);

        # ...do things to %node, presumably involving @args...

        return \%node;
    }

    sub make_xml {
        my $self = shift;

        my $doc = $self->_DOC;
        $self->_XML(
            doc  => $doc,
            spec => { -name => 'p', -content => \&callback },
        );

       return $doc;
    }

CODE references can appear in attribute values as well.

_XHTML | %PARAMS

Generate an XHTML+RDFa stub. Return the <body> and the document when called in list context, otherwise return just the <body> in scalar context (which can be used in subsequent calls to "_XML").

  my ($body, $doc) = $self->_XHTML(%p);

  # or

  my $body = $self->_XHTML(%p);

Parameters

uri

The href attribute of the <base> element.

ns

A mapping of namespace prefixes to URIs, which by default will appear as both XML namespaces and the prefix attribute.

prefix

Also a mapping of prefixes to URIs. If this is set rather than ns, then the XML namespaces will not be set. Conversely, if this parameter is defined but false, then only the contents of ns will appear in the conventional xmlns:foo way.

vocab

This will specify a default vocab attribute in the <html> element, like http://www.w3.org/1999/xhtml/vocab/.

title

This can either be a literal title string, or CODE reference, or HASH reference assumed to encompass the whole <title> element, or an ARRAY reference where the first element is the title and subsequent elements are predicates.

This can either be an ARRAY reference of ordinary markup specs, or a HASH reference where the keys are the rel attribute and the values are one or more (via ARRAY ref) URIs. In the latter form the following behaviour holds:

  • Predicates are grouped by href, folded, and sorted alphabetically.

  • <link> elements are sorted first lexically by the sorted rel, then by sorted rev, then by href.

  • A special empty "" hash key can be used to pass in another similar structure whose keys represent rev, or reverse predicates.

  • A special -about key can be used to specify another HASH reference where the keys are subjects and the values are similar structures to the one described.

  {
    # ordinary links
    'rel:prop' => [qw(urn:x-target:1 urn:x-target:2)],

    # special case for reverse links
    '' => { 'rev:prop' => 'urn:x-demo-subject:id' },

    # special case for alternate subject
    -about => {
      'urn:x-demo-subject:id' => { 'some:property' => 'urn:x-target' } },
  }

The ARRAY reference form is passed along as-is.

meta

Behaves similarly to the link parameter, with the following exceptions:

  • No "" or -about pseudo-keys, as they are meaningless for literals.

  • Literal values can be expressed as an ARRAY reference of the form [$val, $lang, $type] with either the second or third element undef. They may also be represented as a HASH reference where the keys are the language (denoted by a leading @) or datatype (everything else), and the values are the literal values.

  {
    'prop:id' => ['foo', [2.3, undef, 'xsd:decimal']],
    'exotic'  => { '@en' => ['yo dawg', 'derp'] }
  }

This is an optional ARRAY reference of <head> elements that are neither <link> nor <meta> (or, if you want, additional unmolested <link> and <meta> elements).

attr

These attributes (including -content) will be passed into the <body> element.

content

This parameter enables us to isolate the <body> content without additional attributes.

Note that setting this parameter will cause the method to return the innermost, last node that is specified, rather than the <body>.

transform

This is the URI of a (e.g. XSLT) transform which will be included in a processing instruction if supplied.

args

Same as args in "_XML".

AUTHOR

Dorian Taylor, <dorian at cpan.org>

BUGS

Please report any bugs or feature requests to bug-role-markup-xml at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Role-Markup-XML. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Role::Markup::XML

You can also look for information at:

SEE ALSO

LICENSE AND COPYRIGHT

Copyright 2016 Dorian Taylor.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.