The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

TUWF::XML - Easy XML and HTML generation with Perl

DESCRIPTION

This module provides an easy and simple method for generating XML (and with that, HTML) documents in Perl. Unlike most other TUWF modules, this one can be used separately, outside of the TUWF framework.

The goal of this module is to make HTML generation easier, it is certainly not a goal to abstract HTML generation behind generalized functions and objects. Nor is it a goal to ensure the correctness of the generated HTML, that remains the responsibility of the programmer (although this module can certainly help). You will still be writing HTML yourself, the only difference is that you use a more convenient syntax and you won't have to manually escape everything you output.

The primary aim of this module was to generate XHTML and HTML5, and since both can be expressed in proper XML, extending it to write XML was a small step. In fact, this module is basically an XML generator with convenience functions for HTML.

This module provides two interfaces: a function interface and an object interface. Both can be used, even at the same time. The object interface is required in threaded environments or when you want to generate multiple documents simultaneously, while the function interface is far more convenient, but has some limitations and contributes to namespace pollution.

The function interface looks like this:

  use TUWF::XML ':html5';
  
  TUWF::XML->new(default => 1);
  html sub {
    head sub {
      title 'Document title!';
    };
  };
  
  # -- or, in more imperative style:
  html;
    head;
      title 'Document title!';
    end;
  end 'html';

And the equivalent, using the object interface:

  # not required when used within a TUWF website
  use TUWF::XML;
  
  my $xml = TUWF::XML->new();
  $xml->html(sub {
    $xml->head(sub {
      $xml->title('Document title!');
    });
  });
  
  # -- or, again in more imperative style:
  $xml->html;
    $xml->head;
      $xml->title('Document title!');
    $xml->end;
  $xml->end('html');

You may also combine the two interfaces by setting the default option in new() and mixing method calls and function calls, but that is rather inconsistent and messy.

TUWF automatically calls TUWF::XML->new(default => 1, ..) at the start of each request, so you can start generating XML or HTML using the function interface without having to initialize this module. Of course, if you wish to generate an other XML document while processing a request, you should use the object interface for that, otherwise this may cause problems with other functions within the TUWF framework that assume that the default TUWF::XML object has been set to output to TUWF.

FUNCTION-ONLY FUNCTIONS

new(options)

Creates a new XML generator object, accepts the following options:

default

0/1. When set to a true value, the newly created object will be used as the default object: Any regular function call (that is, without an object) to any of the functions listed in METHODS & FUNCTIONS will act as if they were called on this object. Until a new object is created with the default option set, in which case the default object will be overwritten again. Default: 0.

write

Should contain a subroutine reference that accepts a string as argument. This subroutine will be called whenever there is data to output. If this option is not specified, a default function that writes to stdout is used.

pretty

Set to a positive integer to pretty-print the generated XML, set to 0 to disable pretty-printing. The integer indicates the number of spaces to use for each new level of indentation. It is recommended to have pretty-printing disabled when generating HTML, since white-space around HTML elements tends to have significance when being rendered, and with pretty-printing you will lose the control on where to (not) insert whitespace. Default: 0 (disabled).

mkclass(%classes)

Dynamically constructs a class attribute, which can be passed to tag() and friends. This function accepts a hash where the keys are the class names and the value indicates whether the class is enabled or not. This function returns an empty list if none of the classes are enabled, or returns (class => $list_of_enabled_classes).

This is convenient when the classes are dependant on other variables, e.g.:

  tag 'div', mkclass(hidden => $is_hidden, warning => $is_warning), 'Text';
  # Output:
  #  !$is_hidden && !$is_warning:  <div>Text</div>
  #  $is_hidden && !$is_warning:   <div class="hidden">Text</div>
  #  $is_hidden && $is_warning:    <div class="hidden warning">Text</div>

Note that the order in which classes are returned may be somewhat random.

xml_escape(string)

Returns the XML-escaped string. The characters &, <, and " will be replaced with their XML entity.

html_escape(string)

Does the same as xml_escape(), but also replaced newlines with <br /> tags.

METHODS & FUNCTIONS

lit(string)

Output the given string literally, without modification or escaping. This is equivalent to just calling the write subroutine passed to new().

txt(string)

XML-escape the string and then output it, equivalent to lit(xml_escape $string).

xml()

Writes the following XML header:

  <?xml version="1.0" encoding="UTF-8"?>

Since this function does not open a tag, it does not have to be end()'ed.

html(options)

Writes an (X)HTML doctype and opens an <html> tag. Accepts the following options:

doctype

Specify the doctype to use. Can be one of the following:

  xhtml1-strict xhtml1-transitional xhtml1-frameset
  xhtml11 xhtml-basic11 xhtml-math-svg html5

These refer to the doctypes found at http://www.w3.org/QA/2002/04/valid-dtd-list.html. Default: html5.

lang

Specifies the (human) language of the generated content. This will generate a lang (and xml:lang for XHTML) attribute for the html open tag.

anything else

All other arguments are passed to tag().

If you don't pass a contents argument to this function, you should take care to close the <html> tag with an end().

tag(name, attribute => value, .., contents)

Generates an XML tag or element. The first argument is the name of the tag, attributes can be specified after that with key/value pairs and finally the contents can be specified. If the contents argument is not present, an open tag will be generated, which should be closed later on using end(). If contents is present but undef, the generated tag will be self-closing, i.e. it will end with a /> instead of a regular >. If contents is a scalar, it will be used as the contents of the tag, after which the tag will be closed with a closing tag (</tagname>). If contents is a CODE reference, the subroutine will be called in between the start tag and the closing tag.

The tag name and attribute names are outputted as-is, after some very basic validation. The attribute values and contents are passed through xml_escape().

Some example function calls and their output:

  tag('items');
  # <items>
  end();
  # </items>
  
  tag('link', href => '/', undef);
  # <link href="/" />
  
  tag('a', href => '/?f&c', title => 'Homepage', 'link');
  # <a href="/f&amp;c" title="Homepage">link</a>
  
  tag('summary', type => 'html', 'I can write in <b>bold</b>');
  # <summary type="html">I can write in &lt;b&gt;bold&lt;/b&gt;</summary>
  
  tag qw{content type xhtml xml:base http://example.com/ xml:lang en}, $content;
  # is equivalent to:
  lit '<content type="xhtml" xml:base="http://example.com/" xml:lang="en">';
  txt $content;
  lit '</content>';
  # except tag() can do pretty-printing when requested

  tag 'div', sub {
    tag 'a', href => '/', 'Home';
  };
  # <div><a href="/">Home</a></div>

end(name)

Closes the last tag opened by tag() or html(). The name argument is optional, when given, it will be used as validation. If the given name does not equal the last opened tag, an error is thrown.

Usage of this function is discouraged, as it may not be easy keep track of which end() belongs to which tag(). An easier and more functional approach is to not use end() at all, and instead give a CODE reference to tag(). For example:

  tag 'body';
    tag 'b', 'text';
  end;

Is more safely written as:

  tag 'body', sub {
    tag 'b', 'text';
  };

<html-tag>(attribute => value, .., contents)

For convenience, all HTML5 and XHTML 1.0 tags have their own function that acts as a shorthand for calling tag(). For the function naming flavors, see "IMPORT OPTIONS" below.

Some tags are boolean, meaning that they should always be self-closing and not have any contents. To generate these tags with tag(), you have to specify undef as the contents argument. This is not required when using these convenience functions, the undef argument is automatically added for the following tags:

  area base br col command embed hr img input link meta param source

Again, some examples:

  br;  # tag 'br', undef;
  div; # tag 'div';

  title 'Page title';
  # tag 'title', 'Page title';

  Link rel => 'shortcut icon', href => '/favicon.ico';
  # tag 'link', rel => 'shortcut icon', href => '/favicon.ico', undef;

  textarea rows => 10, cols => 50, $content;
  # tag 'textarea', rows => 10, cols => 50, $content;

IMPORT OPTIONS

By default, TUWF::XML does not export anything. You can import any specific function (except new()) by specifying it on the use line:

  use TUWF::XML 'lit', 'html_escape', 'br';

  # after which you can call those functions as follows:
  lit html_escape $content;
  br;

Or you can import an entire group of functions by adding the :xml group or any of the HTML flavors to the list:

:xml

This group exports the functions xml(), lit(), txt(), tag(), and end(). All lower-case.

:html

This group exports the following functions:

  tag html lit txt end

And the following XHTML 1.0 functions:

  a abbr acronym address area b base bdo big blockquote body br button caption
  cite code col colgroup dd del dfn div dl dt em fieldset form h1 h2 h3 h4 h5
  h6 head i img input ins kbd label legend li Link Map meta noscript object ol
  optgroup option p param pre Q samp script Select small span strong style Sub
  sup table tbody td textarea tfoot th thead title Tr tt ul var

Note that some functions start with an upper-case character. This is to avoid problems with reserved keywords or overriding Perl core functions with the same name.

:html5

Same as :html, except that instead of XHTML 1.0, this exports all HTML 5 functions:

  a abbr address area article aside audio b base bb bdo blockquote body br
  button canvas caption cite code col colgroup command datagrid datalist dd
  del details dfn dialog div dl dt em embed fieldset figure footer form h1 h2
  h3 h4 h5 h6 head header hr i iframe img input ins kbd label legend li Link
  main Map mark meta meter nav noscript object ol optgroup option output p
  param pre progress Q rp rt ruby samp script section Select small source
  span strong style Sub sup table tbody td textarea tfoot th thead Time title
  Tr ul var video
:Html and :Html5

These are equivalent to :html and html5, respectively, except that all functions start with an upper case character for consistency. This flavor looks like:

  use TUWF:XML ':Html5';
  Html sub {
    Head sub {
      Title 'Document title!';
      Tag 'a', href => '/', 'Home';
      Lit '&nbsp;';
    };
  };
:html_ and :html5_

These are equivalent to :html and html5, respectively, except that all functions are lower-case and are suffixed with an underscore. This flavor is similar to Haskell's Lucid:

  use TUWF::XML ':html5_';
  html_ sub {
    head_ sub {
      title_ 'Document title!';
      tag_ 'a', href => '/', 'Home';
      lit_ '&nbsp;';
    };
  };

When using this module in a TUWF website, you can substitute TUWF::XML with TUWF. The main TUWF module will then redirect its import arguments to this module. This saves some typing, and allows you to import functions from other TUWF modules on the same use line.

SEE ALSO

TUWF.

This module was inspired by XML::Writer, which is more powerful but less convenient.

There's also DSL::HTML, a slightly more featureful, heavyweight and opinionated HTML-templating-inside-Perl module, based on HTML::Tree.

And there's HTML::Declare, which is conceptually simpler than both this and DSL::HTML, but its syntax isn't quite as nice.

And there's also HTML::FromArrayref, HTML::Tiny, HTML::Untidy and many more modules on CPAN. In fact I don't know why you should use this module instead of whatever is available on CPAN.

COPYRIGHT

Copyright (c) 2008-2019 Yoran Heling.

This module is part of the TUWF framework and is free software available under the liberal MIT license. See the COPYING file in the TUWF distribution for the details.

AUTHOR

Yoran Heling <projects@yorhel.nl>