NAME

XML::Handler::HTMLWriter - SAX Handler for writing HTML 4.0

SYNOPSIS

  use XML::Handler::HTMLWriter;
  # Constructor options are the same as for XML::Handler::YAWriter.
  my $writer = XML::Handler::HTMLWriter->new();

DESCRIPTION

This module writes HTML following the rules for the html output method in the XSLT specification (http://www.w3.org/TR/xslt). It builds on the concepts in XML::Handler::YAWriter and is used in the same way as that module.

HTML Output Method

Here is the relevant excerpt from TR/xslt. Reading it takes a bit of an understanding of XSLT, but don't worry: that understanding isn't necessary to use this module :-)

The html output method should not output an element differently from the xml output method unless the expanded-name of the element has a null namespace URI; an element whose expanded-name has a non-null namespace URI should be output as XML. If the expanded-name of the element has a null namespace URI, but the local part of the expanded-name is not recognized as the name of an HTML element, the element should output in the same way as a non-empty, inline element such as span.

The html output method should not output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in the stylesheet should be output as <br>.

The html output method should recognize the names of HTML elements regardless of case. For example, elements named br, BR or Br should all be recognized as the HTML br element and output without an end-tag.
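
As a rough illustration of the two rules above (a sketch, not this module's actual code), a writer could keep the HTML 4.0 empty-element names in a hash and look names up in lowercase. The helper names is_empty_element and render_element are hypothetical, and attributes are ignored to keep the example short.

```perl
use strict;
use warnings;

# HTML 4.0 empty elements, as listed in the excerpt above.
my %EMPTY = map { $_ => 1 } qw(
    area base basefont br col frame hr img input isindex link meta param
);

# Hypothetical helper: true if $name is an HTML empty element, in any case.
sub is_empty_element {
    my ($name) = @_;
    return exists $EMPTY{ lc($name) } ? 1 : 0;
}

# Hypothetical helper: serialize an element without attributes.
sub render_element {
    my ($name, $content) = @_;
    $content = '' unless defined $content;
    return '<' . $name . '>' if is_empty_element($name);    # no end-tag
    return '<' . $name . '>' . $content . '</' . $name . '>';
}

print render_element('BR'), "\n";              # <BR>
print render_element('span', 'text'), "\n";    # <span>text</span>
```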

The html output method should not perform escaping for the content of the script and style elements. For example, a literal result element written in the stylesheet as

  <script>if (a &lt; b) foo()</script>

or

  <script><![CDATA[if (a < b) foo()]]></script>

should be output as

  <script>if (a < b) foo()</script>
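
A minimal sketch of this rule, assuming the writer knows the name of the element that encloses the text. By the time a SAX handler sees the content, both source forms above arrive as the same character data, so only the escaping decision remains. The helper name render_text is hypothetical and does not come from this module.

```perl
use strict;
use warnings;

# Elements whose text content is written without escaping.
my %RAW_TEXT = map { $_ => 1 } qw(script style);

# Hypothetical helper: escape character data unless the parent is script/style.
sub render_text {
    my ($parent, $text) = @_;
    return $text if $RAW_TEXT{ lc($parent) };
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print render_text('script', 'if (a < b) foo()'), "\n";    # if (a < b) foo()
print render_text('p',      'a < b'),            "\n";    # a &lt; b
```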

The html output method should not escape < characters occurring in attribute values.

If the indent attribute has the value yes, then the html output method may add or remove whitespace as it outputs the result tree, so long as it does not change how an HTML user agent would render the output. The default value is yes.

The html output method should escape non-ASCII characters in URI attribute values using the method recommended in Section B.2.1 of the HTML 4.0 Recommendation.
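
The method recommended in Section B.2.1 is to represent each non-ASCII character as UTF-8 and write every resulting byte as %HH. A minimal sketch using the core Encode module follows; the helper names percent_encode and escape_uri_value are hypothetical, and deciding which attributes hold URIs is left out.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 encode one character and write each byte as %HH.
sub percent_encode {
    my ($char) = @_;
    my $bytes = encode('UTF-8', $char);
    return join '', map { sprintf '%%%02X', ord($_) } split //, $bytes;
}

# Hypothetical helper: escape only the non-ASCII characters of a URI value.
sub escape_uri_value {
    my ($uri) = @_;
    $uri =~ s/([^\x00-\x7F])/percent_encode($1)/ge;
    return $uri;
}

print escape_uri_value("caf\x{E9}.html"), "\n";    # caf%C3%A9.html
```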

The html output method may output a character using a character entity reference, if one is defined for it in the version of HTML that the output method is using.

The html output method should terminate processing instructions with > rather than ?>.
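
For illustration only, a processing instruction writer following this rule might look like the sketch below. The helper name render_pi is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: write a processing instruction the HTML way,
# closing it with ">" instead of "?>".
sub render_pi {
    my ($target, $data) = @_;
    my $pi = '<?' . $target;
    $pi .= ' ' . $data if defined($data) && length($data);
    return $pi . '>';
}

print render_pi('php', 'echo 1;'), "\n";    # <?php echo 1;>
```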

The html output method should output boolean attributes (that is attributes with only a single allowed value that is equal to the name of the attribute) in minimized form. For example, a start-tag written in the stylesheet as

  <OPTION selected="selected">

should be output as

  <OPTION selected>
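
A sketch of how minimization could be decided, assuming the boolean attribute names of the HTML 4 DTD (the list below is illustrative and worth checking against the DTD). The helper name render_attribute is hypothetical, and value escaping is omitted for brevity.

```perl
use strict;
use warnings;

# Boolean attributes of HTML 4 (assumed list, for illustration).
my %BOOLEAN = map { $_ => 1 } qw(
    checked compact declare defer disabled ismap multiple
    nohref noresize noshade nowrap readonly selected
);

# Hypothetical helper: minimize name="name" for boolean attributes.
sub render_attribute {
    my ($name, $value) = @_;
    return $name if $BOOLEAN{ lc($name) } && lc($value) eq lc($name);
    return $name . '="' . $value . '"';
}

print '<OPTION ', render_attribute('selected', 'selected'), ">\n";    # <OPTION selected>
print '<OPTION ', render_attribute('value', 'x'), ">\n";              # <OPTION value="x">
```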

The html output method should not escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML 4.0 Recommendation). For example, a start-tag written in the stylesheet as

  <BODY bgcolor='&amp;{{randomrbg}};'>

should be output as

  <BODY bgcolor='&{randomrbg};'>
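
The attribute rules in this excerpt (leave < alone, leave & alone when it is followed by {) can be sketched with a negative lookahead. The helper name escape_attr_value is hypothetical; it assumes the value will be written inside double quotes and receives the already-parsed value, in which the stylesheet's {{ has collapsed to {.

```perl
use strict;
use warnings;

# Hypothetical helper: escape an attribute value for HTML output.
sub escape_attr_value {
    my ($value) = @_;
    $value =~ s/&(?!\{)/&amp;/g;    # escape "&" unless it starts "&{"
    $value =~ s/"/&quot;/g;         # the value is assumed to be double-quoted
    return $value;                  # "<" is deliberately left untouched
}

print escape_attr_value('&{randomrbg};'), "\n";    # &{randomrbg};
print escape_attr_value('a & b < c'),     "\n";    # a &amp; b < c
```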

The encoding attribute specifies the preferred encoding to be used. If there is a HEAD element, then the html output method should add a META element immediately after the start-tag of the HEAD element specifying the character encoding actually used. For example,

  <HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
  ...

It is possible that the result tree will contain a character that cannot be represented in the encoding that the XSLT processor is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character should be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the XSLT processor should signal an error.
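
As a sketch of the fallback described above, assume the output encoding is US-ASCII, so every character above U+007F is unrepresentable and is written as a decimal numeric character reference. The helper name escape_for_ascii is hypothetical; a real writer would consult the actual output encoding and would raise an error inside script, style, and comments instead.

```perl
use strict;
use warnings;

# Hypothetical helper: replace characters that US-ASCII cannot represent
# with decimal numeric character references.
sub escape_for_ascii {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

print escape_for_ascii("caf\x{E9}"), "\n";    # caf&#233;
```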

If the doctype-public or doctype-system attributes are specified, then the html output method should output a document type declaration immediately before the first element. The name following <!DOCTYPE should be HTML or html. If the doctype-public attribute is specified, then the output method should output PUBLIC followed by the specified public identifier; if the doctype-system attribute is also specified, it should also output the specified system identifier following the public identifier. If the doctype-system attribute is specified but the doctype-public attribute is not specified, then the output method should output SYSTEM followed by the specified system identifier.
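
The branching in this paragraph can be sketched as follows. The helper name render_doctype and the option names doctype_public and doctype_system are hypothetical and are not taken from this module's interface.

```perl
use strict;
use warnings;

# Hypothetical helper: build the document type declaration, or return an
# empty string when neither identifier is given.
sub render_doctype {
    my (%opt) = @_;
    my ($public, $system) = @opt{qw(doctype_public doctype_system)};
    return '' unless defined($public) || defined($system);

    my $decl = '<!DOCTYPE html';
    if (defined $public) {
        $decl .= ' PUBLIC "' . $public . '"';
        $decl .= ' "' . $system . '"' if defined $system;
    }
    else {
        $decl .= ' SYSTEM "' . $system . '"';
    }
    return $decl . '>';
}

print render_doctype(
    doctype_public => '-//W3C//DTD HTML 4.0//EN',
    doctype_system => 'http://www.w3.org/TR/REC-html40/strict.dtd',
), "\n";
```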

The media-type attribute is applicable for the html output method. The default value is text/html.

SAX1 or SAX2?

This module is designed to work with either SAX1 or SAX2. It implements a transparent layer that accepts events from either API.
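
To give an idea of what such a layer has to reconcile, here is a sketch (not necessarily how this module does it) that flattens the attributes of a start_element event. It assumes the usual event shapes: in SAX1 the Attributes hash maps names to plain strings, while in SAX2 each entry is itself a hash with Name and Value keys. The helper name normalize_attributes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a plain name => value hash reference for
# either a SAX1 or a SAX2 start_element event.
sub normalize_attributes {
    my ($element) = @_;
    my $attrs = $element->{Attributes} || {};
    my %flat;
    for my $key (keys %$attrs) {
        my $attr = $attrs->{$key};
        if (ref($attr) eq 'HASH') {    # SAX2: entry is an attribute hash
            $flat{ $attr->{Name} } = $attr->{Value};
        }
        else {                         # SAX1: entry is the value itself
            $flat{$key} = $attr;
        }
    }
    return \%flat;
}

my $sax1_event = { Name => 'a', Attributes => { href => 'x.html' } };
my $sax2_event = {
    Name       => 'a',
    Attributes => {
        '{}href' => { Name => 'href', LocalName => 'href', Value => 'x.html' },
    },
};

print normalize_attributes($sax1_event)->{href}, "\n";    # x.html
print normalize_attributes($sax2_event)->{href}, "\n";    # x.html
```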

AUTHOR

Matt Sergeant, matt@sergeant.org

SEE ALSO

XML::Handler::YAWriter, XML::Parser::PerlSAX.