NAME

XML::YYLex - Perl module for using perl-byacc with XML-data

SYNOPSIS

  use XML::YYLex;

  ## create an object of a subclass of XML::YYLex suitable for your
  ## DOM-parser:

  my $parser = XML::YYLex::create_object(
	document => $xmldom_or_sablotron_dom_object,
	debug => 0,              # or 1
	ignore_empty_text => 1,  # probably what you would expect
	yydebug => \&some_func,  # defaults to a carp
	yyerror => \&other_func  # defaults to a croak
  );

  ## return the result of yyparse
  my $result = $parser->run( "ByaccPackage" );

ABSTRACT

XML::YYLex is a perl module that helps you build XML-parsers with perl-byacc (a version of Berkeley Yacc that can produce perl-code). It uses a regular DOM-parser (currently XML::DOM or XML::Sablotron::DOM) as what would normally be called a scanner (hence the name 'yylex', which is what scanner-functions are traditionally called). You can then specify grammars in byacc in which XML-tags or text-blocks appear as tokens, which simplifies the interpretation of XML-data (sometimes :).

DESCRIPTION

XML::YYLex implements an abstract base-class that can be subclassed for specific DOM-parsers. As of this writing, XML::DOM and XML::Sablotron::DOM are supported, but others might be easily added. If you want to add support for another DOM-parser, copy one of the modules XML::DOM::YYLex or XML::Sablotron::DOM::YYLex to an appropriate name and modify it to work with your DOM-parser.

XML::YYLex contains two public functions:

create_object( %args )

serves as a static factory method that creates an instance of the appropriate subclass. The possible keys for %args are

document: a reference to your DOM-document (whichever class that may be). This is used for determining which parser-specific subclass to create. This argument must be given. If you pass a single scalar to create_object instead of a hash, it is assumed to be the document.
debug: when set to a true value, produces lots of debug information, both from the yacc-parser and from XML::YYLex itself. Defaults to false.
yydebug and yyerror: code-refs with the same purpose as in byacc itself: called with a single argument which is a warning or an error. Defaults are the functions XML::YYLex::yydebug and XML::YYLex::yyerror (see below).
ignore_empty_text: when set to a true value, empty text-nodes are not considered to be tokens (which reduces your grammar's complexity a lot). True by default.

run( $namespace_of_parser )

which calls the byacc-generated yyparse() function with the appropriate parameters and returns its value. $namespace_of_parser is (you won't believe it) the namespace of the parser generated by perl-byacc (actually the same string that you specified with -P on the byacc command line).

Furthermore, the following functions are implemented in this package, but you will most likely never call them directly. However, knowledge of these might be necessary when subclassing XML::YYLex.

_yylex( $self, $doc ): This function traverses the DOM-tree in document order, i.e. the order in which the nodes appear in the XML-file (why don't we use a SAX-parser right away? Because SAX-parsers don't implement nice objects for nodes of the tree and their attributes like DOM-parsers do, that's why). This one's where the magic happens.
_node_to_token( $self, $node, $prefix ): This function determines the token-number for a given node. $prefix equals $XML::YYLex::PREFIX_OPENING (usually empty) for opening tags and $XML::YYLex::PREFIX_CLOSING (the underscore "_" by default) for closing tags. The default behaviour is to look for a symbol with the name of the node (for elements) in the namespace of your byacc-generated parser. $XML::YYLex::TOKEN_TEXT is used for text-nodes and $XML::YYLex::TOKEN_OTHER (OTHER by default) is used for unknown tags (when no token with that name exists). For closing elements, the prefix is prepended to the tagname (i.e. _html for </html>).
yyerror( $err ) and yydebug( $err ): These are the default error- and debug-handlers respectively if no other functions are given to create_object. yyerror croaks and yydebug carps its arguments.
new( $unblessed_hashref ): Don't call it. It serves as constructor for child-classes and must be given an almost initialized object (an $unblessed_hashref). See the code for details.
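
The token stream that _yylex and _node_to_token produce can be pictured with a minimal, self-contained sketch. It does not use XML::YYLex or a real DOM-parser: the nested-hash tree, the %known table (standing in for the symbol lookup in the byacc-generated package) and the subs node_to_token and tokens are all hypothetical, and treating whitespace-only text as "empty" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the module's configurable names.
my $PREFIX_OPENING = '';
my $PREFIX_CLOSING = '_';
my $TOKEN_TEXT     = 'TEXT';
my $TOKEN_OTHER    = 'OTHER';

# Token names the grammar declares (normally looked up in the byacc package).
my %known = map { $_ => 1 } qw(html head title body haystack);

# Map one node to a token name, given the prefix for opening or closing.
sub node_to_token {
    my ($node, $prefix) = @_;
    return $TOKEN_TEXT if exists $node->{text};
    my $name = $known{ $node->{name} } ? $node->{name} : $TOKEN_OTHER;
    return $prefix . $name;
}

# Walk the tree in document order, emitting opening and closing tokens.
sub tokens {
    my ($node, $ignore_empty_text) = @_;
    if (exists $node->{text}) {
        # Assumption: "empty" means whitespace-only.
        return () if $ignore_empty_text && $node->{text} !~ /\S/;
        return (node_to_token($node, $PREFIX_OPENING));
    }
    return (
        node_to_token($node, $PREFIX_OPENING),
        (map { tokens($_, $ignore_empty_text) } @{ $node->{children} || [] }),
        node_to_token($node, $PREFIX_CLOSING),
    );
}

# A toy tree: <blink> is not declared in %known, so it becomes OTHER.
my $doc = {
    name     => 'html',
    children => [
        { text => "\n" },
        { name => 'head', children => [
            { name => 'title', children => [ { text => "a title" } ] },
        ] },
        { name => 'body', children => [
            { name => 'haystack', children => [] },
            { name => 'blink',    children => [] },
        ] },
    ],
};

print join(' ', tokens($doc, 1)), "\n";
# prints: html head title TEXT _title _head body haystack _haystack OTHER _OTHER _body _html
```

With the second argument set to 0, the whitespace-only node after <html> shows up as an extra TEXT token, which is why a grammar gets simpler when such nodes are skipped.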

SUBCLASSING

A subclass for a specific DOM-parser needs to implement the following methods:

_xml_getDocumentElement( $dom_document ): returns the root node of $dom_document.
_xml_isTextNode( $dom_node ): returns a true value if the given node is a text-node.
_xml_isElementNode( $dom_node ): returns a true value if the given node is an element.
_xml_isDocumentNode( $dom_node ): returns a true value if the given node is the root node of its document.
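
As a rough illustration of the four methods listed above, here is a skeleton for a made-up hash-based DOM. The package name My::ToyDOM::YYLex and the node layout (a type key, a root key on the document) are hypothetical. It is also an assumption that the hooks are invoked as methods, so that $self arrives as the first argument; check XML::DOM::YYLex for the actual calling convention before copying this.

```perl
package My::ToyDOM::YYLex;    # hypothetical subclass name

use strict;
use warnings;

# -norequire keeps this sketch loadable on its own; a real subclass
# would load XML::YYLex in the usual way.
use parent -norequire, 'XML::YYLex';

# Assumption: each hook is called as a method, so $self comes first.
sub _xml_getDocumentElement {
    my ($self, $doc) = @_;
    return $doc->{root};
}

sub _xml_isTextNode {
    my ($self, $node) = @_;
    return $node->{type} eq 'text';
}

sub _xml_isElementNode {
    my ($self, $node) = @_;
    return $node->{type} eq 'element';
}

sub _xml_isDocumentNode {
    my ($self, $node) = @_;
    return $node->{type} eq 'document';
}

package main;

# Exercise the hooks on a toy document.
my $toy_doc = {
    type => 'document',
    root => { type => 'element', name => 'html' },
};
my $root = My::ToyDOM::YYLex->_xml_getDocumentElement($toy_doc);
print $root->{name}, "\n";
```

The hooks only answer questions about nodes; the traversal and token lookup stay in the base class, which is what keeps a port to another DOM-parser small.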

EXAMPLE

Here's a simple example. Imagine the following XML-document:

<html>
<head>
<title>this is my document's title</title>
</head>
<body>
<haystack needle="XML::YYLex is such a great module."/>
</body>
</html>

Our byacc-input might look like this:

%token html _html head _head title _title body _body haystack \
	_haystack TEXT OTHER _OTHER


%start HTMLDOCUMENT

%%

HTMLDOCUMENT:	html HEAD BODY _html;

HEAD:		head _head
	|	head TITLE _head
		;

TITLE:		title _title
	|	title TEXT _title
		{ print $2->getNodeValue."\n"; }
		;

BODY:		body _body
	|	body haystack _haystack _body
		{ print $2->getAttribute( "needle" )."\n"; }

%%

The parser-definition must be turned into a perl-module Demo.pm with the command

byacc -P Demo demo.y

(assuming that the definition resides in the file demo.y).

The glue between XML-file and parser-definition is the following perl-program:

use strict;
use XML::YYLex;

my $dom_document;
if ( &you_want_to_use_XML_DOM ) {

    ## XML::DOM initialization
    my $dom_parser = new XML::DOM::Parser;
    $dom_document = $dom_parser->parsefile( "foo.xml" );
} elsif( &you_want_to_use_sablotron )  {

    ## XML::Sablotron::DOM initialization
    my $sit = new XML::Sablotron::Situation;
    $dom_document = XML::Sablotron::DOM::parse( $sit, "foo.xml" );
}

my $p = &XML::YYLex::create_object( document => $dom_document );
$p->run( "Demo" );

This hopefully produces the two lines of output

this is my document's title
XML::YYLex is such a great module.

In this case you were probably better off parsing the document by hand, but in more complex cases, XML::YYLex might significantly help you.

KNOWN BUGS

Comments and processing instructions cause errors.

AUTHOR

Daniel Boesswetter, <boesswetter@peppermind.de>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
