NAME
XML::YYLex - Perl module for using perl-byacc with XML-data
SYNOPSIS
use XML::YYLex;
## create an object of a subclass of XML::YYLex suitable for your
## DOM-parser:
my $parser = XML::YYLex::create_object(
document => $xmldom_or_sablotron_dom_object,
debug => 0, # or 1
ignore_empty_text => 1, # probably what you would expect
yydebug => \&some_func, # defaults to a carp
yyerror => \&other_func # defaults to a croak
);
## return the result of yyparse
my $result = $parser->run( "ByaccPackage" );
ABSTRACT
XML::YYLex is a perl module that helps you build XML-parsers with perl-byacc (a version of Berkeley Yacc that can produce perl-code). It uses a regular DOM-parser (currently XML::DOM or XML::Sablotron::DOM) as what would normally be called a scanner (hence the name 'yylex', which is what scanner-functions are traditionally called). You can then specify grammars in byacc in which XML-tags or text-blocks appear as tokens, which simplifies the interpretation of XML-data (sometimes :).
DESCRIPTION
XML::YYLex implements an abstract base-class that can be subclassed for specific DOM-parsers. As of this writing, XML::DOM and XML::Sablotron::DOM are supported, but others might be easily added. If you want to add support for another DOM-parser, copy one of the modules XML::DOM::YYLex or XML::Sablotron::DOM::YYLex to an appropriate name and modify it to work with your DOM-parser.
XML::YYLex contains two public functions:
create_object( %args )-
serves as a static factory method that creates an instance of the appropriate subclass. The possible keys for %args are
document-
a reference to your DOM-document (whichever class that may be). This is used for determining which parser-specific subclass to create. This argument must be given. If you pass a single scalar to create_object instead of a hash, it is assumed to be the document.
debug-
when set to a true value, produces lots of debug information, from the yacc-parser as well as from XML::YYLex itself. Defaults to false.
yydebug and yyerror-
code-refs with the same purpose as in byacc itself: called with a single argument which is a warning or an error. Defaults are the functions XML::YYLex::yydebug and XML::YYLex::yyerror (see below).
ignore_empty_text-
when set to a true value, empty text-nodes are not considered to be tokens (which reduces your grammar's complexity a lot). True by default.
run( $namespace_of_parser )-
calls the byacc-generated yyparse() function with the appropriate parameters and returns its value. $namespace_of_parser is (you won't believe it) the namespace of the parser generated by perl-byacc (actually the same string that you specified with -P on the byacc command line).
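The argument handling and subclass selection described for create_object can be pictured with the following minimal sketch. It does not use XML::YYLex at all and is not the module's actual code: the helper names normalise_args and pick_subclass, the %SUBCLASS_FOR table and the document class names on its left-hand side are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical mapping from the class of the DOM document to the
# YYLex subclass. The real module may decide this differently.
my %SUBCLASS_FOR = (
    'XML::DOM::Document'            => 'XML::DOM::YYLex',
    'XML::Sablotron::DOM::Document' => 'XML::Sablotron::DOM::YYLex',
);

# A single scalar is taken to be the document; otherwise a key/value
# list is expected. Defaults follow the text: debug off, empty text
# nodes ignored.
sub normalise_args {
    my %args = @_ == 1 ? ( document => $_[0] ) : @_;
    die "document argument is required\n" unless defined $args{document};
    $args{debug}             = 0 unless defined $args{debug};
    $args{ignore_empty_text} = 1 unless defined $args{ignore_empty_text};
    return %args;
}

# Choose the subclass name from the class the document is blessed into.
sub pick_subclass {
    my %args     = normalise_args(@_);
    my $class    = ref $args{document};
    my $subclass = $SUBCLASS_FOR{$class}
        or die "no YYLex subclass known for '$class'\n";
    return $subclass;
}

# Stand-in document object; no DOM parser is needed for the sketch.
my $fake_doc = bless {}, 'XML::DOM::Document';
print pick_subclass($fake_doc), "\n";
print pick_subclass( document => $fake_doc, debug => 1 ), "\n";
```

Both calls print XML::DOM::YYLex, since the scalar form and the hash form end up with the same document.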
Furthermore, the following functions are implemented in this package, but you will most likely never call them directly. However, knowledge of these might be necessary when subclassing XML::YYLex.
_yylex( $self, $doc )-
This function implements the traversal of the DOM-tree in an order that would be the order of nodes in the XML-file (why don't we use a SAX-parser right-away? Because SAX-parsers don't implement nice objects for Nodes of the tree and their attributes like DOM-parsers do, that's why). This one's where the magic happens.
_node_to_token( $self, $node, $prefix )-
This function determines the token-number for a given node. $prefix equals $XML::YYLex::PREFIX_OPENING (usually empty) for opening tags and $XML::YYLex::PREFIX_CLOSING (the underscore "_" by default) for closing tags. The default behaviour is to look for a symbol with the name of the node (for elements) in the namespace of your byacc-generated parser. $XML::YYLex::TOKEN_TEXT is used for text-nodes and $XML::YYLex::TOKEN_OTHER (OTHER by default) is used for unknown tags (when no token with that name exists). For closing elements, the prefix is prepended to the tagname (e.g. _html for </html>).
yyerror( $err ) and yydebug( $err )-
These are the default error- and debug-handlers respectively if no other functions are given to create_object. yyerror croaks and yydebug carps its arguments.
new( $unblessed_hashref )-
Don't call it. It serves as constructor for child-classes and must be given an almost initialized object (an $unblessed_hashref). See the code for details.
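To make the document-order traversal and the token naming described for _yylex and _node_to_token more concrete, here is a small self-contained sketch. It works on a plain hash-based tree instead of a real DOM and returns token names rather than byacc token-numbers. The functions node_to_token and tokens, the %known table and the lexical variables standing in for the package variables are all hypothetical, and treating whitespace-only text as "empty" is an assumption.

```perl
use strict;
use warnings;

# Stand-ins for the package variables named in the text.
my $PREFIX_OPENING = '';
my $PREFIX_CLOSING = '_';
my $TOKEN_TEXT     = 'TEXT';
my $TOKEN_OTHER    = 'OTHER';

# Token names the grammar declares (the real module looks them up in
# the namespace of the byacc-generated parser).
my %known = map { $_ => 1 } qw(html _html head _head title _title body _body);

# Map a node to a token name; unknown tags fall back to OTHER.
sub node_to_token {
    my ( $node, $prefix ) = @_;
    return $TOKEN_TEXT if exists $node->{text};
    my $name = $prefix . $node->{name};
    return $known{$name} ? $name : $prefix . $TOKEN_OTHER;
}

# Depth-first walk: opening token, then the children, then closing token.
sub tokens {
    my ( $node, $ignore_empty_text ) = @_;
    if ( exists $node->{text} ) {
        # Assumption: "empty" means containing only whitespace.
        return () if $ignore_empty_text && $node->{text} !~ /\S/;
        return ( node_to_token( $node, '' ) );
    }
    my @out = ( node_to_token( $node, $PREFIX_OPENING ) );
    push @out, tokens( $_, $ignore_empty_text ) for @{ $node->{children} || [] };
    push @out, node_to_token( $node, $PREFIX_CLOSING );
    return @out;
}

my $tree = {
    name     => 'html',
    children => [
        { text => "\n  " },
        { name => 'head', children => [
            { name => 'title', children => [ { text => 'a title' } ] },
        ] },
        { name => 'body', children => [ { name => 'blink', children => [] } ] },
    ],
};

my @tokens = tokens( $tree, 1 );
print "@tokens\n";
# prints: html head title TEXT _title _head body OTHER _OTHER _body _html
```

The unknown blink element shows up as OTHER and _OTHER, which is why a grammar typically declares both of these tokens.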
SUBCLASSING
A subclass for a specific DOM-parser needs to implement the following methods:
_xml_getDocumentElement( $dom_document )-
returns the root node of $dom_document.
_xml_isTextNode( $dom_node )-
returns a true value if the given node is a text-node.
_xml_isElementNode( $dom_node )-
returns a true value if the given node is an element.
_xml_isDocumentNode( $dom_node )-
returns a true value if the given node is the root node of its document.
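As a rough illustration of these four hooks, the following skeleton implements them for a made-up hash-based "DOM". The package name My::HashDOM::YYLex and the node layout (type, name, root keys) are hypothetical. The inheritance from XML::YYLex is only hinted at in a comment so that the sketch runs without the module installed. Two further assumptions: each hook reads its node from the last argument, so it works whether it is called as a method or as a plain function, and _xml_isDocumentNode is interpreted as testing for the document node itself.

```perl
use strict;
use warnings;

package My::HashDOM::YYLex;    # hypothetical subclass name

# In real use this package would inherit from XML::YYLex, e.g.
#   our @ISA = ('XML::YYLex');
# and would usually be modelled on XML::DOM::YYLex.

# Return the root element of the document.
sub _xml_getDocumentElement { my $doc = $_[-1]; return $doc->{root} }

# True for text nodes.
sub _xml_isTextNode { my $node = $_[-1]; return $node->{type} eq 'text' }

# True for element nodes.
sub _xml_isElementNode { my $node = $_[-1]; return $node->{type} eq 'element' }

# True for the document node (interpretation assumed, see above).
sub _xml_isDocumentNode { my $node = $_[-1]; return $node->{type} eq 'document' }

package main;

# A tiny made-up document to exercise the hooks.
my $doc = {
    type => 'document',
    root => { type => 'element', name => 'html' },
};

my $root = My::HashDOM::YYLex->_xml_getDocumentElement($doc);
print "root element: $root->{name}\n";
print "root is an element\n" if My::HashDOM::YYLex->_xml_isElementNode($root);
print "doc is a document node\n" if My::HashDOM::YYLex->_xml_isDocumentNode($doc);
```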
EXAMPLE
Here's a simple example. Imagine the following XML-document:
<html>
<head>
<title>this is my document's title</title>
</head>
<body>
<haystack needle="XML::YYLex is such a great module."/>
</body>
</html>
Our byacc-input might look like this:
%token html _html head _head title _title body _body haystack \
_haystack TEXT OTHER _OTHER
%start HTMLDOCUMENT
%%
HTMLDOCUMENT: html HEAD BODY _html;
HEAD: head _head
| head TITLE _head
;
TITLE: title _title
| title TEXT _title
{ print $2->getNodeValue."\n"; }
;
BODY: body _body
| body haystack _haystack _body
{ print $2->getAttribute( "needle" )."\n"; }
;
%%
The parser-definition must be turned into a perl-module Demo.pm with the command
byacc -P Demo demo.y
(assuming that the definition resides in the file demo.y).
The glue between the XML-file and the parser-definition is the following perl-program:
use strict;
use XML::YYLex;
my $dom_document;
if ( &you_want_to_use_XML_DOM ) {
## XML::DOM initialization
my $dom_parser = new XML::DOM::Parser;
$dom_document = $dom_parser->parsefile( "foo.xml" );
} elsif( &you_want_to_use_sablotron ) {
## XML::Sablotron::DOM initialization
my $sit = new XML::Sablotron::Situation;
$dom_document = XML::Sablotron::DOM::parse( $sit, "foo.xml" );
}
my $p = &XML::YYLex::create_object( document => $dom_document );
$p->run( "Demo" );
This hopefully produces the two lines of output
this is my document's title
XML::YYLex is such a great module.
In this case you were probably better off parsing the document by hand, but in more complex cases, XML::YYLex might significantly help you.
KNOWN BUGS
Comments and processing instructions cause errors.
SEE ALSO
XML::DOM, XML::DOM::YYLex, XML::Sablotron::DOM, XML::Sablotron::DOM::YYLex
The XML-YYLex homepage: http://home.debitel.net/user/boesswetter/xml_yylex/
AUTHOR
Daniel Boesswetter, <boesswetter@peppermind.de>
COPYRIGHT AND LICENSE
Copyright 2002 by Daniel Boesswetter
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.