YAX::Parser - fast pure Perl tree and stream parser
use YAX::Parser;

my $xml_str = <<XML;
<?xml version="1.0" ?>
<doc>
  <content id="42"><![CDATA[
    This is a cdata section, so >>anything goes!<<
  ]]>
  </content>
  <!-- comments are nodes too -->
</doc>
XML

# tree parse - the common case
my $xml_doc = YAX::Parser->parse( $xml_str );
# or, reading from a file
$xml_doc = YAX::Parser->parse_file( $path );

# shallow parse
my @tokens = YAX::Parser->tokenize( $xml_str );

# stream parse
YAX::Parser->stream( $xml_str, $state, %handlers );
YAX::Parser->stream_file( '/some/file.xml', $state, %handlers );
This module implements a fast DOM and stream parser based on Robert D. Cameron's regular expression shallow parsing grammar and technique. It doesn't implement the full W3C DOM API by design. Instead, it takes a more pragmatic approach. DOM trees are constructed with everything being an object except for attributes, which are stored as a hash reference.
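To give a feel for what regular-expression shallow parsing means, here is a minimal sketch. It is not YAX's actual grammar (Cameron's full grammar handles many more cases, such as `>` inside attribute values); the pattern and the `shallow_tokens` helper are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Simplified shallow-parsing pattern: one alternation that matches either a
# piece of markup or a run of character data. Illustration only, NOT the
# grammar YAX uses.
my $token_re = qr{
    <!--.*?-->              # comment
  | <!\[CDATA\[.*?\]\]>     # CDATA section
  | <\?.*?\?>               # processing instruction
  | <[^>]*>                 # any other tag or declaration
  | [^<]+                   # character data
}xs;

# Hypothetical helper: split a string into markup and text tokens.
sub shallow_tokens {
    my ($xml) = @_;
    return $xml =~ /$token_re/g;
}

my @tokens = shallow_tokens('<doc><a id="1">hi</a><!-- note --></doc>');
print "$_\n" for @tokens;
```

Because the input is only split into tokens and never validated, a pass like this is cheap; checking nesting and building a tree are left to later steps.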
We also borrow some ideas from browser implementations. In particular, nodes are keyed in a table in the document by their id attribute (if present), so you can say:
my $found = $xml_doc->get( $node_id );
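The idea of an id table can be sketched independently of YAX. The node layout below (plain hashes with `name`, `attributes`, and `children` keys) and the `make_node` and `index_by_id` helpers are assumptions made for this example; YAX's real node classes are different.

```perl
use strict;
use warnings;

# Hypothetical node layout, for illustration only. Attributes are kept in a
# plain hash reference, as the description above says YAX does.
sub make_node {
    my ( $name, $attributes, @children ) = @_;
    return { name => $name, attributes => $attributes, children => [@children] };
}

# Walk the tree once and record every node that carries an id attribute.
sub index_by_id {
    my ( $node, $table ) = @_;
    my $id = $node->{attributes}{id};
    $table->{$id} = $node if defined $id;
    index_by_id( $_, $table ) for @{ $node->{children} };
    return $table;
}

my $tree = make_node( 'doc', {},
    make_node( 'content', { id => '42' } ),
    make_node( 'note',    { id => 'n1' } ),
);

my $ids = index_by_id( $tree, {} );
print $ids->{42}{name}, "\n";
```

Once the table exists, a lookup by id is a single hash access rather than a tree walk.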
Parsing is usually done by calling class methods on YAX::Parser, which, when invoked as a tree parser, returns an instance of YAX::Document:
my $xml_doc = YAX::Parser->parse( $xml_str );
See the SYNOPSIS for usage examples; here is just the list of methods for now:
parse( $xml_str ) parses $xml_str and returns a YAX::Document object.
parse_file( $path ) works like parse, but reads its input from the file at $path.
Although not its main focus, YAX::Parser also provides for stream parsing. It tries to be a bit more sane than Expat, in that it allows you to specify a state holder which can be anything and is passed as the first argument to the handler functions. A typical case is to use a hash reference with a stack (for tracking nesting):
my $state = { stack => [ ] };
All handler functions are optional; the full list is:
my %handlers = (
    text => \&handle_text,          # called for text nodes
    elmt => \&handle_element_open,  # called for open tags
    elcl => \&handle_element_close, # called for tag close
    decl => \&handle_declaration,   # called for declarations
    proc => \&handle_proc_inst,     # called for processing instructions
    pass => \&handle_passthrough,   # called when no handlers match
);
An element handler is passed the state, the tag name and the attributes hash:
sub handle_element_open {
    my ( $state, $name, %attributes ) = @_;
    if ( $name eq 'a' and $attributes{href} ) {
        ...
    }
}
Element close handlers take two arguments, the state and the tag name:
sub handle_element_close {
    my ( $state, $name ) = @_;
    die "not well formed"
        unless pop( @{ $state->{stack} } ) eq $name;
}
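The close handler can only check nesting if the open handler has pushed each tag name onto the stack first. The following is a minimal sketch of that pairing. To keep it self-contained, the handlers are driven by a hand-written sequence of calls standing in for a stream parse, so the event order here is an assumption rather than output from YAX.

```perl
use strict;
use warnings;

my $state = { stack => [] };

# Open handler: remember the tag so the close handler can check nesting.
sub handle_element_open {
    my ( $state, $name, %attributes ) = @_;
    push @{ $state->{stack} }, $name;
}

# Close handler: the most recently opened tag must be the one closing now.
sub handle_element_close {
    my ( $state, $name ) = @_;
    my $open = pop @{ $state->{stack} };
    die "not well formed: </$name>\n"
        unless defined $open and $open eq $name;
}

# Hand-written event sequence for <doc><a href="x"></a></doc>; in a real
# stream parse the parser would make these calls.
handle_element_open( $state, 'doc' );
handle_element_open( $state, 'a', href => 'x' );
handle_element_close( $state, 'a' );
handle_element_close( $state, 'doc' );

# A mismatched close tag is rejected instead of silently accepted.
handle_element_open( $state, 'b' );
my $ok = eval { handle_element_close( $state, 'i' ); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

Checking `defined $open` also covers a close tag that arrives when the stack is already empty.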
All other handlers take the state and the entire matched token:
sub handle_proc_inst {
    my ( $state, $token ) = @_;
    # capture the instruction body; /s lets it span multiple lines
    my ( $instr ) = $token =~ /^<\?(.*?)\?>$/s;
    ...
}
tokenize( $xml_str ) is useful for quick and dirty tokenizing of $xml_str. It returns a list of tokens.
YAX::Document, YAX::Node
This program is free software and may be modified and distributed under the same terms as Perl itself.
Richard Hundt