
NAME

Class::XML::Parser - Parses (and optionally validates against a DTD) an XML message into a user-defined class structure.

SYNOPSIS

 # parse result base class, just defines an autoloader

 package ParseResult::Base;

 our $AUTOLOAD;    # holds the fully-qualified name of the called method

 sub new { bless {}, shift(); }
 sub AUTOLOAD {
     my ( $self, $val ) = @_;

     my $meth = $AUTOLOAD;
     $meth =~ s/.*:://;

     return if $meth eq 'DESTROY';

     if ( defined $val ) {
         $self->{ $meth } = $val;
     }

     $self->{ $meth };
 }

 # define classes that xml gets parsed into
 package ParseResult;

 use base qw( ParseResult::Base );

 # optionally define sub-classes that specific elements will be parsed into.
 # If this method doesn't exist, then all sub-elements and attributes thereof
 # will be parsed into this class
 sub __xml_parse_objects {
     {
         blah   => 'ParseResult::Blah',
     }
 }

 # optionally, have a class use a constructor other than 'new'.  Useful
 # for Class::Singleton objects
 sub __xml_parse_constructor {
     'new'
 }

 # optionally, have elements aliased to a method other than the XML
 # element name
 sub __xml_parse_aliases {
     {
         elem1   => 'bar',
     }
 }

 package ParseResult::Blah;

 use base qw( ParseResult::Base );

 package main;
 
 use Class::XML::Parser;
 use Data::Dumper;                   # provides Dumper(), used below
 
 my $xml = <<EOXML;
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE parser PUBLIC "-//Example//DTD Parse Example//EN"
                                 "http://example.com/parse.dtd">
 <parser>
   <elem1>
     <qwerty>uiop</qwerty>
     <blah>
       <wibble a="20">wobble</wibble>
     </blah>
   </elem1>
 </parser>
 EOXML

 my $parser = Class::XML::Parser->new(
     root_class      => 'ParseResult',   # top-level class to parse results into
     prune           => 1,
     validate        => 1,               # DTD validation should be done
     map_uri         => {
         # maps from XML SYSID or PUBID to URLs to replace.  Use to avoid
         # having to do a HTTP retrieval of the DTD, instead finding it on
         # the local filesystem
         'http://example.com/parse.dtd' => 'file:/tmp/parse.dtd',
     },
 );

 my $top = $parser->parse( $xml )
   or die $parser->last_error;

 print Dumper $top;

 # assuming the DTD exists, this will return a structure of:
 #$VAR1 = bless( {
 #    'blah' => bless( {
 #        'wibble' => 'wobble',      # sub-element of <blah>
 #        'a' => '20'                # attributes are also handled
 #    }, 'ParseResult::Blah' ),      # created as new object, as blah
 #                                   # defined in higher-level
 #                                   # __xml_parse_objects
 #    'qwerty' => 'uiop'             # sub-element of root
 #}, 'ParseResult' );                # top object is blessed into
 #                                   # 'root_class'

DESCRIPTION

This module allows XML to be parsed into a user-defined object hierarchy. Optionally (see validate below), the XML can also be validated against its DTD, if one is declared within the XML body and XML::Checker::Parser is available.

A note on how the parsing is done: when the ->parse method is called, each element name is checked against the __xml_parse_objects result of the current class (root_class by default). If the __xml_parse_objects hash has an entry for the element, a new instance of the destination class is created. All further elements and attributes are then called as mutators on that object until the element's closing tag is found, at which point the previous object is restored and subsequent elements are again applied to it. If nested elements are found but have no __xml_parse_objects entry, their data elements and attributes are folded into the current object (container-only elements are *not* added).
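To make the object-switching rules above concrete, here is a hypothetical, simplified model of them. It is not Class::XML::Parser's implementation: the events are hand-written stand-ins for parser callbacks, the `build` function and `Node` class are invented for illustration, and results are stored as plain hash entries rather than through mutator calls. Fed the SYNOPSIS document, it produces the same shape as the Dumper output shown there.

```perl
use strict;
use warnings;

# Hypothetical stand-in result class (not part of Class::XML::Parser).
package Node;
sub new { bless {}, shift }

package main;

# Simplified model of the dispatch rules; $objects plays the role of
# __xml_parse_objects (tag => class). Not the module's real code.
sub build {
    my ( $root_class, $objects, @events ) = @_;
    my @stack = ( $root_class->new );    # $stack[-1] is the current object
    my @opened;                          # tags that pushed a new object

    for my $ev (@events) {
        my ( $type, $tag, $arg ) = @$ev;
        if ( $type eq 'start' ) {
            if ( my $class = $objects->{$tag} ) {
                my $obj = $class->new;
                $stack[-1]{$tag} = $obj;    # attach to the enclosing object
                push @stack,  $obj;
                push @opened, $tag;
            }
            # attributes land on whichever object is now current
            my %attrs = %{ $arg || {} };
            $stack[-1]{$_} = $attrs{$_} for keys %attrs;
        }
        elsif ( $type eq 'text' ) {
            $stack[-1]{$tag} = $arg;        # data elements fold into the current object
        }
        elsif ( $type eq 'end' && @opened && $opened[-1] eq $tag ) {
            pop @stack;                     # closing tag: restore the previous object
            pop @opened;
        }
    }
    return $stack[0];
}

# Events for the SYNOPSIS document, with <blah> mapped to its own class.
my $top = build(
    'Node', { blah => 'Node' },
    [ start => 'parser' ], [ start => 'elem1' ],
    [ start => 'qwerty' ], [ text => 'qwerty', 'uiop' ], [ end => 'qwerty' ],
    [ start => 'blah' ],
    [ start => 'wibble', { a => 20 } ], [ text => 'wibble', 'wobble' ], [ end => 'wibble' ],
    [ end => 'blah' ], [ end => 'elem1' ], [ end => 'parser' ],
);
# $top is { qwerty => 'uiop', blah => { wibble => 'wobble', a => 20 } };
# elem1 never appears because it is a container-only element.
```

The explicit stack is what lets a closing tag "restore the previous object": only tags that created an object pop it, so unmapped nested elements keep writing into the enclosing object.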

constructor

 my $parser = Class::XML::Parser->new(
    root_class  => 'DataClass',
    validate    => 1,
    map_uri     => { 'http://example.com/data.dtd' => 'file:/tmp/data.dtd' },
    prune       => 1,
    parser      => 'XML::Parser',
 );

The following describes the parameters for the constructor:

root_class

The root class that the parse results will be blessed into. If not defined, this will be the calling class.

validate

Whether DTD validation should be performed. Internally, if this is set to a true value, XML::Checker::Parser is used for parsing. If not set, the parsing class will be XML::Parser.

map_uri

This is only meaningful when 'validate' is true. It allows replacement URLs to be defined for DTD SYSIDs and PUBIDs, and should be given as a hash-ref. If a replacement URL is of the 'file:' type, the filename must be fully-qualified. See XML::Checker::Parser for more details.
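Because 'file:' targets must be fully-qualified, one way to build a map_uri hash for a DTD shipped with an application is to resolve it with the core File::Spec module. This is a sketch: the relative location dtd/parse.dtd is an assumption for illustration, and the DTD URL is the one from the SYNOPSIS.

```perl
use strict;
use warnings;
use File::Spec;

# Assumed location of a local DTD copy, relative to the working directory;
# rel2abs turns it into the fully-qualified path that 'file:' URLs need.
my $dtd_path = File::Spec->rel2abs( File::Spec->catfile( 'dtd', 'parse.dtd' ) );

my %map_uri = (
    'http://example.com/parse.dtd' => "file:$dtd_path",
);

# Would then be passed as:  map_uri => \%map_uri  (together with validate => 1)
print "$_ => $map_uri{$_}\n" for sort keys %map_uri;
```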

prune

If true, parsed data values that are empty or contain only whitespace will not be assigned.

strip

If true, all data values will be stripped of leading/trailing whitespace.
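The following hypothetical helper models the documented effect of prune and strip on a single data value. It is not the module's internal code, and applying strip before prune is an assumption made for the sketch; here undef stands for "value not assigned".

```perl
use strict;
use warnings;

# Hypothetical model of the 'strip' and 'prune' options for one value.
sub apply_options {
    my ( $value, %opt ) = @_;
    if ( $opt{strip} ) {
        $value =~ s/^\s+//;    # drop leading whitespace
        $value =~ s/\s+$//;    # drop trailing whitespace
    }
    # prune: a value with no non-whitespace characters is not assigned
    return undef if $opt{prune} && $value !~ /\S/;
    return $value;
}

for my $raw ( "  wobble \n", " \n\t " ) {
    my $v = apply_options( $raw, strip => 1, prune => 1 );
    print( defined $v ? "assign '$v'\n" : "skip\n" );
}
```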

Other parameters

Any other parameters are passed through to the internal XML parser class: XML::Checker::Parser if validate is specified, otherwise XML::Parser.

Possibly the most useful of these is the Namespaces parameter, which causes namespaces within the XML to be ignored when parsing. See XML::Parser::Expat for more details.

Object Methods

parse( $xml )

Attempts to parse (and validate if specified) the given XML into an object hierarchy. Upon an error, this will return undef, and last_error will be set. NOTE: This method is NOT thread-safe.

last_error()

Returns the last parsing or validation error for this object, or undef if no error has occurred.

Data Object Methods

__xml_parse_objects

This method, if defined in any parse result class, defines which XML elements will be deserialized as new objects.

This method should return a hash-ref of the form { xml-tag => package_name }: whenever an <xml-tag> element is found, a new instance of <package_name> is created, and all attributes and sub-elements are then parsed into that object.
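Since __xml_parse_objects is an ordinary method, the hash-ref can be built programmatically. The sketch below, with illustrative class names that are not part of the module, maps several element names to one destination class using map. The `+{` forces Perl to read the braces as an anonymous hash rather than a block.

```perl
use strict;
use warnings;

package Catalog;
sub new { bless {}, shift }

# Several element names share one destination class (names are illustrative).
sub __xml_parse_objects {
    return +{
        ( map { $_ => 'Catalog::Item' } qw( book dvd cd ) ),
        publisher => 'Catalog::Publisher',
    };
}

package Catalog::Item;
sub new { bless {}, shift }

package Catalog::Publisher;
sub new { bless {}, shift }

package main;

my $objects = Catalog->__xml_parse_objects;
print "$_ -> $objects->{$_}\n" for sort keys %$objects;
```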

__xml_parse_constructor

If defined, the value returned by this method will be used as the constructor method for objects that parse into this class, instead of the typical 'new' method.
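As a sketch of the singleton case mentioned in the SYNOPSIS, a class can name a constructor such as instance. The class name AppSettings and the hand-rolled singleton are illustrative (Class::Singleton provides a similar instance method), and the assumption is that the parser invokes the named constructor as a class method; the final lines imitate that call.

```perl
use strict;
use warnings;

# Illustrative singleton result class; not part of Class::XML::Parser.
package AppSettings;

my $instance;
sub instance { $instance ||= bless {}, shift }    # always returns the same object

# Ask the parser to build this class via instance() instead of new().
sub __xml_parse_constructor { 'instance' }

package main;

# Imitate a parser calling the advertised constructor twice.
my $ctor   = AppSettings->__xml_parse_constructor;
my $first  = AppSettings->$ctor();
my $second = AppSettings->$ctor();
print "same object\n" if $first == $second;
```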

__xml_parse_aliases

If defined, this method should return a hash-ref that maps XML element names to alternate method names, rather than using a method of the same name as the element.

CAVEATS

IMPORTANT: No checks are done to determine if the element/attribute deserialization would cause a previous definition to be overwritten. Where there is a possibility of this, and it is not the desired behaviour, this can be overcome by creating a mutator for that element in the package that it will be parsed into, to push it onto an array, or hash, as appropriate. See t/05_hierarchy_custom_mutator.t for an example of this.
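A minimal sketch of such a custom mutator, assuming the parser calls `$obj->item($value)` once per <item> element (as the AUTOLOAD mutator in the SYNOPSIS implies). The Order class and item element are illustrative, and this is not the contents of t/05_hierarchy_custom_mutator.t.

```perl
use strict;
use warnings;

# Illustrative result class: repeated <item> values accumulate in an array
# instead of each one overwriting the previous value.
package Order;

sub new { bless { item => [] }, shift }

sub item {
    my ( $self, $val ) = @_;
    push @{ $self->{item} }, $val if defined $val;    # mutator: append
    return $self->{item};                             # accessor: whole list
}

package main;

# Simulate three <item> elements being deserialized into one Order.
my $order = Order->new;
$order->item($_) for qw( apple pear plum );
print join( ', ', @{ $order->item } ), "\n";    # prints "apple, pear, plum"
```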

Due to a limitation of XML::Parser Stream handling, elements that are completely empty (no content or attributes) will NOT be assigned to. This could possibly be overcome, but I didn't need this, so didn't bother. :)

If namespaces exist in the parsed XML, there are two ways of handling them. The first is to pass Namespaces => 1 to the Class::XML::Parser constructor and ensure that the xmlns attribute is defined (see t/11_namespaces.t for an example of this). The alternative is to make liberal use of __xml_parse_aliases in all parse result classes.
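For the aliasing approach, a sketch follows. It assumes that without Namespaces => 1 the underlying expat parser reports qualified element names such as dc:title unchanged, so each prefixed name is aliased to a plain method. The dc prefix, the Record class and the loop that stands in for the parser are all illustrative.

```perl
use strict;
use warnings;

# Illustrative result class that aliases prefixed element names.
package Record;

sub new { bless {}, shift }

sub __xml_parse_aliases {
    return +{
        'dc:title'   => 'title',
        'dc:creator' => 'creator',
    };
}

sub title   { my $s = shift; $s->{title}   = shift if @_; $s->{title} }
sub creator { my $s = shift; $s->{creator} = shift if @_; $s->{creator} }

package main;

# Stand-in for the parser: resolve each element name through the alias
# table (falling back to the name itself) and call that mutator.
my $rec     = Record->new;
my $aliases = Record->__xml_parse_aliases;
for my $pair ( [ 'dc:title' => 'Perl XML' ], [ 'dc:creator' => 'A. Author' ] ) {
    my ( $elem, $val ) = @$pair;
    my $meth = $aliases->{$elem} // $elem;
    $rec->$meth($val);
}
print $rec->title, ' / ', $rec->creator, "\n";
```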

SEE ALSO

XML::Parser

XML::Checker::Parser (used for DTD validation internally)

AUTHOR

makk384@gmail.com
