NAME

SOAP::WSDL::Parser - How SOAP::WSDL parses XML messages

Which XML messages does SOAP::WSDL parse?

Naturally, there are two kinds of XML documents (or messages) SOAP::WSDL has to parse:

  • WSDL definitions

  • SOAP messages

Parser implementations

Several parser implementations are available for SOAP messages; currently there is only one for WSDL definitions.

WSDL definitions parser

  • SOAP::WSDL::SAX::WSDLHandler

    This is a SAX handler for parsing WSDL files into object trees SOAP::WSDL works with.

    It's built as a native handler for XML::LibXML, but will also work with XML::SAX::ParserFactory.

    To parse a WSDL file, use one of the following variants:

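     use XML::LibXML;
     use SOAP::WSDL::SAX::WSDLHandler;

     # Variant 1: XML::LibXML, using the handler natively
     my $parser = XML::LibXML->new();
     my $handler = SOAP::WSDL::SAX::WSDLHandler->new();
     $parser->set_handler( $handler );
     $parser->parse_string( $xml );     # $xml holds the WSDL document as a string
     my $data = $handler->get_data();   # the resulting object tree


     use XML::SAX::ParserFactory;

     # Variant 2: any XML::SAX parser, with XML::SAX::Base as the handler's base class
     my $handler = SOAP::WSDL::SAX::WSDLHandler->new({
            base => 'XML::SAX::Base'
     });
     my $parser = XML::SAX::ParserFactory->parser(
        Handler => $handler
     );
     $parser->parse_string( $xml );
     my $data = $handler->get_data();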

SOAP messages parser

All SOAP message handlers use class resolvers to find out which class a particular XML element should be mapped to, and type libs containing these classes.

Creating a class resolver

The easiest way to create a class resolver is to run SOAP::WSDL's generator.

See wsdl2perl.pl

The class resolver must implement a class method "get_class", which is passed an array reference holding the current element's XPath (relative to Body), split on "/".

This method must return the class name appropriate for the XML element.

A class resolver package might look like this:

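 package ClassResolver;

 use strict;
 use warnings;

 # Maps element paths (relative to Body) to type lib classes
 my %class_list = (
    'EnqueueMessage' => 'Typelib::TEnqueueMessage',
    'EnqueueMessage/MMessage' => 'Typelib::TMessage',
    'EnqueueMessage/MMessage/MRecipientURI' => 'SOAP::WSDL::XSD::Builtin::anyURI',
    'EnqueueMessage/MMessage/MMessageContent' => 'SOAP::WSDL::XSD::Builtin::string',
 );

 sub new { return bless {}, shift }

 # Called as a class (or object) method; $path is an array ref of
 # element names, e.g. [ 'EnqueueMessage', 'MMessage' ]
 sub get_class {
    my ( $self, $path ) = @_;
    my $name = join( '/', @{ $path } );
    return $class_list{ $name } if exists $class_list{ $name };
    warn "no class found for $name";
    return;    # undef: no class known for this path
 }

 1;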
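To illustrate the calling convention described above, here is a minimal sketch of how a parser might call such a resolver. It is not SOAP::WSDL's parser code: the MyResolver package, the resolve_all helper and the flat event list are all hypothetical. The sketch keeps a stack of element names below Body, pushes on each start tag, pops on each end tag, and passes a copy of the stack to get_class as an array reference.

```perl
use strict;
use warnings;

# Hypothetical resolver following the get_class() convention:
# called as a class method, second argument is an array ref of
# element names relative to Body.
package MyResolver;

my %class_list = (
    'EnqueueMessage'          => 'Typelib::TEnqueueMessage',
    'EnqueueMessage/MMessage' => 'Typelib::TMessage',
);

sub get_class {
    my ( $class, $path ) = @_;
    return $class_list{ join '/', @{$path} };
}

package main;

# Hypothetical helper: walks SAX-like events ([ start => $name ] or
# [ end => $name ]) for the content of Body and records which class
# the resolver returns for each element.
sub resolve_all {
    my ( $resolver, @events ) = @_;
    my @path;        # current element path relative to Body
    my @resolved;    # [ "path", class ] pairs in document order
    for my $event (@events) {
        my ( $type, $name ) = @{$event};
        if ( $type eq 'start' ) {
            push @path, $name;
            # pass a copy so the resolver cannot modify the parser's stack
            my $class = $resolver->get_class( [@path] );
            push @resolved, [ join( '/', @path ), $class ];
        }
        else {
            pop @path;
        }
    }
    return @resolved;
}

my @resolved = resolve_all(
    'MyResolver',
    [ start => 'EnqueueMessage' ],
    [ start => 'MMessage' ],
    [ end   => 'MMessage' ],
    [ end   => 'EnqueueMessage' ],
);
printf "%s => %s\n", $_->[0], $_->[1] for @resolved;
```

Passing a copy of the stack rather than the stack itself is a design choice of this sketch; it keeps a resolver from accidentally corrupting the parser's state.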

Skipping unwanted items

Sometimes SOAP messages transport unnecessary information.

To skip XML nodes (including all child nodes), just edit the type map for the message and set the type map entry to '__SKIP__'.

In the example below, EnqueueMessage/StuffIDontNeed and all its child elements are skipped.

 my %class_list = (
    'EnqueueMessage' => 'Typelib::TEnqueueMessage',
    'EnqueueMessage/MMessage' => 'Typelib::TMessage',
    'EnqueueMessage/MMessage/MRecipientURI' => 'SOAP::WSDL::XSD::Builtin::anyURI',
    'EnqueueMessage/MMessage/MMessageContent' => 'SOAP::WSDL::XSD::Builtin::string',
    'EnqueueMessage/StuffIDontNeed' => '__SKIP__',
    'EnqueueMessage/StuffIDontNeed/Foo' => 'SOAP::WSDL::XSD::Builtin::string',
    'EnqueueMessage/StuffIDontNeed/Bar' => 'SOAP::WSDL::XSD::Builtin::string',
 );

Note that only SOAP::WSDL::Expat::MessageParser implements skipping elements at the time of writing.
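The following is a minimal sketch of how skipping could work, assuming a parser that tracks a "skip depth". It is hypothetical code, not the implementation in SOAP::WSDL::Expat::MessageParser: once the type map returns '__SKIP__' for an element, every event is ignored until the matching end tag. The kept_elements helper and the event list are made up for illustration.

```perl
use strict;
use warnings;

# Type map from the example above, reduced to the relevant entries.
my %class_list = (
    'EnqueueMessage'                => 'Typelib::TEnqueueMessage',
    'EnqueueMessage/MMessage'       => 'Typelib::TMessage',
    'EnqueueMessage/StuffIDontNeed' => '__SKIP__',
);

# Hypothetical helper: returns the element paths that are NOT skipped.
# Events are [ start => $name ] or [ end => $name ].
sub kept_elements {
    my @events = @_;
    my @path;
    my $skip_depth = 0;    # > 0 while inside a skipped subtree
    my @kept;
    for my $event (@events) {
        my ( $type, $name ) = @{$event};
        if ( $type eq 'start' ) {
            push @path, $name;
            if ($skip_depth) {    # already inside a skipped node
                $skip_depth++;
                next;
            }
            my $class = $class_list{ join '/', @path };
            if ( defined $class && $class eq '__SKIP__' ) {
                $skip_depth = 1;    # skip this node and all its children
                next;
            }
            push @kept, join '/', @path;
        }
        else {
            pop @path;
            $skip_depth-- if $skip_depth;
        }
    }
    return @kept;
}

my @kept = kept_elements(
    [ start => 'EnqueueMessage' ],
    [ start => 'MMessage' ],
    [ end   => 'MMessage' ],
    [ start => 'StuffIDontNeed' ],
    [ start => 'Foo' ],
    [ end   => 'Foo' ],
    [ end   => 'StuffIDontNeed' ],
    [ end   => 'EnqueueMessage' ],
);
print "$_\n" for @kept;    # EnqueueMessage, EnqueueMessage/MMessage
```

Because children of a skipped node are never looked up, their entries in the type map (like EnqueueMessage/StuffIDontNeed/Foo in the example) are not needed in this sketch.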

Creating type lib classes

Every element must have a corresponding class in the type library.

Builtin types should be resolved as SOAP::WSDL::XSD::Builtin::* classes.

Creating a type lib is easy: just run SOAP::WSDL's generator. It will create both a typemap and the type lib classes for a WSDL file.

Sometimes it is necessary to create type lib classes by hand, because not all WSDL definitions are complete.

To write your own type lib classes, see SOAP::WSDL::XSD::Typelib::Element, SOAP::WSDL::XSD::Typelib::ComplexType and SOAP::WSDL::XSD::Typelib::SimpleType.

Parser implementations

  • SOAP::WSDL::SAX::MessageHandler

    This is a SAX handler for parsing SOAP messages into object trees SOAP::WSDL works with.

    It's built as a native handler for XML::LibXML, but will also work with XML::SAX::ParserFactory.

    It can be used for parsing both streams (chunks) and complete documents.

    See SOAP::WSDL::SAX::MessageHandler for details.

  • SOAP::WSDL::Expat::MessageParser

    An XML::Parser::Expat-based parser. This is the fastest parser for most SOAP messages and the default for SOAP::WSDL::Client.

  • SOAP::WSDL::Expat::MessageStreamParser

    An XML::Parser::ExpatNB-based parser. Useful for parsing huge HTTP responses, as you don't need to keep the whole response in memory.

    See SOAP::WSDL::Expat::MessageStreamParser for details.
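The stream-parsing approach can be illustrated with XML::Parser's non-blocking interface directly. This is a minimal sketch, not SOAP::WSDL::Expat::MessageStreamParser itself, and it assumes XML::Parser is installed from CPAN; the response chunks are made up. parse_start returns an XML::Parser::ExpatNB object, parse_more feeds it one chunk at a time (as chunks would arrive from an HTTP response), and parse_done finishes the document.

```perl
use strict;
use warnings;
use XML::Parser;

my @names;    # element names seen so far, in document order

my $parser = XML::Parser->new(
    Handlers => {
        # The Start handler receives the Expat object, the element name
        # (including any namespace prefix) and the attributes.
        Start => sub {
            my ( $expat, $element ) = @_;
            push @names, $element;
        },
    },
);

# Hypothetical response split into arbitrary chunks; a chunk boundary
# may even fall inside a tag.
my @chunks = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">',
    '<soap:Body><EnqueueMessage><MMe',
    'ssage/></EnqueueMessage></soap:Body></soap:Envelope>',
);

my $nb = $parser->parse_start;      # XML::Parser::ExpatNB object
$nb->parse_more($_) for @chunks;    # feed chunks as they arrive
$nb->parse_done;                    # signal the end of the document

print join( ', ', @names ), "\n";
```

Only the current chunk and the handler's own state need to be held in memory, which is the property that makes this approach useful for very large responses.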

Performance

SOAP::WSDL::Expat::MessageParser is the fastest way of parsing SOAP messages into object trees, and it is only moderately slower than converting them into hash data structures with XML::Simple:

 Parsing a SOAP message with a length of 5962 bytes:

 SOAP::WSDL::Expat::MessageParser:
    3 wallclock secs ( 3.28 usr +  0.05 sys =  3.33 CPU) @ 60.08/s (n=200)

 SOAP::WSDL::SAX::MessageHandler (with raw XML::LibXML):
    5 wallclock secs ( 4.95 usr +  0.00 sys =  4.95 CPU) @ 40.38/s (n=200)

 XML::Simple (XML::Parser):
    3 wallclock secs ( 2.36 usr +  0.03 sys =  2.39 CPU) @ 83.65/s (n=200)

 XML::Simple (XML::SAX::Expat):
    7 wallclock secs ( 6.50 usr +  0.03 sys =  6.53 CPU) @ 30.62/s (n=200)

As the benchmark shows, all SOAP::WSDL parser variants are faster than XML::Simple with XML::SAX::Expat, and SOAP::WSDL::Expat::MessageParser reaches about 72% of the throughput of XML::Simple with XML::Parser as backend (60.08/s vs. 83.65/s).
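The figures above appear to be in the output format of Perl's core Benchmark module. To run a similar comparison on your own messages, a harness along these lines could be used. This is a sketch only: the two candidate subs are hypothetical stand-ins (simple string scans) to be replaced with calls to the parsers you want to compare, and the test message is made up.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical test message; in practice, load a captured SOAP response.
my $xml = '<Envelope><Body>'
    . ( '<Item><Name>x</Name></Item>' x 200 )
    . '</Body></Envelope>';

# Stand-ins only: replace with real parser calls.
my %candidates = (
    'count via regex' => sub { my $n = () = $xml =~ /<Item>/g; $n },
    'count via index' => sub {
        my ( $n, $pos ) = ( 0, 0 );
        while ( ( $pos = index( $xml, '<Item>', $pos ) ) >= 0 ) {
            $n++;
            $pos++;
        }
        $n;
    },
);

# A negative count runs each candidate for at least that many CPU
# seconds; the benchmark above used a fixed count instead (n=200).
my $results = timethese( -1, \%candidates );

# Print a table of relative rates between the candidates.
cmpthese($results);
```

Using a CPU-time budget instead of a fixed iteration count avoids Benchmark's "too few iterations" warning when a candidate is very fast.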

Parsing SOAP responses in chunks does not increase speed, at least not for responses of up to around 500 KB:

 Benchmark: timing 5 iterations of SOAP::WSDL::SAX::MessageHandler,
   SOAP::WSDL::Expat::MessageParser, SOAP::WSDL::Expat::MessageStreamParser...

 SOAP::WSDL::Expat::MessageStreamParser:
    13 wallclock secs ( 7.39 usr +  0.09 sys =  7.48 CPU) @  0.67/s (n=5)

 SOAP::WSDL::Expat::MessageParser:
    10 wallclock secs ( 5.81 usr +  0.06 sys =  5.88 CPU) @  0.85/s (n=5)

 SOAP::WSDL::SAX::MessageHandler:
    14 wallclock secs ( 8.78 usr +  0.03 sys =  8.81 CPU) @  0.57/s (n=5)

 Response size: 344330 bytes