
NAME

XML::Compile::Schema::XmlReader - bricks to translate XML to HASH

INHERITANCE

SYNOPSIS

 my $schema = XML::Compile::Schema->new(...);
 my $code   = $schema->compile(READER => ...);

DESCRIPTION

The translator understands schemas, but does not encode that into actions. This module implements those actions to translate from XML into a (nested) Perl HASH structure.

DETAILS

Processing Wildcards

If you want to collect the information which the any and anyAttribute wildcards in the schema permit in the XML structure, you have to implement that yourself: XML::Compile knows less than you do about the possible data.

anyAttribute

By default, the anyAttribute specification is ignored. When TAKE_ALL is given, all attributes which fulfil the name-space requirement are added to the returned data-structure. The absolute name (like {http://mine}b) is used as key, and the related unparsed XML node as value.

In the current implementation, if an explicit attribute is also covered by the name-spaces permitted by the anyAttribute definition, then it will also appear among those wildcard entries (and hence the handler will be called for it as well).
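A minimal plain-Perl sketch of how an application might separate the entries which TAKE_ALL adds from the ordinary fields. It relies only on what is stated above: wildcard keys have the absolute form {namespace}localname. The helper names split_absolute_name and wildcard_entries are hypothetical, and strings stand in for the unparsed XML nodes.

```perl
use strict;
use warnings;

# Split an absolute name of the form "{namespace}localname" into its parts.
# Returns an empty list when the key is not in that form (a declared field).
sub split_absolute_name {
    my ($key) = @_;
    return $key =~ m/^\{([^}]*)\}(.+)$/ ? ($1, $2) : ();
}

# Collect the wildcard entries from a returned HASH, grouped per namespace.
# Ordinary keys like "a" are skipped; values are passed through unchanged.
sub wildcard_entries {
    my ($data) = @_;
    my %found;
    for my $key (keys %$data) {
        my ($ns, $local) = split_absolute_name($key) or next;
        $found{$ns}{$local} = $data->{$key};
    }
    return \%found;
}

# Usage with data shaped like the TAKE_ALL output shown in this document.
my $wild = wildcard_entries({
    a                => 42,
    '{http://mine}a' => 'node for a',
    '{http://mine}b' => 'node for b',
});
print join(',', sort keys %{ $wild->{'http://mine'} }), "\n";   # prints "a,b"
```

Grouping per namespace keeps the lookup independent of which prefix the document happened to use.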

Use the anyAttribute option of XML::Compile::Schema::compile() to install your own handler and influence this behavior. The handler is called for each attribute and must return a list of key-value pairs with derived information. When the returned list is empty, the attribute data is lost. The values may be complex structures.

example: anyAttribute in XmlReader

Say your schema looks like this:

 <schema targetNamespace="http://mine"
    xmlns:me="http://mine" ...>
   <element name="el">
     <complexType>
       <attribute name="a" type="xs:int" />
       <anyAttribute namespace="##targetNamespace"
          processContents="lax" />
     </complexType>
   </element>
   <simpleType name="non-empty">
     <restriction base="xs:NCName" />
   </simpleType>
 </schema>

Then, in an application, you write:

 use XML::Compile::Util 'pack_type';
 my $r = $schema->compile
  ( READER => pack_type('http://mine', 'el')
  , anyAttribute => 'ALL'
  );
 # or lazy: READER => '{http://mine}el'

 my $h = $r->( <<'__XML' );
   <el xmlns:me="http://mine">
     <a>42</a>
     <b type="me:non-empty">
        everything
     </b>
   </el>
 __XML

 use Data::Dumper 'Dumper';
 print Dumper $h;

The output is something like

 $VAR1 =
  { a => 42
  , '{http://mine}a' => ... # XML::LibXML::Node with <a>42</a>
  , '{http://mine}b' => ... # XML::LibXML::Node with <b>everything</b>
  };

You can improve the reader with a callback. When you know that the extra attribute is always of type non-empty, then you can do

 my $read = $schema->compile
  ( READER => '{http://mine}el'
  , anyAttribute => \&filter
  );

 my $anyAttRead = $schema->compile
  ( READER => '{http://mine}non-empty'
  );

 sub filter($$$$)
 {   my ($fqn, $xml, $path, $translator) = @_;
     return () if $fqn ne '{http://mine}b';
     (b => $anyAttRead->($xml));
 }

 my $h = $read->($xml);   # $xml: the same document as in the previous example
 print Dumper $h;

Which will result in

 $VAR1 =
  { a => 42
  , b => 'everything'
  };

The filter will be called twice (once for a and once for b), but returns nothing in the first case. You can implement any kind of complex processing in the filter.
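When several wildcard attributes need different readers, the filter can be generated from a table. The following is a sketch, not part of XML::Compile: make_attribute_filter and the $trim stand-in reader are hypothetical, and the handler arguments ($fqn, $xml, $path, $translator) are taken from the filter example above.

```perl
use strict;
use warnings;

# Build an anyAttribute handler from a table that maps absolute attribute
# names to [ key, reader ] pairs. The handler receives
# ($fqn, $xml, $path, $translator) and returns key-value pairs; an empty
# list means the attribute is dropped.
sub make_attribute_filter {
    my %table = @_;
    return sub {
        my ($fqn, $xml, $path, $translator) = @_;
        my $entry = $table{$fqn} or return ();
        my ($key, $reader) = @$entry;
        return ($key => $reader->($xml));
    };
}

# Stand-in reader for illustration only; in practice this would be a
# compiled reader such as $anyAttRead from the example above.
my $trim = sub { my ($text) = @_; $text =~ s/^\s+|\s+$//g; $text };

my $filter = make_attribute_filter(
    '{http://mine}b' => [ b => $trim ],
);

my %got = $filter->('{http://mine}b', "  everything\n", '/el', undef);
print "$got{b}\n";    # prints "everything"
```

The generated CODE reference would be passed where \&filter is passed above, as anyAttribute => $filter; unknown attributes fall through to the empty list and are dropped.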

any element

By default, the any definition in a schema causes all elements of the container which are not used by other definitions to be ignored. Also in this case, TAKE_ALL is required to produce any results. SKIP_ALL ignores all results, although the elements are still processed for validation.

The minOccurs and maxOccurs of any are ignored: the number of elements is always unbounded. Therefore, you will get an array of elements back per type.
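A plain-Perl sketch of consuming TAKE_ALL results for an any wildcard. Beyond what is stated above, it assumes that each result is stored under an absolute name {namespace}localname, as for anyAttribute, with an ARRAY reference of unparsed nodes as value; strings stand in for those nodes. The name count_any_elements is hypothetical.

```perl
use strict;
use warnings;

# Count the elements collected per absolute name by an "any" wildcard.
# Assumption: such keys look like "{namespace}localname" and always hold an
# ARRAY reference, because the number of matching elements is unbounded.
sub count_any_elements {
    my ($data) = @_;
    my %count;
    for my $key (keys %$data) {
        next unless $key =~ m/^\{[^}]*\}./;         # skip declared fields
        my $nodes = $data->{$key};
        $count{$key} = ref $nodes eq 'ARRAY' ? scalar @$nodes : 1;
    }
    return \%count;
}

my $counts = count_any_elements({
    id                => 7,
    '{http://other}x' => [ 'node x1', 'node x2' ],
    '{http://other}y' => [ 'node y1' ],
});
printf "%s => %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Even a single matching element arrives wrapped in an array, so code that always iterates over the ARRAY avoids a special case.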

Schema hooks

hooks executed before the XML is processed

The before hook receives an XML::LibXML::Node object and the path string. It must return a new (or the same) XML node, which will be used from then on. It is best to modify a clone of the node, not the original as provided by the user. When undef is returned, the whole node will disappear.

This hook offers a predefined PRINT_PATH.

example: to trace the paths

 $schema->addHook(path => qr/./, before => 'PRINT_PATH');
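Besides the predefined PRINT_PATH, a before hook can be your own code. A minimal sketch of such a hook: drop_debug_before is hypothetical, the "/el/debug" path format is assumed, and passing a CODE reference to addHook is an assumption, since this document only shows the predefined string.

```perl
use strict;
use warnings;

# Before hook: receives the XML node and the path string, and returns the
# node to use from then on. Returning undef makes the whole node disappear.
# This sketch drops elements whose path ends in a hypothetical "debug" step
# and passes all other nodes through unchanged.
sub drop_debug_before {
    my ($node, $path) = @_;
    return undef if $path =~ m{/debug$};
    return $node;   # to change a node, modify a clone ($node->cloneNode(1))
}

# Registration, assuming addHook also accepts a CODE reference for "before":
#   $schema->addHook(path => qr{/debug$}, before => \&drop_debug_before);

for my $path ('/el/item', '/el/debug') {
    my $kept = drop_debug_before("node at $path", $path);
    print "$path: ", (defined $kept ? 'kept' : 'removed'), "\n";
}
```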

hooks executed as replacement

Your replace hook should return a list of key-value pairs. To produce them, it receives the XML::LibXML::Node, the translator settings as HASH, the path, and the local name.

This hook has a predefined SKIP, which will not process the found element, but simply return the string SKIPPED as value. This way, a whole tree of unneeded translations can be avoided.
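A sketch of two replace hooks following the argument order described above. skip_replace imitates SKIP, assuming the local name is used as key; keep_node_replace is a hypothetical variant which keeps the untranslated node. Strings stand in for XML::LibXML nodes.

```perl
use strict;
use warnings;

# Replace hook: receives ($node, $settings, $path, $localname) and returns a
# list of key-value pairs which takes the place of the normal translation.
# This imitates the predefined SKIP: the node is not translated at all.
sub skip_replace {
    my ($node, $settings, $path, $localname) = @_;
    return ($localname => 'SKIPPED');
}

# Hypothetical variant: keep the untranslated node, so the application can
# process that subtree later, or never.
sub keep_node_replace {
    my ($node, $settings, $path, $localname) = @_;
    return ($localname => $node);
}

my %skipped = skip_replace('node', {}, '/el/big', 'big');
my %kept    = keep_node_replace('node', {}, '/el/big', 'big');
print "$skipped{big} / $kept{big}\n";
```

Keeping the node instead of a marker string costs nothing at read time but still lets the application decide later whether that subtree is worth translating.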

hooks for post-processing, after the data is collected

The collected data is passed as the second argument, after the XML node. The third argument is the path. Be careful: the collected data might be a SCALAR (for a simpleType).

This hook also offers a predefined PRINT_PATH. Besides, it has XML_NODE, ELEMENT_ORDER, and ATTRIBUTE_ORDER, which will result in additional fields in the HASH, respectively containing the XML node which was processed, the element names, and the attribute names. The keys start with an underscore (_).
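A sketch of a custom after hook which guards against the SCALAR case mentioned above. It assumes that the value returned by the hook replaces the collected data, and that addHook accepts a CODE reference under an after key; both are assumptions, as is the hypothetical _path field.

```perl
use strict;
use warnings;

# After hook: receives ($node, $data, $path). The data may be a plain scalar
# when the element has a simpleType, so check before treating it as a HASH.
# Assumption: the value returned by the hook replaces the collected data.
sub add_path_after {
    my ($node, $data, $path) = @_;
    return $data unless ref $data eq 'HASH';   # simpleType: leave as is
    return { %$data, _path => $path };         # underscore prefix, as for predefined fields
}

# Registration, assuming a CODE reference is accepted:
#   $schema->addHook(path => qr/./, after => \&add_path_after);

my $simple  = add_path_after('node', 42, '/el/a');
my $complex = add_path_after('node', { a => 42 }, '/el');
print "$simple $complex->{_path} $complex->{a}\n";
```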

SEE ALSO

This module is part of XML-Compile distribution version 0.67, built on February 04, 2008. Website: http://perl.overmeer.net/xml-compile/

LICENSE

Copyrights 2006-2008 by Mark Overmeer. For other contributors see ChangeLog.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://www.perl.com/perl/misc/Artistic.html