XML::Compile::Translate::Reader - translate XML to HASH
XML::Compile::Translate::Reader is a XML::Compile::Translate
  my $schema = XML::Compile::Schema->new(...);
  my $code   = $schema->compile(READER => ...);
The translator understands schemas, but does not encode that into actions. This module implements those actions to translate from XML into a (nested) Perl HASH structure.
Extends "DESCRIPTION" in XML::Compile::Translate.
Extends "METHODS" in XML::Compile::Translate.
Extends "DETAILS" in XML::Compile::Translate.
Extends "Translator options" in XML::Compile::Translate.
If you want to collect information from the XML structure, which is permitted by any and anyAttribute specifications in the schema, you have to implement that yourself. The problem is XML::Compile has less knowledge than you about the possible data.
By default, the anyAttribute specification is ignored. When TAKE_ALL is given, all attributes which fulfill the name-space requirement are added to the returned data-structure. The absolute element name is used as key, with the related unparsed XML element as value.
In the current implementation, if an explicit attribute is also covered by the name-spaces permitted by the anyAttribute definition, then it will also appear in that list (and hence the handler will be called as well).
Use XML::Compile::Schema::compile(any_attribute) to write your own handler, to influence the behavior. The handler will be called for each attribute, and you must return a list of pairs of derived information. When the returned list is empty, the attribute data is lost. The value may be a complex structure.
By default, the any definition in a schema will ignore all elements from the container which are not used. Also in this case, TAKE_ALL is required to produce any results. SKIP_ALL will ignore all results, although these are still processed for validation needs.
By default, the elements which have type "xsd:anyType" will return an XML::LibXML::Element when there are sub-elements. Otherwise, it will return the textual content.
If you pass your own CODE reference, you can change this behavior. It will get called with the path, the node, and the default handler. Be aware that the $node may actually be a string already.
  $schema->compile(READER => ..., any_type => \&handle_any_type);
  sub handle_any_type($$$)
  {   my ($path, $node, $handler) = @_;
      ref $node or return $node;
      $node;
  }
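As a hedged sketch of a handler that does a little more: the variant below returns plain strings untouched and flattens element nodes to their text content. It relies only on the argument order described above and on the textContent method of XML::LibXML nodes. The handler name is made up for illustration, and the default handler argument is deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical any_type handler: always hand back text, never a node.
sub any_type_as_text {
    my ($path, $node, $default_handler) = @_;

    # The reader may already have reduced the value to a string.
    return $node unless ref $node;

    # Otherwise $node is an XML::LibXML::Element; flatten it to its text.
    return $node->textContent;
}

# Registration, assuming $schema is an XML::Compile::Schema:
#   my $reader = $schema->compile(READER => $type,
#       any_type => \&any_type_as_text);
```

Returning text only is a design choice for callers who never want to touch XML::LibXML objects; the structure of any sub-elements is lost.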
[available since 0.86] ComplexType and ComplexContent components can be declared with the mixed="true" attribute. This implies that text is not limited to the content of containers, but may also be used in between elements. Usually, you will only find ignorable white-space between elements.
In this example, the a container is marked to be mixed:
  <a id="5"> before <b>2</b> after </a>
Often the "mixed" option is used in one of two ways: either the element is needed as text, or the element should be parsed and the text ignored. The reader has various options to avoid the need of processing raw XML::LibXML nodes.
[1.00] When the return is a HASH, that HASH will also contain the _MIXED_ELEMENT_MODE key, to help people understand what happens. This is not possible for all modes, only for some.
With XML::Compile::Schema::compile(mixed_elements) set to
ATTRIBUTES (the default)
a HASH is returned, the attributes are processed. The node is found as XML::LibXML::Element with the key '_'. The example above will produce
  $r = { id => 5, _ => $xmlnode };
TEXTUAL
Like the previous, but now the textual representation of the content is returned with key '_'. The example above will produce
  $r = { id => 5, _ => ' before 2 after '};
STRUCTURAL
will remove all mixed-in text, and treat the element as a normal element. The example will be transformed into
  $r = { id => 5, b => 2 };
XML_NODE
return the XML::LibXML::Node itself. The example:
  $r = $xmlnode;
XML_STRING
return the mixed node as XML string, just as in the source. Be warned that it is rather expensive: the string was parsed and then stringified again, which is costly for large nodes. Result:
  $r = '<a id="5"> before <b>2</b> after </a>';
CODE reference
the reference is called with the XML::LibXML::Node as first argument. When a value is returned (even undef), then the right tag with the value will be included in the translator's result. When an empty list is returned by the code reference, then nothing is returned (which may result in an error if the element is required according to the schema).
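The CODE reference variant can be illustrated with a minimal sketch. It assumes only what is stated above (the node arrives as first argument, and an empty list means "nothing") plus the textContent method of XML::LibXML nodes. The function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mixed_elements handler: keep only the text of the mixed
# node, with runs of white-space collapsed to single blanks.
sub mixed_as_clean_text {
    my ($node) = @_;                 # an XML::LibXML::Node

    my $text = $node->textContent;
    $text =~ s/\s+/ /g;              # collapse white-space runs
    $text =~ s/^ | $//g;             # trim a leading and a trailing blank

    # An empty list leaves the element out of the result altogether.
    return length($text) ? $text : ();
}

# Registration, assuming $schema is an XML::Compile::Schema:
#   my $reader = $schema->compile(READER => $type,
#       mixed_elements => \&mixed_as_clean_text);
```

Applied to the a element from the example, this handler would yield the string "before 2 after"; the id attribute is not included, because the handler only looks at text.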
When some of your mixed elements need different behavior from other elements, then you have to use the normal hooks for those specific cases.
A before hook receives an XML::LibXML::Node object and the path string. It must return a new (or the same) XML node, which will be used from then on. It is usually best to modify a clone of the node, not the original as provided by the user. When undef is returned, the whole node will disappear.
This hook offers a predefined PRINT_PATH.
Your replace hook should return a list of key-value pairs. To produce it, the hook receives the XML::LibXML::Element, the translator settings as HASH, the path, and the localname.
This hook has a predefined SKIP, which will not process the found element, but simply return the string "SKIPPED" as value. This way, a whole tree of unneeded translations can be avoided.
[1.51] The predefined hook XML_NODE will not attempt to parse the selected element, but returns the XML::LibXML::Element node instead. This may break on some schema-contained validations.
Sometimes, the schema spec is such a mess that XML::Compile cannot translate it automatically. I have seen cases where confusion over name-spaces is created: a choice between three elements with the same name but different types. In such a case, you may use XML::LibXML::Simple to translate a part of your tree:
  use XML::LibXML::Simple  qw/XMLin/;
  $schema->addHook
    ( action  => 'READER'
    , type    => 'tns:xyz'      # or pack_type($tns,'xyz')
    # path    => qr!/company$!  # by element name
    , replace =>
        sub { my ($xml, $args, $path, $type, $r) = @_;
              ($type => XMLin($xml, ...));
            }
    );
The code reference of an after hook is called with three parameters: the XML node, the data collected, and the path. Be careful: the collected data might be a SCALAR (for simpleType). Return a HASH or a SCALAR. undef may work, unless it is the value of a required element you throw away.
This hook also offers a predefined PRINT_PATH. Besides, it has INCLUDE_PATH, XML_NODE, NODE_TYPE, ELEMENT_ORDER, and ATTRIBUTE_ORDER, which add extra fields to the HASH: respectively the path, the node which was processed (an XML::LibXML::Element), the type of the node, the element names, and the attribute names. The keys of these fields start with an underscore (_).
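A minimal sketch of a hand-written after hook follows, assuming only the three parameters described above. It shows the SCALAR versus HASH distinction; the function name, the _seen_at key, and the 'tns:xyz' type in the registration comment are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical after hook: tidy up whatever the reader collected.
sub tidy_after {
    my ($xml, $data, $path) = @_;

    # Complex content: return a copy with one extra, underscore-prefixed
    # key that records where the data was found.
    if (ref($data) eq 'HASH') {
        return { %$data, _seen_at => $path };
    }

    # Other references and undef are passed through untouched.
    return $data if ref $data || !defined $data;

    # simpleType content arrives as a plain SCALAR: trim white-space.
    $data =~ s/^\s+|\s+$//g;
    return $data;
}

# Registration, assuming $schema is an XML::Compile::Schema:
#   $schema->addHook(action => 'READER', type => 'tns:xyz',
#       after => \&tidy_after);
```

Returning a copy of the HASH, rather than modifying it in place, keeps the hook free of side effects on the data collected by the reader.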
In a typemap, a relation between an XML element type and a Perl class (or object) is made. Each translator back-end will implement this a little differently. This section is about how the reader handles typemaps.
Usually, an XML type will be mapped on a Perl class. The Perl class implements the fromXML method as constructor.
  $schema->addTypemaps($sometype => 'My::Perl::Class');

  package My::Perl::Class;
  ...
  sub fromXML
  {   my ($class, $data, $xmltype) = @_;
      my $self = $class->new($data);
      ...
      $self;
  }
Your method returns the data which will be included in the result tree of the reader. You may return an object, the unmodified $data, or undef. When undef is returned, this may fail the schema parser when the data element is required.
In the simplest implementation, the class stores its data exactly as the XML structure:
  package My::Perl::Class;
  sub fromXML
  {   my ($class, $data, $xmltype) = @_;
      bless $data, $class;
  }

  # The same, even shorter:
  sub fromXML { bless $_[1], $_[0] }
Another option is to implement an object factory: one object which creates other objects. In this case, the $xmltype parameter comes in useful, to have one object spawn many different other objects.
  my $object = My::Perl::Class->new(...);
  $schema->typemap($sometype => $object);

  package My::Perl::Class;
  sub fromXML
  {   # same argument order as for a class: data first, then the type
      my ($object, $data, $xmltype) = @_;
      return Some::Other::Class->new($data);
  }
This object factory may be a very simple solution when you map XML onto objects which are not under your control, where there is no way to add the fromXML method.
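To make the factory idea concrete, here is a hedged sketch in which one factory object dispatches on $xmltype to classes that only offer new(). All package names, the namespace, and the type names are invented for the example; the argument order of fromXML follows the class-based example above.

```perl
use strict;
use warnings;

# Classes assumed to be outside your control: they offer new(), but
# no fromXML().  All names here are made up.
package Third::Party::Address;
sub new { my ($class, $data) = @_; bless { %$data }, $class }

package Third::Party::Person;
sub new { my ($class, $data) = @_; bless { %$data }, $class }

# The factory: one object whose fromXML() picks the class by XML type.
package My::Factory;
sub new {
    my ($class, %map) = @_;
    bless { map => \%map }, $class;
}

sub fromXML {
    my ($self, $data, $xmltype) = @_;
    my $target = $self->{map}{$xmltype}
        or return $data;             # unknown type: pass the data through
    $target->new($data);
}

package main;

my $factory = My::Factory->new(
    '{http://example.com/ns}address' => 'Third::Party::Address',
    '{http://example.com/ns}person'  => 'Third::Party::Person',
);

# Registration, assuming $schema is an XML::Compile::Schema:
#   $schema->typemap($_ => $factory)
#       for '{http://example.com/ns}address',
#           '{http://example.com/ns}person';
```

Keeping the type-to-class table inside the factory object means new mappings can be added without touching the third-party classes.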
The light version of an object factory works with CODE references.
  $schema->typemap($t1 => \&myhandler);

  sub myhandler
  {   my ($backend, $data, $type) = @_;
      return My::Perl::Class->new($data)
          if $backend eq 'READER';
      $data;
  }

  # shorter
  $schema->typemap($t1 => sub {My::Perl::Class->new($_[1])} );
Internally, the typemap is simply translated into an "after" hook for the specific type. After the data was processed via the usual mechanism, the hook will call method fromXML on the class or object you specified with the data which was read. You may still use "before" and "replace" hooks, if you need them.
Syntactic sugar:
  $schema->typemap($t1 => 'My::Package');
  $schema->typemap($t2 => $object);
is comparable to
  $schema->typemap($t1 => sub {My::Package->fromXML(@_)});
  $schema->typemap($t2 => sub {$object->fromXML(@_)} );
with some extra checks.
This module is part of XML-Compile distribution version 1.59, built on December 28, 2017. Website: http://perl.overmeer.net/xml-compile/
Please post questions or ideas to the mailing list at http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/xml-compile . For live contact with other developers, visit the #xml-compile channel on irc.perl.org.
Copyrights 2006-2017 by [Mark Overmeer]. For other contributors see ChangeLog.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://dev.perl.org/licenses/