NAME

XML::Compile::Schema - Compile a schema

INHERITANCE

 XML::Compile::Schema
   is a XML::Compile

SYNOPSIS

 # compile tree yourself
 my $parser = XML::LibXML->new;
 my $tree   = $parser->parse...(...);
 my $schema = XML::Compile::Schema->new($tree);

 # get schema from string
 my $schema = XML::Compile::Schema->new($xml_string);

 # adding schemas
 $schema->addSchemas($tree);
 $schema->importSchema('http://www.w3.org/2001/XMLSchema');
 $schema->importSchema('2001-XMLSchema.xsd');

 # create and use a reader
 my $read   = $schema->compile(READER => '{myns}mytype');
 my $hash   = $read->($xml);
 
 # create and use a writer
 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);

 # show result
 print $xml->toString;

DESCRIPTION

This module collects knowledge about one or more schemas. The most important method is compile() which can create XML file readers and writers based on the schema information and some selected type.

Two implementations use the translator, and more can be added later. Both get created with the compile method.

READER (translate XML to HASH)

The XML reader produces a HASH from an XML::LibXML::Node tree or an XML string. Those represent the input data. The values are checked. An error is produced when a value or the data-structure does not conform to the specs.

Example: create an XML reader

 my $msgin  = $schema->compile(READER => '{myns}mytype');
 my $xml    = $parser->parse_file("some-xml.xml");
 my $hash   = $msgin->($xml);

or

 my $hash   = $msgin->($xml_string);

WRITER (translate HASH to XML)

The writer produces schema compliant XML, based on a HASH. To get the data encoding correct, you are required to pass a document in which the XML nodes may get a place later.

Example: create an XML writer

 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);
 print $xml->toString;
 

alternative

 my $write  = $schema->compile(WRITER => 'myns#myid');

Be warned that the schema itself is NOT VALIDATED; you can easily construct schemas which do work with this module, but are not valid according to W3C. Only in some cases will the translator refuse to accept mistakes: mainly because it cannot produce valid code.

See chapter "DETAILS" and learn how the data is processed.

METHODS

Constructors

$obj->new(TOP, OPTIONS)

Accessors

$obj->addSchemaDirs(DIRECTORIES)

$obj->addSchemas(NODE|TEXT)

    Collect all the schemas defined below the NODE.

$obj->findSchemaFile(FILENAME)

$obj->importSchema(XMLDATA)

    Import (include) the schema information included in the XMLDATA. The XMLDATA must be acceptable for dataToXML().

$obj->knownNamespace(NAMESPACE)

XML::Compile::Schema->knownNamespace(NAMESPACE)

$obj->namespaces

$obj->top

Read XML

$obj->dataToXML(NODE|REF-XML|XML|FILENAME|KNOWN)

$obj->parse(STRING)

$obj->parseFile(FILENAME)

Filters

$obj->walkTree(NODE, CODE)

Compilers

$obj->compile(('READER'|'WRITER'), ELEMENT, OPTIONS)

    Translate the specified ELEMENT into a CODE reference which is able to translate between XML-text and a HASH.

    The ELEMENT is the starting-point for processing in the data-structure. It can either be a global element, or a global type. The NAME must be specified in {url}name format, where the url is the name-space. An alternative is the url#id which refers to an element or type with the specific id attribute value.

    When a READER is created, a CODE reference is returned which needs to be called with parsed XML (an XML::LibXML::Node) or an XML text. Returned is a nested HASH structure which contains the data contained in the XML. When a simple element type is addressed, you will get a single value back.

    When a WRITER is created, a CODE reference is returned which needs to be called with a HASH, and returns a XML::LibXML::Node.

    Most options below are explained in more detail in the manual-page XML::Compile::Schema::Translate.

     Option              --Defined in     --Default
     anyAttribute                           undef
     anyElement                             undef
     attributes_qualified                   <undef>
     check_occurs                           <false>
     check_values                           <true>
     elements_qualified                     <undef>
     ignore_facets                          <false>
     include_namespaces                     <true>
     invalid                                DIE
     namespace_reset                        <false>
     output_namespaces                      {}
     path                                   <expanded name of type>
     sloppy_integers                        <false>

    . anyAttribute CODE

      In general, anyAttribute schema components cannot be handled automatically. If you need to create or process anyAttribute information, then read about wildcards in the DETAILS chapter of the manual-page for the specific back-end.

    . anyElement CODE

      In general, any schema components cannot be handled automatically. If you need to create or process any information, then read about wildcards in the DETAILS chapter of the manual-page for the specific back-end.

    . attributes_qualified BOOLEAN

      When defined, this will overrule the attributeFormDefault flags in all schemas. When not qualified, the xml will not produce nor process prefixes on attributes.

    . check_occurs BOOLEAN

      Whether code will be produced to complain about elements which should or should not appear, and about element counts which are out of bounds. Elements which may have more than one occurrence will still always be represented by an ARRAY.

    . check_values BOOLEAN

      Whether code will be produced to check that the XML fields contain the expected data format.

      Turning this off will speed up the processing significantly, but is (of course) much less safe. Do not turn it off when you expect data from external sources.

    . elements_qualified BOOLEAN

      When defined, this will overrule the elementFormDefault flags in all schemas. When not qualified, the xml will not produce or process prefixes on the elements.

    . ignore_facets BOOLEAN

      Facets influence the formatting and range of values. This does not come cheap, so can be turned off. Affects the restrictions set for a simpleType.

    . include_namespaces BOOLEAN

      Indicates whether the WRITER should include the prefix to namespace translation on the top-level element of the returned tree. If not, you may continue with the same name-space table to combine various XML components into one, and add the namespaces later.

    . invalid 'IGNORE','WARN','DIE',CODE

      What to do with invalid values (ignored when not checking). See invalidsErrorHandler(), which creates this handler.

    . namespace_reset BOOLEAN

      Use the same prefixes in output_namespaces as with some other compiled piece, but reset the counts to zero first.

    . output_namespaces HASH

      Can be used to predefine an output namespace (when 'WRITER') for instance to reserve common abbreviations like soap for external use. Each entry in the hash has as key the namespace uri. The value is a hash which contains uri, prefix, and used fields. Pass a reference to a private hash to catch this index.

    . path STRING

      Prepended to each error report, to indicate the location of the error in the XML-Schema tree.

    . sloppy_integers BOOLEAN

      The decimal and integer types must support at least 18 digits, which is larger than Perl's 32 bit internal integers. Therefore, the implementation will use Math::BigInt objects to handle them. However, often a simple int type would have sufficed, but the XML designer was lazy. A long is much faster to handle. Set this flag to use int as fast (but imprecise) replacements.

      Be aware that Math::BigInt and Math::BigFloat objects are nearly but not fully transparent in mimicking the behavior of Perl's ints and floats. See their respective manual-pages. Especially when you wish for some performance, you should optimize access to these objects to avoid expensive copying, which is exactly the spot where the differences are.
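
The difference between Math::BigInt objects and plain Perl integers can be seen without any schema involved. The following is a minimal sketch using only the core Math::BigInt module; the sample number is an arbitrary value chosen for illustration.

```perl
use strict;
use warnings;
use Math::BigInt;

# A value with more digits than a native integer can hold exactly.
my $big = Math::BigInt->new('123456789012345678901234567890');

# Arithmetic operators are overloaded, so the object mostly behaves like a number.
my $next = $big + 1;
print "$next\n";

# It is still an object, which is where it differs from a plain Perl integer.
print ref($next), "\n";

# Methods such as badd() modify the object in place; copy() first to keep the original.
my $sum = $big->copy->badd(10);
print "$big\n";
print "$sum\n";
```

Code which receives reader output should therefore be prepared to find objects where it might expect plain numbers, unless sloppy_integers is set.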

$obj->elements

    List all elements, defined by all schemas sorted alphabetically.

$obj->invalidsErrorHandler('IGNORE','USE','WARN','DIE',CODE)

    What to do when an error appears during validation? This method translates all string options into a single code reference which is returned. Please use the invalid option of compile(), which will call this method indirectly.

    When IGNORE is specified, the process will ignore the specified value as if it was not specified at all. USE will not complain, and use the value found. With WARN, it will continue with the value but a warning is printed first. On DIE it will stop processing, as will the program (catch it with eval).

    When a CODE reference is specified, that will be called specifying the type path, actual type expected (expanded name), the erroneous value, and an error string.
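
The translation from option to code reference could look roughly like the sketch below. It is not the module's implementation: make_invalid_handler is a hypothetical helper, the argument values are invented, and it is an assumption that the handler returns the value to use (or undef to drop it).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Compile: map an option to a CODE reference.
# Assumption: the handler gets (path, type, value, error) and returns the value
# to use, or undef when the value should be treated as absent.
sub make_invalid_handler {
    my ($option) = @_;
    return $option if ref $option eq 'CODE';

    my %handlers = (
        IGNORE => sub { undef },
        USE    => sub { $_[2] },
        WARN   => sub { my ($path, $type, $value, $err) = @_;
                        warn "$path: $err ($type, value '$value')\n";
                        $value },
        DIE    => sub { my ($path, $type, $value, $err) = @_;
                        die "$path: $err ($type, value '$value')\n" },
    );
    return $handlers{$option} || die "unknown option '$option'\n";
}

# Invented sample arguments: type path, expected type, bad value, error string.
my @call = ('{myns}mytype/count', '{http://www.w3.org/2001/XMLSchema}int',
            'abc', 'not an integer');

my $used    = make_invalid_handler('USE')->(@call);
my $ignored = make_invalid_handler('IGNORE')->(@call);

# DIE stops processing; eval catches it.
my $ok = eval { make_invalid_handler('DIE')->(@call); 1 };
print "caught: $@" unless $ok;
```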

$obj->template('XML'|'PERL', TYPE, OPTIONS)

    WARNING: under development! The implementation is far from complete.

    Schemas can be horribly complex and unreadable. Therefore, this template method can be called to create an example which demonstrates how data of the specified TYPE as XML or Perl is organized in practice.

    Some OPTIONS are explained in XML::Compile::Schema::Translate. There are some extra OPTIONS defined for the final output process.

     Option              --Defined in     --Default
     attributes_qualified                   <undef>
     elements_qualified                     <undef>
     include_namespaces                     <true>
     indent                                 " "
     show                                   ALL

    . attributes_qualified BOOLEAN

    . elements_qualified BOOLEAN

    . include_namespaces BOOLEAN

    . indent STRING

      The leading indentation string per nesting. Must start with at least one blank.

    . show STRING|'ALL'|'NONE'

      A comma separated list of tokens, which explain what kind of comments need to be included in the output. The available tokens are: struct, type, occur, facets. A value of ALL will select all available comments. The NONE or empty string will exclude all comments.

$obj->types

    List all types, defined by all schemas sorted alphabetically.

DETAILS

Addressing components

Normally, external users can only address elements within a schema, and types are hidden to be used by other schemas only. For this reason, it is permitted to create an element and a type with the same name.

The compiler requires a starting-point. This can either be an element name or an element's id. The format of the element name is {url}name, for instance

 {http://library}book

refers to the element book in the name-space http://library. You may also start with

 http://www.w3.org/2001/XMLSchema#float

as long as this ID refers to an element.
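
To make the two notations concrete, here is a minimal sketch which takes such a starting-point string apart. The helper split_address is hypothetical and only illustrates the string formats; the module does its own parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: split a starting-point string into its parts.
sub split_address {
    my ($address) = @_;

    # {url}name form: name-space between braces, local name after it.
    if ($address =~ m/^\{([^}]*)\}(.+)$/) {
        return (form => 'name', namespace => $1, local => $2);
    }

    # url#id form: everything before the last '#' is the name-space.
    if ($address =~ m/^(.+)#([^#]+)$/) {
        return (form => 'id', namespace => $1, id => $2);
    }

    die "cannot interpret address '$address'\n";
}

my %by_name = split_address('{http://library}book');
my %by_id   = split_address('http://www.w3.org/2001/XMLSchema#float');

print "$by_name{namespace} / $by_name{local}\n";
print "$by_id{namespace} / $by_id{id}\n";
```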

Representing data-structures

The code will do its best to produce a correct translation. For instance, an accidental 1.9999 will be converted into 2 when the schema says that the field is an int. It will also strip superfluous blanks when the data-type permits. Especially watch out for the Integer types, which produce Math::BigInt objects unless compile(sloppy_integers) is used.

Elements can be complex, and themselves contain elements which are complex. In the Perl representation of the data, this will be shown as nested hashes with the same structure as the XML.

You do not need to take care of character encodings, because XML::LibXML does that for us: do not escape characters like "<" yourself.

The schemas define kinds of data types. There are various ways to define them (with restrictions and extensions), but that knowledge is not important for the resulting data structure.

simpleType

A single value. A lot of single value data-types are built-in (see XML::Compile::Schema::BuiltInTypes).

Simple types may have range limiting restrictions (facets), which will be checked by default. Types may also have some white-space behavior, for instance blanks are stripped from integers: before, after, but also inside the number representing string.
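
The white-space behavior for integers can be pictured as follows. This is a minimal sketch of the described effect only: strip_integer_blanks is a hypothetical helper, and the module applies its own rules per data-type.

```perl
use strict;
use warnings;

# Assumed normalization for an integer field: drop every blank,
# before, after and inside the digits.
sub strip_integer_blanks {
    my ($text) = @_;
    (my $clean = $text) =~ s/\s+//g;
    return $clean;
}

my $value = strip_integer_blanks("  4 2\n");
print "$value\n";
```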

Example: typical simpleType

In XML, it looks like this:

 <test1>42</test1>

In the HASH structure, the data will be represented as

 test1 => 42

complexType/simpleContent

In this case, the single value container may have attributes. The number of attributes can be endless, and the value is only one. This value has no name, and therefore gets a predefined name _.

Example: typical simpleContent example

In XML, this looks like this:

 <test2 question="everything">42</test2>

As a HASH, this looks like

 test2 => { _ => 42, question => 'everything' }

complexType and complexType/complexContent

These containers not only have attributes, but also multiple values as content. The complexContent is used to create inheritance structures in the data-type definition. This does not affect the XML data package itself.

Example: typical complexType element

The XML could look like:

 <test3 question="everything" by="mouse">
   <answer>42</answer>
   <when>5 billion BC</when>
 </test3>

Represented as HASH, this looks like

 test3 => { question => 'everything', by => 'mouse'
          , answer => 42, when => '5 billion BC' }
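
Once data has this shape, it is accessed like any nested Perl structure. The sketch below builds the three example structures by hand, without running a reader, and picks values out of them.

```perl
use strict;
use warnings;

# The three example representations, combined in one hash for convenience.
my %data = (
    test1 => 42,
    test2 => { _ => 42, question => 'everything' },
    test3 => { question => 'everything', by => 'mouse',
               answer => 42, when => '5 billion BC' },
);

my $simple  = $data{test1};            # simpleType: the value itself
my $content = $data{test2}{_};         # simpleContent: the value is stored under _
my $attr    = $data{test2}{question};  # attributes are ordinary keys
my $child   = $data{test3}{answer};    # complexType: child elements are keys too

print "$simple $content $attr $child\n";
```

Attributes and child elements share one HASH, so their names are not distinguished in the Perl structure.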

Processing elements

A second factor which determines the data-structure is the element occurrence. Usually, elements have to appear once and exactly once on a certain location in the XML data structure. This order is automatically produced by this module. But elements may appear multiple times.

usual case

The default behavior for an element (in a sequence container) is to appear exactly once. When missing, this is an error.

maxOccurs larger than 1

In this case, the element can appear multiple times. Multiple values will be kept in an ARRAY within the HASH. Non-schema based XML processors will not return a single value as an ARRAY, which makes that code more complicated.

An error will be produced when the number of elements found is less than minOccurs or more than maxOccurs, unless compile(check_occurs) is false.
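
The bounds check described here amounts to comparing the ARRAY length against both limits. The following sketch shows the idea only: check_occurs is a hypothetical function, not the code which the module generates, and it assumes an undefined maximum stands for "unbounded".

```perl
use strict;
use warnings;

# Hypothetical check: die when the number of values is out of bounds.
# Assumption: an undefined $max means there is no upper limit.
sub check_occurs {
    my ($name, $values, $min, $max) = @_;
    my $count = @$values;

    die "element $name: found $count, need at least $min\n"
        if $count < $min;
    die "element $name: found $count, allowed at most $max\n"
        if defined $max && $count > $max;

    return $count;
}

# Two values for 'a' are fine when maxOccurs is 2 ...
my $count = check_occurs('a', [12, 13], 1, 2);

# ... but an error when maxOccurs is 1.
my $ok = eval { check_occurs('a', [12, 13], 1, 1); 1 };
print "error: $@" unless $ok;
```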

Example: two values for a

 <test4><a>12</a><a>13</a><b>14</b></test4>

will become

 test4 => { a => [12, 13], b => 14 };

Example: always an array

Even when there is only one element found, it will be returned as ARRAY (of one element). Therefore, you can write

 my $data = $reader->($xml);
 foreach my $a ( @{$data->{a}} ) {...}

use="optional" or minOccurs="0"

The element may be skipped. When found it is a single value.

use="forbidden"

When the element is found, an error is produced.

default="value"

When the XML does not contain the element, the default value is used... but only if this element's container exists. This has no effect on the writer.
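
The condition "only if this element's container exists" can be sketched in plain Perl as below. The helper apply_default and the field names are hypothetical; the sketch only mirrors the rule as stated and is not how the reader is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in a default, but only inside an existing container.
sub apply_default {
    my ($container, $name, $default) = @_;
    return unless ref $container eq 'HASH';   # no container, so no default
    $container->{$name} = $default
        unless exists $container->{$name};
    return $container;
}

my $header = { sender => 'me' };   # container exists, 'version' is missing
my $missing;                       # container itself is absent

apply_default($header,  'version', '1.0');
apply_default($header,  'sender',  'nobody');   # existing value is kept
apply_default($missing, 'lang',    'en');       # nothing happens

print "$header->{version} $header->{sender}\n";
```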

fixed="value"

Produce an error when the value is not present or different (after the white-space rules were applied).

List type

List simpleType objects are also represented as ARRAY, like elements with a minOccurs or maxOccurs unequal to 1.

Example: with a list of ints

  <test5>3 8 12</test5>

as Perl structure:

  test5 => [3, 8, 12]
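
The relation between the text form and the ARRAY is a split on blanks in one direction and a join in the other. A minimal sketch in plain Perl, which leaves out the per-item type checking a reader would add:

```perl
use strict;
use warnings;

# From the XML text content to a Perl list: split on white-space.
my $text  = '3 8 12';
my @items = split ' ', $text;

# And back again: join with single blanks.
my $back  = join ' ', @items;

print scalar(@items), " items: @items\n";
print "$back\n";
```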

substitutionGroup

A substitution group is a kind of choice between alternative (complex) types. However, in this case the roles are reversed: instead of a choice which lists the alternatives, here the alternative elements register themselves as valid for an abstract (head) element. All alternatives should be extensions of the head element's type, but there is no way to check that.

Example: substitutionGroup

 <xs:element name="price"  type="xs:int" abstract="true" />
 <xs:element name="euro"   type="xs:int" substitutionGroup="price" />
 <xs:element name="dollar" type="xs:int" substitutionGroup="price" />

 <xs:element name="product">
   <xs:complexType>
     <xs:sequence>
       <xs:element name="name" type="xs:string" />
       <xs:element ref="price" />
     </xs:sequence>
   </xs:complexType>
 </xs:element>
 

Now, valid XML data is

 <product>
   <name>Ball</name>
   <euro>12</euro>
 </product>

and

 <product>
   <name>Ball</name>
   <dollar>6</dollar>
 </product>

The HASH representation is respectively

 product => {name => 'Ball', euro  => 12}
 product => {name => 'Ball', dollar => 6}
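
Code which consumes such a HASH has to find out which alternative is present. A minimal sketch, assuming the list of alternatives is known to the caller; price_of is a hypothetical helper and the two products are the examples from above.

```perl
use strict;
use warnings;

# The elements which registered themselves for the 'price' head element.
my @price_alternatives = qw(euro dollar);

# Hypothetical helper: return (currency, amount) for the alternative found.
sub price_of {
    my ($product) = @_;
    for my $currency (@price_alternatives) {
        return ($currency, $product->{$currency})
            if exists $product->{$currency};
    }
    return;
}

my ($cur1, $amount1) = price_of({ name => 'Ball', euro   => 12 });
my ($cur2, $amount2) = price_of({ name => 'Ball', dollar => 6  });

print "$amount1 $cur1\n";
print "$amount2 $cur2\n";
```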

Wildcards

The any and anyAttribute elements are referred to as wildcards: they specify groups of elements and attributes which can be used, instead of being explicit.

The author of this module advises against the use of wildcards in schemas, because the purpose of schemas is to be explicit and that basic idea is simply thrown away by these wildcards. Let people cleanly extend the schema with inheritance! If you use a standard schema which facilitates these wildcards, then please do not use them!

Because wildcards are not explicit about the types to expect, the XML::Compile module cannot prepare for them automatically. However, as a user of the schema you probably know better about the possible contents of these fields. Therefore, you can translate that knowledge into code explicitly. Read about the processing of wildcards in the manual page for each of the back-ends, because it is different in each case.

DIAGNOSTICS

Error: cannot find pre-installed name-space files

Use $ENV{SCHEMA_LOCATION} or new(schema_dirs) to express location of installed name-space files, which came with the XML::Compile distribution package.

Error: don't known how to interpret XML data

Error: no XML data specified

SEE ALSO

This module is part of XML-Compile distribution version 0.14, built on January 30, 2007. Website: http://perl.overmeer.net/xml-compile/

LICENSE

Copyrights 2006-2007 by Mark Overmeer. For other contributors see ChangeLog.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://www.perl.com/perl/misc/Artistic.html
