NAME

XML::XForms::Validate - Perl extension for validation of XForms submissions

SYNOPSIS

  use XML::XForms::Validate qw(validate);
  
  # For method="post":
  $result = validate(input => $filename, xforms => $file, base => '../instances', model => 'form2');
  die $result if !ref($result);
  
  # For method="get", method="urlencoded-post" or method="form-data-post":
  $result = validate(input => \%parameters, xforms => \$xml_string);
  die $result if !ref($result);
  
  # OO usage:
  my $validator = XML::XForms::Validate->new(xforms => \$xml_string, model => $model, base => $base);
  $result = $validator->validate(input => $input);
  die $result if !ref($result);
  $result = $validator->normalize($validator->validate(input => $input2));
  die $result if !ref($result);

DESCRIPTION

This module validates input data against an XML document containing one or more XForms models. It is able to process all serializations except multipart/related, relying on pre-parsed data for multipart/form-data or application/x-www-form-urlencoded.

Usage is rather simple: Supply input data (usually a submitted XML instance), an XML document containing one or more XForms models, and possibly some optional arguments. The return value is a hash of validated (and possibly modified) result DOM trees, one entry per original instance, or an error message string if validation failed.

Since XForms is a sufficiently complex standard to make perfect validation of submission data impossible in the general case, some assumptions must be made. Most forms should work fine, but it is possible (and easy, if you know how) to create forms that yield submissions which are rejected as invalid. Likewise, there are some constructions which can allow invalid submissions to pass as valid. These limitations are documented in "VALIDATION", so please read that section carefully.

RATIONALE

In a networked scenario, XForms is a client-side technology. Having a Perl module may seem a bit useless, since Perl is usually used on the server side. On the other hand, everyone knows that user input should always be validated, but client-side validation is inherently untrusted.

There are several options for server-side validation of XML data, for example XML Schema or RelaxNG/Schematron. This module, in contrast, tries to deduce the allowed modifications directly from the XForms document that was used to build the input. It makes life easier for simple forms that do not warrant a full-blown XML Schema document. Most importantly, it is able to perform additional checks that are impossible with standalone schema validation, like readonly value enforcement and calculation result checks.

VALIDATION

The submitted data is checked, and a result instance is built according to the following rules. Only if all checks succeed will the submitted instance be declared valid. Note that if a model item property relies on content of a non-relevant instance node, behaviour is undefined, since non-relevant nodes are not submitted.

Comparison to the original instance, relevant MIP check

The element tree must be equal to the original instance. If there are more nodes than in the original, validation fails. If nodes are missing, they are copied from the original instance to the result instance. For these added nodes, the relevant model item property must evaluate to false. If any added nodes are relevant, validation fails. If any non-added nodes are non-relevant, validation fails.

Only elements and attributes are checked (actually, their localName and namespaceURI). Text content is checked later, and all other nodes are ignored.

xforms:insert and xforms:delete are not processed, which means that instances that contain more or fewer elements due to these actions are regarded as invalid, even though it may be valid to create such instances.
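The comparison rules above can be sketched in plain Perl. This is an illustration only, not the module's implementation: it assumes each instance has been flattened to a list of node paths and that the relevant property has already been evaluated for every original node. The function name compare_structure and the data layout are hypothetical.

```perl
use strict;
use warnings;

# Illustrative only. %$original maps node path => relevant flag (1 or 0),
# @$submitted lists the node paths present in the submission.
sub compare_structure {
    my ($original, $submitted) = @_;
    my %seen = map { $_ => 1 } @$submitted;

    # Nodes that do not exist in the original instance are never accepted.
    for my $path (@$submitted) {
        return "unexpected node: $path" unless exists $original->{$path};
    }

    my @restored;
    for my $path (sort keys %$original) {
        if ($seen{$path}) {
            # Submitted (non-added) nodes must be relevant.
            return "non-relevant node submitted: $path"
                unless $original->{$path};
        }
        else {
            # Missing nodes are copied back, but only if non-relevant.
            return "relevant node missing: $path" if $original->{$path};
            push @restored, $path;
        }
    }

    # Array reference of restored paths on success, error string on failure.
    return \@restored;
}
```

The return convention mirrors the one used by validate: a reference signals success, a plain string carries the error message.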

readonly nodes, unreferenced nodes

If a node is read-only in both the original and the submitted instance, it will be reset to the original value. Validation continues, as the node might have been non-readonly at some time during user interaction. Otherwise, modification is allowed freely. Instance nodes not referenced by any form control or setvalue action are treated as readonly.

This step may alter whitespace-only text nodes in some rare cases, since some guessing is involved when non-relevant nodes are present.

Note that readonly checks may not work correctly if binding expressions reference text nodes directly (instead of their parent elements).
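As a rough sketch of the readonly rule, the value that ends up in the result instance could be chosen as follows. The function name result_value and its keys are hypothetical, and the sketch assumes that unreferenced nodes are reset in the same way as nodes that are readonly on both sides.

```perl
use strict;
use warnings;

# Illustrative only. Hypothetical keys:
#   original, submitted             - text content before and after submission
#   readonly_before, readonly_after - readonly MIP in the original and
#                                     in the submitted instance
#   referenced                      - true if a form control or setvalue
#                                     action refers to the node
sub result_value {
    my (%node) = @_;
    my $locked = !$node{referenced}
        || ($node{readonly_before} && $node{readonly_after});

    # Locked nodes are silently reset; validation does not fail here.
    return $locked ? $node{original} : $node{submitted};
}
```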

required, constraint, calculate and type model item properties

Only relevant nodes are checked in this step. Validation fails:

  • if the string length of any required node's text content is zero

  • if any node's constraint model item property evaluates to false()

  • if any node's calculate model item property evaluates to a value different than that node's text content

  • if any node's text content isn't valid according to the type model item property

For type, only the built-in data types specified in section 5 of the XForms specification are supported. Even this support is incomplete; see XML::Schema::Type::Builtin. xsi:type attributes are not checked.
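The four failure conditions listed above can be pictured as a per-node check. The following is a minimal sketch, not the module's code: check_node and its keys are hypothetical, and the constraint and calculate expressions are assumed to have been evaluated already.

```perl
use strict;
use warnings;

# Illustrative only. Hypothetical keys:
#   text       - the node's text content
#   required   - true if the required MIP evaluates to true
#   constraint - evaluated constraint MIP (omit the key if there is none)
#   calculated - evaluated calculate MIP (undef if there is none)
#   type_check - optional code ref returning true if the text is valid
sub check_node {
    my (%node) = @_;
    my $text = defined $node{text} ? $node{text} : '';

    return 'required value is empty'
        if $node{required} && length($text) == 0;
    return 'constraint is false'
        if exists $node{constraint} && !$node{constraint};
    return 'calculate result differs'
        if defined $node{calculated} && $node{calculated} ne $text;
    return 'invalid for type'
        if $node{type_check} && !$node{type_check}->($text);

    return undef;    # all checks passed
}
```

Only relevant nodes would be passed to such a check, matching the rule stated at the start of this step.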

XML Schema validation

Schema documents may be specified by using the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes on the root node of each original instance. Each instance is validated using its own XML Schema(s).

If the schema option is given, the given XML Schema will be used to validate the submitted data. No result instance is built, and none of the above checks are done. This is useful if the above assumptions and limitations reject valid documents. This can happen if the XForms document uses scripting, expressions that rely on non-relevant nodes, or certain combinations of XForms Actions. On success, the submission data is returned as a DOM tree.

METHODS AND FUNCTIONS

new(%options)

Creates a new validator object which contains preprocessed data structures. Thus, OO usage will need less processing time if multiple validations against one XForms model are done.
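A minimal sketch of reusing one validator for several submissions follows. The helper name validate_all is hypothetical; it only relies on the documented validate(input => ...) method and on the convention that a reference means success and a plain string is an error message.

```perl
use strict;
use warnings;

# Validate several submissions with one preprocessed validator object.
# $validator is expected to be an XML::XForms::Validate instance, but any
# object with a compatible validate(input => ...) method will do.
sub validate_all {
    my ($validator, @inputs) = @_;
    my (@results, @errors);

    for my $input (@inputs) {
        my $result = $validator->validate(input => $input);
        if (ref $result) {
            push @results, $result;    # hash of result DOM trees
        }
        else {
            push @errors, $result;     # plain error message string
        }
    }
    return (\@results, \@errors);
}
```

The validator would be built once, for example with XML::XForms::Validate->new(xforms => $file, model => $model), and then passed to the helper together with all submitted inputs.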

validate(%options)

Performs the actual validation. Returns a hash of XML::LibXML::Document objects on success (keyed by instance id, with the empty key '' for the default instance), or a plain string containing an error message in English. Since validation errors are not supposed to occur with well-behaved XForms clients, no way to localize these messages is provided.

May be called as function or object method.
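The return value can be unpacked as sketched below. The helper name instances_or_die is hypothetical, and the sketch assumes the success value is a hash reference, as the !ref($result) check in the SYNOPSIS suggests.

```perl
use strict;
use warnings;

# Split a validate() result into the default instance and the ids of
# all named instances, or die with the error message.
sub instances_or_die {
    my ($result) = @_;
    die "XForms validation failed: $result\n" unless ref($result) eq 'HASH';

    my $default = $result->{''};                         # default instance
    my @named   = grep { $_ ne '' } sort keys %$result;  # other instance ids
    return ($default, @named);
}
```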

normalize($dom, $keep_extra_namespaces)

Normalize an XML::LibXML::Document (or a hash as returned by validate) by converting it (all of them) to its canonicalized form and stripping anything that is not an element, attribute, text node, or namespace node. It will strip nodes in the XInclude namespace. It will also strip namespace nodes that are unused unless you specify a true value as second parameter. It will return a new XML::LibXML::Document (or hash, respectively). The original DOM tree will be left unmodified.

The result should not contain any security-relevant or unexpected content anymore so that it is safe for further processing.

May be called as function or object method, and as a convenience, it will pass through strings unmodified.

OPTIONS

Behavior of the validator is controlled via named options.

For OO usage, the constructor takes the xforms, model and base options. These are ignored on the validate method call.

xforms

An XML document that contains at least one xforms:model element. The value is interpreted like this:

  • A plain scalar is taken as file name to parse as XML.

  • A scalarref is taken as reference to an XML string.

  • A GLOB or IO::Handle is taken as file handle to parse as XML.

  • An XML::LibXML::Document object is used as-is.
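The interpretation rules above amount to a dispatch on the kind of value passed. The following sketch shows one way such a dispatch could look; classify_source is a hypothetical name and this is not necessarily how the module detects the type.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Illustrative only: name the category a value of the xforms option
# would fall into according to the rules listed above.
sub classify_source {
    my ($value) = @_;
    return 'undefined' unless defined $value;

    if (blessed $value) {
        return 'document'    if $value->isa('XML::LibXML::Document');
        return 'file handle' if $value->isa('IO::Handle');
    }

    my $type    = reftype($value);
    my $is_glob = ref(\$value) eq 'GLOB';    # bare glob such as *STDIN

    return 'file handle' if $is_glob || (defined $type && $type eq 'GLOB');
    return 'file name'   if !defined $type;  # plain scalar
    return 'XML string'  if $type eq 'SCALAR';
    return 'unsupported';
}
```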

input

The submitted instance. Input type is autodetected using these rules:

  • A plain scalar is taken as file name to parse as XML.

  • A scalarref is taken as reference to an XML string.

  • A GLOB or IO::Handle is taken as file handle to parse as XML.

  • An XML::LibXML::Document object is used as-is.

  • A hashref is taken as a hash of parsed POST/GET parameters. Values may be arrayrefs if a parameter was submitted multiple times.

  • An arrayref is taken as a list of [ name => $value ] arrayref pairs, with multiple occurrences of name permitted. The list may instead be flattened.

The latter two data types are used for multipart/form-data and application/x-www-form-urlencoded serializations. Note that rebuilding the instance from these involves a certain amount of guessing. If any element local-name occurs more than once in the submitted instance, correct association of submitted values with DOM nodes may fail.
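To make the two parameter shapes concrete, the sketch below converts a hashref of parsed parameters (with arrayref values for repeated names) into the list-of-pairs form. The helper name pairs_from_params is hypothetical. Names are sorted only to make the example deterministic; a real request parser would normally preserve submission order.

```perl
use strict;
use warnings;

# Convert { name => 'Ann', tag => ['a', 'b'] } into
# [ [ name => 'Ann' ], [ tag => 'a' ], [ tag => 'b' ] ].
sub pairs_from_params {
    my ($params) = @_;
    my @pairs;

    for my $name (sort keys %$params) {
        my $value  = $params->{$name};
        my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
        push @pairs, [ $name => $_ ] for @values;
    }
    return \@pairs;
}
```

Either structure can then be passed as the input option, for example validate(input => pairs_from_params(\%parameters), xforms => $file).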

The other data types assume text/xml serialization. multipart/related is currently unsupported.

base

A base URL for external references. Relative URLs are resolved as per the xml:base specification. This is only used for the src attribute of xforms:instance elements. For security reasons, no external DTD subsets, external entities or XIncludes are processed.

model

The model id to use, in case there are multiple models in the XForms file. If not specified, the first model in document order is used.

The contained instances (including those specified via the src attribute) are considered trusted. External references might be retrieved and XML Schema information is honoured (except when noted otherwise). Never use unchecked user input as original instance data!

submission

The id of a submission element that was used to submit the input. If not given, the first submission element is used.

instance

Override for instance data. If given and defined, the value is interpreted similar to the xforms option. The default xforms:instance node in the model is replaced by the resulting XML data.

If a hashref is given, keys are instance IDs to replace, and the corresponding values are processed as above.

schema

An XML Schema document that will be used for schema validation of the submitted instance instead of the usual checks. Value is a URL or file name relative to the current working directory.

SECURITY

Since validation is inherently about security, there are a few measures to allow this module to be used with potentially untrusted input:

  • Submitted input is considered untrusted: No DTD or XInclude processing is done, consequently no external entity references are resolved. No network access is allowed except for parsing the XForms document.

  • The readonly check semantics make sure that nodes that carry a constant readonly model item property are in fact unmodified, e.g. for immutable document IDs. Note that this may interfere with script-based modification of the instance data, which can't be detected. The XForms Action module is mostly accounted for, however.

  • The input document is checked as described above. This means that despite validation, there can be additional namespace declarations, processing instructions, comments, CDATA sections instead of text nodes, unresolved entity references, internal subsets and possibly more things you wouldn't expect. As a convenience, a normalize utility function is provided, which tries to ensure no content is present which could compromise security.

  • XML Schema validation can make sure that the result honours constraints not expressed in the XForms document. The schema parameter even allows bypassing the usual checks and relying solely on this.

  • Various information is taken from the XForms document, hence it is considered trusted, including any referenced instance data. This is particularly important if you incorporate submitted and validated data into your data storage: always normalize or postprocess.

XForms validation has some inherent limitations. It is difficult to associate original instance nodes with their corresponding submitted instance nodes, especially for text nodes. Furthermore, submissions do not contain non-relevant nodes, thus part of the DOM tree is guessed. See "VALIDATION" above for a detailed description of checks and their individual limitations.

EXPORT

None by default.

The validate and normalize functions can be imported on request. Both can be used as standalone functions or as object methods.

KNOWN BUGS / TODO

  • Construction of instance data in case none was specified isn't 100% standards conformant. For some highly unlikely forms, this may lead to rejection of valid submissions.

  • Nesting of form controls that belong to different models may lead to undefined behaviour (nodes interpreted as readonly even though they aren't).

  • multipart/related is unsupported

  • Currently, XML::Schema is used for data type support, which isn't terribly complete. When XML::LibXML gets a binding for libxml's XML Schema Datatypes implementation, it will be used instead.

  • RelaxNG validation.

  • More XForms Action processing to allow and verify added/deleted nodesets.

  • Check whether calculate processing is too strict and should recalculate the whole document instead (according to the XForms rules).

  • xsi:type and xsi:nil processing

SEE ALSO

The XForms 1.0 specification.

XML::LibXML and http://www.libxml.org for supported features, especially regarding XML Schema validation (which isn't complete as of writing this documentation).

XML::Schema for supported data types.

AUTHOR

Jörg Walter, <info@syntax-k.de>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by Jörg Walter

This library is free software; you can redistribute it and/or modify it under the same terms as Perl version 5.8.0 itself.
