XML::Filter::DOMFilter::LibXML - SAX Filter allowing DOM processing of selected subtrees


  use XML::LibXML;
  use XML::Filter::DOMFilter::LibXML;

  my $filter = XML::Filter::DOMFilter::LibXML->new(
        Handler => $handler,
        XPathContext => XML::LibXML::XPathContext->new(),
        Process => [
                    '/foo[@A="aaa"]/*/bar'    => \&process_bar,
                    'baz[parent::*/@B="bbb"]' => \&process_baz
                   ],
  );

  my $parser = XML::SAX::YourFavoriteDriver->new( Handler => $filter );

  # Some DOM processing

  sub process_bar {
    my ($node)=@_;
    my $doc=$node->ownerDocument;
    $node->appendTextChild("note","hallo world!");
  }

  sub process_baz {
    my ($node)=@_;
    # DOM processing of the matched element goes here
  }

This module provides a compromise between SAX and DOM processing: it lets you use the DOM API on reasonably small parts of an XML document while the rest is streamed. It works as a SAX filter that temporarily builds small DOM trees around the parts selected by the given XPath expressions (with some limitations, see "LIMITATIONS").

The filter has two states, which are referred to as A and B here. The initial state of the filter is A.

In state A, only a limited vertical portion of the DOM tree is built. All SAX events other than start_element are immediately passed to Handler. On a start_element event, a new element node is created in the DOM tree and any existing siblings of the newly created node are removed. Thus, while in state A, there is exactly one node on every level of the tree. All the XPath expressions are then checked in the context of the newly created node. If none of the expressions matches, the filter remains in state A and passes the start_element event to Handler. Otherwise, the callback associated with the first expression that matched is remembered and the filter changes its state to B.

In state B, the filter builds a complete DOM subtree of the new element from the incoming events. No events are passed to Handler at this stage. When the subtree is complete (i.e. the corresponding end-tag is encountered), the callback associated with the XPath expression that matched is executed. The root element of the subtree is passed to the callback subroutine as the only argument.

The callback may perform any DOM operations on the subtree, and may even replace it with one or more new subtrees. The callback must, however, leave the element's parent node and all its ancestor nodes intact. Failing to do so can result in an error or unpredictable results.

When the callback returns, all subtrees that now appear in the DOM tree under the original element's parent are serialized to SAX events and passed to Handler. After that, they are deleted from the DOM tree and the filter returns to state A.
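
The following is a minimal sketch of a callback that replaces the matched element with several new subtrees while leaving its parent untouched. The callback name split_csv and the element names list, csv and item are hypothetical, and the demonstration runs on an ordinary DOM tree built with XML::LibXML rather than inside the filter.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical callback: replace the matched element with one <item>
# element per comma-separated value found in its text content.
sub split_csv {
    my ($node) = @_;
    my $doc    = $node->ownerDocument;
    my $parent = $node->parentNode;
    for my $value (split /,/, $node->textContent) {
        my $item = $doc->createElement('item');
        $item->appendText($value);
        $parent->insertBefore($item, $node);
    }
    # Detach only the matched element; its parent and ancestors stay intact.
    $node->unbindNode;
}

# Stand-alone demonstration on an ordinary DOM tree (no SAX involved).
my $doc = XML::LibXML->load_xml(string => '<list><csv>a,b,c</csv></list>');
my ($csv) = $doc->findnodes('/list/csv');
split_csv($csv);
print $doc->documentElement->toString, "\n";
# prints <list><item>a</item><item>b</item><item>c</item></list>
```

Used as a Process callback, a subroutine of this shape would presumably cause the three new item subtrees, rather than the original element, to be serialized to Handler.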


Note that this type of processing severely limits the amount of information the XPath engine can use. Most notably, elements cannot be selected by their content. The only information present in the tree at the time of the XPath evaluation is the element's name and attributes and the same information for all its ancestors. Nothing is known about the element's child nodes or about its position among its siblings at the time the XPath expressions are evaluated.
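
The effect of these limitations can be illustrated on a hand-built tree that mimics what the filter holds in state A right after a start_element event: one node per level, with attributes but no children. This is only a sketch. The helper selects is hypothetical, and it assumes that "matches" means the new node is among the nodes selected by the expression, which may differ from how the module tests expressions internally.

```perl
use strict;
use warnings;
use XML::LibXML;

# Mimic state A just after start_element for <bar>: ancestors and
# attributes are present, but <bar> has no children and no siblings.
my $doc = XML::LibXML->load_xml(
    string => '<foo A="aaa"><x><bar type="t"/></x></foo>');
my ($bar) = $doc->findnodes('/foo/x/bar');

# Hypothetical helper. Assumption: an expression "matches" when the
# new node is among the nodes it selects.
sub selects {
    my ($xpath) = @_;
    return scalar grep { $_->isSameNode($bar) } $doc->findnodes($xpath);
}

for my $xpath (
    '/foo[@A="aaa"]/*/bar',     # names and attributes of ancestors: usable
    '/foo/*/bar[@type="t"]',    # attributes of the element itself: usable
    '/foo/*/bar[note]',         # child content is not known yet: never matches
    '/foo/*/bar[2]',            # sibling position is not known: never matches
) {
    printf "%-24s %s\n", $xpath, selects($xpath) ? 'matches' : 'no match';
}
```

The first two expressions select the element, while the last two cannot, even if the complete document would later contain a note child or an earlier bar sibling.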


This filter is built upon the XML::LibXML::SAX::Builder module.


new() is the constructor for this object. It takes several parameters, some of which are optional.

         Handler => $handler,
         XPathContext => $xpath_context,
         Process => [ XPath => Code, XPath => Code, ... ]

Handler - Optional output SAX handler.

XPathContext - Optional XML::LibXML::XPathContext object to be used for XPath queries. It can be useful in some cases, for example because it allows registering namespace prefixes.

Process - Required. An array reference of the form [ XPath => Code, XPath => Code, ...] where XPath is a string containing an XPath expression and Code is a callback CODE reference.
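
As a sketch of why an XPathContext with registered prefixes can matter, the following evaluates the same path with and without a prefix against a document that uses a default namespace. The namespace URI http://example.org/doc, the prefix d and the element names are made up for the illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Register a prefix for a (hypothetical) namespace URI.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs(d => 'http://example.org/doc');

my $doc = XML::LibXML->load_xml(
    string => '<doc xmlns="http://example.org/doc"><para/></doc>');

# Elements in a namespace are only reachable through a registered prefix.
my @with_prefix    = $xpc->findnodes('/d:doc/d:para', $doc);
my @without_prefix = $xpc->findnodes('/doc/para', $doc);

print scalar(@with_prefix), " ", scalar(@without_prefix), "\n";   # prints "1 0"
```

Passing such an object as XPathContext => $xpc to the constructor should then allow the expressions given in Process to use the d: prefix.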




Petr Pajas


XML::LibXML, XML::LibXML::SAX, XML::LibXML::XPathContext.
