
NAME

XML::XPath::XMLParser - The default XML parsing class that produces a node tree

SYNOPSIS

        my $parser = XML::XPath::XMLParser->new(
                                filename => $self->get_filename,
                                xml => $self->get_xml,
                                ioref => $self->get_ioref,
                                parser => $self->get_parser,
                        );
        my $root_node = $parser->parse;

DESCRIPTION

This module generates a node tree for use as the context node for XPath processing. It aims to be a quick parser, nothing fancy, and yet it has to store more information than most parsers. To achieve this I've used array refs everywhere - no hashes. I don't have any performance figures for the speedups achieved, but I make no apologies to anyone not used to using arrays instead of hashes. I think they make good sense here, where we know the attributes of each type of node.

Node Structure

All nodes have the same first 3 entries in the array: node_type, node_parent and node_pos. The node_type entry contains a string saying what type the current node is. The node_parent entry always holds the parent of the current node, except for the root node, which has undef there. The node_pos entry is the position of this node in the array that contains it (think: $node == $node->[node_parent]->[node_children]->[$node->[node_pos]] ).
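The position relationship above can be checked on a hand-built tree. The following is only a sketch: it does not load this module, the numeric values of the index constants are an assumption (they simply follow the order in which the entries are listed in this document), and position_is_consistent is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed index constants: the names come from the text, the values follow
# the order in which the entries are listed in this document.
use constant {
    node_type     => 0,
    node_parent   => 1,
    node_pos      => 2,
    node_children => 4,
};

# Hand-built root node with two text children (no parser involved).
my $root = [ 'element', undef, undef, undef, [] ];
for my $text ('first', 'second') {
    my $siblings = $root->[node_children];
    push @$siblings, [ 'text', $root, scalar @$siblings, $text ];
}

# Hypothetical helper: true when the node sits in the slot of its parent's
# child array that its own node_pos entry names. The root has no parent,
# so it passes trivially.
sub position_is_consistent {
    my ($node) = @_;
    my $parent = $node->[node_parent];
    return 1 unless defined $parent;
    return $parent->[node_children]->[ $node->[node_pos] ] == $node ? 1 : 0;
}

for my $child (@{ $root->[node_children] }) {
    printf "%s at position %d: %s\n", $child->[3], $child->[node_pos],
        position_is_consistent($child) ? 'consistent' : 'inconsistent';
}
```

Comparing two array refs with == compares their addresses, which is what the identity check in the text asks for.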

Nodes are structured as follows:

Root Node

The root node is just an element node with no parent.

        [ 'element', # node_type
          undef, # node_parent - check for undef to identify root node
          undef, # node_pos
          undef, # node_prefix
          [ ... ], # node_children (see below)
        ]
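Because the root is identified by its undef parent, a node can be traced back to the root by following node_parent links. This is a minimal sketch under the same assumption about index values (taken from the listing order); is_root and root_of are hypothetical helper names, not functions provided by this module.

```perl
use strict;
use warnings;

# Assumed index values, following the order of the entries listed above.
use constant {
    node_type     => 0,
    node_parent   => 1,
    node_children => 4,
};

# Hypothetical helper: a root is an element node whose parent is undef.
sub is_root {
    my ($node) = @_;
    return $node->[node_type] eq 'element' && !defined $node->[node_parent];
}

# Hypothetical helper: climb node_parent links until there is no parent.
sub root_of {
    my ($node) = @_;
    $node = $node->[node_parent] while defined $node->[node_parent];
    return $node;
}

# Hand-built tree: root -> <doc> -> text
my $root = [ 'element', undef, undef, undef, [] ];
my $doc  = [ 'element', $root, 0, undef, [], 'doc', [], [] ];
my $text = [ 'text', $doc, 0, 'hello' ];
push @{ $root->[node_children] }, $doc;
push @{ $doc->[node_children] },  $text;

print "found the root\n" if is_root( root_of($text) );
```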

Element Node

        [ 'element', # node_type
          $parent, # node_parent
          <position in current array>, # node_pos
          'xxx', # node_prefix - namespace prefix on this element
          [ ... ], # node_children
          'yyy', # node_name - element name
          [ ... ], # node_attribs - attributes on this element
          [ ... ], # node_namespaces - namespaces currently in scope
        ]

Attribute Node

        [ 'attribute', # node_type
          $parent, # node_parent - the element node
          <position in current array>, # node_pos
          'xxx', # node_prefix - namespace prefix on this attribute
          'href', # node_key - attribute name
          'ftp://ftp.com/', # node_value - value in the node
        ]
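With this layout, finding an attribute by name is a linear scan of the element's node_attribs array. The sketch below builds the nodes by hand; the index values are assumed from the listing order in this document, and attribute_value is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed index values, following the order of the entries listed above.
use constant {
    node_children => 4,   # element nodes
    node_attribs  => 6,   # element nodes
    node_key      => 4,   # attribute nodes
    node_value    => 5,   # attribute nodes
};

# Hypothetical helper: return the value of the named attribute, or nothing
# (undef in scalar context) when the element has no such attribute.
sub attribute_value {
    my ($element, $name) = @_;
    for my $attr (@{ $element->[node_attribs] || [] }) {
        return $attr->[node_value] if $attr->[node_key] eq $name;
    }
    return;
}

# Hand-built tree: root -> <a href="ftp://ftp.com/"/>
my $root = [ 'element', undef, undef, undef, [] ];
my $link = [ 'element', $root, 0, undef, [], 'a', [], [] ];
push @{ $root->[node_children] }, $link;
push @{ $link->[node_attribs] },
    [ 'attribute', $link, 0, undef, 'href', 'ftp://ftp.com/' ];

my $href = attribute_value($link, 'href');
print "href = $href\n";
```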

Namespace Nodes

Each element has an associated set of namespace nodes that are currently in scope. Each namespace node stores a prefix and the expanded name (retrieved from the xmlns:prefix="..." attribute).

        [ 'namespace',
          $parent,
          <pos>,
          'a', # node_prefix - the namespace as it was written as a prefix
          'http://my.namespace.com', # node_expanded - the expanded name.
        ]
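Since each element carries the namespace nodes in scope, resolving a prefix only needs the element itself. The following sketch assumes index values that follow the listing order in this document; expanded_name_for is a hypothetical helper name, and the tree is built by hand rather than parsed.

```perl
use strict;
use warnings;

# Assumed index values, following the order of the entries listed above.
use constant {
    node_prefix     => 3,   # element and namespace nodes
    node_children   => 4,   # element nodes
    node_expanded   => 4,   # namespace nodes
    node_namespaces => 7,   # element nodes
};

# Hypothetical helper: map a prefix to its expanded name using the
# namespace nodes in scope on the given element.
sub expanded_name_for {
    my ($element, $prefix) = @_;
    for my $ns (@{ $element->[node_namespaces] || [] }) {
        return $ns->[node_expanded] if $ns->[node_prefix] eq $prefix;
    }
    return;
}

# Hand-built tree: root -> <a:item xmlns:a="http://my.namespace.com"/>
my $root = [ 'element', undef, undef, undef, [] ];
my $item = [ 'element', $root, 0, 'a', [], 'item', [], [] ];
push @{ $root->[node_children] }, $item;
push @{ $item->[node_namespaces] },
    [ 'namespace', $item, 0, 'a', 'http://my.namespace.com' ];

my $uri = expanded_name_for($item, $item->[node_prefix]);
print "a:item is in $uri\n";
```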

Text Nodes

        [ 'text',
          $parent,
          <pos>,
          'This is some text' # node_text - the text in the node
        ]

Comment Nodes

        [ 'comment',
          $parent,
          <pos>,
          'This is a comment' # node_comment
        ]

Processing Instruction Nodes

        [ 'pi',
          $parent,
          <pos>,
          'target', # node_target
          'data', # node_data
        ]

Functions

There are a couple of utility functions in here, located here because this is where the specific knowledge of the node structure is.

as_string($node)

When passed a node this will correctly dump out XML that corresponds to that node. (Actually, that's not strictly true: if you pass it anything other than an element node then it won't be proper XML at all.) It should do all the appropriate escaping, etc.
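To illustrate what dumping a node with escaping involves, here is a simplified serializer over the array layout described above. It is not this module's as_string: sketch_as_string and its helpers are hypothetical names, the index values are assumed from the listing order, and the sketch does not emit namespace declarations.

```perl
use strict;
use warnings;

# Assumed index values, following the order of the entries listed above.
use constant {
    node_type   => 0, node_prefix  => 3, node_children => 4,
    node_name   => 5, node_attribs => 6,
    node_key    => 4, node_value   => 5,   # attribute nodes
    node_text   => 3, node_comment => 3,   # text and comment nodes
    node_target => 3, node_data    => 4,   # processing instruction nodes
};

# Escape the characters that are special in character data.
sub escape_text {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Attribute values are written in double quotes, so escape those too.
sub escape_attr {
    my ($s) = @_;
    $s = escape_text($s);
    $s =~ s/"/&quot;/g;
    return $s;
}

# Join prefix and local name when a prefix is present.
sub qname {
    my ($prefix, $local) = @_;
    return ( defined($prefix) && length($prefix) ) ? "$prefix:$local" : $local;
}

sub sketch_as_string {
    my ($node) = @_;
    my $type = $node->[node_type];
    return escape_text( $node->[node_text] )        if $type eq 'text';
    return '<!--' . $node->[node_comment] . '-->'   if $type eq 'comment';
    return '<?' . $node->[node_target] . ' ' . $node->[node_data] . '?>'
        if $type eq 'pi';
    return '' unless $type eq 'element';

    my $body = join '', map { sketch_as_string($_) } @{ $node->[node_children] || [] };
    return $body unless defined $node->[node_name];   # root: children only

    my $tag   = qname( $node->[node_prefix], $node->[node_name] );
    my $attrs = '';
    for my $attr (@{ $node->[node_attribs] || [] }) {
        $attrs .= ' ' . qname( $attr->[node_prefix], $attr->[node_key] )
                . '="' . escape_attr( $attr->[node_value] ) . '"';
    }
    return length($body) ? "<$tag$attrs>$body</$tag>" : "<$tag$attrs/>";
}

# Hand-built tree: root -> <para title="Q&A"> with a text and a comment child
my $root = [ 'element', undef, undef, undef, [] ];
my $para = [ 'element', $root, 0, undef, [], 'para', [], [] ];
push @{ $root->[node_children] }, $para;
push @{ $para->[node_attribs] }, [ 'attribute', $para, 0, undef, 'title', 'Q&A' ];
push @{ $para->[node_children] },
    [ 'text', $para, 0, '1 < 2' ], [ 'comment', $para, 1, 'note' ];

print sketch_as_string($para), "\n";
# prints: <para title="Q&amp;A">1 &lt; 2<!--note--></para>
```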

string_value($node)

This returns the "string-value" of a node, as per the spec. It probably doesn't need to be used by anyone except people developing XPath routines.
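For reference, XPath 1.0 defines the string-value of a root or element node as the concatenation of all descendant text nodes in document order; for the other node types it is the node's own content (attribute value, namespace URI, comment text, or processing instruction data). The sketch below applies that rule to the array layout described above. It is not this module's string_value: sketch_string_value is a hypothetical name and the index values are assumed from the listing order.

```perl
use strict;
use warnings;

# Assumed index values, following the order of the entries listed above.
use constant {
    node_type     => 0,
    node_children => 4,
};

# For leaf-like node types the string-value lives in a single slot:
# node_text, node_comment, node_value, node_expanded and node_data.
my %value_slot = (
    text      => 3,
    comment   => 3,
    attribute => 5,
    namespace => 4,
    pi        => 4,
);

sub sketch_string_value {
    my ($node) = @_;
    my $type = $node->[node_type];
    return $node->[ $value_slot{$type} ] if exists $value_slot{$type};

    # Root and element nodes: concatenate descendant text in document
    # order, skipping comment and processing instruction children.
    return join '',
        map  { sketch_string_value($_) }
        grep { $_->[node_type] eq 'element' || $_->[node_type] eq 'text' }
        @{ $node->[node_children] || [] };
}

# Hand-built tree: root -> <p>Hello <b>big</b><!--x--> world</p>
my $root = [ 'element', undef, undef, undef, [] ];
my $p    = [ 'element', $root, 0, undef, [], 'p', [], [] ];
my $b    = [ 'element', $p,    1, undef, [], 'b', [], [] ];
push @{ $root->[node_children] }, $p;
push @{ $b->[node_children] },    [ 'text', $b, 0, 'big' ];
push @{ $p->[node_children] },
    [ 'text', $p, 0, 'Hello ' ], $b,
    [ 'comment', $p, 2, 'x' ], [ 'text', $p, 3, ' world' ];

print sketch_string_value($root), "\n";
# prints: Hello big world
```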

NOTICES

This file is distributed as part of the XML::XPath module, and is copyright 2000 Fastnet Software Ltd. Please see the documentation for the module as a whole for licensing information.