NAME

XML::PugiXML - Perl binding for pugixml C++ XML parser

SYNOPSIS

use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>');

my $root = $doc->root;
my $item = $root->child('item');
print $item->text, "\n";           # Hello
print $item->attr('id')->value, "\n";  # 1

# XPath
my $node = $doc->select_node('//item[@id="1"]');
print $node->text, "\n";

# Compiled XPath (faster for repeated queries)
my $xpath = $doc->compile_xpath('//item');
my @items = $xpath->evaluate_nodes($root);

# Modification
my $new = $root->append_child('item');
$new->set_text('World');
$new->set_attr('id', '2');  # creates the attribute or updates it
$doc->save_file('output.xml');

# Formatting options
print $doc->to_string("  ", XML::PugiXML::FORMAT_INDENT());

# Node cloning
my $copy = $root->append_copy($item);

DESCRIPTION

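XML::PugiXML provides a Perl interface to the pugixml C++ XML parsing library. It offers fast DOM parsing, tree navigation and modification, XPath 1.0 queries (including precompiled expressions), and serialization with configurable formatting.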

METHODS

XML::PugiXML (Document)

new()

Create a new empty XML document.

load_file($path, $parse_options?)

Load and parse XML from a file. Returns true on success; on failure returns false and sets $@ to an error message. Optional $parse_options (default PARSE_DEFAULT).

load_string($xml, $parse_options?)

Parse XML from a string. Returns true on success; on failure returns false and sets $@ to an error message. Optional $parse_options (default PARSE_DEFAULT).

save_file($path, $indent?, $flags?)

Save the document to a file. Returns true on success; on failure returns false and sets $@ to an error message. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).

to_string($indent?, $flags?)

Serialize the document to an XML string. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
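A minimal round-trip sketch using only the methods above: parse a string, save it to a file, and load it back, checking each return value and reading $@ on failure. The sample document, the file name config.xml, and the use of the core File::Temp module for a throwaway directory are illustrative choices, not part of this module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::PugiXML;

# Illustrative location: a temporary directory removed at program exit
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/config.xml";

my $doc = XML::PugiXML->new;
$doc->load_string('<config><db host="localhost" port="5432"/></config>')
    or die "Parse failed: $@";

# Write with the default indent ("\t") and FORMAT_DEFAULT
$doc->save_file($path) or die "Save failed: $@";

# Read the file back into a fresh document
my $copy = XML::PugiXML->new;
$copy->load_file($path) or die "Load failed: $@";

my $db   = $copy->root->child('db');
my $host = $db->attr('host')->value;
my $port = $db->attr('port')->value;
print "$host:$port\n";    # localhost:5432
```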

reset()

Clear the document, removing all nodes. Existing Node and Attr handles become invalid after this call.

root()

Return the document element, i.e. the top-level element (e.g. <root>). Note that Node's root() differs: it returns the document node.

child($name)

Get a direct child by name.

select_node($xpath)

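Evaluate an XPath expression against the document and return the first matching node.

select_nodes($xpath)

Evaluate an XPath expression against the document and return all matching nodes as a list.

Note: XPath expressions that select attributes (e.g. //@id) are not supported: select_node returns undef and select_nodes returns an empty list for them. Only element nodes are returned.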

compile_xpath($xpath)

Compile an XPath expression for repeated use. Returns an XML::PugiXML::XPath object.
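The sketch below contrasts select_nodes (all matches), select_node (first match), and a compiled query evaluated against the document element. The sample library document is illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<lib><book lang="en">Alpha</book><book lang="de">Beta</book>'
  . '<book lang="en">Gamma</book></lib>'
) or die "Parse failed: $@";

# All matches, in document order
my @english = map { $_->text } $doc->select_nodes('//book[@lang="en"]');
print "@english\n";    # Alpha Gamma

# First match only
my $german = $doc->select_node('//book[@lang="de"]');
print $german->text, "\n";    # Beta

# Compile once, then evaluate relative to the document element
my $query = $doc->compile_xpath('book');
my @all   = $query->evaluate_nodes($doc->root);
print scalar(@all), "\n";    # 3
```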

Format Constants

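FORMAT_DEFAULT()

Default formatting. In pugixml this is the same as FORMAT_INDENT; the indentation string itself is the separate $indent argument ("\t" by default).

FORMAT_INDENT()

Indent output: nested elements are placed on separate lines, indented by one $indent string per nesting level.

FORMAT_NO_DECLARATION()

Omit the default XML declaration (<?xml version="1.0"?>) that is otherwise written when the document does not contain one.

FORMAT_RAW()

No formatting (compact output): no indentation and no line breaks other than those in the document's own text.

FORMAT_WRITE_BOM()

Write a byte order mark (BOM) at the start of the output.

Format flags can be combined with bitwise OR, e.g. FORMAT_INDENT() | FORMAT_NO_DECLARATION().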
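A short sketch of combining format flags with bitwise OR, assuming (as pugixml does) that the constants are plain integer bit flags passed through unchanged. It produces compact output and two-space indented output, both without the XML declaration.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

# Compact output with no <?xml ...?> line
my $raw = $doc->to_string('',
    XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION());
print "$raw\n";    # <root><item id="1">Hello</item></root>

# Two-space indentation, still without the declaration
my $pretty = $doc->to_string('  ',
    XML::PugiXML::FORMAT_INDENT() | XML::PugiXML::FORMAT_NO_DECLARATION());
print $pretty;
```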

Parse Constants

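PARSE_DEFAULT()

Default parsing options: CDATA sections are kept, character/entity references are expanded, and line endings are normalized. Comments, processing instructions, the XML declaration and the DOCTYPE are not added to the tree.

PARSE_MINIMAL()

Minimal parsing (fastest): only elements and PCDATA are added to the tree, and no escape expansion or end-of-line normalization is performed.

PARSE_PI()

Add processing instructions to the tree.

PARSE_COMMENTS()

Add comments to the tree.

PARSE_CDATA()

Add CDATA sections to the tree.

PARSE_WS_PCDATA()

Keep PCDATA nodes that consist only of whitespace (they are discarded by default).

PARSE_ESCAPES()

Expand character references (e.g. &#65;) and the predefined entity references (&lt;, &gt;, &amp;, &apos;, &quot;).

PARSE_EOL()

Normalize line endings: CR LF pairs and standalone CR characters become LF.

PARSE_DECLARATION()

Add the XML declaration (<?xml ...?>) to the tree as a declaration node.

PARSE_DOCTYPE()

Add the DOCTYPE to the tree as a doctype node.

PARSE_FULL()

PARSE_DEFAULT plus PARSE_PI, PARSE_COMMENTS, PARSE_DECLARATION and PARSE_DOCTYPE. Whitespace-only PCDATA is still discarded; add PARSE_WS_PCDATA to keep it.

Parse flags can be combined with bitwise OR, e.g. PARSE_DEFAULT() | PARSE_COMMENTS().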
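Because PARSE_DEFAULT does not add comments to the tree, comments are lost unless requested. This sketch parses the same illustrative string twice, once with the defaults and once with PARSE_COMMENTS OR-ed in, and inspects the first child of the document element.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<root><!-- note --><item>x</item></root>';

# PARSE_DEFAULT: the comment is not added to the tree
my $plain = XML::PugiXML->new;
$plain->load_string($xml) or die "Parse failed: $@";
my $first_plain = $plain->root->first_child;
print $first_plain->name, "\n";    # item

# Add PARSE_COMMENTS to keep comment nodes
my $full = XML::PugiXML->new;
$full->load_string($xml,
    XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_COMMENTS())
    or die "Parse failed: $@";
my $first_full = $full->root->first_child;
print $first_full->type, "\n";     # 5 (comment)
print $first_full->value, "\n";    # " note " (surrounding spaces kept)
```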

XML::PugiXML::Node

name(), value(), text()

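Get the node's name, value, or text content. An element's text lives in a child PCDATA node, so for <item>Hello</item> text() returns "Hello" while the element itself has no value; value() is meaningful for PCDATA, CDATA, comment and PI nodes.

type()

Return the node type as an integer. Values: 0=null, 1=document, 2=element, 3=pcdata, 4=cdata, 5=comment, 6=pi, 7=declaration, 8=doctype.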
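A small sketch of the element/PCDATA split described above: the element reports type 2 and exposes its text through text(), while its first child is the PCDATA node (type 3) that actually holds the string as its value.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<item>Hello</item>') or die "Parse failed: $@";

my $item   = $doc->root;          # element node
my $pcdata = $item->first_child;  # PCDATA node holding "Hello"

printf "%s: type=%d text=%s\n", $item->name, $item->type, $item->text;
# item: type=2 text=Hello
printf "pcdata: type=%d value=%s\n", $pcdata->type, $pcdata->value;
# pcdata: type=3 value=Hello
```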

path($delimiter?)

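Return the absolute path to this node, built from node names joined by the delimiter (e.g. /root/item). Default delimiter is '/'. The path has no positional predicates, so it does not distinguish same-named siblings.

hash()

Return a hash value identifying the underlying node. Two handles that refer to the same node return the same value, which makes it useful for identity comparison.

offset_debug()

Return the offset of this node in the original parsed source (for debugging). pugixml returns -1 when no offset is available, e.g. for nodes added after parsing.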
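The sketch below shows both points from the entries above on an illustrative document: same-named siblings share one path(), while hash() tells whether two separately obtained handles point at the same node.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item>a</item><item>b</item></root>')
    or die "Parse failed: $@";

my $first  = $doc->select_node('//item');
my $second = $first->next_sibling('item');

# Both items have the same path: no positional predicates
print $first->path, "\n";     # /root/item
print $second->path, "\n";    # /root/item

# A second handle to the first <item>, obtained another way
my $same = $doc->root->child('item');
print $first->hash == $same->hash   ? "same\n" : "different\n";  # same
print $first->hash == $second->hash ? "same\n" : "different\n";  # different
```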

valid()

Return true if this is a valid node handle.

root()

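Return the document node (type=1) from any node. Note: this returns the document node, not the document element. With default parse options, $node->root->first_child is the document element; if the declaration, comments or processing instructions were parsed (e.g. with PARSE_FULL), first_child may return one of those instead, whereas $node->select_node('/*') returns the document element.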
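A sketch of the pitfall described above: with PARSE_FULL the XML declaration becomes the document node's first child, so first_child returns a declaration node (type 7), while the XPath /* still reaches the document element. The sample string is illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<?xml version="1.0"?><root><item/></root>';
my $doc = XML::PugiXML->new;
# PARSE_FULL keeps the declaration as a node in the tree
$doc->load_string($xml, XML::PugiXML::PARSE_FULL())
    or die "Parse failed: $@";

my $item = $doc->select_node('//item');
my $top  = $item->root;                       # document node
print $top->type, "\n";                       # 1
print $top->first_child->type, "\n";          # 7 (declaration, not <root>)
print $item->select_node('/*')->name, "\n";   # root
```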

parent()

Get parent node.

first_child(), last_child()

Get first or last child node.

next_sibling($name?), previous_sibling($name?)

Get next or previous sibling. Optionally filter by name.

child($name)

Return the first child node with the given name.

children($name?)

Return list of child nodes, optionally filtered by name.

find_child_by_attribute($tag, $attr_name, $attr_value)

Return the first child named $tag whose attribute $attr_name equals $attr_value.
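A sketch of the navigation methods above on an illustrative user list: filtering children by name, skipping a non-matching sibling with next_sibling($name), and looking up a child by attribute value.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<users><user id="1" name="ann"/><group/><user id="2" name="bob"/></users>'
) or die "Parse failed: $@";
my $users = $doc->root;

# Only <user> children; <group> is skipped by the name filter
my @names = map { $_->attr('name')->value } $users->children('user');
print "@names\n";    # ann bob

# next_sibling('user') jumps over the <group> element
my $next = $users->child('user')->next_sibling('user');
print $next->attr('id')->value, "\n";    # 2

# Look up a child by attribute value
my $bob = $users->find_child_by_attribute('user', 'id', '2');
print $bob->attr('name')->value, "\n";   # bob
```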

Attributes

attr($name)

Get attribute by name.

attrs()

Return list of all attributes.

set_attr($name, $value)

Set an attribute's value, creating the attribute if it does not exist. Returns the attribute.

append_attr($name), prepend_attr($name)

Add a new attribute with the given name at the end or beginning of the node's attribute list.

remove_attr($name)

Remove an attribute by name. Returns true on success.
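A sketch of the attribute methods above: set_attr overwrites an existing attribute or creates a new one, attrs lists what is present, and remove_attr deletes by name. Names are sorted before printing so the example does not depend on attribute order.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<item id="1"/>') or die "Parse failed: $@";
my $item = $doc->root;

$item->set_attr('id', '42');     # overwrites the existing attribute
$item->set_attr('qty', '3');     # creates a new attribute
my @names = sort map { $_->name } $item->attrs;
print "@names\n";                # id qty

$item->remove_attr('qty') or die "qty was not removed";
my @left = $item->attrs;
print scalar(@left), "\n";                # 1
print $item->attr('id')->value, "\n";     # 42
```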

Modification

append_child($name), prepend_child($name)

Add a new child element with the given name as the last or first child.

insert_child_before($name, $ref_node), insert_child_after($name, $ref_node)

Insert a new child element named $name before or after $ref_node, which must be a child of this node.

append_copy($source), prepend_copy($source)

Append or prepend a deep copy of $source (its attributes and all descendants) as the last or first child.

insert_copy_before($source, $ref), insert_copy_after($source, $ref)

Insert a deep copy of $source before or after $ref, which must be a child of this node.

append_cdata($content)

Add a CDATA section with the given content.

append_comment($content)

Add a comment node with the given content.

append_pi($target, $data?)

Append a processing instruction; for example, append_pi('target', 'data') produces <?target data?>.

remove_child($node)

Remove a child node. Returns true on success.

set_name($name), set_value($value), set_text($text)

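Set the node's name, value, or text content. To change an element's text, use set_text rather than set_value, since the text is stored in a child PCDATA node.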
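A sketch that builds a small list with the modification methods above and serializes it compactly. It assumes, as a labelled assumption not stated in this documentation, that insert_child_before and insert_child_after return the new node in the same way append_child does in the SYNOPSIS; the element names are illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<list/>') or die "Parse failed: $@";
my $list = $doc->root;

my $mid = $list->append_child('item');
$mid->set_text('b');

# Assumption: insert_child_* return the new node, like append_child
my $head = $list->insert_child_before('item', $mid);
$head->set_text('a');
my $tail = $list->insert_child_after('item', $mid);
$tail->set_text('c');

$list->append_copy($head);            # deep copy of <item>a</item> at the end
$list->append_comment(' done ');
$list->remove_child($mid) or die "remove_child failed";

my $xml = $doc->to_string('',
    XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION());
print "$xml\n";
# <list><item>a</item><item>c</item><item>a</item><!-- done --></list>
```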

XPath

select_node($xpath)

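Evaluate XPath with this node as the context node and return the first matching node. Relative expressions such as item or .//item are resolved from this node; expressions starting with / or // still search the whole document.

select_nodes($xpath)

Evaluate XPath with this node as the context node and return all matching nodes as a list.

Note: XPath expressions that select attributes (e.g. //@id) are not supported: select_node returns undef and select_nodes returns an empty list for them. Only element nodes are returned.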
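A sketch of the context rule for node-level XPath: from the <b> element, .//item finds only its own descendants, while //item still searches the whole document. The sample structure is illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<root><a><item>1</item></a><b><item>2</item><item>3</item></b></root>'
) or die "Parse failed: $@";
my $section = $doc->root->child('b');

my @inside = $section->select_nodes('.//item');   # descendants of <b> only
my @global = $section->select_nodes('//item');    # whole document
print scalar(@inside), " ", scalar(@global), "\n";   # 2 3

print $section->select_node('item')->text, "\n";    # 2
```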

XML::PugiXML::Attr

name(), value()

Get attribute name and value.

as_int(), as_uint()

Get value as 32-bit signed/unsigned integer.

as_llong(), as_ullong()

Get value as 64-bit signed/unsigned integer.

as_double()

Get value as floating-point number.

as_bool()

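Get value as boolean. pugixml treats a value as true when its first character is 1, t, T, y or Y (e.g. "true", "1", "yes"); other values, including "on", are false.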

set_value($value)

Set attribute value.

valid()

Return true if this is a valid attribute handle.
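A sketch of the typed accessors and set_value on an illustrative configuration element. Only values whose conversion is unambiguous are used ("yes" starts with y, so as_bool is true).

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<cfg port="8080" ratio="0.75" debug="yes"/>')
    or die "Parse failed: $@";
my $cfg = $doc->root;

my $port  = $cfg->attr('port')->as_int;      # 8080
my $ratio = $cfg->attr('ratio')->as_double;  # 0.75
my $debug = $cfg->attr('debug')->as_bool;    # true

# Update through the attribute handle, then read it back
my $attr = $cfg->attr('port');
$attr->set_value('9090');
print $cfg->attr('port')->as_int, "\n";      # 9090
```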

XML::PugiXML::XPath (Compiled Queries)

evaluate_node($context_node)

Evaluate XPath and return single node result.

evaluate_nodes($context_node)

Evaluate XPath and return list of nodes.

Note: evaluate_node and evaluate_nodes only return element nodes. Attribute-selecting XPath expressions return undef/empty list.

evaluate_string($context_node)

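Evaluate XPath and return the result converted to a string (XPath string() rules).

evaluate_number($context_node)

Evaluate XPath and return the result converted to a number, e.g. for count() or sum() expressions.

evaluate_boolean($context_node)

Evaluate XPath and return the result converted to a boolean; a node-set result is true if it is non-empty.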
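A sketch of the intended use of compiled queries: each expression is compiled once and then evaluated against several context nodes, here one per cart in an illustrative shop document. Attributes are used only inside sum() and string(), whose results are numbers and strings rather than attribute nodes.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<shop><cart id="c1"><item price="2"/><item price="3"/></cart>'
  . '<cart id="c2"><item price="4"/></cart></shop>'
) or die "Parse failed: $@";

# Compile each expression once
my $total  = $doc->compile_xpath('sum(item/@price)');
my $label  = $doc->compile_xpath('string(@id)');
my $pricey = $doc->compile_xpath('item[@price > 3]');

# ...then evaluate it against different context nodes
my %summary;
for my $cart ($doc->root->children('cart')) {
    my $id  = $label->evaluate_string($cart);
    my $sum = $total->evaluate_number($cart);
    my $has = $pricey->evaluate_boolean($cart) ? 1 : 0;
    $summary{$id} = [ $sum, $has ];
    printf "%s total=%g pricey=%s\n", $id, $sum, $has ? 'yes' : 'no';
}
# c1 total=5 pricey=no
# c2 total=4 pricey=yes
```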

ERROR HANDLING

Parse and save operations return false on failure and set $@ with an error message. XPath syntax errors throw exceptions via croak().

# Parse errors - check return value
my $ok = $doc->load_string('<bad>');
if (!$ok) {
    warn "Parse failed: $@";
}

# XPath errors - use eval
eval { $doc->select_node('[invalid'); };
if ($@) {
    warn "XPath error: $@";
}

MEMORY MODEL

Node and attribute handles keep the parent document alive through reference counting. You can safely use a node after the document variable goes out of scope:

my $node;
{
    my $doc = XML::PugiXML->new;
    $doc->load_string('<root><item/></root>');
    $node = $doc->root->child('item');
}
# $node is still valid here

PERFORMANCE

Benchmarked against XML::LibXML (100-5000 element documents):

Parsing:          8-12x faster
XPath queries:    2-13x faster
Tree traversal:   15-17x faster
DOM modification: 2-11x faster
Serialization:    2-4x faster

See bench/benchmark.pl for details.

SECURITY

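This module uses pugixml, which does not process DTDs: external entities are never resolved, and only character references and the five predefined entities are expanded. This makes it safe against XXE and entity-expansion ("billion laughs") attacks.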

THREAD SAFETY

This module is not thread-safe. Each thread should use its own document instances.

AUTHOR

vividsnow

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.