NAME

XML::PugiXML - Perl binding for the pugixml C++ XML parser

SYNOPSIS

use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>');

my $root = $doc->root;
my $item = $root->child('item');
print $item->text, "\n";           # Hello
print $item->attr('id')->value, "\n";  # 1

# XPath
my $node = $doc->select_node('//item[@id="1"]');
print $node->text, "\n";

# Compiled XPath (faster for repeated queries)
my $xpath = $doc->compile_xpath('//item');
my @items = $xpath->evaluate_nodes($root);

# Modification
my $new = $root->append_child('item');
$new->set_text('World');
$new->set_attr('id', '2');  # Convenience method
$doc->save_file('output.xml');

# Formatting options
print $doc->to_string("  ", XML::PugiXML::FORMAT_INDENT());

# Node cloning
my $copy = $root->append_copy($item);

DESCRIPTION

XML::PugiXML provides a Perl interface to the pugixml C++ XML parsing library. It offers fast parsing, XPath support, and a clean API. All string inputs are automatically upgraded to UTF-8, and all outputs are UTF-8 flagged.

METHODS

XML::PugiXML (Document)

new()

Create a new empty XML document.

load_file($path, $parse_options?)

Load and parse XML from a file. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).

load_string($xml, $parse_options?)

Parse XML from a string. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
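
A minimal sketch of passing parse options to load_string(). It assumes the PARSE_* constants are bit flags that can be combined with |, as they are in pugixml itself; the text above does not state this for the binding.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<root><!-- note --><item>Hello</item></root>';
my $doc = XML::PugiXML->new;

# Assumption: PARSE_* constants are bit flags (as in pugixml), so
# comment parsing can be switched on in addition to the defaults.
my $options = XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_COMMENTS();

$doc->load_string($xml, $options)
    or die "Parse failed: $@";

# With comments enabled, the first child of <root> should be the
# comment node; type() lets you check what kind of node it is.
my $first = $doc->root->first_child;
print "first child type: ", $first->type, "\n";
```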

save_file($path, $indent?, $flags?)

Save the document to a file. Returns true on success. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).

to_string($indent?, $flags?)

Serialize the document to an XML string. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
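
The $indent and $flags arguments might be used as in the sketch below. Combining two FORMAT_* constants with | is an assumption carried over from pugixml, where formatting flags are bit masks.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

# Indented output using two spaces instead of the default tab.
print $doc->to_string("  ", XML::PugiXML::FORMAT_INDENT());

# Compact output: no indentation or line breaks are added.
print $doc->to_string("", XML::PugiXML::FORMAT_RAW()), "\n";

# Assumption: flags are bit masks, so compact output without the
# XML declaration can be requested by OR-ing two constants.
my $flags = XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION();
print $doc->to_string("", $flags), "\n";
```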

reset()

Clear the document, removing all nodes. Existing Node and Attr handles become stale: accessing them will croak with "Stale node/attribute handle". Use valid() to check without croaking. The same applies after load_file() or load_string() replaces the document content.
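
A short sketch of guarding against stale handles after reset(), using only the behaviour described above: valid() for a non-croaking check, and eval to trap the croak from any other accessor.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item/></root>')
    or die "Parse failed: $@";

my $item = $doc->root->child('item');

# After reset() the handle obtained above is stale.
$doc->reset;

# valid() reports the state without croaking.
if ($item->valid) {
    print "item is still usable: ", $item->name, "\n";
} else {
    print "item handle is stale\n";
}

# Any other accessor croaks on a stale handle, so wrap it in eval
# when the handle's state is not known.
my $name = eval { $item->name };
warn "Caught: $@" unless defined $name;
```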

root()

Return the document element (root node).

child($name)

Get a direct child by name.

select_node($xpath)

Execute XPath query, return single result. Returns an XML::PugiXML::Node or XML::PugiXML::Attr depending on the query.

select_nodes($xpath)

Execute XPath query, return list of results. Returns a mix of XML::PugiXML::Node and XML::PugiXML::Attr objects as appropriate.
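
Because a single query can yield both nodes and attributes, callers may need to tell the two apart. The sketch below uses isa() against the class names given above; the sample document and the union query are illustrative only.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item><item id="2">World</item></root>')
    or die "Parse failed: $@";

# The union selects the <item> elements and their id attributes,
# so the result list contains both Node and Attr objects.
my @results = $doc->select_nodes('//item | //item/@id');

for my $result (@results) {
    if ($result->isa('XML::PugiXML::Attr')) {
        printf "attribute %s=%s\n", $result->name, $result->value;
    } else {
        printf "element <%s> with text '%s'\n", $result->name, $result->text;
    }
}
```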

compile_xpath($xpath)

Compile an XPath expression for repeated use. Returns an XML::PugiXML::XPath object.

Format Constants

FORMAT_DEFAULT()

Default formatting (indent with tabs).

FORMAT_INDENT()

Indent output.

FORMAT_NO_DECLARATION()

Omit XML declaration.

FORMAT_RAW()

No formatting (compact output).

FORMAT_WRITE_BOM()

Write BOM (byte order mark).

Parse Constants

PARSE_DEFAULT()

Default parsing options.

PARSE_MINIMAL()

Minimal parsing (fastest, no comments/PI/DOCTYPE).

PARSE_PI()

Parse processing instructions.

PARSE_COMMENTS()

Parse comments.

PARSE_CDATA()

Parse CDATA sections.

PARSE_WS_PCDATA()

Preserve whitespace-only PCDATA nodes.

PARSE_ESCAPES()

Parse character/entity references.

PARSE_EOL()

Normalize end-of-line characters.

PARSE_DECLARATION()

Parse XML declaration.

PARSE_DOCTYPE()

Parse DOCTYPE.

PARSE_FULL()

Full parsing (all features enabled).

XML::PugiXML::Node

name(), value(), text()

Get node name, value, or text content.

type()

Return the node type as an integer. Values follow pugixml's node type enumeration: 0=null, 1=document, 2=element, 3=pcdata, 4=cdata, 5=comment, 6=pi, 7=declaration, 8=doctype.

path($delimiter?)

Return the absolute path to this node, built from the names of its ancestors joined by the delimiter. Default delimiter is '/'.

hash()

Return a hash value for this node. Useful for comparison.

offset_debug()

Return the source offset of this node (for debugging).

valid()

Return true if this is a valid node handle.

root()

Return the document element from any node (consistent with $doc->root).

parent()

Get parent node.

first_child(), last_child()

Get first or last child node.

next_sibling($name?), previous_sibling($name?)

Get next or previous sibling. Optionally filter by name.

child($name)

Get a named child node.

children($name?)

Return list of child nodes, optionally filtered by name.

find_child_by_attribute($tag, $attr_name, $attr_value)

Find first child with given tag name and attribute value.
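
The navigation methods above could be combined as in the following sketch. The text does not say whether a failed lookup returns undef or an invalid handle, so the code checks for both; the sample document is illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<library>'
  . '<book id="a1"><title>Dune</title></book>'
  . '<book id="b2"><title>Emma</title></book>'
  . '</library>'
) or die "Parse failed: $@";

my $library = $doc->root;

# Iterate over children filtered by name.
for my $book ($library->children('book')) {
    printf "%s: %s\n", $book->attr('id')->value, $book->child('title')->text;
}

# Look up one child by tag name and attribute value.
my $match = $library->find_child_by_attribute('book', 'id', 'b2');
print "found: ", $match->child('title')->text, "\n"
    if $match && $match->valid;

# Walk the sibling chain by hand. The loop stops on either undef or
# an invalid handle, since the end-of-list behaviour is not documented.
my $node = $library->first_child;
while ($node && $node->valid) {
    print "sibling: ", $node->name, "\n";
    $node = $node->next_sibling;
}
```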

Attributes

attr($name)

Get attribute by name.

attrs()

Return list of all attributes.

set_attr($name, $value)

Set attribute value (creates the attribute if it does not exist). Returns the attribute.

append_attr($name), prepend_attr($name)

Add attribute at end or beginning.

remove_attr($name)

Remove an attribute by name. Returns true on success.
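
A brief sketch of the attribute methods working together. The attribute names used here are illustrative only.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

my $item = $doc->root->child('item');

# set_attr() creates a missing attribute and overwrites an existing one.
$item->set_attr('lang', 'en');
$item->set_attr('id',   '7');

# List every attribute on the element.
for my $attr ($item->attrs) {
    printf "%s=%s\n", $attr->name, $attr->value;
}

# remove_attr() returns true on success.
$item->remove_attr('lang')
    or warn "could not remove attribute 'lang'\n";
```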

Modification

append_child($name), prepend_child($name)

Add child element at end or beginning.

insert_child_before($name, $ref_node), insert_child_after($name, $ref_node)

Insert child element before or after a reference node.

append_copy($source), prepend_copy($source)

Clone and append/prepend a node (deep copy).

insert_copy_before($source, $ref), insert_copy_after($source, $ref)

Clone and insert node before/after reference.

append_cdata($content)

Add a CDATA section with the given content.

append_comment($content)

Add a comment node with the given content.

append_pi($target, $data?)

Add a processing instruction. E.g., <?target data?>

remove_child($node)

Remove a child node. Returns true on success.

set_name($name), set_value($value), set_text($text)

Modify node properties.
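
The modification methods might be used together as in this sketch. Element names are illustrative, and the document is seeded with load_string() because the Document methods listed earlier do not include a way to create the root element directly.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<config/>')
    or die "Parse failed: $@";

my $root = $doc->root;

# Append an element and give it an attribute.
my $db = $root->append_child('database');
$db->set_attr('host', 'localhost');

# Prepend an element so it becomes the first child.
my $version = $root->prepend_child('version');
$version->set_text('1');

# Insert a new element directly after an existing one.
my $cache = $root->insert_child_after('cache', $db);

# Add a comment, then duplicate the <database> element as a deep copy.
$root->append_comment(' generated example ');
my $replica = $root->append_copy($db);
$replica->set_name('replica');

# remove_child() returns true on success.
$root->remove_child($cache)
    or warn "could not remove <cache>\n";

print $doc->to_string("  ", XML::PugiXML::FORMAT_INDENT());
```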

XPath

select_node($xpath)

Execute XPath relative to this node, return single result (Node or Attr).

select_nodes($xpath)

Execute XPath relative to this node, return list of results (Node and/or Attr).

XML::PugiXML::Attr

name(), value()

Get attribute name and value.

as_int(), as_uint()

Get value as 32-bit signed/unsigned integer.

as_llong(), as_ullong()

Get value as 64-bit signed/unsigned integer. On 32-bit Perl (IVSIZE < 8), returns a string to avoid truncation.

as_double()

Get value as floating-point number.

as_bool()

Get value as boolean (recognizes "true", "1", "yes", "on").

element()

Return the parent element node that owns this attribute.

set_value($value)

Set attribute value.

valid()

Return true if this is a valid attribute handle.
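
A small sketch of the typed accessors, assuming they convert the attribute's string value in the way their names suggest. The attribute names and values are illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<job retries="3" timeout="2.5" enabled="true"/>')
    or die "Parse failed: $@";

my $job = $doc->root;

# Read each attribute through the accessor matching its intended type.
my $retries = $job->attr('retries')->as_int;
my $timeout = $job->attr('timeout')->as_double;
my $enabled = $job->attr('enabled')->as_bool;

printf "retries=%d timeout=%.1f enabled=%s\n",
    $retries, $timeout, ($enabled ? 'yes' : 'no');

# Update an attribute in place and navigate back to its owning element.
my $attr = $job->attr('retries');
$attr->set_value($retries + 1);
printf "<%s> now has retries=%s\n", $attr->element->name, $attr->value;
```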

XML::PugiXML::XPath (Compiled Queries)

evaluate_node($context_node)

Evaluate XPath and return single result (Node or Attr).

evaluate_nodes($context_node)

Evaluate XPath and return list of results (Node and/or Attr).

evaluate_string($context_node)

Evaluate XPath and return string result.

evaluate_number($context_node)

Evaluate XPath and return numeric result.

evaluate_boolean($context_node)

Evaluate XPath and return boolean result.
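
Compiled queries are not limited to node sets. The sketch below compiles several standard XPath 1.0 expressions and evaluates each with the accessor matching its result type; the sample document is illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<order><item price="2.50"/><item price="4.00"/></order>')
    or die "Parse failed: $@";

my $order = $doc->root;

# Compile once, evaluate as often as needed against any context node.
my $items     = $doc->compile_xpath('item');
my $count     = $doc->compile_xpath('count(item)');
my $total     = $doc->compile_xpath('sum(item/@price)');
my $expensive = $doc->compile_xpath('boolean(item[@price > 3])');
my $first     = $doc->compile_xpath('string(item[1]/@price)');

my @nodes = $items->evaluate_nodes($order);
print "node results: ", scalar(@nodes), "\n";

print "count: ",       $count->evaluate_number($order), "\n";
print "total: ",       $total->evaluate_number($order), "\n";
print "first price: ", $first->evaluate_string($order), "\n";
print "has expensive item: ",
    ($expensive->evaluate_boolean($order) ? 'yes' : 'no'), "\n";
```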

ERROR HANDLING

Parse and save operations return false on failure and set $@ with an error message. XPath syntax errors throw exceptions via croak().

# Parse errors - check return value
my $ok = $doc->load_string('<bad>');
if (!$ok) {
    warn "Parse failed: $@";
}

# XPath errors - use eval
eval { $doc->select_node('[invalid'); };
if ($@) {
    warn "XPath error: $@";
}

MEMORY MODEL

Node and attribute handles keep the parent document alive through reference counting. You can safely use a node after the document variable goes out of scope:

my $node;
{
    my $doc = XML::PugiXML->new;
    $doc->load_string('<root><item/></root>');
    $node = $doc->root->child('item');
}
# $node is still valid here

PERFORMANCE

Benchmarked against XML::LibXML (100-5000 element documents):

Parsing:          8-12x faster
XPath queries:    2-13x faster
Tree traversal:   15-17x faster
DOM modification: 2-11x faster
Serialization:    2-4x faster

See bench/benchmark.pl for details.

SECURITY

This module uses pugixml, which does NOT process external entities by default, so documents parsed with it are not exposed to XML External Entity (XXE) attacks.

THREAD SAFETY

Different document instances can be used in different threads safely. Concurrent access to the same document from multiple threads is not safe.

AUTHOR

vividsnow

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.