NAME
XML::PugiXML - Perl binding for the pugixml C++ XML parser
SYNOPSIS
use XML::PugiXML;
my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>');
my $root = $doc->root;
my $item = $root->child('item');
print $item->text, "\n"; # Hello
print $item->attr('id')->value, "\n"; # 1
# XPath
my $node = $doc->select_node('//item[@id="1"]');
print $node->text, "\n";
# Compiled XPath (faster for repeated queries)
my $xpath = $doc->compile_xpath('//item');
my @items = $xpath->evaluate_nodes($root);
# Modification
my $new = $root->append_child('item');
$new->set_text('World');
$new->set_attr('id', '2'); # Convenience method
$doc->save_file('output.xml');
# Formatting options
print $doc->to_string(" ", XML::PugiXML::FORMAT_INDENT());
# Node cloning
my $copy = $root->append_copy($item);
DESCRIPTION
XML::PugiXML provides a Perl interface to the pugixml C++ XML parsing library. It offers fast parsing, XPath support, and a clean API. All string inputs are automatically upgraded to UTF-8, and all outputs are UTF-8 flagged.
METHODS
XML::PugiXML (Document)
- new()
-
Create a new empty XML document.
- load_file($path, $parse_options?)
-
Load and parse XML from a file. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
- load_string($xml, $parse_options?)
-
Parse XML from a string. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
- save_file($path, $indent?, $flags?)
-
Save the document to a file. Returns true on success. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
- to_string($indent?, $flags?)
-
Serialize the document to an XML string. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
- reset()
-
Clear the document, removing all nodes. Existing Node and Attr handles become stale -- accessing them will croak with "Stale node/attribute handle". Use valid() to check without croaking. The same applies after load_file() or load_string() replaces content.
- root()
-
Return the document element (root node).
- child($name)
-
Get a direct child by name.
- select_node($xpath)
-
Execute XPath query, return single result. Returns an XML::PugiXML::Node or XML::PugiXML::Attr depending on the query.
- select_nodes($xpath)
-
Execute XPath query, return list of results. Returns a mix of XML::PugiXML::Node and XML::PugiXML::Attr objects as appropriate.
- compile_xpath($xpath)
-
Compile an XPath expression for repeated use. Returns an XML::PugiXML::XPath object.
Format Constants
- FORMAT_DEFAULT()
-
Default formatting (indent with tabs).
- FORMAT_INDENT()
-
Indent output.
- FORMAT_NO_DECLARATION()
-
Omit XML declaration.
- FORMAT_RAW()
-
No formatting (compact output).
- FORMAT_WRITE_BOM()
-
Write BOM (byte order mark).
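As a minimal sketch of how the format constants are passed to to_string(): the example below serializes the same document once indented and once compact. Combining flags with bitwise OR is an assumption carried over from how pugixml itself treats its format flags; the variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

# Indented output, two spaces per nesting level.
my $pretty = $doc->to_string('  ', XML::PugiXML::FORMAT_INDENT());

# Compact output without the XML declaration.
# Assumption: flags can be combined with bitwise OR, as in pugixml.
my $compact = $doc->to_string(
    "\t",
    XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION(),
);

print $pretty;
print $compact, "\n";
```

The same $indent and $flags arguments are accepted by save_file().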
Parse Constants
- PARSE_DEFAULT()
-
Default parsing options.
- PARSE_MINIMAL()
-
Minimal parsing (fastest, no comments/PI/DOCTYPE).
- PARSE_PI()
-
Parse processing instructions.
- PARSE_COMMENTS()
-
Parse comments.
- PARSE_CDATA()
-
Parse CDATA sections.
- PARSE_WS_PCDATA()
-
Preserve whitespace-only PCDATA nodes.
- PARSE_ESCAPES()
-
Parse character/entity references.
- PARSE_EOL()
-
Normalize end-of-line characters.
- PARSE_DECLARATION()
-
Parse XML declaration.
- PARSE_DOCTYPE()
-
Parse DOCTYPE.
- PARSE_FULL()
-
Full parsing (all features enabled).
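The effect of the parse constants can be sketched by loading the same input twice. The example assumes that PARSE_DEFAULT() does not keep comments (this matches pugixml's own default) and that flags combine with bitwise OR; both points should be checked against the installed version.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<root><!-- note --><item/></root>';

# Default options: comments are assumed to be skipped.
my $plain = XML::PugiXML->new;
$plain->load_string($xml) or die "Parse failed: $@";

# Default options plus comment parsing (assumed bitwise OR of flags).
my $with_comments = XML::PugiXML->new;
$with_comments->load_string(
    $xml,
    XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_COMMENTS(),
) or die "Parse failed: $@";

my $dropped = ($plain->to_string         =~ /<!--/)          ? 0 : 1;
my $kept    = ($with_comments->to_string =~ /<!-- note -->/) ? 1 : 0;

print "comment dropped by default: $dropped\n";
print "comment kept with PARSE_COMMENTS: $kept\n";
```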
XML::PugiXML::Node
- name(), value(), text()
-
Get node name, value, or text content.
- type()
-
Return the node type as an integer. Values: 0=null, 1=document, 2=element, 3=pcdata, 4=cdata, 5=comment, 6=pi, 7=declaration, 8=doctype.
- path($delimiter?)
-
Return the absolute XPath path to this node. Default delimiter is '/'.
- hash()
-
Return a hash value for this node. Useful for comparison.
- offset_debug()
-
Return the source offset of this node (for debugging).
- valid()
-
Return true if this is a valid node handle.
- root()
-
Return the document element from any node (consistent with $doc->root).
Navigation
- parent()
-
Get parent node.
- first_child(), last_child()
-
Get first or last child node.
- next_sibling($name?), previous_sibling($name?)
-
Get next or previous sibling. Optionally filter by name.
- child($name)
-
Get a named child node.
- children($name?)
-
Return list of child nodes, optionally filtered by name.
- find_child_by_attribute($tag, $attr_name, $attr_value)
-
Find first child with given tag name and attribute value.
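The navigation methods above can be combined as in the following sketch. The sample document and variable names are made up for illustration; only methods listed in this section are used.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<root><item id="1">a</item><note/><item id="2">b</item></root>'
) or die "Parse failed: $@";
my $root = $doc->root;

# Every direct child, then only the <item> children.
my @all_names  = map { $_->name } $root->children;
my @item_texts = map { $_->text } $root->children('item');

# Look up a child by tag name and attribute value.
my $second = $root->find_child_by_attribute('item', 'id', '2');

# Step from the first child to the next sibling named "item",
# skipping the <note/> element in between.
my $sibling = $root->first_child->next_sibling('item');

print "children: @all_names\n";
print "item texts: @item_texts\n";
print "found: ", $second->text, ", parent: ", $second->parent->name, "\n";
print "sibling id: ", $sibling->attr('id')->value, "\n";
```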
Attributes
- attr($name)
-
Get attribute by name.
- attrs()
-
Return list of all attributes.
- set_attr($name, $value)
-
Set attribute value, creating the attribute if it does not exist. Returns the attribute.
- append_attr($name), prepend_attr($name)
-
Add attribute at end or beginning.
- remove_attr($name)
-
Remove an attribute by name. Returns true on success.
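A short sketch of the attribute methods in sequence, using an invented one-element document: set_attr() both creates and overwrites, attrs() lists what is present, and remove_attr() deletes by name.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1"/></root>') or die "Parse failed: $@";
my $item = $doc->root->child('item');

$item->set_attr('lang', 'en');   # attribute does not exist yet: created
$item->set_attr('id',   '7');    # attribute exists: value replaced

# Snapshot of all attributes as a name => value hash.
my %attrs = map { $_->name => $_->value } $item->attrs;

my $removed   = $item->remove_attr('lang');
my @remaining = map { $_->name } $item->attrs;

print join(', ', map { "$_=$attrs{$_}" } sort keys %attrs), "\n";
print "remaining: @remaining\n";
```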
Modification
- append_child($name), prepend_child($name)
-
Add child element at end or beginning.
- insert_child_before($name, $ref_node), insert_child_after($name, $ref_node)
-
Insert child element before or after a reference node.
- append_copy($source), prepend_copy($source)
-
Clone and append/prepend a node (deep copy).
- insert_copy_before($source, $ref), insert_copy_after($source, $ref)
-
Clone and insert node before/after reference.
- append_cdata($content)
-
Add a CDATA section with the given content.
- append_comment($content)
-
Add a comment node with the given content.
- append_pi($target, $data?)
-
Add a processing instruction, e.g. <?target data?>.
- remove_child($node)
-
Remove a child node. Returns true on success.
- set_name($name), set_value($value), set_text($text)
-
Modify node properties.
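The modification methods can be sketched on a small list. The newly inserted node is fetched with first_child() rather than from the return value of insert_child_before(), because this section does not state what that method returns.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<list><entry>b</entry></list>') or die "Parse failed: $@";
my $list   = $doc->root;
my $middle = $list->child('entry');

# Insert a new <entry> before the existing one, then fill it in.
$list->insert_child_before('entry', $middle);
$list->first_child->set_text('a');

# Append another <entry> and a trailing comment.
my $last = $list->append_child('entry');
$last->set_text('c');
$list->append_comment(' end of list ');

my @before = map { $_->text } $list->children('entry');

# Remove the original entry again.
my $removed = $list->remove_child($middle);
my @after   = map { $_->text } $list->children('entry');

print "before removal: @before\n";
print "after removal:  @after\n";
```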
XPath
- select_node($xpath)
-
Execute XPath relative to this node, return single result (Node or Attr).
- select_nodes($xpath)
-
Execute XPath relative to this node, return list of results (Node and/or Attr).
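To illustrate the difference from the document-level queries, the sketch below runs relative XPath expressions from an inner node. It also checks the class of the result when the expression selects an attribute; that the blessed class name is exactly XML::PugiXML::Attr is inferred from the descriptions above.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<root>'
  . '<group name="a"><item id="1"/><item id="2"/></group>'
  . '<group name="b"><item id="3"/></group>'
  . '</root>'
) or die "Parse failed: $@";

my $group_b = $doc->root->find_child_by_attribute('group', 'name', 'b');

# "item" is resolved relative to $group_b, so only its own items match.
my @ids = map { $_->attr('id')->value } $group_b->select_nodes('item');

# An expression that selects an attribute yields an Attr object.
my $attr  = $group_b->select_node('item/@id');
my $class = ref $attr;

print "ids in group b: @ids\n";
print "result class: $class, value: ", $attr->value, "\n";
```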
XML::PugiXML::Attr
- name(), value()
-
Get attribute name and value.
- as_int(), as_uint()
-
Get value as 32-bit signed/unsigned integer.
- as_llong(), as_ullong()
-
Get value as 64-bit signed/unsigned integer. On 32-bit Perl (IVSIZE < 8), returns a string to avoid truncation.
- as_double()
-
Get value as floating-point number.
- as_bool()
-
Get value as boolean (recognizes "true", "1", "yes", "on").
- element()
-
Return the parent element node that owns this attribute.
- set_value($value)
-
Set attribute value.
- valid()
-
Return true if this is a valid attribute handle.
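The typed accessors save manual conversion when attributes hold numbers or flags. A minimal sketch with an invented configuration element:

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<cfg port="8080" ratio="0.25" debug="true"/>')
    or die "Parse failed: $@";
my $cfg = $doc->root;

my $port  = $cfg->attr('port')->as_int;      # 32-bit signed integer
my $ratio = $cfg->attr('ratio')->as_double;  # floating-point number
my $debug = $cfg->attr('debug')->as_bool;    # "true" is a recognized value

# element() leads back from the attribute to the node that owns it.
my $owner = $cfg->attr('port')->element->name;

printf "port=%d ratio=%.2f debug=%s owner=%s\n",
    $port, $ratio, ($debug ? 'on' : 'off'), $owner;
```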
XML::PugiXML::XPath (Compiled Queries)
- evaluate_node($context_node)
-
Evaluate XPath and return single result (Node or Attr).
- evaluate_nodes($context_node)
-
Evaluate XPath and return list of results (Node and/or Attr).
- evaluate_string($context_node)
-
Evaluate XPath and return string result.
- evaluate_number($context_node)
-
Evaluate XPath and return numeric result.
- evaluate_boolean($context_node)
-
Evaluate XPath and return boolean result.
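Compiled queries pay off when one expression is evaluated against many context nodes. The sketch below compiles three expressions once and reuses one of them per item; the expressions and variable names are illustrative.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<root><item id="1">Hello</item><item id="2">World</item></root>'
) or die "Parse failed: $@";
my $root = $doc->root;

# Compile once, evaluate with different result types.
my $count_items = $doc->compile_xpath('count(item)');
my $has_second  = $doc->compile_xpath('boolean(item[@id="2"])');
my $id_of       = $doc->compile_xpath('string(@id)');

my $count = $count_items->evaluate_number($root);
my $found = $has_second->evaluate_boolean($root);

# The same compiled expression is reused with each item as context.
my @ids = map { $id_of->evaluate_string($_) } $root->children('item');

print "count: $count, has id 2: ", ($found ? 'yes' : 'no'), "\n";
print "ids: @ids\n";
```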
ERROR HANDLING
Parse and save operations return false on failure and set $@ with an error message. XPath syntax errors throw exceptions via croak().
# Parse errors - check return value
my $ok = $doc->load_string('<bad>');
if (!$ok) {
    warn "Parse failed: $@";
}
# XPath errors - use eval
eval { $doc->select_node('[invalid'); };
if ($@) {
    warn "XPath error: $@";
}
MEMORY MODEL
Node and attribute handles keep the parent document alive through reference counting. You can safely use a node after the document variable goes out of scope:
my $node;
{
    my $doc = XML::PugiXML->new;
    $doc->load_string('<root><item/></root>');
    $node = $doc->root->child('item');
}
# $node is still valid here
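Keeping the document alive is separate from handle staleness: as described under reset(), a handle obtained before the document content is cleared or replaced no longer refers to a live node. A minimal sketch of detecting that case, assuming the croak message quoted under reset():

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item/></root>') or die "Parse failed: $@";
my $item = $doc->root->child('item');

$doc->reset;    # all existing handles are now stale

# valid() reports staleness without croaking.
my $still_valid = $item->valid ? 1 : 0;

# Any other access is expected to croak, so wrap it in eval.
my $name = eval { $item->name };
my $err  = $@;

print "valid after reset: $still_valid\n";
print "error: $err" if $err;
```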
PERFORMANCE
Benchmarked against XML::LibXML (100-5000 element documents):
Parsing: 8-12x faster
XPath queries: 2-13x faster
Tree traversal: 15-17x faster
DOM modification: 2-11x faster
Serialization: 2-4x faster
See bench/benchmark.pl for details.
SECURITY
This module uses pugixml, which does NOT process external entities by default, so parsing is not exposed to XML external entity (XXE) attacks.
THREAD SAFETY
Different document instances can be used in different threads safely. Concurrent access to the same document from multiple threads is not safe.
AUTHOR
vividsnow
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.