
NAME

XML::Snap - Makes simple XML tasks a snap!

VERSION

Version 0.04

SYNOPSIS

XML::Snap is a quick and relatively modern way to work with XML. If, like me, you have little patience for the endless reams of standards the XML community burdens you with, maybe this is the module for you. If you want to maintain compatibility with normal people, though, and you want to avoid scaling problems later, you're probably better off sitting down and understanding XML::LibXML and the SAX ecosystem.

The other large omission from the model at present is namespaces. If you use namespaces (and honestly, most applications do) then again, you should be using libxml or one of the SAX parsers.

Still here? Cool. XML::Snap is my personal way of dealing with XML when I can't avoid it. It's roughly based on my experiences with my ANSI C library "xmlapi", which I wrote back in 2000 to wrap the Expat parser. Along the way, I ended up building a lot of handy functionality into the library that made C programming palatable - and a lot of that was string and list manipulation that Perl renders superfluous. So after working with a port for a while, I tossed it. This is what I ended up with.

XML::Snap works in DOM mode. That is, it reads in XML from a string or file and puts it into a tree for you to manipulate, then allows you to write it back out. The tree is pretty minimalistic. The children of a node can be either plain text (as strings) or elements (as XML::Snap objects or a subclass), and each element can have a hash of attributes. Order of attributes is maintained; the XML specification does not treat attribute order as significant, but preserving it keeps a file stable when it is read and written back out. There is also a clear distinction between content and tags. So some of the drawbacks to XML::Simple are averted with this setup.

Right at the moment, comments in the XML are not preserved. If you need to work with XML comments, XML::Snap is not your module.

Right at the moment, a streaming mode (like SAX) is also not provided, but it's something I want to get to soon. In streaming mode, comments will be preserved, but not available to the user until further notice. But since streaming has not yet been implemented, that's kind of moot. Streaming will be implemented in a separate module, probably to be named XML::Skim.

Some examples!

   use XML::Snap;

   # Keep the loaded tree so that it can be searched.
   my $xml = XML::Snap->load ('myfile.xml');
   my $query = $xml->search ('mynode');
   while (my $hit = <$query>) {
       # ... do things with $hit ...
   }
   

CREATING AND LOADING XML ELEMENTS

new (name, [attribute, value, ...])

The new function just creates a new, empty XML node, simple as that. It has a name and optional attributes with values. Note that the order of attributes will be retained. Duplicates are not permitted (storage is in a hash); well-formed XML does not allow the same attribute name twice in one tag anyway, so this should only matter if you are handling input that is already malformed.

parse (string), parse_with_refs (string)

The parse function uses the Expat parser wrapped in XML::Parser to parse the string supplied, building a tree from it. If you want text to be blessed scalar refs instead of just strings, use parse_with_refs. (This can be easier, depending on what you're going to do with the data structure later.)

load (filename)

The load function does the same as parse but takes a filename instead.
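
A minimal sketch of parsing from a string, assuming parse can be called as a class method the way load is in the synopsis, and that name and get behave as described further down. The sample XML is made up, and the whole call is wrapped in eval only so that the snippet still runs where XML::Snap is not installed.

```perl
use strict;
use warnings;

# Sample input invented for illustration.
my $source = '<order id="17"><item>widget</item></order>';

my $report = eval {
    require XML::Snap;
    # Assumption: parse is a class method, like load in the synopsis.
    my $doc = XML::Snap->parse($source);
    sprintf 'root=%s id=%s', $doc->name, $doc->get('id', 'none');
};
$report = "XML::Snap unavailable or call failed: $@" unless defined $report;
print "$report\n";
```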

name, is

The name method returns the name of the node, that is, the tag used to create it, while the is method tests for equality to a given string (it's just a convenience function).

oob(key, value), unoob(key)

Sets/gets an out-of-band (OOB) value on a node. This isn't anything special, just a hash attached to each node, but it can be used by a template output for parameterization, and it doesn't affect the output or actions of the XML in any other way.

If a value isn't set in a given node, it will ask its parent.

Call unoob($key) to remove an OOB value, or unoob() to remove all OOB values on a node.

parent, ancestor, root

parent returns the node's parent, if it has been attached to a parent, while ancestor finds the ancestor with the tag you supply, or the root if you don't give a tag. root is provided as a shorthand for ancestor().

delete

Deletes a child from a node. Pass the actual reference to the child - or if you're using non-referenced text, the text itself. (In this case, duplicate text will all be deleted.)

detach

Detaches the node from its parent, if it is attached. This not only removes the parent reference, but also removes the child from its parent's list of children.

WORKING WITH ATTRIBUTES

Each tag in XML can have zero or more attributes, each of which has a value. Order is preserved.

set, unset

The set method sets one or more attributes; its parameter list is considered to be key, value, key, value, etc. The unset method removes one or more attributes from the list.

get (attribute, default), attr_eq (attribute, value)

Obviously, get retrieves an attribute value. If the attribute is not found, it returns the default value you specify, or undef if you don't specify one.

Since it's inconvenient to test attributes that can be undefined, there's an attr_eq method that checks that the given attribute is defined and equal to the value given.
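
To show why the definedness check matters, here is a small sketch over a plain hash. The hash and the name attr_eq_sketch are stand-ins invented for this example, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a node's attribute hash.
my %attrs = (type => 'note');

# Comparing a missing attribute with eq warns under "use warnings";
# checking definedness first avoids that.
sub attr_eq_sketch {
    my ($attrs, $name, $value) = @_;
    return (defined($attrs->{$name}) && $attrs->{$name} eq $value) ? 1 : 0;
}

printf "%d\n", attr_eq_sketch(\%attrs, 'type', 'note');   # 1
printf "%d\n", attr_eq_sketch(\%attrs, 'lang', 'en');     # 0
```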

attrs (attribute list)

The attrs method retrieves a list of the attributes set.

getlist (attribute list)

The getlist method retrieves a list of attribute values given a list of attributes. (It's just a map.)

getctx (attribute, default)

The getctx method looks at an attribute in the given node, but if it's not found, looks in the parent instead. If there is no parent, the default value is returned.
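
The fallback can be pictured with plain hashes standing in for nodes. This sketch assumes the lookup keeps climbing until it runs out of parents; the data layout and the name getctx_sketch are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical nodes as plain hashes with a parent link.
my $root  = { attrs => { lang => 'en' }, parent => undef };
my $child = { attrs => { id   => 'c1' }, parent => $root };

# Walk up the parent chain until the attribute is found.
sub getctx_sketch {
    my ($node, $name, $default) = @_;
    for (my $n = $node; defined $n; $n = $n->{parent}) {
        return $n->{attrs}{$name} if defined $n->{attrs}{$name};
    }
    return $default;
}

printf "%s\n", getctx_sketch($child, 'lang',  'none');   # en
printf "%s\n", getctx_sketch($child, 'title', 'none');   # none
```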

attr_order (attribute list)

Moves the named attributes to the front of the list; if any appear that aren't set, they stay unset.
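
One way to picture this, assuming values live in a hash and the order in a separate list (a layout chosen for this sketch, not necessarily the module's own):

```perl
use strict;
use warnings;

# Hypothetical storage: values in a hash, insertion order in an array.
my %value = (id => '7', class => 'big', lang => 'en');
my @order = qw(id class lang);

# Move the named attributes to the front; names that are not set are ignored.
sub attr_order_sketch {
    my ($order, $value, @front) = @_;
    my @wanted = grep { exists $value->{$_} } @front;
    my %moved  = map { $_ => 1 } @wanted;
    return (@wanted, grep { !$moved{$_} } @$order);
}

@order = attr_order_sketch(\@order, \%value, qw(lang title id));
print join(' ', @order), "\n";   # lang id class
```

Here title is not set, so it is simply skipped rather than created.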

WORKING WITH PLAIN TEXT CONTENT

Depending on your needs, XML::Snap can store plain text embedded in an XML structure as simple strings, or as scalar references blessed to XML::Snap. Since text may therefore not be blessed, you need to handle it with care unless you're sure it's all references (by parsing with parse_with_refs, for instance).

istext

Returns a flag indicating whether a given thing is text or not. "Text" means a scalar or a scalar reference; anything else will not be considered text.

This is a class method or an instance method - note that if you're using it as an instance method and you try to call it on a string, your call will die.
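
The test described above can be sketched with Scalar::Util. The class names below are placeholders, and treating undef as "not text" is an assumption of this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# A value counts as text here if it is a plain (non-reference) scalar
# or a reference to a scalar, blessed or not.
sub istext_sketch {
    my ($thing) = @_;
    return 0 unless defined $thing;
    return 1 unless ref $thing;
    return (reftype($thing) eq 'SCALAR') ? 1 : 0;
}

my $plain   = 'hello';
my $blessed = bless \(my $copy = 'hello'), 'My::Text';   # hypothetical class
my $element = bless {}, 'My::Element';                   # hypothetical class

printf "%d %d %d\n",
    istext_sketch($plain), istext_sketch($blessed), istext_sketch($element);   # 1 1 0
```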

gettext

Returns the actual text of either a string (which is obviously just the string) or a scalar reference. Again, can be called as an instance method if you're sure it's an instance.

bless_text

Iterates through the node given, and converts all plain texts into referenced texts.

unbless_text

Iterates through the node given, and converts all referenced texts into plain texts.

WORKING WITH XML STRUCTURE

add, add_pretty

The add method adds nodes and text as children to the current node. The add_pretty method is a convenience method that ensures that there is a line break if a node is inserted directly at the beginning of its parent (this makes building human-readable XML easier).

In addition to nodes and text, you can also add a coderef. This will have no effect on normal operations except for appearing in the list of children for the node, but during writing operations (either for string output or to streams) the coderef will be called to retrieve an iterator that delivers XML snippets. Those snippets will be inserted into the output as though they appeared at the point in the structure where the coderef appears. Extraction from the iterator stops when it returns undef.

The next time the writer is used, the original coderef will be called again to retrieve a new iterator.

The writer functions (string, stringcontent, write, etc.) can be called with optional parameters that will be passed to each coderef in the structure, if any. This allows an XML::Snap structure to be used as a generic template, for example for writing XML structures extracted from database queries.
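
The generator protocol can be sketched without the module at all. Everything below (the child list, render_sketch, the row markup) is invented for illustration; it only mirrors the rules stated above: the coderef receives the writer's parameters, returns an iterator, and the iterator is drained until it returns undef.

```perl
use strict;
use warnings;

# Hypothetical generator: a coderef that, given writer parameters,
# returns an iterator yielding XML snippets and then undef.
my $rows = sub {
    my (@names) = @_;
    my @queue = map { "<row>$_</row>" } @names;
    return sub { return shift @queue };
};

# Hypothetical writer loop: strings pass through, coderefs are expanded.
sub render_sketch {
    my ($children, @params) = @_;
    my $out = '';
    for my $child (@$children) {
        if (ref $child eq 'CODE') {
            my $iter = $child->(@params);   # fresh iterator on every write
            while (defined(my $snippet = $iter->())) {
                $out .= $snippet;
            }
        } else {
            $out .= $child;
        }
    }
    return $out;
}

my @children = ('<table>', $rows, '</table>');
my $first = render_sketch(\@children, 'a', 'b');
print "$first\n";   # <table><row>a</row><row>b</row></table>
```

Because the coderef is called again on each write, the same structure can be rendered repeatedly with different parameters.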

When adding a node that is already a child of another node, the source node will be copied into the target, not just added. (Otherwise confusion could ensue!)

Text is normally added as a simple string, but this can cause problems for consumers, as the output of an iterator might then return a mixture of unblessed strings and blessed nodes, so you end up having to test for blessedness when processing them. For ease of use, you can also add a reference to a string; it will work the same in terms of neighboring strings being coalesced, but they'll be stored as blessed string references. Then, use istext or is_node to determine what each element is when iterating through structure.

prepend, prepend_pretty

These do the same as add and add_pretty except at the beginning of the child list.

replacecontent, replacecontent_from

The replacecontent method first deletes the node's children, then calls add to add its parameters. Use replacecontent_from to use the children of the first parameter, with optional matches to effect filtration as the rest of the parameters.

These are holdovers from my old xmlapi C library, where I was using in-memory XML structures as "bags of data". Since Perl is basically built on bags of data to start with, I'm not sure these will ever get used in a real situation (certainly I've never needed them yet in Perl).

replace

The replace method is a little odd; it actually acts on the given node's parent, by replacing the callee with the passed parameters. In other words, the parent's children list is modified directly. If there's nothing provided as a replacement, this simply deletes the callee from its parent's child list.

children, elements

The children method just returns the list of children added with add (or the other addition-type methods). The elements method returns only those children that are elements, omitting text, comments, and generators.

COPYING AND TRANSFORMATION

copy, copy_from, filter

The copy method copies out a new node (recursively) that is independent, i.e. has no parent. If you give it some matches of the form [name, key, value, coderef], then the coderef will be called on the copy before it gets added, if the copy matches the match. If a match is just a coderef, it'll apply to all text instead.

filter is just an alias that's a little more self-documenting.

Note that the transformations specified will not fire for the root node you're copying, just its children.

STRING/FILE OUTPUT

The obvious thing to do with an XML structure once constructed is of course to write it to a file or extract a string from it. XML::Snap gives you one powerful option, which is the use of embedded generators to act as a live template.

string, rawstring

Extracts a string from the XML node passed in; string gives you an escaped string that can be parsed back into an equivalent XML structure, while rawstring does not escape anything, so you can't count on equivalence or even legal XML. rawstring is useful if your XML structure is being used to build strings; otherwise it's the wrong tool to use.

content, rawcontent

These do the same, but don't include the parent tag or its closing tag in the string.

write

Given a filename and an optional prefix to write to the file first, writes the XML to a file.

writestream

Writes the XML to an open stream.

escape/unescape

These are convenience functions that escape a string for use in XML, or unescape the escaped string for non-XML use.
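
For readers unfamiliar with what escaping involves, here is a standalone sketch using the five entities that XML predefines. Which characters XML::Snap's own escape handles is not stated here, so the table below is an assumption, and the _sketch names are invented.

```perl
use strict;
use warnings;

# The five predefined XML entities (assumed set for this sketch).
my %to_entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                 '"' => '&quot;', "'" => '&apos;');
my %from_entity = reverse %to_entity;

# Single pass, so an ampersand introduced by escaping is never escaped again.
sub escape_sketch {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$to_entity{$1}/g;
    return $text;
}

sub unescape_sketch {
    my ($text) = @_;
    $text =~ s/(&(?:amp|lt|gt|quot|apos);)/$from_entity{$1}/g;
    return $text;
}

my $raw     = q{Tom & "Jerry" <cat>};
my $escaped = escape_sketch($raw);
print "$escaped\n";   # Tom &amp; &quot;Jerry&quot; &lt;cat&gt;
```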

BOOKMARKING AND SEARCHING

Finally, there are searching and bookmarking functions for finding and locating given XML in a tree.

getloc

Retrieves a location for a given node in its tree, effectively a bookmark. The rules are simple. The bookmark consists of a set of dotted pairs, each being the name of the tag plus a disambiguator if necessary. If the tag is the first of its sibs with its own tag, no disambiguator is necessary. If the tag has an attribute named 'id' that doesn't have a dot or square brackets in it, then square brackets surrounding that value are used as the disambiguator. Otherwise, a number in parentheses identifies the sequence of the tag within the list of siblings with its own tag name.

So mytag[one] matches mytag id="one" and mytag(1) matches the second 'mytag' in its parent's list of elements. mytag[one].next(3) matches the fourth 'next' in mytag id="one".
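
To make the format concrete, this sketch splits a bookmark into its steps. It is not the module's own parser; parse_bookmark_sketch and the hash keys are invented. Splitting on dots is safe because an id containing a dot or square brackets is never used as a disambiguator.

```perl
use strict;
use warnings;

# Split a bookmark into steps of { tag, id } or { tag, index }.
# A step with no disambiguator means the first sibling with that tag (index 0).
sub parse_bookmark_sketch {
    my ($bookmark) = @_;
    my @steps;
    for my $part (split /\./, $bookmark) {
        if ($part =~ /^([^\[\(]+)\[([^\]]*)\]$/) {
            push @steps, { tag => $1, id => $2 };
        } elsif ($part =~ /^([^\[\(]+)\((\d+)\)$/) {
            push @steps, { tag => $1, index => $2 + 0 };
        } else {
            push @steps, { tag => $part, index => 0 };
        }
    }
    return @steps;
}

my @steps = parse_bookmark_sketch('mytag[one].next(3)');
for my $step (@steps) {
    print join(' ', map { "$_=$step->{$_}" } sort keys %$step), "\n";
}
```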

This is essentially a much simplified XPath (I may be wrong, but I think I came up with it before XPath had been defined). It's quick and dirty, but works.

loc

Given such a bookmark and the tree it pertains to, finds the bookmarked node.

all

Returns a list of XML snippets that meet the search criteria.

WALKING THE TREE

XML is a tree structure, and what do we do with trees? We walk them!

A walker is an iterator that visits each node in turn, then its children, one by one. Walkers come in two flavors: full walk or element walk; the element walk ignores text.

The walker constructor optionally takes a closure that will be called on each node before it's returned; the return from that closure will be what's returned. If it returns undef, the walk will skip that node and go on with the walk in the same order that it otherwise would have; if it returns a list of (value, 'prune') then the walk will not visit that node's children, and "value" will be taken as the return value (and it can obviously be undef as well).
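
The callback contract can be illustrated with a toy walker over nested hashes. This is a sketch of the rules just described, not XML::Snap's implementation; the tree layout and walker_sketch are invented. One simplification: an undef value combined with 'prune' is treated as "skip and do not descend".

```perl
use strict;
use warnings;

# Hypothetical tree: elements are hashes, text children are plain strings.
my $tree = {
    name     => 'body',
    children => [
        { name => 'group', children => [
            { name => 'trans-unit', children => ['Hello'] },
        ] },
        'stray text',
    ],
};

# Pre-order iterator. The callback may return undef (skip this node but
# still descend) or (value, 'prune') (return value, do not descend).
sub walker_sketch {
    my ($root, $callback) = @_;
    my @stack = ($root);
    return sub {
        while (@stack) {
            my $node = shift @stack;
            my ($value, $flag) = $callback ? $callback->($node) : ($node);
            my $prune = defined $flag && $flag eq 'prune';
            unshift @stack, @{ $node->{children} }
                if ref $node eq 'HASH' && !$prune;
            return $value if defined $value;
        }
        return;   # walk exhausted
    };
}

my $walk = walker_sketch($tree, sub {
    my ($node) = @_;
    return ($node->{name}, 'prune')
        if ref $node eq 'HASH' && $node->{name} eq 'trans-unit';
    return undef;
});

my @seen;
while (defined(my $hit = $walk->())) {
    push @seen, $hit;
}
print "@seen\n";   # trans-unit
```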

walk

walk is the complete walk. It returns an iterator. Pass it a closure to be called on each node as it's visited. Modifying the tree's structure is entirely fine as long as you're just manipulating the children of the current node; if you do other things, the walker might get confused.

walk_elem

For the sake of convenience, walk_elem does the same thing, except it only visits elements, not text.

walk_all

A simplified walk that simply returns matching nodes.

    my $w = $self->{body}->walk(sub {
        my $node = shift;
        return ($node, 'prune') if $node->is('trans-unit'); # Segments are returned whole.
        return undef; # We don't want the details for anything else, but still walk into its children if it has any.
    });

first

Returns the first XML element (i.e. non-text thing) that meets the search criteria.

AUTHOR

Michael Roberts, <michael at vivtek.com>

BUGS

Please report any bugs or feature requests to bug-xml-snap at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Snap. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc XML::Snap

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2013 Michael Roberts.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.