
NAME

XML::Trivial - A trivial tool that represents parsed XML as a tree of read-only objects.

VERSION

Version 0.06

SYNOPSIS

 use XML::Trivial ();
 my $xml = XML::Trivial::parseFile('filename');
 print "Names and text contents of /root/child/* elements:\n";
 foreach ($$xml{0}{child}->ea) {
   print "name:".$_->en;
   print " text:".$_->ts."\n";
 }

DESCRIPTION

This module provides easy, read-only, random access to parsed XML documents in Perl. The XML declaration, elements, attributes, comments, text nodes, CDATA sections and processing instructions are supported. The following limitations apply:

* The XML files are small; more precisely, the parsed XML data must fit in memory.

* The Perl structure representing an XML file is NOT serializable by Data::Dumper. (Every element can, however, be serialized by its own sr() method.)

* The Perl structure is read-only.

The module is namespace-aware.

IDEAS

This module is designed for reading and traversing small XML files in Perl. It makes no assumptions about the XML structure before parse time: every well-formed document can be parsed and traversed, and every element can be serialized, all without any loss of information.

DEPENDENCIES

XML::Parser::Expat is used to parse the XML files. This may change, or the dependency may become optional.

USAGE

 use XML::Trivial ();

Module functions

parseFile('filename')

See the next section.

parse($string)

See the next section.

Parsing

 my $xml = XML::Trivial::parseFile('filename');

If the specified file does not exist or its content is not a well-formed XML document, the subroutine dies with Expat's original error message, because this module has no opinion about what to do in these situations.
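
Because a failed parse dies, code that must survive a missing or malformed file has to trap the exception with eval. A minimal core-Perl sketch follows: safe_parse is a hypothetical helper (not part of XML::Trivial), and the stand-in parser only simulates a failure so that the snippet runs without the module installed. In real code you would pass \&XML::Trivial::parseFile and a file name instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Trivial): call a parser that may die
# and return ($doc, undef) on success or (undef, $error) on failure.
sub safe_parse {
    my ($parser, @args) = @_;
    my $doc;
    my $ok = eval { $doc = $parser->(@args); 1 };
    return $ok ? ($doc, undef) : (undef, $@ || "unknown error\n");
}

# Stand-in for \&XML::Trivial::parseFile that simulates a parse failure.
my $stand_in_parser = sub { die "simulated parse error\n" };

my ($doc, $err) = safe_parse($stand_in_parser, 'missing.xml');
if (defined $doc) { print "parsed\n" }
else              { print "parse failed: $err" }   # parse failed: simulated parse error
```

Returning the error instead of dying leaves the policy decision (retry, skip, report) to the caller, which matches the module's stance of having no opinion about it.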

Or, to parse a string:

 my $xml = XML::Trivial::parse(q{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <root>
   <home>/usr/local/myApplication</home>
   <sections>
     <section name="A" version="1.8" escaped="',&quot;,&lt;">
       <a_specific>aaa</a_specific>
     </section>
     <section name="B">bbb</section>
     <text>
     ...and there is another stuff
     <![CDATA[<html><body><hr>Hello, world!<hr></body></html>]]>
     ...more stuff here...
       <element/>
     <![CDATA[2nd CDATA]]>
     ...]]&gt;...
     </text>
   </sections>
 <!--processing instructions-->
   <?first do something ?>
   <?second do st. else ?>
   <?first fake ?>
 <!--namespaces-->
   <meta xmlns="meta_ns" xmlns:p1="first_ns" xmlns:p2="second_ns">
     <desc a="v" p1:a="v1" p2:a="v2"/>
     <p1:desc a="v" p1:a="v1" p2:a="v2"/>
     <p2:desc a="v" p1:a="v1" p2:a="v2"/>
   </meta>
 </root>});

This XML document, represented by $xml, is used in the examples below.

XML declaration

 print "xml version: ".$xml->xv()."\n";

If the XML declaration is not present in the parsed document, '1.0' is returned as the XML version.

 print "xml encoding: ".$xml->xe()."\n";

If the XML declaration is not present in the parsed document or the encoding is not specified, 'UTF-8' is returned. REMEMBER that the returned value reflects the original encoding of the parsed document; Perl's internal representation of the parsed data is already UTF-8.

 print "xml standalone: ".$xml->xs()."\n";

If the XML declaration is not present in the parsed document or standalone is not specified, undef is returned. Otherwise, 1 is returned for standalone="yes" and 0 for standalone="no".
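
The defaults described for xv(), xe() and xs() amount to a few lines of logic. Here is a sketch of them; declaration_info is a hypothetical helper (not part of XML::Trivial) that takes the raw pseudo-attributes of the XML declaration as a hashref, or undef when the document has no declaration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented defaults of xv(), xe() and xs().
# $decl: undef (no XML declaration) or e.g. { version => '1.0', standalone => 'yes' }.
sub declaration_info {
    my ($decl) = @_;
    $decl //= {};

    # standalone: undef when absent, 1 for "yes", 0 for "no"
    my $standalone;
    if (defined $decl->{standalone}) {
        $standalone = $decl->{standalone} eq 'yes' ? 1 : 0;
    }

    return (
        version    => $decl->{version}  // '1.0',
        encoding   => $decl->{encoding} // 'UTF-8',
        standalone => $standalone,
    );
}

my %info = declaration_info(undef);
print "$info{version} $info{encoding} ",
      (defined $info{standalone} ? $info{standalone} : 'undef'), "\n";   # 1.0 UTF-8 undef
```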

Doctype

 print "doctype name: ".$xml->dn()."\n";

If a document type declaration is present, its name is returned; otherwise undef is returned.

 print "doctype system: ".$xml->ds()."\n";

If a document type declaration with a system identifier (the SYSTEM part of its external ID) is present, the system identifier is returned; otherwise undef is returned.

 print "doctype public: ".$xml->dp()."\n";

If a document type declaration with a public identifier (the PUBLIC part of its external ID) is present, the public identifier is returned; otherwise undef is returned.

Document tree

The parsed XML is organized into a tree data structure whose nodes represent the root node and the elements. All nodes have the same class, XML::Trivial::Element. The simplest ways to navigate the tree are shown in the following examples (the sr() method called on the final element serializes it, just for demonstration):

Navigation by element name:

 print "home element: ".$$xml{root}{home}->sr."\n";
 print "prefix based access: ".$$xml{root}{meta}{'p1:desc'}->sr."\n";
 print "namespace based access: ".$$xml{root}{meta}{'first_ns*desc'}->sr."\n";

BE CAREFUL: if several sibling elements map to the same hash key, only the first of them is returned.

Navigation by element position:

 print "first child element of root element: ".$$xml{root}{0}->sr."\n";

If a non-negative integer is used as a key, the child element at that (zero-based) position is returned.
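
The navigation rules above (qualified name, "namespace*localname", zero-based position, first match wins) can be sketched in core Perl as follows. The child records and resolve_child are hypothetical stand-ins for illustration only; the module's actual lookup code may differ.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sketch of the documented key rules, NOT the module's code.
# Each child is a hashref: qualified name, local name, namespace URI.
sub resolve_child {
    my ($children, $key) = @_;
    # Non-negative integer: zero-based position among child elements.
    return $children->[$key] if $key =~ /^\d+\z/;
    # "namespace*localname": match by namespace URI and local name.
    if ($key =~ /^(.*)\*([^*]+)\z/) {
        my ($ns, $local) = ($1, $2);
        return first { defined $_->{ns} && $_->{ns} eq $ns && $_->{local} eq $local } @$children;
    }
    # Otherwise: qualified name as written in the document; first match wins.
    return first { $_->{qname} eq $key } @$children;
}

# Children of <meta> from the example document.
my @meta_children = (
    { qname => 'desc',    local => 'desc', ns => 'meta_ns'   },
    { qname => 'p1:desc', local => 'desc', ns => 'first_ns'  },
    { qname => 'p2:desc', local => 'desc', ns => 'second_ns' },
);

my $by_ns  = resolve_child(\@meta_children, 'first_ns*desc');
my $by_pos = resolve_child(\@meta_children, 2);
print "$by_ns->{qname} $by_pos->{qname}\n";   # p1:desc p2:desc
```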

Element methods

In the method descriptions below, the terms 'hash(ref)' and 'array(ref)' are used when the returned type depends on calling context: in scalar context the method returns a hashref or arrayref, in list context it returns a list (hash or array).
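
This context-dependent behaviour is the standard Perl wantarray idiom. The sketch below uses a hypothetical Demo::Node class (not part of XML::Trivial) to show the pattern; XML::Trivial's own methods may be implemented differently.

```perl
use strict;
use warnings;

# Hypothetical class illustrating a "hash(ref)" accessor.
package Demo::Node;

sub new {
    my ($class, %attrs) = @_;
    return bless { attrs => {%attrs} }, $class;
}

# List context: flat key/value list. Scalar context: hashref.
sub ah {
    my ($self) = @_;
    my %copy = %{ $self->{attrs} };   # copy, so callers cannot modify the node
    return wantarray ? %copy : \%copy;
}

package main;

my $node  = Demo::Node->new(name => 'A', version => '1.8');
my %attrs = $node->ah;     # list context: a hash
my $ref   = $node->ah;     # scalar context: a hashref
print "$attrs{version} $ref->{name}\n";   # 1.8 A
```

In practice, string concatenation imposes scalar context: print "children: ".$el->ea prints an ARRAY(0x...) reference string, whereas foreach ($el->ea) { ... } iterates over the elements ($el being any element object).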

All XML declaration and doctype methods (see above) can also be called on elements.

p()

parent node. Returns the parent element or the root node.

 print "serializes whole document: ".$$xml{0}->p->sr."\n";

en()

element (qualified) name

 print "home element name: ".$$xml{0}{0}->en."\n";
 print "name of 3rd child element of meta: ".$$xml{0}{meta}{2}->en."\n";

Returns the qualified element name (including the namespace prefix).

ep()

element prefix

 print "home element prefix: '".$$xml{0}{0}->ep."'\n";
 print "prefix of 3rd child element of meta: '".$$xml{0}{meta}{2}->ep."'\n";

Returns the prefix of the qualified element name.

ln()

element local (unqualified) name

 print "home element localname: '".$$xml{0}{0}->ln."'\n";
 print "localname of 3rd child element of meta: '".$$xml{0}{meta}{2}->ln."'\n";

Returns the unqualified element name (excluding the namespace prefix).

ns()

namespaces. Returns a hash(ref) of the namespaces in the element's scope.

 print "all namespaces of 'desc' element:\n";
 my %ns = $$xml{0}{meta}{desc}->ns();
 print " '$_'='$ns{$_}'\n" for sort keys %ns;

ns(undef)

namespace of the element.

 print "namespace of 'p2:desc' element: ".$$xml{0}{meta}{'p2:desc'}->ns(undef)."\n";

ns($prefix)

namespace of the specified prefix.

 print "namespace of 'p2' prefix in <desc> element: ".$$xml{0}{meta}{desc}->ns('p2')."\n";

Returns the namespace bound to the specified prefix in the element's scope.
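
Namespace bindings are inherited from ancestors, which is why ns('p2') works on <desc> even though the xmlns:p2 declaration sits on <meta>. A minimal core-Perl sketch of that layering, assuming hypothetical helpers scope_for and element_ns and using '' as the key for the default namespace (an assumption of this sketch, not necessarily how ns() keys its result):

```perl
use strict;
use warnings;

# Hypothetical helper: layer an element's xmlns declarations over the
# bindings inherited from its parent.
sub scope_for {
    my ($parent_scope, $attrs) = @_;
    my %scope = %$parent_scope;    # inherit the parent's bindings
    for my $name (keys %$attrs) {
        if    ($name eq 'xmlns')         { $scope{''} = $attrs->{$name} }   # default namespace
        elsif ($name =~ /^xmlns:(.+)\z/) { $scope{$1} = $attrs->{$name} }   # prefixed binding
    }
    return \%scope;
}

# Hypothetical helper: namespace of an element from its qualified name.
# Prefixed names use the prefix binding; unprefixed ones the default namespace.
sub element_ns {
    my ($scope, $qname) = @_;
    my ($prefix) = $qname =~ /^([^:]+):/;
    return $scope->{ defined $prefix ? $prefix : '' };
}

# <meta> and its child <desc> from the example document.
my $meta = scope_for({}, { xmlns => 'meta_ns', 'xmlns:p1' => 'first_ns', 'xmlns:p2' => 'second_ns' });
my $desc = scope_for($meta, { a => 'v', 'p1:a' => 'v1', 'p2:a' => 'v2' });

print element_ns($desc, 'p2:desc'), "\n";   # second_ns
print element_ns($desc, 'desc'), "\n";      # meta_ns
print $desc->{p2}, "\n";                    # second_ns
```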

ah()

attribute hash(ref). Returns the hash (in list context) or hashref (in scalar context) of all attributes; the keys of the hash are qualified attribute names.

 print "all attributes of 'desc' element:\n";
 my %attrs = $$xml{0}{meta}{desc}->ah();
 print " '$_'='$attrs{$_}'\n" for sort keys %attrs;

ah($attrname)

attribute hash. Returns the value of the attribute with the specified name.

 print "\n1st section version: ".$$xml{0}{sections}{section}->ah('version')."\n";
 print "p1:a value of p2:desc element: ".$$xml{0}{meta}{'p2:desc'}->ah('p1:a')."\n";

This usage (with one argument) is namespace-naive: the argument must be the qualified attribute name, with the same prefix as in the parsed document.

ah($unprefixedattrname, $namespace)

attribute hash. If both arguments are defined, it returns the value of the attribute with the specified unprefixed name in the specified namespace.

 print "attrval of 'a' in 'first_ns' in 'desc' element: ".$$xml{0}{2}{0}->ah('a','first_ns')."\n";

ah($unprefixedattrname, undef)

attribute hash. If the second argument is undefined but present, it returns the hash or hashref of the values of the attribute with that unprefixed name, for all namespaces in which it actually occurs.

 print "values of 'a' attrs of 'desc' element:\n";
 my %a_values = $$xml{0}{meta}{desc}->ah('a',undef);
 print " '$_'='$a_values{$_}'\n" for sort keys %a_values;

ah(undef, $namespace)

attribute hash. If the first argument is undefined, it returns the hash or hashref of attributes in the specified namespace.

 print "attributes of 'desc' element in 'second_ns':\n";
 my %second_ns_attrs = $$xml{0}{meta}{desc}->ah(undef,'second_ns');
 print " '$_'='$second_ns_attrs{$_}'\n" for sort keys %second_ns_attrs;

ah(undef, undef)

attribute hash. If both arguments are undefined but present, it returns the hash or hashref of attributes in the element's own namespace.

 print "attributes of 'p1:desc' element in its namespace:\n";
 my %own_ns_attrs = $$xml{0}{meta}{'p1:desc'}->ah(undef,undef);
 print " '$_'='$own_ns_attrs{$_}'\n" for sort keys %own_ns_attrs;

Remember that an unprefixed attribute does NOT inherit the namespace of its element.
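
The rule that unprefixed attributes have no namespace is what makes the two-argument forms of ah() behave as documented. Here is a sketch under stated assumptions: attrs_in_ns and attr_ns are hypothetical helpers, the in-scope bindings of <desc> are written out by hand, and the result is keyed by local name, which may differ from how ah() keys its result.

```perl
use strict;
use warnings;

# Hypothetical helper: namespace of an attribute. A prefixed attribute takes
# its prefix's binding; an unprefixed one has NO namespace (it ignores both
# the default namespace and the element's namespace).
sub attr_ns {
    my ($scope, $qname) = @_;
    return undef unless $qname =~ /^([^:]+):/;
    return $scope->{$1};
}

# Hypothetical helper: attributes whose namespace equals $ns
# (undef = no namespace), keyed by local name.
sub attrs_in_ns {
    my ($scope, $attrs, $ns) = @_;
    my %out;
    for my $qname (keys %$attrs) {
        my $attr_ns = attr_ns($scope, $qname);
        my $match = defined $ns
            ? (defined $attr_ns && $attr_ns eq $ns)
            : !defined($attr_ns);
        next unless $match;
        (my $local = $qname) =~ s/^[^:]+://;
        $out{$local} = $attrs->{$qname};
    }
    return \%out;
}

# In-scope bindings and attributes of <desc> from the example document.
my %scope = ('' => 'meta_ns', p1 => 'first_ns', p2 => 'second_ns');
my %attrs = (a => 'v', 'p1:a' => 'v1', 'p2:a' => 'v2');

my $first   = attrs_in_ns(\%scope, \%attrs, 'first_ns');
my $default = attrs_in_ns(\%scope, \%attrs, 'meta_ns');
print "first_ns a=$first->{a}; attributes in meta_ns: ", scalar(keys %$default), "\n";
# first_ns a=v1; attributes in meta_ns: 0
```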

eh()

element hash(ref). Returns a hash or hashref (depending on calling context) of child elements. If more than one child element has the same qualified name, only the first one is included.

 print "hash of child elements of 'sections':\n";
 my %children = $$xml{0}{sections}->eh();
 print " '$_'='".$children{$_}->sr."'\n" for sort keys %children;

eh($childname)

element hash. Returns the first child element with the specified name.

 print "first section: ".$$xml{0}{sections}->eh('section')->sr."\n";

ea()

element array(ref). Returns an array or arrayref of child elements.

 print "all child elements of sections:\n";
 foreach ($$xml{0}{sections}->ea) {
     print " element name:".$_->en."\n";
 }

ea($index)

element array. Returns the $index'th child element (zero-based).

 print "second child element of sections: ".$$xml{0}{sections}->ea(1)->sr."\n";

ta()

text array(ref). Returns an array(ref) of all text nodes, including CDATA sections.

 print "all texts under <text>:\n";
 foreach ($$xml{0}{sections}{text}->ta) {
     print " piece of text:".$_."\n";
 }

ta($index)

text array. Returns the $index'th text node under the element, including CDATA sections.

 print "second text under <text>: ".$$xml{0}{sections}{text}->ta(1)."\n";

ca()

cdata array(ref). Returns an array(ref) of CDATA sections.

 print "all cdatas under <text>:\n";
 foreach ($$xml{0}{sections}{text}->ca) {
     print " cdata: ".$_."\n";
 }

ca($index)

cdata array. Returns the $index'th CDATA section under the element.

 print "first cdata section under <text>: ".$$xml{0}{sections}{text}->ca(0)."\n";

ts()

text serialized. Returns all text nodes serialized into a single scalar string.

 print "whole serialized text under <text>:".$$xml{0}{sections}{text}->ts."\n";

pa()

processing instruction array(ref). Returns an array(ref) of all processing instructions if called without arguments. Each item of the returned array is an arrayref of two items: target and body.

 print "processing instructions under root element:\n";
 foreach ($$xml{0}->pa) {
     print " target:$$_[0] body:$$_[1]\n";
 }

pa($index)

processing instruction array. Returns the $index'th processing instruction under the element, as an arrayref of two items: target and body.

 print "first processing instruction under root element: ".join(' ',@{$$xml{0}->pa(0)})."\n";

ph()

processing instruction hash(ref). Returns the hash(ref) of processing instructions (for each target, the first occurrence wins) if called without arguments.

 print "processing instructions with different targets under root element:\n";
 my %pis = $$xml{0}->ph();
 print " '$_'='$pis{$_}'\n" for sort keys %pis;

ph($target)

processing instruction hash. Returns the first processing instruction with the specified target.

 print "first processing instruction having target 'first' under root element: ".$$xml{0}->ph('first')."\n";
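
The "first occurrence wins" rule of ph() can be sketched in core Perl as follows. This is not the module's code: first_wins is a hypothetical helper, and mapping each target to the PI body is an assumption, since the description above does not say what the hash values are.

```perl
use strict;
use warnings;

# Hypothetical helper: build a target => body hash from [target, body] pairs
# in document order; a later PI with an already-seen target is ignored.
sub first_wins {
    my (@pairs) = @_;
    my %by_target;
    for my $pi (@pairs) {
        my ($target, $body) = @$pi;
        $by_target{$target} = $body unless exists $by_target{$target};
    }
    return \%by_target;
}

# Pairs modelled on the example document's PIs (bodies simplified).
my @pis = (['first', 'do something'], ['second', 'do st. else'], ['first', 'fake']);
my $ph  = first_wins(@pis);
print "$_ => $ph->{$_}\n" for sort keys %$ph;
# first => do something
# second => do st. else
```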

na()

note array(ref). Returns an array(ref) of all comments if called without arguments.

 print "notes under root element:\n";
 foreach ($$xml{0}->na) {
     print " $_\n";
 }

na($index)

note array. Returns the $index'th comment under the element.

 print "second note under root element: ".$$xml{0}->na(1)."\n";

a($index)

all. Returns the internal representation of the element. Helpful if the order of mixed content (elements, text nodes, PIs, etc.) matters. See the code, for instance the body of the sr() method.
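
To see why an ordered representation matters, here is a sketch using a made-up list of [type, value] pairs (an assumption for illustration, NOT the actual format returned by a()), modelled on a simplified <text> element from the example document with whitespace-only text nodes omitted. Filtering by type, in the spirit of ta() and ca(), keeps the order within each type but loses the interleaving between types.

```perl
use strict;
use warnings;

# Made-up representation (NOT a()'s real format): ordered [type, value] pairs.
my @content = (
    [ text  => '...and there is another stuff' ],
    [ cdata => '<html><body><hr>Hello, world!<hr></body></html>' ],
    [ text  => '...more stuff here...' ],
    [ elem  => 'element' ],
    [ cdata => '2nd CDATA' ],
);

# Per-type views: text including CDATA (like ta()), CDATA only (like ca()).
my @texts  = map { $_->[1] } grep { $_->[0] eq 'text' || $_->[0] eq 'cdata' } @content;
my @cdatas = map { $_->[1] } grep { $_->[0] eq 'cdata' } @content;

# Only the full ordered list records that <element/> sits between
# the second text node and the second CDATA section.
print join(' ', map { $_->[0] } @content), "\n";                 # text cdata text elem cdata
print scalar(@texts), ' texts, ', scalar(@cdatas), " cdatas\n";  # 4 texts, 2 cdatas
```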

sr()

serialize.

 print "whole document, serialized:\n";
 print $xml->sr;

Returns the serialized element or root node. For attribute values, it uses apostrophes as delimiters and escapes ampersands, apostrophes and left angle brackets inside. For text values, it escapes ampersands, left angle brackets and the ]]> sequence (the last one as ]]&gt;). For better readability, a "\n" is appended when serializing a child of the root node that occurs before the root element (XML declaration, doctype declaration, comment or processing instruction).
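
The escaping rules above can be illustrated with a small core-Perl sketch. It is not the module's code: escape_attr and escape_text are hypothetical helpers, and the entity used for the apostrophe (&apos; here) is an assumption, since the description above does not name it.

```perl
use strict;
use warnings;

# Attribute values: apostrophe-delimited; escape &, ' and <.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;      # ampersands first, so the entities added below stay intact
    $value =~ s/'/&apos;/g;     # assumed entity for the apostrophe
    $value =~ s/</&lt;/g;
    return "'$value'";
}

# Text content: escape &, < and the ]]> sequence (as ]]&gt;).
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/\]\]>/]]&gt;/g;
    return $text;
}

# The parsed value of escaped="',&quot;,&lt;" from the example document is ',",<
print escape_attr(q{',",<}), "\n";          # '&apos;,",&lt;'
print escape_text('a < b && ]]>'), "\n";    # a &lt; b &amp;&amp; ]]&gt;
```

Escaping the ampersand first matters: doing it last would turn the freshly inserted &lt; into &amp;lt;.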

SEE ALSO

XML::Parser::Expat

XML::Simple for much more sophisticated XML-to-Perl-structure transformations.

XML::Twig for parsing and traversing huge XML documents.

XML::LibXML for more complex XML processing in Perl.

AUTHOR

Jan Poslusny aka Pajout, <pajout at cpan.org>

BUGS

Please report any bugs or feature requests to bug-xml-trivial at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Trivial. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc XML::Trivial

You can also look for information at:

COPYRIGHT

Copyright 2007 Jan Poslusny.

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.