XML::Rabbit - Consume XML with Moose and xpath queries
version 0.4.1
my $xhtml = W3C::XHTML->new( file => 'index.xhtml', # or string... xml => $xml, # or filehandle... fh => $fh, # or XML::LibXML::Document object dom => $dom, ); print "Title: " . $xhtml->title . "\n"; print "First image source: " . $xhtml->body->images->[0]->src . "\n"; exit; package W3C::XHTML; use XML::Rabbit::Root; add_xpath_namespace 'xhtml' => 'http://www.w3.org/1999/xhtml'; has_xpath_value 'title' => '/xhtml:html/xhtml:head/xhtml:title'; has_xpath_object 'body' => '/xhtml:html/xhtml:body' => 'W3C::XHTML::Body'; has_xpath_object_list 'all_anchors_and_images' => '//xhtml:a|//xhtml:img', { 'xhtml:a' => 'W3C::XHTML::Anchor', 'xhtml:img' => 'W3C::XHTML::Image', }, ; finalize_class(); package W3C::XHTML::Body; use XML::Rabbit; has_xpath_object_list 'images' => './/xhtml:img' => 'W3C::XHTML::Image'; finalize_class(); package W3C::XHTML::Image; use XML::Rabbit; has_xpath_value 'src' => './@src'; has_xpath_value 'alt' => './@alt'; has_xpath_value 'title' => './@title'; finalize_class(); package W3C::XHTML::Anchor; use XML::Rabbit; has_xpath_value 'href' => './@src'; has_xpath_value 'title' => './@title'; finalize_class();
XML::Rabbit is a Moose-based class construction toolkit you can use to make XPath-based XML extractors with very little code. Each attribute in your class created with the above helper function is linked to an XPath query that is executed on your XML document when you request the value. Creating object hierarchies that mimic the layout of the XML document is almost as easy as doing a search and replace on the XML DTD (if you have one).
You can return multiple values from the XML either as arrays or hashes, depending on how you need to work with the data from the XML document.
The example in the synopsis shows how to create a class hierarchy that enables easy retrival of certain information from an XHTML document.
Also notice that if you specify an xpath query that can return multiple XML elements, you need to specify a hash map (xml tag => class name) instead of just specifying the class name returned.
All the array and hash-returning attributes are tagged with the Array and Hash native traits, so it is quick and easy to specify additional delegations. All the has_xpath_value* helpers expect the value to be of a Str type constraint. Array and hash-returning attributes automatically set their isa to ArrayRef[Str] and HashRef[Str] respectively.
has_xpath_value*
Str
ArrayRef[Str]
HashRef[Str]
All the helper methods and their associated arguments are explained in XML::Rabbit::Sugar detail.
Be aware that if your XML document contains a default XML namespace (like XHTML does), you MUST specify it with add_xpath_namespace(), or else your xpath queries will not match anything. The XML document is not scanned for XML namespaces during initialization.
add_xpath_namespace()
Automatically loads namespace::autoclean into the caller's package and dispatches to "import" in XML::Rabbit::Sugar (tail call).
Dispatches to "unimport" in XML::Rabbit::Sugar (tail call).
Initializes the metaclass of the calling class and adds the role XML::Rabbit::Node.
When you specify use XML::Rabbit::Root (or use XML::Rabbit in child nodes) you do the equivalent of the following code:
use XML::Rabbit::Root
use XML::Rabbit
use Moose; with "XML::Rabbit::RootNode"; # if you use XML::Rabbit::Root with "XML::Rabbit::Node"; # if you use XML::Rabbit use namespace::autoclean; use XML::Rabbit::Sugar;
This ensures that you don't have to specify no Moose at the end of your code to clean up imported functions you have used in your class.
no Moose
The finalize_class() call at the end of the file is the same as writing the following piece of code:
finalize_class()
__PACKAGE__->meta->make_immutable(); 1;
This optimizes the class and ensures it always returns a true value, which is required to successfully load a Perl script file.
The trait applied to the attribute will wrap the type constraint union in an ArrayRef if the trait name is XPathObjectList and as a HashRef if the trait name is XPathObjectMap. As all the traits that end with List return array references, their isa must be an ArrayRef. The same is valid for the *Map traits, just that they return HashRef instead of ArrayRef.
isa
The namespace prefix used in isa_map MUST be specified in the namespace_map. If a prefix is used in isa_map without a corresponding entry in namespace_map an exception will be thrown.
isa_map
namespace_map
Be aware of the syntax of XPath when used with namespaces. You should almost always define namespace_map when dealing with XML that use namespaces. Namespaces explicitly declared in the XML are usable with the prefix specified in the XML (except if you use isa_map). Be aware that a prefix must usually be declared for the default namespace (xmlns=...) to be able to use it in XPath queries. See the example above (on XHTML) for details. See "findnodes" in XML::LibXML::Node for more information.
Because XML::Rabbit uses XML::LibXML's DOM parser it is limited to handling XML documents that can fit in available memory. Unfortunately there is no easy way around this, because XPath queries need to work on a tree model, and I am not aware of any way of doing that without keeping the document in memory. Luckily XML::LibXML's DOM implementation is written in C, so it should use much less memory than a pure Perl DOM parser.
This module uses semantic versioning concepts from http://semver.org/.
The following people have helped to review or otherwise encourage me to work on this module.
Chris Prather (perigrin)
Matt S. Trout (mst)
Stevan Little (stevan)
Moose
XML::LibXML
namespace::autoclean
XML::Toolkit
XML::Twig
Mojo::DOM
XPath tutorial
XPath Specification
Implementing WWW::LastFM, a client library to the Last.FM API, with XML::Rabbit
You can find documentation for this module with the perldoc command.
perldoc XML::Rabbit
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
MetaCPAN
A modern, open-source CPAN search engine, useful to view POD in HTML format.
http://metacpan.org/release/XML-Rabbit
Search CPAN
The default CPAN search engine, useful to view POD in HTML format.
http://search.cpan.org/dist/XML-Rabbit
RT: CPAN's Bug Tracker
The RT ( Request Tracker ) website is the default bug/issue tracking system for CPAN.
http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Rabbit
AnnoCPAN
The AnnoCPAN is a website that allows community annotations of Perl module documentation.
http://annocpan.org/dist/XML-Rabbit
CPAN Ratings
The CPAN Ratings is a website that allows community ratings and reviews of Perl modules.
http://cpanratings.perl.org/d/XML-Rabbit
CPAN Forum
The CPAN Forum is a web forum for discussing Perl modules.
http://cpanforum.com/dist/XML-Rabbit
CPANTS
The CPANTS is a website that analyzes the Kwalitee ( code metrics ) of a distribution.
http://cpants.perl.org/dist/overview/XML-Rabbit
CPAN Testers
The CPAN Testers is a network of smokers who run automated tests on uploaded CPAN distributions.
http://www.cpantesters.org/distro/X/XML-Rabbit
CPAN Testers Matrix
The CPAN Testers Matrix is a website that provides a visual overview of the test results for a distribution on various Perls/platforms.
http://matrix.cpantesters.org/?dist=XML-Rabbit
CPAN Testers Dependencies
The CPAN Testers Dependencies is a website that shows a chart of the test results of all dependencies for a distribution.
http://deps.cpantesters.org/?module=XML::Rabbit
Please report any bugs or feature requests by email to bug-xml-rabbit at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Rabbit. You will be automatically notified of any progress on the request by the system.
bug-xml-rabbit at rt.cpan.org
The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)
http://github.com/robinsmidsrod/XML-Rabbit
git clone git://github.com/robinsmidsrod/XML-Rabbit.git
Robin Smidsrød <robin@smidsrod.no>
This software is copyright (c) 2015 by Robin Smidsrød.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install XML::Rabbit, copy and paste the appropriate command in to your terminal.
cpanm
cpanm XML::Rabbit
CPAN shell
perl -MCPAN -e shell install XML::Rabbit
For more information on module installation, please visit the detailed CPAN module installation guide.