NAME
XML::RSS::LibXML - XML::RSS with XML::LibXML (parse-only)
SYNOPSIS
use XML::RSS::LibXML;
my $rss = XML::RSS::LibXML->new;
$rss->parsefile($file);
print "channel: $rss->{channel}->{title}\n";
foreach my $item (@{ $rss->{items} }) {
    print " item: $item->{title} ($item->{link})\n";
}
# Add custom modules
$rss->add_module(uri => $uri, prefix => $prefix);
# Add custom parse contexts
$rss->add_parse_context(
    context => $context,     # 'channel', 'item'
    field   => $field_name,
    xpath   => $xpath,
);
$rss->parse(...); # now parse with new context
DESCRIPTION
XML::RSS::LibXML uses XML::LibXML (libxml2) to parse RSS instead of the XML::Parser (expat) backend used by XML::RSS, while trying to stay interface-compatible with XML::RSS.
XML::RSS is an extremely handy tool, but unfortunately it is not the leanest or most efficient RSS parser, especially in a long-running process. For a long time I had been using my own RSS parser to get maximum speed and efficiency; this module is a repackaged version of that parser, adapted to follow the XML::RSS interface.
XML::RSS::LibXML is NOT 100% compatible with XML::RSS. For example, XML::RSS::LibXML cannot output RSS in the various formats XML::RSS supports, and namespaces are not handled the same way as in XML::RSS (patches welcome).
Use this module when you have strict performance requirements for parsing RSS files.
PARSED FIELDS
METHODS
new
Creates a new instance of XML::RSS::LibXML.
parse($string)
Parse a string containing RSS.
parsefile($filename)
Parse the RSS file specified by $filename.
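As a minimal sketch, the same loop as in the SYNOPSIS can run on an in-memory string via parse() instead of a file. The feed below, including its titles and example.com URLs, is made up for illustration; the field access through $rss->{channel} and $rss->{items} follows the SYNOPSIS.

```perl
use strict;
use warnings;
use XML::RSS::LibXML;

# A tiny, made-up RSS 2.0 feed, inlined so no file on disk is needed.
my $xml = <<'RSS';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>Sample channel</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
RSS

my $rss = XML::RSS::LibXML->new;
$rss->parse($xml);    # parse from a string; parsefile() takes a filename instead

print "channel: $rss->{channel}->{title}\n";
foreach my $item (@{ $rss->{items} }) {
    print " item: $item->{title} ($item->{link})\n";
}
```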
as_string()
Return the string representation of the parsed RSS.
add_module(uri => $uri, prefix => $prefix)
Adds a new module. You should do this before parsing the RSS. XML::RSS::LibXML understands a few modules by default:
rdf => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
dc => "http://purl.org/dc/elements/1.1/",
sy => "http://purl.org/rss/1.0/modules/syndication/",
admin => "http://webns.net/mvcb/",
content => "http://purl.org/rss/1.0/modules/content/",
cc => "http://web.resource.org/cc/",
taxo => "http://purl.org/rss/1.0/modules/taxonomy/",
So you do not need to add these explicitly.
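How add_module() works internally is not described here. One plausible reading, which is an assumption, is that each prefix/URI pair gets registered with an XML::LibXML::XPathContext so that prefixed XPath expressions such as dc:creator resolve. The sketch below shows that idea with XML::LibXML directly, not through XML::RSS::LibXML. The rss prefix for the RSS 1.0 default namespace is a hypothetical choice: XPath 1.0 has no default namespace, so elements in one can only be matched through some registered prefix.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Prefix => namespace URI, in the style of the default module list.
my %modules = (
    rdf => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    dc  => 'http://purl.org/dc/elements/1.1/',
    rss => 'http://purl.org/rss/1.0/',    # hypothetical prefix for RSS 1.0's default namespace
);

# A made-up RSS 1.0 (RDF) fragment using the Dublin Core module.
my $xml = <<'RDF';
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.com/">
    <title>Example</title>
    <dc:creator>Jane Doe</dc:creator>
  </channel>
</rdf:RDF>
RDF

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# Register every prefix so XPath expressions using it can be evaluated.
$xpc->registerNs($_ => $modules{$_}) for keys %modules;

# Locate the channel, then evaluate a module field relative to it.
my ($channel) = $xpc->findnodes('/rdf:RDF/rss:channel');
my $creator   = $xpc->findvalue('dc:creator', $channel);
print "creator: $creator\n";
```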
add_parse_context(context => $context, field => $field, xpath => $xpath)
Adds a new parse context. XML::RSS::LibXML parses most of the commonly used fields in RSS feeds, but sometimes you need finer-grained control.
If, for example, you want to include a custom field called foo within the <channel> element, you may add something like this:
$rss->add_parse_context(
    context => 'channel',
    field   => 'foo',
    xpath   => 'foo',    # XPath relative to the current context, which is 'channel'
);
$rss->parsefile($file);
Then after parsing, $rss will contain a structure like this:
$rss = {
    channel => {
        foo => $value_of_foo,
        # other fields
    },
    # other fields
};
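To make the context-relative XPath idea concrete, here is a rough sketch that mimics what a parse context does, using XML::LibXML alone. It is not XML::RSS::LibXML's implementation: the %contexts table and the extract() helper are hypothetical, and only the resulting shape ($result->{channel}{foo}) mirrors the structure shown above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical parse contexts: context name => list of [field, xpath] pairs.
# Each XPath is evaluated relative to the context node (channel or item).
my %contexts = (
    channel => [ [ title => 'title' ], [ foo => 'foo' ] ],
    item    => [ [ title => 'title' ], [ link => 'link' ] ],
);

my $xml = <<'RSS';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <foo>custom value</foo>
    <item><title>One</title><link>http://example.com/1</link></item>
  </channel>
</rss>
RSS

my $doc = XML::LibXML->load_xml(string => $xml);
my ($channel_node) = $doc->findnodes('/rss/channel');

# Build one hash from the fields registered for a context.
sub extract {
    my ($node, $fields) = @_;
    my %h;
    for my $f (@$fields) {
        my ($name, $xpath) = @$f;
        $h{$name} = $node->findvalue($xpath);    # '' if the element is absent
    }
    return \%h;
}

my $result = {
    channel => extract($channel_node, $contexts{channel}),
    items   => [ map { extract($_, $contexts{item}) } $channel_node->findnodes('item') ],
};

print "foo: $result->{channel}{foo}\n";
```

With this layout, supporting another field only means adding a [field, xpath] pair to the relevant context, which is presumably the kind of flexibility add_parse_context() is meant to expose.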
PERFORMANCE
Here is a simple benchmark, run with benchmark.pl from this distribution:
daisuke@localhost XML-RSS-LibXML$ perl -Mlib=lib benchmark.pl index.rdf
             Rate   rss rss_libxml
rss        8.00/s    --       -97%
rss_libxml  262/s 3172%         --
CAVEATS
No support whatsoever for writing RSS. No plans to support it either.
TODO
Tests. Currently the tests are simply borrowed from XML::RSS. It would be nice to have more extensive tests for correctness.
SEE ALSO
XML::RSS, XML::LibXML, XML::LibXML::XPathContext
AUTHORS
Copyright 2005 Daisuke Maki <dmaki@cpan.org>, Tatsuhiko Miyagawa <miyagawa@bulknews.net>. All rights reserved.
Development partially funded by Brazil, Ltd. <http://b.razil.jp>