NAME

XML::RSS::LibXML - XML::RSS with XML::LibXML (parse-only)

SYNOPSIS

use XML::RSS::LibXML;
my $rss = XML::RSS::LibXML->new;
$rss->parsefile($file);

print "channel: $rss->{channel}->{title}\n";
foreach my $item (@{ $rss->{items} }) {
   print "  item: $item->{title} ($item->{link})\n";
}

# Add custom modules
$rss->add_module(uri => $uri, prefix => $prefix);

# Add custom parse contexts
$rss->add_parse_context(
  context => $context, # 'channel', 'item'
  field   => $field_name,
  xpath   => $xpath
);
$rss->parse(...); # now parse with new context

DESCRIPTION

XML::RSS::LibXML uses XML::LibXML (libxml2) for parsing RSS instead of XML::RSS' XML::Parser (expat), while trying to keep interface compatibility with XML::RSS.

XML::RSS is an extremely handy tool, but it is unfortunately not the leanest or most efficient RSS parser, especially in a long-running process. For a long time I had been using my own RSS parser to get maximum speed and efficiency. This module is a repackaged version of that parser, adapted so that it adheres to the XML::RSS interface.

XML::RSS::LibXML is NOT 100% compatible with XML::RSS. For example, XML::RSS::LibXML is not capable of outputting RSS in various formats, and namespaces aren't exactly supported the way they are in XML::RSS (patches welcome).

Use this module when you have severe performance requirements in parsing RSS files.

PARSED FIELDS

METHODS

new

Creates a new instance of XML::RSS::LibXML.

parse($string)

Parse a string containing RSS.
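
As a minimal sketch, the snippet below parses a small, made-up RSS 2.0 document from a string and reads the same fields used in the SYNOPSIS. The feed content and the helper name parse_feed_string are invented for illustration; only new, parse, and the channel/items hash layout come from this document.

```perl
use strict;
use warnings;
use XML::RSS::LibXML;

# A made-up feed, used only for illustration.
my $xml = <<'EOXML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
EOXML

# Hypothetical helper: parse a string and return the populated object.
sub parse_feed_string {
    my ($string) = @_;
    my $rss = XML::RSS::LibXML->new;
    $rss->parse($string);
    return $rss;
}

my $rss = parse_feed_string($xml);
print "channel: $rss->{channel}->{title}\n";
foreach my $item (@{ $rss->{items} }) {
    print "  item: $item->{title} ($item->{link})\n";
}
```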

parsefile($filename)

Parse an RSS file specified by $filename.

as_string()

Return the string representation of the parsed RSS.

add_module(uri => $uri, prefix => $prefix)

Adds a new module. You should do this before parsing the RSS. XML::RSS::LibXML understands a few modules by default:

rdf     => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
dc      => "http://purl.org/dc/elements/1.1/",
sy      => "http://purl.org/rss/1.0/modules/syndication/",
admin   => "http://webns.net/mvcb/",
content => "http://purl.org/rss/1.0/modules/content/",
cc      => "http://web.resource.org/cc/",
taxo    => "http://purl.org/rss/1.0/modules/taxonomy/",

So you do not need to add these explicitly.
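
To illustrate the ordering described above, here is a hedged sketch that registers a module before parsing a feed that uses it. The prefix ex, the URI http://example.com/ns/ex/, the element ex:rating, and the feed itself are all hypothetical. How values from a custom module appear in the parsed structure is not specified in this document, so the example stops at a successful parse.

```perl
use strict;
use warnings;
use XML::RSS::LibXML;

# Hypothetical namespace, chosen only for this example.
my $prefix = 'ex';
my $uri    = 'http://example.com/ns/ex/';

# A made-up feed that uses the hypothetical namespace.
my $xml = <<"EOXML";
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:$prefix="$uri">
  <channel>
    <title>Feed With Custom Module</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>Rated post</title>
      <link>http://example.com/rated</link>
      <$prefix:rating>5</$prefix:rating>
    </item>
  </channel>
</rss>
EOXML

my $rss = XML::RSS::LibXML->new;

# Register the module first, then parse.
$rss->add_module(uri => $uri, prefix => $prefix);
$rss->parse($xml);

print "channel: $rss->{channel}->{title}\n";
```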

add_parse_context(context => $context, field => $field, xpath => $xpath)

Adds new parse contexts. XML::RSS::LibXML attempts to parse most of the commonly used fields from RSS feeds, but there are times when you want finer-grained control.

If, for example, you want to include a custom field called foo from within the <channel> element, you may add something like this:

$rss->add_parse_context(
  context => 'channel',
  field   => 'foo',
  xpath   => 'foo', # XPath relative to the current context, which is
                    # 'channel'
);
$rss->parsefile($file);

Then after parsing, $rss will contain a structure like this:

$rss = {
  channel => {
    foo => $value_of_foo
    # other fields
  },
  # other fields
};

PERFORMANCE

Here's a simple benchmark using benchmark.pl in this distribution:

daisuke@localhost XML-RSS-LibXML$ perl -Mlib=lib benchmark.pl index.rdf 
             Rate        rss rss_libxml
rss        8.00/s         --       -97%
rss_libxml  262/s      3172%         --
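
The table above has the shape printed by cmpthese from the core Benchmark module. The script below is a sketch of how such a comparison might be written. It is not the benchmark.pl shipped in this distribution, and the helper name compare_parsers is an assumption. It assumes both XML::RSS and XML::RSS::LibXML are installed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use XML::RSS;
use XML::RSS::LibXML;

# Hypothetical helper: parse the same file with both modules and
# print a comparison table. Returns the rows that cmpthese reports.
sub compare_parsers {
    my ($file, $count) = @_;
    return cmpthese($count, {
        rss        => sub { XML::RSS->new->parsefile($file) },
        rss_libxml => sub { XML::RSS::LibXML->new->parsefile($file) },
    });
}

# A negative count tells Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
compare_parsers($ARGV[0], -1) if @ARGV;
```

Saved under a name of your choosing, the script takes the feed file as its only argument, for example index.rdf as in the run above. Absolute rates depend on the machine and the feed, so only the relative difference is meaningful.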

CAVEATS

No support whatsoever for writing RSS. No plans to support it either.

TODO

Tests. Currently the tests are simply stolen from XML::RSS. It would be nice to have tests that check correctness more extensively.

SEE ALSO

XML::RSS, XML::LibXML, XML::LibXML::XPathContext

AUTHORS

Copyright 2005 Daisuke Maki <dmaki@cpan.org>, Tatsuhiko Miyagawa <miyagawa@bulknews.net>. All rights reserved.

Development partially funded by Brazil, Ltd. <http://b.razil.jp>