NAME

XML::FeedPP -- Parse/write/merge/edit RSS/RDF/Atom syndication feeds

SYNOPSIS

Get an RSS file and parse it:

    use XML::FeedPP;

    my $source = 'http://use.perl.org/index.rss';
    my $feed = XML::FeedPP->new( $source );
    print "Title: ", $feed->title(), "\n";
    print "Date: ", $feed->pubDate(), "\n";
    foreach my $item ( $feed->get_item() ) {
        print "URL: ", $item->link(), "\n";
        print "Title: ", $item->title(), "\n";
    }

Generate an RDF file and save it:

    use XML::FeedPP;

    my $feed = XML::FeedPP::RDF->new();
    $feed->title( "use Perl" );
    $feed->link( "http://use.perl.org/" );
    $feed->pubDate( "Thu, 23 Feb 2006 14:43:43 +0900" );
    my $item = $feed->add_item( "http://search.cpan.org/~kawasaki/XML-TreePP-0.02" );
    $item->title( "Pure Perl implementation for parsing/writing xml file" );
    $item->pubDate( "2006-02-23T14:43:43+09:00" );
    $feed->to_file( "index.rdf" );

Convert some RSS/RDF files to Atom format:

    use XML::FeedPP;

    my $feed = XML::FeedPP::Atom->new();                # create an empty Atom feed
    $feed->merge( "rss.xml" );                          # load a local RSS file
    $feed->merge( "http://www.kawa.net/index.rdf" );    # load a remote RDF file
    my $now = time();
    $feed->pubDate( $now );                             # set the feed date to now
    my $atom = $feed->to_string();                      # get the Atom source code

DESCRIPTION

XML::FeedPP is an all-purpose syndication utility that parses and publishes RSS, RDF, and Atom feeds. It allows you to add new content, merge feeds, and convert among various formats. It is a pure Perl implementation and does not require any other module except for XML::TreePP.

METHODS FOR FEED

$feed = XML::FeedPP->new( 'index.rss' );

This constructor method creates an XML::FeedPP feed instance. The first argument is the local filename. The file must be in one of the supported feed formats (RSS, RDF, or Atom); otherwise, execution is halted.

$feed = XML::FeedPP->new( 'http://use.perl.org/index.rss' );

A URL on a remote web server may also be given as the first argument. The LWP::UserAgent module is required to download it.

$feed = XML::FeedPP->new( '<?xml?><rss version="2.0"><channel>....' );

XML source code may also be passed directly as the first argument.
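A minimal sketch of the string form, assuming XML::FeedPP is installed. The feed is a made-up RSS 2.0 document with placeholder example.com URLs, so the snippet needs no network or file access.

```perl
use strict;
use warnings;
use XML::FeedPP;

# A made-up RSS 2.0 document; the example.com URLs are placeholders.
my $source = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>Sample channel</description>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML

# XML source code is passed directly, so nothing is downloaded or read from disk.
my $feed = XML::FeedPP->new( $source );
print "Title: ", $feed->title(), "\n";
foreach my $item ( $feed->get_item() ) {
    print $item->title(), " => ", $item->link(), "\n";
}
```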

$feed = XML::FeedPP::RSS->new( $source );

This constructor method creates an instance for an RSS-formatted feed. The first argument is optional, but must be valid RSS if specified. This method returns an empty instance when $source is undefined.

$feed = XML::FeedPP::RDF->new( $source );

This constructor method creates an instance for an RDF-formatted feed. The first argument is optional, but must be valid RDF if specified. This method returns an empty instance when $source is undefined.

$feed = XML::FeedPP::Atom->new( $source );

This constructor method creates an instance for an Atom-formatted feed. The first argument is optional, but must be valid Atom if specified. This method returns an empty instance when $source is undefined.

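$feed = XML::FeedPP::RSS->new( link => $url, title => $text, ... );

This form of the constructor creates an empty instance and sets its <link>, <title>, and other elements from the given key/value pairs.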

$feed->load( $source );

This method loads an RSS/RDF/Atom file, much like the new() method does.

$feed->merge( $source );

This method merges an RSS/RDF/Atom file into the existing $feed instance. Top-level metadata from the imported feed is incorporated only if missing from the present feed.
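A sketch of merge(), assuming XML::FeedPP is installed. The second feed is a made-up RSS string, and the titles 'Mine' and 'Theirs' are placeholders. Because the present feed already has a title, the imported title should not replace it, while the imported item is added alongside the existing one.

```perl
use strict;
use warnings;
use XML::FeedPP;

# The feed being extended already has a title and one item.
my $feed = XML::FeedPP::RSS->new();
$feed->title( 'Mine' );
$feed->link( 'http://example.com/' );
$feed->add_item( 'http://example.com/mine/1' );

# A made-up second feed, passed to merge() as XML source code.
my $other = '<?xml version="1.0"?><rss version="2.0"><channel>'
          . '<title>Theirs</title><link>http://example.org/</link>'
          . '<item><title>Theirs 1</title><link>http://example.org/1</link></item>'
          . '</channel></rss>';

$feed->merge( $other );

# The existing title is kept; the imported item now sits next to ours.
print "Title: ", $feed->title(), "\n";
print "Items: ", join( ', ', map { $_->link() } $feed->get_item() ), "\n";
```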

$string = $feed->to_string( $encoding );

This method generates XML source as a string and returns it. The output $encoding is optional; the default is 'UTF-8'. On Perl 5.8 and later, any encoding supported by the Encode module is available. On Perl 5.005 and 5.6.1, only the four encodings supported by the Jcode module are available: 'UTF-8', 'Shift_JIS', 'EUC-JP', and 'ISO-2022-JP'. 'UTF-8' is recommended for the widest compatibility.

$feed->to_file( $filename, $encoding );

This method generates an XML file and writes it to $filename. The output $encoding is optional, and the default is 'UTF-8'.
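A round-trip sketch, assuming XML::FeedPP is installed and using File::Temp for a scratch directory. The file name index.rss and the example.com URLs are placeholders. The file written by to_file() is read back by passing its local filename to new().

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::FeedPP;

# Scratch directory that is removed when the program exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'index.rss' );

my $feed = XML::FeedPP::RSS->new();
$feed->title( 'Example' );
$feed->link( 'http://example.com/' );
$feed->add_item( 'http://example.com/1' )->title( 'First item' );

# Write the feed as UTF-8, then load it again through the filename form of new().
$feed->to_file( $path, 'UTF-8' );
my $copy = XML::FeedPP->new( $path );
print "Reloaded: ", $copy->title(), " (", scalar( $copy->get_item() ), " item)\n";
```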

$item = $feed->add_item( $url );

This method creates a new item/entry and returns its instance. The mandatory $url argument is the URL of the new item/entry. An RSS <item> element is represented by an instance of the XML::FeedPP::RSS::Item class, an RDF <item> element by XML::FeedPP::RDF::Item, and an Atom <entry> element by XML::FeedPP::Atom::Entry.

$item = $feed->add_item( $srcitem );

This method duplicates an item/entry and adds it to $feed. $srcitem is an instance of an XML::FeedPP::*::Item class, such as one returned by the get_item() method described below.

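$item = $feed->add_item( link => $url, title => $text, ... );

This method creates a new item/entry and sets its <link>, <title>, and other elements from the given key/value pairs.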

$item = $feed->get_item( $index );

This method returns item(s) in $feed. A valid zero-based $index returns the corresponding item; an out-of-range $index yields undef. When $index is omitted, the method returns a list of all items in list context, or the number of items in scalar context.
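A short sketch of the calling patterns of get_item(), assuming XML::FeedPP is installed; the example.com links are placeholders.

```perl
use strict;
use warnings;
use XML::FeedPP;

my $feed = XML::FeedPP::RSS->new();
$feed->add_item( "http://example.com/$_" ) for 1 .. 3;

my $count = $feed->get_item();       # scalar context, no index: number of items
my @all   = $feed->get_item();       # list context, no index: every item
my $first = $feed->get_item( 0 );    # zero-based index: a single item
my $none  = $feed->get_item( 99 );   # out-of-range index: undef

print "count=$count first=", $first->link(), "\n";
```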

@items = $feed->match_item( link => qr/.../, title => qr/.../, ... );

This method finds the item(s) that match all of the given regular expressions. In list context it returns all matching items; in scalar context it returns the first matching item.
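A sketch of match_item() with hypothetical items, assuming XML::FeedPP is installed. Each key names an element and each value is a compiled regular expression (qr//); an item is returned only if every rule matches.

```perl
use strict;
use warnings;
use XML::FeedPP;

# Hypothetical items; only link and title are used as match keys here.
my $feed = XML::FeedPP::RSS->new();
$feed->add_item( 'http://example.com/perl/1' )->title( 'Perl tips' );
$feed->add_item( 'http://example.com/ruby/1' )->title( 'Ruby tips' );
$feed->add_item( 'http://example.com/perl/2' )->title( 'More Perl' );

my @perl  = $feed->match_item( link => qr{/perl/} );                     # list: all matches
my $first = $feed->match_item( title => qr/tips/ );                      # scalar: first match
my @both  = $feed->match_item( link => qr{/perl/}, title => qr/tips/ );  # every rule must match

print scalar(@perl), " Perl items; first 'tips' item: ", $first->title(), "\n";
```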

$feed->remove_item( $index );

This method removes an item/entry from $feed, where $index is a valid zero-based array index.

$feed->clear_item();

This method removes all items/entries from $feed.
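A sketch of remove_item() and clear_item() on a feed with three placeholder items, assuming XML::FeedPP is installed.

```perl
use strict;
use warnings;
use XML::FeedPP;

my $feed = XML::FeedPP::RSS->new();
$feed->add_item( "http://example.com/$_" ) for 1 .. 3;

$feed->remove_item( 1 );                              # drop the second item (index 1)
my @links = map { $_->link() } $feed->get_item();
print "After remove_item: @links\n";

$feed->clear_item();                                  # drop everything that is left
my $left = $feed->get_item();
print "After clear_item: ", ( $left || 0 ), " items\n";
```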

$feed->sort_item();

This method sorts the order of items in $feed by <pubDate>.

$feed->uniq_item();

This method makes items unique: the second and subsequent items that share the same link URL are removed.

$feed->normalize();

This method calls both the sort_item() and uniq_item() methods.

$feed->limit_item( $num );

Removes items in excess of the specified numeric limit. Items at the end of the list are removed. When preceded by sort_item() or normalize(), this deletes the oldest items.
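A sketch of normalize() followed by limit_item(), assuming XML::FeedPP is installed; the links and dates are placeholders. It assumes that sort_item() places the item with the most recent <pubDate> first, so trimming the list to one item should keep the newest entry.

```perl
use strict;
use warnings;
use XML::FeedPP;

# Placeholder items; two of them share the same link URL.
my $feed = XML::FeedPP::RSS->new();
$feed->add_item( 'http://example.com/old' )->pubDate( 'Mon, 01 Jan 2007 00:00:00 +0000' );
$feed->add_item( 'http://example.com/new' )->pubDate( 'Tue, 02 Jan 2007 00:00:00 +0000' );
$feed->add_item( 'http://example.com/new' )->pubDate( 'Wed, 03 Jan 2007 00:00:00 +0000' );

$feed->normalize();                    # sort_item(), then uniq_item()
my $after_uniq = $feed->get_item();    # the repeated link is gone

$feed->limit_item( 1 );                # trim the list to a single item
my $kept = $feed->get_item( 0 );
print "Kept ", $kept->link(), " dated ", $kept->pubDate(), "\n";
```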

$feed->xmlns( 'xmlns:media' => 'http://search.yahoo.com/mrss' );

Adds an XML namespace at the document root of the feed.

$url = $feed->xmlns( 'xmlns:media' );

Returns the URL of the specified XML namespace.

@list = $feed->xmlns();

Returns the list of all XML namespaces used in $feed.
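A sketch of the three xmlns() call forms, assuming XML::FeedPP is installed and reusing the Media RSS namespace URL shown above.

```perl
use strict;
use warnings;
use XML::FeedPP;

my $feed = XML::FeedPP::RSS->new();
$feed->xmlns( 'xmlns:media' => 'http://search.yahoo.com/mrss' );  # declare on the root element

my $url  = $feed->xmlns( 'xmlns:media' );   # look up one namespace URL
my @list = $feed->xmlns();                  # namespaces declared on the root
print "media => $url\n";
print "declared: @list\n";
```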

METHODS FOR CHANNEL

$feed->title( $text );

This method sets/gets the feed's <title> value, returning the current value when $text is undefined.

$feed->description( $html );

This method sets/gets the feed's <description> value in plain text or HTML, returning the current value when $html is undefined.

$feed->pubDate( $date );

This method sets/gets the feed's <pubDate> value for RSS, <dc:date> value for RDF, or <modified> value for Atom. It returns the current value when $date is undefined. See also the DATE/TIME FORMATS section.

$feed->copyright( $text );

This method sets/gets the feed's <copyright> value for RSS/Atom, or <dc:rights> element for RDF. It returns the current value when $text is undefined.

$feed->link( $url );

This method sets/gets the URL of the web site as the feed's <link> value for RSS/RDF/Atom. It returns the current value when $url is undefined.

$feed->language( $lang );

This method sets/gets the feed's <language> value for RSS, <dc:language> element for RDF, or <feed xml:lang=""> attribute for Atom. It returns the current value when $lang is undefined.

$feed->image( $url, $title, $link, $description, $width, $height );

This method sets/gets the feed's <image> value and its child nodes for RSS/RDF, returning a list of current values when any arguments are undefined. This method is ignored for Atom feeds.
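A sketch, assuming XML::FeedPP is installed, that applies the same channel setters to one empty feed of each format; the title and example.com link are placeholders. The generated markup should show the format-specific language element or attribute described above.

```perl
use strict;
use warnings;
use XML::FeedPP;

# The same channel setters applied to one empty feed of each format.
my %feeds = (
    RSS  => XML::FeedPP::RSS->new(),
    RDF  => XML::FeedPP::RDF->new(),
    Atom => XML::FeedPP::Atom->new(),
);

my %xml;
foreach my $format ( sort keys %feeds ) {
    my $feed = $feeds{$format};
    $feed->title( 'Example' );
    $feed->link( 'http://example.com/' );
    $feed->language( 'en' );
    $xml{$format} = $feed->to_string();   # format-specific markup
    print "--- $format ---\n$xml{$format}\n";
}
```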

METHODS FOR ITEM

$item->title( $text );

This method sets/gets the item's <title> value, returning the current value when $text is undefined.

$item->description( $html );

This method sets/gets the item's <description> value in HTML or plain text, returning the current value when $html is undefined.

$item->pubDate( $date );

This method sets/gets the item's <pubDate> value for RSS, <dc:date> element for RDF, or <issued> element for Atom. It returns the current value when $date is undefined. See also the DATE/TIME FORMATS section.

$item->category( $text );

This method sets/gets the item's <category> value for RSS/RDF, but is ignored for Atom. It returns the current value when $text is undefined.

$item->author( $text );

This method sets/gets the item's <author> value for RSS, <dc:creator> value for RDF, or <author><name> value for Atom. It returns the current value when $text is undefined.

$item->guid( $guid );

This method sets/gets the item's <guid> value for RSS or <id> value for Atom; it is ignored for RDF. An optional second argument may follow $guid. This method returns the current value when $guid is undefined.

$item->set( $key => $value, ... );

This method sets customized node values or attributes. See also the GENERAL SET/GET section that follows.

$value = $item->get( $key );

This method returns the node value or attribute. See also the GENERAL SET/GET section that follows.

$link = $item->link();

This method returns the item's <link> value.
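A sketch of the item methods on an RSS feed, assuming XML::FeedPP is installed; all values are placeholders. Each setter stores a value, and calling the same method without an argument reads it back.

```perl
use strict;
use warnings;
use XML::FeedPP;

my $feed = XML::FeedPP::RSS->new();
my $item = $feed->add_item( 'http://example.com/posts/1' );

# Placeholder values for the elements described above.
$item->title( 'Hello, feeds' );
$item->description( '<p>An <em>HTML</em> summary.</p>' );
$item->category( 'Perl' );
$item->author( 'alice@example.com (Alice)' );

# Called without an argument, the same methods act as getters.
printf "%s | %s | %s | %s\n",
    $item->title(), $item->category(), $item->author(), $item->link();
```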

GENERAL SET/GET

XML::FeedPP understands only the <rdf:*> and <dc:*> modules and the default RSS/RDF/Atom namespaces. There are NO native methods for any other external modules, such as <media:*>. However, the set() and get() methods can set/get the value of any element or attribute in those modules.

$item->set( 'module:name' => $value );

This sets the value of the child node: <item><module:name>$value

$item->set( 'module:name@attr' => $value );

This sets the value of the child node's attribute: <item><module:name attr="$value">

$item->set( '@attr' => $value );

This sets the value of the item's attribute: <item attr="$value">

$item->set( 'hoge/pomu@hare' => $value );

This sets the value of an attribute on a grandchild node: <item><hoge><pomu hare="$value">
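A sketch of set() and get() with Media RSS elements, assuming XML::FeedPP is installed; the video URL, MIME type, and keywords are placeholders. The 'name@attr' keys address attributes, while a bare 'module:name' key addresses a child node's value.

```perl
use strict;
use warnings;
use XML::FeedPP;

my $feed = XML::FeedPP::RSS->new();
$feed->xmlns( 'xmlns:media' => 'http://search.yahoo.com/mrss' );
my $item = $feed->add_item( 'http://example.com/video/1' );

# Placeholder Media RSS values; there are no native <media:*> methods.
$item->set(
    'media:content@url'  => 'http://example.com/video/1.mp4',   # attribute of a child node
    'media:content@type' => 'video/mp4',                        # second attribute, same node
    'media:keywords'     => 'perl, feeds',                      # child node value
);

print $item->get( 'media:content@url' ), "\n";
print $item->get( 'media:keywords' ), "\n";
```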

DATE/TIME FORMATS

XML::FeedPP accepts a date/time in any of the following three formats:

$date = "Thu, 23 Feb 2006 14:43:43 +0900";

This is the HTTP protocol's preferred format and RSS 2.0's native format, as defined by RFC 1123.

$date = "2006-02-23T14:43:43+09:00";

This is W3CDTF, the native format of RDF; it is a profile of ISO 8601.

$date = 1140705823;

The last format is the number of seconds since the epoch, 1970-01-01T00:00:00Z. This is the value returned by Perl's time() function.
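To show how the three formats relate, here is a core-Perl sketch that renders an epoch value in the two string formats, always in UTC. The helpers epoch_to_rfc1123() and epoch_to_w3cdtf() are hypothetical and are not XML::FeedPP functions. English day and month names are spelled out rather than taken from strftime(), whose %a and %b depend on the locale. Note that 1140705823 is 14:43:43 UTC on 23 Feb 2006. That is nine hours later than the two string examples above, which are 14:43:43 at +09:00.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of XML::FeedPP); both work in UTC via gmtime().
my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# RFC 1123 style, as used by RSS 2.0 <pubDate>.
sub epoch_to_rfc1123 {
    my ($epoch) = @_;
    my ( $sec, $min, $hour, $mday, $mon, $year, $wday ) = gmtime($epoch);
    return sprintf( '%s, %02d %s %04d %02d:%02d:%02d +0000',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec );
}

# W3CDTF style, as used by RDF <dc:date>.
sub epoch_to_w3cdtf {
    my ($epoch) = @_;
    my ( $sec, $min, $hour, $mday, $mon, $year ) = gmtime($epoch);
    return sprintf( '%04d-%02d-%02dT%02d:%02d:%02dZ',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec );
}

my $epoch = 1140705823;
print epoch_to_rfc1123($epoch), "\n";   # Thu, 23 Feb 2006 14:43:43 +0000
print epoch_to_w3cdtf($epoch), "\n";    # 2006-02-23T14:43:43Z
```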

MODULE DEPENDENCIES

XML::FeedPP requires only XML::TreePP, which is likewise a pure Perl implementation. The LWP::UserAgent module is required to download feeds from remote web servers. The Jcode module is required to convert Japanese encodings on Perl 5.005 and 5.6.1, but is NOT required on Perl 5.8.x and later.

AUTHOR

Yusuke Kawasaki, http://www.kawa.net/

COPYRIGHT AND LICENSE

Copyright (c) 2006-2007 Yusuke Kawasaki. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.