rss2leafnode -- post RSS or Atom feeds and web pages to newsgroups
rss2leafnode [--options]
RSS2Leafnode downloads RSS or Atom feeds and posts items as messages to an NNTP news server. It's designed to make text items available for reading in local newsgroups, not propagating anywhere (though that's not enforced).
Desired feeds are given in a configuration file .rss2leafnode.conf in your home directory. For example to put a feed into group "r2l.perl"
fetch_rss ('r2l.perl', 'http://log.perl.org/atom.xml');
This is actually Perl code, so comment lines begin with # and you can use conditionals, variables, etc (see perlintro or perlsyn). The target newsgroup must exist (see for example "Leafnode" below). With that done, run rss2leafnode as
rss2leafnode
You can automate this with cron or similar. If you run it as user news it could go just after a normal news fetch. The --config option below lets you run different config files at different times, etc. Code in the conf file could do that too. See examples/rss2leafnode.conf in the RSS2Leafnode sources for a complete sample.
Messages are added to the news spool using NNTP "POST" commands. When a feed is re-downloaded any items previously added are not repeated. Multiple feeds can be put in a single newsgroup. Each feed is posted as it's downloaded, so the first feed's articles appear while other feeds are still being downloaded.
The target news server follows the Net::NNTP defaults, or the newsgroup name can be in the form of a news: or nntp: URL of a server on a different host or port. For example a personal server on a high port number,
fetch_rss('news://somehost.mydomain.org:8119/r2l.weather', 'http://feeds.feedburner.com/PTCC');
Plain web pages can be downloaded too. Each time the page changes a new article is injected. This is good for a latest news or status page. For example
fetch_html ('r2l.music', 'http://www.abc.net.au/rage/playlist/print/saturday_print.htm');
The target can also be an image or similar file. It's simply put in a news message with its indicated MIME type. How well it displays depends on your newsreader.
fetch_html('r2l.weather', 'http://www.bom.gov.au/difacs/IDX0604.gif');
The message "Subject" is the HTML <title> or possibly something better from URI::Title or Image::ExifTool if you have those. URI::Title has special cases for a couple of unhelpful sites and Image::ExifTool can get a PNG image title.
If a web page isn't at a fixed location you can write some Perl code in .rss2leafnode.conf to construct a URL with a date etc. It might be worth attempting a couple of nearby dates if you're not certain when the new one becomes available.
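As a rough illustration of that idea, the Perl below builds candidate URLs for today and a couple of earlier days from a strftime() pattern. The dated_urls() helper and the example.com URL pattern are hypothetical and not part of rss2leafnode. In .rss2leafnode.conf each candidate might then be given to fetch_html() in turn.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not part of rss2leafnode): expand a strftime()
# style URL template for today and the previous $days_back days, newest
# first.  Dates are computed in UTC; use localtime() instead of gmtime()
# if the site names its pages by its own local date.
sub dated_urls {
  my ($template, $days_back, $now) = @_;
  $now = time() unless defined $now;
  my @urls;
  foreach my $back (0 .. $days_back) {
    my @t = gmtime($now - $back * 86400);   # the date $back days ago
    push @urls, strftime($template, @t);
  }
  return @urls;
}

# Made-up URL pattern, for illustration only.
print "$_\n" foreach dated_urls('http://www.example.com/report-%Y%m%d.html', 2);
```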
HTTP ETag and Last-Modified headers are used, if provided by the server, to avoid re-downloading unchanged content (feeds or web pages). RSS <thr:count> or <slash:comments> are used to skip unchanged comments feeds. Values seen from the last run are saved in a .rss2leafnode.status file in your home directory. The --verbose option shows when a server doesn't have ETag or Last-Modified.
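Here is a minimal sketch of the conditional-GET mechanism. It uses the core HTTP::Tiny module rather than the LWP that rss2leafnode uses. The $saved record with etag and last_modified keys is a hypothetical stand-in for a .rss2leafnode.status entry, not that file's real format. A 304 Not Modified reply means nothing has changed, so there's nothing to post.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build request headers from the validators saved on the previous run.
# $saved is a hypothetical { etag => ..., last_modified => ... } record.
sub conditional_headers {
  my ($saved) = @_;
  my %headers;
  $headers{'If-None-Match'} = $saved->{etag}
    if defined $saved->{etag};
  $headers{'If-Modified-Since'} = $saved->{last_modified}
    if defined $saved->{last_modified};
  return %headers;
}

# Returns the new content, or undef if the server says 304 Not Modified.
sub fetch_if_changed {
  my ($url, $saved) = @_;
  my %headers = conditional_headers($saved);
  my $res = HTTP::Tiny->new->get($url, { headers => \%headers });
  return undef if $res->{status} == 304;      # unchanged since last run
  die "$url: $res->{status} $res->{reason}\n" unless $res->{success};
  # HTTP::Tiny lower-cases response header names.  Either may be
  # absent if the server doesn't send it.
  $saved->{etag}          = $res->{headers}{'etag'};
  $saved->{last_modified} = $res->{headers}{'last-modified'};
  return $res->{content};
}
```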
If you have XML::RSS::Timing then it's used for RSS <ttl>, <updateFrequency>, etc from a feed. This means the feed is not re-downloaded until its declared update times. Only a few feeds have good timing info, though; most merely have a ttl advising, for instance, 5 minutes between rechecks.
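The plain <ttl> part of that calculation is simple arithmetic. The sketch below assumes the last download time is kept in epoch seconds. XML::RSS::Timing does considerably more, handling <updateFrequency> and related elements, so treat this as an illustration only.

```perl
use strict;
use warnings;

# <ttl> is in minutes.  With no ttl the feed may be re-checked any time.
sub next_check_time {
  my ($last_fetch, $ttl_minutes) = @_;
  return $last_fetch unless $ttl_minutes;
  return $last_fetch + $ttl_minutes * 60;
}

# True once the advised interval has elapsed.
sub due_for_download {
  my ($last_fetch, $ttl_minutes, $now) = @_;
  return $now >= next_check_time($last_fetch, $ttl_minutes);
}
```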
With --verbose the next calculated update time is printed, in case you wonder why nothing is happening. The easiest way to force a re-download is to delete the ~/.rss2leafnode.status file. Old status file entries are automatically dropped if you don't fetch a particular feed for a while, so that file should normally need no maintenance.
rss2leafnode was originally created with the leafnode program in mind, but can be used with any server accepting posts. It's your responsibility to be careful where a target newsgroup propagates. Don't make automated postings to the world!
For leafnode version 2 see its README file section "LOCAL NEWSGROUPS" on creating local-only groups. Add a line to the /etc/news/leafnode/local.groups file like
r2l.stuff	y	My various feeds
The group name is arbitrary and the description is optional, but note it must be a tab character between the name and the "y" and between the "y" and any description. "y" means posting is allowed.
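A space instead of a tab is an easy mistake to make, so one way to avoid it is to generate the line. This is a small sketch: local_groups_line() is a hypothetical helper, and appending to the real file needs suitable permissions.

```perl
use strict;
use warnings;

# Build a leafnode 2 local.groups line, with TAB separators as required.
sub local_groups_line {
  my ($group, $description) = @_;
  my @fields = ($group, 'y');                  # 'y' = posting allowed
  push @fields, $description if defined $description;
  return join("\t", @fields) . "\n";
}

# To append for real (path as given above, usually needs root):
#   open my $fh, '>>', '/etc/news/leafnode/local.groups' or die "$!";
#   print $fh local_groups_line('r2l.stuff', 'My various feeds');
print local_groups_line('r2l.stuff', 'My various feeds');
```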
The Small News "sn" program is another possible local server. Create groups in it with the command
snnewgroup r2l.something
When running the snntpd daemon from inetd or similar, don't forget a logger program argument on the command line as described in its INSTALL.run file. Otherwise log messages go to the client connection and will upset most client program code, including the Net::NNTP used by rss2leafnode.
It's your responsibility to check the terms of use for any feeds or web pages you download with rss2leafnode. Pay particular attention if propagating or re-transmitting resulting messages.
Copyright or license statements in a feed are included in the messages as X-Copyright headers. Unless the content is in the public domain such copyright notices should be retained.
The transformations RSS2Leafnode makes to turn feed items into messages are purely mechanical and for that reason the author believes the program's terms (ie. GPL, per "LICENSE" below) are not imposed on the results.
The command line options are
--config=/some/filename
Read the specified configuration file instead of ~/.rss2leafnode.conf.
--help
Print some brief help information.
--verbose
Print some diagnostics about what's being done. With --verbose=2 print various technical details.
--version
Print the program version number and exit.
The following config options can be set either in global variables, or on a per-feed basis in an individual fetch_rss() or fetch_html().
fetch_rss ("group", "url", rss_get_links => G)
If set to 1 then download links in each item and include the content in the news message. For example,
$rss_get_links = 1;
fetch_rss ('r2l.finance', 'http://au.biz.yahoo.com/financenews/htt/financenews.xml');
Not all feeds have interesting things at their link. Sometimes the RSS has the full item text already. But if the RSS is a summary then $rss_get_links makes the full article ready to read immediately, instead of having to click through from the message.
Only the immediate link target URL is retrieved. No images within the page are downloaded, which is often a good thing to reduce bloat or avoid occasional advertising in feeds. You'll probably have trouble if the link target uses frames (a set of HTML pages instead of just one).
fetch_rss ("group", "url", rss_get_comments => G)
If true then download the comments feeds for items and post as followup news articles. For example,
fetch_rss ('r2l.food', 'http://wickedgooddinner.blogspot.com/feeds/posts/default', rss_get_comments => 1);
To send a followup comment you usually must go to the links in the original article (or the followups) and use some sort of web form. Posting a message to the newsgroup goes nowhere.
When a feed is available in both Atom and RSS formats sometimes only the Atom one includes a comments feed URL.
Comments feeds are followed for as long as an article appears in the feed, though in the current implementation it might be checked for new comments only when the originating feed changes.
fetch_rss ("group", "url", render => R)
fetch_html ("group", "url", render => R)
If true then render HTML as plain text in the news messages. Normally item text, downloaded parts from $rss_get_links, and fetch_html() pages are all presented as text/html. If your newsreader doesn't handle HTML very well then render is a good way to see just the text. Setting 1 uses HTML::FormatText,
$render = 1;   # to use HTML::FormatText
fetch_rss ('r2l.weather', 'http://xml.weather.yahoo.com/forecastrss?p=ASXX0001&u=f');
Setting "WithLinks" uses the HTML::FormatText::WithLinks variant (you must have that module) which shows HTML links as footnotes.
fetch_rss ('r2l.stuff', 'http://rss.sciam.com/sciam/basic-science', render => 'WithLinks');
Settings "elinks", "lynx" or "w3m" use the respective external program. You must have HTML::FormatExternal and the program.
fetch_rss ('r2l.sport', 'http://fr.news.yahoo.com/rss/rugby.xml', rss_get_links => 1, render => 'lynx');
"vilistextum" can be used too if it is built with --enable-multibyte for UTF-8 output. Other HTML::FormatExternal programs generally can't be used as they don't have output charset UTF-8.
The number of columns to use when rendering HTML to plain text or when wrapping Atom text. You can set this to whatever you find easiest to read, or any special width needed by a particular feed.
fetch_rss ("group", "url", get_icon => G)
fetch_html ("group", "url", get_icon => G)
Download an RSS/Atom icon or HTML favicon as an image for the Face header. Image::Magick is required to process the image unless it's already PNG format with a maximum size of 48x48 (as given by its size attributes).
The Face header is shown by Gnus and perhaps only a few other news readers. In Gnus it appears with "From:" in article mode on a graphical screen. It can be a good visual cue to the origin, but may not always be worth the extra download.
$get_icon = 1;
fetch_rss ('r2l.whatsnew', 'http://www.archive.org/services/collection-rss.php');
Banners much wider than high are suppressed as probably advertising, and in any case not suited to the 48x48 size limit of the Face header specification. A 48x48 image might add around 4 kbytes or more to each message.
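Here is a hypothetical sketch of the size checks described above. The ratio of 2 used to decide "much wider than high" is an assumption for illustration. The actual pixel scaling would be Image::Magick's job; this sketch only computes target dimensions.

```perl
use strict;
use warnings;
use POSIX qw(floor);

use constant FACE_MAX   => 48;   # Face header limit, 48x48 pixels
use constant BANNER_MAX => 2;    # assumed width/height ratio for a "banner"

sub looks_like_banner {
  my ($width, $height) = @_;
  return $height > 0 && $width / $height > BANNER_MAX;
}

# Scale down (never up) to fit within 48x48, keeping the aspect ratio.
sub face_size {
  my ($width, $height) = @_;
  my $scale = FACE_MAX / ($width > $height ? $width : $height);
  $scale = 1 if $scale > 1;
  return (floor($width * $scale + 0.5) || 1,
          floor($height * $scale + 0.5) || 1);
}
```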
For plain RSS and Atom feeds an image is normally a per-channel attribute so it's the same for all articles from the feed. An itunes:image or activity:actor can be per-item and is used if present.
fetch_rss ("group", "url", rss_newest_only => $count)
fetch_rss ("group", "url", rss_newest_only => $period)
Take only the newest items from an RSS feed. The default is 0, which means take all items from the feed. The value is either a number for the latest few items, eg. 10 items,
fetch_rss('r2l.test', 'http://www.cpantesters.org/author/K/KRYDE-nopass.rss', rss_newest_only => 10);
Or it can be a string giving a period of time. Only items newer than this are taken,
"60 minutes"  "1 hour"  "36 hours"  "1 day"  "2 days"  "1 month"  "5 months"  "1 year"  "0.75 years"
rss_newest_only can be good if you're only interested in the most recent item from a status or weather feed, or if you only want to get a few items as a random taste of a feed.
If a feed goes back further than the news server retains, then giving a period such as "90 days" or whatever corresponds to the server's retention time will prevent old articles being re-added when the server discards them. (It'd be better if the news server could be asked for its retention time, but this option here is better than nothing.)
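The two forms of the value might be handled as below. This sketch rests on stated assumptions: items are hashrefs with an epoch-seconds time field, a month is taken as 30 days, and a year as 365.25 days. Those choices may not match rss2leafnode's own arithmetic.

```perl
use strict;
use warnings;

# Assumed unit lengths, for illustration only.
my %UNIT_SECONDS = (
  minute => 60,
  hour   => 60 * 60,
  day    => 24 * 60 * 60,
  month  => 30 * 24 * 60 * 60,
  year   => 365.25 * 24 * 60 * 60,
);

# "36 hours" -> 129600 seconds; undef if not a recognised period.
sub period_seconds {
  my ($str) = @_;
  $str =~ /^\s*([0-9]*\.?[0-9]+)\s*(minute|hour|day|month|year)s?\s*$/i
    or return undef;
  return $1 * $UNIT_SECONDS{lc $2};
}

# $limit is 0 (take all), a count, or a period string.
sub newest_only {
  my ($limit, $now, @items) = @_;
  return @items unless $limit;
  @items = sort { $b->{time} <=> $a->{time} } @items;   # newest first
  if ($limit =~ /^\d+$/) {                             # a plain count
    my $n = $limit < @items ? $limit : scalar(@items);
    return @items[0 .. $n - 1];
  }
  my $secs = period_seconds($limit);
  defined $secs or die "unrecognised period: $limit\n";
  return grep { $_->{time} > $now - $secs } @items;
}
```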
fetch_html ("group", "url", html_extract_main => 1)
fetch_rss ("group", "url", html_extract_main => 1)
Use HTML::ExtractMain on downloaded HTML to pick out the "main" text from the page. For fetch_rss() this is applied to downloaded link parts (rss_get_links above). HTML::ExtractMain version 0.63 or higher is required.
This is good for removing boilerplate headers or side columns on a page. For reading text those things tend to waste space and often look particularly poor from a non-tables renderer such as HTML::FormatText or lynx.
The algorithm in HTML::ExtractMain is a simple paragraph scoring system (as of its version 0.63). It does a surprisingly good job but you might check how much it discards, in case something good was not reckoned part of the main text. Option value "attach_full" includes the full page as an attachment
$html_extract_main = 'attach_full';
fetch_rss ("group", "url", user_agent => "string")
fetch_html ("group", "url", user_agent => "string")
Set the User-Agent string which RSS2Leafnode reports in its download requests. The default is RSS2Leafnode and LWP version numbers,
RSS2Leafnode/123 libwww-perl/456
Occasionally a HTTP server will look at the User-Agent and do something different or perhaps even allow access only for certain kinds of clients. Generally speaking this is very bad. The user_agent option here lets RSS2Leafnode masquerade as some other client, for instance as a browser if a server will only speak properly to a browser.
$user_agent = 'Mosaic/1.0';
If the string ends with a space then LWP::UserAgent will append itself to the string.
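The trailing-space rule can be shown without LWP itself. effective_agent() below is a hypothetical stand-in for what LWP::UserAgent's agent() method does, with $default playing the part of LWP's own libwww-perl/VERSION string. The 6.00 version number is made up.

```perl
use strict;
use warnings;

# A trailing space means "append the default agent string".
sub effective_agent {
  my ($agent, $default) = @_;
  return $agent =~ / \z/ ? $agent . $default : $agent;
}

print effective_agent('Mosaic/1.0',  'libwww-perl/6.00'), "\n";  # Mosaic/1.0
print effective_agent('Mosaic/1.0 ', 'libwww-perl/6.00'), "\n";  # Mosaic/1.0 libwww-perl/6.00
```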
fetch_rss ("group", "url", rss_charset_override => "CHARSET")
If set then force RSS content to be interpreted in this charset, irrespective of what the document says. See "ENCODINGS" in XML::Parser for the charsets supported (it has some builtins and then .enc files under /usr/lib/perl5/XML/Parser/Encodings/).
Use this option if the document is wrong, or if it has no charset specified and isn't the XML default UTF-8. Usually you'll only want this for a particular offending feed. For example,
# AIR is latin-1, but doesn't have a <?xml> saying that
fetch_rss ('r2l.finance', 'http://www.aireview.com.au/rss.php', rss_charset_override => 'iso-8859-1');
By default RSS2Leafnode tries to cope with bad multibyte sequences by re-coding to the feed's claimed charset. If that works then the text will have some substitute characters (either U+FFFD or question marks "?") and a warning is given like
Feed http://example.org/feed.xml recoded utf-8 to parse, expect substitutions for bad non-ascii (line 214, column 75, byte 13196)
Bad single-byte codings generally aren't detected and will just go through to display something incorrect (eg. if Windows codepage 1252 is used where Latin-1 is claimed). Nose around the raw feed to see where it goes wrong.
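The substitution behaviour can be seen with the core Encode module. Under its default CHECK setting, decode() replaces malformed sequences with U+FFFD. This is a minimal sketch of the idea only: lenient_decode() is hypothetical and is not rss2leafnode's actual recoding code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode leniently and warn if any substitutions were needed.
sub lenient_decode {
  my ($charset, $bytes) = @_;
  my $text = decode($charset, $bytes);     # bad sequences -> U+FFFD
  my $bad  = () = $text =~ /\x{FFFD}/g;    # count the substitutions
  warn "recoded $charset, $bad substitution(s) for bad non-ascii\n" if $bad;
  return $text;
}

# "caf" + a truncated two-byte UTF-8 sequence + " ok"
my $sample = lenient_decode('utf-8', "caf\xC3 ok");
```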
fetch_rss ("group", "url", html_charset_from_content => H)
fetch_html ("group", "url", html_charset_from_content => H)
If true then the charset used for HTML content is taken from the HTML itself, rather than the server's HTTP headers. Normally the server should be believed, but if a particular server is misconfigured then you can try this.
fetch_html ('r2l.stuff', 'http://www.somebadserver.com/newspage.html', html_charset_from_content => 1);
Variables take effect from the point they're set, through to the end of the file, or until a new setting.
Options like render => 'lynx' in a particular fetch_rss() or fetch_html() override the global settings, just for that call.
The Perl local feature and a braces block can confine a variable setting to a particular group of feeds. Eg.
{ local $rss_get_links = 1;
  fetch_rss ('r2l.debian', 'http://www.debian.org/News/weekly/dwn.en.rdf');
  fetch_rss ('r2l.finance', ...);
}
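Here is a stand-alone demonstration of that scoping, runnable outside rss2leafnode. show_setting() is a hypothetical stand-in for fetch_rss() reading the current global value.

```perl
use strict;
use warnings;

our $rss_get_links = 0;                # a global config variable

sub show_setting { return $rss_get_links }

my @seen;
push @seen, show_setting();            # 0, the global value
{
  local $rss_get_links = 1;            # applies until the closing brace
  push @seen, show_setting();          # 1
}
push @seen, show_setting();            # 0 again, restored on block exit
print "@seen\n";                       # prints "0 1 0"
```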
In Emacs, .rss2leafnode.conf can be put into perl-mode with the usual mode setup in the file
# -*- mode: perl-mode -*-
Or an auto-mode-alist setup in your .emacs,
(add-to-list 'auto-mode-alist '("/\\.rss2leafnode\\.conf\\'" . perl-mode))
The Debian package of rss2leafnode has this setup, plus a completions ignore for the .rss2leafnode.status file. See /etc/emacs/site-start.d/50rss2leafnode.el in the package, or debian/emacsen-startup in the RSS2Leafnode sources.
Non-ascii RSS text, Atom text and rendered HTML text are coded as UTF-8 in the generated messages so for non-ascii content you'll need a newsreader which supports that. Unrendered HTML is left in the charset the server gave, to ensure it matches any <meta http-equiv> in the document. In all cases the charset is specified in the MIME message headers or attachment parts. Transfer coding in the message body is chosen by MIME::Entity, which normally means quoted-printable if there's any non-ascii or any very long lines. Atom <content> already in base64 is left that way.
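Here is a sketch of that transfer coding choice using the core Encode and MIME::QuotedPrint modules. The 998-octet line limit comes from RFC 5322. MIME::Entity's own heuristics may differ in detail, so take this as an approximation.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::QuotedPrint qw(encode_qp);

# Quoted-printable if any non-ascii octet or any over-long line.
sub choose_transfer_encoding {
  my ($octets) = @_;
  my $non_ascii = $octets =~ /[^\x00-\x7F]/;
  my $long_line = grep { length($_) > 998 } split /\n/, $octets;
  return ($non_ascii || $long_line) ? 'quoted-printable' : '7bit';
}

my $body = encode('UTF-8', "Caf\x{E9} menu\n");   # UTF-8 octets
my $cte  = choose_transfer_encoding($body);
my $wire = $cte eq 'quoted-printable' ? encode_qp($body) : $body;
print "Content-Type: text/plain; charset=UTF-8\n";
print "Content-Transfer-Encoding: $cte\n\n$wire";
```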
Links are shown at the end of each message for
<link>                   RSS and Atom
<enclosure>              RSS
<comments>               RSS
<content>                Atom externals, except other XML feeds
<source>                 RSS and Atom
<prism:url>
<sioc:has_creator>
<sioc:has_discussion>
<sioc:links_to>
<sioc:reply_of>
<wfw:comment>            well-formed web
<wiki:diff>
<wiki:history>
Author <url>             Atom and wiki, not downloaded
Comment or reply links show a count of replies from any of
<thr:total> <link count="123" attribute <link thr:count="123" attribute <slash:comments> sub-element of <comments>
RSS comment feeds for $rss_get_comments are as follows. "appication" is a mis-spelling from WordPress pre 2.5 still sometimes found in use (as of Oct 2012).
<wfw:commentRss>
<link rel='replies' type='application/atom+xml' ...>
<link rel='replies' type='appication/atom+xml' ...>
Comments links are shown as "Replies" or "RSS Replies". If an RSS comment feed hasn't been detected as RSS it may show up as plain "Replies" instead of "RSS Replies". In that case it won't be downloaded by the rss_get_comments option.
<media:group> links are shown as blocks of links. Not sure about the quality of the formatting yet, and they're not downloaded by rss_get_links.
Common Alerting Protocol (CAP) fields for weather alerts etc are shown if present (eg. from the US NOAA). These can have more detail than just the text. Pseudo-link footnotes are shown for
<geo:lat>,<geo:long>
<geo:Point>
<georss:point>
<statusnet:origin>             possibly with URL target too
<media:credit>
<re:rank>
<hlxcd:helex-company-data>     symbol and name
Unrecognised item fields are shown in XML at the end of the message. This is a bit technical but tries not to drop information and might suggest extra things RSS2Leafnode could present or interpret.
An attempt is made to repair bad XML from a feed with XML::Liberal if you have that module. It uses XML::LibXML and the libxml library and often succeeds on annoying things like bad &foo; entities, at least enough to present something. On hopelessly malformed data it might be a bit slow.
The most common XML problem is too much or too little &foo; entity escaping. Too little can turn HTML markup into nested XML elements. RSS2Leafnode attempts to treat that as if it was XHTML style sub-elements, but the result is likely to be imperfect. Too much escaping results in raw or semi-raw HTML <p> or &foo; coming through. &apos; may be from XHTML instead of HTML, though many browsers support that entity anyway. Perhaps an option for extra unescaping could improve some bad feeds, but in practice it's unlikely to be wholly successful. Every bad feed tends to be bad in its own special way.
For reference, the message header fields are generated roughly as follows,
From:
First non-empty of
<author>
<jf:author>
<slate:author>
<dc:creator>
<dc:contributor>
<wiki:username>
<itunes:author>
<managingEditor>
<webMaster>
<dc:publisher>
<itunes:owner>
channel <title>
The dc bits in RDF might have sub-elements <rdf:Description><rdf:value> containing the actual text.
<dc:contributor>
  <rdf:Description ...>
    <rdf:value>Joe Bloggs</rdf:value>
  </rdf:Description>
</dc:contributor>
Atom has <name> and <email> sub-elements. <itunes:owner> may have an <itunes:email> sub-element. Such sub-elements are checked without worrying whether the feed is supposed to be Atom or RSS etc. If there's no email in the item but the name matches the channel owner then the email is taken from there. When there are no sub-elements the text is free-form and might be things like
Name
Name <foo@example.com>
foo@example.com (Name)
If there's no identifiable email mailbox part in the text and no <email> element then nobody@HOSTNAME is added to make a valid RFC 822 address.
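Here is a hypothetical sketch of that fallback. The quoted "Name" <nobody@HOSTNAME> layout and the simple mailbox test are assumptions for illustration. rss2leafnode's real parsing of free-form author text is likely more thorough.

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Keep text that already has a mailbox; otherwise add nobody@HOST.
sub author_to_from {
  my ($text, $host) = @_;
  $host = hostname() unless defined $host;
  $text =~ s/^\s+|\s+$//g;
  return $text if $text =~ /[^\s<>()\@]+\@[^\s<>()\@]+/;   # has a mailbox
  (my $name = $text) =~ s/(["\\])/\\$1/g;                  # quote for a phrase
  return qq{"$name" <nobody\@$host>};
}

print author_to_from('Joe Bloggs', 'example.org'), "\n";
```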
The channel <title> as a final fallback is meant to at least show something about where the message came from if there's no author identified. An author <url> is shown in the message links as described above.
<dc:creator> can appear multiple times for multiple authors. They're combined as a multiple From per RFC 5322, but currently without attempting to pick out a Sender: from among them. Atom feeds can have multiple <contributor> but for now only the primary author or authors are shown.
Subject:
First present of
<title>
<dc:title>
<dc:subject>
<dc:subject> is normally only a keyword but might be better than nothing.
Date:
<pubDate>
<dc:date>
<jf:creationDate>
<modified>
<updated>
<issued>
<dcterms:issued>
<created>
<lastBuildDate>
<published>
<prism:publicationDate>
dc:date is ISO format "2000-01-01T12:00:00Z" etc and anything in that form is converted to RFC 822 style for the messages. An unrecognised form is put through unmodified.
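Here is a sketch of such a conversion that keeps the original zone offset. Day and month names are spelled out rather than taken from strftime(), so the locale can't change them. Anything not in the expected form is returned unchanged, as described above. iso_to_rfc822() is a hypothetical name.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY_NAMES   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTH_NAMES = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

sub iso_to_rfc822 {
  my ($str) = @_;
  my ($y, $mo, $d, $h, $mi, $s, $zone) = $str =~
    /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d)(?::(\d\d)(?:\.\d+)?)?(Z|[+-]\d\d:?\d\d)$/
    or return $str;
  $s = 0 unless defined $s;
  # Weekday of the written calendar date; evaluating it as if UTC is
  # fine since only the date matters.  timegm() dies on an impossible
  # date, in which case the string is passed through unchanged.
  my $t = eval { timegm($s, $mi, $h, $d, $mo - 1, $y) };
  return $str unless defined $t;
  my $wday = (gmtime $t)[6];
  if ($zone eq 'Z') { $zone = '+0000' } else { $zone =~ s/:// }
  return sprintf('%s, %02d %s %04d %02d:%02d:%02d %s',
                 $DAY_NAMES[$wday], $d, $MONTH_NAMES[$mo - 1],
                 $y, $h, $mi, $s, $zone);
}

print iso_to_rfc822('2000-01-01T12:00:00Z'), "\n";   # Sat, 01 Jan 2000 12:00:00 +0000
```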
<jf:creationDate> is not used. It's apparently meant to be locale-based for human readability and is probably accompanied by <pubDate> anyway so not needed.
The date/time when rss2leafnode made the message.
Message-ID:
First of
<id>                           Atom
<guid isPermaLink="true">
<link>                         Yahoo Finance special case
<guid isPermaLink="false">     and feed URL
MD5 hash of various fields and feed URL
Yahoo Finance items repeated in different feeds are noticed using a special match of the <link> so that just one copy is posted. (As of March 2010 those items don't offer RSS guid identifiers.)
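Here is a hypothetical sketch of the last-resort MD5 Message-ID. Which fields get hashed, the "rss2leafnode." prefix and the "@localhost" domain are all assumptions for illustration. The sketch does show why a changed item body produces a new Message-ID and hence a re-post.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hash the feed URL plus some item fields (field choice is assumed).
sub fallback_message_id {
  my ($feed_url, %item) = @_;
  my $data = join("\0", $feed_url,
                  map { my $v = $item{$_}; defined $v ? $v : '' }
                      qw(title link description));
  return '<rss2leafnode.' . md5_hex(encode('UTF-8', $data)) . '@localhost>';
}

print fallback_message_id('http://example.org/feed.xml', title => 'Hello'), "\n";
```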
Keywords:
All of
<category>
<itunes:category>
<cap:category>
<itunes:keywords>
<media:keywords>
<dc:subject>
<slash:section>
<slate:topic>
The sub-category system of <itunes:category> is not currently put through.
Some blog feeds have a big set of categories, maybe an aggregate of everything in the blog or some such, making an unattractively long Keywords: header. It's kept in full for the sake of completeness, but if viewing it in a newsreader then some sort of line limit might be wanted.
<thr:in-reply-to> elements (per RFC 4685) are turned into Message-IDs the same way as an Atom <id>. This might help thread display in a newsreader if the parent item was downloaded too.
<sioc:reply_of> is not used. It'd be a possibility, but would probably need a hard-coded mapping of URL to Message-ID. For now it's just shown as a link as described above.
Content-Location:
The URL of a fetch_html() or an $rss_get_links attachment part. Good newsreaders can use this to resolve relative links in a HTML part.
This same URL and any xml:base attribute are used as a <base href=""> when making a HTML fragment, so the location is present when saving a message body and when rendering it to plain text.
Content-Language:
<language>
<dc:language>
<twitter:lang>
xml:lang=""
HTTP response Content-Language header
xml:lang is the standard XML attribute present on any element and sometimes found on Atom <content> text.
The language code is also added to a generated HTML body in HTML4 style, but whether any renderers/browsers do much with it is another matter.
<html lang="en">
From the corresponding HTTP header of a fetch_html() or $rss_get_links download part, though in practice this is almost never sent by HTTP servers.
Importance:, Priority:
These headers are only supposed to be for X.400 inter-operation. Common Alerting Protocol and Wiki (http://www.meatballwiki.org/wiki/ModWiki) are treated as
<cap:severity> "Extreme" and "Severe"   ->  "Importance: high" and "Priority: urgent"
<wiki:importance> "minor"               ->  "Importance: low"
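The mapping above fits naturally into a lookup table. In this sketch, item fields are passed as a plain hash of element name to text, which is a made-up layout. Values not listed, such as CAP "Moderate", are assumed to add no header.

```perl
use strict;
use warnings;

my %CAP_SEVERITY = (
  Extreme => { Importance => 'high', Priority => 'urgent' },
  Severe  => { Importance => 'high', Priority => 'urgent' },
);
my %WIKI_IMPORTANCE = (
  minor => { Importance => 'low' },
);

# %item maps element names to their text, eg. ('cap:severity' => 'Severe').
sub importance_headers {
  my (%item) = @_;
  my %headers;
  foreach my $pair ([ 'cap:severity',    \%CAP_SEVERITY ],
                    [ 'wiki:importance', \%WIKI_IMPORTANCE ]) {
    my ($element, $table) = @$pair;
    my $value = $item{$element};
    next unless defined $value && $table->{$value};
    %headers = (%headers, %{ $table->{$value} });
  }
  return %headers;
}

my %h = importance_headers('cap:severity' => 'Severe');
print "$_: $h{$_}\n" foreach sort keys %h;
```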
"list" for certain Google Groups lists, identified by their link URLs per List-Post below. Perhaps other feeds which come from mailing lists could be identified too.
Face:
Per the $get_icon option described above, the first item or channel element
<image>                              RSS
<icon>                               Atom
<logo>                               Atom
<itunes:image>
<statusnet:postIcon>
<media:thumbnail>
<activity:actor><link rel="avatar">
<author><gd:image>
HTML favicon                         for fetch_html()
Gnus and perhaps other newsreaders can display Face:, see http://quimby.gnus.org/circus/face.
It'd be possible to generate an X-Face: as well or instead, but X-Face: is black and white and converting a colour image from the feeds is unlikely to look good.
List-Post:
Mailbox of a Google Groups mailing list feed such as http://groups.google.com/group/cake-php/feed/rss_v2_0_msgs.xml. This may help post a followup to the list, depending on the newsreader. (A followup to an rss2leafnode newsgroup will normally go nowhere.)
Channel <rating>. Perhaps <itunes:explicit> or <media:adult> could be turned into a rating too.
X-Mailer:
"RSS2Leafnode/VERSION" plus the usual from MIME::Entity (see "build PARAMHASH" in MIME::Entity).
X-Copyright:
An RSS2Leafnode extension, being all of the following. See "Copyright" above.
<rights>                          Atom
<copyright>                       RSS
<dc:rights>
<dcterms:license>
<creativeCommons:license>
<link rel="license" href="...">   Atom
These are sought in the channel, the item, and also any Atom style <source> within the item.
An RSS2Leafnode extension, being the originating fetch_rss() feed URL downloaded. This is handy if an item has come out badly and you want to check the raw feed.
An RSS2Leafnode extension, being the channel <generator>. This might help assign blame for bad feed content etc.
Of course all this conversion and endless variant DTDs wouldn't be necessary if RSS had been news in the first place. A news server already serves short messages, either read-only or with followups, and if news servers hadn't gained a well-deserved reputation for being a pain to administer, and if news hadn't been based on transferring gigabytes of "full feed" instead of by demand, then RSS might never have been wanted. Of course the other side is that if you're a web page author accustomed to HTTP then everything looks like a HTTP and if you like HTML then a ridiculous edifice like XML to encapsulate a half dozen lines of text might even seem like a good idea.
The way Message-IDs are checked on the news server means that the server should be set up to retain messages for at least as long as the feed retains items, or as long as the rss_newest_only option you select for the feed. If that's not so then old articles will be re-posted by the next fetch_rss() and will look like new articles to a newsreader. (Letting the news server track articles keeps down the amount of state rss2leafnode must maintain and means multiple users can insert a feed without duplication.)
No retries are attempted if a news server disconnects, at least not unless posting to a different news server then coming back. Not sure if that's good or bad, but the current repeated error messages for a disconnect are unattractive. The intention for the future is to attempt a reconnect.
Some pre-releases of leafnode 2 might have trouble posting to local newsgroups while a fetchnews run is in progress. When this happens the local articles don't show up until after a subsequent further fetchnews. Or was this only for the rnews inject?
No attention is paid to <atom:updated> or other changes in an item. Should an updated item be re-posted? Is the Supersedes: header better, to replace the article? Something allowing readers to see or not see updates according to user preference might be good. Currently the item is reposted if <atom:id> changes or if there's no id and the content changes enough to make a new MD5 hash. Is id supposed to stay the same for an update?
The way $rss_get_links only gets the immediate link target could perhaps be extended to fetch images or frame sub-parts etc of a HTML page and include them in the message as RFC 2557 style "MHTML". But do any news readers actually display that?
Perhaps there should be a limit on the size of links to be downloaded. Sometimes podcast links have both a HTML page and a full audio link. If the audio is bigger than some threshold then you might like to download the HTML but not the audio.
The entire XML feed is read into memory, which might be a little too much for large feeds. RSS was conceived as a "site summary" but is used for bigger content too. Twig has a partial-tree parse for one item at a time, though applying the rss_newest_only option would require a first pass to choose items. A progressive parse might help show the first few items if there's a fatal syntax error or truncation part-way through. Some care would be needed so that small changes by the automated charset recoding or by XML::Liberal don't cause duplicated posts.
NNTPSERVER
NEWSHOST
Default news server as per Net::NNTP. If unset then localhost is used.
~/.rss2leafnode.conf
Configuration file.
~/.rss2leafnode.status
Status file, recording "last modified" dates for downloads. This can be deleted if something bad seems to have happened to it; the next rss2leafnode run will recreate it.
Defaults per Net::NNTP and Net::Config.
leafnode(8), HTML::FormatText, HTML::FormatText::WithLinks, HTML::FormatExternal, lynx(1), URI::Title, XML::Parser, XML::Liberal, Image::Magick, Net::NNTP, Net::Config
Plagger, feed2imap(1), rss2email(1), rssdrop(1), toursst(1), http://www.gwene.org
http://user42.tuxfamily.org/rss2leafnode/index.html
Copyright 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2017 Kevin Ryde
RSS2Leafnode is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
RSS2Leafnode is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with RSS2Leafnode. If not, see http://www.gnu.org/licenses/.