
NAME

distlinks -- check URL links, with database cache

SYNOPSIS

 distlinks [--options] filename-or-dirname...

DESCRIPTION

Distlinks checks URLs found in files or a directory tree of files. An SQLite database caches the results so that links are not rechecked on every program run. It's a bit rough but is good for checking everything in a software distribution or similar.
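The caching idea can be illustrated with a small sketch. This is Python rather than the program's own Perl, and it is not the actual distlinks schema: the table name, the columns, the max_age expiry and the checker callback are all assumptions made for illustration.

```python
import sqlite3
import time

def open_cache(path=":memory:"):
    """Open (or create) a cache database. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checked ("
        " url TEXT PRIMARY KEY,"
        " ok INTEGER NOT NULL,"
        " checked_at REAL NOT NULL)"
    )
    return conn

def check_cached(conn, url, checker, max_age=7 * 24 * 3600, now=None):
    """Return True/False for url, calling checker(url) only on a cache miss.

    max_age (seconds) is an assumed expiry policy, not something the
    distlinks documentation specifies.
    """
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT ok, checked_at FROM checked WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and now - row[1] < max_age:
        return bool(row[0])  # fresh cached result, no network access
    ok = bool(checker(url))
    conn.execute(
        "INSERT OR REPLACE INTO checked (url, ok, checked_at) VALUES (?, ?, ?)",
        (url, int(ok), now),
    )
    conn.commit()
    return ok
```

Passing a file name instead of ":memory:" to open_cache() keeps the results between runs, which is the point of the database.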

Various file types are recognised and read appropriately so that their text parts can be searched for URLs.

  • .gz and .bz2 files compressed with gzip or bzip2.

  • .tar and .tar.gz Unix tar archives.

  • .zip archives.

  • Text with UTF-16 or UTF-32 byte-order marker.

  • Image files per Image::ExifTool, so the text parts of PNG, JPEG, etc.

  • .mo message catalogue per gettext.

  • Executables (ELF, MS-DOS, etc) as identified by File::Type are skipped.
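As an illustration of one item in the list above, text with a UTF-16 or UTF-32 byte-order marker can be recognised and decoded along the following lines. This is a Python sketch only, not the distlinks code, and the function name decode_with_bom is hypothetical.

```python
import codecs

# Longer marks first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_with_bom(data):
    """Decode bytes per a leading UTF-16/UTF-32 BOM, or return None if absent."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the marker, then decode the rest leniently.
            return data[len(bom):].decode(encoding, errors="replace")
    return None
```

A None return means no marker was found and the caller would fall back to treating the file some other way.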

URLs are distilled from text with free-form matching so they can be in plain text, program code, etc. The following specific forms are recognised,

  • Angles <http://foo.com> and <URL:http://foo.com> as sometimes recommended for mail messages etc.

  • Quotes `http://foo.com' per Emacs docstrings.

  • Bare foo.com/index.html taken to be http:.

  • Texinfo @url{http://foo.com}.

  • HTML href="foo.html", interpreted relative to a <base> or the file itself.

  • Skip variables $FOO in URLs, taken to be program code etc.
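Free-form matching of this kind might look roughly like the following Python sketch. The regular expression, the list of schemes and the name extract_urls are assumptions for illustration and are certainly simpler than what distlinks does. The HTML href form with <base> handling is not covered, and a URL containing a $ variable is dropped entirely, which is one possible reading of the last item above.

```python
import re

# A URL stops at whitespace or at the delimiters used by the angle,
# quote and Texinfo forms.
_URL = re.compile(
    r"(?P<full>\b(?:https?|ftp|news|rsync):[^\s<>'\"`{}]+)"
    r"|(?<![\w./:@-])(?P<bare>(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/[^\s<>'\"`{}]*)"
)

def extract_urls(text):
    """Return URLs found in free text, in order of appearance."""
    urls = []
    for m in _URL.finditer(text):
        # A bare host/path is taken to be http:
        url = m.group("full") or "http://" + m.group("bare")
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if "$" in url:
            continue  # contains a variable, presumably program code
        urls.append(url)
    return urls
```

For example extract_urls("see <URL:http://foo.com> or foo.com/index.html.") gives the two URLs http://foo.com and http://foo.com/index.html.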

COMMAND-LINE OPTIONS

The command line options are

-V
--verbose
--verbose=N

Print some diagnostics about what's being done. With --verbose=2 or --verbose=3 print some technical details too. Eg.

    distlinks --verbose

--version

Print the distlinks program version number. With --verbose=2 also print version numbers of some modules used.

CHECKING

news

Newsgroup references like "news:some.group.name" are checked by asking the news server whether the group exists. The default server is per Net::NNTP, which means an NNTPSERVER or NEWSHOST environment variable, or a Net::Config setup. For convenience distlinks tries "localhost" if none of those are set.
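The order in which a server is chosen can be sketched as follows. This is Python for illustration, not the distlinks or Net::NNTP code. The function name is hypothetical, and config_hosts merely stands in for a Net::Config nntp_hosts list since the sketch does not read libnet.cfg.

```python
import os

def default_news_server(environ=None, config_hosts=()):
    """Pick a news server host in the order the text describes."""
    environ = os.environ if environ is None else environ
    # Environment variables first, NNTPSERVER ahead of NEWSHOST.
    for name in ("NNTPSERVER", "NEWSHOST"):
        host = environ.get(name)
        if host:
            return host
    # Then a configured host list (stand-in for Net::Config nntp_hosts).
    if config_hosts:
        return config_hosts[0]
    # Finally the convenience fallback.
    return "localhost"
```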

rsync

Rsync URLs like rsync://hostname/module/path/foo.txt are checked with LWP::Protocol::rsync if you have such a module, or otherwise a builtin protocol module which runs the rsync program and does enough for distlinks.
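Checking by running the rsync program could be done roughly as in this Python sketch. How the builtin protocol module actually invokes rsync is not documented here, so the use of --list-only, the exit status test and both function names are assumptions.

```python
import subprocess
from urllib.parse import urlsplit

def split_rsync_url(url):
    """Split rsync://hostname/module/path into (hostname, module, path)."""
    parts = urlsplit(url)
    if parts.scheme != "rsync" or not parts.hostname:
        raise ValueError("not an rsync URL: %r" % url)
    module, _, path = parts.path.lstrip("/").partition("/")
    return parts.hostname, module, path

def rsync_url_exists(url, timeout=60):
    """Ask rsync to list the URL and treat exit status 0 as success.

    Raises FileNotFoundError if no rsync program is installed.
    """
    split_rsync_url(url)  # validate the URL before running anything
    result = subprocess.run(
        ["rsync", "--list-only", url],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode == 0
```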

ENVIRONMENT VARIABLES

NNTPSERVER
NEWSHOST

News server host name or IP number.

TMPDIR

Temporary directory for untarring archives etc, per File::Temp and File::Spec.

FILES

~/.distlinks.sqdb

SQLite database of information kept about checked URLs.

/etc/libnet.cfg
/etc/perl/Net/libnet.cfg

Net::Config configuration, for news server.

BUGS

A .tar or similar archive is extracted into a directory under /tmp so that actual files can be reported on, but those temporary directories are never deleted.

SEE ALSO

Net::Config

chklinks

HOME PAGE

http://user42.tuxfamily.org/distlinks/index.html

LICENSE

Copyright 2009, 2010 Kevin Ryde

Distlinks is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.

Distlinks is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Distlinks. If not, see http://www.gnu.org/licenses/.