
NAME

Test::Pod::LinkCheck::Lite - Test POD links

SYNOPSIS

use Test::More 0.88;   # for done_testing();
use Test::Pod::LinkCheck::Lite;

my $t = Test::Pod::LinkCheck::Lite->new();
$t->all_pod_files_ok();

done_testing;

DESCRIPTION

This Perl module tests POD links. A given file generates one failure for each broken link found. If no broken links are found, one passing test is generated. This means there is no way to know in advance how many tests will be generated, so you will need to call Test::More's done_testing() (or something equivalent) at the end of your test.

By its nature this module should be used only for author testing. The problem with using it in an installation test is that the validity of links external to the distribution being tested varies with things like operating system type and version, Perl version, installed Perl modules and their versions, and the Internet at large. Caveat user.

This module should probably be considered alpha-quality code at this point. It checks most of my modest corpus (correctly, I hope), but beyond that deponent sayeth not.

One thing perlpod is silent on (at least, I could not find anything about it) is how (or even whether) to normalize links and section names. Maybe I looked in the wrong place?

Anyhow, because Meta CPAN has been observed to link

L<SOME
SECTION>

to =head1 SOME SECTION, this module normalizes both link and section names by removing leading and trailing white space, and replacing embedded white space with a single space. Yes, I know that Meta CPAN's observed handling of POD is far from being definitive.
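
For illustration, the normalization just described can be sketched as a small Perl function. The name normalize_section() is hypothetical and not part of this module's interface. It only demonstrates the trim-and-collapse rule.

```perl
use strict;
use warnings;

# Hypothetical helper (not this module's API): normalize a link or section
# name by trimming leading and trailing white space and collapsing each run
# of embedded white space (newlines included) to a single space.
sub normalize_section {
    my ( $name ) = @_;
    $name =~ s/ \A \s+ //smx;       # leading white space
    $name =~ s/ \s+ \z //smx;       # trailing white space
    $name =~ s/ \s+ / /smxg;        # embedded runs become one space
    return $name;
}

print normalize_section( "  SOME\nSECTION " ), "\n";    # prints "SOME SECTION"
```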

This module started its life as a low-dependency version of Test::Pod::LinkCheck. Significant differences from that module include:

Minimal use of the shell

This module shells out only to check man links.

Unchecked links are explicitly skipped

That is, a skipped test is generated for each. Note that Test::Pod::LinkCheck appears to fail the link in at least some such cases.

URL links are checked

This seemed to be an easy enough addition.

Dependencies are minimized

Given at least Perl 5.13.9, the only non-core module used is B::Keywords.

POD links come in the following flavors:

  • man

    These links are of the form L<manpage (section)>. They will only be checked if the man attribute is true, and can only be successfully checked if the man command actually displays man pages, and man -w can be executed.

  • url

    These links are of the form L<http://...> (or https: or whatever). They will only be checked if the check_url attribute is true, and can only be successfully checked if Perl has access to the specified URL.

    NOTE that https: links can only be checked if IO::Socket::SSL version 1.42 (at least) and Net::SSLeay version 1.49 (at least) are installed. These are NOT prerequisites of Test::Pod::LinkCheck::Lite because they are not in core, and I am trying to keep non-core dependencies to a minimum. If these modules are not present, an attempt to check an https: link will result in a skipped test. In addition, a diagnostic will be issued for the first https: link skipped by the test object.

  • pod (internal)

    These links are of the form L<text|/section>. They are checked using the parse tree in which the link was found.

  • pod (external)

    This is pretty much everything else. There are a number of cases, and the only way to distinguish them is to run through them.

    Perl built-ins

    These links are of the form L<text|builtin> or L<builtin>, and are checked against the lists in B::Keywords.

    Installed modules and pod files

    These are resolved to a file using Pod::Perldoc. If a section was specified, the file is parsed to determine whether the section name is valid.

    Uninstalled modules

    These are checked against modules/02packages.details.txt.gz, provided that (or some reasonable facsimile) can be found. Currently we can look for this information in the following places:

    File Metadata in the directory used by the CPAN client;
    Website https://cpanmetadb.plackperl.org/, a.k.a. the CPAN Meta DB.

    If more than one of these is configured (by default they all are), we look in the newest one.

    Sections can not be checked. If a link to a valid (but uninstalled) module has a section, a skipped test is generated.
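
The cases above suggest a simple dispatch on the link target. The following is a simplified, hypothetical sketch: classify_link() is not part of this module, and real POD parsers such as Pod::Simple handle many more edge cases. Built-ins, installed modules, and uninstalled modules all land in 'pod_external' here. Telling them apart requires the lookups described above.

```perl
use strict;
use warnings;

# Hypothetical, simplified classifier for the target of an L<...> link.
# It is not this module's code; real POD parsers handle many more cases.
sub classify_link {
    my ( $link ) = @_;

    # Drop any "text|" prefix (assumes the text itself contains no '|').
    $link =~ s/ \A [^|]* \| //smx;

    # scheme:... is a URL, but Foo::Bar is not.
    return 'url' if $link =~ m/ \A [[:alpha:]] [[:alnum:]+.-]* : (?! : ) /smx;

    # /section or "section" refers to the current document.
    return 'pod_internal' if $link =~ m{ \A [/"] }smx;

    # manpage(section) or manpage (section)
    return 'man' if $link =~ m{ \A [^\s(/]+ \s* \( \w+ \) \z }smx;

    # Built-ins, and installed or uninstalled modules and pod files.
    return 'pod_external';
}

foreach my $link ( 'http://example.com/', 'Some text|/DESCRIPTION',
        'crontab(5)', 'perlfunc/open' ) {
    printf "%-25s %s\n", $link, classify_link( $link );
}
```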

The ::Lite refers to the fact that a real effort has been made to reduce non-core dependencies. Under Perl 5.14 and up, the only known non-core dependency is B::Keywords.

An effort has also been made to minimize the spawning of system commands.

METHODS

This class supports the following public methods:

new

my $t = Test::Pod::LinkCheck::Lite->new();

This static method instantiates an object. Optional arguments are passed as name/value pairs.

The following arguments are supported:

add_dir

This argument is the name of a directory to search for extra POD, or a reference to an array of such directories. Directories that do not actually exist will be eliminated.

The default is blib/script if it exists, because otherwise links from modules to scripts will not be resolved.

agent

This argument is the user agent string to use for web access.

Note that this probably should have been called something more verbose like user_agent_string, but I was influenced by the name used by HTTP::Tiny, and did not anticipate the need for the interface to be able to specify the actual user agent.

The default is undef, which specifies whatever the actual user agent's agent() method returns.

allow_man_spaces

This Boolean argument is set true to allow internal spaces in a 'man' link. Note that such links cannot be checked under some operating systems (e.g. FreeBSD) because the man(1) program splits its arguments on spaces.

The default is false.

cache_url_response

This Boolean argument is set true to cache the responses from URL links. This means each URL is queried only once, no matter how many times it appears.

This is an in-memory cache, and persists only for the life of the Test::Pod::LinkCheck::Lite object.

The default is true.
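
A per-object, in-memory cache of this kind can be as simple as a hash keyed by URL. The sketch below is hypothetical: My::UrlCache and its fetch callback are illustrative names, not this module's internals. Because the cache lives in the object, it goes away when the object does.

```perl
use strict;
use warnings;

# Hypothetical sketch of a per-object, in-memory URL response cache.
# The fetch callback stands in for whatever actually queries the URL.
{
    package My::UrlCache;

    sub new {
        my ( $class, $fetch ) = @_;
        return bless { fetch => $fetch, cache => {} }, $class;
    }

    sub response {
        my ( $self, $url ) = @_;
        # Query each URL at most once for the life of this object.
        $self->{cache}{$url} //= $self->{fetch}->( $url );
        return $self->{cache}{$url};
    }
}
```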

check_external_sections

This Boolean argument is true if the sections of links outside the current Pod are to be checked. If it is false, such sections are not checked, and the link is considered valid if the external Pod exists at all.

The default is true.

check_url

This Boolean argument is true if url links are to be checked, and false if not.

The default is true.

ignore_url

This argument specifies one or more URLs to ignore when checking url links. It can be specified as:

A Regexp object

Any URL that matches this Regexp is ignored.

undef

No URLs are ignored.

a scalar

This URL is ignored.

a SCALAR reference

The URL referred to is ignored.

a HASH reference

The URL is ignored if the hash contains a true value for the URL.

a CODE reference

The code is called with the URL being checked in the topic variable (a.k.a. $_). The URL is ignored if the code returns a true value.

an ARRAY reference

The array can contain any legal ignore specification, and any URL that matches any value in the array is ignored. Nested arrays are flattened.

The default is [].

Note that the order in which the individual checks are made is undefined. OK, the implementation is deterministic, but the order of evaluation is an implementation detail that the author reserves the right to change without warning.
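
One way to picture these rules is to compile the specification into a single predicate. This is a hypothetical sketch (make_ignore_predicate() is not part of the module), and, per the note above, it makes no promise about the order in which checks are applied.

```perl
use strict;
use warnings;

# Hypothetical sketch: compile an ignore_url-style specification into one
# predicate. Not the module's own code; check order is not significant.
sub make_ignore_predicate {
    my ( $spec ) = @_;
    my @checks;
    my @queue = ( $spec );
    while ( @queue ) {
        my $item = shift @queue;
        if ( ! defined $item ) {
            next;                                   # undef ignores nothing
        } elsif ( ref $item eq 'Regexp' ) {
            push @checks, sub { $_[0] =~ $item };
        } elsif ( ! ref $item ) {
            push @checks, sub { $_[0] eq $item };   # plain URL
        } elsif ( ref $item eq 'SCALAR' ) {
            push @checks, sub { $_[0] eq ${ $item } };
        } elsif ( ref $item eq 'HASH' ) {
            push @checks, sub { $item->{ $_[0] } };
        } elsif ( ref $item eq 'CODE' ) {
            # The URL is passed to the code in the topic variable $_.
            push @checks, sub { local $_ = $_[0]; return $item->() };
        } elsif ( ref $item eq 'ARRAY' ) {
            push @queue, @{ $item };                # flattens nested arrays
        } else {
            die "Unsupported ignore_url specification: $item\n";
        }
    }
    return sub {
        my ( $url ) = @_;
        foreach my $check ( @checks ) {
            return 1 if $check->( $url );
        }
        return 0;
    };
}
```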

man

This Boolean argument is true if man links are to be checked, and false if not.

The default is false (with a diagnostic) if $^O is 'DOS' or 'MSWin32'. Under any other operating system the default is the value of IPC::Cmd::can_run( 'man' ). If this returns false a diagnostic is generated, and man links are not checked.

In case you're wondering: the Windows testing was done under ReactOS, and that appears to come with a MAN.EXE which (at least under 0.4.11) causes can_run() to return true, but which does, as far as I can tell, nothing useful.
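
The default described above, together with the man -w check mentioned under the man link flavor, might be sketched as follows. The helper names are hypothetical, and the exact man -w invocation (section before page name) is an assumption that holds for common man implementations but not necessarily for all of them.

```perl
use strict;
use warnings;
use IPC::Cmd qw{ can_run run };

# Hypothetical helper mirroring the documented default of the 'man'
# attribute: false under DOS or MSWin32, otherwise whether man(1) can run.
sub default_man {
    return 0 if $^O eq 'DOS' || $^O eq 'MSWin32';
    return can_run( 'man' ) ? 1 : 0;
}

# Hypothetical: split 'crontab(5)' or 'crontab (5)' into page and section.
sub parse_man_link {
    my ( $link ) = @_;
    my ( $page, $section ) =
        $link =~ m/ \A ( [^\s(]+ ) \s* \( ( \w+ ) \) \z /smx
        or return;
    return ( $page, $section );
}

# Hypothetical check: 'man -w SECTION PAGE' is assumed to succeed only if
# the page exists. IPC::Cmd::run() returns true on success in scalar context.
sub man_page_exists {
    my ( $page, $section ) = @_;
    return scalar run(
        command => [ 'man', '-w', $section, $page ],
        verbose => 0,
    );
}
```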

module_index

This argument specifies a list of module indices to consult, as either a comma-delimited string or an array reference. Even if specified, a given index will be used only if it is actually available. If more than one index is found, the most-recently-updated index will be used. Possible indices are:

cpan

Use the module index found in the CPAN working directory.

cpan_meta_db

Use the CPAN Meta database. Because this is an online index, it is considered to be current, but its as-of time is offset to favor local indices.

By default all indices are considered.
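
Whatever the source, a module index check comes down to asking whether a package name is listed. As a hypothetical illustration, this sketch scans a local copy of 02packages.details.txt.gz. It does not read the CPAN client's Metadata file or query the CPAN Meta DB, both of which use other formats.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw{ $GunzipError };

# Hypothetical sketch: is $module listed in a local copy of
# 02packages.details.txt.gz? The file is a header block, a blank line,
# then one "Module::Name  version  path/to/dist" record per line.
sub module_in_packages_index {
    my ( $path, $module ) = @_;
    my $z = IO::Uncompress::Gunzip->new( $path )
        or die "Unable to open $path: $GunzipError\n";

    # Skip the header, which ends at the first blank line.
    while ( defined( my $line = $z->getline() ) ) {
        last if $line =~ m/ \A \s* \z /smx;
    }

    # The module name is the first white-space-delimited field.
    while ( defined( my $line = $z->getline() ) ) {
        my ( $name ) = split ' ', $line;
        return 1 if defined $name && $name eq $module;
    }
    return 0;
}
```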

prohibit_redirect

Added in version 0.004.

This argument controls whether redirects are allowed in the resolution of a URL link.

If a code reference is specified, it is called whenever a URL link is successfully resolved. The arguments are the Test::Pod::LinkCheck::Lite object, the HTTP::Tiny response hash, and the URL from the link. The code returns true to declare the link in error, false to allow it, or a code reference to defer the decision to that code. This last option is provided because I found a case where I wanted to do a little pre-processing and then defer to ALLOW_REDIRECT_TO_INDEX, but could not find a clean way to use a manifest constant in a goto.

Any other value is interpreted as a Boolean. If the argument is true, any redirect is an error. If false, redirects are allowed.

This argument is ignored unless check_url is true.

The default is false, for historical reasons.
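
As an example of the code reference form, here is a hypothetical callback that allows redirects only when they stay on the same host. It relies on two documented fields of the HTTP::Tiny response hash: url (the URL that finally answered) and redirects (present only if redirects were followed).

```perl
use strict;
use warnings;

# Hypothetical prohibit_redirect callback: allow a redirect only if it stays
# on the same host. Per the documented calling convention it receives the
# test object, the HTTP::Tiny response hash, and the URL from the link, and
# returns true to declare the link in error.
sub same_host_only {
    my ( undef, $resp, $url ) = @_;
    # HTTP::Tiny sets 'redirects' only if redirects were followed.
    return 0 unless $resp->{redirects} && @{ $resp->{redirects} };
    my ( $from ) = $url =~ m{ \A [^:]+ :// ( [^/?\#]+ ) }smx;
    my ( $to )   = $resp->{url} =~ m{ \A [^:]+ :// ( [^/?\#]+ ) }smx;
    return 1 unless defined $from && defined $to;
    return lc $from eq lc $to ? 0 : 1;
}
```

It would be passed as Test::Pod::LinkCheck::Lite->new( prohibit_redirect => \&same_host_only ). A callback may also return a code reference, such as ALLOW_REDIRECT_TO_INDEX, to defer the decision.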

require_installed

This Boolean argument is true to disable the uninstalled module checks. This means links to modules not installed on the system will fail, even if the module exists.

By default this is false.

skip_server_errors

Added in version 0.002.

This Boolean argument is true to generate skips rather than failures if an attempt to check a URL link fails with a server error (status 5xx).

By default this is true; it can be made false by passing value 0 or ''.

The default represents a change from the behaviour of version 0.001, which failed a URL link if the check returned a server error. The logic (if any) behind the change is that 5xx errors can represent actual server problems rather than errors in the link being checked, so skipping them eliminates possible false positives.
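
The skip-versus-fail decision can be sketched as a mapping from an HTTP::Tiny response hash to a test outcome. url_outcome() is hypothetical. Note that HTTP::Tiny reports its own internal errors (for example, a failed connection) as status 599, so in this sketch such failures are also skipped when the flag is true; the sketch makes no claim that the module behaves the same way.

```perl
use strict;
use warnings;

# Hypothetical: turn an HTTP::Tiny response hash into a test outcome,
# honoring a skip_server_errors-style flag. Not the module's own code.
sub url_outcome {
    my ( $resp, $skip_server_errors ) = @_;
    return 'pass' if $resp->{success};
    # Any 5xx status, including HTTP::Tiny's internal-error status 599.
    return 'skip'
        if $skip_server_errors && $resp->{status} =~ m/ \A 5 \d\d \z /smx;
    return 'fail';
}
```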

user_agent

Added in version 0.011.

This argument is either a class name or an object. Either way, it must be a subclass of HTTP::Tiny.

If a class name is passed, the class must already be loaded. An object of that class will be instantiated by calling its new() method (with the agent argument, if that was specified).
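
A hypothetical subclass suitable for the class-name form might look like this. My::LinkCheck::UA and its ten-second default timeout are illustrative only. Because the class must already be loaded, it is defined (or loaded with use) before the test object is created.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical HTTP::Tiny subclass that defaults to a shorter timeout.
{
    package My::LinkCheck::UA;
    use parent -norequire, 'HTTP::Tiny';

    sub new {
        my ( $class, %arg ) = @_;
        $arg{timeout} //= 10;    # seconds, unless the caller says otherwise
        return $class->SUPER::new( %arg );
    }
}
```

It could then be passed as Test::Pod::LinkCheck::Lite->new( user_agent => 'My::LinkCheck::UA', agent => 'probe/1.0' ).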

agent

This method returns the value of the 'agent' attribute.

all_pod_files_ok

$t->all_pod_files_ok();

This method takes as its arguments the names of one or more files, and tests those that are deemed to be Perl files. Directories are searched recursively.

Perl files are considered to be all text files whose names end in .pod, .pm, or .PL, plus any text files with a shebang line containing 'perl'. File name suffixes are case-sensitive except for .PL.

If no arguments are specified, the contents of blib/ are tested. This is the recommended usage.

If called in scalar context, this method returns the number of test failures encountered. If called in list context, it returns the number of failures, passes, and skipped tests, in that order.
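
The file-selection rule above can be restated as a sketch using the core File::Find module. Both functions are hypothetical and are not the module's implementation; they only encode the documented suffix and shebang rules and the blib/ default.

```perl
use strict;
use warnings;
use File::Find qw{ find };

# Hypothetical restatement of the documented rule: text files ending in
# .pod or .pm (case-sensitive) or .PL (any case), plus text files whose
# first line is a shebang mentioning perl.
sub is_perl_file {
    my ( $path ) = @_;
    -f $path && -T _
        or return 0;
    return 1 if $path =~ m/ [.] (?: pod | pm ) \z /smx;
    return 1 if $path =~ m/ [.] PL \z /smxi;
    open my $fh, '<', $path
        or return 0;
    my $first = <$fh>;
    close $fh;
    return ( defined $first && $first =~ m/ \A \#! .* perl /smx ) ? 1 : 0;
}

# Hypothetical: expand arguments (default blib/) into a sorted file list,
# recursing into directories.
sub find_perl_files {
    my @roots = @_ ? @_ : ( 'blib' );
    my @found;
    foreach my $root ( @roots ) {
        if ( -d $root ) {
            find( {
                    no_chdir => 1,
                    wanted   => sub {
                        push @found, $File::Find::name
                            if is_perl_file( $File::Find::name );
                    },
                }, $root );
        } elsif ( is_perl_file( $root ) ) {
            push @found, $root;
        }
    }
    return sort @found;
}
```

With the module itself, the list-context return can be captured directly: my ( $fail, $pass, $skip ) = $t->all_pod_files_ok();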

allow_man_spaces

$t->allow_man_spaces()
  and say 'Embedded spaces are allowed in man page names';

This method returns the value of the 'allow_man_spaces' attribute.

cache_url_response

$t->cache_url_response()
  and say 'URL responses are cached';

This method returns the value of the 'cache_url_response' attribute.

can_ssl

Added in version 0.012.

This convenience method wraps the user agent's method of the same name. See can_ssl() in the user agent's documentation for what is returned. The user agent is HTTP::Tiny by default.
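
When the user agent is the default HTTP::Tiny, the same information is available directly from HTTP::Tiny's class method can_ssl(), which in list context returns a success flag and, on failure, a reason (typically a missing or outdated IO::Socket::SSL or Net::SSLeay). A short sketch:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Ask HTTP::Tiny whether https: requests are possible in this Perl.
my ( $ok, $why ) = HTTP::Tiny->can_ssl();

my $message = $ok
    ? 'https: links can be checked'
    : "https: links would be skipped: $why";
chomp $message;    # the reason may or may not end in a newline
print "$message\n";
```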

check_external_sections

$t->check_external_sections()
    and say 'Sections in external links are checked';

This method returns the value of the 'check_external_sections' attribute.

check_url

$t->check_url() and say 'URL links are checked';

This method returns the value of the 'check_url' attribute.

configuration

say $t->configuration( '    ' );

This convenience method returns a string containing all attributes of the object in human-readable form. The argument, if any, is prefixed to each line of the returned string.

ignore_url

print 'Ignored URLs ', join ', ', $t->ignore_url();

This method returns the value of the 'ignore_url' attribute. If called in scalar context, it returns an array reference. If called in list context, it returns a list. Either way, the results will not be in the same order as originally specified to new().

man

$t->man() and say 'man links are checked';

This method returns the value of the 'man' attribute.

module_index

say 'Module indices: ', join ', ', $t->module_index();

This method returns the value of the 'module_index' attribute. If called in scalar context it returns a comma-delimited string.

pod_file_ok

my $failures = $t->pod_file_ok( 'lib/Foo/Bar.pm' );

This method tests the links in the given file. Each failure appears in the TAP output as a test failure. If no failures are found, a passing test will appear in the TAP output.

If called in scalar context, this method returns the number of test failures encountered. If called in list context, it returns the number of failures, passes, and skipped tests, in that order.

prohibit_redirect

$t->prohibit_redirect()
    and say 'All URL links must resolve without redirection';

Added in version 0.004.

This method returns the value of the 'prohibit_redirect' attribute.

require_installed

$t->require_installed()
   and say 'All POD links must be to installed modules';

This method returns the value of the 'require_installed' attribute.

skip_server_errors

$t->skip_server_errors()
   and say 'URL links that return status 5xx are skipped';

Added in version 0.002.

This method returns the value of the 'skip_server_errors' attribute.

MANIFEST CONSTANTS

The following manifest constants can be imported by name, or using the :const tag:

ALLOW_REDIRECT_TO_INDEX

Added in version 0.003.

This manifest constant is intended to be used as a value of the 'prohibit_redirect' attribute. It is a reference to a piece of code that accepts old-style redirects of an hierarchical URL ending in a '/' to an index of that leaf of the hierarchy.

Because this is a minimal-dependency module, the code referred to by this constant works by hand-checking for an hierarchical scheme (anything but 'data:', 'mailto:', or 'urn:'). If a URL with an hierarchical scheme ends in '/', the URL in the response has everything after the last '/' removed before comparison to the original URL.

This mess exists because of my bias that old-style redirection to an index is a different beast than redirection in general, and ought to be allowed. If you disagree, you can ignore this functionality, or re-implement it to suit yourself.
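
The rule described above can be expressed as a prohibit_redirect callback. The following is a hypothetical re-implementation for illustration only; the real constant may differ in detail. Like any such callback, it returns true to declare the link in error and false to allow it.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the behaviour described for
# ALLOW_REDIRECT_TO_INDEX; not the module's own code.
sub allow_redirect_to_index_sketch {
    my ( undef, $resp, $url ) = @_;
    return 0 if $resp->{url} eq $url;                # not redirected
    my ( $scheme ) = $url =~ m/ \A ( [^:]+ ) : /smx;
    # Only hierarchical schemes (anything but data:, mailto:, urn:) qualify.
    return 1 if ! defined $scheme
        || $scheme =~ m/ \A (?: data | mailto | urn ) \z /smxi;
    return 1 unless $url =~ m{ / \z }smx;            # must end in '/'
    # Remove everything after the last '/' in the response URL.
    ( my $base = $resp->{url} ) =~ s{ [^/]* \z }{}smx;
    return $base eq $url ? 0 : 1;
}
```

To use the real constant, import it (use Test::Pod::LinkCheck::Lite qw{ :const };) and pass it directly: Test::Pod::LinkCheck::Lite->new( prohibit_redirect => ALLOW_REDIRECT_TO_INDEX ).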

MAYBE_IGNORE_GITHUB

Added in version 0.009.

This manifest constant is intended to be used as a value of the 'ignore_url' attribute. It is a reference to a piece of code that ignores GitHub URLs unless the directory specified by environment variable GIT_DIR (default: .git/) exists, and GitHub is a remote for the repository.

This is (maybe) a convenience for developers whose boilerplate includes GitHub links but who have not yet uploaded to GitHub.
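
A hypothetical sketch of this behaviour follows. It reads $GIT_DIR/config directly instead of running git, and treats any remote URL mentioning github.com as a GitHub remote; both are simplifying assumptions, not a description of what the module does. The returned code reference follows the ignore_url convention of receiving the URL in $_.

```perl
use strict;
use warnings;

# Hypothetical sketch of the behaviour described for MAYBE_IGNORE_GITHUB;
# not the module's own code. It reads the repository's config file
# directly rather than running git.
sub github_is_remote {
    my $git_dir = defined $ENV{GIT_DIR} ? $ENV{GIT_DIR} : '.git';
    -d $git_dir
        or return 0;
    open my $fh, '<', "$git_dir/config"
        or return 0;
    while ( my $line = <$fh> ) {
        return 1 if $line =~ m/ \A \s* url \s* = .* github[.]com /smx;
    }
    return 0;
}

# Build an ignore_url CODE reference: the URL arrives in $_, and a true
# return means "ignore this URL".
sub make_maybe_ignore_github {
    my $github_is_remote = github_is_remote();
    return sub {
        return 0 if $github_is_remote;
        return m{ \A https?:// (?: www[.] )? github[.]com (?: / | \z ) }smxi
            ? 1 : 0;
    };
}
```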

SEE ALSO

Test::Pod::LinkCheck by Apocalypse (APOCAL) checks all POD links except for URLs. It is Moose-based.

Test::Pod::Links by Sven Kirmess (SKIRMESS) checks all URLs or URL-like things in the document, whether or not they are actual POD links.

Test::Pod::No404s by Apocalypse (APOCAL) checks URL POD links.

ACKNOWLEDGMENTS

The author would like to acknowledge the following, without whom this module would not exist -- at least, not in anything like its current form.

Mohammed Anwar (MANWAR) who submitted the "broken POD link" ticket that started me thinking about testing for this kind of thing.

The CPAN Testers who, by testing my code under such a broad range of configurations, gave me an opportunity to make this module much more robust than it would otherwise have been. It is probably unfair to single out individual testers, but as the luck of the testing cycle would have it, results from Andreas J. König (ANDK), Slaven Rezić (SREZIC), Chris Williams (BINGOS), and Alceu Rodrigues de Freitas Junior were particularly useful to me.

SUPPORT

Support is by the author. Please file bug reports at https://rt.cpan.org/Public/Dist/Display.html?Name=Test-Pod-LinkCheck-Lite, https://github.com/trwyant/perl-Test-Pod-LinkCheck-Lite/issues, or in electronic mail to the author.

AUTHOR

Thomas R. Wyant, III wyant at cpan dot org

COPYRIGHT AND LICENSE

Copyright (C) 2019-2024 by Thomas R. Wyant, III

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.