test-link - test links and update the link database
test-link [arguments]

  -V --version                    Give version information for this program
  -h --help --usage               Describe usage of this program.
  --help-opt=OPTION               Give help information for a given option
  -v --verbose[=VERBOSITY]        Give information about what the program is doing. Set value to control what information is given.
  --quite -q --silent             Program should generate no output except in case of error.
  --config-file=FILENAME          Load in an additional configuration file
  -u --user-address=STRING        Email address for user running link testing.
  -H --halt-time=MINUTES          stop after given number of minutes
  --never-stop                    keep running without stopping
  --no-robot                      Don't follow robot rules. Dangerous!!!
  -w --no-waitre=NETLOC-REGEX     Home HOST regex: no robot rules.. (danger?)!!!
  --test-now                      Test links now not when scheduled (testing only)
  --untested                      Test all links which have not been tested.
  --sequential                    Put links into schedule in order tested (for testing)
  -L --latest-time=MINUTES        latest time from schedule to stop
  -m --max-links=INTEGER          Maximum number of links to test (-1=no limit)
This program tests links and stores what it found in the link database.

* link database
* schedule database
Configuration is done using the WWW::Link_Controller::ReadConf (3) module.
You may want to explicitly set the user name.
This program is designed to be a well-behaved netizen. That means that it will try not to put a lot of load on a single site. However, the program also attempts to work efficiently through all of the links it has to check.
In order to achieve these goals, test-link will wait for a delay period between checks to the same site, but it will try to re-order its work so that it always has some link to check. It looks ahead up to 100 links.
Making this queue longer will probably not help with efficiency, since an overload is probably a sign that you have many links to the same site. If that site is your own, or you can come to an arrangement with its owners, then you could use a regular expression to allow faster checking (see the --no-waitre option).
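The look-ahead behaviour described above can be illustrated with a minimal Python sketch. This is not test-link's actual code (test-link is written in Perl). The names `pick_next`, `last_hit`, `delay` and `no_wait_re` are hypothetical, and the exemption regex is only loosely modelled on the --no-waitre option.

```python
import re
from urllib.parse import urlsplit

LOOK_AHEAD = 100  # the text says test-link looks ahead up to 100 links


def pick_next(queue, last_hit, now, delay, no_wait_re=None):
    """Remove and return the first link within the look-ahead window whose
    host may be contacted at time `now`, or None if every host in the
    window is still inside its delay period.

    queue      -- list of URLs in scheduled order (modified in place)
    last_hit   -- dict mapping host -> time of the last check (hypothetical)
    delay      -- seconds to wait between checks to the same host
    no_wait_re -- optional regex; matching hosts skip the delay (assumption)
    """
    for index, url in enumerate(queue[:LOOK_AHEAD]):
        host = urlsplit(url).netloc
        exempt = no_wait_re is not None and re.search(no_wait_re, host)
        waited = now - last_hit.get(host, float("-inf"))
        if exempt or waited >= delay:
            queue.pop(index)
            last_hit[host] = now
            return url
    return None
```

With a queue of two links to one host followed by one link to another, the second call skips the waiting host and returns the link to the other host. This also shows why a longer window does not help when nearly all queued links point at the same site.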
Most of the scheduling is handled by Schedule::Softtime which provides an `I'll get round to you when I can be bothered' scheduler. We guarantee that we will never schedule a link earlier than min-time (defaults to a day) from now.
The suggested time is created by the link (see WWW::Link for details). We then check that it's at least a certain amount (hard-wired to be one day at present) into the future.
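The minimum-time rule amounts to clamping the suggested time. The sketch below assumes times are plain numbers of seconds. The function name `next_test_time` is hypothetical and is not part of Schedule::Softtime or WWW::Link.

```python
ONE_DAY = 24 * 60 * 60  # seconds; the text says min-time defaults to a day


def next_test_time(suggested, now, min_time=ONE_DAY):
    """Return when to schedule the next test of a link.

    The link's own suggestion is used unless it falls earlier than
    `now + min_time`, in which case the test is pushed back to that point.
    """
    return max(suggested, now + min_time)
```

For example, a link that suggests a retest in one hour would still be scheduled a full day ahead, while a suggestion of three days ahead would be kept unchanged.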
During its operation, test-link can write a log file (to a file given in the $::link_stat_log configuration variable). This can be used to send alerts to the webmaster about newly broken links.
test-link uses a very simple application level lock to protect the links database. If you bypass this locking it could corrupt the database. Only other runs of test-link will follow this locking.
During a run you can run link-report, but there is in principle no guarantee that it works properly at all. However it shouldn't normally do any damage since it has read only access to the database.
Note that the lock is done on the links database filename.
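For illustration only, one common way to build an application-level lock tied to a database filename is an exclusively created lock file. The sketch below is an assumption about the general technique, not a description of how test-link implements its lock. The `.lock` suffix and the name `database_lock` are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def database_lock(db_filename):
    """Hold a simple advisory lock derived from the database filename.

    Only programs that use this same convention are kept out; anything
    that opens the database directly bypasses the lock entirely.
    """
    lock_path = db_filename + ".lock"  # hypothetical naming convention
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"{db_filename} is locked by another run") from None
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        yield lock_path
    finally:
        os.remove(lock_path)
```

A lock of this kind is purely cooperative, which matches the warning above: a program that does not check for the lock can still modify the database.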
Other programs such as build-schedule and link creation programs should not
The locking used in the current design could be considered a bug.
There should be a mechanism for detecting that the computer is not connected to the network at all and aborting the run completely. This would avoid false positive broken links.
There is a problem with redirects. The second request has to wait for the robot rules to permit it after the first. We should allow a number of levels of redirects without waiting... Maybe this is fixed best with a parallel agent.
The LinkController manual, included in the distribution in HTML, info, or PostScript formats.
http://scotclimb.org.uk/software/linkcont/ - the LinkController homepage.