NAME
WWW::Link::Repair::Substitutor - repair links by text substitution
SYNOPSIS
use WWW::Link::Repair::Substitutor;
$dirsubs = WWW::Link::Repair::Substitutor::gen_substitutor
( "http://bounce.bounce.com/frodo/dogo" ,
"http://thing.thong/ding/dong",
1, 0, ); # directory substitution; don't replace subsidiary links
&$dirsubs($line_from_file);
DESCRIPTION
A module for substituting one link in a file for another.
This link repairer works by going through a file line by line and performing a substitution on each line. It always substitutes absolute links, including those that appear within the text of the HTML page. This is useful because it means that things like instructions to people about what to do with URLs will also be corrected.
SUBSTITUTORS
A substitutor is a function which substitutes one URL for another in a string. Typically it would be fed a file a line at a time. It works on its argument directly.
The two urls should be provided in absolute form.
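As a minimal sketch of the idea, the following closure replaces one absolute URL with another in the string it is given. The name make_simple_substitutor is hypothetical and is not part of this module, and the sketch assumes that "works on its argument directly" means the caller's variable is edited in place. It also makes no attempt to check where a URL ends.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Link::Repair::Substitutor:
# returns a closure that replaces $old with $new in its argument.
sub make_simple_substitutor {
    my ($old, $new) = @_;
    my $pattern = quotemeta $old;    # treat the URL as literal text
    return sub {
        # $_[0] aliases the caller's variable, so the edit is visible there.
        my $count = ($_[0] =~ s{$pattern}{$new}g);
        return $count || 0;          # number of replacements made
    };
}

my $subs = make_simple_substitutor(
    "http://bounce.bounce.com/frodo/dogo",
    "http://thing.thong/ding/dong",
);
my $line = 'See <a href="http://bounce.bounce.com/frodo/dogo">this page</a>.';
my $changed = $subs->($line);
print "$changed: $line\n";
```

Quoting the old URL with quotemeta matters because the dots in a URL would otherwise match any character.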
FILE HANDLERS
A file handler goes through files calling a substitutor as needed.
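A file handler of this kind might look roughly like the sketch below. The function name apply_substitutor_to_file, the ".bak" backup suffix and the example URLs are all assumptions made for illustration; the module's own handlers may behave differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Copy qw(copy);

# Hypothetical file handler, for illustration only: applies $substitutor
# (a code reference that edits its argument in place) to every line of $path.
sub apply_substitutor_to_file {
    my ($path, $substitutor, $keep_backup) = @_;
    if ($keep_backup) {
        copy($path, "$path.bak") or die "cannot back up $path: $!";
    }
    open my $in, '<', $path or die "cannot read $path: $!";
    my @lines = <$in>;
    close $in;
    $substitutor->($_) for @lines;    # $_ aliases each element of @lines
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} @lines;
    close $out or die "cannot close $path: $!";
    return scalar @lines;             # number of lines processed
}

# Demonstration on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/index.html";
open my $fh, '>', $file or die "cannot create $file: $!";
print {$fh} qq{<a href="http://old.example/page">old</a>\n};
close $fh;

my $fix = sub { $_[0] =~ s{\Qhttp://old.example/page\E}{http://new.example/page}g };
apply_substitutor_to_file($file, $fix, 1);
```

Reading the whole file into memory keeps the sketch short; a handler for large files would more likely stream from the original to a new file.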
gen_directory_substitutor
Warning: I think the logic around here is more than a little dubious
gen_substitutor
This function was previously an exported interface and currently remains visible. I think its interface is likely to change though. Preferably use gen_file_substitutor as an entry point instead.
This function generates a function which can be called either on a complete line of text from a file or on a URL, and which will update the URL based on the URLs it has been given.
If the third argument is true then the function will return a substitutor which works on all of the links below a given url and substitutes them all together. Thus if we change
http://fred.jim/eating/
to
http://roger.jemima/food/eating-out/
we also change
http://fred.jim/eating/hotels.html
to
http://roger.jemima/food/eating-out/hotels.html
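The effect described above can be sketched with a plain prefix substitution. This is only an illustration of the intended result, using a hypothetical function tree_substitute; it is not how gen_substitutor is necessarily implemented.

```perl
use strict;
use warnings;

# Illustrative only: a tree-mode rewrite of one base URL to another.
# Everything after the old base is carried over unchanged.
my $old_base = "http://fred.jim/eating/";
my $new_base = "http://roger.jemima/food/eating-out/";

sub tree_substitute {
    my ($text) = @_;
    # \Q...\E quotes the dots and slashes so the base matches literally.
    (my $fixed = $text) =~ s{\Q$old_base\E}{$new_base}g;
    return $fixed;
}

print tree_substitute("http://fred.jim/eating/"), "\n";
print tree_substitute("http://fred.jim/eating/hotels.html"), "\n";
```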
This function should handle fragments correctly. This means that we should allow fragments to be substituted to and from normal links, and also that when we change one URL to another, all of the fragments within it should follow. Fragments are not relative links. The cases are:
substitution of fragment for fragment
substitution of link for link
substitution of link to fragment
substitution of fragment to link
substitution of url base for url base with all relative links
Note that right now it isn't possible to substitute a tree under a fragment. There is no such thing as a sub-fragment defined in the standards.
If we substitute a link to a fragment then we should not substitute fragments under that link. That would lose information. Rather we should issue a warning. Maybe there should be an option that lets this happen.
gen_file_substitutor(<original url>, <new url>, [args...])
This function returns a function which will act on a text file or other file which can be treated as a text file and will carry out URL substitutions within it.
The returned code reference should be called with a filename as an argument; it will then replace all occurrences of original url with new url.
Various options can be set by passing key-value pairs in the call.
fakeit - set to create a function which actually does nothing
tree_mode - set to true to also substitute URLs which are "beneath"
    original url
keep_orig - set to false to inhibit creation of backup files
relative - also substitute relative URLs which are equivalent
    to original url (requires file_to_url)
file_to_url - provide a function which can translate a given filename
    to a URL, so we can work out relative URLs for the current
    file.
So a call like
$subs=gen_file_substitutor
("http://www.example.com/friendstuff/old",
"http://www.example.com/friendstuff/new",
relative => 1, tree_mode => 1,
file_to_url =>
sub { my $ret=shift;
$ret =~ s,/var/www/me,http://www.example.com/mystuff,;
return $ret; });
&$subs("/var/www/me/index.html");
&$subs("/var/www/me/friends.html");
should allow you to fix your web pages if your friend renames a whole directory.
BUGS
One problem with directory substitutors is the treatment of the two different URLs
http://fred.jim/eating/
and
http://fred.jim/eating
Most of the time, the latter of the pair is really just a mistaken reference to the former. This is not always true. What is more, where it is true, a user of LinkController will usually have changed to the correct version. For this reason, if gen_directory_substitutor is passed the first form of a URL, it will not substitute the second. If passed the second, it will substitute the first.
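One way such an asymmetry can arise is from plain prefix matching, as the sketch below shows. The function matches_prefix is hypothetical, and whether gen_directory_substitutor actually works this way is an assumption.

```perl
use strict;
use warnings;

my $with_slash    = "http://fred.jim/eating/";
my $without_slash = "http://fred.jim/eating";

# Hypothetical check: does $text contain $pattern_url as literal text?
sub matches_prefix {
    my ($pattern_url, $text) = @_;
    return $text =~ /\Q$pattern_url\E/ ? 1 : 0;
}

# Given the form with the slash, the form without it is not matched.
print matches_prefix($with_slash, $without_slash), "\n";
# Given the form without the slash, the form with it is matched.
print matches_prefix($without_slash, $with_slash), "\n";
```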
We have to be fed whole URLs at a time. If a url is split between two different chunks then we may not handle it correctly. Always feeding in a complete line protects us from this because a URL cannot contain an unencoded line break.
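The following sketch illustrates the problem with a hypothetical function substitute_chunks, which applies the same substitution to each chunk separately. When the URL straddles two chunks, neither chunk contains it in full and nothing is replaced.

```perl
use strict;
use warnings;

my $old = "http://fred.jim/eating/";
my $new = "http://roger.jemima/food/eating-out/";

# Hypothetical: substitute within each chunk independently, then rejoin.
sub substitute_chunks {
    my @chunks = @_;                    # work on copies of the arguments
    s{\Q$old\E}{$new}g for @chunks;
    return join '', @chunks;
}

my $line = qq{<a href="http://fred.jim/eating/">eat</a>\n};

# Fed as one complete line, the URL is found and replaced.
my $whole = substitute_chunks($line);

# Split after 20 characters, in the middle of the URL, it is missed.
my $split = substitute_chunks(substr($line, 0, 20), substr($line, 20));

print $whole, $split;
```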