HTML::Inspect::Normalize - normalize urls
HTML::Inspect::Normalize is an Exporter
  set_page_base($base_url);    # used as base for relative urls
  my $norm = normalize_url($relative_url);
  my ($norm, $rc, $err) = normalize_url($relative_url);
Although part of the HTML::Inspect distribution, this module is useful on its own: its functions very quickly convert sloppy http and https urls, as found on webpages, into cleanly normalized urls.
normalize_url() normalizes a URL relative to the base, which must be set first with set_page_base(). It returns the same values as set_page_base().
In LIST context, set_page_base() returns the normalized url (string), rc, and errmsg. In SCALAR context, it returns only the normalized url and throws an exception when a problem is found. The base is normalized before use.
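The two calling styles can be sketched as follows. This is a minimal example, assuming both functions can be imported by name as the synopsis suggests; the example URLs are placeholders, and nothing is assumed here about the values of rc and errmsg on success, so they are only printed.

```perl
use strict;
use warnings;
use HTML::Inspect::Normalize qw(set_page_base normalize_url);

# Placeholder base URL, chosen for illustration only.
my ($base, $base_rc, $base_err) =
    set_page_base('http://www.example.com/docs/index.html');
printf "base=%s rc=%s err=%s\n",
    map { $_ // '(undef)' } $base, $base_rc, $base_err;

# LIST context: problems are reported via rc and errmsg, no exception.
my ($norm, $rc, $err) = normalize_url('../img/logo.png#top');
printf "url=%s rc=%s err=%s\n",
    map { $_ // '(undef)' } $norm, $rc, $err;

# SCALAR context: a problem is reported as an exception, so trap it.
my $url = eval { normalize_url('../img/logo.png') };
print defined $url ? "normalized: $url\n" : "failed: $@";
```

LIST context suits bulk processing of scraped links, where a bad URL is expected and should not interrupt the run; SCALAR context suits code that treats a bad URL as a real error.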
See also https://pipeline.shared-search.eu/extract/normalize.html
The following actions are taken:
leading and trailing blanks are stripped
whitespace characters (CR, LF, TAB, VTAB) are removed, and following blanks as well
relative urls are converted to absolute
'+' and embedded blanks are converted to %20
hex encodings of normal characters (including comma and others) are converted back into the characters themselves
characters which need to be encoded are converted to hex
hex digits are upper-cased
utf8 characters get hex encoded
hex encoding must be valid utf8, possibly multi-byte
fragment is removed
empty path will become '/'
./ and ../ are removed
repeated slashes are removed
hostnames with utf8 get IDN encoded
hostname syntax verified
remove trailing dot from hostname
default port numbers removed
port numbers leading zeros removed, restricted to max 65535
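A few of the steps above can be sketched in plain Perl. This is an illustration only, not the module's implementation: `sketch_normalize` is a hypothetical helper that handles absolute urls with a lowercase http or https scheme, and it skips relative-url resolution, IDN encoding and all utf8 and percent-decoding rules.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::Inspect::Normalize.
# Returns the normalized url, or nothing when the input is rejected.
sub sketch_normalize {
    my ($url) = @_;

    $url =~ s/^\s+|\s+$//g;              # strip leading and trailing blanks
    $url =~ s/#.*//s;                    # remove the fragment

    my ($scheme, $host, $port, $path, $query) =
        $url =~ m{^(https?)://([^/:?]+)(?::(\d*))?(/[^?]*)?(\?.*)?$}
        or return;

    $host =~ s/\.$//;                    # remove trailing dot from hostname

    my %default = (http => 80, https => 443);
    if (defined $port && length $port) {
        $port += 0;                      # drops leading zeros
        return if $port > 65535;         # port out of range
        undef $port if $port == $default{$scheme};
    } else {
        undef $port;                     # "host:" without digits
    }

    $path = '/' unless defined $path && length $path;   # empty path
    $path =~ s{/{2,}}{/}g;               # collapse repeated slashes

    # Resolve ./ and ../ segments.
    my @seg = split m{/}, $path, -1;
    shift @seg;                          # empty field before the first '/'
    my @out;
    while (@seg) {
        my $s = shift @seg;
        if ($s eq '.' || $s eq '..') {
            pop @out if $s eq '..';
            push @out, '' unless @seg;   # keep the trailing slash
        } else {
            push @out, $s;
        }
    }
    $path = '/' . join '/', @out;

    # Upper-case the hex digits of percent-encodings.
    for ($path, $query) {
        next unless defined $_;
        s/%([0-9a-fA-F]{2})/%\U$1/g;
    }

    my $authority = defined $port ? "$host:$port" : $host;
    return "$scheme://$authority$path" . ($query // '');
}

print sketch_normalize("  http://www.example.com.:0080/a/./b/../c//d?x=%2f#frag "), "\n";
# prints http://www.example.com/a/c/d?x=%2F
```

The order matters in this sketch: the fragment is removed before the url is split, and repeated slashes are collapsed before the dot segments are resolved, so that `c//d` is not treated as containing an empty segment.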
HTML::Inspect, URI::Fast
This module is part of HTML-Inspect distribution version 1.00, built on December 08, 2021. Website: http://perl.overmeer.net/CPAN/
Copyrights 2021 by [Mark Overmeer <markov@cpan.org>]. For other contributors see ChangeLog.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://dev.perl.org/licenses/