NAME

URI::URL - Uniform Resource Locators (absolute and relative)

SYNOPSIS

 require URI::URL;

 # Constructors
 $url1 = new URI::URL 'http://www.perl.com/%7Euser/gisle.gif';
 $url2 = new URI::URL 'gisle.gif', 'http://www.com/%7Euser';
 $url3 = $url2->abs; # get absolute url using base
 $url4 = $url2->abs('http:/other/path');

 $url5 = newlocal URI::URL;                # pwd
 $url6 = newlocal URI::URL '/tmp';         # dir
 $url7 = newlocal URI::URL '/etc/motd';    # file

 $url  = $url1->clone;     # copy objects

 # Stringify URL
 $str1 = $url->as_string;  # complete escaped URL string
 $str2 = $url->full_path;  # escaped path+query+params
 $str3 = "$url";           # use operator overloading (experimental)

 # Retrieving Generic-RL components:
 $scheme   = $url->scheme;
 $netloc   = $url->netloc;  # see user,password,host,port below
 $path     = $url->path;
 $params   = $url->params;
 $query    = $url->query;
 $frag     = $url->frag;

 # Retrieving Network location (netloc) components:
 $user     = $url->user;
 $password = $url->password;
 $host     = $url->host;
 $port     = $url->port;     # returns default if not defined

 # Retrieving other attributes:
 $base     = $url->base;

 # All methods above can set field values:
 $url->scheme('http');
 $url->host('www.w3.org');
 $url->port($url->default_port);
 $url->base($url5);          # use string or object

 # Specify unsafe characters to be escaped for this url
 $url->unsafe('\x00-\x20"\$#%;<>?\x7E-\xFF');

 # Port numbers
 $defport= $url->default_port;  # default port for scheme

 # Functions
 URI::URL::strict(0);                    # disable strict schemes
 URI::URL::implementor;                  # get generic implementor
 URI::URL::implementor($scheme);         # get scheme implementor
 URI::URL::implementor($scheme, $class); # set scheme implementor

DESCRIPTION

This module implements URI::URL objects representing Uniform Resource Locators (URL). Both absolute (RFC 1738) and relative (RFC 1808) URLs are supported.

URI::URL objects are created by new(), which takes a string representation of a URL or an existing URL object reference to be cloned. Specific individual elements can then be accessed via the scheme(), user(), password(), host(), port(), path(), params(), query() and frag() methods. These methods can be called with a value to set the element to that value, and always return the old value. The elem() method provides a general interface to access any element by name but it should be used with caution: the effect of using incorrect spelling and case is undefined.

The abs() method attempts to return a new absolute URI::URL object for a given URL. In order to convert a relative URL into an absolute one a base URL is required. You can associate a default base with a URL either by passing a base to the new() constructor when a URI::URL is created or using the base() method on the object later. Alternatively you can specify a one-off base as a parameter to the abs() method.
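The path-merging idea behind abs() can be illustrated with a simplified sketch. The code below is not URI::URL's implementation: resolve_relative() is a hypothetical helper that looks only at the scheme, netloc and path of a hierarchical URL and ignores params, query and fragment.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::URL: resolve $rel against an
# absolute hierarchical $base (scheme://netloc/path only).
sub resolve_relative {
    my ($rel, $base) = @_;

    # A URL that carries its own scheme is already absolute.
    return $rel if $rel =~ m{^[A-Za-z][A-Za-z0-9+.\-]*:};

    my ($scheme, $netloc, $path) = $base =~ m{^([^:/?#]+)://([^/?#]*)([^?#]*)}
        or die "base URL must be absolute: $base";

    return "$scheme:$rel"          if $rel =~ m{^//};   # replaces the netloc
    return "$scheme://$netloc$rel" if $rel =~ m{^/};    # replaces the path

    # Keep the base path up to and including its last '/', then append.
    $path =~ s{[^/]*$}{};
    $path = '/' unless length $path;
    my $merged = $path . $rel;

    # Remove "." segments, then collapse "segment/.." pairs.
    1 while $merged =~ s{/\./}{/};
    $merged =~ s{/\.$}{/};
    1 while $merged =~ s{/(?!\.\./)[^/]+/\.\./}{/};
    $merged =~ s{/(?!\.\./)[^/]+/\.\.$}{/};

    return "$scheme://$netloc$merged";
}

print resolve_relative('gisle.gif', 'http://www.perl.com/user/index.html'), "\n";
# http://www.perl.com/user/gisle.gif
print resolve_relative('../a', 'http://h/x/y/z'), "\n";
# http://h/x/a
```

In this sketch, as in RFC 1808, the last segment of the base path is dropped before merging, so a base of 'http://h/dir' resolves 'x' to 'http://h/x' while a base of 'http://h/dir/' resolves it to 'http://h/dir/x'.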

The object constructor new() must be able to determine the scheme for the URL. If a scheme is not specified in the URL it will use the scheme specified by the base URL. If no base URL scheme is defined then new() will croak unless URI::URL::strict(0) has been invoked, in which case http is silently assumed.

Once the scheme has been determined new() then uses the implementor() function to determine which class implements that scheme. If no implementor class is defined for the scheme then new() will croak unless URI::URL::strict(0) has been invoked, in which case the internal generic class is assumed.

Internally defined schemes are implemented by URI::URL::scheme_name. The URI::URL::implementor() function can also be used to set the class used to implement a scheme.

HOW AND WHEN TO ESCAPE

An edited extract from a URI specification:

The printability requirement has been met by specifying a safe set of characters, and a general escaping scheme for encoding "unsafe" characters. This "safe" set is suitable, for example, for use in electronic mail. This is the canonical form of a URI.

There is a conflict between the need to be able to represent many characters including spaces within a URI directly, and the need to be able to use a URI in environments which have limited character sets or in which certain characters are prone to corruption. This conflict has been resolved by use of an hexadecimal escaping method which may be applied to any characters forbidden in a given context. When URLs are moved between contexts, the set of characters escaped may be enlarged or reduced unambiguously. The canonical form for URIs has all white spaces encoded.

Notes:

A URL string must, by definition, consist of escaped components. Complete URLs are always escaped.

The components of a URL string must be individually escaped. Each component of a URL may have separate requirements regarding what must be escaped, and those requirements also depend on the URL scheme.

Never escape an already escaped component string.

This implementation expects an escaped URL string to be passed to new() and will return an escaped URL string from as_string(). Individual components must be manipulated in unescaped form (this is most natural anyway).

The escaping applied to a URL when it is constructed by as_string() (or full_path()) can be controlled by using the unsafe() method to specify which characters should be treated as unsafe.
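The notes above can be illustrated with a small stand-alone sketch. It does not use URI::URL: escape_component() and unescape_component() are hypothetical helpers, and the unsafe set is an assumed one chosen for the example, written in the same character-class notation that unsafe() takes in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of URI::URL.

# Replace every character matching the character-class body $unsafe
# with its %XX hexadecimal escape.
sub escape_component {
    my ($text, $unsafe) = @_;
    $text =~ s/([$unsafe])/sprintf("%%%02X", ord $1)/ge;
    return $text;
}

# Turn %XX escapes back into the characters they stand for.
sub unescape_component {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

my $unsafe = '\x00-\x20"#%;<>?\x7F-\xFF';    # assumed set, illustration only

my $path  = '/docs/what now?.txt';           # unescaped component
my $once  = escape_component($path, $unsafe);
my $twice = escape_component($once, $unsafe);    # the mistake to avoid

print "$once\n";     # /docs/what%20now%3F.txt
print "$twice\n";    # /docs/what%2520now%253F.txt

# The escaped path can now be joined to the other components; the '?'
# added here is a real delimiter, the one inside the path is not.
my $url = 'http://www.perl.com' . $once . '?lang=en';
print "$url\n";
```

Escaping the path on its own is what keeps its literal '?' from being read as the start of the query. Escaping a second time turns each '%' into '%25', so one round of unescaping no longer recovers the original path.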

ADDING NEW URL SCHEMES

New URL schemes or alternative implementations for existing schemes can be added to your own code. To create a new scheme class use code like:

   package MYURL::foo;
   require URI::URL;
   @ISA = (URI::URL::implementor());  # inherit from generic scheme class

The 'URI::URL::implementor()' function call with no parameters returns the name of the class which implements the generic URL scheme behaviour (typically URI::URL::_generic). All schemes should be derived from this class.

Your class can then define overriding methods (e.g., new(), _parse()) as required.

To register your new class as the implementor for a specific scheme use code like:

   URI::URL::implementor('foo', 'MYURL::foo');

Any new URL created for scheme 'foo' will be implemented by your MYURL::foo class. Existing URLs will not be affected.
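The registration behaviour described above can be modelled with a toy dispatcher. This is only a sketch of the idea and not URI::URL's own code; every package and function name in it (Toy::URL, new_url and so on) is hypothetical.

```perl
use strict;
use warnings;

# Toy model of scheme dispatch. All names are hypothetical.
package Toy::URL::_generic;
sub new {
    my ($class, $str) = @_;
    return bless { str => $str }, $class;
}
sub default_port { return undef }

package Toy::URL::foo;
our @ISA = ('Toy::URL::_generic');    # inherit from the generic class
sub default_port { return 4242 }      # override one method

package Toy::URL;
my %implementor;                      # scheme name => class name

# No arguments: generic class. One: look up. Two: register, then look up.
sub implementor {
    my ($scheme, $class) = @_;
    return 'Toy::URL::_generic' unless defined $scheme;
    $implementor{lc $scheme} = $class if defined $class;
    return $implementor{lc $scheme} || '';
}

# The class is chosen once, at construction time.
sub new_url {
    my ($str) = @_;
    my ($scheme) = $str =~ /^([A-Za-z][A-Za-z0-9+.\-]*):/
        or die "no scheme in '$str'";
    # Unknown schemes fall back to the generic class in this model.
    my $class = implementor($scheme) || implementor();
    return $class->new($str);
}

package main;

my $before = Toy::URL::new_url('foo://host/');
Toy::URL::implementor('foo', 'Toy::URL::foo');
my $after  = Toy::URL::new_url('foo://host/');

print ref($before), "\n";    # Toy::URL::_generic
print ref($after),  "\n";    # Toy::URL::foo
```

Because the object is blessed into a class when it is created, registering a new implementor changes only URLs constructed afterwards, which is why $before keeps the generic class.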

WHAT A URL IS NOT

URL objects do not, and should not, know how to 'get' or 'put' the resources they specify locations for, any more than a postal address 'knows' anything about the postal system. The actual access/transfer should be achieved by some form of transport agent class. The agent class can use the URL class, but should not be a subclass of it.

OUTSTANDING ISSUES

Need scheme-specific reserved characters, maybe even scheme/part specific reserved chars...

The overloading interface is experimental. It is very useful (especially for interpolating URLs into strings) but should not yet be relied upon.

AUTHORS / ACKNOWLEDGMENTS

This module is (distantly) based on the wwwurl.pl code in the libwww-perl distribution developed by Roy Fielding <fielding@ics.uci.edu>, as part of the Arcadia project at the University of California, Irvine, with contributions from Brooks Cutter.

Gisle Aas <aas@nr.no>, Tim Bunce <Tim.Bunce@ig.co.uk>, Roy Fielding <fielding@ics.uci.edu> and Martijn Koster <m.koster@nexor.co.uk> (in alphabetical order) have collaborated on the complete rewrite for Perl 5, with input from other people on the libwww-perl mailing list.

If you have any suggestions, bug reports, fixes, or enhancements, send them to the libwww-perl mailing list at <libwww-perl@ics.uci.edu>.

COPYRIGHT

Copyright (c) 1995 Gisle Aas. All rights reserved.
Copyright (c) 1995 Martijn Koster. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

IN NO EVENT SHALL THE AUTHORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION (INCLUDING, BUT NOT LIMITED TO, LOST PROFITS) EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

PREREQUISITES

You will need Perl5.001e or better.

AVAILABILITY

The latest version of this module is likely to be available from:

   http://www.oslonett.no/home/aas/perl/www/
   http://web.nexor.co.uk/public/perl/perl.html
   http://www.ics.uci.edu/WebSoft/libwww-perl/contrib/

BUGS

Not all schemes are fully implemented. Two-way functions to get/set things like the News URL digits etc. are missing.

Non-http scheme specific escaping is not correct yet.

METHODS AND FUNCTIONS

new

 $url = new URI::URL $escaped_string [, $optional_base_url]

This is the object constructor. To trap bad or unknown URL schemes use:

 $obj = eval { new URI::URL ... };

or set URI::URL::strict(0) if you do not care about bad or unknown schemes.

newlocal

 $url = newlocal URI::URL $path;

Returns a URL object that denotes a path on the local filesystem (the current directory by default). Paths not starting with '/' are taken relative to the current directory.

print_on

 $url->print_on(*FILEHANDLE);

Prints a verbose presentation of the contents of the URL object to the specified file handle (default STDOUT). Mainly useful for debugging.

URI::URL::implementor

 URI::URL::implementor;
 URI::URL::implementor($scheme);
 URI::URL::implementor($scheme, $class);

Gets and/or sets the implementor class for a scheme. Returns '' if the specified scheme is not supported. Returns the generic URL class if no scheme is specified.