NAME
URI::Fast - A fast(er) URI parser
SYNOPSIS
    use URI::Fast qw(uri);

    my $uri = uri 'http://www.example.com/some/path?fnord=slack&foo=bar';

    if ($uri->scheme =~ /http(s)?/) {
        my @path  = $uri->path;
        my $fnord = $uri->param('fnord');
        my $foo   = $uri->param('foo');
    }

    if ($uri->path =~ /\/login/ && $uri->scheme ne 'https') {
        $uri->scheme('https');
        $uri->param('upgraded', 1);
    }
DESCRIPTION
URI::Fast is a faster alternative to URI. It is written in C and provides basic parsing and modification of a URI.
URI is an excellent module; it is battle-tested, robust, and handles many edge cases. As a result, it is rather slower than it would otherwise be for more trivial cases, such as inspecting the path or updating a single query parameter.
EXPORTED SUBROUTINES
Subroutines are exported on demand.
uri
Accepts a URI string, minimally parses it, and returns a URI::Fast object.
Note: passing a URI::Fast instance to this routine will cause the object to be interpolated into a string (via "to_string"), effectively creating a clone of the original URI::Fast object.
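As a minimal sketch of the behavior described above (the example URL is arbitrary), parsing a string and then copying the resulting object might look like this:

```perl
use strict;
use warnings;
use URI::Fast qw(uri);

# Parse a URI string into a URI::Fast object.
my $uri = uri 'http://www.example.com/some/path?fnord=slack&foo=bar';

my $scheme = $uri->scheme;          # "http"
my $host   = $uri->host;            # "www.example.com"
my $path   = $uri->path;            # "/some/path" (scalar context)
my $fnord  = $uri->param('fnord');  # "slack"

# Passing an existing object stringifies it first, so the result is an
# independent copy rather than a second handle on the same object.
my $copy = uri $uri;
$copy->scheme('https');

print "$uri\n";   # the original keeps its "http" scheme
print "$copy\n";
```

Because the copy is built from the stringified form, later changes to one object should not be visible through the other.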
iri
Similar to "uri", but returns a URI::Fast::IRI object. A URI::Fast::IRI differs from a URI::Fast in that UTF-8 characters are permitted and will not be percent-encoded when modified.
abs_uri
Builds a new URI::Fast from a relative URI string and makes it "absolute" in relation to $base.
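A small sketch of a call, assuming the relative string is passed first and $base second. The base URL here is made up and deliberately ends in a slash, so that the relative path is resolved beneath it:

```perl
use strict;
use warnings;
use URI::Fast qw(abs_uri);

# Resolve a relative reference against a base URL.
my $uri = abs_uri('some/path', 'http://www.example.com/fnord/');

print $uri->to_string, "\n";  # http://www.example.com/fnord/some/path
```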
html_url
Parses a URI string, removing whitespace characters ignored in URLs found in HTML documents, replacing backslashes with forward slashes, and making the URL "normalize"d.
If a base URL is specified, the URI::Fast object returned will be made "absolute" relative to that base URL.
# Resulting URL is "https://www.slashdot.org/recent"
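One call that should yield the URL noted above can be sketched as follows. The input string is an assumption, chosen to exercise both the backslash replacement and the base URL resolution described in this section:

```perl
use strict;
use warnings;
use URI::Fast qw(html_url);

# Assumed input: a scheme-relative URL containing a backslash, as might be
# found in an HTML attribute. The base URL supplies the missing scheme.
my $url = html_url('//www.slashdot.org\recent', 'https://www.slashdot.org');

print "$url\n";
```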
uri_split
Behaves (hopefully) identically to URI::Split, but roughly twice as fast.
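Assuming the same return list as uri_split in URI::Split (scheme, authority, path, query, fragment), a call might look like this sketch:

```perl
use strict;
use warnings;
use URI::Fast qw(uri_split);

# Split a URI into its five generic components without building an object.
my ($scheme, $auth, $path, $query, $frag) =
    uri_split('http://www.example.com/some/path?foo=bar#fnord');

print join("\n", $scheme, $auth, $path, $query, $frag), "\n";
# http
# www.example.com
# /some/path
# foo=bar
# fnord
```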
encode/decode/uri_encode/uri_decode
See "ENCODING".
CONSTRUCTORS
new
If desired, both URI::Fast and URI::Fast::IRI may be instantiated using the default OO-flavored constructor, new.
new_abs
OO equivalent to "abs_uri".
new_html_url
OO equivalent to "html_url".
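The constructors can be sketched as below. This assumes new_abs takes its arguments in the same order as "abs_uri" (relative string, then base); the URLs are placeholders:

```perl
use strict;
use warnings;
use URI::Fast;

# OO counterpart of uri('http://www.example.com/some/path')
my $uri = URI::Fast->new('http://www.example.com/some/path');

# OO counterpart of abs_uri('other', 'http://www.example.com/some/')
my $abs = URI::Fast->new_abs('other', 'http://www.example.com/some/');

print $uri->path, "\n";       # /some/path
print $abs->to_string, "\n";  # http://www.example.com/some/other
```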
ATTRIBUTES
All attributes serve as full accessors, allowing the URI segment to be both retrieved and modified.
RAW ACCESSORS
Each attribute defines a raw_* method, which returns the raw, encoded string value for that attribute. If a new value is passed, it will set the field to the raw, unchanged value without checking it or changing it in any way.
CLEARERS
Each attribute further has a matching clearer method (clear_*) which unsets its value.
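To illustrate the raw accessors and clearers described above, here is a brief sketch using the usr and frag attributes (the method names raw_usr and clear_frag follow the raw_* and clear_* patterns; the URL is arbitrary):

```perl
use strict;
use warnings;
use URI::Fast qw(uri);

my $uri = uri 'http://www.example.com/some/path#fnord';

$uri->usr('some one');        # the regular setter encodes the value
my $decoded = $uri->usr;      # "some one"
my $raw     = $uri->raw_usr;  # "some%20one"

$uri->clear_frag;             # unset the fragment
print "$uri\n";
```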
ACCESSORS
In general, accessors accept an unencoded string and set their slot value to the encoded value. They return the decoded value. See "ENCODING" for an in-depth description of their behavior as well as an explanation of the more complex behavior of compound fields.
scheme
Gets or sets the scheme portion of the URI (e.g. http), excluding ://.
auth
The authority section is composed of the username, password, host name, and port number:
    hostname.com
    someone@hostname.com
    someone:secret@hostname.com:1234
Setting this field may be done with a string (see the note below about "ENCODING") or a hash reference of individual field names (usr, pwd, host, and port). In both cases, the existing values are completely replaced by the new values and any values missing from the caller-supplied input are deleted.
usr
The username segment of the authority string. Updating this value alters "auth".
pwd
The password segment of the authority string. Updating this value alters "auth".
host
The host name segment of the authority string. May be a domain string or an IP address. If the host is an IPv6 address, it must be surrounded by square brackets (per spec), which are included in the host string. Updating this value alters "auth".
port
The port number segment of the authority string. Updating this value alters "auth".
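The relationship between "auth" and its component fields can be sketched as follows. The host names are placeholders; the last step relies on the rule above that a hash ref replaces the existing values wholesale:

```perl
use strict;
use warnings;
use URI::Fast qw(uri);

my $uri = uri 'http://someone:secret@hostname.com:1234/some/path';

my $usr  = $uri->usr;   # "someone"
my $pwd  = $uri->pwd;   # "secret"
my $host = $uri->host;  # "hostname.com"
my $port = $uri->port;  # 1234

# Updating one component is reflected in the compound field.
$uri->port(8080);
my $auth = $uri->auth;  # "someone:secret@hostname.com:8080"

# A hash ref replaces everything; fields left out (usr, pwd, port) are dropped.
$uri->auth({host => 'example.org'});
print $uri->auth, "\n"; # example.org
```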
path
In scalar context, returns the entire path string. In list context, returns a list of path segments, split by /.
    my $uri = uri '/foo/bar';
    my $path = $uri->path;  # "/foo/bar"
    my @path = $uri->path;  # ("foo", "bar")
The path may also be updated using either a string or an array ref of segments:
    $uri->path('/foo/bar');
    $uri->path(['foo', 'bar']);
This differs from the behavior of "path_segments" in URI, which considers the leading slash separating the path from the authority section to be an individual segment. If this behavior is desired, the lower level split_path_compat is available. split_path_compat (and its partner, split_path) always return an array reference.
    my $uri = uri '/foo/bar';
    $uri->split_path;         # ['foo', 'bar'];
    $uri->split_path_compat;  # ['', 'foo', 'bar'];
query
In scalar context, returns the complete query string, excluding the leading ?. The query string may be set in several ways.
    $uri->query("foo=bar&baz=bat");                  # note: no percent-encoding performed
    $uri->query({foo => 'bar', baz => 'bat'});       # foo=bar&baz=bat
    $uri->query({foo => 'bar', baz => 'bat'}, ';');  # foo=bar;baz=bat
In list context, returns a hash ref mapping query keys to array refs of their values (see "query_hash").
Both '&' and ';' are treated as separators for key/value parameters.
frag
The fragment section of the URI, excluding the leading #.
fragment
An alias of "frag".
METHODS
query_keys
Does a fast scan of the query string and returns a list of unique parameter names that appear in the query string.
Both '&' and ';' are treated as separators for key/value parameters.
query_hash
Scans the query string and returns a hash ref of key/value pairs. Values are returned as an array ref, as keys may appear multiple times. Both '&' and ';' are treated as separators for key/value parameters.
May optionally be called with a new hash of parameters to replace the query string with, in which case keys may map to scalar values or arrays of scalar values. As with all query setter methods, an additional final parameter may be used to explicitly specify the separator to use when generating the new query string.
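A short sketch of query_keys and query_hash on a query string that mixes both separators (the URL and parameter names are arbitrary). The keys are sorted here only to make the printed order predictable:

```perl
use strict;
use warnings;
use URI::Fast qw(uri);

my $uri = uri 'http://www.example.com/?foo=bar&foo=baz;fnord=slack';

# Each key maps to an array ref, since keys may repeat.
my $hash = $uri->query_hash;
# { foo => ['bar', 'baz'], fnord => ['slack'] }

# Unique parameter names.
my @keys = sort $uri->query_keys;  # ("fnord", "foo")

# Passing a hash ref replaces the whole query string.
$uri->query_hash({a => 1});
print scalar($uri->query), "\n";   # a=1
```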
param
Gets or sets a parameter value. Setting a parameter value will replace existing values completely; the "query" string will also be updated. Setting a parameter to undef deletes the parameter from the URI.
    $uri->param('foo', ['bar', 'baz']);
    $uri->param('fnord', 'slack');

    my $value_scalar = $uri->param('fnord'); # fnord appears once
    my @value_list   = $uri->param('foo');   # foo appears twice
    my $value_scalar = $uri->param('foo');   # croaks; expected single value but foo has multiple

    # Delete parameter
    $uri->param('foo', undef); # deletes foo

    # Ambiguous cases
    $uri->param('foo', '');  # foo=
    $uri->param('foo', '0'); # foo=0
    $uri->param('foo', ' '); # foo=%20
Both '&' and ';' are treated as separators for key/value parameters when parsing the query string. An optional third parameter explicitly selects the character used to separate key/value pairs.
    $uri->param('foo', 'bar', ';'); # foo=bar
    $uri->param('baz', 'bat', ';'); # foo=bar;baz=bat
When unspecified, '&' is chosen as the default. In either case, all separators in the query string will be normalized to the chosen separator.
    $uri->param('foo', 'bar', ';'); # foo=bar
    $uri->param('baz', 'bat', ';'); # foo=bar;baz=bat
    $uri->param('fnord', 'slack');  # foo=bar&baz=bat&fnord=slack
add_param
Updates the query string by adding a new value for the specified key. If the key already exists in the query string, the new value is appended without altering the original value.
    $uri->add_param('foo', 'bar'); # foo=bar
    $uri->add_param('foo', 'baz'); # foo=bar&foo=baz
This method is simply sugar for calling:
    $uri->param('key', [$uri->param('key'), 'new value']);
As with "param", the separator character may be specified as the final parameter. The same caveats apply with regard to normalization of the query string separator.
    $uri->add_param('foo', 'bar', ';'); # foo=bar
    $uri->add_param('foo', 'baz', ';'); # foo=bar;foo=baz
query_keyset
Allows modification of the query string in the manner of a set, using keys without =value, e.g. foo&bar&baz. Accepts a hash ref of keys to update. A truthy value adds the key, a falsey value removes it. Any keys not mentioned in the update hash are left unchanged.
    my $uri = uri '&baz&bat';
    $uri->query_keyset({foo => 1, bar => 1}); # baz&bat&foo&bar
    $uri->query_keyset({baz => 0, bat => 0}); # foo&bar
If there are key-value pairs in the query string as well, the behavior of this method becomes a little more complex. When a key is specified in the update hash ref, a truthy value will leave an existing key/value pair untouched. A falsey value will remove the key and value.
    my $uri = uri '&foo=bar&baz&bat';
    $uri->query_keyset({foo => 1, baz => 0}); # foo=bar&bat
An optional second parameter may be specified to control the separator character used when updating the query string. The same caveats apply with regard to normalization of the query string separator.
append
Serially appends path segments, query strings, and fragments to the end of the URI. Each argument is added in order. If an argument begins with ?, it is assumed to be a query string and it is appended using "add_param". If an argument begins with #, it is treated as a fragment, replacing any existing fragment. Otherwise, the argument is treated as a path segment and appended to the path.
    $uri->append('bar', 'baz/bat', '?k=v1&k=v2', '#fnord', 'slack');
to_string
as_string
"$uri"
Stringifies the URI, encoding output as necessary. String interpolation is overloaded.
compare
$uri eq $other
Compares the URI to another, returning true if the URIs are equivalent. Overloads the eq operator.
clone
Sugar for:
    my $uri   = uri '...';
    my $clone = uri $uri;
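As a sketch of how clone and "compare" fit together (the URL and replacement path are arbitrary), a clone starts out equivalent to its source and stops being so once it is modified:

```perl
use strict;
use warnings;
use URI::Fast qw(uri);

my $uri   = uri 'http://www.example.com/some/path';
my $clone = $uri->clone;

# The overloaded eq operator delegates to compare.
my $same_before = ($clone eq $uri) ? 1 : 0;         # 1

# Changing the clone leaves the original untouched.
$clone->path('/other/path');
my $same_after = $uri->compare($clone) ? 1 : 0;     # 0

print "before: $same_before, after: $same_after\n"; # before: 1, after: 0
```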
absolute
Builds an absolute URI from a relative URI and a base URI string. Adheres as strictly as possible to the rules for resolving a target URI in RFC 3986, section 5.2. Returns a new URI::Fast object representing the absolute, merged URI.
abs
Alias of "absolute".
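A sketch of the method form, assuming it is called on the relative URI with the base passed as a string. The expected result follows the reference resolution rules of RFC 3986, section 5.2, where the last segment of the base path is dropped before merging and dot segments are then removed:

```perl
use strict;
use warnings;
use URI::Fast qw(uri);

my $rel = uri '../other';

# Base path "/a/b/c" merges to "/a/b/../other", which collapses to "/a/other".
my $abs = $rel->absolute('http://www.example.com/a/b/c');

print "$abs\n";  # http://www.example.com/a/other
```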
relative
Builds a relative URI using a second URI (either a URI::Fast object or a string) as a base. Unlike "rel" in URI, ignores differences in domain and scheme, assuming the caller wishes to adopt the base URL's instead. Aside from that difference, its behavior should mimic that of "rel" in URI.
    $uri->to_string; # "foo/bar"
    $uri->to_string; # "foo/bar/"
rel
Alias of "relative".
normalize
Similar to "canonical" in URI, performs a minimal normalization on the URI. Only the generic normalization described in RFC 3986 is performed; no scheme-specific normalization is done. Specifically, the scheme and host members are converted to lower case, dot segments are collapsed in the path, and the hex digits of any percent-encoded characters in the URI are converted to upper case.
canonical
Alias of "normalize".
ENCODING
URI::Fast tries to do the right thing in most cases with regard to reserved and non-ASCII characters. URI::Fast will fully encode reserved and non-ASCII characters when setting individual values and return their fully decoded values. However, the "right thing" is somewhat ambiguous when it comes to setting compound fields like "auth", "path", and "query".
When setting compound fields with a string value, reserved characters are expected to be present, and are therefore accepted as-is. Any non-ASCII characters will be percent-encoded (since they are unambiguous and there is no risk of double-encoding them). Thus,
    $uri->auth('someone:secret@Ῥόδος.com:1234');
    $uri->auth; # "someone:secret@%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82.com:1234"
On the other hand, when setting these fields with a reference value (assumed to be a hash ref for "auth" and "query" or an array ref for "path"; see individual methods' docs for details), each field is fully percent-encoded, just as if each individual simple slot's setter had been called:
    $uri->auth({usr => 'some one', host => 'somewhere.com'});
    $uri->auth; # "some%20one@somewhere.com"
    $uri->usr;  # "some one"
The same goes for return values. For compound fields returning a string, non-ASCII characters are decoded but reserved characters are not. When returning a list or reference of the deconstructed field, individual values are decoded of both reserved and non-ASCII characters.
'+' vs '%20'
Although no longer part of the standard, + is commonly used as the encoded space character (rather than %20); it is still official to the application/x-www-form-urlencoded type, and is treated as a space by "decode".
encode
Percent-encodes a string for use in a URI. By default, both reserved characters (! * ' ( ) ; : @ & = + $ , / ? # [ ] %) and UTF-8 characters are encoded.
A second (optional) parameter provides a string containing any characters the caller does not wish to be encoded. An empty string will result in the default behavior described above.
For example, to encode all characters in a query-like string except for those used by the query:
    my $encoded = URI::Fast::encode($some_string, '?&=');
decode
Decodes a percent-encoded string.
    my $decoded = URI::Fast::decode($some_string);
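With concrete values, the calls above might behave as in this sketch (the input strings are arbitrary). It covers the default encoding, the optional list of characters to leave alone, and the treatment of + as a space when decoding:

```perl
use strict;
use warnings;
use URI::Fast;

# Default: reserved characters such as "/" are encoded along with the space.
my $encoded = URI::Fast::encode('foo bar/baz');             # "foo%20bar%2Fbaz"

# Leave "&" and "=" alone so the query structure survives.
my $query = URI::Fast::encode('k=some value&x=1', '&=');    # "k=some%20value&x=1"

# Both %20 and + decode to a space.
my $decoded = URI::Fast::decode('foo%20bar+baz');           # "foo bar baz"

print "$encoded\n$query\n$decoded\n";
```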
uri_encode
uri_decode
These are aliases of "encode" and "decode", respectively. They were added to make BLUEFEET happy after he made fun of me for naming "encode" and "decode" too generically.
In fact, these were originally aliased as url_encode and url_decode, but due to some pedantic whining on the part of BGRIMM, they have been renamed to uri_encode and uri_decode.
escape_tree
unescape_tree
Traverses a data structure, escaping or unescaping defined scalar values in place. Accepts a reference to be traversed. Any further parameters are passed unchanged to "encode" or "decode". Croaks if the input to escape/unescape is a non-reference value.
    my $obj = {
      foo => ['bar baz', 'bat%fnord'],
      bar => {baz => 'bat%bat'},
      baz => undef,
      bat => '',
    };

    URI::Fast::escape_tree($obj);

    # $obj is now:
    {
      foo => ['bar%20baz', 'bat%25fnord'],
      bar => {baz => 'bat%25bat'},
      baz => undef,
      bat => '',
    }

    URI::Fast::unescape_tree($obj); # $obj returned to original form

    URI::Fast::escape_tree($obj, '%'); # escape but allow "%"

    # $obj is now:
    {
      foo => ['bar%20baz', 'bat%fnord'],
      bar => {baz => 'bat%bat'},
      baz => undef,
      bat => '',
    }
CAVEATS
This module is designed to parse URIs according to RFC 3986. Browsers parse URLs using a different (but similar) algorithm and some strings that are valid URLs to browsers are not valid URIs to this module. The "html_url" function attempts to parse URLs more in line with how browsers do, but no guarantees are made as HTML standards and browser implementations are an ever shifting landscape.
SPEED
SEE ALSO
ACKNOWLEDGEMENTS
Thanks to ZipRecruiter for encouraging their employees to contribute back to the open source ecosystem. Without their dedication to quality software development this distribution would not exist.
CONTRIBUTORS
The following people have contributed to this module with patches, bug reports, API advice, identifying areas where the documentation is unclear, or by making fun of me for naming certain methods too generically.
- Andy Ruder
- Aran Deltac (BLUEFEET)
- Ben Grimm (BGRIMM)
- Dave Hubbard (DAVEH)
- James Messrie
- Martin Locklear
- Randal Schwartz (MERLYN)
- Sara Siegal (SSIEGAL)
- Tim Vroom (VROOM)
- Des Daignault (NAWGLAN)
- Josh Rosenbaum
AUTHOR
Jeff Ober <sysread@fastmail.fm>
COPYRIGHT AND LICENSE
This software is copyright (c) 2018 by Jeff Ober. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.