NAME

URI::Shortener - Shorten URIs so that you don't have to rely on external services

VERSION

version 0.001

SYNOPSIS

    use URI::Shortener;

    # Just run this once and store the result somewhere; hardcode it if you like
    my $secret = URI::Shortener::new_letter_ordering();
    ...
    # Actually shortening the URIs
    my $s = URI::Shortener->new(
        secret => $secret,
        prefix => 'https://go.mydomain.test/short',
        dbname => '/opt/myApp/uris.db',
        offset => 90210,
    );
    my $uri = 'https://mydomain.test/somePath';
    # Persistently memoizes via sqlite
    my $short = $s->shorten( $uri );
    # $short will look like 'https://go.mydomain.test/short/szAgqIE'
    ...
    # Presumption here is that your request router knows what to do with this, e.g. issue a 302:
    my $long = $s->lengthen( $short );
    ...
    # Prune old URIs
    $s->prune_before(time());

DESCRIPTION

Provides utility methods so that you can:

  1) Create a new short URI and store it for later use
  2) Persistently pull it up
  3) Store a creation time so you can prune the database later

We use sqlite for persistence.

ALGORITHM

The particular algorithm used to generate the ciphertext composing the shortened URI is simple.

Suppose $rowId is the ID of the database row corresponding to a given URI.

The ciphertext will be of this length:

    floor($rowId / len($secret)) + 1

Its first character is taken from $secret at the position:

    $rowId % len($secret)

For each additional character, we select the next character in $secret, taking the position modulo the length of $secret so that we wrap around if needed.

In short, it's a crude substitution cipher and one-time pad.
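
The steps above can be sketched in Perl as follows. This illustrates the description only and is not the module's actual source: the name sketch_cipher is hypothetical, the secret 'abcde' is chosen purely for readability, and int() stands in for floor(), which is equivalent for non-negative row IDs.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the prose description above.
sub sketch_cipher {
    my ($secret, $id) = @_;
    my @chars = split //, $secret;
    my $len   = scalar @chars;

    # Length of the ciphertext: floor($id / len) + 1
    my $count = int($id / $len) + 1;

    # Position of the first character: $id % len
    my $start = $id % $len;

    # Each further character is the next one in $secret, wrapping around.
    return join '', map { $chars[ ($start + $_) % $len ] } 0 .. $count - 1;
}

print sketch_cipher('abcde', 7), "\n";    # prints "cd"
```

With a five character secret, row 7 gives a length of int(7/5) + 1 = 2 and a starting position of 7 % 5 = 2, hence "cd".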

IMPORTANT

This can be improved to make corresponding DB IDs more difficult to guess by including an identifier salt (the 'offset' parameter). The difficulty of brute-forcing valid URIs scales with the size of the secret; a-zA-Z gives factorial(26+26), or roughly 8e67, possible permutations.
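
To check the permutation count yourself, here is a minimal sketch. The helper name permutations is hypothetical, and floating point is used because only the order of magnitude matters here.

```perl
use strict;
use warnings;

# A secret of $n unique characters can be ordered in $n! ways.
sub permutations {
    my ($n) = @_;
    my $total = 1;
    $total *= $_ for 2 .. $n;
    return $total;
}

# 26 lowercase + 26 uppercase letters
printf "%.2e\n", permutations(26 + 26);
```

A larger alphabet grows this number very quickly, which is the reason to use a different secret if you need more characters.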

That said you shouldn't store particularly sensitive information in any URI, or attempt to use this as a means of access control. It only takes one right guess to ruin someone's day. You shouldn't use link shorteners for this at all, but many have done so and many will in the future.

I strongly recommend that whatever serves these URIs sit behind a fail2ban rule that bans clients after 3 or more 4xx responses.

The secret used is not stored in the DB, so don't lose it. You can't use a DB valid for any other secret and expect anything but GIGO.

Multiple different prefixes for the shortened URIs are OK though. The more you use, the harder it is to guess valid URIs. Sometimes, CNAMEs are good for something.

OTHER CONSEQUENCES

If you prune old DB records and your database engine will then reuse these IDs, be aware that this will result in some old short URIs resolving to new pages.

The ciphertext generated will be unique provided that every character in $secret is unique as well. The new_letter_ordering() subroutine is provided to give you precisely that: a random ordering of a..z and A..Z. If you need more than those characters, use a different secret.

If you can't manually verify $secret, I would recommend passing join('', List::Util::uniq(split(//, $secret))) instead, to avoid issues with duplicated characters.
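
To see why duplicates matter, here is a small sketch. It assumes List::Util 1.45 or newer (for uniq) and uses a hypothetical sketch_cipher helper that follows the description in ALGORITHM rather than the module's actual code. The result of uniq is joined back into a string, since the secret is a string.

```perl
use strict;
use warnings;
use List::Util qw(uniq);

# Hypothetical re-implementation of the scheme described under ALGORITHM,
# included so this block runs on its own.
sub sketch_cipher {
    my ($secret, $id) = @_;
    my @chars = split //, $secret;
    my $len   = scalar @chars;
    my $count = int($id / $len) + 1;
    my $start = $id % $len;
    return join '', map { $chars[ ($start + $_) % $len ] } 0 .. $count - 1;
}

my $bad  = 'aab';                             # 'a' appears twice
my $good = join '', uniq(split //, $bad);     # 'ab'

# Rows 0 and 1 collide under the duplicated secret...
print sketch_cipher($bad, 0), ' ', sketch_cipher($bad, 1), "\n";      # prints "a a"

# ...but not once the duplicates are removed.
print sketch_cipher($good, 0), ' ', sketch_cipher($good, 1), "\n";    # prints "a b"
```

Note that de-duplicating changes the secret, so do it once, before any URIs are shortened, and store the result.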

UTF-8

I have not tested this module with UTF-8 secrets. My expectation is that it will not work at all with them, but this could be patched straightforwardly.

CONSTRUCTOR

$class->new(%options)

See SYNOPSIS for supported options.

We set a default 'offset' of 0, and strip trailing slash(es) from the prefix.

The 'dbname' file you pass will be created automatically for you if possible. Otherwise we will croak the first time you run shorten() or lengthen().
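
The option handling described above could look roughly like this. It is a sketch of the documented behavior only: normalize_options is a hypothetical name and the constructor's real internals may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the documented defaults to a set of options.
sub normalize_options {
    my (%options) = @_;

    # Default 'offset' of 0 when none is given
    $options{offset} //= 0;

    # Strip one or more trailing slashes from the prefix
    $options{prefix} =~ s{/+$}{} if defined $options{prefix};

    return %options;
}

my %opts = normalize_options( prefix => 'https://go.mydomain.test/short//' );
print "$opts{prefix} (offset $opts{offset})\n";
# prints "https://go.mydomain.test/short (offset 0)"
```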

METHODS

new_letter_ordering()

Static method. Returns a shuffle of a-zA-Z.

This results in a secret which produces URIs which can be spoken aloud in NATO phonetic alphabet. I presume this is the primary usefulness of URL shorteners aside from phishing scams.
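
A shuffle like this can be produced with List::Util. The following is a sketch under that assumption, with a hypothetical function name, and is not necessarily how the module does it.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical equivalent: a random ordering of a..z and A..Z as one string.
sub sketch_letter_ordering {
    return join '', shuffle( 'a' .. 'z', 'A' .. 'Z' );
}

my $secret = sketch_letter_ordering();
print length($secret), " characters: $secret\n";
```

Every letter appears exactly once, which is what guarantees unique ciphertext.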

cipher( STRING $secret, INTEGER $id )

Expects a bytea[] style string (e.g. "Good old fashioned perl strings") as opposed to the char[] you get when the UTF8 flag is high. Returns the string representation of the provided ID via the algorithm described above.

shorten($uri)

Transform original URI into a shortened one.

lengthen($uri)

Transform shortened URI into its original.

prune_before(TIME_T $when)

Remove entries older than UNIX timestamp $when.
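
In practice you will probably want a cutoff in the past rather than time() itself. Here is a minimal sketch of computing one; the cutoff_before helper and the 30 day retention period are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the UNIX timestamp $days days before $now.
sub cutoff_before {
    my ($now, $days) = @_;
    return $now - $days * 24 * 60 * 60;
}

my $cutoff = cutoff_before( time(), 30 );
print "Pruning entries created before $cutoff\n";

# With $s constructed as in the SYNOPSIS:
# $s->prune_before($cutoff);
```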

BUGS

Please report any bugs or feature requests on the bugtracker website https://github.com/Troglodyne-Internet-Widgets/URI-Shorten/issues

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHORS

Current Maintainers:

  • George S. Baugh <teodesian@gmail.com>

COPYRIGHT AND LICENSE

Copyright (c) 2022 Troglodyne LLC

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.