NAME

Session::Token - Secure, efficient, simple random session token generation

SYNOPSIS

Simple 128-bit session token

    my $token = Session::Token->new->get;
    ## 74da9DABOqgoipxqQDdygw

Keep generator around

    my $generator = Session::Token->new;

    my $token = $generator->get;
    ## bu4EXqWt5nEeDjTAZcbTKY

    my $token2 = $generator->get;
    ## 4Vez56Zc7el5Ggx4PoXCNL

Custom entropy in bits

    my $token = Session::Token->new(entropy => 256)->get;
    ## WdLiluxxZVkPUHsoqnfcQ1YpARuj9Z7or3COA4HNNAv

Custom alphabet and length

    my $token = Session::Token->new(alphabet => 'ACGT', length => 100_000_000)->get;
    ## AGTACTTAGCAATCAGCTGGTTCATGGTTGCCCCCATAG...

DESCRIPTION

This module provides a secure, efficient, and simple interface for creating session tokens, password reset codes, temporary passwords, random identifiers, and anything else you can think of.

When a Session::Token object is created, 1024 bytes are read from /dev/urandom (Linux, Solaris, most BSDs) or /dev/arandom (some older BSDs), or obtained through Crypt::Random::Source::Strong::Win32 (Windows). These bytes are used to seed the ISAAC-32 pseudo-random number generator.

Once a generator is created, you can repeatedly call the get method on the generator object and it will return new tokens.

IMPORTANT: If your application calls fork, re-create any generators in one of the processes after the fork. Forking duplicates the generator state, so otherwise the parent and child processes will go on to produce identical tokens.
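One way to follow this advice is sketched below: a small wrapper that remembers which process built the generator and rebuilds it whenever the process ID changes. The helper name fork_safe_generator is hypothetical and not part of Session::Token; it assumes the constructor is handed in as a callback.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Session::Token): wraps a constructor
# callback and rebuilds the generator whenever the process ID changes.
sub fork_safe_generator {
    my ($build) = @_;
    my ($generator, $owner_pid);
    return sub {
        if (!defined $owner_pid || $owner_pid != $$) {
            $generator = $build->();    # fresh state for this process
            $owner_pid = $$;
        }
        return $generator;
    };
}

# Intended usage, assuming Session::Token is installed:
#   my $generator_for = fork_safe_generator(sub { Session::Token->new });
#   my $token = $generator_for->()->get;
```

Within one process the wrapper keeps returning the same generator, so the "keep generator around" benefits are preserved; only the first call after a fork pays for a new seed.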

ISAAC is a cryptographically secure PRNG that improves on the well-known RC4 algorithm in some important areas. For instance, it doesn't have short cycles like RC4 does. The theoretical shortest possible cycle in ISAAC is 2**40, although no cycles this short have ever been found (and they probably don't exist at all). On average, ISAAC cycles are a ridiculous 2**8295.

After the generator context is created, no system calls are used to generate tokens. This is one way that Session::Token helps with efficiency. This is only important for certain use cases (generally not web sessions).

In a server application, the most important reason to use the "keep generator around" mode, instead of creating a Session::Token object every time you need a token, is that generating a new token in this mode cannot fail due to a full descriptor table. Creating a new generator for every token can fail for this reason. Programs that re-use the generator are also more efficient and are less likely to cause problems in chroot environments.

Aside: Some crappy (usually C) programs that assume opening /dev/urandom will always succeed can return session tokens based only on the contents of nulled or uninitialised memory! Unix really ought to provide a system call for random data.

CUSTOM ALPHABETS

Being able to choose exactly which characters appear in your token is sometimes useful. This set of characters is called the alphabet. The default alphabet size is 62 characters: uppercase Latin letters, lowercase Latin letters, and digits (a-zA-Z0-9).

For some purposes, base-62 is a sweet spot. It is much more compact than hexadecimal encoding, which helps with efficiency because session tokens are usually transferred over the network many times during a session (often uncompressed in HTTP headers).

Also, base-62 tokens don't use "wacky" characters like base-64 encodings do. These characters sometimes cause encoding/escaping problems (e.g. when embedded in URLs) and are annoying because often you can't select tokens by double-clicking on them.

Although the default is base-62, there are all kinds of reasons you might like to use another alphabet. One example is if your users are reading tokens from a print-out or SMS or whatever, you may choose to omit characters like o, O, and 0 that can easily be confused.

To set a custom alphabet, just pass in either a string or an array of characters to the alphabet parameter of the constructor:

    Session::Token->new(alphabet => '01')->get;
    Session::Token->new(alphabet => ['0', '1'])->get; # same thing
    Session::Token->new(alphabet => ['a'..'z'])->get; # character range
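As a sketch of the print-out/SMS case mentioned above, the following builds an alphabet from the default base-62 set with o, O, and 0 removed. The variable names are illustrative, and the constructor call is left as a comment because it assumes Session::Token is installed.

```perl
use strict;
use warnings;

# Start from the default base-62 character set described above.
my @base62 = ('a' .. 'z', 'A' .. 'Z', '0' .. '9');

# Drop the characters that are easy to confuse in print or SMS.
my %confusable = map { $_ => 1 } qw(o O 0);
my @alphabet   = grep { !$confusable{$_} } @base62;

printf "%d characters: %s\n", scalar @alphabet, join('', @alphabet);

# Assuming Session::Token is installed, the array reference can then be
# passed to the constructor:
#   my $token = Session::Token->new(alphabet => \@alphabet)->get;
```

Removing characters shrinks the alphabet, so each character carries slightly less entropy; specifying entropy rather than length lets the token length compensate.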

ENTROPY

There are two ways to specify the length of tokens. The first is directly in terms of characters:

    print Session::Token->new(length => 5)->get;
    ## -> wpLH4

The second way is to specify their minimum entropy in terms of bits:

    print Session::Token->new(entropy => 24)->get;
    ## -> Fo5SX

In the above example, the resulting token is guaranteed to have at least 24 bits of entropy. Given the default base-62 alphabet, we can compute the exact entropy of a 5 character token as follows:

    $ perl -E 'say 5 * log(62)/log(2)'
    29.7709815519344

So these tokens have about 29.8 bits of entropy. Note that if we removed one character from this token, it would bring it below our desired 24 bits of entropy:

    $ perl -E 'say 4 * log(62)/log(2)'
    23.8167852415475

The default minimum entropy is 128 bits. Default tokens are 22 characters long and therefore have about 131 bits of entropy:

    $ perl -E 'say 22 * log(62)/log(2)'
    130.992318828511

An interesting observation is that 128-bit base-64 tokens also require 22 characters and these tokens contain only 1 more bit of entropy.

Another Session::Token design criterion is that all tokens should be the same length. The default token length is 22 characters and the tokens are always exactly 22 characters (no more, no less). This is nice because it makes writing matching regular expressions easier, simplifies storage (you never have to store length), and causes various log files and things to line up neatly on your screen. Instead of tokens that are exactly N characters, some libraries that use arbitrary precision arithmetic end up creating tokens of at most N characters.
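To illustrate the regular expression point, here is a minimal sketch of a validator for default-style tokens (exactly 22 base-62 characters). The names $DEFAULT_TOKEN_RE and looks_like_default_token are made up for this example and are not provided by the module.

```perl
use strict;
use warnings;

# Matches a default-style token: exactly 22 base-62 characters.
# The names here are illustrative, not part of Session::Token.
my $DEFAULT_TOKEN_RE = qr/\A[a-zA-Z0-9]{22}\z/;

sub looks_like_default_token {
    my ($candidate) = @_;
    return (defined($candidate) && $candidate =~ $DEFAULT_TOKEN_RE) ? 1 : 0;
}

print looks_like_default_token('74da9DABOqgoipxqQDdygw') ? "ok\n" : "rejected\n";
print looks_like_default_token('too-short')              ? "ok\n" : "rejected\n";
```

Because the length is fixed, a single anchored pattern is enough; there is no need to accept a range of lengths.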

In summary, the default token length of exactly 22 characters is a consequence of other decisions: base-62 representation, 128 bit minimum token entropy, and consistent token length.
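The arithmetic in this section can be sketched as a small function that returns the shortest token length meeting a minimum entropy for a given alphabet size. The function name min_token_length is hypothetical, and this is not necessarily how the module computes lengths internally.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Smallest token length whose entropy meets the requested minimum,
# given an alphabet of $alphabet_size distinct characters.
sub min_token_length {
    my ($entropy_bits, $alphabet_size) = @_;
    my $bits_per_char = log($alphabet_size) / log(2);
    return ceil($entropy_bits / $bits_per_char);
}

printf "%3d bits, base-62: %d characters\n", $_, min_token_length($_, 62)
    for (24, 128, 256);
```

For the base-62 alphabet this gives 5, 22, and 43 characters, matching the token lengths shown earlier. The sketch uses floating point, so alphabet sizes that are exact powers of two may need extra care around rounding.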

MOD BIAS

Many token generation libraries, especially ones that implement custom alphabets, make the mistake of generating a random value, computing its modulus over the size of an alphabet, and then using this modulus to index into the alphabet to determine an output character.

Why is this bad? Consider the alphabet "abc". An ideal output probability distribution for each character in the token is:

    P(a) = 1/3
    P(b) = 1/3
    P(c) = 1/3

Assume we have a uniform random number source that generates values in the set [0,1,2,3] (most PRNGs provide sequences of bits, in other words power-of-2 set sizes). If we use the naïve modulus algorithm described above, 0 maps to a, 1 maps to b, 2 maps to c, and 3 also maps to a. Instead of the even distribution above, we have the following biased distribution:

    P(a) = 2/4 = 1/2
    P(b) = 1/4
    P(c) = 1/4

Session::Token eliminates this bias in the above case by only using 0, 1, and 2, and throwing away all 3s (also see the t/no-mod-bias.t test).
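The difference between the two approaches can be sketched with the same "abc" alphabet and a four-value source. This is an illustration of rejection sampling in general, not the module's actual code; unbiased_index and the cycling stand-in source are assumptions made for the example.

```perl
use strict;
use warnings;

# Rejection sampling sketch. $range is the number of values the source can
# produce (4 in the example above), $size is the alphabet size, and
# $next_value is a callback standing in for the real random source.
sub unbiased_index {
    my ($size, $range, $next_value) = @_;
    my $limit = $range - ($range % $size);    # largest multiple of $size <= $range
    while (1) {
        my $value = $next_value->();
        return $value % $size if $value < $limit;    # otherwise discard and retry
    }
}

my @alphabet = qw(a b c);

# Deterministic stand-in source that cycles 0,1,2,3,0,1,2,3,...
my $counter = 0;
my $source  = sub { return $counter++ % 4 };

# Naive modulus over one full cycle of source values.
my %naive;
$naive{ $alphabet[ $_ % @alphabet ] }++ for 0 .. 3;

# Rejection sampling over the same source, six output characters.
my %unbiased;
$unbiased{ $alphabet[ unbiased_index(scalar @alphabet, 4, $source) ] }++ for 1 .. 6;

print "naive:    ", join(' ', map { "$_=$naive{$_}" } @alphabet), "\n";
print "unbiased: ", join(' ', map { "$_=$unbiased{$_}" } @alphabet), "\n";
```

The naive counts come out as a=2 b=1 c=1, while the rejection-sampled counts are a=2 b=2 c=2, because every 3 from the source is thrown away.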

Of course throwing away a portion of random data is slightly inefficient. In the worst case scenario of an alphabet with 129 characters, for each output byte this module consumes on average 1.9845 bytes from the random number generator. This inefficiency isn't a problem because ISAAC is extremely fast.
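The 1.9845 figure can be reproduced with a short calculation. The sketch below assumes the generator draws one byte at a time and keeps only values below the largest multiple of the alphabet size that fits in 256; that strategy is an assumption for illustration, and expected_bytes_per_char is a made-up name.

```perl
use strict;
use warnings;

# Expected source bytes consumed per output character, ASSUMING byte values
# at or above the largest multiple of the alphabet size within 256 are
# discarded and redrawn.
sub expected_bytes_per_char {
    my ($alphabet_size) = @_;
    my $usable = 256 - (256 % $alphabet_size);
    return 256 / $usable;
}

# Scan every supported alphabet size and pick the most wasteful one.
my ($worst) = sort { expected_bytes_per_char($b) <=> expected_bytes_per_char($a) } 1 .. 256;

printf "worst alphabet size: %d (%.4f bytes per character)\n",
    $worst, expected_bytes_per_char($worst);
```

With 129 characters only 129 of the 256 byte values are usable, so on average 256/129 bytes are consumed per output character.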

Note that if your application issues biased tokens, then some tokens are more likely than other tokens, providing a starting point for token guessing. If the tokens are unbiased, then there is no starting point since all tokens are equally likely.

INTRODUCING BIAS

If your alphabet contains the same character two or more times, this character will appear more often than characters that occur only once. If you are trying to create random session tokens, be very careful that your alphabet contains no duplicate characters.

However, if you wish to introduce bias this library doesn't try to stop you. (Maybe it should issue a warning?)

    Session::Token->new(alphabet => '0000001', length => 100000)->get; # don't do this
    ## -> 0000000000010000000110000000000000000000000100...

Due to a limitation discussed below, alphabets larger than 256 characters aren't currently supported, so your bias can't get very granular.

Aside: If you have a biased output stream like the above example, you can reconstruct an unbiased bit sequence with the von Neumann algorithm. This works by comparing pairs of bits. If the bits are identical, they are discarded. Otherwise the order of the two differing bits determines the output bit, i.e. 00 and 11 are discarded but 01 and 10 are mapped to output bits of 0 and 1 respectively. This only works if the bias in each bit is constant (like in the above example).
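A minimal sketch of that pairing rule, operating on a string of '0' and '1' characters. The function name von_neumann is chosen for this example only; a trailing unpaired bit is ignored.

```perl
use strict;
use warnings;

# Von Neumann debiasing over a string of '0' and '1' characters.
sub von_neumann {
    my ($bits) = @_;
    my $out = '';
    while ($bits =~ /\G([01])([01])/g) {
        next if $1 eq $2;    # 00 and 11 are discarded
        $out .= $1;          # 01 -> 0, 10 -> 1 (the first bit of the pair)
    }
    return $out;
}

print von_neumann('0001101101'), "\n";
```

The input splits into the pairs 00, 01, 10, 11, 01, so the output is 010. Heavily biased input yields very few output bits, since most pairs are identical and get dropped.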

ALPHABET SIZE LIMITATION

Due to a limitation in this module's code, alphabets can't be larger than 256 characters. Everywhere the above manual says "characters" it actually means bytes. This isn't a Unicode limitation per se, just the maximum size of the alphabet. Remember you can easily map bytes to characters with tr.

    use utf8;
    my $z = Session::Token->new(alphabet => '01', length => 10)->get;
    $z =~ tr/01/-λ/;
    ## -> λλ--λλλλ-λ

However, if you wanted to natively support high code points, there is no point in hard-coding a limitation on the size of Unicode or some arbitrary machine word. Instead, arbitrary precision "characters" should be supported with bigint. Here's an example of kinda doing that in Lisp: isaac.lisp.

This module is not designed to be the ultimate random number generator and at this time I think changing the design as described above would interfere with its goal of being secure, efficient, and simple.

SEEDING

This module is designed to always seed itself from /dev/urandom or /dev/arandom. You should never need to seed it yourself.

However if you know what you're doing, you can pass in a custom seed as a 1024 byte long string. For example, here is how to create a "null seeded" generator:

    my $gen = Session::Token->new(seed => "\x00" x 1024);

This is done in the test-suite, but obviously don't do this in regular applications because the generated tokens will always be the same.

One valid reason for seeding is if you have some reason to believe that there isn't enough entropy in your kernel's randomness pool and therefore you don't trust /dev/urandom. In this case you should acquire your own seed data from somewhere trustworthy (maybe /dev/random or a previously stored trusted seed).
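As a rough sketch of acquiring such a seed, the helper below reads an exact number of bytes from a device or file and dies rather than returning short data. The name read_seed is hypothetical, and the Session::Token call is shown only as a comment because it assumes the module is installed.

```perl
use strict;
use warnings;

# Read exactly $count bytes from a device or file, dying on failure instead
# of silently returning short or empty data.
sub read_seed {
    my ($path, $count) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my $seed = '';
    while (length($seed) < $count) {
        # sysread avoids buffering, so no more than $count bytes are consumed.
        my $got = sysread($fh, $seed, $count - length($seed), length($seed));
        die "read from $path failed: $!" unless defined $got;
        die "unexpected end of $path"    if $got == 0;
    }
    close $fh;
    return $seed;
}

# Intended usage, assuming Session::Token is installed:
#   my $seed      = read_seed('/dev/random', 1024);
#   my $generator = Session::Token->new(seed => $seed);
```

Checking every read matters here: a seed that is shorter than expected, or silently empty, would defeat the purpose of supplying your own.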

BUGS

It might be a good idea if this library could detect forks and re-seed in the child process.

There is currently no way to extract the seed from a Session::Token object. Note when implementing this: the saved seed must either store the current state of the ISAAC round as well as the 1024 byte randrsl array, or else do some kind of minimum fast forwarding in order to protect against a partially duplicated keystream bug.

SEE ALSO

The Session::Token GitHub repo

There are lots of different modules for generating random data.

Like this module, perl's rand() function implements a PRNG in user-space seeded from /dev/urandom. However, perl rand() is seeded with a mere 4 bytes from /dev/urandom and the perldoc doesn't seem to specify a PRNG algorithm, so I prefer not to use rand() for session tokens.

Data::Token is the first thing I saw when I looked around on CPAN. It has an inflexible and unspecified (?) alphabet. It tries to get its source of unpredictability from UUIDs and then hashes these UUIDs with SHA1. I think this is bad design because some standard UUID formats are not designed to be unpredictable at all. Knowing a target's MAC address and the rough time the token was issued may help you predict a reduced area of token-space to concentrate guessing attacks upon. I don't know if Data::Token uses these types of UUIDs or the (potentially secure) good random types, but because this wasn't addressed in the documentation and because of an apparent misapplication of hash functions (if you really had a good random UUID type, there would be no need to hash), I don't feel good about using this module.

There are several decent random number generators like Math::Random::Secure, Crypt::URandom, etc., but they usually don't implement alphabets and some of them require you to open /dev/urandom for every chunk of random bytes. Note that Math::Random::Secure does prevent mod bias for its random integers though.

String::Random is a cool module with a neat regexp-like language for specifying random tokens which is more flexible than alphabets. However, inspecting the code indicates that it uses perl's rand(). Also, the lack of performance, bias, and security discussion in the docs made me decide to not use this otherwise very interesting module.

String::Urandom has alphabets, but it uses the flawed mod algorithm described above and opens /dev/urandom on every token. The docs say "this module was intended to be used as a pseudorandom string generator for less secure applications where response timing may be an issue." What the... ?

Data::Random is also a pretty nice looking library but it seems to use rand() and the docs don't discuss security.

AUTHOR

Doug Hoyte, <doug@hcsw.org>

COPYRIGHT & LICENSE

Copyright 2012 Doug Hoyte.

This module is licensed under the same terms as perl itself.

ISAAC code:

    By Bob Jenkins.  My random number generator, ISAAC.  Public Domain