NAME

CHI -- Unified cache interface

SYNOPSIS

    use CHI;

    # Choose a standard driver
    #
    my $cache = CHI->new( driver => 'Memory' );
    my $cache = CHI->new( driver => 'File', root_dir => '/path/to/root' );
    my $cache = CHI->new(
        driver     => 'FastMmap',
        root_dir   => '/path/to/root',
        cache_size => '1k'
    );
    my $cache = CHI->new(
        driver  => 'Memcached',
        servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ]
    );
    my $cache = CHI->new(
        driver => 'Multilevel',
        subcaches => [
            { driver => 'Memory' },
            {
                driver  => 'Memcached',
                servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ]
            }
        ],
    );

    # Create your own driver
    # 
    my $cache = CHI->new( driver_class => 'My::Special::Driver' );

    # (These drivers coming soon...)
    #
    my $cache = CHI->new( driver => 'DBI', dbh => $dbh, table => 'app_cache' );
    my $cache = CHI->new( driver => 'BerkeleyDB', root_dir => '/path/to/root' );

    # Basic cache operations
    #
    my $customer = $cache->get($name);
    if ( !defined $customer ) {
        $customer = get_customer_from_db($name);
        $cache->set( $name, $customer, "10 minutes" );
    }
    $cache->remove($name);

DESCRIPTION

CHI provides a unified caching API, designed to assist a developer in persisting data for a specified period of time.

The CHI interface is implemented by driver classes that support fetching, storing and clearing of data. Driver classes exist or will exist for the gamut of storage backends available to Perl, such as memory, plain files, memory mapped files, memcached, and DBI.

CHI is intended as an evolution of DeWitt Clinton's Cache::Cache package, adhering to the basic Cache API but adding new features and addressing limitations in the Cache::Cache implementation.

CONSTRUCTOR

To create a new cache handle, call CHI->new. It takes the following common options.

driver [STRING]

The name of a standard driver to drive the cache, for example "Memory" or "File". CHI will prefix the string with "CHI::Driver::".

driver_class [STRING]

The exact CHI::Driver subclass to drive the cache, for example "My::Memory::Driver".

namespace [STRING]

Identifies a namespace that all cache entries for this object will be in. This allows easy separation of multiple, distinct caches without worrying about key collision.

Suggestions for easy namespace selection:

  • In a class, use the class name:

        CHI->new(namespace => __PACKAGE__, ...);

  • In a script, use the script's absolute path name:

        use Cwd qw(realpath);
        CHI->new(namespace => realpath($0), ...);

  • In a web template, use the template name. For example, in Mason, $m->cache will set the namespace to the current component path.

Defaults to 'Default' if not specified.

expires_in [DURATION]
expires_at [NUM]
expires_variance [FLOAT]

Provide default values for the corresponding "set" options.

on_get_error [STRING|CODEREF]
on_set_error [STRING|CODEREF]

How to handle runtime errors occurring during cache gets and cache sets, which may or may not be considered fatal in your application. Options are:

  • log (the default) - log an error using the currently set logger, or ignore if no logger is set - see "LOGGING"

  • ignore - do nothing

  • warn - call warn() with an appropriate message

  • die - call die() with an appropriate message

  • coderef - call this code reference with three arguments: an appropriate message, the key, and the original raw error message
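The policies above can be pictured as a small dispatcher. The following is a minimal sketch, not CHI's actual code: handle_cache_error and its argument order are hypothetical names chosen for illustration, and the logger is assumed to be any object with an error method. The part taken from the documentation is the coderef case, which receives the message, the key, and the raw error.

```perl
use strict;
use warnings;

# Hypothetical helper, not CHI's code: apply one of the documented
# on_get_error / on_set_error policies to a failure.
sub handle_cache_error {
    my ( $policy, $message, $key, $raw_error, $logger ) = @_;

    if ( ref $policy eq 'CODE' ) {
        # coderef: called with the message, the key, and the raw error
        return $policy->( $message, $key, $raw_error );
    }
    elsif ( $policy eq 'log' ) {
        # log: use the logger if one is set, otherwise stay silent
        $logger->error($message) if defined $logger;
    }
    elsif ( $policy eq 'ignore' ) {
        # ignore: do nothing
    }
    elsif ( $policy eq 'warn' ) {
        warn "$message\n";
    }
    elsif ( $policy eq 'die' ) {
        die "$message\n";
    }
    else {
        die "unknown error policy '$policy'\n";
    }
    return;
}

# Usage: collect failures instead of aborting the request.
my @failures;
my $collector = sub {
    my ( $message, $key, $raw_error ) = @_;
    push @failures, { key => $key, message => $message, raw => $raw_error };
    return;
};
handle_cache_error( $collector, "error getting key 'foo'", 'foo', 'disk full' );
print "recorded failure for key: $failures[0]{key}\n";
```

A coderef like $collector would be passed as the on_get_error or on_set_error constructor option.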

Some drivers will take additional constructor options. For example, the File driver takes root_dir and depth options.

INSTANCE METHODS

The following methods can be called on any cache handle returned from CHI->new(). They are implemented in the CHI::Driver package.

Getting and setting

get( $key, [option => value, ...] )

Returns the data associated with $key. If $key does not exist or has expired, returns undef. Expired items are not automatically removed and may be examined with "get_object" or "get_expires_at".

$key may be followed by one or more name/value parameters:

expire_if [CODEREF]

If $key exists and has not expired, call code reference with the CHI::CacheObject as a single parameter. If code returns a true value, expire the data. For example, to expire the cache if $file has changed since the value was computed:

    $cache->get('foo', expire_if => sub { $_[0]->created_at < (stat($file))[9] });

busy_lock [DURATION]

If the value has expired, set its expiration time to the current time plus the specified duration before returning undef. This is used to prevent multiple processes from recomputing the same expensive value simultaneously. The problem with this technique is that it doubles the number of writes performed - see "expires_variance" for another technique.
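To make the busy_lock behavior concrete, here is a toy in-memory model of it. This is a sketch under stated assumptions, not CHI's implementation: toy_set and toy_get are hypothetical helpers, and the current time is passed in explicitly so the result is deterministic.

```perl
use strict;
use warnings;

# Toy in-memory model of the documented busy_lock behavior (not CHI's code).
# Each entry is { value => ..., expires_at => epoch seconds }.
my %store;

sub toy_set {
    my ( $key, $value, $expires_at ) = @_;
    $store{$key} = { value => $value, expires_at => $expires_at };
    return;
}

# $now is passed explicitly so the example is deterministic.
sub toy_get {
    my ( $key, $now, %opt ) = @_;
    my $entry = $store{$key} or return undef;
    if ( $now >= $entry->{expires_at} ) {
        # Expired: push the expiration forward so that other readers keep
        # seeing the old value while this caller recomputes. This is the
        # extra write that the text above refers to.
        $entry->{expires_at} = $now + $opt{busy_lock} if $opt{busy_lock};
        return undef;
    }
    return $entry->{value};
}

toy_set( 'report', 'stale report', 100 );    # expires at t=100

my $first  = toy_get( 'report', 105, busy_lock => 30 );    # miss: this caller recomputes
my $second = toy_get( 'report', 106, busy_lock => 30 );    # old value, valid until t=135

print defined $first ? "first: hit\n" : "first: miss, recomputing\n";
print "second: $second\n";
```

Only the first reader after expiration sees a miss; later readers get the old value until the lock period lapses or the first reader stores a fresh one.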

set( $key, $data, [$expires_in | "now" | "never" | options] )

Associates $data with $key in the cache, overwriting any existing entry.

The third argument to set is optional, and may be either a scalar or a hash reference. If it is a scalar, it may be the string "now", the string "never", or else a duration treated as an expires_in value described below. If it is a hash reference, it may contain one or more of the following options. Most of these options can be provided with defaults in the cache constructor.

expires_in [DURATION]

Amount of time until this data expires, in the form of a duration expression - e.g. "10 seconds" or "5 minutes".

expires_at [NUM]

The epoch time at which the data expires.

expires_variance [FLOAT]

Controls the variable expiration feature, which allows items to expire a little earlier than the stated expiration time to help prevent cache miss stampedes.

Value is between 0.0 and 1.0, with 0.0 meaning that items expire exactly when specified (feature is disabled), and 1.0 meaning that items might expire anytime from now until the stated expiration time. The default is 0.0. A setting of 0.10 to 0.25 would introduce a small amount of variation without interfering too much with intended expiration times.

The probability of expiration increases as a function of how far along we are in the potential expiration window, with the probability being near 0 at the beginning of the window and approaching 1 at the end.

For example, in all of the following cases, an item might be considered expired any time between 15 and 20 minutes, with about a 20% chance at 16 minutes, a 40% chance at 17 minutes, and a 100% chance at 20 minutes.

    my $cache = CHI->new ( ..., expires_variance => 0.25, ... );
    $cache->set($key, $value, '20 min');
    $cache->set($key, $value, { expires_at => time() + 20*60 });

    my $cache = CHI->new ( ... );
    $cache->set($key, $value, { expires_in => '20 min', expires_variance => 0.25 });

CHI will make a new probabilistic choice every time it needs to know whether an item has expired (i.e. it does not save the results of its determination), so you can get situations like this:

    my $value = $cache->get($key);     # returns undef (indicating expired)
    my $value = $cache->get($key);     # returns valid value this time!

    if ($cache->is_valid($key))        # returns undef (indicating expired)
    if ($cache->is_valid($key))        # returns true this time!

Typical applications won't be affected by this, since the object is recomputed as soon as it is determined to be expired. But it's something to be aware of.
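The percentages quoted above (20% at 16 minutes, 40% at 17 minutes, 100% at 20 minutes) are consistent with a probability that rises linearly across the expiration window. The sketch below assumes that linear model; it is an illustration, not CHI's actual code, and expiration_probability and is_expired are hypothetical names.

```perl
use strict;
use warnings;

# Assumed model: the probability rises linearly from 0 at the start of the
# window to 1 at the stated expiration time.
sub expiration_probability {
    my ( $now, $set_at, $expires_at, $variance ) = @_;
    return 1 if $now >= $expires_at;
    my $window_start = $expires_at - $variance * ( $expires_at - $set_at );
    return 0 if $now <= $window_start;
    return ( $now - $window_start ) / ( $expires_at - $window_start );
}

# A fresh random draw on every call, which is why two consecutive checks
# on the same item can disagree.
sub is_expired {
    my (@args) = @_;
    return rand() < expiration_probability(@args) ? 1 : 0;
}

# Item set at t=0 with a 20 minute lifetime and expires_variance 0.25.
for my $minute ( 14, 16, 17, 20 ) {
    printf "%2d min: %3.0f%%\n", $minute,
      100 * expiration_probability( $minute * 60, 0, 20 * 60, 0.25 );
}
```

With expires_variance set to 0.0 the window collapses and the function returns 0 until the stated expiration time, matching the documented default.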

compute( $key, $code, $set_options )

Combines the get and set operations in a single call. Attempts to get $key; if successful, returns the value. Otherwise, calls $code and uses the return value as the new value for $key, which is then returned. $set_options is a scalar or hash reference, used as the third argument to set.

This method will eventually support the ability to recompute a value in the background just before it actually expires, so that users are not impacted by recompute time.
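The get-then-set behavior of compute can be expressed in a few lines. This is a minimal sketch of the documented semantics rather than CHI's implementation; ToyCache is a hypothetical stand-in that exposes only get and set so that the example runs on its own, and it ignores the expiration argument.

```perl
use strict;
use warnings;

{
    # Toy stand-in exposing get and set, used only to make the sketch run.
    package ToyCache;
    sub new { return bless { data => {} }, shift }
    sub get { my ( $self, $key ) = @_; return $self->{data}{$key} }
    sub set { my ( $self, $key, $value ) = @_; $self->{data}{$key} = $value; return }
}

# Sketch of compute in terms of get and set.
sub compute_sketch {
    my ( $cache, $key, $code, $set_options ) = @_;
    my $value = $cache->get($key);
    return $value if defined $value;

    # Cache miss: compute the value, store it, and return it.
    $value = $code->();
    $cache->set( $key, $value, $set_options );
    return $value;
}

my $cache = ToyCache->new;
my $calls = 0;
my $load  = sub { $calls++; return "customer record" };

compute_sketch( $cache, 'alice', $load, '10 minutes' );
compute_sketch( $cache, 'alice', $load, '10 minutes' );
print "loader ran $calls time(s)\n";
```

Because a miss is signalled by undef, a loader that itself returns undef would be called again on every request under this sketch.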

Removing and expiring

remove( $key )

Remove the data associated with the $key from the cache.

expire( $key )

If $key exists, expire it by setting its expiration time into the past.

expire_if ( $key, $code )

If $key exists, call code reference $code with the CHI::CacheObject as a single parameter. If $code returns a true value, expire the data. For example:

    $cache->expire_if('foo', sub { $_[0]->created_at < (stat($file))[9] });

Inspecting keys

is_valid( $key )

Returns a boolean indicating whether $key exists in the cache and has not expired. Note: Expiration may be determined probabilistically if "expires_variance" was used.

exists_and_is_expired( $key )

Returns a boolean indicating whether $key exists in the cache and has expired. Note: Expiration may be determined probabilistically if "expires_variance" was used.

get_expires_at( $key )

Returns the epoch time at which $key definitively expires. Returns undef if the key does not exist or it has no expiration time.

get_object( $key )

Returns a CHI::CacheObject object containing data about the entry associated with $key, or undef if no such key exists. The object will be returned even if the entry has expired, as long as it has not been removed.

Namespace operations

clear( )

Remove all entries from the namespace.

get_keys( )

Returns a list of keys in the namespace. This may or may not include expired keys, depending on the driver.

is_empty( )

Returns a boolean indicating whether the namespace is empty, based on get_keys().

purge( )

Remove all entries that have expired from the namespace associated with this cache instance. Warning: May be very inefficient, depending on the number of keys and the driver.

get_namespaces( )

Returns a list of namespaces associated with the cache. This may or may not include empty namespaces, depending on the driver.

Multiple key/value operations

The methods in this section process multiple keys and/or values at once. By default these are implemented with the obvious map operations, but some cache drivers (e.g. CHI::Driver::Memcached) can override them with more efficient implementations.

get_multi_arrayref( $keys )

Get the keys in list reference $keys, and return a list reference of the same length with corresponding values or undefs.

get_multi_hashref( $keys )

Like "get_multi_arrayref", but returns a hash reference with each key in $keys mapping to its corresponding value or undef.

set_multi( $key_values, $set_options )

Set the multiple keys and values provided in hash reference $key_values. $set_options is a scalar or hash reference, used as the third argument to set.

remove_multi( $keys )

Removes the keys in list reference $keys.
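The "obvious map operations" mentioned at the start of this section might look roughly like the following. This is a sketch, not the code in CHI::Driver: the *_sketch functions and the ToyCache stand-in are hypothetical, and ToyCache ignores expiration.

```perl
use strict;
use warnings;

{
    # Toy stand-in exposing get, set and remove, used only to run the sketch.
    package ToyCache;
    sub new    { return bless { data => {} }, shift }
    sub get    { my ( $self, $key ) = @_; return $self->{data}{$key} }
    sub set    { my ( $self, $key, $value ) = @_; $self->{data}{$key} = $value; return }
    sub remove { my ( $self, $key ) = @_; delete $self->{data}{$key}; return }
}

# One get per key, preserving order; missing keys yield undef.
sub get_multi_arrayref_sketch {
    my ( $cache, $keys ) = @_;
    return [ map { scalar $cache->get($_) } @$keys ];
}

sub get_multi_hashref_sketch {
    my ( $cache, $keys ) = @_;
    return +{ map { ( $_ => scalar $cache->get($_) ) } @$keys };
}

# One set per pair, all sharing the same set options.
sub set_multi_sketch {
    my ( $cache, $key_values, $set_options ) = @_;
    $cache->set( $_, $key_values->{$_}, $set_options ) for keys %$key_values;
    return;
}

sub remove_multi_sketch {
    my ( $cache, $keys ) = @_;
    $cache->remove($_) for @$keys;
    return;
}

my $cache = ToyCache->new;
set_multi_sketch( $cache, { apple => 1, pear => 2 }, '5 min' );
my $values = get_multi_arrayref_sketch( $cache, [ 'apple', 'kiwi', 'pear' ] );
print join( ', ', map { defined $_ ? $_ : 'undef' } @$values ), "\n";
```

A driver whose backend supports batched requests can replace these loops with a single round trip.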

dump_as_hash( )

Returns a hash reference containing all the non-expired keys and values in the cache.

Property accessors

There is a read-only accessor for namespace, and read/write accessors for expires_in, expires_at, expires_variance, on_get_error, and on_set_error.

DURATION EXPRESSIONS

Duration expressions, which appear in the "set" command and various other parts of the API, are parsed by Time::Duration::Parse. A duration is either a plain number, which is treated like a number of seconds, or a number and a string representing time units where the string is one of:

    s second seconds sec secs
    m minute minutes min mins
    h hr hour hours
    d day days
    w week weeks
    M month months
    y year years

e.g. the following are all valid duration expressions:

    25
    3s
    5 seconds
    1 minute and 10 seconds
    1 hour
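To show how such expressions reduce to a number of seconds, here is a deliberately simplified stand-in. It is not Time::Duration::Parse and does not reproduce all of its behavior: duration_to_seconds is a hypothetical helper that accepts only whole numbers and the units from seconds through weeks, leaving out months and years.

```perl
use strict;
use warnings;

# Seconds per unit for a subset of the units listed above.
my %SECONDS = ( s => 1, m => 60, h => 3600, d => 86400, w => 604800 );
my %ALIAS   = (
    ( map { $_ => 's' } qw(s second seconds sec secs) ),
    ( map { $_ => 'm' } qw(m minute minutes min mins) ),
    ( map { $_ => 'h' } qw(h hr hour hours) ),
    ( map { $_ => 'd' } qw(d day days) ),
    ( map { $_ => 'w' } qw(w week weeks) ),
);

sub duration_to_seconds {
    my ($expr) = @_;

    # A plain number is treated as a number of seconds.
    return $1 + 0 if $expr =~ /^\s*(\d+)\s*$/;

    # Otherwise consume "<number> <unit>" pairs, optionally joined by
    # commas or the word "and", and add them up.
    my $total = 0;
    while ( $expr =~ s/^\s*(\d+)\s*([A-Za-z]+)(?:\s*(?:,|and)\s*)*// ) {
        my $unit = $ALIAS{$2} or die "unknown unit '$2'\n";
        $total += $1 * $SECONDS{$unit};
    }
    die "cannot parse '$expr'\n" if $expr =~ /\S/;
    return $total;
}

for my $expr ( '25', '3s', '5 seconds', '1 minute and 10 seconds', '1 hour' ) {
    printf "%-26s => %d\n", $expr, duration_to_seconds($expr);
}
```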

AVAILABILITY OF DRIVERS

The following drivers are currently available as part of this distribution. The bundling within a single distribution is a temporary convenience during the initial phase of (possibly) rapid changes. Once things have stabilized a bit, all the drivers except for Memory, File, and Multilevel will be moved out to their own CPAN distributions.

DEVELOPING NEW DRIVERS

See CHI::Driver::Development for information on developing new drivers.

LOGGING

If given a logger object, CHI will log events at various levels - for example, a debug log message for every cache get and set. To specify the logger object:

    CHI->logger($logger_object);   # Warning: Temporary API, see below

The object must provide the methods

    debug, info, warning, error, fatal

for logging, and

    is_debug, is_info, is_warning, is_error, is_fatal

for checking whether a message would be logged at that level. This is compatible with Log::Log4perl and Catalyst::Log among others.
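For applications that do not already use a logging framework, an object satisfying this interface can be quite small. The class below is a hypothetical example: the name My::StderrLogger and the min_level option are assumptions made for illustration, not part of CHI.

```perl
use strict;
use warnings;

{
    # Hypothetical minimal logger: prints to STDERR at or above a threshold.
    package My::StderrLogger;

    my @LEVELS = qw(debug info warning error fatal);
    my %RANK   = map { $LEVELS[$_] => $_ } 0 .. $#LEVELS;

    sub new {
        my ( $class, %args ) = @_;
        my $min = defined $args{min_level} ? $args{min_level} : 'warning';
        die "unknown level '$min'\n" unless exists $RANK{$min};
        return bless { min_rank => $RANK{$min} }, $class;
    }

    # Generate the five logging methods and the five is_* predicates.
    for my $level (@LEVELS) {
        no strict 'refs';
        *{ __PACKAGE__ . "::is_$level" } = sub {
            return $_[0]{min_rank} <= $RANK{$level} ? 1 : 0;
        };
        *{ __PACKAGE__ . "::$level" } = sub {
            my ( $self, $message ) = @_;
            print STDERR "[$level] $message\n" if $self->{min_rank} <= $RANK{$level};
            return;
        };
    }
}

my $logger = My::StderrLogger->new( min_level => 'info' );
$logger->info('cache logger ready');
print "debug enabled? ", ( $logger->is_debug ? 'yes' : 'no' ), "\n";

# With CHI loaded, the object would be registered as documented above:
#     CHI->logger($logger);
```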

Warning: CHI->logger is a temporary API. The intention is to replace this with Log::Any (http://use.perl.org/~jonswar/journal/34366).

RELATION TO OTHER MODULES

Cache::Cache

CHI is intended as an evolution of DeWitt Clinton's Cache::Cache package. It starts with the same basic API (which has proven durable over time) but addresses some implementation shortcomings that cannot be fixed in Cache::Cache due to backward compatibility concerns. In particular:

Performance

Some of Cache::Cache's subclasses (e.g. Cache::FileCache) have been justifiably criticized as inefficient. CHI has been designed from the ground up with performance in mind, both in terms of general overhead and in the built-in driver classes. Method calls are kept to a minimum, data is only serialized when necessary, and metadata such as expiration time is stored in packed binary format alongside the data.

As an example, using Rob Mueller's cacheperl benchmarks, CHI's file driver runs 3 to 4 times faster than Cache::FileCache.

Ease of subclassing

New Cache::Cache subclasses can be tedious to create, due to a lack of code refactoring, the use of non-OO package subroutines, and the separation of "cache" and "backend" classes. With CHI, the goal is to make the creation of new drivers as easy as possible, roughly the same as writing a TIE interface to your data store. Concerns like serialization and expiration options are handled by the driver base class so that individual drivers don't have to worry about them.

Increased compatibility with cache implementations

Probably because of the reasons above, Cache::Cache subclasses were never created for some of the most popular caches available on CPAN, e.g. Cache::FastMmap and Cache::Memcached. CHI's goal is to be able to support these and other caches with a minimum performance overhead and minimum of glue code required.

Cache::Memcached, Cache::FastMmap, etc.

CPAN sports a variety of full-featured standalone cache modules representing particular backends. CHI does not reinvent these but simply wraps them with an appropriate driver. For example, CHI::Driver::Memcached and CHI::Driver::FastMmap are thin layers around Cache::Memcached and Cache::FastMmap.

Of course, because these modules already work on their own, there will be some overlap. Cache::FastMmap, for example, already has code to serialize data and handle expiration times. Here's how CHI resolves these overlaps.

Serialization

CHI handles its own serialization, passing a flat binary string to the underlying cache backend.

Expiration

CHI packs expiration times (as well as other metadata) inside the binary string passed to the underlying cache backend. The backend is unaware of these values; from its point of view the item has no expiration time. Among other things, this means that you can use CHI to examine expired items (e.g. with $cache->get_object) even if this is not supported natively by the backend.

At some point CHI will provide the option of explicitly notifying the backend of the expiration time as well. This might allow the backend to do better storage management, etc., but would prevent CHI from examining expired items.
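The idea of packing expiration metadata in front of the serialized value can be illustrated with pack and unpack. The layout below (two 32-bit integers holding the creation and expiration times) is invented for this example and is not CHI's actual format; wrap_entry and unwrap_entry are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical layout: two 32-bit big-endian integers, then the payload.
my $HEADER_FORMAT = 'N N';
my $HEADER_LENGTH = length pack( $HEADER_FORMAT, 0, 0 );    # 8 bytes

# Prefix the serialized value with its metadata.
sub wrap_entry {
    my ( $serialized_value, $created_at, $expires_at ) = @_;
    return pack( $HEADER_FORMAT, $created_at, $expires_at ) . $serialized_value;
}

# Split a stored string back into metadata and value.
sub unwrap_entry {
    my ($blob) = @_;
    my ( $created_at, $expires_at ) = unpack( $HEADER_FORMAT, $blob );
    return {
        created_at => $created_at,
        expires_at => $expires_at,
        value      => substr( $blob, $HEADER_LENGTH ),
    };
}

my %backend;    # stands in for a backend that only stores opaque strings
$backend{greeting} = wrap_entry( 'hello', 1_000, 1_600 );

my $entry = unwrap_entry( $backend{greeting} );
my $now   = 2_000;
printf "value=%s expired=%s\n", $entry->{value},
  ( $now >= $entry->{expires_at} ? 'yes' : 'no' );
```

Since the backend sees only one opaque string, the value remains readable after its expiration time, which is what makes inspecting expired items possible.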

Naturally, using CHI's FastMmap or Memcached driver will never be as time or storage efficient as simply using Cache::FastMmap or Cache::Memcached. In terms of performance, we've attempted to make the overhead as small as possible, on the order of 5% per get or set (benchmarks coming soon). In terms of storage size, CHI adds about 16 bytes of metadata overhead to each item. How much this matters obviously depends on the typical size of items in your cache.

SUPPORT AND DOCUMENTATION

Questions and feedback are welcome, and should be directed to the perl-cache mailing list:

    http://groups.google.com/group/perl-cache-discuss

Bugs and feature requests will be tracked at RT:

    http://rt.cpan.org/NoAuth/Bugs.html?Dist=CHI

The latest source code is available at:

    http://code.google.com/p/perl-cache/wiki/Source

TODO

  • Perform cache benchmarks comparing both CHI and non-CHI cache implementations

  • Separate Memcached driver into its own CPAN distribution

  • Make serialization method flexible via Data::Serializer

  • Release BerkeleyDB and DBI drivers as separate CPAN distributions

  • Add docs comparing various strategies for reducing miss stampedes and cost of recomputes

  • Add expires_next syntax (e.g. expires_next => 'hour')

  • Support automatic serialization and escaping of keys

  • Create XS versions of main functions in Driver.pm (e.g. get, set)

ACKNOWLEDGMENTS

Thanks to DeWitt Clinton for the original Cache::Cache, to Rob Mueller for the Perl cache benchmarks, and to Perrin Harkins for the discussions that got this going.

CHI was originally designed and developed for the Digital Media group of the Hearst Corporation, a diversified media company based in New York City. Many thanks to Hearst management for agreeing to this open source release.

AUTHOR

Jonathan Swartz

SEE ALSO

Cache::Cache, Cache::Memcached, Cache::FastMmap

COPYRIGHT & LICENSE

Copyright (C) 2007 Jonathan Swartz.

CHI is provided "as is" and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.