CHI -- Unified cache interface
    use CHI;

    # Choose a standard driver
    #
    my $cache = CHI->new( driver => 'Memory' );
    my $cache = CHI->new( driver => 'File', root_dir => '/path/to/root' );
    my $cache = CHI->new(
        driver     => 'FastMmap',
        root_dir   => '/path/to/root',
        cache_size => '1k'
    );
    my $cache = CHI->new(
        driver  => 'Memcached',
        servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ]
    );
    my $cache = CHI->new(
        driver    => 'Multilevel',
        subcaches => [
            { driver => 'Memory' },
            {
                driver  => 'Memcached',
                servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ]
            }
        ],
    );

    # Create your own driver
    #
    my $cache = CHI->new( driver_class => 'My::Special::Driver' );

    # (These drivers coming soon...)
    #
    my $cache = CHI->new( driver => 'DBI', dbh => $dbh, table => 'app_cache' );
    my $cache = CHI->new( driver => 'BerkeleyDB', root_dir => '/path/to/root' );

    # Basic cache operations
    #
    my $customer = $cache->get($name);
    if ( !defined $customer ) {
        $customer = get_customer_from_db($name);
        $cache->set( $name, $customer, "10 minutes" );
    }
    $cache->remove($name);
CHI provides a unified caching API, designed to assist a developer in persisting data for a specified period of time.
The CHI interface is implemented by driver classes that support fetching, storing and clearing of data. Driver classes exist or will exist for the gamut of storage backends available to Perl, such as memory, plain files, memory mapped files, memcached, and DBI.
CHI is intended as an evolution of DeWitt Clinton's Cache::Cache package, adhering to the basic Cache API but adding new features and addressing limitations in the Cache::Cache implementation.
To create a new cache handle, call CHI->new. It takes the following common options.
driver
The name of a standard driver to drive the cache, for example "Memory" or "File". CHI will prefix the string with "CHI::Driver::".
driver_class
The exact CHI::Driver subclass to drive the cache, for example "My::Memory::Driver".
namespace
Identifies a namespace that all cache entries for this object will be in. This allows easy separation of multiple, distinct caches without worrying about key collision.
Suggestions for easy namespace selection:
In a class, use the class name:
    CHI->new(namespace => __PACKAGE__, ...);
In a script, use the script's absolute path name:
    use Cwd qw(realpath);
    CHI->new(namespace => realpath($0), ...);
In a web template, use the template name. For example, in Mason, $m->cache will set the namespace to the current component path.
Defaults to 'Default' if not specified.
expires_in, expires_at, expires_variance
Provide default values for the corresponding "set" options.
on_get_error, on_set_error
How to handle runtime errors occurring during cache gets and cache sets, which may or may not be considered fatal in your application. Options are:
log (the default) - log an error using the currently set logger, or ignore if no logger is set - see "LOGGING"
ignore - do nothing
warn - call warn() with an appropriate message
die - call die() with an appropriate message
coderef - call this code reference with three arguments: an appropriate message, the key, and the original raw error message
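As a rough illustration of the five policies listed above, the following sketch dispatches on a policy value. The helper name handle_cache_error and its argument order are hypothetical and are not part of CHI; only the policy names and the three coderef arguments come from the list above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CHI): apply an on_get_error /
# on_set_error style policy to an error raised by a cache operation.
sub handle_cache_error {
    my ( $policy, $message, $key, $raw_error, $logger ) = @_;

    if ( ref $policy eq 'CODE' ) {
        # coderef: called with the message, the key, and the raw error
        return $policy->( $message, $key, $raw_error );
    }
    elsif ( $policy eq 'log' ) {
        # log: use the logger if one is set, otherwise ignore silently
        $logger->error($message) if defined $logger;
    }
    elsif ( $policy eq 'ignore' ) {
        # ignore: do nothing
    }
    elsif ( $policy eq 'warn' ) {
        warn "$message\n";
    }
    elsif ( $policy eq 'die' ) {
        die "$message\n";
    }
    else {
        die "unknown error policy '$policy'\n";
    }
    return;
}

# Usage: collect errors in an array instead of logging them.
my @seen;
handle_cache_error(
    sub { push @seen, [@_] },
    "error during cache get for key 'foo'", 'foo', 'connection refused',
);
```

A coderef policy like the one in the usage lines is convenient in tests, where errors should be captured rather than printed.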
Some drivers will take additional constructor options. For example, the File driver takes root_dir and depth options.
root_dir
depth
The following methods can be called on any cache handle returned from CHI->new(). They are implemented in the CHI::Driver package.
get( $key, [option => value, ...] )
Returns the data associated with $key. If $key does not exist or has expired, returns undef. Expired items are not automatically removed and may be examined with "get_object" or "get_expires_at".
$key may be followed by one or more name/value parameters:
expire_if
If $key exists and has not expired, call the given code reference with the CHI::CacheObject as its single parameter. If the code returns a true value, expire the data. For example, to expire the cache if $file has changed since the value was computed:
$cache->get('foo', expire_if => sub { $_[0]->created_at < (stat($file))[9] });
busy_lock
If the value has expired, set its expiration time to the current time plus the specified duration before returning undef. This prevents multiple processes from recomputing the same expensive value simultaneously. The drawback of this technique is that it doubles the number of writes performed; see "expires_variance" for another technique.
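The lock-on-expiry behaviour described above can be sketched with a toy in-memory store. This is not CHI's code: the %store layout, the helper get_with_busy_lock, and the explicit $now argument are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Toy in-memory store: key => { value => ..., expires_at => epoch seconds }.
my %store;

# Hypothetical get with a busy-lock duration in seconds.
# $now is passed in so the behaviour is easy to follow.
sub get_with_busy_lock {
    my ( $key, $busy_lock_seconds, $now ) = @_;
    my $entry = $store{$key} or return;

    # Still fresh: return the cached value.
    return $entry->{value} if $now < $entry->{expires_at};

    # Expired: push the expiration forward so that other readers keep
    # seeing the old value while this caller recomputes, then report a miss.
    $entry->{expires_at} = $now + $busy_lock_seconds;
    return;
}

$store{report} = { value => 'old report', expires_at => 100 };

my $first  = get_with_busy_lock( 'report', 30, 105 );    # miss: this caller recomputes
my $second = get_with_busy_lock( 'report', 30, 110 );    # still served the old value
```

The update of expires_at on the miss is the extra write that the text refers to.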
set( $key, $data, [$expires_in | "now" | "never" | options] )
Associates $data with $key in the cache, overwriting any existing entry.
The third argument to set is optional, and may be either a scalar or a hash reference. If it is a scalar, it may be the string "now", the string "never", or else a duration treated as an expires_in value described below. If it is a hash reference, it may contain one or more of the following options. Most of these options can be provided with defaults in the cache constructor.
expires_in
Amount of time until this data expires, in the form of a duration expression, e.g. "10 seconds" or "5 minutes".
expires_at
The epoch time at which the data expires.
expires_variance
Controls the variable expiration feature, which allows items to expire a little earlier than the stated expiration time to help prevent cache miss stampedes.
Value is between 0.0 and 1.0, with 0.0 meaning that items expire exactly when specified (feature is disabled), and 1.0 meaning that items might expire anytime from now until the stated expiration time. The default is 0.0. A setting of 0.10 to 0.25 would introduce a small amount of variation without interfering too much with intended expiration times.
The probability of expiration increases as a function of how far along we are in the potential expiration window, with the probability being near 0 at the beginning of the window and approaching 1 at the end.
For example, in all of the following cases, an item might be considered expired any time between 15 and 20 minutes, with about a 20% chance at 16 minutes, a 40% chance at 17 minutes, and a 100% chance at 20 minutes.
    my $cache = CHI->new ( ..., expires_variance => 0.25, ... );
    $cache->set($key, $value, '20 min');
    $cache->set($key, $value, { expires_at => time() + 20*60 });

    my $cache = CHI->new ( ... );
    $cache->set($key, $value, { expires_in => '20 min', expires_variance => 0.25 });
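The percentages quoted above are consistent with a linear ramp across the early-expiration window. The sketch below assumes exactly that; the text only states that the probability rises from near 0 to 1, so the linear shape and the helper names expiration_probability and is_expired are assumptions, not CHI's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: chance that an item is treated as expired at $now,
# assuming a linear ramp over the early-expiration window.
sub expiration_probability {
    my ( $created_at, $expires_at, $variance, $now ) = @_;

    return 1 if $now >= $expires_at;

    # The window covers the last ($variance * lifetime) seconds.
    my $window_start = $expires_at - $variance * ( $expires_at - $created_at );
    return 0 if $now <= $window_start;

    return ( $now - $window_start ) / ( $expires_at - $window_start );
}

# A fresh random decision is made on every call; nothing is remembered.
sub is_expired {
    my (@args) = @_;
    return rand() < expiration_probability(@args) ? 1 : 0;
}

# 20-minute lifetime with a variance of 0.25, sampled at several ages.
my $minute = 60;
printf "%d min: %.0f%%\n", $_,
    100 * expiration_probability( 0, 20 * $minute, 0.25, $_ * $minute )
    for 10, 15, 16, 17, 20;
```

With a variance of 0.0 the window is empty, so the function returns 0 until the stated expiration time and 1 afterwards.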
CHI will make a new probabilistic choice every time it needs to know whether an item has expired (i.e. it does not save the results of its determination), so you can get situations like this:
    my $value = $cache->get($key);     # returns undef (indicating expired)
    my $value = $cache->get($key);     # returns valid value this time!

    if ($cache->is_valid($key))        # returns undef (indicating expired)
    if ($cache->is_valid($key))        # returns true this time!
Typical applications won't be affected by this, since the object is recomputed as soon as it is determined to be expired. But it's something to be aware of.
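To make the accepted forms of the optional third argument to set concrete, here is a rough sketch that normalizes it into a hash of options merged over constructor defaults. The helper normalize_set_options is hypothetical, and it assumes that "now" means "expire immediately" and "never" means "no expiration". How CHI resolves a call that ends up with both expires_in and expires_at is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the optional third argument of set into a
# hash of options, merged over constructor-level defaults.
sub normalize_set_options {
    my ( $arg, $defaults, $now ) = @_;
    my %options = %{ $defaults || {} };

    if ( !defined $arg ) {
        # No third argument: defaults only.
    }
    elsif ( ref $arg eq 'HASH' ) {
        # Per-call options override the defaults.
        %options = ( %options, %$arg );
    }
    elsif ( $arg eq 'now' ) {
        delete $options{expires_in};
        $options{expires_at} = $now;    # assumed meaning: expire immediately
    }
    elsif ( $arg eq 'never' ) {
        delete @options{qw(expires_in expires_at)};    # assumed: no expiration
    }
    else {
        $options{expires_in} = $arg;    # a duration such as "10 minutes"
    }
    return \%options;
}

my $defaults = { expires_in => '1 hour', expires_variance => 0.1 };
my $plain    = normalize_set_options( undef,        $defaults, 1000 );
my $short    = normalize_set_options( '10 minutes', $defaults, 1000 );
my $never    = normalize_set_options( 'never',      $defaults, 1000 );
my $custom   = normalize_set_options( { expires_in => '5 min' }, $defaults, 1000 );
```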
compute( $key, $code, $set_options )
Combines the get and set operations in a single call. Attempts to get $key; if successful, returns the value. Otherwise, calls $code and uses the return value as the new value for $key, which is then returned. $set_options is a scalar or hash reference, used as the third argument to set.
This method will eventually support the ability to recompute a value in the background just before it actually expires, so that users are not impacted by recompute time.
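The get-or-compute pattern described above can be sketched on a toy cache class. Toy::Cache is hypothetical and ignores expiration entirely; the argument order ($key, $code, $set_options) is an assumption based on the parameter names used in the description.

```perl
use strict;
use warnings;

# Toy cache with just enough behaviour to show the get-or-compute pattern.
package Toy::Cache;

sub new { return bless { data => {} }, shift }
sub get { my ( $self, $key ) = @_; return $self->{data}{$key} }
sub set { my ( $self, $key, $value ) = @_; return $self->{data}{$key} = $value }

# Return the cached value, or call $code and store its result.
sub compute {
    my ( $self, $key, $code, $set_options ) = @_;
    my $value = $self->get($key);
    if ( !defined $value ) {
        $value = $code->();
        $self->set( $key, $value, $set_options );
    }
    return $value;
}

package main;

my $cache = Toy::Cache->new;
my $calls = 0;
my $load  = sub { $calls++; return { name => 'Alice' } };

my $first  = $cache->compute( 'customer:1', $load, '10 minutes' );
my $second = $cache->compute( 'customer:1', $load, '10 minutes' );
```

The second call returns the stored value without invoking $load again, which is the behaviour the synopsis spells out by hand with get and set.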
remove( $key )
Remove the data associated with $key from the cache.
expire( $key )
If $key exists, expire it by setting its expiration time into the past.
expire_if( $key, $code )
If $key exists, call code reference $code with the CHI::CacheObject as a single parameter. If $code returns a true value, expire the data. For example:
$cache->expire_if('foo', sub { $_[0]->created_at < (stat($file))[9] });
is_valid( $key )
Returns a boolean indicating whether $key exists in the cache and has not expired. Note: Expiration may be determined probabilistically if "expires_variance" was used.
exists_and_is_expired( $key )
Returns a boolean indicating whether $key exists in the cache and has expired. Note: Expiration may be determined probabilistically if "expires_variance" was used.
get_expires_at( $key )
Returns the epoch time at which $key definitively expires. Returns undef if the key does not exist or it has no expiration time.
get_object( $key )
Returns a CHI::CacheObject object containing data about the entry associated with $key, or undef if no such key exists. The object will be returned even if the entry has expired, as long as it has not been removed.
clear( )
Remove all entries from the namespace.
get_keys( )
Returns a list of keys in the namespace. This may or may not include expired keys, depending on the driver.
is_empty( )
Returns a boolean indicating whether the namespace is empty, based on get_keys().
purge( )
Remove all entries that have expired from the namespace associated with this cache instance. Warning: May be very inefficient, depending on the number of keys and the driver.
get_namespaces( )
Returns a list of namespaces associated with the cache. This may or may not include empty namespaces, depending on the driver.
The methods in this section process multiple keys and/or values at once. By default these are implemented with the obvious map operations, but some cache drivers (e.g. the Memcached driver) can override them with more efficient implementations.
get_multi_arrayref( $keys )
Get the keys in list reference $keys, and return a list reference of the same length with corresponding values or undefs.
get_multi_hashref( $keys )
Like "get_multi_arrayref", but returns a hash reference with each key in $keys mapping to its corresponding value or undef.
set_multi( $key_values, $set_options )
Set the multiple keys and values provided in hash reference $key_values. $set_options is a scalar or hash reference, used as the third argument to set.
remove_multi( $keys )
Removes the keys in list reference $keys.
dump_as_hash( )
Returns a hash reference containing all the non-expired keys and values in the cache.
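The "obvious map operations" mentioned at the start of this section might look roughly like the following. Toy::Cache is a hypothetical stand-in with no expiration, and its method names are chosen for illustration; a real driver backed by a store with bulk operations could replace each loop with a single round trip.

```perl
use strict;
use warnings;

package Toy::Cache;

sub new    { return bless { data => {} }, shift }
sub get    { my ( $self, $key ) = @_; return $self->{data}{$key} }
sub set    { my ( $self, $key, $value ) = @_; $self->{data}{$key} = $value; return }
sub remove { my ( $self, $key ) = @_; delete $self->{data}{$key}; return }

# Default-style implementations: one single-key call per key.
sub get_multi_arrayref {
    my ( $self, $keys ) = @_;
    return [ map { scalar $self->get($_) } @$keys ];
}

sub get_multi_hashref {
    my ( $self, $keys ) = @_;
    my %result = map { ( $_ => scalar $self->get($_) ) } @$keys;
    return \%result;
}

sub set_multi {
    my ( $self, $key_values, $set_options ) = @_;
    $self->set( $_, $key_values->{$_}, $set_options ) for keys %$key_values;
    return;
}

sub remove_multi {
    my ( $self, $keys ) = @_;
    $self->remove($_) for @$keys;
    return;
}

package main;

my $cache = Toy::Cache->new;
$cache->set_multi( { a => 1, b => 2, c => 3 }, '5 minutes' );
$cache->remove_multi( ['b'] );

my $list = $cache->get_multi_arrayref( [qw(a b c)] );    # [ 1, undef, 3 ]
my $hash = $cache->get_multi_hashref( [qw(a b)] );       # { a => 1, b => undef }
```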
There is a read-only accessor for namespace, and read/write accessors for expires_in, expires_at, expires_variance, on_get_error, and on_set_error.
namespace
expires_in
expires_at
expires_variance
on_get_error
on_set_error
Duration expressions, which appear in the "set" command and various other parts of the API, are parsed by Time::Duration::Parse. A duration is either a plain number, which is treated like a number of seconds, or a number and a string representing time units where the string is one of:
    s second seconds sec secs
    m minute minutes min mins
    h hr hour hours
    d day days
    w week weeks
    M month months
    y year years
e.g. the following are all valid duration expressions:
    25
    3s
    5 seconds
    1 minute and ten seconds
    1 hour
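To show how such expressions reduce to a number of seconds, here is a simplified stand-in for a duration parser. It is not Time::Duration::Parse: the helper parse_duration is hypothetical, it accepts digits only (so a spelled-out number such as "ten" is rejected), and it leaves out months and years because their length in seconds is a matter of convention.

```perl
use strict;
use warnings;

# Seconds per unit for the fixed-length units in the table above.
my %SECONDS_PER_UNIT = (
    ( map { $_ => 1 }      qw(s second seconds sec secs) ),
    ( map { $_ => 60 }     qw(m minute minutes min mins) ),
    ( map { $_ => 3600 }   qw(h hr hour hours) ),
    ( map { $_ => 86400 }  qw(d day days) ),
    ( map { $_ => 604800 } qw(w week weeks) ),
);

# Simplified parser: returns a number of seconds or dies.
sub parse_duration {
    my ($expr) = @_;

    # A plain number is treated as a number of seconds.
    return $1 + 0 if $expr =~ /^\s*(\d+)\s*$/;

    my $seconds = 0;
    my $rest    = $expr;
    $rest =~ s/\band\b|,/ /g;    # allow "1 minute and 10 seconds"

    # Consume "<count><unit>" pairs from the front of the string.
    while ( $rest =~ s/^\s*(\d+)\s*([A-Za-z]+)// ) {
        my ( $count, $unit ) = ( $1, $2 );
        die "unknown unit '$unit' in '$expr'\n"
            unless exists $SECONDS_PER_UNIT{$unit};
        $seconds += $count * $SECONDS_PER_UNIT{$unit};
    }
    die "cannot parse duration '$expr'\n" if $rest =~ /\S/;
    return $seconds;
}

print "$_ => ", parse_duration($_), "\n"
    for '25', '3s', '5 seconds', '1 minute and 10 seconds', '1 hour';
```

Unit matching is case-sensitive here because the table distinguishes "m" (minutes) from "M" (months).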
The following drivers are currently available as part of this distribution. The bundling within a single distribution is a temporary convenience during the initial phase of (possibly) rapid changes. Once things have stabilized a bit, all the drivers except for Memory, File, and Multilevel will be moved out to their own CPAN distributions.
CHI::Driver::Memory - In-process memory based cache
CHI::Driver::File - File-based cache using one file per entry in a multi-level directory structure
CHI::Driver::FastMmap - Shared memory interprocess cache via mmap'ed files
CHI::Driver::Memcached - Distributed cache via memcached (memory cache daemon)
CHI::Driver::Multilevel - Cache formed from several subcaches chained together
CHI::Driver::CacheCache - CHI wrapper for Cache::Cache
See CHI::Driver::Development for information on developing new drivers.
If given a logger object, CHI will log events at various levels - for example, a debug log message for every cache get and set. To specify the logger object:
CHI->logger($logger_object); # Warning: Temporary API, see below
The object must provide the methods
debug, info, warning, error, fatal
for logging, and
is_debug, is_info, is_warning, is_error, is_fatal
for checking whether a message would be logged at that level. This is compatible with Log::Log4perl and Catalyst::Log among others.
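For situations where a full logging framework is not wanted, a small object exposing the ten methods listed above should be enough. The class My::SimpleLogger below is a hypothetical sketch that stores messages in memory; its min_level option and messages attribute are inventions for the example.

```perl
use strict;
use warnings;

# Minimal logger sketch exposing the method names listed above.
package My::SimpleLogger;

my @LEVELS = qw(debug info warning error fatal);
my %RANK   = map { ( $LEVELS[$_] => $_ ) } 0 .. $#LEVELS;

sub new {
    my ( $class, %args ) = @_;
    my $self = { min_level => $args{min_level} || 'warning', messages => [] };
    return bless $self, $class;
}

# Generate debug/info/... and is_debug/is_info/... for every level.
for my $level (@LEVELS) {
    no strict 'refs';
    *{ __PACKAGE__ . "::is_$level" } = sub {
        my ($self) = @_;
        return $RANK{$level} >= $RANK{ $self->{min_level} } ? 1 : 0;
    };
    *{ __PACKAGE__ . "::$level" } = sub {
        my ( $self, $message ) = @_;
        my $checker = "is_$level";
        push @{ $self->{messages} }, "[$level] $message" if $self->$checker;
        return;
    };
}

package main;

my $logger = My::SimpleLogger->new( min_level => 'info' );
$logger->debug('cache get for key foo');    # dropped: below min_level
$logger->error('cache set failed');         # recorded

# With CHI loaded, the object would then be registered as shown earlier:
# CHI->logger($logger);
```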
Warning: CHI->logger is a temporary API. The intention is to replace this with Log::Any (http://use.perl.org/~jonswar/journal/34366).
CHI is intended as an evolution of DeWitt Clinton's Cache::Cache package. It starts with the same basic API (which has proven durable over time) but addresses some implementation shortcomings that cannot be fixed in Cache::Cache due to backward compatibility concerns. In particular:
Some of Cache::Cache's subclasses (e.g. Cache::FileCache) have been justifiably criticized as inefficient. CHI has been designed from the ground up with performance in mind, both in terms of general overhead and in the built-in driver classes. Method calls are kept to a minimum, data is only serialized when necessary, and metadata such as expiration time is stored in packed binary format alongside the data.
As an example, using Rob Mueller's cacheperl benchmarks, CHI's file driver runs 3 to 4 times faster than Cache::FileCache.
New Cache::Cache subclasses can be tedious to create, due to a lack of code refactoring, the use of non-OO package subroutines, and the separation of "cache" and "backend" classes. With CHI, the goal is to make the creation of new drivers as easy as possible, roughly the same as writing a TIE interface to your data store. Concerns like serialization and expiration options are handled by the driver base class so that individual drivers don't have to worry about them.
Probably because of the reasons above, Cache::Cache subclasses were never created for some of the most popular caches available on CPAN, e.g. Cache::FastMmap and Cache::Memcached. CHI's goal is to be able to support these and other caches with a minimum performance overhead and minimum of glue code required.
CPAN sports a variety of full-featured standalone cache modules representing particular backends. CHI does not reinvent these but simply wraps them with an appropriate driver. For example, CHI::Driver::Memcached and CHI::Driver::FastMmap are thin layers around Cache::Memcached and Cache::FastMmap.
Of course, because these modules already work on their own, there will be some overlap. Cache::FastMmap, for example, already has code to serialize data and handle expiration times. Here's how CHI resolves these overlaps.
CHI handles its own serialization, passing a flat binary string to the underlying cache backend.
CHI packs expiration times (as well as other metadata) inside the binary string passed to the underlying cache backend. The backend is unaware of these values; from its point of view the item has no expiration time. Among other things, this means that you can use CHI to examine expired items (e.g. with $cache->get_object) even if this is not supported natively by the backend.
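The idea of hiding metadata inside the value handed to the backend can be sketched with pack and Storable. The layout below (two 32-bit epoch times followed by the frozen payload) is purely hypothetical and is not CHI's actual format; pack_entry and unpack_entry are illustrative names.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical layout: created_at and expires_at as 32-bit big-endian
# integers, followed by the serialized payload.
my $HEADER_FORMAT = 'N N';
my $HEADER_LENGTH = 8;

sub pack_entry {
    my ( $data, $created_at, $expires_at ) = @_;
    return pack( $HEADER_FORMAT, $created_at, $expires_at ) . nfreeze( \$data );
}

sub unpack_entry {
    my ($blob) = @_;
    my ( $created_at, $expires_at ) =
        unpack( $HEADER_FORMAT, substr( $blob, 0, $HEADER_LENGTH ) );
    my $data = ${ thaw( substr( $blob, $HEADER_LENGTH ) ) };
    return { data => $data, created_at => $created_at, expires_at => $expires_at };
}

# The backend would only ever see $blob, an opaque string with no
# expiration of its own; expiry is decided after unpacking.
my $blob  = pack_entry( { name => 'Alice' }, 1_200_000_000, 1_200_000_600 );
my $entry = unpack_entry($blob);
my $expired = time() >= $entry->{expires_at} ? 1 : 0;
```

Because the entry is still present in the backend after it expires, its metadata and value remain readable, which is what makes inspecting expired items possible.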
At some point CHI will provide the option of explicitly notifying the backend of the expiration time as well. This might allow the backend to do better storage management, etc., but would prevent CHI from examining expired items.
Naturally, using CHI's FastMmap or Memcached driver will never be as time or storage efficient as simply using Cache::FastMmap or Cache::Memcached. In terms of performance, we've attempted to make the overhead as small as possible, on the order of 5% per get or set (benchmarks coming soon). In terms of storage size, CHI adds about 16 bytes of metadata overhead to each item. How much this matters obviously depends on the typical size of items in your cache.
Questions and feedback are welcome, and should be directed to the perl-cache mailing list:
http://groups.google.com/group/perl-cache-discuss
Bugs and feature requests will be tracked at RT:
http://rt.cpan.org/NoAuth/Bugs.html?Dist=CHI
The latest source code is available at:
http://code.google.com/p/perl-cache/wiki/Source
Perform cache benchmarks comparing both CHI and non-CHI cache implementations
Separate Memcached driver into its own CPAN distribution
Make serialization method flexible via Data::Serializer
Release BerkeleyDB and DBI drivers as separate CPAN distributions
Add docs comparing various strategies for reducing miss stampedes and cost of recomputes
Add expires_next syntax (e.g. expires_next => 'hour')
Support automatic serialization and escaping of keys
Create XS versions of main functions in Driver.pm (e.g. get, set)
Thanks to DeWitt Clinton for the original Cache::Cache, to Rob Mueller for the Perl cache benchmarks, and to Perrin Harkins for the discussions that got this going.
CHI was originally designed and developed for the Digital Media group of the Hearst Corporation, a diversified media company based in New York City. Many thanks to Hearst management for agreeing to this open source release.
Jonathan Swartz
Cache::Cache, Cache::Memcached, Cache::FastMmap
Copyright (C) 2007 Jonathan Swartz.
CHI is provided "as is" and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.