NAME

Alzabo::ObjectCache - A simple in-memory cache for row objects.

SYNOPSIS

  use Alzabo::ObjectCache
      ( store => 'Alzabo::ObjectCache::Store::Memory',
        sync  => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
        sync_dbm_file => 'somefile.db' );

DESCRIPTION

This class exists primarily to delegate necessary caching operations to other objects.

It always contains two objects. One is responsible for storing the objects to be cached. This can be done in any way that the storing object sees fit.

The syncing object is responsible for making sure that objects in multiple processes stay in sync with each other, as well as within a single process. For example, if an object in process 1 is deleted and then process 2 attempts to retrieve the same object from the database, process 2 needs to be told (in this case via an exception) that this object is no longer available. Similarly, if process 1 updates the database and process 2 holds a cached object for the same row, that object needs to know that it should fetch its data again.

IMPORT

This module is configured entirely through the parameters passed when it is imported.

Parameters

  • store => 'Alzabo::ObjectCache::Store::Foo'

    This should be the name of a class that implements the Alzabo::ObjectCache object storing interface.

    The default is Alzabo::ObjectCache::Store::Memory.

  • sync => 'Alzabo::ObjectCache::Sync::Foo'

    This should be the name of a class that implements the Alzabo::ObjectCache object syncing interface.

    The default is Alzabo::ObjectCache::Sync::Null.

  • lru_size => $size

    This is the maximum number of objects you want the storing class to store at once. If it is 0 or undefined (the default), the storage class will store an unlimited number of objects.

All parameters given will also be passed through to the import method of the storing and syncing classes being used.

LRU STORAGE

Any storage module can be turned into an LRU cache by passing an lru_size parameter to this module when using it.

For example:

  use Alzabo::ObjectCache
          ( store => 'Alzabo::ObjectCache::Store::Memory',
            lru_size => 100,
            sync  => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
            sync_dbm_file => 'somefile.db' );

CACHING SCENARIOS

The easiest way to understand how the Alzabo caching system works is to outline different scenarios and show the results based on different caching configurations.

Scenario 1 - Single process - delete followed by select/update

In a single process, the following sequence occurs:

- A row object is retrieved.

- The row object's delete method is called, removing the data it represents from the database.

- The program attempts to call the row object's select or update method.

Results

  • No caching

    An Alzabo::Exception::NoSuchRow exception is thrown.

  • Any syncing module

    An Alzabo::Exception::Cache::Deleted exception is thrown.

Scenario 2 - Multiple processes - delete followed by select

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- Process 2 retrieves a row object for the same database row.

- Process 1 calls that object's delete method.

- Process 2 calls that object's select method.

Results

  • No caching

    An Alzabo::Exception::NoSuchRow exception is thrown.

  • Alzabo::ObjectCache::Sync::Null module is in use

    If the column(s) have been previously retrieved in process 2, then that data will be returned. Otherwise, an Alzabo::Exception::NoSuchRow exception is thrown.

  • Any other syncing module is in use

    An Alzabo::Exception::Cache::Deleted exception is thrown.

Scenario 3 - Multiple processes - delete followed by update

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- Process 2 retrieves a row object for the same database row.

- Process 1 calls that object's delete method.

- Process 2 calls that object's update method.

Results

  • No caching

    An Alzabo::Exception::NoSuchRow exception is thrown.

  • Alzabo::ObjectCache::Sync::Null module is in use

    The object will attempt to update the database. This is a potential disaster if, in the meantime, another row with the same primary key has been inserted.

  • Any other syncing module is in use

    An Alzabo::Exception::Cache::Deleted exception is thrown.

Scenario 4 - Multiple processes - update followed by update

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- Process 2 retrieves a row object for the same database row.

- Process 1 calls that object's update method.

- Process 2 calls that object's update method.

- Process 1 calls that object's select method.

Results

  • No caching

    The data from process 2's update is returned.

  • Alzabo::ObjectCache::Sync::Null module is in use

    The data from process 1's update is returned.

  • Any other syncing module is in use

    An Alzabo::Exception::Cache::Expired exception is thrown when process 2 attempts to update the row. If process 2 were to then attempt the update again it would succeed (as the object is updated before the exception is thrown).
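The retry behaviour described for the syncing modules in this scenario can be sketched as follows. This is only an illustration, not part of Alzabo: update_with_retry is a hypothetical helper, and $row is assumed to be a row object obtained elsewhere whose update method accepts column/value pairs.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of Alzabo): retry an update once if the
# cached row turned out to be expired. $row is assumed to be a row
# object whose update method takes column => value pairs.
sub update_with_retry {
    my ( $row, %values ) = @_;

    for my $attempt ( 1 .. 2 ) {
        my $ok = eval { $row->update(%values); 1 };
        return 1 if $ok;

        my $err = $@;

        # Retry only once, and only for the expiry exception.
        next
            if $attempt == 1
            && blessed($err)
            && $err->isa('Alzabo::Exception::Cache::Expired');

        # Anything else (for example Cache::Deleted) is re-thrown.
        die $err;
    }
}
```

Whether a blind retry is appropriate depends on the application; in many cases it is safer to inspect the refreshed data before updating again.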

Scenario 5 - Multiple processes - delete followed by insert (same primary key)

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- The row is deleted. In this case, it does not matter whether this happens through Alzabo or not.

- Process 2 inserts a new row, with the same primary key.

- Process 1 or 2 calls that object's select method.

Results

  • All cases.

    The correct data (from process 2's insert) is returned. This is a bit odd if process 1 called the object's delete method, but in that case it shouldn't be reusing the same object anyway.

This example may seem a bit far-fetched but is actually quite likely when using MySQL's auto_increment feature with older versions of MySQL, where numbers could be re-used.

Summary

The most important thing to take from this is that you should never use the Alzabo::ObjectCache::Sync::Null class in a multi-process situation. It is really only safe if you are sure your code will only be running in a single process at a time.

In all other cases, either use no caching or use one of the other syncing classes to ensure that data really is synced across multiple processes.

RACE CONDITIONS

It is important to note that there are small race conditions in the syncing scheme. When data is requested from a row object, the row object first makes sure that it is up to date with the database. If it is not, it refreshes itself. Then it returns the requested data (whether or not it had to refresh). It is possible that an update could occur in the time between the check for expiration and the return of the data. Such an update would not be seen by the row object.

I don't consider this a bug since it is impossible to work around and is unlikely to be a problem. In a single process, this is not an issue. In a multi-process application, this is the price that is paid for caching.

If this is a problem for your application then you should not use caching.

SYNCING MODULES

The following syncing modules are available with Alzabo:

Alzabo::ObjectCache::Sync::Null

This module simply emulates the syncing interface without doing any actual syncing, though it does track deleted objects. This module is useful if you want to cache objects in a single process but you don't need the overhead of real syncing.

Alzabo::ObjectCache::Sync::BerkeleyDB

Alzabo::ObjectCache::Sync::SDBM_File

Alzabo::ObjectCache::Sync::DB_File

These three modules all use DBM files, via the relevant module, to do multi-process syncing. They are listed in order from fastest to slowest. Using DB_File is significantly slower than either BerkeleyDB or SDBM_File, which are both relatively fast.

They all take the same parameters:

  • sync_dbm_file => $filename

    The file which should be used to store syncing data.

  • clear_on_startup => $boolean

    Indicates whether or not the file should be cleared before it is first used.

Alzabo::ObjectCache::Sync::Mmap

This module uses Cache::Mmap for syncing. It takes the following parameters.

  • sync_mmap_file => $filename

    The file which should be used to store syncing data.

  • clear_on_startup => $boolean

    Indicates whether or not the file should be cleared before it is first used.
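As a configuration sketch, the Mmap syncing module could be selected like this. The file path is a made-up example value, and Store::Memory is used only because it is the documented default storage class.

```perl
use Alzabo::ObjectCache
    ( store            => 'Alzabo::ObjectCache::Store::Memory',
      sync             => 'Alzabo::ObjectCache::Sync::Mmap',
      sync_mmap_file   => '/tmp/alzabo-sync.mmap',    # hypothetical path
      clear_on_startup => 1 );
```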

Alzabo::ObjectCache::Sync::RDBMS

This module uses an RDBMS to do syncing. This does not need to be the same database as your data is stored in, though it could be.

If the database it is told to use does not contain the table it needs, it will use the Alzabo::Create modules to create it. If you have warnings turned on, this will cause a warning telling you that these modules were loaded, as having them loaded in any sort of persistent process is probably a waste of memory.

The table it stores data in looks like this:

  AlzaboObjectCacheSync
  ----------------------
  object_id       varchar(22)   primary key
  sync_time       varchar(40)

This module takes the following parameters:

  • sync_schema_name => $name

    This should be the name of the schema where you want syncing data to be stored. If it doesn't exist, this module will attempt to create it.

  • sync_rdbms => $name (optional)

    If the schema given does not exist, then this parameter is required so this module knows what type of database it is connecting to.

  • sync_user => $user (optional)

    A username with which to connect to the database.

  • sync_password => $password (optional)

    A password with which to connect to the database.

  • sync_host => $host (optional)

    The host where the database lives.

  • sync_connect_params => { extra_param => 1 }

    Extra connection parameters. These will simply be passed on to the relevant Driver module.
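A configuration using the RDBMS syncing module might look like the sketch below. The schema name, credentials, and host are placeholder values, and 'MySQL' is assumed to be an acceptable value for sync_rdbms; check the name your Alzabo installation uses for its RDBMS.

```perl
use Alzabo::ObjectCache
    ( store            => 'Alzabo::ObjectCache::Store::Memory',
      sync             => 'Alzabo::ObjectCache::Sync::RDBMS',
      sync_schema_name => 'alzabo_sync',    # hypothetical schema name
      sync_rdbms       => 'MySQL',          # assumed value; required only if the schema does not exist yet
      sync_user        => 'cache_user',     # placeholder credentials
      sync_password    => 'secret',
      sync_host        => 'localhost' );
```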

Alzabo::ObjectCache::Sync::IPC

This module is quite slow and is included mostly for historical reasons (it was one of the first syncing modules made). I recommend against using it, but if you must, it takes the following parameters:

  • clear_on_startup => $boolean

    Indicates whether or not the file should be cleared before it is first used.

STORAGE MODULES

All of the storage modules may be turned into LRU caches by simply passing the lru_size parameter.

The following storage modules are included with Alzabo:

Alzabo::ObjectCache::Store::Null

This module mimics the storage interface without actually storing anything. It is useful if you want to use syncing without any storage.

Alzabo::ObjectCache::Store::Memory

This module simply stores cached objects in memory.

Alzabo::ObjectCache::Store::BerkeleyDB

This module stores serialized cached objects in a DBM file using the BerkeleyDB module.

It takes these parameters:

  • store_dbm_file => $filename

    The file which should be used to store serialized objects.

  • clear_on_startup => $boolean

    Indicates whether or not the file should be cleared before it is first used.

Alzabo::ObjectCache::Store::RDBMS

This module uses an RDBMS for storage. This does not need to be the same database as your data is stored in, though it could be.

For example, if you are using Oracle as your primary RDBMS, caching serialized objects in a MySQL database might be a performance boost.

If the database it is told to use does not contain the table it needs, it will use the Alzabo::Create modules to create it. If you have warnings turned on, this will cause a warning telling you that these modules were loaded, as having them loaded in any sort of persistent process is probably a waste of memory.

The table it stores data in looks like this:

  AlzaboObjectCacheStore
  ----------------------
  object_id       varchar(22)   primary key
  object_data     blob

The actual type of the object_data column will vary depending on what RDBMS you are using.

This module takes the following parameters:

  • store_schema_name => $name

    This should be the name of the schema where you want the serialized objects to be stored. If it doesn't exist, this module will attempt to create it.

  • store_rdbms => $name (optional)

    If the schema given does not exist, then this parameter is required so this module knows what type of database it is connecting to.

  • store_user => $user (optional)

    A username with which to connect to the database.

  • store_password => $password (optional)

    A password with which to connect to the database.

  • store_host => $host (optional)

    The host where the database lives.

  • store_connect_params => { extra_param => 1 }

    Extra connection parameters. These will simply be passed on to the relevant Driver module.
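The following sketch combines RDBMS storage with DBM-based syncing. All values are placeholders, and 'MySQL' is assumed to be an acceptable value for store_rdbms.

```perl
use Alzabo::ObjectCache
    ( store             => 'Alzabo::ObjectCache::Store::RDBMS',
      store_schema_name => 'alzabo_cache',    # hypothetical schema name
      store_rdbms       => 'MySQL',           # assumed value; required only if the schema does not exist yet
      store_user        => 'cache_user',      # placeholder credentials
      store_password    => 'secret',
      store_host        => 'localhost',
      sync              => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
      sync_dbm_file     => 'somefile.db' );
```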

Alzabo::ObjectCache METHODS

new

Returns

A new Alzabo::ObjectCache object.

fetch_object ($id)

Returns

The specified object if it is in the cache. Otherwise it returns undef.

store_object ($object)

Stores an object in the cache. This will not overwrite an existing object in the cache. To do that you must first call the delete_from_cache method.

is_expired ($object)

Returns

Whether or not the given object is expired.

is_deleted ($object)

Returns

A boolean value indicating whether or not an object has been deleted from the cache.

register_refresh ($object)

Tells the cache system that an object has refreshed its data from the database.

register_change ($object)

Tells the cache system that an object has updated its data in the database.

register_delete ($object)

This tells the cache that the object has been removed from its external data source. This causes the cache to remove the object internally. Future calls to is_deleted for this object will now return true.

delete_from_cache ($object)

This method allows you to remove an object from the cache. This does not register the object as deleted. It is provided solely so that you can call store_object after calling this method and have store_object actually store the new object.

clear

Call this method to completely clear the cache.
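Because store_object does not overwrite an existing entry, replacing a cached object takes two calls. A minimal sketch, in which replace_cached_object is a hypothetical helper and $cache is assumed to be an Alzabo::ObjectCache object:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Alzabo). $cache is assumed to be an
# Alzabo::ObjectCache object and $object a row object to re-cache.
sub replace_cached_object {
    my ( $cache, $object ) = @_;

    # store_object will not overwrite an existing entry, so remove it
    # first. This does not register the object as deleted.
    $cache->delete_from_cache($object);
    $cache->store_object($object);

    return;
}
```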

MAKING YOUR OWN SUBCLASSES

It is relatively easy to create your own storage or syncing modules by following a fairly simple interface.

Storage Interface

The interface that any object storing module needs to implement is as follows:

new

Returns

A new object.

fetch_object ($id)

Returns

The specified object if it is in the cache. Otherwise it returns undef.

store_object ($object)

Stores an object in the cache but should not overwrite an existing object.

delete_from_cache ($object)

This method deletes an object from the cache.

clear

Completely clears the cache.
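As an illustration of this interface, here is a minimal hash-backed storage class. The class name is hypothetical, and the sketch assumes that cached objects expose an id method returning a unique string; check how your row objects identify themselves before relying on it.

```perl
package My::ObjectCache::Store::Hash;    # hypothetical class name

use strict;
use warnings;

sub new {
    my $class = shift;
    return bless { cache => {} }, $class;
}

sub fetch_object {
    my ( $self, $id ) = @_;
    return $self->{cache}{$id};    # undef when the id is not cached
}

sub store_object {
    my ( $self, $object ) = @_;

    # Assumption: objects expose an id method returning a unique string.
    my $id = $object->id;

    # Do not overwrite an existing entry.
    $self->{cache}{$id} = $object unless exists $self->{cache}{$id};
    return;
}

sub delete_from_cache {
    my ( $self, $object ) = @_;
    delete $self->{cache}{ $object->id };
    return;
}

sub clear {
    my $self = shift;
    %{ $self->{cache} } = ();
    return;
}

1;
```

Such a class would then be named in the store parameter when Alzabo::ObjectCache is imported.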

Syncing Interface

Any class that implements the syncing interface should inherit from Alzabo::ObjectCache::Sync. This class provides most of the functionality necessary to handle syncing operations.

The interface that any object syncing module needs to implement is as follows:

_init

This method will be called when the object is first created.

clear

Clears the process-local sync times (not the times shared between processes).

sync_time ($id)

Returns

The time that the object matching the given id was last refreshed.

update ($id, $time, $overwrite)

This is called to update the state of the syncing object with regard to a particular object. The first parameter is the object's id. The second is the time that the object was last refreshed. The third parameter tells the syncing object whether or not to preserve an existing time for the object if it already has one.

AUTHOR

Dave Rolsky, <autarch@urth.org>