NAME

Image::Delivery - Efficient transformation and delivery of web images

INTRODUCTION

Many web applications generate or otherwise deliver graphics as part of their interface. Getting the delivery of these images right is tricky, and developers usually need to make trade-offs in order to get a usable mechanism.

Image::Delivery is an extremely sophisticated module for delivering these generated images. It is designed to be powerful, flexible, extensible, scalable, secure, stable and correct, and to use a minimum of resources.

DESIGN

Because it can take a little bit of work to set up Image::Delivery, we will start with a quick once-over of the design of the API, and the reasons and use cases that drove it.

Preventing Multiple Server Calls

Use Case 1: CVS Monitor

  The initial idea for Image::Delivery came from problems with the
  design of CVS Monitor (http://ali.as/devel/cvsmonitor/), an advanced
  but extremely resource-hungry MVC CGI application. Many of the CVS Monitor
  views have a single large graph on them, which involves a second call to the
  server that starts just before the previous call ends. Generating the graph
  takes minimal extra effort, but the overhead of starting another process and
  loading another 100 MB of data creates a double whammy for the server.
  
  What would be ideal would be to generate both at once and have the browser
  get the image without a CGI hit.

The solution to this problem, and the primary mechanism that Image::Delivery implements, could be called "Static Delivery via Cached Disk", but it is best demonstrated by the diagram in General Structure below.

Use Case 2: Thumbnails

  One problem with thumbnailing is the vast number of thumbnails that need
  to be generated. If each one is generated on demand by its own image
  request, you end up with large numbers of processes working at once. The
  normal solution is to pre-generate the thumbnails, which can pollute the
  image directories.

Image::Delivery stores all images in one central cache, so that the original images are unaffected.

General Structure

    Image Provider
      |
      |BLOB + TransformPath
      |
     \1/
    Image::Delivery
      |           \
      |            |
      |            |
     \2/           |
  Hard Disk        |
  /5\     |        |URI
   |      |        |
   |      |        |
   |     \6/       |
  Web Server       | 
   /4\    |       /
    |     |gzip  /
     \    |     / 
      \  \7/  \3/
      Web Browser

1) Image Data pulled from Object/Provider

An Object, or a Provider that accesses the data from outside the API, generates or obtains the image data and various metadata that describes the image data.

2) Image Written to File-System

Image::Delivery writes the image to the filesystem with a specific file name.

3) URI sent to Browser in HTML

Image::Delivery determines the matching URI that points to the location of the written file, and provides it to be used in an img tag in the generated HTML page.

4) Web Browser Requests Image

Having received the HTML, the browser requests the image from the web server.

5) Web Server Finds Image File

The web server receives the image request and finds the file that was written at step 2).

6) Web Server Retrieves Image File

The web server reads the file like any other plain file.

7) Web Server Sends File to Browser

The web server sends the file off to the browser.
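Steps 1) to 3) can be sketched with core Perl alone. This is only an illustration of the flow, not the module's implementation: the deliver function, the temporary directory standing in for the web root, and the placeholder image bytes are all assumptions made for the example. Steps 4) to 7) need no code at all, because the web server treats the result as an ordinary static file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Assumed stand-ins: a temporary directory plays the part of the
# web-visible cache directory, and the base URI is illustrative.
my $cache_dir = tempdir( CLEANUP => 1 );
my $cache_uri = 'http://yourwebsite.com/cache';

# Steps 1-3: accept image data, write it to disk, return the matching URI.
# "deliver" is a hypothetical name, not part of Image::Delivery.
sub deliver {
    my ($name, $blob) = @_;
    my $file = File::Spec->catfile($cache_dir, $name);
    open my $fh, '>', $file or die "Cannot write $file: $!";
    binmode $fh;                      # image data is binary
    print {$fh} $blob;
    close $fh or die "Cannot close $file: $!";
    return "$cache_uri/$name";
}

# The URI goes straight into the generated HTML
my $uri  = deliver('graph.png', "\x89PNG placeholder bytes");
my $html = qq{<img src="$uri">};
print "$html\n";
```

The point of the design is visible here: the page and its image are produced in the same process, and the browser's later request never reaches the application.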

Digest::TransformPath

Image::Delivery is built around source objects. Each source object may want to work with more than one image, and each image may need to come in several different versions. In short, there can be lots of variations of images.

To handle this, we utilise (or SHOULD utilise) Digest::TransformPath to help identify the images, with a 10-digit digest built into the filename.
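To give a feel for the idea, the sketch below reduces a source identifier plus an ordered list of transforms to a 10-character identifier. It is an assumption-laden illustration: the transform_digest helper, the use of MD5, the hex encoding, and the example transform strings are all invented here, and the real Digest::TransformPath may compute its digest quite differently.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: describe an image by where it came from and what
# was done to it, then reduce that description to a short identifier.
sub transform_digest {
    my ($source, @transforms) = @_;

    # Join with newlines so step boundaries are part of the digested string
    my $path = join "\n", $source, @transforms;

    # Keep the first 10 hex characters (an assumption for this sketch)
    return substr( md5_hex($path), 0, 10 );
}

my $thumb   = transform_digest('Photo.1234.image', 'scale 100x100');
my $rotated = transform_digest('Photo.1234.image', 'scale 100x100', 'rotate 90');

print "$thumb\n$rotated\n";
```

The same source and transforms always give the same identifier, while adding a step gives a different one, which is what allows each variation of an image to have its own file.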

Might as Well Cache Them

Since we went to all that effort to write the file, it's relatively easy to add caching. But the most important thing if we are going to cache is to have a good file naming scheme.

Image::Delivery Naming Scheme

In order to make this all work, the naming scheme is critical.

The basic path format is:

  $ROOT/Object.id/checksum.type

Object.id

An object may have any number of Image fields, each of which may have any number of scaled/rotated/morphed/derived images. When a source object is updated, some or all of these need to be cleared. Grouping them under a directory named for the object presumably keeps them easy to locate when that happens.
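One consequence of this layout can be sketched as follows. The clear_object helper, the temporary cache root, and the file names are hypothetical, and the example assumes that clearing every image for an object is acceptable; it is not how Image::Delivery itself clears its cache.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path remove_tree);
use File::Spec;

# Assumed stand-in for $ROOT
my $root = tempdir( CLEANUP => 1 );

# Hypothetical helper: drop every cached image belonging to one object
sub clear_object {
    my ($object_id) = @_;
    my $dir = File::Spec->catdir($root, $object_id);
    remove_tree($dir) if -d $dir;
    return -d $dir ? 0 : 1;           # true once the directory is gone
}

# Populate two object directories with empty placeholder files
for my $entry ('Photo.1/aaaaaaaaaa.png', 'Photo.1/bbbbbbbbbb.gif', 'Photo.2/cccccccccc.png') {
    my ($id, $name) = split m{/}, $entry;
    make_path( File::Spec->catdir($root, $id) );
    my $file = File::Spec->catfile($root, $id, $name);
    open my $fh, '>', $file or die "Cannot write $file: $!";
    close $fh;
}

# Photo.1 was updated, so all of its derived images go; Photo.2 is untouched
my $cleared = clear_object('Photo.1');
print "cleared\n" if $cleared;
```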

checksum

The checksum calculated from the TransformPath does not describe any of the data, only the data source and modifications to it. This means that it is possible to cheaply test if the image for a particular transform has already been created, without having to access any of the data in the actual images.

type

Because we accept image data in a variety of formats, it's not possible to know in advance what image type any given image will be. So when testing we simply check each type in turn until we find one.

Generally, rather than test 10-15 types, the Provider will inform us of the types to expect. :)
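A minimal sketch of that lookup, assuming the $ROOT/Object.id/checksum.type layout described above: the find_cached helper, the temporary root, and the example identifiers are hypothetical. Note that it only asks the filesystem whether each candidate file exists and never reads any image data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Assumed stand-in for $ROOT
my $root = tempdir( CLEANUP => 1 );

# Hypothetical helper: return the first existing
# $ROOT/Object.id/checksum.type, or '' when none of the types is cached.
sub find_cached {
    my ($object_id, $checksum, @types) = @_;
    for my $type (@types) {
        my $file = File::Spec->catfile($root, $object_id, "$checksum.$type");
        return $file if -f $file;
    }
    return '';
}

# Pretend one PNG has already been cached
make_path( File::Spec->catdir($root, 'Photo.1') );
my $stored = File::Spec->catfile($root, 'Photo.1', '0123456789.png');
open my $fh, '>', $stored or die "Cannot write $stored: $!";
close $fh;

# The caller supplies the short list of types it expects
my $hit  = find_cached('Photo.1', '0123456789', qw(gif jpg png));
my $miss = find_cached('Photo.1', '9876543210', qw(gif jpg png));
print $hit ? "found $hit\n" : "not cached\n";
```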

Operation Profile

All of this junk gives the module the following properties:

- Intrinsically supports all major image types

- No pre-generation of images, generates everything on-the-fly

- Image names are secure and can't be predicted

- All images for any page are processed in one process hit

- Cache checking is extremely quick

- Never touches image source data when not filling the cache

- Handles many images. Storage extendable to support thousands to millions of individual images

- Multiple hosts can work with the same Image cache

- Images can be delivered by a different web server to the application

DESCRIPTION

Image::Delivery is very powerful, but setting it up may take a little bit of work.

Setting up the URI <-> path mapping

First, you need to become acquainted with HTML::Location. This is used as the basis for the mapping between the disk and a URI.

You should also make sure that whatever process will be running will have write permissions to the appropriate directory.

For starters, we would suggest creating the cache directory just under the root of a website, at $ROOT/cache, which will be linked to http://yourwebsite.com/cache/.

This will let you create your HTML::Location.

  # Set up the location of the cache
  my $Location = HTML::Location->new(
      "$ROOT/cache",
      "http://yourwebsite.com/cache"
      );

This gives you the absolute minimum Image::Delivery itself needs to get rolling. With a location to manage, you can then start to fire images at it, and it will store them and hand you back an HTML::Location for the actual file.

  # Create the Image::Delivery object
  my $Delivery = Image::Delivery->new(
        Location => $Location,
        );

However, the tricky bit is probably setting up your Provider class. Although the abstract class implements much of the details and defaults for you, you are probably still going to need to do some work to tie the two together.

STATUS

While the concept and design are fairly well understood and unlikely to change, there is an unfortunate situation with regard to the Cache:: family of modules.

Although originally written to live at Cache::Web and to be a little more general, it was felt by the maintainer that Cache::Web would represent the module as being a full member of the Cache:: family, which it is not.

However, during the first few releases I hope to at least try to move the API of Image::Delivery as close to Cache:: as possible, possibly under a common Cache::Interface class, to gain some potential benefits from code written on top of it.

Until these comments are updated, you should assume that the API may undergo some changes.

METHODS

new %params

The new constructor creates a new Image::Delivery object. It takes a number of required and optional parameters, provided as a set of key/value pairs.

Location

The required Location parameter is the HTML::Location that maps the cache directory on disk to its URI.

Location

The Location method returns the HTML::Location that was used when creating the Image::Delivery.

filename $TransformPath | $Provider

The filename method determines, for a given $TransformPath or $Provider, the file name that the Image should be written to, excluding the file type.

This is the method most likely to be overloaded, to enable a different naming scheme.

exists $TransformPath | $Provider

For a given Digest::TransformPath, or a ::Provider which contains one, check to see whether a file exists for it in the cache already.

Returns the HTML::Location of the image if it exists, false if it does not exist, or undef on error.
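Because both "does not exist" and "error" are false in boolean context, a caller has to test for definedness to tell them apart. The sketch below shows one way to do that against a stand-in with the same three-way return contract. The lookup and describe functions and the %cache hash are hypothetical and do not call Image::Delivery.

```perl
use strict;
use warnings;

# Stand-in for a cache lookup: a true value on a hit, '' on a miss,
# undef on error. Hypothetical, for illustration only.
my %cache = ( known => 'http://yourwebsite.com/cache/Photo.1/0123456789.png' );

sub lookup {
    my ($key) = @_;
    return undef unless defined $key;     # treat a missing key as an error
    return exists $cache{$key} ? $cache{$key} : '';
}

# Check defined-ness first, then truth, to separate the three cases
sub describe {
    my ($result) = @_;
    return 'error' unless defined $result;
    return $result ? "hit: $result" : 'miss';
}

print describe( lookup('known') ),   "\n";
print describe( lookup('unknown') ), "\n";
print describe( lookup(undef) ),     "\n";
```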

get $TransformPath | $Provider

The get method gets the contents of a cached file from the cache, if it exists. You should generally check that the image exists first before trying to get it.

Returns a reference to a SCALAR containing the image data if the image exists. Returns undef if the image does not exist, or some other error occurs.

set $Provider

The set method stores an image in the cache, shortcutting if the image has already been stored.

Returns the HTML::Location of the stored image on success, or undef on error.

clear $TransformPath

The clear method allows you to explicitly delete an image from the cache. This would generally be done for security purposes, as the cache cleaners will generally harvest files directly, rather than going via TransformPaths.

Returns true if the image was removed, or did not exist. Returns undef on error.

TO DO

- Add ability to mask indexes with empty HTML files

- Add cache clearing capabilities

- Add file locking to prevent race conditions in the cache

- Add pluggable cache cleaners

SUPPORT

All bugs should be filed via the bug tracker at

http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Image-Delivery

For other issues, contact the author.

AUTHORS

Adam Kennedy <adamk@cpan.org>

COPYRIGHT

Copyright 2004 - 2007 Adam Kennedy.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.