NAME

WebCache::Digest - a Cache Digest implementation in Perl

SYNOPSIS

use WebCache::Digest;

# fetching a digest via HTTP
$d = new WebCache::Digest;
$d->fetch("flibbertigibbet.swedish-chef.org", 3128);

# dump header fields out for info
print STDERR $d->dump_header();

# saving a digest
$d->save("flib");

# loading a digest
$e = new WebCache::Digest;
$e->load("flib");

# creating a new digest
$f = new WebCache::Digest;
$f->create; # defaults to a digest with 500 URL capacity

# registering a URL and method in the digest
$f->register("get", "http://www.kha0s.org/");
if ($f->lookup("get", "http://www.kha0s.org/")) {
  print "hit!\n";
}

# access to raw header and digest contents
print "header: " . unpack("H*", $f->header) . "\n";
print "digest: " . unpack("H*", $f->digest) . "\n";

# access to digest header block elements
print "Current version:      " . $f->current_version . "\n";
print "Required version:     " . $f->required_version . "\n";
print "Capacity:             " . $f->capacity . "\n";
print "Count:                " . $f->count . "\n";
print "Deletion count:       " . $f->del_count . "\n";
print "Size in bytes:        " . $f->size_in_bytes . "\n";
print "Bits per entry:       " . $f->bits_per_entry . "\n";

DESCRIPTION

This Perl module implements version 5 of the Cache Digest specification. For more information about Cache Digests, check out the Squid FAQ:

http://squid.nlanr.net/Squid/FAQ/FAQ-16.html

A copy of the specification is included with this distribution as the file cache-digest-v5.txt.

This code has been benchmarked on a 400MHz Pentium II running Linux at 1866 lookups per second (roughly 112,000 per minute, or 560,000 in five minutes), against a Cache Digest of 500,000 URLs.

Cache Digests are summaries of the contents of WWW cache servers, which are made available to other WWW caches and may also be used internally within the server which generates them. They allow a WWW cache server such as Squid to determine whether or not a particular Internet resource (designated by its URL and the HTTP method which is being used to fetch it) was cached at the time the digest was generated. Unlike other mechanisms such as the Internet Cache Protocol (ICP - see RFC 2186), Cache Digests do not generate a continuous stream of request/response pairs, and do not add latency to each URL which is looked up.

Since we provide routines both to look up URLs in Cache Digests and to register them, it should be trivial to use this code to devise innovative applications which take advantage of Cache Digests to fool genuine WWW caches into treating them as if they were WWW caches themselves. For example, mirror servers could register all the URLs which they're aware of for the resources they mirror, so that cache servers which peer with them will always get a cache 'hit' on the mirror server for any reference to any of the mirrored resources.
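
For instance, a mirror server might build and publish a digest along these lines. This is only a sketch: the URLs and filename are made up, and create is assumed to take a named capacity argument (see METHODS and BUGS).

use WebCache::Digest;

# hypothetical list of mirrored resources
my @mirrored = (
  "http://mirror.example.org/gnu/emacs.tar.gz",
  "http://mirror.example.org/gnu/bash.tar.gz",
);

my $d = new WebCache::Digest;
$d->create(capacity => scalar @mirrored);   # one slot per mirrored URL
$d->register("get", $_) for @mirrored;      # advertise each one as a 'hit'
$d->save("mirror-digest");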

We also provide methods to store Cache Digests to disk and load them back in again, in addition to creating new Digests and fetching them from WWW caches which support the protocol. This can be used to take a 'snapshot' of the state of a WWW cache at any particular point in time, or for saving state if building a Cache Digest powered server.
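
A sketch of the 'snapshot' idea, assuming a hypothetical peer cache at cache.example.ac.uk on port 3128:

use WebCache::Digest;

my $d = new WebCache::Digest;

# grab the peer's current digest and file it away under today's date
if ($d->fetch("cache.example.ac.uk", 3128)) {
  my @t = localtime;
  my $stamp = sprintf("%04d%02d%02d", $t[5] + 1900, $t[4] + 1, $t[3]);
  $d->save("cache.example.ac.uk-$stamp");
}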

METHODS

We only describe public methods and method arguments here. Anything else should be considered private, at least for now.

new

Constructor function, creates a new WebCache::Digest object. As yet this takes no arguments.

create

Fills in the data structures for a WebCache::Digest object, given the number of slots to make available for URLs via the capacity parameter.
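
For example (a sketch only - the capacity chosen here is arbitrary, and the named argument form follows the note under BUGS that create takes hash array arguments):

use WebCache::Digest;

my $f = new WebCache::Digest;
$f->create(capacity => 5000);   # room for roughly 5000 URLs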

fetch

Tries to fetch a Cache Digest from the machine whose domain name (or IP address) and port number are specified (in this order) in the first and second parameters. Returns 0 on failure, and 1 on success.
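
For example, with a placeholder hostname and the usual Squid port:

use WebCache::Digest;

my $d = new WebCache::Digest;

# fetch returns 0 on failure, so check it before going any further
$d->fetch("cache.example.org", 3128)
  or die "couldn't fetch a digest from cache.example.org:3128\n";

print STDERR $d->dump_header();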

dump_header

Dumps the fields in the Cache Digest header as plain text and returns the result as a string, e.g. for printing to STDERR when debugging.

save

Saves the WebCache::Digest object to the filename supplied as its parameter. Returns 0 on failure, 1 on success.

load

Populates the WebCache::Digest object with the contents of the file whose name is supplied as its parameter. Returns 0 on failure, 1 on success.
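
For example, a round trip through a file, checking both return values (the filename and URL are arbitrary):

use WebCache::Digest;

my $d = new WebCache::Digest;
$d->create;                       # default capacity of 500 URLs
$d->register("get", "http://www.example.org/");

$d->save("example-digest") or die "save failed\n";

my $e = new WebCache::Digest;
$e->load("example-digest") or die "load failed\n";

# the reloaded digest should describe the same contents as the original
print "Count after reload: ", $e->count, "\n";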

lookup

Given an HTTP method and URL (in that order) as parameters, try to look them up in the Cache Digest. Returns 1 if the URL is a Cache Digest hit, or 0 otherwise.

register

Given an HTTP method and URL (in that order) as parameters, register them in the Cache Digest.
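
A short sketch combining register and lookup, with placeholder URLs. Because a given bit in the Digest may be shared between URLs, a lookup can occasionally report a 'hit' for a URL which was never registered:

use WebCache::Digest;

my $d = new WebCache::Digest;
$d->create;   # default capacity of 500 URLs

$d->register("get", "http://www.example.org/index.html");

print "hit\n"  if $d->lookup("get", "http://www.example.org/index.html");
print "miss\n" unless $d->lookup("get", "http://www.example.org/other.html");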

header

The raw Cache Digest header.

digest

The raw Cache Digest object, sans header.

current_version

The current version number from the Digest object.

required_version

The required version number from the Digest object. Implementations should support at least this version for interoperability.

capacity

The number of 'slots' for URLs in the Digest.

count

The number of slots which have been filled.

del_count

The number of deletion attempts - Squid doesn't currently delete any URLs (e.g. when they become stale), but simply discards them the next time the Digest is rebuilt. Deleting one URL's information without affecting others is impossible with Cache Digests as currently conceived, since a given bit of the Digest may be used in looking up multiple URLs.

size_in_bytes

The size of the Digest in bytes when stored in transfer format.

bits_per_entry

The number of bits in the Cache Digest consumed for each entry.

hash_func_count

The number of times the Cache Digest hash function (see the specification for more information on this) is called for each URL.

BUGS

This is a first release, and there are probably lots of hideous bugs waiting to catch you out - consider it pre-alpha code!

Something else to watch out for is that the name may well change, depending on feedback from the comp.lang.perl.modules Usenet newsgroup.

We use far too much memory - a more efficient approach to processing the Cache Digest needs to be hacked in.

We should be consistent and always use one form of arguments - either hash array or fixed position arguments. Mixing the two is confusing. However... most methods don't need named (hash array) arguments, except for create.

COPYRIGHT

Copyright (c) 1999, Martin Hamilton. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

It was developed by the JANET Web Cache Service, which is funded by the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils.

AUTHOR

Martin Hamilton <martinh@gnu.org>