NAME
WebCache::Digest - a Cache Digest implementation in Perl
SYNOPSIS
use WebCache::Digest;
# fetching a digest via HTTP
$d = new WebCache::Digest;
$d->fetch("flibbertigibbet.swedish-chef.org", 3128);
# dump header fields out for info
print STDERR $d->dump_header();
# saving a digest
$d->save("flib");
# loading a digest
$e = new WebCache::Digest;
$e->load("flib");
# creating a new digest
$f = new WebCache::Digest;
$f->create; # defaults to a digest with 500 URL capacity
# registering a URL and method in the digest
$f->register("get", "http://www.kha0s.org/");
if ($f->lookup("get", "http://www.kha0s.org/")) {
print "hit!\n";
}
# access to raw header and digest contents
print "header: " . unpack("H*", $f->header) . "\n";
print "digest: " . unpack("H*", $f->digest) . "\n";
# access to digest header block elements
print "Current version: " . $f->current_version . "\n";
print "Required version: " . $f->required_version . "\n";
print "Capacity: " . $f->capacity . "\n";
print "Count: " . $f->count . "\n";
print "Deletion count: " . $f->del_count . "\n";
print "Size in bytes: " . $f->size_in_bytes . "\n";
print "Bits per entry: " . $f->bits_per_entry . "\n";
DESCRIPTION
This Perl module implements version 5 of the Cache Digest specification. For more information about Cache Digests, check out the Squid FAQ:
http://squid.nlanr.net/Squid/FAQ/FAQ-16.html
A copy of the specification is included with this distribution as the file cache-digest-v5.txt.
On a 400MHz Pentium II running Linux, this code has been benchmarked at 1866 lookups per second (roughly 112,000 per minute, or 560,000 in five minutes) against a cache digest of 500,000 URLs.
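A rough harness for reproducing this sort of figure might look like the following, using the core Benchmark module. It assumes that create accepts a capacity named argument (see create below), and the hostnames and URLs are made up:

  use Benchmark qw(timethese);
  use WebCache::Digest;

  my $d = new WebCache::Digest;
  $d->create(capacity => 500000);    # capacity as a named argument (see create below)

  # register a batch of synthetic URLs so that lookups have something to hit
  $d->register("get", "http://www.example.org/page$_.html") for (1 .. 10000);

  # time repeated lookups against the digest
  timethese(100000, {
      lookup => sub { $d->lookup("get", "http://www.example.org/page42.html") },
  });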
Cache Digests are summaries of the contents of WWW cache servers, which are made available to other WWW caches and may also be used internally within the server which generates them. They allow a WWW cache server such as Squid to determine whether or not a particular Internet resource (designated by its URL and the HTTP method which is being used to fetch it) was cached at the time the digest was generated. Unlike other mechanisms such as the Internet Cache Protocol (ICP - see RFC 2186), Cache Digests do not generate a continuous stream of request/response pairs, and do not add latency to each URL which is looked up.
Since we provide routines both to look up URLs in Cache Digests and to register them, it should be trivial to use this code to devise applications which take advantage of Cache Digests to make genuine WWW caches treat them as though they were WWW caches themselves. For example, a mirror server could register all the URLs it knows about for the resources it mirrors, so that cache servers which peer with it will always get a cache 'hit' on the mirror server for any reference to any of the mirrored resources.
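A minimal sketch of such a mirror-side digest builder might look like this (the capacity named argument, hostnames and file name are illustrative only):

  use WebCache::Digest;

  my $d = new WebCache::Digest;
  $d->create(capacity => 10000);     # assumed named argument (see create below)

  # a hypothetical list of every resource the mirror serves
  my @mirrored_urls = (
      "http://mirror.example.org/pub/gnu/README",
      "http://mirror.example.org/pub/gnu/ls-lR.gz",
  );
  $d->register("get", $_) for @mirrored_urls;

  $d->save("mirror.digest");         # make this file available to peering caches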
We also provide methods to store Cache Digests to disk and load them back in again, in addition to creating new Digests and fetching them from WWW caches which support the protocol. This can be used to take a 'snapshot' of the state of a WWW cache at any particular point in time, or for saving state if building a Cache Digest powered server.
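For example, a periodic snapshot of a peer's digest might be taken like this (hostname, port and file name are illustrative):

  use WebCache::Digest;

  my $d = new WebCache::Digest;
  if ($d->fetch("cache.example.org", 3128)) {
      $d->save("snapshot-" . time() . ".digest")
          or warn "couldn't save snapshot\n";
  } else {
      warn "couldn't fetch digest from cache.example.org:3128\n";
  }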
METHODS
We only describe public methods and method arguments here. Anything else should be considered private, at least for now.
- new
Constructor function, creates a new WebCache::Digest object. As yet this takes no arguments.
- create
Fills in the data structures for a WebCache::Digest object, given the number of slots to make available for URLs via the capacity parameter.
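For example (assuming capacity is passed as a named argument, as noted under BUGS):

  my $small = new WebCache::Digest;
  $small->create;                       # default capacity of 500 URLs

  my $large = new WebCache::Digest;
  $large->create(capacity => 100000);   # room for 100000 URLs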
- fetch
Tries to fetch a Cache Digest from the machine whose domain name (or IP address) and port number are given as the first and second parameters. Returns 0 on failure, and 1 on success.
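For example (the hostname and port are illustrative only):

  my $d = new WebCache::Digest;
  $d->fetch("cache.example.org", 3128)
      or die "failed to fetch digest from cache.example.org:3128\n";
  print STDERR $d->dump_header;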
- dump_header
Dumps the fields in the Cache Digest header as plain text, e.g. for debugging purposes.
- save
Saves the WebCache::Digest object to the file named by its parameter. Returns 0 on failure, 1 on success.
- load
Populates the WebCache::Digest object with the contents of the file named by its parameter. Returns 0 on failure, 1 on success.
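For example, a save/load round trip with the return values checked (the file name is illustrative):

  my $d = new WebCache::Digest;
  $d->create;
  $d->save("peer.digest") or die "save failed\n";

  my $copy = new WebCache::Digest;
  $copy->load("peer.digest") or die "load failed\n";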
- lookup
Given an HTTP method and URL (in that order) as parameters, looks them up in the Cache Digest. Returns 1 if the URL is a Cache Digest hit, or 0 otherwise.
- register
Given an HTTP method and URL (in that order) as parameters, registers them in the Cache Digest.
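For example:

  my $f = new WebCache::Digest;
  $f->create;
  $f->register("get", "http://www.example.org/index.html");

  if ($f->lookup("get", "http://www.example.org/index.html")) {
      print "digest hit\n";
  } else {
      print "digest miss\n";
  }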
- header
The raw Cache Digest header.
- digest
The raw Cache Digest object, sans header.
- current_version
The current version number from the Digest object.
- required_version
The required version number from the Digest object. Implementations should support at least this version for interoperability.
- capacity
The number of 'slots' for URLs in the Digest.
- count
The number of slots which have been filled.
- del_count
The number of deletion attempts. Squid doesn't currently delete individual URLs (e.g. when they become stale), but simply discards them the next time the Digest is rebuilt. Deleting one URL's information without affecting others is impossible with Cache Digests as currently conceived, since a given bit of the Digest may be used in looking up multiple URLs.
- size_in_bytes
The size of the Digest in bytes when stored in transfer format.
- bits_per_entry
The number of bits in the Cache Digest consumed for each entry.
- hash_func_count
The number of times the Cache Digest hash function (see the specification for more information on this) is called for each URL.
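As a quick sketch, the header accessors can be combined to summarise a digest which has already been fetched or loaded into $d:

  printf "capacity %d, count %d, %d bits/entry, %d hash functions, %d bytes\n",
      $d->capacity, $d->count, $d->bits_per_entry,
      $d->hash_func_count, $d->size_in_bytes;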
BUGS
This is a first release, and there are probably lots of hideous bugs waiting to catch you out - consider it pre-alpha code!
Something else to watch out for is that the module's name may well change, depending on feedback from the comp.lang.perl.modules Usenet newsgroup.
We use far too much memory - a more efficient approach to processing the Cache Digest needs to be hacked in.
We should be consistent and always use one form of arguments - either hash array or fixed position arguments. Mixing the two is confusing. However... most methods don't need named (hash array) arguments, except for create.
COPYRIGHT
Copyright (c) 1999, Martin Hamilton. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the JANET Web Cache Service, which is funded by the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils.
AUTHOR
Martin Hamilton <martinh@gnu.org>