NAME

File::HashCache::JavaScript - Minify and cache JavaScript files based on the hash of their contents.

SYNOPSIS

  use File::HashCache::JavaScript;

  my $jsh = File::HashCache::JavaScript->new();

  my $hashed_minified_path = $jsh->hash("my_javascript_file.js");
  # returns "my_javascript_file-7f4539486f2f6e65ef02fe9f98e68944.js"

  # If you are using Template::Toolkit you may want something like this:
  $template->process('template.tt2', {
      script => sub {
          my $path = $jsh->hash($_[0]);
          "<script src=\"js/$path\" type=\"text/javascript\"></script>\n";
      } } ) || die $template->error();

  # And in your template.tt2 file:
  #    [% script("myscript.js") %]
  # which will get replaced with something like:
  #    <script src="js/myscript-708b88f899939c4adedc271d9ab9ee66.js"
  #            type="text/javascript"></script>

DESCRIPTION

File::HashCache::JavaScript is an automatic versioning scheme for JavaScript based on the hash of the contents of the JavaScript files themselves. It aims to be painless for the developer and very fast.

File::HashCache::JavaScript solves a common web development problem: you update some JavaScript files on the server, and the end user ends up with mismatched versions because of browser or proxy caching. When your JavaScript files are referenced by their MD5 hash, the browser cannot give the end user mismatched versions, no matter what the caching policy is.

HOW TO USE IT

The best place to use File::HashCache::JavaScript is in your HTML template code. While generating a page to serve to the user, call the hash() method for each JavaScript file you are including in your page. The hash() method will return the name of the newly hashed file. You should use this name in the <script> tag of the page.

This means that when the browser gets the page you serve, it will have references to specific versions of JavaScript files.

METHODS

new(%options)

Initializes a new cache object. Available options and their defaults:

cache_dir => 'js'

Where to put the resulting minified JavaScript files.

minify => 1

Whether or not to minify the JavaScript.

cache_file => "$cache_dir/cache.json"

Where to put the cache control file.

hash($path_to_js_file)

This method...

  1. Reads the JavaScript file into memory. While reading, it understands C-style "#include" directives, so you can structure the code nicely.

  2. Uses JavaScript::Minifier::XS to minify the resulting code. If the minify option is set to 0 then it doesn't actually minify the code. This is useful for debugging.

  3. Calculates the MD5 hash of the minified code.

  4. Saves the minified code to a cache directory, naming it after its hash value, which makes the name globally unique (it also keeps its original name as a prefix so debugging is sane).

  5. Keeps track of the original script name, the minified script's globally unique name, and the dependencies used to build it. This is stored in a hash table and also saved to disk for future runs.

  6. Returns the name of the minified file that was stored in step 4. This name does not include the cache directory path because its physical file system path does not necessarily relate to its virtual server path.
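
The steps above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the helper names slurp_with_includes() and hashed_name() are hypothetical, the exact "#include" syntax (a quoted path resolved relative to the including file) is an assumption, and minification (step 2) is left out so that the sketch needs only core modules.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Basename qw(dirname fileparse);
use File::Spec;

# Step 1 (assumed syntax): replace each line of the form
#   #include "other.js"
# with the contents of that file, resolved relative to the including file.
# Every file read is recorded in @$deps for later staleness checks.
sub slurp_with_includes {
    my ($path, $deps) = @_;
    push @$deps, $path;
    open my $fh, '<', $path or die "Can't read $path: $!";
    my $out = '';
    while (my $line = <$fh>) {
        if ($line =~ /^\s*#include\s+"([^"]+)"/) {
            my $inc = File::Spec->catfile(dirname($path), $1);
            $out .= slurp_with_includes($inc, $deps);
        }
        else {
            $out .= $line;
        }
    }
    close $fh;
    return $out;
}

# Steps 3, 4 and 6: name the output after the MD5 of its contents,
# keeping the original base name as a prefix and dropping the directory.
sub hashed_name {
    my ($path, $content) = @_;
    my ($base, undef, $ext) = fileparse($path, qr/\.[^.]*/);
    return $base . '-' . md5_hex($content) . $ext;
}

# Usage: the same contents always map to the same name.
print hashed_name('js/my_javascript_file.js', "var x = 1;\n"), "\n";
```

Note that this sketch does not guard against include cycles; a real implementation would want to.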

There's actually a step 0 in there too: if the original JavaScript file name is found in the hash table, then it quickly stats its saved dependencies to see if they are newer than the saved minified file. If the minified file is up to date, then steps 1 through 5 are skipped.
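
A minimal sketch of such a freshness check, assuming it compares modification times; the helper name is_up_to_date() is hypothetical and the module's real bookkeeping may differ:

```perl
use strict;
use warnings;

# Returns true when $cached exists and is at least as new as every
# dependency. A missing cached file or a missing dependency counts as stale.
sub is_up_to_date {
    my ($cached, @deps) = @_;
    my $cached_mtime = (stat $cached)[9];
    return 0 unless defined $cached_mtime;
    for my $dep (@deps) {
        my $dep_mtime = (stat $dep)[9];
        return 0 if !defined $dep_mtime || $dep_mtime > $cached_mtime;
    }
    return 1;
}
```

The cost is one stat per dependency plus one for the cached file, which is why the common case stays cheap.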

FURTHER DISCUSSION ABOUT THIS TECHNIQUE

It keeps the JavaScript files in sync.

When the user refreshes the page, they will get it either from their browser cache or from our site. No matter where it came from, the JavaScript files it references are uniquely named, so it is impossible for the files to be out of sync with each other.

That is, if you get the old HTML file, you will reference all the old hashed JavaScript file names and everything will be mutually consistent (even though it is out of date). If you get the new HTML file, you are guaranteed to fetch the latest JavaScript files, because the new HTML only references new hashed names that aren't in your browser cache.

It's fast.

Everything is cached, so it only does the minification and hash calculations once per file. More importantly, the cache dir can be statically served by the web server, so it's exactly as fast as it would be if you served the .js files without any preprocessing. All this technique adds is a couple of filesystem stats per page load, which isn't much (Linux can do something like a million stats per second).

It's automatic.

If you hook in through Template::Toolkit then there's no script to remember to run when you update the site. When the template generates the HTML, the File::HashCache::JavaScript code lazily takes care of rebuilding any files that may have gone out of date.

It's stateless.

It doesn't rely on incrementing numbers ("js/v10/script.js" or even "js/script-v10.js"). We considered this approach but decided it was actually harder to implement and had no advantages over the way we chose to do it. This may have been colored by our choice of version control systems (we love the current wave of DVCSes) where monotonically increasing version numbers have no meaning.

It allows aggressive caching.

Since the files are named by their contents' hash, you can set the cache time on your web server to be practically infinite.
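
As an illustration, a server-side helper could choose a Cache-Control value from the file name alone. This is a sketch under assumptions: cache_control_for() is a hypothetical name, the one-year max-age is an arbitrary stand-in for "practically infinite", and the pattern assumes the "name-<32 hex digits>.js" form shown in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper: names ending in "-<32 hex digits>.js" are treated as
# content-addressed and cached for a year; anything else must be revalidated.
sub cache_control_for {
    my ($name) = @_;
    return $name =~ /-[0-9a-f]{32}\.js\z/
        ? 'public, max-age=31536000, immutable'
        : 'no-cache';
}

print cache_control_for('myscript-708b88f899939c4adedc271d9ab9ee66.js'), "\n";
```

In practice you would more likely set this once in the web server configuration for the whole cache directory, since every file in it is named by hash.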

It's very simple to understand.

It took less than a page of Perl code to implement the whole thing and it worked the first time with no bugs. I believe it's taken me longer to write this than it took to write the code (granted I'd been thinking about it for a long time before I started coding).

No files are deleted.

The old .js files are not automatically deleted (why bother, they are tiny), so people with extremely old HTML files will not have inconsistent pages when they reload. However:

The cache directory is volatile.

It's written so we can delete the entire cache dir at any point and it will just recreate what it needs to on the next request. This means there's no setup to do in your app.

You get a bit of history.

Do a quick ls -lrt of the directory and you can see which scripts have been updated recently and in what order they got built.

SEE ALSO

This code was adapted from the code we wrote for our site http://greenfelt.net/. Here is our original blog post talking about the technique: http://blog.greenfelt.net/2009/09/01/caching-javascript-safely/

This code is now a short wrapper around File::HashCache, a more generic (and currently undocumented) library.

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Copyright (C) 2009 David Caldwell and Jim Radford.

AUTHOR

  • David Caldwell <david@porkrind.org>

  • Jim Radford <radford@blackbean.org>
