NAME
Win32::UrlCache - parse Internet Explorer's history/cache/cookies
SYNOPSIS
use Win32::UrlCache;
my $index = Win32::UrlCache->new( 'index.dat' );
foreach my $url ( $index->urls ) {
    print $url->url, "\n";
}
Or you can use a callback function if you care about memory usage.
use Win32::UrlCache;
my $index = Win32::UrlCache->new( 'index.dat' );
$index->urls( callback => \&callback );
sub callback {
    my $entry = shift;
    my $url = $entry->url;
    $url =~ s/^Visited: //;
    $entry->url( $url );
    print $entry->url, "\n";
    return; # to prevent the entry from being kept in the object
}
If you want to know the title of the cached page (for Win32 only):
use Win32::UrlCache::Cache;
use Win32::UrlCache::Title;
use Encode;
my $cache = Win32::UrlCache::Cache->new;
$cache->urls( callback => \&callback );
sub callback {
    my $entry = shift;
    print $entry->url, "\n";
    my $title = Win32::UrlCache::Title->extract( $entry->filename );
    print encode( shiftjis => $title ), "\n\n" if $title;
    return; # to prevent the entry from being kept in the object
}
DESCRIPTION
This module parses so-called "Client UrlCache MMF Ver 5.2" index.dat files, which Internet Explorer uses to store its history, cache, and cookies. As of this writing, I've only tested it on Win2K + IE 6.0, but I hope it also works with some of the other versions of the OS and Internet Explorer. Note, however, that it is not based on an official/public MSDN specification, but on a hack found on the web. So, caveat emptor in every sense, especially for the redr entries ;)
Patches and feedback are welcome.
METHODS
new
receives a path to an 'index.dat', and parses it to create an object.
urls
returns the URL entries in the 'index.dat' file. Each entry has url, filename, headers, filesize, last_modified, and last_accessed accessors, and optionally a title accessor (note that some of them may return meaningless values). As of 0.02, it can receive a callback function; see CALLBACK below. As of 0.04, you can also pass ( extract_title => 1 ) to extract the title. However, this extraction is done after the callback has run, so if you want both to use a callback and to extract the title, you might want to put the extraction code into the callback, as shown in the synopsis.
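As a rough illustration of the accessors listed above, the following sketch prints one tab-separated line per URL entry. The describe helper and the command-line handling are made up for this example and are not part of the module; the format of the filesize and last_modified values is not specified here, so they are printed as returned.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::UrlCache): format one entry
# as a tab-separated line, turning undefined values into empty strings.
sub describe {
    my $entry = shift;
    return join "\t", map { defined $_ ? $_ : '' }
        $entry->url, $entry->filename, $entry->filesize, $entry->last_modified;
}

# Parse only when a path to an index.dat file is given on the command line.
if ( defined( my $path = shift @ARGV ) ) {
    require Win32::UrlCache;
    my $index = Win32::UrlCache->new( $path );
    print describe( $_ ), "\n" for $index->urls;
}
```

Saved as, say, dump_urls.pl (a hypothetical name), this would be run as "perl dump_urls.pl index.dat".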
leaks
almost the same as urls, but returns LEAK entries (if any) in the 'index.dat' file.
redrs
returns REDR entries (if any) in the 'index.dat' file. Each entry has a url accessor. As of 0.02, it can receive a callback function.
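A minimal sketch of listing LEAK and REDR entries side by side might look like the following. The report helper is hypothetical, and only the url accessor is used because it is the only accessor documented for REDR entries.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::UrlCache): build one
# "LABEL: url" line per entry.
sub report {
    my ( $label, @entries ) = @_;
    return map { $label . ': ' . $_->url } @entries;
}

# Parse only when a path to an index.dat file is given on the command line.
if ( defined( my $path = shift @ARGV ) ) {
    require Win32::UrlCache;
    my $index = Win32::UrlCache->new( $path );
    print "$_\n"
        for report( LEAK => $index->leaks ), report( REDR => $index->redrs );
}
```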
CALLBACK
By default, the urls, leaks, and redrs methods return all the entries found in the index, but this may eat lots of memory, especially if you use IE as your main browser. As of 0.02, those methods may receive a callback function, which takes an entry as its first (and, as of this writing, only) argument. If the callback returns true, the entry is stored in the ::UrlCache object; if the callback returns false, the entry is discarded after the callback is executed.
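To illustrate the discard behaviour, here is a sketch of a callback that only counts entries per host and always returns false, so that no entry is kept in the object. The tally function, the %count hash, and the host-matching pattern are assumptions made for this example; the pattern is deliberately naive and would, for instance, misread URLs that embed a user:password pair.

```perl
use strict;
use warnings;

my %count;    # host name => number of entries seen

# Hypothetical callback: tally the entry's host, then return false
# so the entry is discarded instead of being stored in the object.
sub tally {
    my $entry = shift;
    my $url   = $entry->url;
    $url =~ s/^Visited: //;    # prefix stripped the same way as in the synopsis
    my ($host) = $url =~ m{://([^/:]+)};
    $count{ lc $host }++ if defined $host;
    return;
}

# Parse only when a path to an index.dat file is given on the command line.
if ( defined( my $path = shift @ARGV ) ) {
    require Win32::UrlCache;
    my $index = Win32::UrlCache->new( $path );
    $index->urls( callback => \&tally );
    for my $host ( sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count ) {
        printf "%6d  %s\n", $count{$host}, $host;
    }
}
```

Since every entry is thrown away once it has been counted, memory use should depend on the number of distinct hosts rather than on the number of entries in the index.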
SEE ALSO
http://www.latenighthacking.com/projects/2003/reIndexDat/
AUTHOR
Kenichi Ishigaki, <ishigaki at cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2007 by Kenichi Ishigaki.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.