

Archive::Chm - Performs read-only operations on HTML Help (.chm) files. Supported operations include enumerating the contents, extracting the contents, and getting information about a particular part of the archive.

The module supersedes Text::Chm, written by Domenico Delle Side. The method get_filelist() and all its dependencies are taken nearly as-is from Text::Chm.


 use Archive::Chm;

 my $test = Archive::Chm->new("TestPrj.chm");

 #make the module log its activity

 #set the auto-overwrite function to off

 #enumerate the contents of the archive
 $test->enum_files("listing.txt", 1);

 #extract all items in a certain directory

 #extract a single item from the archive
 $item = $test->extract_item("/Secret of Monkey Island Solution.html");

 #or just get the length of the item
 $test->get_item_length("/Secret of Monkey Island Solution.html");

 #get complete information about the chm archive
 @content = $test->get_filelist();
 foreach (@content) {
         print $_->{title} . "\n" if defined $_->{title};
         print $_->{path} . "\n";
         print $_->{size} . "\n";
 }

 #p.s. There are ways to check for errors, just look up each method and see. :)


Archive::Chm is a module that provides access to Microsoft Compiled HTML Help files (chm files). A lot of today's software ships with documentation in .chm format. However, Microsoft only provides viewing tools for its own OS and does not disclose the format specification.

Unofficial specs can be found at Matthew T. Russotto's site:

The module is basically a wrapper around Jed Wing's chmlib, a C library that provides access to all ITSS archives, though .chm is the only ITSS file type in use today. To use this module you need chmlib installed on your system. You can get it at:

Currently, access to .chm files is read-only; this may change if Jed Wing extends his library. Supported operations are listing the contents, extracting one or all items in the archive, and retrieving an item's length.


Archive::Chm has various methods, which can be divided into two categories: methods for working with the chm archive and methods that control how the module works (i.e. logging, overwrite).

Archive Handling Methods

These are methods to effectively work with the archive. All operations that can be performed on the archive are contained herein.


 $chmobj = Archive::Chm->new($filename)

Constructor of the Archive::Chm class. It only takes the filename as input and opens the target file, checking for errors. The name of the file is also saved.


 $chmobj->enum_files($out_file, $mode)

Method for enumerating the files in the archive. It takes the output file (if NULL, output goes to stdout) and the mode. Two modes are currently supported: mode 1 prints all files, including dependencies; mode 2 prints only the base .html files, without dependencies such as pictures.

Return values and meanings are: 0 (all OK), 1 (file exists, not overwriting due to AUTO_OVERWRITE = 0), 2 (output file cannot be created/overwritten), 3 (unknown error in enumeration API), 4 (unknown mode requested), 5 (no chm archive open).

Note that a return value of 1 also counts as success. The err variable is set to the return value unless that value is 0 or 1.
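
The return codes listed above can be turned into readable messages with a small lookup table. What follows is only a minimal sketch and not part of Archive::Chm: the helper names enum_status_message and enum_succeeded are hypothetical, and the message texts simply restate the list above.

```perl
use strict;
use warnings;

# Hypothetical lookup table (not part of Archive::Chm): the documented
# enum_files() return codes and their meanings.
my %ENUM_STATUS = (
    0 => 'all OK',
    1 => 'output file exists, not overwritten (AUTO_OVERWRITE = 0)',
    2 => 'output file cannot be created or overwritten',
    3 => 'unknown error in enumeration API',
    4 => 'unknown mode requested',
    5 => 'no chm archive open',
);

# Returns the message for a return code, or a fallback for unlisted codes.
sub enum_status_message {
    my ($rc) = @_;
    return exists $ENUM_STATUS{$rc}
        ? $ENUM_STATUS{$rc}
        : "unrecognized return code $rc";
}

# Codes 0 and 1 both count as success, as noted above.
sub enum_succeeded {
    my ($rc) = @_;
    return ($rc == 0 || $rc == 1) ? 1 : 0;
}
```

A caller could then store the result of enum_files() in $rc and warn with enum_status_message($rc) whenever enum_succeeded($rc) is false.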



Method for extracting all files from the .chm archive to a given directory. It returns 0 on success, 1 when there was an unknown error in the enumeration API, and 2 when there is no open archive. The err value is set to the return value unless all went well.


 $html = $chmobj->extract_item($item_path)

Method for retrieving an item, identified by its path relative to the .chm archive's root. It returns a string with the file's contents. On error, it returns NULL and sets the error flag and message.
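
Since extract_item() hands back the contents as a string, saving an item to disk takes only a few lines. The sketch below is an illustration, not part of the module: save_item is a hypothetical helper, and it assumes that the NULL return described above appears as undef on the Perl side.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::Chm): extracts one item and
# writes it to $dest. $chm is any object that offers extract_item().
# Assumption: a failed extraction is reported as undef.
sub save_item {
    my ($chm, $item_path, $dest) = @_;

    my $data = $chm->extract_item($item_path);
    return 0 unless defined $data;    # extraction failed, nothing written

    open(my $out, '>', $dest) or die "Cannot write $dest: $!";
    binmode($out);                    # items may be binary (pictures etc.)
    print {$out} $data;
    close($out) or die "Cannot close $dest: $!";
    return 1;
}
```

The helper returns 1 when the file was written and 0 when the item could not be extracted, so the caller can decide whether to inspect the error flag.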


 @contents = $chmobj->get_filelist()

Method for getting a list of hash references, one for each element of the archive. Each hash has at most three keys: "title", "path" and "size". They are self-explanatory.
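
Because the result is a plain list of hash references, it can be processed with ordinary list operations. The two helpers below are hypothetical and only sketch the idea; they rely on nothing beyond the three keys named above and treat "title" as optional.

```perl
use strict;
use warnings;

# Hypothetical helper: sums the "size" key over the given entries.
# Entries without a size are skipped.
sub total_size {
    my (@entries) = @_;
    my $total = 0;
    $total += $_->{size} for grep { defined $_->{size} } @entries;
    return $total;
}

# Hypothetical helper: returns the paths of all entries that have a title.
sub titled_paths {
    my (@entries) = @_;
    return map { $_->{path} } grep { defined $_->{title} } @entries;
}
```

For example, total_size($chmobj->get_filelist()) would give the combined size of all elements in the archive.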


 $length = $chmobj->get_item_length($item_path)

Method for getting an item's length. The item is identified by its path relative to the archive's root. If the item could not be resolved, the return value is 0 and the error variable is set to 1; otherwise the return value is the actual length of the item.


 $filename = $chmobj->get_name()

Method for getting the filename of the attached .chm file.



Sometimes you may want to close the associated .chm file while letting the Archive::Chm object live on. If you do so, you'll need to open it again by using open_file.



While the file is automatically opened at object creation, if you close it during the object's lifetime, you will need to reopen it using this method. Returns 0 on success, 1 on error.

Control Methods

Methods for module control. It should be noted that the error flag is never reset by the module and should be manually reset after it has been checked. Archive::Chm only sets the error flag when an error occurs.



Gets the current error code if $code = -1, otherwise sets it to $code.



Gets the string containing the error message corresponding to the last error encountered.


 $log_filename = $chmobj->set_logfile();

Method used to get/set the logfile of the module. Note that this is a static data member and is therefore shared by all Archive::Chm objects.


 $owr = $chmobj->set_overwrite(-1);

Method used to get/set the static AUTO_OVERWRITE flag. Works just like the logfile function above, except that getting the flag requires a value of -1 to be passed.
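
Since the flag is static, changing it affects every Archive::Chm object in the program. One way to limit the effect is to save the current value, change it for a single operation, and restore it afterwards. The helper below, with_overwrite, is hypothetical and only a sketch; it assumes nothing beyond the get/set behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::Chm): runs $code with
# AUTO_OVERWRITE set to $value, then restores the previous setting.
# $chm is any object that offers the set_overwrite() get/set method.
sub with_overwrite {
    my ($chm, $value, $code) = @_;

    my $previous = $chm->set_overwrite(-1);   # -1 reads the current flag
    $chm->set_overwrite($value);

    my @result = eval { $code->() };
    my $error  = $@;

    $chm->set_overwrite($previous);           # restore even if $code died
    die $error if $error;
    return wantarray ? @result : $result[0];
}
```

A call such as with_overwrite($chmobj, 0, sub { $chmobj->enum_files("listing.txt", 1) }) would run the enumeration with overwriting switched off and leave the flag as it was before.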


 $verb = $chmobj->set_verbose(-1)

Yet another function to get/set the VERBOSE flag. Works just like the previous one for AUTO_OVERWRITE.

See Also


HTML Help specs:

Domenico Delle Side's module, Text::Chm. It is simpler than Archive::Chm, but still offers good support for HTML Help archives, including the very useful get_filelist() method.


Alexandru Palade, Netsoft S.R.L.

The Text::Chm functions are the work of Domenico Delle Side.


Copyright (C) 2005 Alexandru Palade, Netsoft S.R.L.

All rights reserved.