

HtDig::Site - Perl extension for managing a single ht://Dig configuration


  use HtDig::Site;
  $site = new HtDig::Site(conf_path=>"/opt/www/conf/htdig.conf");
  $site->setting("maintainer", "");


HtDig::Site provides an object for manipulating configuration files for ht://Dig, a popular open source content indexing and searching system. The Site object lets you open a configuration file and modify its settings, including custom settings that don't directly relate to the ht://Dig executables. It also lets you perform database operations such as site index runs, database merges, and fuzzy index creation.


  $site = new HtDig::Site(conf_path=>"/opt/www/conf/htdig.conf", trace_lvl=>1, site_name=>"Default");

new creates a new Site object and returns it. The only required parameter is conf_path. auto_create allows the object to create the file specified in conf_path if it doesn't already exist. trace_lvl is mainly for debugging problems, and site_name is really only meant to be used by the Config object, which provides you with named access to registered configuration files. Wouldn't you rather be able to use the name "My Site", instead of "/opt/www/conf/htdig.conf"? That's what Config does for you. But there's nothing stopping you from naming a Site object yourself when you create it explicitly; the name just won't persist beyond the current session.

  $site->setting("exclude_urls", ["http://localhost/cgi-bin", "http://localhost/images"]);
  @exclude_urls = $site->setting("exclude_urls");

Modifies or retrieves a setting in the configuration file. Changes are held in memory only; you must save the file to persist them.

As illustrated in the example, if the datatype of the setting you are modifying is "string list", you can pass in an array reference. Alternatively, you can pass in a space-separated list of values, and the Site object will convert it to an array reference by splitting on whitespace. The array reference is for internal representation only, and is converted back to a space-separated list when the config file is written to disk.
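The conversion described above can be sketched in a few lines of plain Perl. This is only an illustration of the idea, not the module's actual code; the helper names to_string_list and to_config_value are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HtDig::Site): accept either an array
# reference or a whitespace-separated string and always return an array
# reference.
sub to_string_list {
    my ($value) = @_;
    return [@$value] if ref $value eq 'ARRAY';    # copy the caller's list
    return [ split ' ', $value ];                 # split on runs of whitespace
}

# Hypothetical helper: turn the internal array reference back into the
# space-separated form that would be written to the configuration file.
sub to_config_value {
    my ($list) = @_;
    return join ' ', @$list;
}

my $from_ref    = to_string_list(["http://localhost/cgi-bin", "http://localhost/images"]);
my $from_string = to_string_list("http://localhost/cgi-bin   http://localhost/images");

print to_config_value($from_ref), "\n";
print scalar(@$from_string), " entries\n";
```

Both calls end up with the same two-element list, which is why either form can be passed for a "string list" setting.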


When a configuration file is first loaded from disk (or saved for the first time), its modification time is stored in memory and is compared against the file on disk when the save method is called. If you suspect someone might have touched the file on disk and wish to sync up with its current version, you can use the refresh method. Any changes made in memory since the last save will be lost.
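The modification-time check can be pictured roughly as follows. This is a standalone sketch using only core Perl, under the assumption that the comparison is a simple inequality test on the stored mtime; the helper names mtime_of and changed_on_disk are invented here, and the module's own logic may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the file's modification time in epoch seconds.
sub mtime_of {
    my ($path) = @_;
    my @st = stat($path) or die "Cannot stat $path: $!";
    return $st[9];
}

# True if the file on disk has changed since $remembered_mtime was recorded.
sub changed_on_disk {
    my ($path, $remembered_mtime) = @_;
    return mtime_of($path) != $remembered_mtime;
}

# Create a throwaway file to stand in for a configuration file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "maintainer: nobody\n";
close $fh or die "Cannot close $path: $!";

my $loaded_mtime = mtime_of($path);    # remembered at load time
print changed_on_disk($path, $loaded_mtime) ? "stale\n" : "in sync\n";

# Simulate someone else touching the file one minute later.
utime $loaded_mtime + 60, $loaded_mtime + 60, $path or die "utime failed: $!";
print changed_on_disk($path, $loaded_mtime) ? "stale\n" : "in sync\n";
```

When the second check reports a stale copy, that is the situation in which you would call refresh before making further changes.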


Saves the in-memory settings to disk. If the optional save_to parameter is provided, the file is written to that path; otherwise, it is written to the original conf_path that was provided when the object was created.


Initiates a site indexing run.


Generates a fuzzy index of the type specified in the type parameter.

  $site->merge(merge_site=>"/opt/www/conf/othersite.conf",not_words=>1, not_documents=>0, work_files=>1);

Performs a merge using htmerge, merging the configuration file specified in merge_site into the current Site's database. not_words, not_documents, and work_files correspond to the htmerge command line options -w, -d, and -a, respectively.
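To make the option mapping concrete, here is a small sketch that turns the three named parameters into the corresponding htmerge switches. It only illustrates the correspondence stated above; the helper merge_flags is hypothetical, and the way the module actually builds its htmerge command line is not shown in this documentation.

```perl
use strict;
use warnings;

# Option name => htmerge switch, as stated in the text above.
my %FLAG_FOR = (
    not_words     => '-w',
    not_documents => '-d',
    work_files    => '-a',
);

# Hypothetical helper: return the switches for every option that is set
# to a true value, in a fixed order.
sub merge_flags {
    my (%opts) = @_;
    return map { $FLAG_FOR{$_} }
           grep { $opts{$_} }
           qw(not_words not_documents work_files);
}

# Same options as in the merge example above.
my @flags = merge_flags(not_words => 1, not_documents => 0, work_files => 1);
print "@flags\n";    # prints "-w -a"
```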

  my @stock_settings = keys %{$site->datatypes};

Returns a reference to a hash that describes the datatypes of ht://Dig configuration file settings. The datatypes themselves are documented in the ht://Dig documentation. The example uses the hash to get a list of the stock settings that htdig recognizes.

The hash structure looks something like this:

   setting_name => 'DATATYPE'

...where setting_name is the name of the configuration file setting, such as "maintainer", and DATATYPE is the documented datatype of the setting. There are four currently documented datatypes: string, string list, number, and boolean.

This feature is probably not very useful to the Perl scripter using the Site object, but it is provided in case some input validation needs to be done or some options need to be presented to the user. It's a good way to present a list of the stock settings that can appear in a configuration file.
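As an illustration of the input-validation use case, the sketch below checks a value against a datatype hash. The hash here is a hand-written stand-in with four entries rather than the one returned by datatypes, the lowercase datatype strings and the accepted boolean spellings are assumptions, and validate_setting is a hypothetical helper, not a method of the Site object.

```perl
use strict;
use warnings;

# Hand-written stand-in for the hash returned by datatypes; the real
# hash covers every stock setting. Datatype spellings are assumed.
my %datatypes = (
    maintainer     => 'string',
    exclude_urls   => 'string list',
    max_hop_count  => 'number',
    case_sensitive => 'boolean',
);

# Hypothetical validator: returns an error message, or undef if the
# value looks acceptable for the setting's datatype.
sub validate_setting {
    my ($name, $value) = @_;
    my $type = $datatypes{$name};
    return "'$name' is not a stock setting" unless defined $type;
    if ($type eq 'number') {
        return "'$name' expects a number"
            unless $value =~ /^-?\d+(?:\.\d+)?$/;
    }
    elsif ($type eq 'boolean') {
        # Accepted spellings are an assumption made for this sketch.
        return "'$name' expects a boolean"
            unless $value =~ /^(?:true|false|yes|no|1|0)$/i;
    }
    return;    # strings and string lists accept any text in this sketch
}

for my $check (['max_hop_count', 'ten'], ['case_sensitive', 'true'], ['my_custom', 'x']) {
    my $error = validate_setting(@$check);
    print defined $error ? "$error\n" : "$check->[0] ok\n";
}
```

Because the Site object also allows custom settings, a "not a stock setting" result is better treated as a notice to the user than as a hard error.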

  print $site->errstr . "\n";

Returns the most recently generated error. The Site object doesn't bother with error numbers, since they would be arbitrary and difficult to track. This may change if demand is high enough for error numbers.


  • Timed digs are broken. Needs work.


James Tillman, CPAN PAUSE ID: jtillman