WE_Frontend::Indexer::Htdig - interface to the htdig search engine
    use WE_Frontend::Indexer::Htdig;
    my $results = WE_Frontend::Indexer::Htdig::search(-words => "word");
This module is an interface to the htdig search engine. The search function returns a Perl hash reference containing the results.
The search function takes the following arguments:
A string with the words to search for, passed as -words. Multiple words are space-separated. This argument is required.
Specify a different htdig configuration file, otherwise the default htdig.conf is used.
(Optional) Specify a language. The configuration parameter given by conf may contain %{lang} placeholders which are substituted by the value of this argument.
Output some diagnostics to stderr.
Set -httpshack to a true value when operating on an https server. htdig does not handle SSL, so a parallel http server should be set up for the indexing. With the https hack, the URLs in the search result list are translated at template display time.
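The URL translation behind the https hack can be pictured with a small sketch. This is not the module's actual code: the helper name to_https is hypothetical, and the sketch assumes the translation only swaps the scheme of URLs that were indexed through the parallel http server.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite an http URL, as stored by htdig,
# to its https counterpart. Other URLs pass through unchanged.
sub to_https {
    my ($url) = @_;
    (my $translated = $url) =~ s{^http://}{https://}i;
    return $translated;
}

# Made-up example URLs.
my @indexed = (
    "http://www.example.com/a.html",
    "https://www.example.com/b.html",
);
print to_https($_), "\n" for @indexed;
```

In a real setup the host name or port of the parallel http server may differ as well, in which case a plain scheme swap would not be enough.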
The result is a hash reference with the following keys:
The list key holds an array reference with the search results. See below.
This variable is set to a true value if the search produces no results. Also detectable by an empty result list.
pageurllist holds a list of URLs for the result pages 1 .. 10.
The corresponding page numbers for pageurllist. Note that Perl and Template arrays start at index 0, which corresponds to page 1.
Hold the URLs for the previous and the next result page, respectively.
Usually not needed: the numbers of the previous and the next result page, respectively. In practice you would label these links "Prev"/"Next" or "<"/">".
There are more keys. For a complete list refer to the htdig documentation at http://www.htdig.org, htsearch, Templates. Note that the original template variable names are converted to lowercase.
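The index offset mentioned for the page lists can be illustrated with a short sketch. It works on hand-made sample data rather than on a real search result; the URLs and the helper name page_links are made up for illustration, and only the pageurllist key is taken from the description above.

```perl
use strict;
use warnings;

# Sample data shaped like the documented result; the URLs are made up.
my $results = {
    pageurllist => [
        "/search?words=word;page=1",
        "/search?words=word;page=2",
        "/search?words=word;page=3",
    ],
};

# Hypothetical helper: pair each URL with its human-readable page
# number. Array index 0 corresponds to page 1.
sub page_links {
    my ($res) = @_;
    my $urls = $res->{pageurllist} || [];
    return map { +{ number => $_ + 1, url => $urls->[$_] } } 0 .. $#$urls;
}

printf "%d: %s\n", $_->{number}, $_->{url} for page_links($results);
```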
The value of list is an array reference with the matches. Each match is a hash reference with the following keys:
The URL of the page. See also the -httpshack option above.
The title of the page, as specified by the <title> HTML tag.
The first lines of text in the document.
The date and time the document was last modified. See also the documentation of the iso_8601 config variable in htdig.conf.
The complete list is also in the htdig documentation at http://www.htdig.org, htsearch, Templates.
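As an illustration of how the match list might be consumed, here is a minimal sketch working on hand-made sample data. The match keys url and title are assumptions, derived from htdig's URL and TITLE template variables converted to lowercase; the helper name format_matches is hypothetical.

```perl
use strict;
use warnings;

# Sample data shaped like the documented result. The match keys
# "url" and "title" are assumptions (htdig's URL and TITLE template
# variables, lowercased); the values are made up.
my $results = {
    list => [
        { url => "http://www.example.com/a.html", title => "Page A" },
        { url => "http://www.example.com/b.html", title => "Page B" },
    ],
};

# Hypothetical helper: render one line per match, or a short notice
# when the result list is empty.
sub format_matches {
    my ($res) = @_;
    my $list = $res->{list} || [];
    return "No matches found.\n" unless @$list;
    my $out = "";
    for my $match (@$list) {
        $out .= sprintf "%s <%s>\n", $match->{title}, $match->{url};
    }
    return $out;
}

print format_matches($results);
```

With a real result, the hash reference returned by the search function would take the place of the sample data.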
It is best to use the original conf/htdig.tpl.conf file found in the webeditor distribution. The indexing program in webeditor uses this template file and fills it with the configuration found in WEsiteinfo. For a first-time installation and configuration, see also htdig.txt in the webeditor/doc directory.
To override the searchindexer path (default is "rundig" without a path):
    $searchengine->searchindexer("/usr/local/bin/rundig");
To set the htdig configuration template and the target htdig configuration file (these settings are highly recommended):
    $searchengine->htdigconftemplate($paths->uprootdir . "/conf/htdig.tpl.conf");
    $searchengine->htdigconf($paths->uprootdir . "/conf/htdig.%{lang}.conf");
where $paths is the WEsiteinfo::Paths object documented in WE_Frontend::Info. If the configuration file should not be language dependent, then use
    $searchengine->htdigconf($paths->uprootdir . "/conf/htdig.conf");
instead.
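How a %{lang} placeholder in the configuration path might be expanded can be sketched as follows. The helper name expand_lang is hypothetical and the code is not taken from the module; it only assumes that every occurrence of %{lang} is replaced by the language value.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every %{lang} placeholder in a
# configuration path with the given language code.
sub expand_lang {
    my ($template, $lang) = @_;
    (my $path = $template) =~ s/%\{lang\}/$lang/g;
    return $path;
}

print expand_lang('/conf/htdig.%{lang}.conf', 'en'), "\n";
```

A path without a placeholder is returned unchanged, which matches the language-independent setup.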
If you decide to make your own htdig.conf, put at least the following lines into the configuration file:
    template_map: Long long ${common_dir}/long.html \
                  Short short ${common_dir}/short.html \
                  Perl perl ${common_dir}/perl/match.pl
    template_name: perl
    search_results_header: ${common_dir}/perl/header.pl
    search_results_footer: ${common_dir}/perl/footer.pl
    nothing_found_file: ${common_dir}/perl/nomatch.pl
${common_dir}/perl should be a link to the directory .../lib/WE_Frontend/Indexer/htdig_common.
htdig is available e.g. from this location: http://www.htdig.org/files/snapshots/htdig-3.2.0b5-20040404.tar.gz.
To compile and install htdig from scratch, the following configure line could be used to create a path layout similar to the RedHat one:
    sh configure --prefix=/usr \
        --with-search-dir=/usr/share/htdig \
        --with-image-dir=/usr/share/htdig \
        --with-cgi-bin-dir=/usr/bin \
        --with-config-dir=/etc \
        --with-database-dir=/usr/share/htdig
There are many pitfalls. Mind the permissions in particular: rundig may use the default database directory (/usr/local/share/htdig/database or similar) as the temporary directory for sorting, which fails if the apache user (usually nobody or www) has no permission to write to this directory. In this case, change the TMPDIR definition in rundig or set appropriate write permissions.
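A quick way to check for this problem is to test whether a directory is writable, run as the same user the web server uses. The following is only a sketch: the helper name dir_is_writable is hypothetical, and the system temporary directory stands in for the directory that rundig actually uses on your installation.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: true if the path is an existing directory
# that the current user may write to.
sub dir_is_writable {
    my ($dir) = @_;
    return (-d $dir && -w $dir) ? 1 : 0;
}

# Stand-in directory; substitute the TMPDIR or database directory
# that rundig uses on your system.
my $dir = File::Spec->tmpdir;
print "$dir is ", (dir_is_writable($dir) ? "" : "not "), "writable\n";
```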
Slaven Rezic - slaven@rezic.de
htdig(1).