NAME

Pod::POM::Web::Indexer - fulltext search for Pod::POM::Web

SYNOPSIS

  perl -MPod::POM::Web::Indexer -e index

DESCRIPTION

Adds fulltext search capabilities to the Pod::POM::Web application. This requires Search::Indexer to be installed.

Queries may include plain terms, "exact phrases", '+' or '-' prefixes, boolean operators and parentheses. See Search::QueryParser for details.

METHODS

index

    Pod::POM::Web::Indexer->new->index(%options)

Walks through the directories in @INC and indexes all *.pm and *.pod files. Shadowed files (files whose loading path was already found in an earlier @INC directory) are skipped, and so are files that are too big.

By default, indexing is incremental: files whose modification time has not changed since the last indexing operation are not indexed again.
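The selection rules above (shadowing, size limit, unchanged modification time) can be sketched as follows. This is only an illustration, not the module's actual code: the function name files_to_index, its arguments, and the shape of the $last_seen hash are all assumptions made for the example.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec;

# Hypothetical helper: return the files that a scan would (re)index.
#   $dirs      - array ref of directories, searched in order (e.g. \@INC)
#   $last_seen - hash ref: relative path => mtime recorded at the previous run
#   $max_size  - size limit in bytes
sub files_to_index {
    my ($dirs, $last_seen, $max_size) = @_;
    my (%seen, @todo);

    # @INC may contain code refs or missing directories; ignore those
    for my $dir (grep { !ref($_) && -d $_ } @$dirs) {
        my @found;
        File::Find::find({
            no_chdir => 1,    # $_ then holds the full path of each entry
            wanted   => sub { push @found, $_ if /\.(?:pm|pod)\z/ && -f $_ },
        }, $dir);

        for my $file (sort @found) {
            my $rel = File::Spec->abs2rel($file, $dir);
            next if $seen{$rel}++;               # shadowed by an earlier directory
            my ($size, $mtime) = (stat $file)[7, 9];
            next if $size > $max_size;           # too big
            next if ($last_seen->{$rel} // -1) == $mtime;   # unchanged since last run
            push @todo, [$rel, $file, $mtime];
        }
    }
    return @todo;
}
```

A call such as files_to_index(\@INC, {}, 300 * 1024) would list every candidate file on a first run; passing the recorded modification times on a later run leaves only new or modified files. Reading the documented "300K" as 300 * 1024 bytes is an assumption.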

The available options are:

-max_size

Size limit (in bytes) above which files will not be indexed. The default value is 300K. Files above this limit are usually not worth indexing because they only contain big configuration tables (for example Module::CoreList or Unicode::CharName).

-from_scratch

If true, the previous index is deleted, so all files will be freshly indexed. If false (the default), indexing is incremental, i.e. files whose modification time has not changed will not be re-indexed.

-positions

If true, the indexer will also store word positions in documents, so that it can later answer "exact phrase" queries.

So if -positions is on, a search for "more than one way" will only return documents which contain that exact sequence of contiguous words; whereas if -positions is off, the query is equivalent to more AND than AND one AND way, i.e. it returns all documents which contain these words anywhere and in any order.

The option is off by default, because it requires much more disk space, and does not seem to be very relevant for searching Perl documentation.
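To make the difference between the two modes concrete, here is a toy in-memory sketch. It does not use Search::Indexer or Search::QueryParser; the functions words, match_all_words and match_phrase, and the simple \w+ tokenizer, are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Toy tokenizer (assumed for illustration): lowercase runs of word characters
sub words { return map { lc($_) } ($_[0] =~ /\w+/g) }

# -positions off: every query word must occur somewhere in the document
sub match_all_words {
    my ($query, $doc) = @_;
    my %in_doc  = map { $_ => 1 } words($doc);
    my $missing = grep { !$in_doc{$_} } words($query);
    return $missing == 0 ? 1 : 0;
}

# -positions on: the query words must occur contiguously and in order
sub match_phrase {
    my ($query, $doc) = @_;
    my @q = words($query);
    my @d = words($doc);
    return 0 unless @q;
    for my $i (0 .. @d - @q) {    # empty range when the document is too short
        my $hits = grep { $d[$i + $_] eq $q[$_] } 0 .. $#q;
        return 1 if $hits == @q;
    }
    return 0;
}
```

With the query "more than one way", the document "There is more than one way to do it" satisfies both functions, whereas "One way is more fun than none" satisfies only match_all_words, since it contains the four words but not in sequence.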

The index function is exported into the main:: namespace if perl is called with the -e flag, so that you can write

  perl -MPod::POM::Web::Indexer -e index

PERFORMANCE

On my machine, indexing a module takes an average of 0.2 seconds, except for some long and complex sources (this is why sources above 300K are ignored by default, see options above). Here are the worst figures (in seconds):

  Date/Manip            39.655
  DBI                   30.73
  Pod/perlfunc          29.502
  Module/CoreList       27.287
  CGI                   16.922
  Config                13.445
  CPAN                  12.598
  Pod/perlapi           10.906
  CGI/FormBuilder        8.592
  Win32/TieRegistry      7.338
  Spreadsheet/WriteExcel 7.132
  Pod/perldiag           5.771
  Parse/RecDescent       5.405
  Bit/Vector             4.768

The index will be stored in an index subdirectory under the module installation directory. The total index size should be around 10MB if -positions is off, and between 30MB and 50MB if -positions is on, depending on how many modules are installed.

TODO

 - highlights in shown documents
 - paging