
NAME

NexTrieve::Index - handle indexing with NexTrieve

SYNOPSIS

 use NexTrieve;
 die unless NexTrieve::Index->executable;

 $ntv = NexTrieve->new( | {method => value} );

 # using direct access
 $resource = $ntv->Resource( | file | xml | {method => value} );
 $index = $ntv->Index( | file | $resource | {method => value}, | {}, | @files );

 # use version control (new -> current -> old)
 $index->update_start( | incremental );

 # indexing created XML on the fly
 $docseq = $index->Docseq;
 $docseq->add( xml );
 $docseq->done;

 # indexing pre-created XML stored in files
 $index->index( file1,file2,file3 );

 # do it all yourself with created XML on the fly
 $handle = $index->stream;
 print $handle xml;
 close( $handle );

 $result = $index->result;

 # finish version control (new -> current -> old)
 $index->update_end;
 $ntv->Daemon( $resource )->restart;

DESCRIPTION

NexTrieve::Index provides the Index object of the Perl support for NexTrieve. Do not create it directly; create it through the Index method of the NexTrieve object.

CREATING AN INDEX

The process of creating a NexTrieve index is straightforward. First create a directory that will contain the result of the indexing process (also referred to as the indexdir). Then create a NexTrieve::Index object whose Resource points to that directory, and perform the indexing either by calling the index method or by creating a magic Docseq object, which can be used by any of the other NexTrieve::xxx modules.
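The first step, preparing the indexdir, can be sketched in plain Perl as follows. This is a minimal illustration using only core modules; the helper prepare_indexdir is hypothetical and not part of NexTrieve::Index.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper: make sure the indexdir exists (creating any
# intermediate directories) and is writable before indexing starts.
sub prepare_indexdir {
    my ($indexdir) = @_;
    make_path($indexdir, { error => \my $errors });
    die "Cannot create $indexdir\n" if $errors && @$errors;
    die "$indexdir is not a writable directory\n"
        unless -d $indexdir && -w _;    # "_" reuses the stat from -d
    return $indexdir;
}
```

Calling it on an indexdir that already exists is harmless, so it can be run before every indexing job.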

UPDATING AN INDEX

An existing index can be updated in two ways: either incrementally (indexing only the documents in a document sequence that are new or changed) or by doing a full re-index.

If you are running a search service on an existing index, you do not want that service to be interrupted by the indexing process. To create a new index without interrupting the running search service, the following procedure is used:

1. create a directory with the same name as the indexdir and the extension .new

As a first step, a directory is created with the same name as the original indexdirectory, but with the extension ".new". If such a directory already exists, all files in it are removed.

2. copy the current index into the new indexdirectory if updating incrementally

If an incremental update was requested, all the files necessary for that index are copied to the new directory.

3. temporarily override the indexdir setting

To index into this new indexdirectory, the indexdir setting is temporarily overridden.

4. perform the indexing

Do the indexing that needs to be done: either re-index all documents, or index only the documents that are new or changed, depending on whether an incremental update was requested.

5. swap indexdirectories and restore indexdir setting

Once the indexing is done, there are two indexdirectories: the live one and the one with the ".new" extension. The live one is renamed to the same name with the ".old" extension, and the ".new" extension is removed from the new indexdirectory. No further action is needed if you search on demand using the "ntvsearch" program.

6. restart server process

If there is a NexTrieve server process running using the original indexdirectory, it should be stopped and started again using the new indexdirectory (that now has the same name as the original indexdirectory).

Steps 1, 2 and 3 of this procedure are performed by the update_start method. Step 4 can be done in many ways, e.g. by using the index method or by having other modules use the magic Docseq object. Step 5 is performed by the update_end method. Step 6 can be performed by the "restart" method of the NexTrieve::Daemon module.
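The directory handling of steps 1, 2 and 5 can be sketched with core Perl modules as follows. This is not the module's implementation: the helper names are hypothetical, copying every plain file stands in for "all the files necessary for that index", and discarding an existing ".old" directory before the swap is an assumption.

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Copy qw(copy);
use File::Spec;

# Hypothetical helper for steps 1 and 2. Returns the ".new" directory that
# the indexing should use instead of the indexdir (step 3).
sub sketch_update_start {
    my ($indexdir, $incremental) = @_;
    my $new = "$indexdir.new";

    # Step 1: create "<indexdir>.new", emptying it if it already exists.
    remove_tree($new) if -d $new;
    make_path($new);

    # Step 2: for an incremental update, copy the current index files.
    if ($incremental && -d $indexdir) {
        opendir(my $dh, $indexdir) or die "Cannot read $indexdir: $!";
        while (defined(my $file = readdir $dh)) {
            my $from = File::Spec->catfile($indexdir, $file);
            next unless -f $from;                 # skip ".", ".." and subdirs
            copy($from, File::Spec->catfile($new, $file))
                or die "Cannot copy $from: $!";
        }
        closedir $dh;
    }
    return $new;
}

# Hypothetical helper for step 5: live -> ".old", ".new" -> live.
sub sketch_update_end {
    my ($indexdir) = @_;
    my ($new, $old) = ("$indexdir.new", "$indexdir.old");
    remove_tree($old) if -d $old;                 # assumption: drop older ".old"
    if (-d $indexdir) {
        rename($indexdir, $old) or die "Cannot rename $indexdir: $!";
    }
    rename($new, $indexdir) or die "Cannot rename $new: $!";
}
```

Between the two calls, step 4 would write into the returned ".new" directory; a running server would still need the restart of step 6.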

CLASS METHODS

These methods are available as class methods.

executable

 $executable = NexTrieve::Index->executable;
 ($program,$expiration,$software,$database) = NexTrieve::Index->executable;

Return information about the associated NexTrieve program "ntvindex".

The first output parameter contains the full program name of the NexTrieve executable "ntvindex", or the empty string if "ntvindex" could not be found or is not executable by the effective user. It can therefore be used as a flag, and it is the only value returned in scalar context.
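A lookup that behaves like the first output parameter (full path, or the empty string as a false flag) might look like this. It is a sketch under assumptions: the helper find_executable is hypothetical, and searching the directories in PATH is only a guess at where the module looks for "ntvindex".

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of $program if it is found in
# one of @dirs (default: the directories in PATH) and is executable by the
# effective user; otherwise return the empty string.
sub find_executable {
    my ($program, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $full = File::Spec->catfile($dir, $program);
        return $full if -f $full && -x _;   # -x tests the effective uid/gid
    }
    return '';
}
```

Because a missing program yields the empty string, the result can be tested directly, as in: find_executable('ntvindex') or die "ntvindex not found".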

If this method is called in list context, an attempt is made to execute the NexTrieve program "ntvindex" to obtain additional information, and the following output parameters are returned as well.

The second output parameter returns the expiration date of the license that NexTrieve uses by default. If available, the date is returned as a datestamp (YYYYMMDD).

The third output parameter returns the version of the NexTrieve software that is being used. It is a string of the form "M.m.rr", where "M" is the major release number, "m" is the minor release number and "rr" is the build number.

The fourth output parameter returns the version of the internal database format created by the NexTrieve software that is being used. It is a string of the same "M.m.rr" form.
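Because both version strings use the "M.m.rr" form, comparing them field by field as numbers avoids string-comparison surprises (as strings, "2.1.10" sorts before "2.1.9"). The helpers below are hypothetical illustrations, not part of the module; they also show that a YYYYMMDD datestamp can be compared with today's date numerically.

```perl
use strict;
use warnings;

# Split an "M.m.rr" version string into its three numeric fields.
sub parse_version {
    my ($version) = @_;
    my @parts = $version =~ /^(\d+)\.(\d+)\.(\d+)$/
        or die "Unexpected version format: $version";
    return @parts;
}

# Compare two "M.m.rr" strings field by field: -1, 0 or 1, like <=>.
sub compare_versions {
    my @a = parse_version($_[0]);
    my @b = parse_version($_[1]);
    for my $i (0 .. 2) {
        my $cmp = $a[$i] <=> $b[$i];
        return $cmp if $cmp;
    }
    return 0;
}

# True if a YYYYMMDD datestamp lies before today; undef when no
# datestamp is available.
sub license_expired {
    my ($expiration) = @_;
    return undef unless defined $expiration && $expiration =~ /^\d{8}$/;
    my ($mday, $mon, $year) = (localtime)[3, 4, 5];
    my $today = sprintf '%04d%02d%02d', $year + 1900, $mon + 1, $mday;
    return $expiration < $today;
}
```

The inputs could come from the list-context call shown above, such as my (undef, $expiration, $software, $database) = NexTrieve::Index->executable.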

OBJECT METHODS

These methods return objects.

Docseq

 $docseq = $index->Docseq( | extrafile | extrahandle );

The "Docseq" method returns a special-purpose streaming NexTrieve::Docseq object: any data added to it is indexed immediately.

Each input parameter specified indicates either:

 - a filename to store a copy of the XML indexed
 - a handle to write a copy of the XML to

This lets you index generated XML immediately while still keeping a copy of it available for reference.
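The idea of indexing generated XML while keeping a copy can be illustrated with a small "tee" in plain Perl. This is a hypothetical sketch, not how NexTrieve::Docseq is implemented: make_tee accepts the same two kinds of extra outputs described above, a filename or a handle reference, and writes every chunk to the main handle and to all of them.

```perl
use strict;
use warnings;

# Hypothetical helper: open each extra output (a filename, or an already
# open handle reference) and return a sub that writes every chunk of XML
# to the main (indexing) handle and to all extra outputs.
sub make_tee {
    my ($main, @extra) = @_;
    my @handles = ($main);
    for my $extra (@extra) {
        if (ref $extra) {                 # already a handle
            push @handles, $extra;
        }
        else {                            # a filename: open it for writing
            open(my $fh, '>', $extra) or die "Cannot open $extra: $!";
            push @handles, $fh;
        }
    }
    return sub {
        my ($xml) = @_;
        print {$_} $xml or die "Write failed: $!" for @handles;
    };
}
```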

See the NexTrieve::Docseq module for more information.

Resource

 $resource = $index->Resource;
 $index->Resource( $resource | file | xml | {method => value} );

The "Resource" method is primarily intended to allow you to obtain the NexTrieve::Resource object that is (indirectly) created when the NexTrieve::Index object is created. If necessary, it can also be used to create a new NexTrieve::Resource object associated with the NexTrieve::Index object.

See the NexTrieve::Resource module for more information.

ResourceFromIndex

 $resource = $index->ResourceFromIndex;

If for whatever reason the resource-file of an already existing NexTrieve index is lost, then the "ResourceFromIndex" method can be used to create the basic NexTrieve::Resource object that corresponds to that index.

OTHER METHODS

The following methods set and return other aspects of the NexTrieve::Index object.

index

 $error = $index->index( file1,file2,file3 );

The "index" method allows for a quick and dirty index of document sequences that have been created previously and saved in files. The input parameters specify the names of the files that should be indexed.

The output parameter returns the exit value of the "ntvindex" program. A value of 0 indicates success; any other value indicates an error in the indexing process, in which case an error is also raised. Any specific error messages from the "ntvindex" program can be obtained through the result method.
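The exit value is the program's exit status as Perl reports it after running it. A minimal sketch of how such a value is obtained, using a hypothetical helper run_exit_value; the arguments the module actually passes to "ntvindex" are not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program (list form, so no shell is involved)
# and return its exit value, where 0 means success.
sub run_exit_value {
    my (@command) = @_;
    system(@command);
    die "Cannot run $command[0]: $!" if $? == -1;            # did not start
    die "$command[0] died with signal " . ($? & 127) if $? & 127;
    return $? >> 8;                                          # the exit value
}
```

The same interpretation applies to the value that the optimize method reports for "ntvopt".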

indexdir

 $index->indexdir( directory );
 $directory = $index->indexdir;

The "indexdir" method specifies an indexdirectory to use instead of the one specified in the Resource object. By default, the indexdirectory from the Resource object is used.

log

 $index->log( filename );
 $log = $index->log;

The "log" method specifies the name of the file in which any error and other messages are stored during the indexing process performed by the "ntvindex" program. If no filename is specified before indexing starts, the file "ntvindex.log" in the indexdir is used.

Use the result method to read from the logfile.
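The defaulting rules for indexdir and log can be modelled with a small accessor class. SketchIndex is hypothetical, and computing the default log name from whichever indexdir is in effect (rather than once, up front) is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

package SketchIndex;

# resource_indexdir stands in for the indexdir taken from the Resource.
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# An explicitly set indexdir overrides the one from the Resource.
sub indexdir {
    my $self = shift;
    $self->{indexdir} = shift if @_;
    return $self->{indexdir} // $self->{resource_indexdir};
}

# Without an explicit log, use "ntvindex.log" inside the current indexdir.
sub log {
    my $self = shift;
    $self->{log} = shift if @_;
    return $self->{log} // File::Spec->catfile($self->indexdir, 'ntvindex.log');
}

package main;
```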

optimize

 $error = $index->optimize;

The "optimize" method performs the optimization of an index created by the "ntvindex" program. To be able to do this, the "ntvopt" program must be installed with a valid license.

The output parameter returns the exit value of the "ntvopt" program. A value of 0 indicates success; any other value indicates an error in the optimization process, in which case an error is also raised.

result

 $text = $index->result;

Each call to the "result" method returns any lines that were added to the logfile since the previous call. It can be used in a threaded environment to monitor the progress of the indexing process, or to view the final result after the indexing process is done.
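Returning only what was added since the previous call amounts to remembering a file offset between calls. A sketch of that technique with a hypothetical make_log_follower helper; it is not the module's code, and because it returns raw appended text, a line that is still being written may come back split over two calls.

```perl
use strict;
use warnings;

# Hypothetical helper: return a closure that, on each call, returns the
# text appended to $logfile since the previous call ('' if nothing new).
sub make_log_follower {
    my ($logfile) = @_;
    my $offset = 0;                               # bytes already returned
    return sub {
        open(my $fh, '<', $logfile) or return ''; # no logfile yet
        seek($fh, $offset, 0) or die "Cannot seek in $logfile: $!";
        local $/;                                 # slurp the rest
        my $text = <$fh>;
        $offset = tell($fh);
        close $fh;
        return defined $text ? $text : '';
    };
}
```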

stream

 $handle = $index->stream;
 print $handle "<ntv:docseq>....</ntv:docseq>";

The "stream" method returns a special purpose handle to which a document sequence can be written that you want to be indexed immediately. It is rarely needed directly. Internally, it is used to give the Docseq method its magic.
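A handle whose output feeds a running program is what Perl's piped open provides. This hypothetical open_stream helper sketches the idea; which program and arguments the module actually starts is not shown, and the list form of the pipe open used here is best supported on Unix-like systems.

```perl
use strict;
use warnings;

# Hypothetical helper: start @command with its standard input connected
# to the returned handle; anything printed to the handle reaches the child.
sub open_stream {
    my (@command) = @_;
    open(my $handle, '|-', @command) or die "Cannot start $command[0]: $!";
    return $handle;
}
```

Closing the handle waits for the program to finish and leaves its exit status in $?.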

update_end

 $index->update_end;

The "update_end" method performs step 5 of the "UPDATING AN INDEX" process.

update_start

 $index->update_start( | incremental );

The "update_start" method performs steps 1, 2 and 3 of the "UPDATING AN INDEX" process. The optional input parameter specifies whether the indexing process to be performed should be incremental.

AUTHOR

Elizabeth Mattijsen, <liz@dijkmat.nl>.

Please report bugs to <perlbugs@dijkmat.nl>.

COPYRIGHT

Copyright (c) 1995-2002 Elizabeth Mattijsen <liz@dijkmat.nl>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

http://www.nextrieve.com, the NexTrieve.pm and the other NexTrieve::xxx modules.