NAME

Search::Estraier - pure Perl module to use the Hyper Estraier search engine

SYNOPSIS

Simple indexer

        use Search::Estraier;

        # create and configure node
        my $node = new Search::Estraier::Node;
        $node->set_url("http://localhost:1978/node/test");
        $node->set_auth("admin","admin");

        # create document
        my $doc = new Search::Estraier::Document;

        # add attributes
        $doc->add_attr('@uri', "http://estraier.gov/example.txt");
        $doc->add_attr('@title', "Over the Rainbow");

        # add body text to document
        $doc->add_text("Somewhere over the rainbow.  Way up high.");
        $doc->add_text("There's a land that I heard of once in a lullaby.");

        die "error: ", $node->status,"\n" unless ($node->put_doc($doc));

Simple searcher

        use Search::Estraier;

        # create and configure node
        my $node = new Search::Estraier::Node;
        $node->set_url("http://localhost:1978/node/test");
        $node->set_auth("admin","admin");

        # create condition
        my $cond = new Search::Estraier::Condition;

        # set search phrase
        $cond->set_phrase("rainbow AND lullaby");

        my $nres = $node->search($cond, 0);
        if (defined($nres)) {
                # for each document in results
                for my $i ( 0 .. $nres->doc_num - 1 ) {
                        # get result document
                        my $rdoc = $nres->get_doc($i);
                        # display attributes
                        print "URI: ", $rdoc->attr('@uri'),"\n";
                        print "Title: ", $rdoc->attr('@title'),"\n";
                        print $rdoc->snippet,"\n";
                }
        } else {
                die "error: ", $node->status,"\n";
        }

DESCRIPTION

This module is an implementation of the node API of Hyper Estraier. Since it is a pure-Perl module that depends only on standard Perl modules, it will run on every platform Perl runs on. It does not require compilation or Hyper Estraier development files on the target machine.

It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.

There are a few examples in the scripts directory of this distribution.

Inheritable common methods

These methods should really move somewhere else.

_s

Collapse runs of whitespace in a string into a single space, and remove whitespace at the beginning and end.

 my $text = $self->_s(" this  is a text  ");
 # $text is now 'this is a text'
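The collapsing rule can be sketched as a stand-alone function. This is a minimal sketch, not the module's own code: normalize_ws is a hypothetical name, and treating everything matched by \s as whitespace is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-alone equivalent of _s (not the module's code):
# collapse every run of whitespace (as matched by \s) into one space,
# then trim the single space that may remain at either end.
sub normalize_ws {
    my ($text) = @_;
    $text =~ s/\s+/ /g;    # " this  is a text  " -> " this is a text "
    $text =~ s/^ //;       # drop a leading space
    $text =~ s/ $//;       # drop a trailing space
    return $text;
}

print normalize_ws(" this  is a text  "), "\n";    # prints "this is a text"
```

Tabs and newlines count as whitespace here too, so "a\t\n b" also becomes "a b".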

Search::Estraier::Document

This class implements a Document, which is a collection of attributes (key=value), vectors (also key=value), display text, and hidden text.

new

Create a new document, either empty or from a draft.

  my $doc = new Search::Estraier::Document;
  my $doc2 = new Search::Estraier::Document( $draft );

add_attr

Add an attribute.

  $doc->add_attr( name => 'value' );

Delete an attribute by passing undef as its value:

  $doc->add_attr( name => undef );

add_text

Add a sentence of text.

  $doc->add_text('this is example text to display');

add_hidden_text

Add a hidden sentence.

  $doc->add_hidden_text('this is example text just for search');

id

Get the ID number of the document. If the object has never been registered, -1 is returned.

  print $doc->id;

attr_names

Returns an array of attribute names from the document object.

  my @attrs = $doc->attr_names;

attr

Returns the value of an attribute.

  my $value = $doc->attr( 'attribute' );

texts

Returns an array of text sentences.

  my @texts = $doc->texts;

cat_texts

Return the whole text as a single scalar.

 my $text = $doc->cat_texts;

dump_draft

Dump draft data from the document object.

  print $doc->dump_draft;
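To illustrate what a draft looks like, here is a sketch that assembles one from plain Perl data. It assumes the Hyper Estraier draft layout (attribute lines of the form name=value, an empty line, then one line per text sentence, with hidden sentences prefixed by a tab); vectors are left out, and build_draft is a hypothetical helper, not part of Search::Estraier.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Estraier) that builds a draft,
# assuming: "name=value" attribute lines, an empty separator line, one
# line per visible sentence, and hidden sentences prefixed by a tab.
sub build_draft {
    my (%args) = @_;
    my $attrs  = $args{attrs}  || {};
    my $texts  = $args{texts}  || [];
    my $hidden = $args{hidden} || [];

    my $draft = '';
    # sort attribute names only to make the output deterministic
    $draft .= "$_=$attrs->{$_}\n" for sort keys %$attrs;
    $draft .= "\n";                     # separator between header and body
    $draft .= "$_\n"   for @$texts;     # visible sentences
    $draft .= "\t$_\n" for @$hidden;    # hidden sentences
    return $draft;
}

print build_draft(
    attrs  => { '@uri' => 'http://estraier.gov/example.txt', '@title' => 'Over the Rainbow' },
    texts  => [ 'Somewhere over the rainbow.  Way up high.' ],
    hidden => [ 'this is example text just for search' ],
);
```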

delete

Empty the document object.

  $doc->delete;

This function is an addition to the original Ruby API; since it was included in the C wrappers, it is provided here as a convenience. Document objects which go out of scope are destroyed automatically.

Search::Estraier::Condition

new

  my $cond = new Search::Estraier::Condition;

set_phrase

  $cond->set_phrase('search phrase');

add_attr

  $cond->add_attr('@URI STRINC /~dpavlin/');

set_order

  $cond->set_order('@mdate NUMD');

set_max

  $cond->set_max(42);

set_options

  $cond->set_options( SURE => 1 );

phrase

Return search phrase.

  print $cond->phrase;

order

Return search result order.

  print $cond->order;

attrs

Return the attribute search expressions added with add_attr.

  my @cond_attrs = $cond->attrs;

max

Return maximum number of results.

  print $cond->max;

-1 is returned for an uninitialized value; 0 means unlimited.

options

Return options for this condition.

  print $cond->options;

Options are returned in numerical form.

Search::Estraier::ResultDocument

new

  my $rdoc = new Search::Estraier::ResultDocument(
        uri => 'http://localhost/document/uri/42',
        attrs => {
                foo => 1,
                bar => 2,
        },
        snippet => 'this is a text of snippet',
        keywords => "this\tare\tkeywords",
  );

uri

Return the URI of the result document.

  print $rdoc->uri;

attr_names

Returns an array of attribute names from the result document object.

  my @attrs = $rdoc->attr_names;

attr

Returns the value of an attribute.

  my $value = $rdoc->attr( 'attribute' );

snippet

Return the snippet from the result document.

  print $rdoc->snippet;

keywords

Return the keywords from the result document.

  print $rdoc->keywords;

Search::Estraier::NodeResult

new

  my $res = new Search::Estraier::NodeResult(
        docs => \@array_of_rdocs,
        hints => \%hash_with_hints,
  );

doc_num

Return the number of documents.

  print $res->doc_num;

get_doc

Return a single result document, selected by its index.

  my $doc = $res->get_doc( 42 );

Returns undef if document doesn't exist.

hint

Return specific hint from results.

  print $res->hint( 'VERSION' );

Possible hints are: VERSION, NODE, HIT, HINT#n, DOCNUM, WORDNUM, TIME, LINK#n, VIEW.

Search::Estraier::Node

new

  my $node = new Search::Estraier::Node;

or, optionally, with the node URL as a parameter:

  my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );

set_url

Specify the URL of the node server.

  $node->set_url('http://localhost:1978');

set_proxy

Specify a proxy server used to connect to the node server.

  $node->set_proxy('proxy.example.com', 8080);

set_timeout

Specify the connection timeout in seconds.

  $node->set_timeout( 15 );

set_auth

Specify name and password for authentication to node server.

  $node->set_auth('clint','eastwood');

status

Return the status code of the last request.

  print $node->status;

A status of -1 means the connection failed.

put_doc

Add a document

  $node->put_doc( $document_draft ) or die "can't add document";

Return true on success or false on failure.

out_doc

Remove a document.

  $node->out_doc( $document_id ) or die "can't remove document";

Return true on success or false on failure.

out_doc_by_uri

Remove a registered document using its URI.

  $node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";

Return true on success or false on failure.

edit_doc

Edit attributes of a document

  $node->edit_doc( $document_draft ) or die "can't edit document";

Return true on success or false on failure.

get_doc

Retrieve a document.

  my $doc = $node->get_doc( $document_id ) or die "can't get document";

Return a Search::Estraier::Document object on success or false on failure.

get_doc_by_uri

Retrieve a document specified by its URI.

  my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";

Return a Search::Estraier::Document object on success or false on failure.

get_doc_attr

Retrieve the value of a document attribute.

  my $val = $node->get_doc_attr( $document_id, 'attribute_name' ) or
        die "can't get document attribute";

get_doc_attr_by_uri

Retrieve the value of an attribute of a document specified by its URI.

  my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or
        die "can't get document attribute";

etch_doc

Extract document keywords.

  my $keywords = $node->etch_doc( $document_id ) or die "can't etch document";

etch_doc_by_uri

Extract keywords of a document specified by its URI.

  my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";

Return the keywords on success or false on failure.

uri_to_id

Get ID of document specified by URI

  my $id = $node->uri_to_id( 'file:///document/uri/42' );

_fetch_doc

Private function used to implement get_doc, get_doc_by_uri, etch_doc and etch_doc_by_uri.

 # this will decode received draft into Search::Estraier::Document object
 my $doc = $node->_fetch_doc( id => 42 );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

 # to extract keywords, add etch
 my $doc = $node->_fetch_doc( id => 42, etch => 1 );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

 # to get a document attribute, add attr
 my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

 # more general form which allows implementation of
 # uri_to_id
 my $id = $node->_fetch_doc(
        uri => 'file:///document/uri/42',
        path => '/uri_to_id',
        chomp_resbody => 1
 );

name

  my $node_name = $node->name;

label

  my $node_label = $node->label;

doc_num

  my $documents_in_node = $node->doc_num;

word_num

  my $words_in_node = $node->word_num;

size

  my $node_size = $node->size;

search

Search for documents which match a condition.

  my $nres = $node->search( $cond, $depth );

$cond is a Search::Estraier::Condition object, while $depth specifies the depth for meta search.

The function returns a Search::Estraier::NodeResult object, or undef on failure (check $node->status).

cond_to_query

Return a URI-encoded query string generated from a Search::Estraier::Condition object.

  my $args = $node->cond_to_query( $cond, $depth );
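The kind of string cond_to_query produces can be sketched with core Perl only. This is a hedged illustration, not the module's implementation: the condition is a plain hash instead of a Search::Estraier::Condition object, cond_hash_to_query and uri_escape_simple are hypothetical names, and the parameter names (phrase, attr1, attr2, ..., order, max, options, depth) are assumed to follow the Hyper Estraier node API's search interface.

```perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 unreserved set.
# Assumes the input is already a byte string (e.g. UTF-8 encoded).
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical: turn a plain hash standing in for a condition into a query
# string. Parameter names are assumptions, not taken from the module.
sub cond_hash_to_query {
    my ($cond, $depth) = @_;
    my @args;
    push @args, 'phrase=' . uri_escape_simple($cond->{phrase})
        if defined $cond->{phrase};
    my $n = 1;    # attribute expressions become attr1, attr2, ...
    for my $attr (@{ $cond->{attrs} || [] }) {
        push @args, 'attr' . $n++ . '=' . uri_escape_simple($attr);
    }
    push @args, 'order=' . uri_escape_simple($cond->{order})
        if defined $cond->{order};
    # -1 (or missing) means "not set"; 0 means unlimited, so it is sent
    push @args, 'max=' . $cond->{max}
        if defined $cond->{max} && $cond->{max} >= 0;
    push @args, 'options=' . $cond->{options} if $cond->{options};
    push @args, 'depth=' . $depth if $depth;
    return join '&', @args;
}

print cond_hash_to_query({ phrase => 'rainbow AND lullaby', max => 10 }, 0), "\n";
# prints "phrase=rainbow%20AND%20lullaby&max=10"
```

The escaping is done by hand only so the sketch needs no modules outside the core distribution.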

shuttle_url

This method uses LWP::UserAgent to communicate with the Hyper Estraier node master.

  my $rv = $node->shuttle_url( $url, $content_type, $req_body, \$resbody );

$resheads and $resbody control whether the response headers and/or the response body are saved within the object.

set_snippet_width

Set the width of snippets in results.

  $node->set_snippet_width( $wwidth, $hwidth, $awidth );

$wwidth specifies the whole width of the snippet. It is 480 by default. If it is 0, no snippet is sent with the results. If it is negative, the whole document text is sent instead of a snippet.

$hwidth specifies the width of text taken from the beginning of the text. The default value is 96. A negative or zero value keeps the previous value.

$awidth specifies the width of text around each highlighted word. It is 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.
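The interaction of the three parameters can be modelled in a few lines of plain Perl. This is a sketch of the rules as described here, not the module's code; apply_snippet_width and the %width hash are hypothetical, and only the documented defaults (480, 96, 96) are taken from the text.

```perl
use strict;
use warnings;

# Hypothetical model of the width rules; starting values are the
# documented defaults.
my %width = ( wwidth => 480, hwidth => 96, awidth => 96 );

sub apply_snippet_width {
    my ($wwidth, $hwidth, $awidth) = @_;
    # wwidth is always taken: 0 = no snippet, negative = whole document text
    $width{wwidth} = $wwidth if defined $wwidth;
    # zero or negative hwidth/awidth keep the previous value
    $width{hwidth} = $hwidth if defined $hwidth && $hwidth > 0;
    $width{awidth} = $awidth if defined $awidth && $awidth > 0;
    return +{ %width };    # return a copy of the current settings
}

my $w = apply_snippet_width(200, 0, -1);
print join(' ', map { "$_=$w->{$_}" } sort keys %$w), "\n";
# prints "awidth=96 hwidth=96 wwidth=200"
```

Note the asymmetry the sketch makes explicit: a non-positive $wwidth changes behaviour, while a non-positive $hwidth or $awidth is simply ignored.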

set_user

Manage users of the node.

  $node->set_user( 'name', $mode );

$mode can be one of:

0

delete account

1

grant administrative rights to the user

2

set user account as guest

Return true on success, otherwise false.
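Because $mode is a bare number, a small wrapper can catch typos before a request is sent. This is a sketch under stated assumptions: the constant names and safe_set_user are hypothetical, and it relies only on the three modes listed above and on the node object having a set_user method.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical names for the documented set_user modes.
use constant {
    USER_DELETE => 0,    # delete account
    USER_ADMIN  => 1,    # administrative rights
    USER_GUEST  => 2,    # guest account
};

# Hypothetical wrapper: reject anything other than the three documented
# modes before delegating to the node object's set_user method.
sub safe_set_user {
    my ($node, $name, $mode) = @_;
    croak 'user name required' unless defined $name && length $name;
    croak 'mode must be 0, 1 or 2'
        unless defined $mode && $mode =~ /\A[012]\z/;
    return $node->set_user($name, $mode);
}
```

With a configured node it would be called as safe_set_user($node, 'clint', USER_ADMIN) or die "can't set user: ", $node->status;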

set_link

Manage node links.

  $node->set_link('http://localhost:1978/node/another', 'another node label', $credit);

If $credit is negative, the link is removed.

PRIVATE METHODS

You could call those directly, but you don't have to. I hope.

_set_info

Set information for the node.

  $node->_set_info;

EXPORT

Nothing.

SEE ALSO

http://hyperestraier.sourceforge.net/

Hyper Estraier Ruby interface on which this module is based.

AUTHOR

Dobrica Pavlinusic, <dpavlin@rot13.org>

COPYRIGHT AND LICENSE

Copyright (C) 2005-2006 by Dobrica Pavlinusic

This library is free software; you can redistribute it and/or modify it under the GPL v2 or later.
