Search::Estraier - pure perl module to use Hyper Estraier search engine
use Search::Estraier;

# create and configure node
my $node = new Search::Estraier::Node(
    url => 'http://localhost:1978/node/test',
    user => 'admin',
    passwd => 'admin',
    create => 1,
    label => 'Label for node',
    croak_on_error => 1,
);

# create document
my $doc = new Search::Estraier::Document;

# add attributes
$doc->add_attr('@uri', "http://estraier.gov/example.txt");
$doc->add_attr('@title', "Over the Rainbow");

# add body text to document
$doc->add_text("Somewhere over the rainbow. Way up high.");
$doc->add_text("There's a land that I heard of once in a lullaby.");

die "error: ", $node->status,"\n" unless (eval { $node->put_doc($doc) });
use Search::Estraier;

# create and configure node
my $node = new Search::Estraier::Node(
    url => 'http://localhost:1978/node/test',
    user => 'admin',
    passwd => 'admin',
    croak_on_error => 1,
);

# create condition
my $cond = new Search::Estraier::Condition;

# set search phrase
$cond->set_phrase("rainbow AND lullaby");

my $nres = $node->search($cond, 0);

if (defined($nres)) {
    print "Got ", $nres->hits, " results\n";

    # for each document in results
    for my $i ( 0 .. $nres->doc_num - 1 ) {
        # get result document
        my $rdoc = $nres->get_doc($i);
        # display attributes
        print "URI: ", $rdoc->attr('@uri'),"\n";
        print "Title: ", $rdoc->attr('@title'),"\n";
        print $rdoc->snippet,"\n";
    }
} else {
    die "error: ", $node->status,"\n";
}
This module is an implementation of the node API of Hyper Estraier. Since it is a Perl-only module that depends only on standard Perl modules, it will run on all platforms on which Perl runs. It doesn't require compilation or Hyper Estraier development files on the target machine.
It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.
There are a few examples in the scripts directory of this distribution.
These methods should really move somewhere else.
Remove repeated whitespace from a string, as well as whitespace at the beginning and end.
my $text = $self->_s("  this  is a  text  ");
# $text is now 'this is a text'
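The behaviour described above can be approximated with two substitutions. This is only a minimal sketch, not the module's actual implementation; the function name normalize_ws is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the private _s helper: collapse runs of
# whitespace into a single space, then trim both ends.
sub normalize_ws {
    my ($text) = @_;
    $text =~ s/\s+/ /g;         # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $text;
}

print normalize_ws("  this  is a   text "), "\n";    # prints "this is a text"
```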
This class implements a Document, which is a single item in Hyper Estraier.
It is a collection of:
attributes: 'key' => 'value' pairs which can later be used for filtering of results. You can add common filters to attrindex in estmaster's _conf file for better performance. See attrindex in the Hyper Estraier P2P Guide.
vectors: also 'key' => 'value' pairs.
text: text which will be used to create the searchable corpus of your index and is included in snippet output.
hidden text: text which will be searchable, but will not be included in snippets.
Create new document, empty or from draft.
my $doc = new Search::Estraier::Document;
my $doc2 = new Search::Estraier::Document( $draft );
Add an attribute.
$doc->add_attr( name => 'value' );
Delete an attribute by setting its value to undef:
$doc->add_attr( name => undef );
Add a sentence of text.
$doc->add_text('this is example text to display');
Add a hidden sentence.
$doc->add_hidden_text('this is example text just for search');
Add vectors
$doc->add_vector( 'vector_name' => 42, 'another' => 12345, );
Set the substitute score
$doc->set_score(12345);
Get the substitute score
Get the ID number of the document. If the object has never been registered, -1 is returned.
print $doc->id;
Returns array with attribute names from document object.
my @attrs = $doc->attr_names;
Returns value of an attribute.
my $value = $doc->attr( 'attribute' );
Returns array with text sentences.
my @texts = $doc->texts;
Return whole text as single scalar.
my $text = $doc->cat_texts;
Dump draft data from document object.
print $doc->dump_draft;
Empty document object
$doc->delete;
This function is an addition to the original Ruby API, and since it was included in the C wrappers it's here as a convenience. Document objects which go out of scope will be destroyed automatically.
my $cond = new Search::Estraier::Condition;
$cond->set_phrase('search phrase');
$cond->add_attr('@URI STRINC /~dpavlin/');
$cond->set_order('@mdate NUMD');
$cond->set_max(42);
$cond->set_options( 'SURE' );
$cond->set_options( qw/AGITO NOIDF SIMPLE/ );
Possible options are:
SURE: check every N-gram
USUAL: check every second N-gram
FAST: check every third N-gram
AGITO: check every fourth N-gram
NOIDF: don't perform TF-IDF tuning
SIMPLE: use simplified query phrase
Skipping N-grams will speed up search, but reduce accuracy. Every call to set_options resets the previously set options.
This option changed in version 0.04 of this module. It's backwards compatible.
Return search phrase.
print $cond->phrase;
Return search result order.
print $cond->order;
Return search result attrs.
my @cond_attrs = $cond->attrs;
Return maximum number of results.
print $cond->max;
-1 is returned for an uninitialized value; 0 means unlimited.
Return options for this condition.
print $cond->options;
Options are returned in numerical form.
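A numerical form like this is typically a bitmask, with one bit per option. The sketch below shows how option names could be folded into a single number and recovered again. The helper names and the bit values are assumptions for illustration only and are not taken from this documentation; check the module source before relying on them.

```perl
use strict;
use warnings;

# ASSUMED bit values, for illustration only.
my %OPTION_BIT = (
    SURE   => 1 << 0,
    USUAL  => 1 << 1,
    FAST   => 1 << 2,
    AGITO  => 1 << 3,
    NOIDF  => 1 << 4,
    SIMPLE => 1 << 10,
);

# Hypothetical helper: combine option names into one number.
sub options_to_number {
    my @names = @_;
    my $mask  = 0;
    for my $name (@names) {
        die "unknown option '$name'\n" unless exists $OPTION_BIT{$name};
        $mask |= $OPTION_BIT{$name};
    }
    return $mask;
}

# Hypothetical helper: recover option names from a number,
# ordered by bit value.
sub number_to_options {
    my ($mask) = @_;
    return grep { $mask & $OPTION_BIT{$_} }
           sort { $OPTION_BIT{$a} <=> $OPTION_BIT{$b} } keys %OPTION_BIT;
}

my $mask = options_to_number(qw/AGITO NOIDF SIMPLE/);
print "$mask\n";
print join( ' ', number_to_options($mask) ), "\n";
```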
Set number of skipped documents from beginning of results
$cond->set_skip(42);
Similar to offset in an RDBMS.
Return skip for this condition.
print $cond->skip;
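Together with set_max, the skip value can be used for paging through results. The sketch below computes both values from a page number; page_window is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a 1-based page number into the
# (skip, max) pair that would be passed to set_skip and set_max.
sub page_window {
    my ( $page, $per_page ) = @_;
    die "page must be >= 1\n"     if $page < 1;
    die "per_page must be >= 1\n" if $per_page < 1;
    return ( ( $page - 1 ) * $per_page, $per_page );
}

my ( $skip, $max ) = page_window( 3, 20 );
print "skip=$skip max=$max\n";    # prints "skip=40 max=20"

# With a condition object this would be used as:
#   $cond->set_skip($skip);
#   $cond->set_max($max);
```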
$cond->set_distinct('@author');
Return distinct attribute
print $cond->distinct;
Filter out some links when searching.
The argument is an array of link numbers, starting with 0 (the current node).
$cond->set_mask(qw/0 1 4/);
my $rdoc = new Search::Estraier::ResultDocument(
    uri => 'http://localhost/document/uri/42',
    attrs => {
        foo => 1,
        bar => 2,
    },
    snippet => 'this is a text of snippet',
    keywords => "this\tare\tkeywords",
);
Return URI of result document
print $rdoc->uri;
Returns array with attribute names from result document object.
my @attrs = $rdoc->attr_names;
my $value = $rdoc->attr( 'attribute' );
Return snippet from result document
print $rdoc->snippet;
Return keywords from result document
print $rdoc->keywords;
my $res = new Search::Estraier::NodeResult(
    docs => \@array_of_rdocs,
    hits => \%hash_with_hints,
);
Return number of documents
print $res->doc_num;
This will return the real number of documents (limited by max). If you want to get the total number of hits, see hits.
Return single document
my $doc = $res->get_doc( 42 );
Returns undef if document doesn't exist.
Return specific hint from results.
print $res->hint( 'VERSION' );
Possible hints are: VERSION, NODE, HIT, HINT#n, DOCNUM, WORDNUM, TIME, LINK#n, VIEW.
A more Perlish version of hint. This one returns a hash.
my %hints = $res->hints;
Syntactic sugar for the total number of hits for this query
print $res->hits;
It's the same as
print $res->hint('HIT');
but shorter.
my $node = new Search::Estraier::Node;
or optionally with url as a parameter
my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );
or in more verbose form
my $node = new Search::Estraier::Node(
    url => 'http://localhost:1978/node/test',
    user => 'admin',
    passwd => 'admin',
    create => 1,
    label => 'optional node label',
    debug => 1,
    croak_on_error => 1,
);
with the following arguments:
url: URL to node
user: username for node server authentication
passwd: password for authentication
create: create node if it doesn't exist
label: optional label for new node if create is used
debug: dumps a lot of debugging output
croak_on_error: very helpful during development. It will croak on all errors instead of silently returning -1 (which is the convention of the Hyper Estraier API in other languages).
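The practical difference between the two error styles shows up in the calling code. The sketch below uses a small stand-in class (FakeNode, hypothetical) rather than a real node, so it runs without a server. It only mimics the convention described above and says nothing about how the module implements it.

```perl
use strict;
use warnings;
use Carp ();

# FakeNode is a hypothetical stand-in; it does not talk to any server.
package FakeNode;

sub new {
    my ( $class, %args ) = @_;
    my $self = { croak_on_error => $args{croak_on_error} || 0, status => -1 };
    return bless $self, $class;
}

sub status { return $_[0]{status} }

# Always fails, to demonstrate both error styles.
sub put_doc {
    my ($self) = @_;
    Carp::croak("put_doc failed, status $self->{status}")
        if $self->{croak_on_error};
    return;    # false on failure
}

package main;

# Without croak_on_error: check the return value.
my $quiet = FakeNode->new;
my $ok    = $quiet->put_doc;
print "quiet mode: ",
    ( $ok ? "ok" : "failed with status " . $quiet->status ), "\n";

# With croak_on_error: trap the exception with eval.
my $strict  = FakeNode->new( croak_on_error => 1 );
my $trapped = eval { $strict->put_doc; 1 } ? '' : $@;
print "strict mode: $trapped";
```

The eval form is the same pattern used in the synopsis, where put_doc is wrapped in eval and the status is reported on failure.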
Specify URL to node server
$node->set_url('http://localhost:1978');
Specify proxy server to connect to node server
$node->set_proxy('proxy.example.com', 8080);
Specify timeout of connection in seconds
$node->set_timeout( 15 );
Specify name and password for authentication to node server.
$node->set_auth('clint','eastwood');
Return status code of last request.
print $node->status;
-1 means connection failure.
Add a document
$node->put_doc( $document_draft ) or die "can't add document";
Return true on success or false on failure.
Remove a document
$node->out_doc( $document_id ) or die "can't remove document";
Return true on success or false on failure.
Remove a registered document using its URI
$node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";
Edit attributes of a document
$node->edit_doc( $document_draft ) or die "can't edit document";
Retrieve document
my $doc = $node->get_doc( $document_id ) or die "can't get document";
my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";
Retrieve the value of an attribute from object
my $val = $node->get_doc_attr( $document_id, 'attribute_name' ) or die "can't get document attribute";
my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or die "can't get document attribute";
Extract document keywords
my $keywords = $node->etch_doc( $document_id ) or die "can't etch document";
my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";
Get ID of document specified by URI
my $id = $node->uri_to_id( 'file:///document/uri/42' );
This method won't croak, even if using croak_on_error.
Private function used to implement get_doc, get_doc_by_uri, etch_doc and etch_doc_by_uri.
# this will decode received draft into Search::Estraier::Document object
my $doc = $node->_fetch_doc( id => 42 );
my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

# to extract keywords, add etch
my $doc = $node->_fetch_doc( id => 42, etch => 1 );
my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

# to get document attribute add attr
my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

# more general form which allows implementation of
# uri_to_id
my $id = $node->_fetch_doc(
    uri => 'file:///document/uri/42',
    path => '/uri_to_id',
    chomp_resbody => 1
);
my $node_name = $node->name;
my $node_label = $node->label;
my $documents_in_node = $node->doc_num;
my $words_in_node = $node->word_num;
my $node_size = $node->size;
Search documents which match condition
my $nres = $node->search( $cond, $depth );
$cond is a Search::Estraier::Condition object, while $depth specifies the depth for meta search.
Returns a Search::Estraier::NodeResult object.
Return URI encoded string generated from Search::Estraier::Condition
my $args = $node->cond_to_query( $cond, $depth );
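To illustrate what a URI encoded string of this kind looks like, here is a minimal sketch that percent-encodes name/value pairs and joins them. Both helper names are hypothetical, and the parameter names in the example call (phrase, attr1, max, depth) are assumptions for illustration, not taken from this documentation. The encoder assumes byte strings; non-ASCII text would need to be encoded to UTF-8 first.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes $value is a byte string.
sub uri_escape_simple {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: build a query string from ordered
# name/value pairs, skipping pairs whose value is undefined.
sub build_query {
    my @pairs = @_;
    my @parts;
    while ( my ( $name, $value ) = splice( @pairs, 0, 2 ) ) {
        next unless defined $value;
        push @parts,
            uri_escape_simple($name) . '=' . uri_escape_simple($value);
    }
    return join '&', @parts;
}

print build_query(
    phrase => 'rainbow AND lullaby',
    attr1  => '@uri STRINC /~dpavlin/',
    max    => 42,
    depth  => 0,
), "\n";
```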
This method uses LWP::UserAgent to communicate with the Hyper Estraier node master.
my $rv = shuttle_url( $url, $content_type, $req_body, \$resbody );
The $resheads and $resbody booleans control whether response headers and/or the response body are saved within the object.
Set width of snippets in results
$node->set_snippet_width( $wwidth, $hwidth, $awidth );
$wwidth specifies the whole width of the snippet. It's 480 by default. If it's 0, the snippet is not sent with results. If it is negative, the whole document text is sent instead of a snippet.
$hwidth specifies the width of strings taken from the beginning of the text. The default value is 96. A negative or zero value keeps the previous value.
$awidth specifies the width of strings around each highlighted word. It's 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.
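The rules for the three widths can be modelled in a few lines. This is a sketch of the behaviour as described above, with hypothetical helper names; it is not the module's own code. The defaults are the ones stated in the text.

```perl
use strict;
use warnings;

# Defaults as stated in the text.
my %width = ( wwidth => 480, hwidth => 96, awidth => 96 );

# Hypothetical model: return a new set of widths after applying
# the three arguments according to the documented rules.
sub apply_snippet_width {
    my ( $current, $wwidth, $hwidth, $awidth ) = @_;
    my %next = %$current;
    $next{wwidth} = $wwidth;                   # stored as given
    $next{hwidth} = $hwidth if $hwidth > 0;    # zero or negative keeps old value
    $next{awidth} = $awidth if $awidth > 0;    # zero or negative keeps old value
    return \%next;
}

# Describe what a given wwidth means for the returned results.
sub snippet_mode {
    my ($wwidth) = @_;
    return 'none'       if $wwidth == 0;
    return 'whole text' if $wwidth < 0;
    return 'snippet';
}

my $w = apply_snippet_width( \%width, 200, 0, 40 );
print "wwidth=$w->{wwidth} hwidth=$w->{hwidth} awidth=$w->{awidth} mode=",
    snippet_mode( $w->{wwidth} ), "\n";
```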
Manage users of node
$node->set_user( 'name', $mode );
$mode can be one of:
0: delete account
1: set administrative right for user
2: set user account as guest
Return true on success, otherwise false.
Manage node links
$node->set_link('http://localhost:1978/node/another', 'another node label', $credit);
If $credit is negative, link is removed.
my @admins = @{ $node->admins };
Return array of users with admin rights on node
my @guests = @{ $node->guests };
Return array of users with guest rights on node
my @links = @{ $node->links };
Return array of links for this node
Return cache usage for a node
my $cache = $node->cacheusage;
Set actions on Hyper Estraier node master (estmaster process)
$node->master( action => 'sync' );
All available actions are documented in http://hyperestraier.sourceforge.net/nguide-en.html#protocol
You could call those directly, but you don't have to. I hope.
Set information for node
$node->_set_info;
Clear information for node
$node->_clear_info;
On the next call to name, label, doc_num, word_num or size, node info will be fetched again from Hyper Estraier.
Nothing.
http://hyperestraier.sourceforge.net/
Hyper Estraier Ruby interface on which this module is based.
Hyper Estraier now also has a pure-Perl binding included in its distribution. It's a faster way to access databases directly if you are not running the estmaster P2P server.
Dobrica Pavlinusic, <dpavlin@rot13.org>
Robert Klep <robert@klep.name> contributed refactored search code
Copyright (C) 2005-2006 by Dobrica Pavlinusic
This library is free software; you can redistribute it and/or modify it under the GPL v2 or later.