NAME

Search::Circa - a search engine / indexer running on MySQL

DESCRIPTION

This is Search::Circa, a module that provides functions to search Circa, a web search engine running on MySQL. Circa can serve a single web site or a list of sites. It indexes pages the way AltaVista does. It can read, add, and parse every URL found in a page, and it stores URLs and words in MySQL for use at search time.

Circa can index from 100 to 100,000 URLs.

Notes:

  • Accents are removed both at indexing time and at search time

  • Searches are case-insensitive
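The two notes above can be pictured with a minimal sketch. This is not Circa's own code: normalize_word is a hypothetical helper, and the use of Unicode::Normalize is an assumption about one reasonable way to fold accents and case before comparing words.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not part of Search::Circa): fold a word by
# stripping accents and lowercasing, as the notes describe.
sub normalize_word {
    my ($word) = @_;
    my $decomposed = NFD($word);    # split letters from their combining marks
    $decomposed =~ s/\p{Mn}//g;     # drop the combining marks (the accents)
    return lc $decomposed;
}

# "\x{C9}v\x{E9}nement" is the French word for "event", with two accented e's
print normalize_word("\x{C9}v\x{E9}nement"), "\n";    # prints "evenement"
```

Applying the same folding to indexed words and to query words is what makes an accented, capitalized query match its plain lowercase form.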

Search::Circa::Search works on the results produced by Search::Circa::Indexer. Search::Circa::Search is a Perl interface, but this package also includes a PHP client.

Search::Circa is the root class for Search::Circa::Indexer and Search::Circa::Search.

SYNOPSIS

See Search::Circa::Search, Search::Circa::Indexer

FEATURES

  • Search Features

    • Boolean query language support: or (default), and ("+"), not ("-"). Ex: perl +faq -cgi matches documents that contain faq, possibly perl, and not cgi.

    • Perl or PHP client

    • Can browse sites by directory / category.

    • Search by different criteria: news, last modified date, language, URL / site.

  • Full text indexing

  • Different weights for the title, keywords, description and the rest of the HTML page can be set in the configuration

  • Inherits features from the LWP suite:

    • Supports the HTTP://, FTP:// and FILE:// protocols (a filesystem can be indexed without going through a web server)

    • Full support for the robots exclusion standard (robots.txt). Identifies itself as CircaIndexer/0.1, mail alian@alianwebserver.com. Requests to the same server are delayed by 8 seconds. "It's not a bug, it's a feature!" This is a basic rule for limiting HTTP server load.

    • Supports HTTP proxies.

  • Builds its index in MySQL

  • Reads HTML and plain text

  • Several kinds of indexing : full, incremental, only on a particular server.

  • Documents not updated are not reindexed.

  • Every request for a file begins with an HTTP HEAD request, to obtain information such as validity, last update, size, etc. The size of documents read can be restricted (Ex: don't get documents > 5 MB). This is useful on low-bandwidth connections, or on computers which do not have much memory.

  • HTML templates can be easily customized for your needs.

  • Admin functions available by browser interface or command-line.

  • Indexes the different links found for a CGI (everything after name_of_file?)
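To make the Boolean query syntax listed under the search features more concrete, here is a minimal sketch of how such a query could be split into its three kinds of terms. It is an illustration only: parse_query and its optional/required/excluded keys are hypothetical names, not part of the Search::Circa API, and the real parser may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Circa): split a query into
# optional ("or", the default), required ("+") and excluded ("-") terms.
sub parse_query {
    my ($query) = @_;
    my %terms = (optional => [], required => [], excluded => []);
    for my $token (split ' ', $query) {
        if    ($token =~ /^\+(.+)$/) { push @{ $terms{required} }, lc $1 }
        elsif ($token =~ /^-(.+)$/)  { push @{ $terms{excluded} }, lc $1 }
        else                         { push @{ $terms{optional} }, lc $token }
    }
    return \%terms;
}

my $terms = parse_query('perl +faq -cgi');
for my $kind (qw(optional required excluded)) {
    print "$kind: ", join(', ', @{ $terms->{$kind} }), "\n";
}
```

For the query perl +faq -cgi, this sketch classifies perl as optional, faq as required and cgi as excluded, which matches the example given in the feature list.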

FREQUENTLY ASKED QUESTIONS

Q: Where are the example clients?

A: See the demo directory. For the command line, see circa_admin and circa_search; for CGI, take a look in cgi-bin/circa. They are installed with make cgi.

Q: Where are the global parameters for connecting to Circa?

A: In the lib/CircaConf.pm file.

Q: What is an account in Circa?

A: It's like a project, or a database. A namespace for whatever you want.

Q: How do I get started with the indexer?

A: See the man page of circa_admin.

Q: Have you succeeded in using Circa with mod_perl?

A: Yes.

Public interface

These methods are used through Search::Circa::Indexer and Search::Circa::Search objects.

connect user, password, database, host

Connect Circa to MySQL. Returns 1 on success, 0 otherwise.

  • user : MySQL user

  • password : MySQL password

  • database : MySQL database

  • host : IP address of the MySQL server

close

Close the connection to MySQL. This method is called by the DESTROY method of this class.

pre_tbl

Get or set the table name prefix, which allows Circa to be used more than once on the same database.

fill_template masque, ref_hash

  • masque : Path of the template

  • ref_hash : hash ref of key/value pairs to substitute

Return the template with its variables replaced. Ex, for a template file that contains I am <? $age ?> years old, with its path in $masque:

 $circa->fill_template($masque, { 'age' => '12' });

Will return:

  I am 12 years old

fetch_first request

Execute the SQL request on the database and return the first row. In list context, return the full row; otherwise return just the first column.
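The list/scalar distinction can be illustrated with a small stand-alone sketch. fetch_first_sketch is a hypothetical stand-in, not the real method: it returns a hard-coded row instead of querying MySQL, and it assumes the context switch is done with wantarray, which is the usual Perl idiom for this behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method with the documented behaviour.
# The row is hard-coded here instead of being fetched from MySQL.
sub fetch_first_sketch {
    my @row = (42, 'http://www.example.com/', 'Example title');
    return wantarray ? @row : $row[0];
}

my @row   = fetch_first_sketch();    # list context: the whole row
my $first = fetch_first_sketch();    # scalar context: first column only
print scalar(@row), " columns, first = $first\n";    # prints "3 columns, first = 42"
```

Scalar context is convenient for requests that return a single value, such as a count, while list context gives access to every column of the first row.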

trace level, msg

Print the message msg on standard error if the debug level of the script is higher than level.

prompt message, default_value

Prompt on STDIN for a parameter, showing message and default_value, and return the value entered.

SEE ALSO

Search::Circa::Indexer, Indexer module

Search::Circa::Search, Searcher module

Search::Circa::Annuaire, Manages the Circa directory

Search::Circa::Url, Manages Circa URLs

Search::Circa::Categorie, Manages Circa categories

VERSION

$Revision: 1.18 $

AUTHOR

Alain BARBET alian@alianwebserver.com