SWISH::Prog::DBI - index DB records with Swish-e
    use SWISH::Prog::DBI;
    use Carp;

    my $prog_dbi = SWISH::Prog::DBI->new(
        db => [
            "DBI:mysql:database=movies;host=localhost;port=3306",
            'some_user', 'some_secret_pass',
            {
                RaiseError  => 1,
                HandleError => sub { confess(shift) },
            }
        ],
        alias_columns => 1
    );

    $prog_dbi->create(
        tables => {
            'moviesIlike' => {
                title    => 1,
                synopsis => 1,
                year     => 1,
                director => 1,
                producer => 1,
                awards   => 1
            }
        }
    );
SWISH::Prog::DBI is a SWISH::Prog subclass designed for providing full-text search for your databases with Swish-e.
Since SWISH::Prog::DBI inherits from SWISH::Prog, read the SWISH::Prog docs first. Any overridden methods are documented here.
Creates a new indexer object. The db parameter (DBI_connect_info) is normally handed to DBI's connect() method, so see the DBI docs for syntax. If DBI_connect_info is a DBI handle object, it is accepted as is. If DBI_connect_info is an array ref, it is dereferenced and passed to connect(). Otherwise it is passed to connect() unchanged.
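The three cases above can be sketched in plain Perl. This is only an illustration of the described behaviour, not the module's actual code: resolve_db() is a hypothetical helper, and the $connect code ref stands in for DBI->connect so the sketch runs without a database.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of SWISH::Prog::DBI). $connect is a code ref
# standing in for DBI->connect so the sketch runs without a database.
sub resolve_db {
    my ( $info, $connect ) = @_;

    # An object is assumed to be a ready-made DBI handle: use it as is.
    return $info if blessed($info);

    # An array ref is dereferenced into connect()'s argument list.
    return $connect->(@$info) if ref($info) eq 'ARRAY';

    # Anything else is handed to connect() unchanged.
    return $connect->($info);
}

# Stand-in for DBI->connect that just records what it was given.
my $fake_connect = sub { return { connected_with => [@_] } };

my $dbh = resolve_db(
    [ 'DBI:mysql:database=movies;host=localhost', 'some_user', 'some_secret_pass' ],
    $fake_connect,
);
print scalar( @{ $dbh->{connected_with} } ), " arguments passed to connect\n";
```

In practice this means an application that already holds a DBI handle can presumably pass it as db and share the connection, while a script can pass the array ref form shown in the synopsis.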
The alias_columns flag indicates whether all columns should be searchable under the default MetaName of swishdefault. The default is 1 (true). This is not the default behaviour of swish-e; this is a feature of SWISH::Prog.
NOTE: The new() method simply inherits from SWISH::Prog, so any params valid for that method are allowed here.
Initializes the object. This overrides the SWISH::Prog init() base method.
Adds the special table MetaName to the Config object before opening indexer.
Internal method for retrieving db meta data.
Internal method for retrieving db column data.
Get/set all the table/column info for the current db.
Create index. The default is for all tables to be indexed, with each table name saved in the tablename MetaName.
opts supports the following options:
tables: only index the named tables (and, optionally, only the named columns within them).
Example:
If you only want to index the table foo, and only its columns bar and gab, pass this:
    $dbi->create( tables => { foo => { columns => { bar => 1, gab => 1 } } } );
To index all columns:
    $dbi->create( tables => { foo => 1 } );
#TODO - make the column hash value the MetaRankBias for that column
NOTE: create() just loops over all the relevant tables and calls index_sql() to actually create each index. If you want to tailor your SQL (using JOINs etc.) then you probably want to call index_sql() directly.
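As a rough sketch of calling index_sql() directly with a JOIN: the helper name, the movies and directors tables, and their columns are all invented for illustration. Only the sql and title options are taken from this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper. The tables (movies, directors) and their columns are
# invented; only the sql and title options come from this documentation.
sub index_movies_with_directors {
    my ($prog_dbi) = @_;

    # One row per movie, with the director's name pulled in via a JOIN.
    my $sql = join ' ',
        'SELECT m.id, m.title, m.synopsis, d.name AS director',
        'FROM movies m',
        'JOIN directors d ON d.id = m.director_id';

    # Hand the tailored SQL to index_sql() and pass its return value back.
    return $prog_dbi->index_sql(
        sql   => $sql,
        title => 'title',
    );
}
```

With an object built as in the synopsis, this would be called as index_movies_with_directors($prog_dbi) in place of create().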
Returns number of rows indexed.
Fetch rows from the DB, convert to XML and pass to inherited index() method. %opts should include at least the following:
The SQL statement to execute.
%opts may also contain:
The name of the table. Used for creating virtual XML documents passed to indexer.
Which column to use as the title of the virtual document. If not defined, the title will be the empty string.
Which columns to include in swishdescription property. Default is none. Should be a hashref with column names as keys.
%opts may contain any other param that SWISH::Prog::Index->new() accepts.
$prog_dbi->index_sql( sql => 'SELECT * FROM `movies`', title => 'Movie_Title' );
Converts row_hash_ref to an XML string. Returns the XML.
The table_name is included in a <table> tagset within each row. You can use the table MetaName to limit searches to a specific table.
Override this method if you do not provide a title column in index_sql(). The return value of title_filter() will be used as the swishtitle for the row's virtual XML document.
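A minimal sketch of such an override follows. It assumes title_filter() receives the row hash ref in the same way row_filter() does, which this document does not state explicitly. The package name and the title and year columns are hypothetical.

```perl
package My::TitledDBI;
use strict;
use warnings;

# -norequire only keeps this sketch loadable on its own; in real code load
# SWISH::Prog::DBI first (or write: use base qw( SWISH::Prog::DBI );).
use parent -norequire, 'SWISH::Prog::DBI';

# Assumption: title_filter() receives the row hash ref, like row_filter().
# The title and year columns are hypothetical.
sub title_filter {
    my ( $self, $row ) = @_;
    my $title = defined $row->{title} ? $row->{title} : 'Untitled';

    # Append the year when the row has one, e.g. "Alien (1979)".
    return defined $row->{year} ? "$title ($row->{year})" : $title;
}
```

Building the title in a filter like this is useful when no single column makes a good title on its own.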
Override this method if you need to alter the data returned from the db prior to it being converted to XML for indexing.
This method is called prior to title_filter() so all row data is affected.
NOTE: This is different from the row() method in the ::Doc subclass. This row_filter() gets called before the Doc object is created.
See FILTERS section.
There are several filtering methods in this module. Here's a summary of what they do and when they are called, so you have a better idea of how to best use them. Pay special attention to those called before converting the row to XML as opposed to after conversion.
Called by index_sql() for each row fetched from the database. This is the first filter called in the chain. Called before the row is converted to XML.
Called by index_sql() after row_filter() but only if an explicit title opt param was not passed to index_sql(). Called before the row is converted to XML.
Each of the normal SWISH::Prog::Doc attributes has a *_filter() method. These are called after the row is converted to XML. See SWISH::Prog::Doc.
NOTE: There is not a SWISH::Prog::DBI::Doc row_filter() method.
The normal SWISH::Prog filter() method is called as usual just before passing to ok() inside index(). Called after the row is converted to XML.
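The order of the two pre-XML filters can be simulated in plain Perl. Everything in this sketch is a stand-in: the two subs mimic overridden filters, and process_row() is a hypothetical function that only mirrors the order described above, not the module's real internals.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two pre-XML filters; each logs its name.
my @called;

sub row_filter {
    my ($row) = @_;
    push @called, 'row_filter';
    $row->{title} =~ s/^\s+|\s+$//g;    # trim whitespace in place
    return;
}

sub title_filter {
    my ($row) = @_;
    push @called, 'title_filter';
    return $row->{title};
}

# Simulates the documented order for one row. $title_col plays the role of
# the title option of index_sql().
sub process_row {
    my ( $row, $title_col ) = @_;

    row_filter($row);    # always first, so later steps see filtered data

    my $title = defined $title_col
        ? $row->{$title_col}      # explicit title column: title_filter() skipped
        : title_filter($row);     # otherwise title_filter() supplies the title

    return $title;                # XML conversion would follow here
}

print process_row( { title => '  Alien ' } ), "\n";
print join( ', ', @called ), "\n";
```

Because row_filter() runs first, the title that title_filter() sees is already trimmed.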
Since Swish-e version 2 does not support UTF-8 encodings, you may need to convert or transliterate your text prior to indexing. Swish-e offers the TranslateCharacters config option, but that does not work well with multi-byte characters.
Here's one way to handle the issue. Use Search::Tools::Transliterate and the row_filter() method to convert your UTF-8 text to single-byte characters. You can do this by subclassing SWISH::Prog::DBI and overriding the row_filter() method.
    package My::DBI;
    use base qw( SWISH::Prog::DBI );
    use POSIX qw(locale_h);
    use locale;
    use Encode;
    use Search::Tools::Transliterate;

    my $trans = Search::Tools::Transliterate->new;

    # Take the charset from the locale name (e.g. en_US.ISO-8859-1),
    # falling back to iso-8859-1 when the locale does not name one.
    my ($charset) = setlocale(LC_CTYPE) =~ m/^.+?\.(.+)/;
    $charset ||= 'iso-8859-1';

    sub row_filter {
        my $self = shift;
        my $row  = shift;

        # We transliterate everything in each row and append as a charset column.
        # This means we can search for it but it'll not show in any property.
        # Instead we'll get the UTF-8 text in the property value.
        # The downside is that you can't do 'meta=asciitext' because the charset string
        # is not stored under any but the swishdefault metaname.
        # You could get around that by using MetaNameAlias in config() to alias
        # each column to column_charset.
        for (keys %$row) {
            next unless defined $row->{$_};    # skip NULL columns

            # if it's not already UTF-8, make it so.
            unless ($trans->is_valid_utf8($row->{$_})) {
                $row->{$_} = Encode::encode_utf8(
                    Encode::decode($charset, $row->{$_}, 1));
            }

            # then transliterate to single-byte chars
            $row->{$_ . '_' . $charset} = $trans->convert($row->{$_});
        }
    }

    1;

    # then, in your indexing script:
    use My::DBI;

    my $dbi_prog = My::DBI->new(
        config => SWISH::Config->new(
            # also use Swish-e's feature so that all text is searchable as ASCII
            TranslateCharacters => ':ascii:'
        ),
    );

    $dbi_prog->create;
http://swish-e.org/docs/
SWISH::Prog, SWISH::Prog::DBI::Doc, Search::Tools
Peter Karman, <perl@peknet.com>
Thanks to Atomic Learning for supporting the development of this module.
Copyright 2006 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.