NAME

SWISH::Prog::DBI - index DB records with Swish-e

SYNOPSIS

    use SWISH::Prog::DBI;
    use Carp;
    
    my $prog_dbi = SWISH::Prog::DBI->new(
        db => [
            "DBI:mysql:database=movies;host=localhost;port=3306",
            'some_user', 'some_secret_pass',
            {
                RaiseError  => 1,
                HandleError => sub { confess(shift) },
            }
        ],
        alias_columns => 1
    );
    
    $prog_dbi->create(
        tables => {
            'moviesIlike' => {
                title    => 1,
                synopsis => 1,
                year     => 1,
                director => 1,
                producer => 1,
                awards   => 1
            }
        }
    );

DESCRIPTION

SWISH::Prog::DBI is a SWISH::Prog subclass designed for providing full-text search for your databases with Swish-e.

Since SWISH::Prog::DBI inherits from SWISH::Prog, read the SWISH::Prog docs first. Any overridden methods are documented here.

METHODS

new( db => DBI_connect_info, alias_columns => 0|1 )

Create new indexer object. DBI_connect_info is passed directly to DBI's connect() method, so see the DBI docs for syntax. If DBI_connect_info is a DBI handle object, it is accepted as is. If DBI_connect_info is an array ref, it will be dereferenced and passed to connect(). Otherwise it will be passed to connect as is.
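
The three accepted forms of DBI_connect_info can be pictured with a small standalone sketch. The helper name resolve_db_arg and the keys of its return value are hypothetical and are not part of SWISH::Prog::DBI; the sketch only mirrors the rules described above and does not load DBI.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of SWISH::Prog::DBI): classify the value
# given as `db` and report what would be handed to DBI.
sub resolve_db_arg {
    my ($db) = @_;

    # An object is assumed to be a ready-made DBI handle and is used as is.
    return { handle => $db } if blessed($db);

    # An array ref is dereferenced into the argument list for connect().
    return { connect_args => [ @$db ] } if ref($db) eq 'ARRAY';

    # Anything else (for example a bare DSN string) is passed through unchanged.
    return { connect_args => [ $db ] };
}
```

With the connect_args form, the actual connection would then be made with something like DBI->connect( @{ $result->{connect_args} } ).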

The alias_columns flag indicates whether all columns should be searchable under the default MetaName of swishdefault. The default is 1 (true). This is not the default behaviour of swish-e; this is a feature of SWISH::Prog.

NOTE: The new() method is simply inherited from SWISH::Prog, so any params valid for that method are allowed here.

init

Initialize object. This overrides the base SWISH::Prog init() method.

init_indexer

Adds the special table MetaName to the Config object before opening indexer.

info

Internal method for retrieving db meta data.

cols

Internal method for retrieving db column data.

table_meta

Get/set all the table/column info for the current db.

create( opts )

Create index. The default is for all tables to be indexed, with each table name saved in the table MetaName.

opts supports the following options:

tables

Only index the following tables (and optionally, columns within tables).

Example:

If you only want to index the table foo and only the columns bar and gab, pass this:

 $dbi->create( tables => { foo => { columns => { bar => 1, gab => 1 } } } );

To index all columns:

 $dbi->create( tables => { foo => 1 } );

TODO: make the column hash value the MetaRankBias for that column.

NOTE: create() just loops over all the relevant tables and calls index_sql() to actually create each index. If you want to tailor your SQL (using JOINs etc.) then you probably want to call index_sql() directly.

Returns number of rows indexed.
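
As a rough picture of the per-table loop described in the NOTE above, the following standalone sketch turns a tables spec into one SELECT statement per table. The helper name selects_for is hypothetical, the spec is assumed to use the columns key shown in the example above, and the SQL that create() really generates is not documented here, so treat the output format as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SWISH::Prog::DBI): build one SELECT
# statement per table from a `tables` spec.
sub selects_for {
    my ($tables) = @_;
    my %sql;
    for my $table ( sort keys %$tables ) {
        my $spec = $tables->{$table};

        # A hash ref may name specific columns; a plain true value means all.
        my @cols;
        @cols = sort keys %{ $spec->{columns} || {} } if ref($spec) eq 'HASH';

        my $list = @cols ? join( ', ', @cols ) : '*';
        $sql{$table} = "SELECT $list FROM $table";
    }
    return \%sql;
}
```

Each generated statement could then be handed to index_sql() as the sql option, which is also the place to substitute hand-written SQL with JOINs.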

index_sql( %opts )

Fetch rows from the DB, convert each to XML and pass it to the inherited index() method. %opts should include at least the following:

sql

The SQL statement to execute.

%opts may also contain:

table

The name of the table. Used for creating virtual XML documents passed to indexer.

title

Which column to use as the title of the virtual document. If not defined, the title will be the empty string.

desc

Which columns to include in swishdescription property. Default is none. Should be a hashref with column names as keys.

%opts may contain any other param that SWISH::Prog::Index->new() accepts.

Example:

 $prog_dbi->index_sql(
     sql   => 'SELECT * FROM `movies`',
     title => 'Movie_Title'
 );

row2xml( table_name, row_hash_ref, title )

Converts row_hash_ref to an XML string. Returns the XML.

The table_name is included in a <table> tagset within each row. You can use the table MetaName to limit searches to a specific table.
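
To make the shape of such a virtual document concrete, here is a minimal standalone sketch of a row-to-XML conversion. It is not the module's row2xml(): the function names, the <row> wrapper element and the <swishtitle> element are assumptions for illustration, and column names are assumed to be valid XML element names.

```perl
use strict;
use warnings;

# Escape the five XML special characters; undef becomes the empty string.
sub xml_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

# Hypothetical stand-in for row2xml(): one element per column, plus the
# table name in a <table> element so searches can be limited by table.
sub row_to_xml_sketch {
    my ( $table, $row, $title ) = @_;
    my $xml = "<row>\n";
    $xml .= " <table>" . xml_escape($table) . "</table>\n";
    $xml .= " <swishtitle>" . xml_escape($title) . "</swishtitle>\n";
    for my $col ( sort keys %$row ) {
        $xml .= " <$col>" . xml_escape( $row->{$col} ) . "</$col>\n";
    }
    $xml .= "</row>\n";
    return $xml;
}
```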

title_filter( row_hash_ref )

Override this method if you do not provide a title column in index_sql(). The return value of title_filter() will be used as the swishtitle for the row's virtual XML document.
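
A minimal sketch of such an override follows. The package name My::Titles is hypothetical, and the title and year columns are borrowed from the SYNOPSIS. The package deliberately omits the base class so the snippet runs on its own; a real subclass would add use base qw( SWISH::Prog::DBI ).

```perl
package My::Titles;
use strict;
use warnings;

# In real use this package would also say: use base qw( SWISH::Prog::DBI );

# Build a title such as "Brazil 1985" from whichever of the two
# columns are present and non-empty.
sub title_filter {
    my ( $self, $row ) = @_;
    my @parts = grep { defined($_) && length($_) } @{$row}{qw( title year )};
    return join ' ', @parts;
}
```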

row_filter( row_hash_ref )

Override this method if you need to alter the data returned from the db prior to it being converted to XML for indexing.

This method is called prior to title_filter() so all row data is affected.

NOTE: This is different from the row() method in the ::Doc subclass. This row_filter() gets called before the Doc object is created.

See FILTERS section.
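
As an illustration, a row_filter() override that tidies values in place might look like the sketch below. The package name My::CleanRows is hypothetical and the base class is left out so the snippet runs standalone; a real subclass would add use base qw( SWISH::Prog::DBI ). The row is modified in place, following the pattern of the example in the ENCODINGS section.

```perl
package My::CleanRows;
use strict;
use warnings;

# In real use this package would also say: use base qw( SWISH::Prog::DBI );

# Replace NULL (undef) values with the empty string and trim leading and
# trailing whitespace from every column, modifying the row in place.
sub row_filter {
    my ( $self, $row ) = @_;
    for my $col ( keys %$row ) {
        $row->{$col} = '' unless defined $row->{$col};
        $row->{$col} =~ s/^\s+|\s+$//g;
    }
    return;
}
```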

FILTERS

There are several filtering methods in this module. Here's a summary of what they do and when they are called, so you have a better idea of how to best use them. Pay special attention to those called before converting the row to XML as opposed to after conversion.

row_filter

Called by index_sql() for each row fetched from the database. This is the first filter called in the chain. Called before the row is converted to XML.

title_filter

Called by index_sql() after row_filter() but only if an explicit title opt param was not passed to index_sql(). Called before the row is converted to XML.

SWISH::Prog::DBI::Doc *_filter() methods

Each of the normal SWISH::Prog::Doc attributes has a *_filter() method. These are called after the row is converted to XML. See SWISH::Prog::Doc.

NOTE: There is not a SWISH::Prog::DBI::Doc row_filter() method.

filter

The normal SWISH::Prog filter() method is called as usual just before passing to ok() inside index(). Called after the row is converted to XML.
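
The order described in this section can be summarized with a toy function. It is only a restatement of the text above, not code from the module, and the name filter_order is hypothetical.

```perl
use strict;
use warnings;

# Toy restatement of the filter chain: returns the names of the steps
# in the order they run for one row, given the options to index_sql().
sub filter_order {
    my (%opts) = @_;
    my @order = ('row_filter');

    # title_filter() only runs when no explicit title column was given.
    push @order, 'title_filter' unless defined $opts{title};

    # Everything after row2xml operates on the XML, not on the row hash.
    push @order, 'row2xml', 'Doc *_filter', 'filter';
    return @order;
}
```

Anything that must change the raw column values therefore belongs in row_filter() or title_filter(), since the later steps only see the converted document.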

ENCODINGS

Since Swish-e version 2 does not support UTF-8 encodings, you may need to convert or transliterate your text prior to indexing. Swish-e offers the TranslateCharacters config option, but that does not work well with multi-byte characters.

Here's one way to handle the issue. Use Search::Tools::Transliterate and the row_filter() method to convert your UTF-8 text to single-byte characters. You can do this by subclassing SWISH::Prog::DBI and overriding the row_filter() method.

Example:

 package My::DBI;
 use base qw( SWISH::Prog::DBI );

 use POSIX qw(locale_h);
 use locale;
 use Encode;
 use Search::Tools::Transliterate;
 my $trans = Search::Tools::Transliterate->new;
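 # Capture the charset in list context first; applying || directly to the
 # match would force scalar context and yield 1 instead of the capture.
 my ($charset) = setlocale(LC_CTYPE) =~ m/^.+?\.(.+)/;
 $charset ||= 'iso-8859-1';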

 sub row_filter
 {
    my $self = shift;
    my $row  = shift;

    # We transliterate everything in each row and append as a charset column.
    # This means we can search for it but it'll not show in any property.
    # Instead we'll get the UTF-8 text in the property value.
    # The downside is that you can't do 'meta=asciitext' because the charset string
    # is not stored under any but the swishdefault metaname.
    # You could get around that by using MetaNameAlias in config() to alias
    # each column to column_charset.
    
    for (keys %$row)
    {
        # if it's not already UTF-8, make it so.
        unless ($trans->is_valid_utf8($row->{$_}))
        {
            $row->{$_} = Encode::encode_utf8(Encode::decode($charset, $row->{$_}, 1));
        }
        
        # then transliterate to single-byte chars
        $row->{$_ . '_' . $charset} = $trans->convert($row->{$_});
    }

 }

 1;
 
 use My::DBI;
 
 my $dbi_prog = My::DBI->new(
     config => SWISH::Config->new(
         # also use Swish-e's feature so that all text is searchable as ASCII
         TranslateCharacters => ':ascii:'
     ),
 );

 $dbi_prog->create;
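
The charset detection in the example depends on the shape of the locale string, so it can help to test that step in isolation. The sketch below is a standalone helper with a hypothetical name, charset_from_locale; it applies the same regular expression to a locale name passed as an argument instead of calling setlocale(), and falls back to iso-8859-1 as the example does.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the charset out of a locale name such as
# "en_US.UTF-8", falling back to iso-8859-1 when none is present.
sub charset_from_locale {
    my ($locale) = @_;

    # The match is evaluated in list context so the capture is returned.
    my ($charset) = ( defined $locale ? $locale : '' ) =~ m/^.+?\.(.+)/;

    return defined $charset ? $charset : 'iso-8859-1';
}
```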
                    

SEE ALSO

http://swish-e.org/docs/

SWISH::Prog, SWISH::Prog::DBI::Doc, Search::Tools

AUTHOR

Peter Karman, <perl@peknet.com>

Thanks to Atomic Learning for supporting the development of this module.

COPYRIGHT AND LICENSE

Copyright 2006 by Peter Karman

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.