SWISH::3 - Perl interface to libswish3
    use SWISH::3;
    my $swish3 = SWISH::3->new(
        config  => 'path/to/config.xml',
        handler => \&my_handler,
        regex   => qr/\w+(?:'\w+)*/,
    );
    $swish3->parse( 'path/to/file.xml' )
        or die "failed to parse file: " . $swish3->error;

    printf "libxml2 version %s\n", $swish3->xml2_version;
    printf "libswish3 version %s\n", $swish3->version;
SWISH::3 is a Perl interface to the libswish3 C library.
All the SWISH_* constants defined in libswish3.h are available and can be optionally imported with the :constants keyword.
use SWISH::3 qw(:constants);
See the SWISH::3::Constants section below.
In addition, the SWISH::3 Perl class defines some Perl-only constants:
An array of method names that can be called on a SWISH::3::Doc object in your handler method.
An array of method names that can be called on a SWISH::3::Token object.
A hashref of method names to id integer values. The integer values are assigned in libswish3.h.
A hashref of built-in property names to docinfo attribute names. The values of SWISH_DOC_PROP_MAP are the keys of SWISH_DOC_FIELDS_MAP.
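The relationship between the two maps can be checked with a short sketch. The helper unmapped_fields() is hypothetical (it is not part of SWISH::3), and the block assumes both constants are reachable as SWISH::3 class methods. If the module or the constants are unavailable, the check is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SWISH::3: return the values of
# %$prop_map that do not appear as keys of %$fields_map.
sub unmapped_fields {
    my ( $prop_map, $fields_map ) = @_;
    my @missing = grep { !exists $fields_map->{$_} } values %$prop_map;
    return sort @missing;
}

# Assumption: the Perl-only constants can be called as class methods.
if (   eval { require SWISH::3; 1 }
    && SWISH::3->can('SWISH_DOC_PROP_MAP')
    && SWISH::3->can('SWISH_DOC_FIELDS_MAP') )
{
    my @missing = unmapped_fields(
        SWISH::3->SWISH_DOC_PROP_MAP,
        SWISH::3->SWISH_DOC_FIELDS_MAP,
    );
    print @missing
        ? "values missing from SWISH_DOC_FIELDS_MAP: @missing\n"
        : "every SWISH_DOC_PROP_MAP value is a SWISH_DOC_FIELDS_MAP key\n";
}
else {
    print "SWISH::3 constants not available; skipping check\n";
}
```

If the documented relationship holds, the list of missing values should be empty.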
The handler used if you do not specify one. By default it simply prints the contents of SWISH::3::Data to stderr.
args should be an array of key/value pairs. See SYNOPSIS.
Returns a new SWISH::3 instance.
Returns the libxml2 version used by libswish3.
Returns the libswish3 version.
Returns the Perl reference count for object.
Prints an isw* summary to stderr for codepoint. codepoint should be a positive integer representing a Unicode codepoint.
This prints a report similar to the swish_isw.c example script.
Returns the contents of filename as a scalar string. May also be called as an object method.
Returns file extension for filename.
Returns the configured MIME type for filename based on file extension.
Returns the configured MIME type for filename, ignoring any .gz extension. See looks_like_gz.
Returns true if filename has a file extension indicating it is gzip'd. Wraps the swish_fs_looks_like_gz() C function.
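The difference between looking at the literal file extension and ignoring a trailing .gz can be illustrated in plain Perl. This is only an approximation under stated assumptions: the three *_sketch helpers are hypothetical, they do not call SWISH::3 or libswish3, and details such as lower-casing the extension are guesses rather than documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: the text after the last dot, or undef if none.
# Lower-casing is an assumption, not documented libswish3 behavior.
sub file_ext_sketch {
    my ($filename) = @_;
    return $filename =~ /\.([^.\/]+)$/ ? lc($1) : undef;
}

# Hypothetical helper: true if the extension suggests gzip compression.
sub looks_like_gz_sketch {
    my ($filename) = @_;
    my $ext = file_ext_sketch($filename);
    return ( defined($ext) && $ext eq 'gz' ) ? 1 : 0;
}

# Hypothetical helper: the extension once any trailing .gz is ignored,
# which is the extension a "real" MIME lookup would be based on.
sub real_ext_sketch {
    my ($filename) = @_;
    ( my $stripped = $filename ) =~ s/\.gz$//i;
    return file_ext_sketch($stripped);
}

for my $file (qw( doc.xml doc.xml.gz README )) {
    printf "%-12s ext=%-6s gz=%d real_ext=%s\n",
        $file,
        file_ext_sketch($file) // '(none)',
        looks_like_gz_sketch($file),
        real_ext_sketch($file) // '(none)';
}
```

For doc.xml.gz the literal extension is gz, while the extension used after ignoring .gz is xml, so the two MIME lookups described above can give different answers for the same file.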
Wrapper around parse_file(), parse_buffer() and parse_fh() that tries to Do the Right Thing.
Calls the C function of the same name on filename.
Calls the C function of the same name on str. Note that str should contain the API headers.
Not yet implemented.
Returns the error message from the last call to parse(), parse_file(), parse_buffer() or parse_fh(). If there was no error on the last call to one of those methods, returns undef.
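A minimal sketch of checking the error after each parse, following the pattern in the SYNOPSIS. The helper parse_or_warn() is hypothetical, and the handler here only counts calls. The block does nothing unless file names are given on the command line and SWISH::3 can be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one target and warn with the error()
# message on failure instead of dying. Returns 1 on success, 0 otherwise.
sub parse_or_warn {
    my ( $swish3, $target ) = @_;
    return 1 if $swish3->parse($target);
    my $message = $swish3->error // 'unknown error';
    warn "failed to parse $target: $message\n";
    return 0;
}

if ( @ARGV && eval { require SWISH::3; 1 } ) {
    my $count  = 0;
    my $swish3 = SWISH::3->new( handler => sub { $count++ } );

    # grep in scalar context counts the targets that parsed cleanly.
    my $ok = grep { parse_or_warn( $swish3, $_ ) } @ARGV;
    printf "parsed %d of %d targets, handler called %d times\n",
        $ok, scalar(@ARGV), $count;
}
```

Warning instead of dying lets a batch run continue past a single bad document; whether that is appropriate depends on the indexer.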
Set the Config object.
Returns SWISH::3::Config object.
Alias for get_config().
Set the Analyzer object.
Returns SWISH::3::Analyzer object.
Alias for get_analyzer().
Set the Parser object.
Returns SWISH::3::Parser object.
Alias for get_parser().
Set the parser handler CODE ref.
Returns a CODE ref for the handler.
Default class_name is SWISH::3::Data.
Returns class name.
Default class_name is SWISH::3::Parser.
Default class_name is SWISH::3::Config.
Default class_name is SWISH::3::Analyzer.
Set the regex used in tokenize().
Returns the regex used in tokenize().
Alias for get_regex().
Returns the SWISH::3::Stash object used internally by the SWISH::3 object. You typically do not need to access this object as a user of SWISH::3, but if you are developing code that needs to access objects within a handler function, you can put it in the Stash object and then retrieve it later.
Example:
    my $s3 = SWISH::3->new( handler => \&handler );
    my $stash = $s3->get_stash();
    $stash->set( 'my_indexer' => $indexer );

    # later..
    sub handler {
        my $data = shift;
        my $indexer = $data->s3->get_stash->get('my_indexer');
        $indexer->add_doc( $data );
    }
Returns a SWISH::3::TokenIterator object representing string. The tokenizer uses the regex defined in set_regex().
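To get a feel for what a tokenizing regex will match before handing it to set_regex(), the pattern can be tried in plain Perl. This sketch does not call SWISH::3: regex_tokens() is a hypothetical helper that returns a plain list, whereas tokenize() returns a SWISH::3::TokenIterator, and libswish3 may process matches differently. The pattern is the one from the SYNOPSIS.

```perl
use strict;
use warnings;

# The regex from the SYNOPSIS: runs of word characters, optionally
# joined by apostrophes (so "don't" stays a single token).
my $regex = qr/\w+(?:'\w+)*/;

# Hypothetical helper: collect every match of $regex in $string.
sub regex_tokens {
    my ($string) = @_;
    my @tokens;
    while ( $string =~ /($regex)/g ) {
        push @tokens, $1;
    }
    return @tokens;
}

my @tokens = regex_tokens("Don't panic: it's only a test-case.");
print join( '|', @tokens ), "\n";
# prints: Don't|panic|it's|only|a|test|case
```

Note that the hyphen is not matched by the pattern, so "test-case" yields two tokens while the apostrophe forms stay whole.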
Returns a SWISH::3::TokenIterator object representing string. The tokenizer uses the built-in libswish3 tokenizer, not a regex.
Returns the internal reference count for the underlying C struct pointer.
Get/set the internal debugging level.
Like calling Devel::Peek::Dump on object.
Calls the C function swish_memcount_debug().
Returns the global C malloc counter value.
A wrapper around describe() and Data::Dump::dump().
Returns a new SWISH::3::Analyzer instance.
Set the regex used in SWISH::3->tokenize().
Returns a qr// regex object.
Get the tokenize flag. Default is true.
Toggle the tokenize flag. Default is true (tokenize contents when file is parsed).
An alias for add() is merge().
delete() is NOT YET IMPLEMENTED.
Get the parent SWISH::3 object.
Get the parent SWISH::3::Config object.
Returns the string value of PropertyName name.
Returns the string value of MetaName name.
Returns a hashref of name/value pairs.
Returns a SWISH::3::Doc object.
Returns a SWISH::3::TokenIterator object.
Returns the last modified time as epoch int.
Returns the size in bytes.
Returns the number of tokenized words in the Doc.
Returns the string encoding of Doc.
Returns the URI value.
Returns the file extension.
Returns the mime type.
Returns the name of the parser used (TXT, HTML, or XML).
Returns the intended action (e.g., add, delete, update).
Returns a new SWISH::3::MetaName instance.
TODO: there are no set methods so this isn't of much use.
Returns the id integer.
Returns the name string.
Returns the bias integer.
Returns the alias_for string.
Get the SWISH::3::MetaName object for name.
Set the SWISH::3::MetaName for name.
Returns array of names.
Returns the id integer.
Returns the ignore_case boolean.
Returns the type integer.
Returns the verbatim boolean.
Returns the max integer.
Returns the sort boolean.
Get the SWISH::3::Property object for name.
Set the SWISH::3::Property for name.
Returns the value string.
Returns the SWISH::3::MetaName object for the Token.
Returns the id integer for the related MetaName.
Returns the context string.
Returns the position integer.
Returns the length in bytes of the Token.
Returns the next SWISH::3::Token.
The following constants are imported directly from libswish3 and are defined there.
libswish3 is not yet ported to Windows.
Peter Karman perl@peknet.com
Copyright 2010 Peter Karman.
This file is part of libswish3.
libswish3 is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
libswish3 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
http://swish-e.org/
SWISH::Prog