CGI::Application::Search - Base class for CGI::App Swish-e site engines
    package My::Search;
    use base 'CGI::Application::Search';

    sub cgiapp_init {
        my $self = shift;
        $self->param(
            'SWISHE_INDEX' => 'my-swishe.index',
            'TEMPLATE'     => 'search_results.tmpl',
        );
    }

    # let the user turn context highlighting off
    sub cgiapp_prerun {
        my $self = shift;
        $self->param('HIGHLIGHT' => 0) if ($self->query->param('highlight_off'));
    }

    1;
A CGI::Application-based control module that uses the Swish-e Perl API (http://swish-e.org) to perform searches on a Swish-e index of documents. It uses HTML::Template to display the search form and the results. You may customize this template to alter the look and feel of the generated search interface.
If this is your first time using Swish-e, if you need a refresher, or if you want step-by-step instructions for using the AJAX capabilities of this module, please see CGI::Application::Search::Tutorial.
The start_mode is show_search and these are the other available run modes:
This method will load the template pointed to by the TEMPLATE param (falling back on a default internal template if none is configured) and display it to the user. It will 'associate' this template with $self so that any parameters in $self->param() are also accessible to the template. It will also use HTML::FillInForm to fill in the search form with the previously selected parameters.
This is where the meat of the searching is performed. We create a SWISH::API object on the SWISHE_INDEX and build the search query from the value of the 'keywords' CGI parameter and any EXTRA_PROPERTIES. The search is then executed. If HIGHLIGHT is true, we use Text::Context to highlight the matching terms in each result's description. If PER_PAGE is true, only PER_PAGE results are shown per page, along with a list of pages that can be selected for navigating through the results. Finally, we return to the show_search() method for display.
This run mode will fetch a remote page (using a relative or absolute URL given in the url query param) and highlight the keywords used in the search on that page (the keywords query param) using the HIGHLIGHT_TAG, HIGHLIGHT_CLASS or HIGHLIGHT_COLORS options. This run mode is best used in the links of the search results listing.
<a href="?rm=highlight_remote_page;url=http%3A%2F%2Fexample.com%2Fabout_us%2Findex.html;keywords=Us">about us</a>
This run mode will fetch a local page (only files under the DOCUMENT_ROOT config var are allowed, located by the relative path given in the path query param) and highlight the keywords used in the search on that page (the keywords query param) using the HIGHLIGHT_TAG, HIGHLIGHT_CLASS or HIGHLIGHT_COLORS options. This run mode is best used in the links of the search results listing.
<a href="?rm=highlight_local_page;path=%2Fabout_us%2Findex.html;keywords=Us">about us</a>
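If you build these links yourself rather than in a template, the url, path and keywords values need to be URL-escaped, as in the examples above. The following is only a sketch using core Perl; the helper names uri_escape_simple and highlight_link are made up for this illustration and are not part of the module, and the escaping assumes plain ASCII (byte) strings.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
# Assumption: $text holds ASCII/byte data, not wide characters.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build the query string for one of the highlight run modes.
#   $run_mode   : 'highlight_local_page' or 'highlight_remote_page'
#   $param_name : 'path' (local) or 'url' (remote)
sub highlight_link {
    my ($run_mode, $param_name, $target, $keywords) = @_;
    return sprintf(
        '?rm=%s;%s=%s;keywords=%s',
        $run_mode, $param_name,
        uri_escape_simple($target),
        uri_escape_simple($keywords),
    );
}

print highlight_link('highlight_local_page', 'path', '/about_us/index.html', 'Us'), "\n";
```

With a CPAN module such as URI::Escape available, its uri_escape function could replace the hand-rolled helper.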
This run mode will return an AJAX listing of words that should be suggested to the user for the words that they have typed so far. It uses the suggested_words method to actually choose what words to send back.
Most of the time you will not need to call the methods that are implemented in this module. But in cases where more customization is required than can be done in the templates, it might be prudent to override or extend these methods in your derived class.
We simply override and extend the CGI::Application new() to set up our defaults.
Here's where we set up our run modes. If you override this method in your derived class, make sure you also call the base class's version:

    sub setup {
        my $self = shift;
        # do your thing
        ...
        $self->SUPER::setup();
    }
This method is used to generate the query for swish-e from the $keywords (by default the 'keywords' CGI parameter), as well as any EXTRA_PROPERTIES that are present.
If you wish to generate your own search query then you should override this method. This is common if you need to have access/authorization control that will need to be taken into account for your search. (eg, anything under /protected can't be seen by someone not logged in).
Please see the swish-e documentation on the exact syntax for the query.
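As a rough sketch of the access-control case, the helper below narrows a query to public documents for visitors who are not logged in. Everything specific here is an assumption made for illustration: the helper name restrict_query is invented, and it presumes your index defines a MetaName called access with the value public on unprotected documents. In a derived class, an overridden generate_search_query could pass the query it builds through a helper like this.

```perl
use strict;
use warnings;

# Hypothetical helper: restrict a Swish-e query to public documents
# unless the visitor is logged in.
# Assumption: the index was built with a MetaName "access" (invented
# for this sketch) whose value is "public" for unprotected documents.
sub restrict_query {
    my ($search_query, $logged_in) = @_;
    return $search_query if $logged_in;
    return "($search_query) and access=(public)";
}

print restrict_query('about us', 0), "\n";   # anonymous visitor
print restrict_query('about us', 1), "\n";   # logged-in visitor
```

Wrapping the original query in parentheses keeps any "or" operators the user typed from escaping the added restriction.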
This object method is used by the AUTO_SUGGEST flag to return the words that should be suggested to the user after they have typed a $word. It returns an array reference of those words.
By default it will just look for words in the AUTO_SUGGEST_FILE that begin with $word. If you need more performance or flexibility (eg, storing your words in a database and querying for them) you are encouraged to override this method.
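To give a feel for what such a lookup involves, here is a minimal sketch of a prefix search over an alphabetically sorted word list. It is not the module's implementation; the function name words_starting_with and its optional $limit argument are assumptions for this example. An overriding suggested_words method could return the array reference it produces.

```perl
use strict;
use warnings;

# Return an array ref of the words in a sorted list that begin with $word.
# $limit is optional; when given (and positive) at most $limit words are
# returned. Matching is case-sensitive.
sub words_starting_with {
    my ($word, $sorted_words, $limit) = @_;
    my @matches;
    for my $candidate (@$sorted_words) {
        if (index($candidate, $word) == 0) {
            push @matches, $candidate;
            last if defined $limit && @matches >= $limit;
        }
        elsif (@matches) {
            # The list is sorted, so words sharing a prefix are adjacent:
            # once the matches stop, there are no more to find.
            last;
        }
    }
    return \@matches;
}

my $words = [qw(apple application apply banana band)];
print join(', ', @{ words_starting_with('app', $words, 2) }), "\n";
```

For a large word list, a binary search for the first match (or a database query, as mentioned above) would avoid scanning from the top each time.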
There are several configuration parameters that you can set at any time (using param() in your cgiapp_init, or PARAMS hash in new()) before the run mode is called that will affect the search and display of the results. They are:
This is the Swish-e index used for the searches. The default is 'data/swish-e.index'. You will probably override this every time.
This is a boolean indicating whether or not AJAX capabilities will be permitted.
Please see the CGI::Application::Search::Tutorial for more information on how to use the AJAX capabilities of this module.
The name of the search interface template. A default template is included within the module which will be used if you don't specify one. A more elaborate example is included in the distribution under the tmpl/ directory.
Please see "TEMPLATE USAGE" for more information on what variables are passed into your template.
This module uses CGI::Application::Plugin::AnyTemplate to allow flexibility in choosing which templating system to use for your search. This works especially well when you are trying to integrate the Search into an existing app with an existing templating structure.
This value is passed to the $self->template->config() method as the default_type. By default it is 'HTMLTemplate'. Please see CGI::Application::Plugin::AnyTemplate for more options.
If you want more control over the template configuration, it would probably best be done by subclassing CGI::Application::Search and passing your desired params to $self->template->config.
How many search result items to display per page. The default is 10.
Boolean indicating whether or not we should highlight the description given to the templates. The default is true.
The tag used to surround the highlighted context. The default is strong.
The class attribute of the HIGHLIGHT_TAG HTML tag. This is useful when you want to dictate the style through a CSS style sheet. If given, this value will override that of HIGHLIGHT_COLORS. By default it is '' (an empty string).
This is an array ref of acceptable HTML colors. If provided, it will highlight each matching word/phrase in an alternating style. For instance, if given 2 colors, every other highlighted phrase would be a different color. By default it is an empty array.
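The alternating behaviour can be pictured with the small sketch below. It is not the module's own highlighting code: the function name highlight_alternating is invented, the use of an inline background-color style is an assumption, and the input is treated as plain text rather than HTML.

```perl
use strict;
use warnings;

# Wrap each keyword match in $tag, cycling through the given colors.
# Assumptions: $text is plain text (no markup to skip over) and colors
# are applied as an inline background-color style.
sub highlight_alternating {
    my ($text, $keywords, $colors, $tag) = @_;
    $tag ||= 'strong';
    return $text unless @$keywords && @$colors;

    my $pattern    = join '|', map { quotemeta } @$keywords;
    my $num_colors = scalar @$colors;
    my $count      = 0;

    my $wrap = sub {
        my ($match) = @_;
        my $color = $colors->[ $count++ % $num_colors ];
        return qq{<$tag style="background-color: $color">$match</$tag>};
    };

    $text =~ s/($pattern)/$wrap->($1)/gei;
    return $text;
}

print highlight_alternating('us and them and us', ['us', 'them'], ['red', 'blue']), "\n";
```

With two colors, the first and third matches come out in the first color and the second match in the other.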
This is an array ref of extra properties used in the search. By default, the module will only use the value of the 'keywords' parameter coming in the CGI query. If anything is provided as an extra property then it will be added to the query used in the search.
An example: You have some of your pages organized into categories, and you want the user to have the option of narrowing the results by category. You add the word 'category' to the 'EXTRA_PROPERTIES' list and then add a 'category' form element to your search form that the user can optionally fill in. If the user gives that element a value, it will be seen and applied to the search. This will only work if you have the 'category' element defined for your documents (see "SWISH-E Configuration" and 'MetaNames' in the swish-e.org SWISH-CONF documentation).
The default is an empty list.
This is the maximum length for the context (in chars) that is displayed for each search result. The default is 250 characters.
This is the number of words on either side of the searched for words and phrases (keywords) that will be displayed as part of the description. If this is 0, then the entire description will be displayed. The default is 0.
NOTE: This directive will cause search to perform some intensive computations to figure out the best piece of the description to display. These computations may prove to be too much for some servers (eg, a shared hosting environment).
If the AJAX flag is true, then this will allow the browser to give suggestions to the user as they type. To use this, you must either use the AUTO_SUGGEST_FILE configuration option, or override the suggested_words() method.
The name of the file where the suggested words are stored. These words should be in alphabetical order with one word per line.
A boolean indicating whether the contents of the AUTO_SUGGEST_FILE should be cached in memory. This will save repeated file accesses when used in a persistent environment.
The maximum number of suggestions to show the user at a time, as an integer. This is useful when you don't want to overwhelm the end user and take over their screen with all of your helpful suggestions.
This is the root directory to use when looking for files when using the highlight_local_page run mode.
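As an illustration of the PARAMS route, an instance script might look something like the sketch below. My::Search is the derived class from the synopsis; the parameter values, including the colors and the document root path, are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use My::Search;    # the derived class shown in the synopsis

my $app = My::Search->new(
    PARAMS => {
        PER_PAGE         => 20,
        HIGHLIGHT_TAG    => 'em',
        HIGHLIGHT_COLORS => ['#ffff66', '#a0ffff'],    # example colors
        EXTRA_PROPERTIES => ['category'],
        DOCUMENT_ROOT    => '/var/www/html',           # hypothetical path
    },
);
$app->run();
```

SWISHE_INDEX and TEMPLATE are left out here because the synopsis class already sets them in cgiapp_init.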
A default template is provided inside the module which will be used if you don't specify a template. This is useful for testing out the module and may also serve as a base for your template development.
Two more elaborate templates are provided in the tmpl/ directory as examples of how to use this module. Please feel free to copy and change them in whatever way you see fit. To give you more information to display (or not display, depending on your preference), the following variables are available to your templates:
These variables are available throughout the templates and contain information related to the search as a whole:
ajax
A boolean indicating whether this search is an AJAX search. You can use this flag to exclude everything but your search results in your template.
The URL of this application. This is useful if you want to use the same templates in multiple applications, especially if you are using the AJAX capabilities since they require the URL to submit to.
searched
A boolean indicating whether or not a search was performed.
The exact string that was received by the server from the input named 'keywords'.
elapsed_time
A string representing the number of seconds that the search took. This will be a floating point number with a precision of 3.
hits
This is an array of hashes (TMPL_LOOP in H::T) that contains one entry for each result returned (for the current page). Each entry contains the following keys:
The swishreccount property of the results as indexed by SWISH-E
The rank of the result as given by SWISH-E (the swishrank property)
The swishtitle property of the results as indexed by SWISH-E
The swishdocpath property of the results as indexed by SWISH-E
The swishlastmodified property of the results as indexed by SWISH-E and then formatted using Time::Piece::strftime with a format string of %B %d, %Y.
The swishdocsize property of the results as indexed by SWISH-E and then formatted with Number::Format::format_bytes
The swishdescription property of the results as indexed by SWISH-E. If HIGHLIGHT is true, then this description will also have search terms highlighted and will only be, at most, DESCRIPTION_LENGTH characters long.
pages
This is an array of hashes (TMPL_LOOP in H::T) that contains paging information for the results. It contains the following keys:
A boolean indicating whether this iteration is the current page.
The integer number of the page.
first_page
This is a boolean indicating whether or not this page of the results is the first or not.
last_page
This is a boolean indicating whether or not this page of the results is the last or not.
prev_page
The integer number of the previous page. Will be 0 if there is no previous page.
next_page
The integer number of the next page. Will be 0 if there is no next page.
start_num
This is the number of the first result on the current page
stop_num
This is the number of the last result on the current page
total_entries
The total number of results in the search, not the total number shown on the page.
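To make the relationship between these paging variables concrete, here is a small sketch that derives them from a result count, a page size and the current page number. It only illustrates the arithmetic and is not taken from the module; the function name paging_info is invented, and it assumes a PER_PAGE value greater than zero.

```perl
use strict;
use warnings;

# Derive the paging variables described above.
# Assumption: $per_page is a positive integer.
sub paging_info {
    my ($total_entries, $per_page, $current_page) = @_;

    # Number of the last page (at least 1, even with no results).
    my $last = int(($total_entries + $per_page - 1) / $per_page) || 1;
    $current_page = 1     if $current_page < 1;
    $current_page = $last if $current_page > $last;

    my $start_num = $total_entries ? ($current_page - 1) * $per_page + 1 : 0;
    my $stop_num  = $current_page * $per_page;
    $stop_num = $total_entries if $stop_num > $total_entries;

    return {
        total_entries => $total_entries,
        start_num     => $start_num,
        stop_num      => $stop_num,
        first_page    => ($current_page == 1     ? 1 : 0),
        last_page     => ($current_page == $last ? 1 : 0),
        prev_page     => ($current_page > 1      ? $current_page - 1 : 0),
        next_page     => ($current_page < $last  ? $current_page + 1 : 0),
    };
}

my $info = paging_info(25, 10, 3);
printf "showing %d-%d of %d\n", @{$info}{qw(start_num stop_num total_entries)};
```

For 25 results at 10 per page, the third page covers results 21 through 25, is the last page, and has no next page.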
If at any time prior to the execution of the 'perform_search' run mode you set the $self->param('results') parameter, a search will not be performed; instead, an empty set of results is returned. This is helpful when you decide, for example in cgiapp_init, that this user does not have permission to perform the desired search.
You must use the StoreDescription setting in your Swish-e configuration file. If you don't you'll get an error when C::A::Search tries to retrieve a description for each hit.
Michael Peters <mpeters@plusthree.com>
Thanks to Plus Three, LP (http://www.plusthree.com) for sponsoring my work on this module.