Dezi::Aggregator - document aggregation base class
SYNOPSIS

    package MyAggregator;
    use Moose;
    extends 'Dezi::Aggregator';

    sub get_doc {
        my ( $self, $url ) = @_;
        my $doc;

        # do something to create a Dezi::Indexer::Doc object from $url

        return $doc;
    }

    sub crawl {
        my ( $self, @where ) = @_;
        foreach my $place (@where) {

            # do something to search $place for docs to pass to get_doc()
        }
    }

    1;
Dezi::Aggregator is a base class that defines the basic API for writing an aggregator. Subclasses must implement only two methods: get_doc() and crawl(). See the SYNOPSIS for how each is called.
See Dezi::Aggregator::FS and Dezi::Aggregator::Spider for examples of aggregators that crawl the filesystem and web, respectively.
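To make the two-method contract concrete without Moose or a Dezi install, here is a plain-Perl sketch. My::BaseAggregator and My::HashAggregator are hypothetical stand-ins, and having the base class croak when a required method is not overridden is an assumption about one way to enforce the API, not a description of Dezi::Aggregator's internals. Unlike the real crawl(), which hands each doc to the indexer, this toy returns the docs so the result is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Dezi::Aggregator: it defines the API, and each
# required method croaks unless a subclass overrides it.
package My::BaseAggregator {
    use Carp qw(croak);

    sub new {
        my ( $class, %args ) = @_;
        return bless {%args}, $class;
    }

    sub get_doc { croak ref( $_[0] ) . " must implement get_doc()" }
    sub crawl   { croak ref( $_[0] ) . " must implement crawl()" }
}

# Toy aggregator that "crawls" an in-memory hash of url => content.
package My::HashAggregator {
    use parent -norequire, 'My::BaseAggregator';

    sub get_doc {
        my ( $self, $url ) = @_;
        my $content = $self->{store}{$url};
        return unless defined $content;    # nothing found at $url

        # A plain hashref stands in for a Dezi::Indexer::Doc object.
        return { url => $url, content => $content };
    }

    sub crawl {
        my ( $self, @where ) = @_;
        my @docs;
        for my $place (@where) {
            my $doc = $self->get_doc($place) or next;
            push @docs, $doc;
        }
        return @docs;
    }
}

package main;

my $agg = My::HashAggregator->new(
    store => { 'a.txt' => 'alpha', 'b.txt' => 'beta' },
);
my @found = $agg->crawl( 'a.txt', 'missing.txt', 'b.txt' );
print scalar(@found), " docs\n";    # prints "2 docs"
```

A subclass that forgets to override crawl() fails loudly at call time rather than silently indexing nothing.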
new( params )
Sets object flags per the Dezi::Class API. The flags are also accessors, and include:
This will set the parser() value in swish_filter() based on the MIME type of the doc_class() object.
indexer
A Dezi::Indexer object.
doc_class
The name of the Dezi::Indexer::Doc-derived class to use in get_doc(). Default is Dezi::Indexer::Doc.
A SWISH::Filter object. If not passed in new() one is created for you.
Dry-run mode: prints information on stderr but does not build the index.
filter
Value should be a CODE ref. This is passed through to set_filter() internally at BUILD() time. If you need to adjust the filter after the Aggregator object is created, use set_filter().
ok_if_newer_than
Value should be a Unix timestamp (epoch seconds). Default is undef. If set, aggregators should skip files that have a modification time older than the timestamp.
You may get/set the ok_if_newer_than value with the ok_if_newer_than() attribute method, but use set_ok_if_newer_than() to include validation of the supplied timestamp value.
progress
Get/set a progress object. The examples/swish3 script uses Term::ProgressBar. If set, it is incremented just as count() is.
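The ok_if_newer_than rule can be sketched with core Perl alone. validate_epoch() and skip_as_old() are hypothetical helpers: the first shows one plausible check a set_ok_if_newer_than()-style setter could make (the real validation rules are not spelled out here), and the second applies the documented rule, skipping a file whose modification time is older than the timestamp and never skipping when the value is undef.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical validator for a set_ok_if_newer_than()-style setter:
# accept only a non-negative integer number of epoch seconds.
sub validate_epoch {
    my ($ts) = @_;
    die "timestamp must be a non-negative integer (epoch seconds)\n"
        unless defined $ts && $ts =~ /\A[0-9]+\z/;
    return 0 + $ts;
}

# True if $path should be skipped because its mtime is older than $cutoff.
# An undef cutoff (the documented default) means "never skip".
sub skip_as_old {
    my ( $path, $cutoff ) = @_;
    return 0 unless defined $cutoff;
    my $mtime = ( stat $path )[9];
    die "cannot stat $path: $!\n" unless defined $mtime;
    return $mtime < $cutoff ? 1 : 0;
}

my $cutoff = validate_epoch( time - 3600 );    # one hour ago

# Create a scratch file and backdate it by two hours.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
close $fh;
my $two_hours_ago = time - 7200;
utime( $two_hours_ago, $two_hours_ago, $path ) or die "utime failed: $!";

print skip_as_old( $path, $cutoff ) ? "skip\n" : "index\n";    # prints "skip"
```

A file modified exactly at the cutoff is not "older than" it, so this sketch indexes it.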
Returns the Dezi::Indexer::Config object from the indexer in use. This is a read-only method (an accessor, not a mutator).
count
Returns the total number of doc_class() objects returned by get_doc().
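Keeping count() and the progress object in step amounts to bumping both each time get_doc() yields a doc. In this sketch My::Progress, My::Counter, and record_doc() are hypothetical, and the update($so_far) call is an assumption modeled on Term::ProgressBar's update() method; check how Dezi::Aggregator actually drives its progress object before relying on it.

```perl
use strict;
use warnings;

# Stand-in progress object with a Term::ProgressBar-like update($so_far).
package My::Progress {
    sub new { my $class = shift; return bless { seen => 0 }, $class }

    sub update {
        my ( $self, $so_far ) = @_;
        $self->{seen} = $so_far;
        return $so_far;
    }
}

package My::Counter {
    sub new {
        my ( $class, %args ) = @_;
        return bless { count => 0, progress => $args{progress} }, $class;
    }
    sub count    { $_[0]{count} }
    sub progress { $_[0]{progress} }

    # Hypothetical hook: call once for each doc that get_doc() returns.
    sub record_doc {
        my $self = shift;
        $self->{count}++;
        $self->progress->update( $self->{count} ) if $self->progress;
        return $self->{count};
    }
}

package main;

my $progress = My::Progress->new;
my $agg      = My::Counter->new( progress => $progress );
$agg->record_doc for 1 .. 3;
print join( ' / ', $agg->count, $progress->{seen} ), "\n";    # prints "3 / 3"
```

Because the progress object is optional, the counter still works when none is set.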
crawl( @where )
Override this method in your subclass. It performs the aggregation and passes each doc_class() object returned by get_doc() to indexer->process().
get_doc( url )
Override this method in your subclass. It should return a doc_class() object.
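Because doc_class() holds a class name rather than an object, get_doc() can build each document with an ordinary method call on that string. In this sketch My::Doc and My::Doc::HTML are hypothetical stand-ins for Dezi::Indexer::Doc and a subclass of it, get_doc() takes the content as a second argument only to keep the toy self-contained, and the isa() guard is an illustrative safety check rather than documented Dezi behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Dezi::Indexer::Doc.
package My::Doc {
    sub new {
        my ( $class, %args ) = @_;
        return bless {%args}, $class;
    }
    sub url     { $_[0]{url} }
    sub content { $_[0]{content} }
}

# A derived class, the kind of name you might store in doc_class().
package My::Doc::HTML {
    use parent -norequire, 'My::Doc';
}

package My::Aggregator {
    sub new {
        my ( $class, %args ) = @_;
        return bless { doc_class => 'My::Doc', %args }, $class;    # default class
    }
    sub doc_class { $_[0]{doc_class} }

    sub get_doc {
        my ( $self, $url, $content ) = @_;
        my $doc_class = $self->doc_class;
        die "$doc_class is not a My::Doc class\n"
            unless $doc_class->isa('My::Doc');

        # A method call on the class-name string acts as the constructor call.
        return $doc_class->new( url => $url, content => $content );
    }
}

package main;

my $agg = My::Aggregator->new( doc_class => 'My::Doc::HTML' );
my $doc = $agg->get_doc( 'index.html', '<p>hi</p>' );
print ref($doc), "\n";    # prints "My::Doc::HTML"
```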
swish_filter( doc_class_object )
Passes the content() of doc_class_object through SWISH::Filter and transforms it into something indexable. Returns the filtered doc_class_object.
NOTE: All aggregators should call this method after get_doc() and before passing the object to indexer().
See the SWISH::Filter documentation.
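The required call order (get_doc(), then swish_filter(), then the indexer) can be sketched as a plain-Perl crawl() loop. Everything here is a hypothetical stand-in: My::Indexer's process() only records the docs it receives, the swish_filter() sketch merely strips tags (SWISH::Filter does far more), and the dry_run flag is an invented name illustrating how a dry-run mode can report docs without indexing them.

```perl
use strict;
use warnings;

# Hypothetical indexer: process() just remembers each doc it receives.
package My::Indexer {
    sub new { my $class = shift; return bless { seen => [] }, $class }

    sub process {
        my ( $self, $doc ) = @_;
        push @{ $self->{seen} }, $doc;
        return $doc;
    }
}

package My::PipelineAggregator {
    sub new {
        my ( $class, %args ) = @_;
        return bless { count => 0, dry_run => 0, %args }, $class;
    }

    sub get_doc {
        my ( $self, $url ) = @_;
        return { url => $url, content => "<b>$url</b>" };
    }

    # Toy stand-in for swish_filter(): make content indexable by
    # stripping markup, then return the same doc.
    sub swish_filter {
        my ( $self, $doc ) = @_;
        $doc->{content} =~ s/<[^>]*>//g;
        return $doc;
    }

    sub crawl {
        my ( $self, @where ) = @_;
        for my $url (@where) {
            my $doc = $self->get_doc($url) or next;
            $self->{count}++;                # one more doc from get_doc()
            $self->swish_filter($doc);       # filter before indexing
            if ( $self->{dry_run} ) {        # report, but do not index
                print STDERR "would index $doc->{url}\n";
                next;
            }
            $self->{indexer}->process($doc);
        }
        return $self->{count};
    }
}

package main;

my $indexer = My::Indexer->new;
my $agg     = My::PipelineAggregator->new( indexer => $indexer );
$agg->crawl(qw( a.html b.html ));
print join( ' ', map { $_->{content} } @{ $indexer->{seen} } ), "\n";    # prints "a.html b.html"
```

The indexer only ever sees filtered content, which is the point of calling swish_filter() before process().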
set_filter( code_ref )
Use code_ref as the doc_class filter. BUILD() calls this method if the filter param is set in the constructor.
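A minimal sketch of the set_filter() flow, assuming (this documentation does not confirm it) that the filter is called with the doc object as its only argument and may modify it in place. My::FilteredAggregator and apply_filter() are hypothetical; the point is that a constructor-supplied filter passes through the same validating setter as a later set_filter() call.

```perl
use strict;
use warnings;

package My::FilteredAggregator {
    use Carp qw(croak);

    sub new {
        my ( $class, %args ) = @_;
        my $self = bless {}, $class;

        # Like BUILD(): route a constructor-supplied filter through set_filter().
        $self->set_filter( $args{filter} ) if defined $args{filter};
        return $self;
    }

    sub set_filter {
        my ( $self, $code ) = @_;
        croak "filter must be a CODE ref" unless ref($code) eq 'CODE';
        $self->{filter} = $code;
        return $self;
    }

    # Hypothetical: run the filter (if any) on a doc before indexing.
    sub apply_filter {
        my ( $self, $doc ) = @_;
        $self->{filter}->($doc) if $self->{filter};
        return $doc;
    }
}

package main;

my $agg = My::FilteredAggregator->new(
    filter => sub { $_[0]{title} = uc $_[0]{title} },
);
my $doc = $agg->apply_filter( { title => 'hello' } );

# Replace the filter after construction via set_filter().
$agg->set_filter( sub { $_[0]{title} .= '!' } );
$agg->apply_filter($doc);
print "$doc->{title}\n";    # prints "HELLO!"
```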
doc_class
filter
set_ok_if_newer_than( timestamp )
Sets the ok_if_newer_than attribute. timestamp should be a Unix epoch value.
Peter Karman, <perl@peknet.com>
Please report any bugs or feature requests to bug-swish-prog at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-App. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Dezi
You can also look for information at:
Mailing list
http://lists.swish-e.org/listinfo/users
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Dezi-App
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Dezi-App
CPAN Ratings
http://cpanratings.perl.org/d/Dezi-App
Search CPAN
http://search.cpan.org/dist/Dezi-App/
Copyright 2008-2018 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
http://swish-e.org/