Treex::Core::DocumentReader - interface for all document readers
Document readers are the Treex mechanism for loading documents to be processed by Treex. The documents can be stored in files (in various formats), read from STDIN, retrieved from a socket, etc.
To be implemented
These methods must be implemented in classes that consume this role.
Returns the next document (Treex::Core::Document).
Total number of documents that will be produced by this reader. If the number is unknown in advance, undef should be returned.
Is the document that was most recently returned by $self->next_document() supposed to be processed by this job? Job indices and document numbers are 1-based, so for example:
jobs = 5, jobindex = 3: we want to load documents with numbers 3, 8, 13, 18, ...
jobs = 5, jobindex = 5: we want to load documents with numbers 5, 10, 15, 20, ...
That is, those documents where (doc_number-1) % jobs == (jobindex-1).
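The modulo rule above can be checked with a few lines of plain Perl. This is only a minimal sketch, not the module's implementation: the helper name is_for_job is hypothetical, and it takes the values as plain arguments instead of reading them from reader attributes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): true if the 1-based document
# number belongs to the 1-based job index when documents are dealt out
# round-robin over $jobs jobs.
sub is_for_job {
    my ($doc_number, $jobs, $jobindex) = @_;
    return ($doc_number - 1) % $jobs == ($jobindex - 1);
}

# Reproduce the two examples from the text for the first 20 documents.
my @job3 = grep { is_for_job($_, 5, 3) } 1 .. 20;
my @job5 = grep { is_for_job($_, 5, 5) } 1 .. 20;
print "job 3 of 5: @job3\n";    # 3 8 13 18
print "job 5 of 5: @job5\n";    # 5 10 15 20
```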
Returns the next document that should be processed by this job. If jobindex is set, only the documents selected by the modulo rule described above are returned.
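The skipping behavior can be pictured as a loop that keeps pulling documents until one matches the current job. The sketch below is an assumption-laden illustration, not the module's code: make_reader and next_for_job are hypothetical names, a "document" is represented only by its 1-based number, and jobindex is assumed to be set.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a reader: yields document numbers 1..$total
# one at a time, then undef once the input is exhausted.
sub make_reader {
    my ($total) = @_;
    my $doc_number = 0;
    return sub { return $doc_number < $total ? ++$doc_number : undef };
}

# Hypothetical helper: pull documents until one belongs to this job
# (same modulo rule as in the text) or the input ends.
sub next_for_job {
    my ($next_document, $jobs, $jobindex) = @_;
    while (defined(my $doc = $next_document->())) {
        return $doc if ($doc - 1) % $jobs == ($jobindex - 1);
    }
    return;    # no more documents for this job
}

my $reader = make_reader(12);
my @seen;
while (defined(my $doc = next_for_job($reader, 5, 3))) {
    push @seen, $doc;
}
print "@seen\n";    # 3 8
```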
Total number of documents that will be produced by this reader for this job. It is computed from the total number of documents, jobs and jobindex.
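One way to derive a per-job count that is consistent with the modulo rule is shown below. This is a sketch under stated assumptions and the module's own computation may differ: docs_per_job is a hypothetical name, an unknown total is passed through as undef, and an unset jobindex is assumed to mean that no splitting takes place.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): how many of the documents
# 1..$total satisfy (doc_number-1) % jobs == (jobindex-1).
sub docs_per_job {
    my ($total, $jobs, $jobindex) = @_;
    return undef  if !defined $total;       # total unknown in advance
    return $total if !defined $jobindex;    # assumption: no job splitting
    my $full_rounds = int($total / $jobs);  # every job gets this many
    my $remainder   = $total % $jobs;       # jobs 1..$remainder get one more
    return $full_rounds + ($jobindex <= $remainder ? 1 : 0);
}

print docs_per_job(20, 5, 3), "\n";    # 4  (documents 3, 8, 13, 18)
print docs_per_job(12, 5, 3), "\n";    # 2  (documents 3, 8)
print docs_per_job(12, 5, 1), "\n";    # 3  (documents 1, 6, 11)
```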
Start reading again from the first document. This implementation just sets the attribute doc_number to zero. You can add additional behavior using Moose method modifiers.
Martin Popel <firstname.lastname@example.org>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.