Treex::Block::Read::BaseReader - abstract ancestor for document readers
This class serves as a common ancestor for document readers that have a parameter
from, which holds a space- or comma-separated list of filenames to be loaded. It is designed to implement the Treex::Core::DocumentReader interface.
In derived classes you need to define the
next_document method, and you can use the helper methods described below.
- from (required)
space or comma separated list of filenames, or
An '@' directly in front of a file name causes this file to be interpreted as a file list, with one file name per line, e.g. '@filelist.txt' causes the reader to open 'filelist.txt' and read a list of files from it. File lists may be arbitrarily mixed with regular files in the parameter.
(If you use this method via API you can specify a string array reference or a Treex::Core::Files object.)
- file_stem (optional)
How to name the loaded documents. This attribute will be saved to the same-named attribute in documents and it will be used in document writers to decide where to save the files.
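The expansion rules for the from parameter described above (space- or comma-separated names, with '@' marking a file list) can be illustrated with a small standalone sketch. This is not the Treex implementation: expand_from is a hypothetical helper written only for illustration, and it makes no attempt to reproduce what Treex::Core::Files does beyond the behaviour stated in this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Treex): expand a "from"-style string
# into a flat list of file names, following the rules described above.
sub expand_from {
    my ($from) = @_;
    my @files;

    # Split on any run of commas and/or whitespace; drop empty tokens.
    for my $token ( grep { length } split /[\s,]+/, $from ) {
        if ( $token =~ /^\@(.+)$/ ) {
            # '@name' denotes a file list with one file name per line.
            my $list = $1;
            open my $fh, '<', $list
                or die "Cannot open file list '$list': $!";
            while ( my $line = <$fh> ) {
                chomp $line;
                push @files, $line if length $line;
            }
            close $fh;
        }
        else {
            # A regular file name is passed through unchanged.
            push @files, $token;
        }
    }
    return @files;
}

# Regular files and file lists may be mixed in one string, e.g.
# expand_from('a.txt,@filelist.txt b.txt').
print join( "\n", expand_from('a.txt, b.txt c.txt') ), "\n";
```

The sketch keeps names in the order they are given, so entries from a file list appear at the position where the '@' token stood. Whether the real reader resolves relative names inside a file list against the list's own directory is not stated here, so the sketch leaves them untouched.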
The next_document method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)
Returns the next filename (full path) to be loaded, taken from the list specified in the from attribute.
Returns a new empty document with pre-filled attributes
path which are guessed based on
returns the last filename returned by
Is the document that will be returned by
next_document supposed to be processed by this job? This is relevant only in parallel processing, where each job has a different
Returns the number of documents that will be read by this reader. If
true, then the number of documents equals the number of files given in
from. Otherwise, this method returns
Martin Popel <email@example.com>
COPYRIGHT AND LICENSE
Copyright © 2011-2012 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.