Treex::Block::Read::BaseReader - abstract ancestor for document readers
version 2.20151216
This class serves as a common ancestor for document readers that take a from parameter containing a space- or comma-separated list of filenames to load. It is designed to implement the Treex::Core::DocumentReader interface.
Derived classes must define the next_document method and may use the next_filename and new_document methods.
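A minimal sketch of such a derived reader might look as follows. The class name Treex::Block::Read::OneDocPerFile is hypothetical, and the sketch assumes that next_filename returns undef once all files have been consumed and that returning nothing from next_document signals the end of input; neither is stated in this documentation. It needs Treex::Core to be installed.

```perl
package Treex::Block::Read::OneDocPerFile;    # hypothetical class name
use Moose;
use Treex::Core::Common;
extends 'Treex::Block::Read::BaseReader';

sub next_document {
    my ($self) = @_;

    # Assumption: next_filename returns undef when the "from" list is exhausted.
    my $filename = $self->next_filename();
    return if !defined $filename;

    # new_document pre-fills loaded_from, file_stem, file_number and path.
    my $document = $self->new_document();

    # A real reader would parse $filename here and add content to $document.
    return $document;
}

1;
```

The sketch returns an empty document for each file; the parsing step is left out on purpose because it depends entirely on the input format.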
from: a space- or comma-separated list of filenames, or - for STDIN.
An '@' directly in front of a file name causes this file to be interpreted as a file list, with one file name per line, e.g. '@filelist.txt' causes the reader to open 'filelist.txt' and read a list of files from it. File lists may be arbitrarily mixed with regular files in the parameter.
Similarly, you can use ! for wildcard expansion, e.g. treex -Len Read::Treex from='!dir??/file*.txt'. The single quotes are needed for two reasons. First, to prevent bash from interpreting the wildcard characters. Second, to prevent bash from interpreting the exclamation mark as history expansion.
The @filelist and !wildcard conventions are used in several tools, e.g. 7z or javac.
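The two conventions can be illustrated with a small standalone sketch. This is not the Treex implementation: expand_from is a hypothetical helper that only mirrors the behaviour described above (tokens split on spaces or commas, @ for file lists, ! for wildcards), and details such as how relative paths inside a file list are resolved are left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Treex: expand a "from"-style string
# into a flat list of file names.
sub expand_from {
    my ($from) = @_;
    my @files;

    # Tokens are separated by spaces and/or commas.
    for my $token ( grep { length } split /[ ,]+/, $from ) {
        if ( $token =~ /^\@(.+)$/ ) {
            # '@list.txt': read one file name per line from list.txt.
            my $list = $1;
            open my $fh, '<', $list or die "Cannot open file list '$list': $!";
            while ( my $line = <$fh> ) {
                chomp $line;
                push @files, $line if length $line;
            }
            close $fh;
        }
        elsif ( $token =~ /^!(.+)$/ ) {
            # '!pattern': shell-style wildcard expansion via Perl's built-in glob.
            push @files, glob($1);
        }
        else {
            # A regular file name, or '-' for STDIN.
            push @files, $token;
        }
    }
    return @files;
}
```

For example, `expand_from('a.txt,@filelist.txt !dir??/file*.txt')` would return a.txt, followed by the names listed in filelist.txt, followed by whatever files match the wildcard pattern.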
(If you set this parameter via the API, you can pass a reference to an array of strings or a Treex::Core::Files object.)
file_stem: how to name the loaded documents. This attribute is saved to the same-named attribute in documents, and document writers use it to decide where to save the files.
next_document: This method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)
next_filename: Returns the next filename (full path) to be loaded, taken from the list specified in the from attribute.
new_document: Returns a new empty document with the attributes loaded_from, file_stem, file_number and path pre-filled; their values are guessed from current_filename.
current_filename: Returns the last filename returned by next_filename.
is_next_document_for_this_job: Is the document that will be returned by next_document supposed to be processed by this job? This is relevant only in parallel processing, where each job is assigned a different $jobnumber.
number_of_documents: Returns the number of documents that will be read by this reader. If is_one_doc_per_file returns true, the number of documents equals the number of files given in from. Otherwise, this method returns undef.
See also: Treex::Block::Read::BaseTextReader, Treex::Block::Read::Text
Martin Popel <popel@ufal.mff.cuni.cz>
Copyright © 2011-2012 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.