DiaColloDB::Corpus::Filters - collocation db, source corpus content filters
  ##========================================================================
  ## PRELIMINARIES

  use DiaColloDB::Corpus::Filters;

  ##========================================================================
  ## Methods

  $filters = $CLASS_OR_OBJECT->new(%opts);
  $filters = $CLASS_OR_OBJECT->null();
  $filters = $filters->clear();
  $bool = $filters1->equal($filters2);
  \%name2obj = $filters->compile();
  \%line2undef = $coldb->loadListFile($filename_or_undef);
DiaColloDB::Corpus::Filters is a class representing corpus content filters (e.g. stopword lists and regular expressions). It is used by DiaColloDB::Corpus::Compiled, and implicitly by the DiaColloDB::create() method as called by the top-level command-line utility dcdb-corpus-create.perl(1).
DiaColloDB::Corpus::Filters inherits from DiaColloDB::Persistent. It also uses Exporter for compatibility with older versions of the DiaColloDB distribution in which the package-global default variables resided directly in the DiaColloDB package itself.
(formerly defined in DiaColloDB.pm)
Don't use qr// for regex defaults, because Storable doesn't like pre-compiled Regexps.
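The following is a minimal sketch of the pattern this note implies: keep patterns as plain strings while they are stored, and compile them only at the point of use. The hash %defaults and the variable names are illustrative assumptions and are not taken from the DiaColloDB source.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

## Patterns are kept as plain strings so the structure serializes cleanly.
## (%defaults is an illustrative name, not a DiaColloDB variable.)
my %defaults = (
  pgood => q/^(?:N|TRUNC|VV|ADJ)/,
  wgood => q/[[:alpha:]]/,
  wbad  => q/[\.]/,
);

## Round-trip through Storable; plain strings survive unchanged.
my $restored = thaw(freeze(\%defaults));

## Compile to regex objects only after loading, at the point of use.
my %compiled = map { ($_ => qr/$restored->{$_}/) } keys %$restored;

print "NN matches pgood\n" if 'NN' =~ $compiled{pgood};
```

Storing strings rather than regex objects also keeps the saved data independent of how a particular Storable version handles Regexp values.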
Default positive PoS-regex for document parsing. Default = q/^(?:N|TRUNC|VV|ADJ)/.
Default negative PoS-regex for document parsing. Default = undef (none).
Default positive word regex for document parsing. Default = q/[[:alpha:]]/.
Default negative word regex for document parsing. Default = q/[\.]/.
Default positive lemma regex for document parsing. Default = undef (none).
Default negative lemma regex for document parsing. Default = undef (none).
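To illustrate how the default patterns above behave on concrete tokens, here is a small sketch. It assumes a token is kept only if it matches every defined positive regex and no defined negative regex; that combination rule, the helper keep_token(), and the sample tokens are assumptions made for illustration, not the documented behavior of DiaColloDB.

```perl
use strict;
use warnings;

## Default patterns copied from the text above; undef means "no constraint".
my %re = (
  pgood => q/^(?:N|TRUNC|VV|ADJ)/, pbad => undef,
  wgood => q/[[:alpha:]]/,         wbad => q/[\.]/,
  lgood => undef,                  lbad => undef,
);

## keep_token(): hypothetical helper, not part of DiaColloDB.
## Assumed rule: every defined "good" regex must match, no defined "bad" regex may match.
sub keep_token {
  my ($w, $p, $l) = @_;
  my %val = (w => $w, p => $p, l => $l);
  for my $attr (qw(p w l)) {
    my ($good, $bad) = @re{"${attr}good", "${attr}bad"};
    return 0 if defined($good) && $val{$attr} !~ /$good/;
    return 0 if defined($bad)  && $val{$attr} =~ /$bad/;
  }
  return 1;
}

## Sample (word, PoS, lemma) triples, invented for this example.
for my $tok (['Haus','NN','Haus'], ['der','ART','d'], ['U.S.A','NE','USA']) {
  printf "%-6s %-4s %s\n", @$tok[0,1], keep_token(@$tok) ? 'keep' : 'drop';
}
```

Under these assumptions a noun such as "Haus" passes, an article fails the PoS whitelist, and any word containing a literal dot fails the negative word regex.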
$filters = $CLASS_OR_OBJECT->new(%opts);
Returns a new DiaColloDB::Corpus::Filters object, which is a simple HASH-ref wrapping %opts:
  ##-- part-of-speech filters
  pgood     => $re,    ##-- PoS whitelist regex
  pgoodfile => $file,  ##-- PoS whitelist filename
  pbad      => $re,    ##-- PoS blacklist regex
  pbadfile  => $file,  ##-- PoS blacklist filename

  ##-- word surface text filters
  wgood     => $re,    ##-- word whitelist regex
  wgoodfile => $file,  ##-- word whitelist filename
  wbad      => $re,    ##-- word blacklist regex
  wbadfile  => $file,  ##-- word blacklist filename (= "stopword list")

  ##-- lemma filters
  lgood     => $re,    ##-- lemma whitelist regex
  lgoodfile => $file,  ##-- lemma whitelist filename
  lbad      => $re,    ##-- lemma blacklist regex
  lbadfile  => $file,  ##-- lemma blacklist filename
See "Defaults" for the default values.
$filters = $CLASS_OR_OBJECT->null();
Returns a new DiaColloDB::Corpus::Filters object representing a "null-filter", i.e. with all filter properties undefined.
$filters = $filters->clear();
Deletes all filter properties (white- and blacklist regexes and filenames) from the $filters object.
$bool = $filters->isnull();
Returns true iff $filters does not define any supported filter properties at all (i.e. application of $filters would be a no-op).
  $bool = $filters1->equal($filters2);
  $bool = $CLASS->equal($filters1,$filters2);
Returns true iff both filter-object operands define all and only the same supported filter properties, with identical values.
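The semantics described for isnull() and equal() can be sketched on plain hashes as follows. The functions filters_isnull() and filters_equal() are hypothetical stand-ins, not the module's implementation. The sketch also assumes that an absent key and an undefined value count as the same thing, which the documentation does not state.

```perl
use strict;
use warnings;

## The twelve filter properties listed for new().
my @KEYS = map { ("${_}good", "${_}goodfile", "${_}bad", "${_}badfile") } qw(p w l);

## filters_isnull(): hypothetical stand-in for isnull().
## True if no supported property is defined.
sub filters_isnull {
  my $f = shift;
  return !grep { defined $f->{$_} } @KEYS;
}

## filters_equal(): hypothetical stand-in for equal().
## Only supported properties are compared; other keys are ignored.
sub filters_equal {
  my ($f1, $f2) = @_;
  for my $k (@KEYS) {
    my ($v1, $v2) = ($f1->{$k}, $f2->{$k});
    return 0 if (defined($v1) xor defined($v2));   ## defined on one side only
    return 0 if defined($v1) && $v1 ne $v2;        ## defined on both, but different
  }
  return 1;
}

my $a = { wbad => q/[\.]/, pgood => q/^N/ };
my $b = { pgood => q/^N/, wbad => q/[\.]/ };
print "equal\n" if filters_equal($a, $b);
print "null\n"  if filters_isnull({});
```

Comparing pattern strings rather than compiled regexes keeps the check a simple string comparison.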
  \%name2obj = $filters->compile();
  \%name2obj = $CLASS->compile(\%filters);
Returns a HASH-ref of compiled filter regexes and (stop|go)-hashes of the form:

  ${NAME}     => $REGEXP,
  ${NAME}file => \%HASHREF,
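As a rough illustration of the structure described above, the sketch below builds a hash of that shape by hand and uses it to test words. The in-memory stopword list, the helper word_ok(), and the order in which the checks are applied are all assumptions; the real compile() method may differ in its details.

```perl
use strict;
use warnings;

## Uncompiled filter properties: regexes as strings, as in the defaults.
my %filters = (wgood => q/[[:alpha:]]/, wbad => q/[\.]/);

## Regex entries: ${NAME} => $REGEXP
my %name2obj = map { ($_ => qr/$filters{$_}/) }
               grep { !/file$/ && defined $filters{$_} } keys %filters;

## List entries: ${NAME}file => \%HASHREF
## An in-memory stand-in for a loaded stopword list; values are undef, so test with exists().
$name2obj{wbadfile} = { map { ($_ => undef) } qw(the of and) };

## word_ok(): hypothetical helper applying the word-level entries.
sub word_ok {
  my $w = shift;
  return 0 if $name2obj{wgood}    && $w !~ $name2obj{wgood};
  return 0 if $name2obj{wbad}     && $w =~ $name2obj{wbad};
  return 0 if $name2obj{wbadfile} && exists $name2obj{wbadfile}{$w};
  return 1;
}

print "$_: ", (word_ok($_) ? 'keep' : 'drop'), "\n" for qw(house the e.g. 123);
```

Representing lists as hashes makes membership a constant-time exists() lookup, which matters when every corpus token is checked.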
\%line2undef = $coldb->loadListFile($filename_or_undef);
Low-level utility method used to load (stop|go)-list files.
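A loader with the signature shown above could look roughly like the sketch below. The function load_list_file() is a hypothetical stand-in. It makes three assumptions not stated in the documentation: one entry per line, empty lines skipped, and undef returned when no filename is given.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

## load_list_file(): hypothetical stand-in for loadListFile().
## Returns a HASH-ref mapping each non-empty line to undef, or undef if no file is given.
sub load_list_file {
  my $file = shift;
  return undef if !defined($file) || $file eq '';
  open(my $fh, '<:encoding(UTF-8)', $file)
    or die "open failed for '$file': $!";
  my %line2undef;
  while (defined(my $line = <$fh>)) {
    chomp $line;
    next if $line eq '';          ## assumption: empty lines carry no entry
    $line2undef{$line} = undef;
  }
  close($fh);
  return \%line2undef;
}

## Write a small throwaway stopword list and load it back.
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
print {$tmpfh} "der\ndie\n\ndas\n";
close($tmpfh);

my $stop = load_list_file($tmpname);
print join(',', sort keys %$stop), "\n";   ## prints: das,der,die
```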
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
dcdb-corpus-compile.perl(1), DiaColloDB::Corpus::Compiled(3pm), DiaColloDB(3pm), perl(1), ...