DiaColloDB::Corpus::Compiled - collocation db, source corpus (pre-compiled)
##========================================================================
## PRELIMINARIES

use DiaColloDB::Corpus::Compiled;

##========================================================================
## Constructors etc.

$corpus = $CLASS_OR_OBJECT->new(%args);

##========================================================================
## Persistent API

@keys = $obj->headerKeys();
@files = $obj->diskFiles();
$bool = $obj->unlink(%opts);

##========================================================================
## Corpus API

##-- Corpus API: open/close
$bool = $corpus->open([$dbdir], %opts); ##-- compat;
$bool = $corpus->close();

##-- Corpus API: iteration
$nfiles = $corpus->size();
$bool = $corpus->iok();
$label = $corpus->ifile();
$doc_or_undef = $corpus->idocument();

##========================================================================
## Compiled API

$ccorpus = $CLASS_OR_OBJECT->create($src_corpus, %opts);
$ccorpus = $CLASS_OR_OBJECT->union(\@sources, %opts);

##========================================================================
## Convenience Methods

$bool = $corpus->opened();
$bool = $corpus->flush();
$corpus = $corpus->reopen(%opts);
$dirname = $corpus->datadir();
$bool = $corpus->truncate();
$filters = $ccorpus->filters();
DiaColloDB::Corpus::Compiled is an intermediate abstraction layer for storing pre-filtered corpus data in a format suitable for fast I/O. It should not be necessary for end users to use this class directly, since the DiaColloDB::create() method should implicitly create a (temporary) DiaColloDB::Corpus::Compiled object whenever required.
DiaColloDB::Corpus::Compiled inherits from DiaColloDB::Corpus and supports all DiaColloDB::Corpus methods.
$corpus = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
(
 ##-- NEW in DiaColloDB::Corpus::Compiled
 dbdir => $dbdir, ##-- data directory for compiled corpus
 flags => $flags, ##-- open mode flags (fcntl flags or perl-style; default='r')
 filters => \%filters, ##-- corpus filters ( DiaColloDB::Corpus::Filters object or HASH-ref )
 njobs => $njobs, ##-- number of parallel worker jobs for create(); default=-1 (= nCores)
 temp => $bool, ##-- implicitly unlink() on exit?
 logThreads => $level ##-- log-level for thread stuff (default='off')
 ##
 ##-- INHERITED from DiaColloDB::Corpus
 #files => \@files, ##-- source files (OVERRIDE: unused)
 #dclass => $dclass, ##-- DiaColloDB::Document subclass for loading (OVERRIDE forces 'DiaColloDB::Document::JSON')
 dopts => \%opts, ##-- options for $dclass->fromFile() (override default={})
 cur => $i, ##-- index of current file
 logOpen => $level, ##-- log-level for open(); default='info'
)
Implicitly calls the open() method if the dbdir property is defined.
Destructor implicitly calls the close() method, and may also implicitly call unlink() if the temp property is true.
@keys = $obj->headerKeys();
Override filters out more object-specific keys.
@files = $obj->diskFiles();
Returns disk storage files; override returns singleton list $obj->{dbdir}.
$bool = $obj->unlink(%opts);
Removes all disk file(s) associated with the object. Override accepts additional %opts:
close => $bool, ##-- call $obj->close() before unlinking? (default=1)
$bool = $corpus->open([$dbdir], %opts); ##-- compat
$bool = $corpus->open($dbdir, %opts); ##-- new
Opens compiled corpus directory $dbdir, which must be specified as either a simple scalar or a singleton ARRAY-ref, or must already be defined as $corpus->{dbdir} or $opts{dbdir}.
Superclass %opts accepted by DiaColloDB::Corpus:
compiled => $bool, ##-- implicitly true here
glob => $bool, ##-- (ignored here) whether to glob arguments
list => $bool, ##-- (ignored here) whether arguments are file-lists
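The $dbdir forms accepted by open() can be illustrated with a small sketch in core Perl. The helper name resolve_dbdir() is hypothetical and is not part of DiaColloDB; in particular, the fallback order between $corpus->{dbdir} and $opts{dbdir} is an assumption made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of DiaColloDB): normalize a $dbdir argument
# given as a simple scalar or a singleton ARRAY-ref.
# ASSUMPTION: when no argument is given, $corpus->{dbdir} is consulted
# before $opts{dbdir}; the real precedence may differ.
sub resolve_dbdir {
  my ($corpus, $arg, %opts) = @_;
  my $dbdir = $arg;
  if (ref($arg) eq 'ARRAY') {
    die "expected a singleton ARRAY-ref for dbdir" unless @$arg == 1;
    $dbdir = $arg->[0];
  }
  $dbdir //= $corpus->{dbdir} // $opts{dbdir};
  die "no dbdir specified" unless defined $dbdir;
  return $dbdir;
}

# All four calls resolve to a plain directory name.
print resolve_dbdir({}, 'corpus.d'), "\n";
print resolve_dbdir({}, ['corpus.d']), "\n";
print resolve_dbdir({ dbdir => 'corpus.d' }, undef), "\n";
print resolve_dbdir({}, undef, dbdir => 'corpus.d'), "\n";
```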
$bool = $corpus->close();
Closes the currently opened corpus, if any. Override implicitly calls $corpus->flush() if $corpus is opened in write-mode.
$nfiles = $corpus->size();
Returns total number of file(s) in the corpus (constant time).
$bool = $corpus->iok();
True if corpus file-iterator is valid.
$label = $corpus->ifile();
$label = $corpus->ifile($pos);
Get current iterator filename (first form), or filename at index $pos (second form). Override always returns filenames of the form "$corpus->{dbdir}/$pos.json".
$doc_or_undef = $corpus->idocument();
$doc_or_undef = $corpus->idocument($pos);
Gets current document (first form) or document at index $pos (second form).
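The positional forms of ifile() and idocument() can be combined with size() to visit every document. The following is a minimal sketch, assuming that positions are zero-based and run up to size()-1 (this is not stated above). The helper each_document() and the MockCorpus class are hypothetical stand-ins, used only so that the example runs without DiaColloDB installed; a real DiaColloDB::Corpus::Compiled object would take the place of the mock.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every document by position, using only the
# size(), ifile($pos) and idocument($pos) methods described above.
# ASSUMPTION: positions are zero-based.
sub each_document {
  my ($corpus, $callback) = @_;
  my $nvisited = 0;
  for my $pos (0 .. $corpus->size - 1) {
    my $doc = $corpus->idocument($pos);
    next unless defined $doc;    # idocument() may return undef
    $callback->($corpus->ifile($pos), $doc);
    ++$nvisited;
  }
  return $nvisited;
}

# Hypothetical stand-in mimicking the three methods used above.
package MockCorpus {
  sub new       { my ($class, %args) = @_; return bless {%args}, $class }
  sub size      { return scalar @{ $_[0]{docs} } }
  sub ifile     { my ($self, $pos) = @_; return "$self->{dbdir}/$pos.json" }
  sub idocument { my ($self, $pos) = @_; return $self->{docs}[$pos] }
}

my $corpus = MockCorpus->new(
  dbdir => 'corpus.d',
  docs  => [ { id => 'a' }, undef, { id => 'b' } ],
);

my @seen;
my $count = each_document($corpus, sub {
  my ($label, $doc) = @_;
  push @seen, "$label:$doc->{id}";
});
print "$_\n" for @seen;
```

Skipping undefined documents rather than aborting is a design choice of this sketch; callers that treat a missing document as an error would die() instead.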
$ccorpus = $CLASS->create($src_corpus, %opts);
$ccorpus = $ccorpus->create($src_corpus, %opts);
Compiles or appends a single $src_corpus to the compiled corpus directory $opts{dbdir}. If specified, %opts overrides the properties in %$ccorpus. Returns a (possibly new) DiaColloDB::Corpus::Compiled object $ccorpus. Honors perl- or fcntl-style $opts{flags} for append and truncate.
Parses all document file(s) from $src_corpus, applies the corpus content filters specified by the HASH-ref or DiaColloDB::Corpus::Filters object specified by $ccorpus->{filters}, and saves the compiled data to the compiled corpus directory $ccorpus->{dbdir}. If the threads module is available, compilation may use multiple parallel threads as specified by the $DiaColloDB::NJOBS variable; see DiaColloDB::Utils::nJobs() for details.
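To illustrate what "perl- or fcntl-style" flags for append and truncate might look like in practice, here is a rough sketch using the core Fcntl module. The mode table and the helper names to_fcntl_flags(), is_truncate() and is_append() are assumptions made for illustration; DiaColloDB's own flag handling may accept other mode strings or map them differently.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_WRONLY O_RDWR O_CREAT O_TRUNC O_APPEND);

# Hypothetical mapping from perl-style mode strings to fcntl flags.
# ASSUMPTION: chosen for illustration; not taken from DiaColloDB.
my %MODE2FLAGS = (
  r  => O_RDONLY,
  w  => O_WRONLY | O_CREAT | O_TRUNC,
  a  => O_WRONLY | O_CREAT | O_APPEND,
  rw => O_RDWR | O_CREAT,
);

# Accept either a numeric fcntl flag mask or a perl-style mode string.
sub to_fcntl_flags {
  my $flags = shift;
  $flags = 'r' unless defined $flags;             # default='r', as for new()
  return $flags + 0 if $flags =~ /^\d+$/;         # already an fcntl mask
  die "unknown mode '$flags'" unless exists $MODE2FLAGS{$flags};
  return $MODE2FLAGS{$flags};
}

# Predicates a create()-like routine could use to pick its behavior.
sub is_truncate { return (to_fcntl_flags(shift) & O_TRUNC)  ? 1 : 0 }
sub is_append   { return (to_fcntl_flags(shift) & O_APPEND) ? 1 : 0 }

for my $mode (qw(r w a rw)) {
  printf "%-2s truncate=%d append=%d\n", $mode, is_truncate($mode), is_append($mode);
}
```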
$ccorpus = $CLASS->union(\@sources, %opts);
$ccorpus = $ccorpus->union(\@sources, %opts);
Merges pre-compiled corpora \@sources to the output directory $opts{dbdir}. If specified, %opts overrides the properties in %$ccorpus. Returns a (possibly new) DiaColloDB::Corpus::Compiled object $ccorpus representing the union over @sources. Honors $ccorpus->{flags} for append and truncate.
Each $src in \@sources is either a DiaColloDB::Corpus::Compiled object or a simple scalar (which is interpreted as the dbdir of a DiaColloDB::Corpus::Compiled object). No content filters are applied, and output data files are created as links to the input data-files from @sources (hard-links if possible, otherwise symbolic links).
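The "hard-links if possible, otherwise symbolic links" strategy can be sketched with Perl's built-in link() and symlink() functions. The helper link_or_symlink() and the file names below are hypothetical; this illustrates the general technique and is not the code used by union().

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: try a hard link first, fall back to a symbolic link.
sub link_or_symlink {
  my ($src, $dst) = @_;
  return 'hard' if link($src, $dst);
  # Use an absolute target so the symlink resolves from any directory.
  my $abs = File::Spec->rel2abs($src);
  # symlink() raises an exception on platforms lacking it, hence the eval.
  return 'symbolic' if eval { symlink($abs, $dst) };
  die "could not link '$src' to '$dst': $!";
}

# Demonstration in a scratch directory that is removed at exit.
my $tmp = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($tmp, 'source.json');
my $dst = File::Spec->catfile($tmp, '0.json');

open(my $out, '>', $src) or die "open failed for '$src': $!";
print {$out} '{"id":"demo"}';
close($out) or die "close failed for '$src': $!";

my $kind = link_or_symlink($src, $dst);
print "created $kind link\n";
```

Hard links only work within a single filesystem, which is one reason a symbolic-link fallback is useful when sources and output live on different volumes.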
$dirname = $corpus->datadir();
$dirname = $corpus->datadir($dir);
Wrapper for $corpus->{dbdir}.
$bool = $corpus->truncate();
Removes all disk data (including header) and resets $corpus->{size} to 0 (zero).
$filters = $ccorpus->filters();
Returns corpus content filters as a DiaColloDB::Corpus::Filters object.
$bool = $corpus->opened();
Returns true iff $corpus is currently opened.
$bool = $corpus->flush();
Writes any pending corpus data (e.g. header) to disk.
$corpus = $corpus->reopen(%opts);
Closes and re-opens the corpus, e.g. with different flags.
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
dcdb-corpus-compile.perl(1), dcdb-create.perl(1), DiaColloDB::Corpus::Filters(3pm), DiaColloDB::Corpus(3pm), DiaColloDB(3pm), perl(1), ...