##========================================================================
## POD DOCUMENTATION, auto-generated by podextract.perl

##========================================================================
## NAME
=pod

=head1 NAME

DiaColloDB::Corpus::Compiled - collocation db, source corpus (pre-compiled)

=cut

##========================================================================
## SYNOPSIS
=pod

=head1 SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Corpus::Compiled;
 
 ##========================================================================
 ## Constructors etc.
 
 $corpus = $CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## Persistent API
 
 @keys = $obj->headerKeys();
 @files = $obj->diskFiles();
 $bool = $obj->unlink(%opts);
 
 ##========================================================================
 ## Corpus API
 
 ##-- Corpus API: open/close
 $bool = $corpus->open([$dbdir], %opts);  ##-- compat
 $bool = $corpus->close();
 
 ##-- Corpus API: iteration
 $nfiles = $corpus->size();
 $bool = $corpus->iok();
 $label = $corpus->ifile();
 $doc_or_undef = $corpus->idocument();
 
 ##========================================================================
 ## Compiled API
 
 $ccorpus = $CLASS_OR_OBJECT->create($src_corpus, %opts);
 $ccorpus = $CLASS_OR_OBJECT->union(\@sources, %opts);
 
 ##========================================================================
 ## Convenience Methods
 
 $bool = $corpus->opened();
 $bool = $corpus->flush();
 $corpus = $corpus->reopen(%opts);
 
 $dirname = $corpus->datadir();
 $bool = $corpus->truncate();
 $filters = $ccorpus->filters();
 

=cut

##========================================================================
## DESCRIPTION
=pod

=head1 DESCRIPTION

DiaColloDB::Corpus::Compiled is an intermediate abstraction layer
for storing pre-filtered corpus data in a format suitable for fast I/O.
It should not be necessary for end users to use this class directly,
since the L<DiaColloDB::create()|DiaColloDB::compile/create> method should
implicitly create a (temporary) C<DiaColloDB::Corpus::Compiled> object
whenever required.

=cut

##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Globals & Constants
=pod

=head2 Globals & Constants

=over 4

=item Variable: @ISA

C<DiaColloDB::Corpus::Compiled>
inherits from L<DiaColloDB::Corpus|DiaColloDB::Corpus>
and supports all L<DiaColloDB::Corpus|DiaColloDB::Corpus> methods.

=back

=cut

##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Constructors etc.
=pod

=head2 Constructors etc.

=over 4

=item new

 $corpus = $CLASS_OR_OBJECT->new(%args);

%args, object structure:

   (
    ##-- NEW in DiaColloDB::Corpus::Compiled
    dbdir   => $dbdir,     ##-- data directory for compiled corpus
    flags   => $flags,     ##-- open mode flags (fcntl flags or perl-style; default='r')
    filters => \%filters,  ##-- corpus filters ( DiaColloDB::Corpus::Filters object or HASH-ref )
    njobs   => $njobs,     ##-- number of parallel worker jobs for create(); default=-1 (= nCores)
    temp    => $bool,      ##-- implicitly unlink() on exit?
    logThreads => $level,  ##-- log-level for thread-related messages (default='off')
    ##
    ##-- INHERITED from DiaColloDB::Corpus
    #files => \@files,      ##-- source files (OVERRIDE: unused)
    #dclass => $dclass,     ##-- DiaColloDB::Document subclass for loading (OVERRIDE forces 'DiaColloDB::Document::JSON')
    dopts  => \%opts,      ##-- options for $dclass->fromFile() (override default={})
    cur    => $i,          ##-- index of current file
    logOpen => $level,     ##-- log-level for open(); default='info'
   )

Implicitly calls the L<open()|/open> method if the C<dbdir> property is defined.

=item DESTROY

The destructor implicitly calls the L<close()|/close> method,
and may also implicitly call L<unlink()|/unlink> if the C<temp> property
is true.


=back
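
The interplay between the C<temp> property and the destructor follows a
common Perl pattern. The following self-contained sketch shows that pattern
only; the class C<My::TempCorpus> is hypothetical, is not the DiaColloDB
implementation, and reduces C<open()> and C<close()> to a directory creation
and a flag, respectively.

 use strict;
 use warnings;
 
 {
   package My::TempCorpus;   ##-- HYPOTHETICAL stand-in, not DiaColloDB::Corpus::Compiled
   use File::Path qw(make_path remove_tree);
 
   sub new {
     my ($that, %args) = @_;
     my $self = bless { dbdir=>undef, temp=>0, opened=>0, %args }, ref($that) || $that;
     if (defined($self->{dbdir})) {
       make_path($self->{dbdir});   ##-- stand-in for open()
       $self->{opened} = 1;
     }
     return $self;
   }
 
   sub close {
     my $self = shift;
     $self->{opened} = 0;
     return 1;
   }
 
   ##-- remove all disk data, closing first if necessary
   sub unlink {
     my $self = shift;
     $self->close() if ($self->{opened});
     return 1 if (!defined($self->{dbdir}));
     remove_tree($self->{dbdir}) if (-d $self->{dbdir});
     return !-e $self->{dbdir};
   }
 
   ##-- destructor: always close(), and unlink() only for temporary objects
   sub DESTROY {
     my $self = shift;
     $self->close() if ($self->{opened});
     $self->unlink() if ($self->{temp});
   }
 }

With C<temp=E<gt>1>, the data directory disappears as soon as the last
reference to the object goes out of scope; with the default C<temp=E<gt>0>,
it persists on disk.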

=cut

##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Persistent API
=pod

=head2 Persistent API

=over 4

=item headerKeys

 @keys = $obj->headerKeys();

Returns the object keys to be stored in the header;
the override filters out additional object-specific keys.

=item diskFiles

 @files = $obj->diskFiles();

Returns the disk storage files; the override returns the singleton list
C<$obj-E<gt>{dbdir}>.

=item unlink

 $bool = $obj->unlink(%opts);

Removes all disk file(s) associated with the object.
Override accepts additional %opts:

 close => $bool,  ##-- call $obj->close() before unlinking? (default=1)

=back

=cut

##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Corpus API

##----------------------------------------------------------
## Corpus API: open/close
=pod

=head2 Corpus API: open/close

=over 4

=item open

 $bool = $corpus->open([$dbdir], %opts);  ##-- compat
 $bool = $corpus->open($dbdir,   %opts);  ##-- new

Opens the compiled corpus directory C<$dbdir>,
which must be given either as a simple scalar or as a singleton
ARRAY-ref, or else must already be defined as C<$corpus-E<gt>{dbdir}> or C<$opts{dbdir}>.

Superclass %opts accepted by L<DiaColloDB::Corpus|DiaColloDB::Corpus>:

 compiled => $bool, ##-- implicitly true here
 glob => $bool,     ##-- (ignored here) whether to glob arguments
 list => $bool,     ##-- (ignored here) whether arguments are file-lists

=item close

 $bool = $corpus->close();

Closes the currently opened corpus, if any.
The override implicitly calls L<$corpus-E<gt>flush()|/flush>
if C<$corpus> is opened in write mode.

=back
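
The argument handling documented for C<open()> can be illustrated with a
small, self-contained sketch. The helper C<resolve_dbdir()> is hypothetical
and not part of the DiaColloDB API; in particular, the precedence of
C<$opts{dbdir}> over C<$corpus-E<gt>{dbdir}> is an assumption made for this
example only.

 use strict;
 use warnings;
 
 ##-- resolve_dbdir(): HYPOTHETICAL helper, not part of DiaColloDB.
 ##   + $arg may be a plain scalar, a singleton ARRAY-ref, or undef
 ##   + ASSUMPTION: explicit $arg > $opts{dbdir} > $corpus->{dbdir}
 sub resolve_dbdir {
   my ($corpus, $arg, %opts) = @_;
   if (ref($arg) eq 'ARRAY') {
     die "open(): expected a singleton ARRAY-ref\n" if (@$arg != 1);
     $arg = $arg->[0];
   }
   my $dbdir = $arg // $opts{dbdir} // $corpus->{dbdir};
   die "open(): no dbdir specified\n" if (!defined($dbdir));
   return $dbdir;
 }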

=cut

##----------------------------------------------------------
## Corpus API: iteration
=pod

=head2 Corpus API: iteration

=over 4

=item size

 $nfiles = $corpus->size();

Returns total number of file(s) in the corpus (constant time).

=item iok

 $bool = $corpus->iok();

True if corpus file-iterator is valid.

=item ifile

 $label = $corpus->ifile();
 $label = $corpus->ifile($pos);

Gets the current iterator filename (first form),
or the filename at index C<$pos> (second form).
The override always returns filenames of the form
C<"$corpus-E<gt>{dbdir}/$pos.json">.

=item idocument

 $doc_or_undef = $corpus->idocument();
 $doc_or_undef = $corpus->idocument($pos);

Gets current document (first form)
or document at index C<$pos> (second form).

=back
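
Since C<ifile()> is documented to return labels of the form
C<"$corpus-E<gt>{dbdir}/$pos.json">, the full set of labels for a compiled
corpus follows directly from its C<size()>. The self-contained sketch below
enumerates them; the helper C<compiled_labels()> is hypothetical, and it
assumes (this is not stated above) that positions run from 0 to C<size()-1>.

 use strict;
 use warnings;
 
 ##-- compiled_labels(): HYPOTHETICAL helper, not part of DiaColloDB.
 ##   Returns "$dbdir/$pos.json" for each $pos in 0 .. $size-1 (ASSUMED 0-based).
 sub compiled_labels {
   my ($dbdir, $size) = @_;
   return map { "$dbdir/$_.json" } (0 .. $size-1);
 }
 
 ##-- With a real compiled corpus, the documented positional forms could be used
 ##   in the same way (same 0-based ASSUMPTION):
 ##     for my $pos (0 .. $corpus->size()-1) {
 ##       my $doc = $corpus->idocument($pos) or next;
 ##       ## ... process $doc ...
 ##     }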

=cut

##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Corpus::Compiled API
=pod

=head2 Corpus::Compiled API

=over 4

=item create

 $ccorpus = $CLASS->create($src_corpus,   %opts);
 $ccorpus = $ccorpus->create($src_corpus, %opts);

Compiles or appends a single C<$src_corpus> to the compiled corpus directory C<$opts{dbdir}>.
If specified, C<%opts> override the corresponding C<%$ccorpus> properties.
Returns a (possibly new) DiaColloDB::Corpus::Compiled object C<$ccorpus>.
Honors perl- or fcntl-style C<$opts{flags}> for append and truncate.

Parses all document file(s) from C<$src_corpus>, applies
the corpus content filters specified by the HASH-ref or
L<DiaColloDB::Corpus::Filters> object specified by C<$ccorpus-E<gt>{filters}>,
and saves the compiled data to the compiled corpus directory C<$ccorpus-E<gt>{dbdir}>.
If the L<threads|threads> module is available, compilation may
use multiple parallel threads as specified by the C<$DiaColloDB::NJOBS> variable;
see L<DiaColloDB::Utils::nJobs()|DiaColloDB::Utils/nJobs> for details.

=item union

 $ccorpus = $CLASS->union(\@sources, %opts);
 $ccorpus = $ccorpus->union(\@sources, %opts);

Merges the pre-compiled corpora C<\@sources> into the output directory C<$opts{dbdir}>.
If specified, C<%opts> override the corresponding C<%$ccorpus> properties.
Returns a (possibly new) DiaColloDB::Corpus::Compiled object C<$ccorpus>
representing the union over C<@sources>.
Honors C<$ccorpus-E<gt>{flags}> for append and truncate.

Each C<$src> in C<\@sources> is either a DiaColloDB::Corpus::Compiled object or a simple scalar
(which is interpreted as the C<dbdir> of a DiaColloDB::Corpus::Compiled object).
No content filters are applied, and output data files are created as
links to the input data files from C<@sources> (hard links if possible, otherwise symbolic links).

=back

=cut


##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Convenience Methods
=pod

=head2 Convenience Methods: disk files etc.

=over 4

=item datadir

 $dirname = $corpus->datadir();
 $dirname = $corpus->datadir($dir);

Wrapper for C<$corpus-E<gt>{dbdir}>.

=item truncate

 $bool = $corpus->truncate();

Removes all disk data (including header) and resets C<$corpus-E<gt>{size}> to 0 (zero).

=item filters

 $filters = $ccorpus->filters();

Returns the corpus content filters as a L<DiaColloDB::Corpus::Filters|DiaColloDB::Corpus::Filters> object.

=back

=cut

##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Compiled API: open/close
=pod

=head2 Convenience Methods: open/close

=over 4

=item opened

 $bool = $corpus->opened();

Returns true iff C<$corpus> is currently opened.

=item flush

 $bool = $corpus->flush();

Writes any pending corpus data (e.g. header) to disk.

=item reopen

 $corpus = $corpus->reopen(%opts);

Closes and re-opens the corpus, e.g. with different C<flags>.

=back

=cut


##========================================================================
## END POD DOCUMENTATION, auto-generated by podextract.perl

##======================================================================
## Footer
##======================================================================
=pod

=head1 AUTHOR

Bryan Jurish E<lt>moocow@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2015-2020 by Bryan Jurish

This package is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.14.2 or,
at your option, any later version of Perl 5 you may have available.

=head1 SEE ALSO

L<dcdb-corpus-compile.perl(1)|dcdb-corpus-compile.perl>,
L<dcdb-create.perl(1)|dcdb-create.perl>,
L<DiaColloDB::Corpus::Filters(3pm)|DiaColloDB::Corpus::Filters>,
L<DiaColloDB::Corpus(3pm)|DiaColloDB::Corpus>,
L<DiaColloDB(3pm)|DiaColloDB>,
L<perl(1)|perl>,
...


=cut