DiaColloDB::Relation::Unigrams - diachronic collocation db, profiling relation: native unigram index
##========================================================================
## PRELIMINARIES
use DiaColloDB::Relation::Unigrams;

##========================================================================
## Constructors etc.
$ug = $CLASS_OR_OBJECT->new(%args);

##========================================================================
## API: disk usage
@files = $obj->diskFiles();

##========================================================================
## I/O: open/close
$ug_or_undef = $ug->open($base,$flags);
$ug_or_undef = $ug->close();
$bool = $ug->opened();

##========================================================================
## I/O: header
@keys = $ug->headerKeys();
$bool = $ug->loadHeaderData($hdr);

##========================================================================
## I/O: text
$ug = $ug->loadTextFh($fh,%opts);
$ug = $ug->saveTextFh($fh,%opts);

##========================================================================
## Relation API: creation
$ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
$ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

##========================================================================
## Relation API: default
\%slice2prf = $rel->subprofile1(\@tids,\%opts);
\%qinfo = $rel->qinfo($coldb, %opts);
DiaColloDB::Relation::Unigrams is a DiaColloDB::Relation subclass for native indices over attribute-tuple unigrams using the DiaColloDB::PackedFile API for low-level index data.
DiaColloDB::Relation::Unigrams inherits from DiaColloDB::Relation.
$ug = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
##-- user options
base      => $basename,   ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.hdr"
flags     => $flags,      ##-- fcntl flags or open-mode (default='r')
perms     => $perms,      ##-- creation permissions (default=(0666 &~umask))
pack_i    => $pack_i,     ##-- pack-template for IDs (default='N')
pack_f    => $pack_f,     ##-- pack-template for frequencies (default='N')
pack_d    => $pack_d,     ##-- pack-template for dates (default='n')
keeptmp   => $bool,       ##-- keep temporary files? (default=false)
logCompat => $level,      ##-- log-level for compatibility warnings (default='warn')
##
##-- size info (after open() or load())
size1     => $size1,      ##-- == $r1->size()
size2     => $size2,      ##-- == $r2->size()
##
##-- low-level data
r1        => $r1,         ##-- pf: [$end2]     @ $i1 : constant (logical index)
r2        => $r2,         ##-- pf: [$d1,$f1]*  @ end2($i1-1)..(end2($i1+1)-1) : sorted by $d1 for each $i1
N         => $N,          ##-- sum($f1)
version   => $version,    ##-- file version, for compatibility checks
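The r1/r2 layout described above can be modeled in plain Perl with pack() and unpack(). What follows is a minimal, hypothetical sketch and does not reflect how DiaColloDB::PackedFile is implemented. build_index() and lookup() are illustrative names only. The sketch assumes that r1 stores, for each $i1, the exclusive end offset (in records) of that ID's block in r2, so the records for $i1 span r1[$i1-1] .. r1[$i1]-1. It also uses in-memory strings in place of files.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of r1/r2 (not DiaColloDB::PackedFile).
my $pack_i  = 'N';          # template for IDs/offsets (pack_i default)
my $pack_r2 = 'n' . 'N';    # one r2 record [$d1,$f1] = pack_d . pack_f defaults

# Build packed r1/r2 strings from $freq->{$i1}{$d1} = $f1 for IDs 0..$nids-1.
# Assumption: r1[$i1] = exclusive end offset (in records) of $i1's block in r2.
sub build_index {
  my ($freq, $nids) = @_;
  my ($r1, $r2, $end2, $N) = ('', '', 0, 0);
  for my $i1 (0 .. $nids - 1) {
    my $dates = $freq->{$i1} || {};
    for my $d1 (sort { $a <=> $b } keys %$dates) {   # sorted by $d1 for each $i1
      $r2 .= pack($pack_r2, $d1, $dates->{$d1});
      $N  += $dates->{$d1};                           # N = sum($f1)
      ++$end2;
    }
    $r1 .= pack($pack_i, $end2);
  }
  return ($r1, $r2, $N);
}

# Return ([$d1,$f1], ...) for tuple-ID $i1 by slicing r2 via the offsets in r1.
sub lookup {
  my ($r1, $r2, $i1) = @_;
  my $len1 = length pack($pack_i, 0);
  my $len2 = length pack($pack_r2, 0, 0);
  my $beg  = $i1 > 0 ? unpack($pack_i, substr($r1, ($i1 - 1) * $len1, $len1)) : 0;
  my $end  = unpack($pack_i, substr($r1, $i1 * $len1, $len1));
  return map { [ unpack($pack_r2, substr($r2, $_ * $len2, $len2)) ] } $beg .. $end - 1;
}
```

Because r1 holds one fixed-width record per logical ID, the r2 range for any $i1 can be found in constant time without scanning.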
The destructor (DESTROY) implicitly calls close().
@files = $obj->diskFiles();
Returns the disk storage files; used by du() and timestamp().
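The file set follows from the base option documented under new(): "${base}.dba1", "${base}.dba2" and "${base}.hdr". Here is a minimal, hypothetical sketch of how a du()-style total could be computed from that file set. ug_disk_files() and ug_disk_usage() are illustrative helpers, not DiaColloDB methods.

```perl
use strict;
use warnings;

# Hypothetical helper: files implied by the `base` option.
sub ug_disk_files {
  my $base = shift;
  return map { $base . $_ } qw(.dba1 .dba2 .hdr);
}

# Hypothetical du()-like total in bytes; missing or empty files count as 0.
sub ug_disk_usage {
  my $base  = shift;
  my $total = 0;
  for my $file (ug_disk_files($base)) {
    $total += (-s $file) || 0;
  }
  return $total;
}
```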
$ug_or_undef = $ug->open($base,$flags);
$ug_or_undef = $ug->open($base);
$ug_or_undef = $ug->open();
Opens underlying index files.
$ug_or_undef = $ug->close();
Closes underlying index files. Implicitly calls flush() if index is opened for writing.
$bool = $ug->opened();
Returns true iff index is opened.
@keys = $ug->headerKeys();
Returns the keys to save as header.
$bool = $ug->loadHeaderData($hdr);
Instantiates header data from $hdr; overrides the DiaColloDB::Persistent implementation.
$ug = $ug->loadTextFh($fh,%opts);
Loads from a text file as saved by saveTextFh().
The input filehandle must be sorted numerically by ($i1,$d1).
Supports multiple lines for the same pair ($i1,$d1), provided the above condition holds.
Supports loading $ug->{N} from single-component lines.
%opts: clobber %$ug (option values override the corresponding object keys)
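The input contract above can be expressed as a small parser. This is a hypothetical sketch, not the module's loader. parse_unigram_text() is an illustrative name, and the sketch makes three assumptions: fields are whitespace-separated, repeated ($i1,$d1) lines are summed, and N defaults to sum($f1) when no single-field line is present.

```perl
use strict;
use warnings;

# Hypothetical sketch of the loadTextFh() input contract (not the module's code).
# Returns ($N, \@rows) with @rows = ([$i1,$d1,$f1], ...).
sub parse_unigram_text {
  my $fh = shift;
  my ($N, @rows);
  while (defined(my $line = <$fh>)) {
    chomp $line;
    next if $line =~ /^\s*$/;
    my @f = split ' ', $line;                # assumption: whitespace-separated fields
    if (@f == 1) { $N = $f[0]; next; }       # 1 field: N
    die "bad line: $line\n" if @f != 3;
    my ($f1, $i1, $d1) = @f;                 # 3 fields: FREQ ID1 DATE
    if (@rows) {
      my ($pi, $pd) = @{ $rows[-1] }[0, 1];
      if ($i1 == $pi && $d1 == $pd) {        # repeated ($i1,$d1): accumulate
        $rows[-1][2] += $f1;
        next;
      }
      die "input not sorted by (i1,d1): $line\n"
        if $i1 < $pi || ($i1 == $pi && $d1 < $pd);
    }
    push @rows, [ $i1, $d1, $f1 ];
  }
  if (!defined $N) {                         # assumption: default N = sum($f1)
    $N = 0;
    $N += $_->[2] for @rows;
  }
  return ($N, \@rows);
}
```

Checking sort order while reading lets a loader append records in a single streaming pass. This is consistent with the "sorted by $d1 for each $i1" layout of r2.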
$bool = $ug->saveTextFh($fh,%opts);
Saves as text, with lines of the form:
N                ##-- 1 field : N
FREQ ID1 DATE    ##-- 3 fields: unigram frequency for (ID1,DATE)
%opts:
i2s => \&CODE, ##-- code-ref for formatting indices; called as $s=CODE($i)
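The following hypothetical sketch shows the output format described above, including the optional i2s callback. write_unigram_text() is an illustrative name. The sketch assumes tab-separated output and takes its rows as ([$i1,$d1,$f1], ...), not the packed r1/r2 files.

```perl
use strict;
use warnings;

# Hypothetical sketch of the saveTextFh() output format (not the module's code):
# one "N" line, then one "FREQ ID1 DATE" line per ($i1,$d1) pair.
sub write_unigram_text {
  my ($fh, $N, $rows, %opts) = @_;
  my $i2s = $opts{i2s};                     # optional code-ref: $s = CODE($i)
  print {$fh} "$N\n";
  for my $row (@$rows) {
    my ($i1, $d1, $f1) = @$row;
    my $id = $i2s ? $i2s->($i1) : $i1;      # format the ID if i2s was given
    print {$fh} join("\t", $f1, $id, $d1), "\n";
  }
  return 1;
}
```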
$ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
Populates the unigram database from $tokdat_file, a tt-style text file with lines of the form:
TID DATE    ##-- single token
"\n"        ##-- blank line ~ EOS (hard co-occurrence boundary)
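Below is a minimal, hypothetical sketch of the counting step implied by this input format. count_unigrams() is an illustrative name, and the sketch assumes tab-separated TID and DATE fields. Blank lines mark sentence boundaries, which matter for co-occurrence relations. For plain unigram frequencies the sketch simply skips them.

```perl
use strict;
use warnings;

# Hypothetical sketch: count unigram frequencies per (TID,DATE) from a
# tt-style token stream ("TID\tDATE" per token, blank line = EOS).
sub count_unigrams {
  my $fh = shift;
  my %freq;
  my $N = 0;
  while (defined(my $line = <$fh>)) {
    chomp $line;
    next if $line eq '';                   # EOS boundary: no token to count
    my ($tid, $date) = split /\t/, $line;  # assumption: tab-separated fields
    ++$freq{$tid}{$date};
    ++$N;                                  # N = total token count
  }
  return (\%freq, $N);
}
```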
$ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
Merges multiple unigram indices into a new object. @pairs is an array of pairs ([$argug,\@ti2u],...) of unigram relations $argug and tuple-ID maps \@ti2u for $argug. Implicitly flushes the new index.
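The role of the \@ti2u maps can be shown with plain hashes. In this hypothetical sketch, union_unigrams() is an illustrative name, and each input relation is modeled as $freq->{$i1}{$d1} = $f1 rather than an opened index. Each local tuple-ID is remapped through its \@ti2u to the merged ID space, and frequencies for the same (merged ID, date) are summed.

```perl
use strict;
use warnings;

# Hypothetical sketch of union(): remap each input's local tuple-IDs via its
# \@ti2u and sum frequencies in the merged ID space.
sub union_unigrams {
  my @pairs = @_;                           # ([\%freq, \@ti2u], ...)
  my %out;
  my $N = 0;
  for my $pair (@pairs) {
    my ($freq, $ti2u) = @$pair;
    while (my ($i1, $dates) = each %$freq) {
      my $u = $ti2u->[$i1];                 # local ID -> merged ID
      die "no mapping for tuple-ID $i1\n" if !defined $u;
      while (my ($d1, $f1) = each %$dates) {
        $out{$u}{$d1} += $f1;
        $N += $f1;
      }
    }
  }
  return (\%out, $N);
}
```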
\%slice2prf = $ug->subprofile1(\@tids,\%opts);
Get slice-wise unigram profile(s) for tuple-IDs @tids. $ug must be opened. %opts: as for DiaColloDB::Relation::subprofile1().
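The idea of a slice-wise profile can be sketched by summing $f1 over the requested tuple-IDs, grouped by date slice. This is a hypothetical illustration, not the module's subprofile1(), and slice_profile() is an illustrative name. It assumes a slice key of int($d1/$slice)*$slice, with $slice=0 meaning a single global slice. It also returns a plain hash of frequencies rather than DiaColloDB::Profile objects.

```perl
use strict;
use warnings;

# Hypothetical sketch: slice-wise unigram frequencies for the tuple-IDs in @$tids.
# Assumption: slice key = int($d1/$slice)*$slice; $slice=0 means one global slice.
sub slice_profile {
  my ($freq, $tids, $slice) = @_;
  my %slice2f;
  for my $i1 (@$tids) {
    my $dates = $freq->{$i1} or next;       # skip IDs with no data
    while (my ($d1, $f1) = each %$dates) {
      my $key = $slice ? int($d1 / $slice) * $slice : 0;
      $slice2f{$key} += $f1;
    }
  }
  return \%slice2f;
}
```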
\%slice2prf = $rel->subextend(\%slice2prf,\%opts);
Populate independent collocate frequencies in %slice2prf values. This override just returns a new, empty DiaColloDB::Profile::Multi object.
\%qinfo = $rel->qinfo($coldb, %opts);
Get the query-info hash for profile administrivia (DDC hit links). %opts: as for profile(), additionally:
qreqs => \@qreqs,     ##-- as returned by $coldb->parseRequest($opts{query})
gbreq => \%groupby,   ##-- as returned by $coldb->groupby($opts{groupby})
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...