NAME

DiaColloDB::Relation::Unigrams - diachronic collocation db, profiling relation: native unigram index

ALIASES

DiaColloDB::Relation::Unigrams
DiaColloDB::Unigrams

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Relation::Unigrams;
 
 ##========================================================================
 ## Constructors etc.
 
 $ug = $CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## API: disk usage
 
 @files = $obj->diskFiles();

 ##========================================================================
 ## I/O: open/close
 
 $ug_or_undef = $ug->open($base,$flags);
 $ug_or_undef = $ug->close();
 $bool = $ug->opened();
 
 ##========================================================================
 ## I/O: header
 
 @keys = $ug->headerKeys();
 $bool = $ug->loadHeaderData($hdr);
 
 ##========================================================================
 ## I/O: text
 
 $ug = $ug->loadTextFh($fh,%opts);
 $ug = $ug->saveTextFh($fh,%opts);
 
 ##========================================================================
 ## Relation API: creation
 
 $ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
 $ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
 
 ##========================================================================
 ## Relation API: default
 
 \%slice2prf = $rel->subprofile1(\@tids,\%opts);
 \%qinfo = $rel->qinfo($coldb, %opts);

DESCRIPTION

DiaColloDB::Relation::Unigrams is a DiaColloDB::Relation subclass for native indices over attribute-tuple unigrams using the DiaColloDB::PackedFile API for low-level index data.

Globals & Constants

Variable: @ISA

DiaColloDB::Relation::Unigrams inherits from DiaColloDB::Relation.

Constructors etc.

new
 $ug = $CLASS_OR_OBJECT->new(%args);

%args, object structure:

 ##-- user options
 base     => $basename,   ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.hdr"
 flags    => $flags,      ##-- fcntl flags or open-mode (default='r')
 perms    => $perms,      ##-- creation permissions (default=(0666 &~umask))
 pack_i   => $pack_i,     ##-- pack-template for IDs (default='N')
 pack_f   => $pack_f,     ##-- pack-template for frequencies (default='N')
 pack_d   => $pack_d,     ##-- pack-template for dates (default='n')
 keeptmp  => $bool,       ##-- keep temporary files? (default=false)
 logCompat => $level,     ##-- log-level for compatibility warnings (default='warn')
 ##
 ##-- size info (after open() or load())
 size1    => $size1,      ##-- == $r1->size()
 size2    => $size2,      ##-- == $r2->size()
 ##
 ##-- low-level data
 r1 => $r1,               ##-- pf: [$end2]      @ $i1                           : constant (logical index)
 r2 => $r2,               ##-- pf: [$d1,$f1]*   @ end2($i1-1)..(end2($i1+1)-1)  : sorted by $d1 for each $i1
 N  => $N,                ##-- sum($f1)
 version => $version,     ##-- file version, for compatibility checks
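
The two-level layout of r1 and r2 can be illustrated in plain Perl, without DiaColloDB itself. The following is a minimal sketch, not the module's implementation: it assumes the default pack templates listed above, assumes that $end2 is a record index (not a byte offset) into r2, and assumes that the records for $i1 run from the end offset stored for $i1-1 up to (but excluding) the end offset stored for $i1. The toy data and the helper records_for() are hypothetical.

```perl
use strict;
use warnings;

# Default pack templates as listed in the option table (assumed here).
my ($pack_i, $pack_f, $pack_d) = ('N', 'N', 'n');
my $r2_template = $pack_d . $pack_f;                  # one [$d1,$f1] record
my $r2_reclen   = length(pack($r2_template, 0, 0));   # 2 + 4 bytes
my $r1_reclen   = length(pack($pack_i, 0));           # 4 bytes

# Hypothetical toy data: index = tuple-ID $i1, value = { $d1 => $f1 }.
my @counts = (
  { 1900 => 3, 1910 => 5 },   # $i1 = 0
  { 1905 => 2 },              # $i1 = 1
);

# Build both "files" as in-memory packed strings.
my ($r1, $r2, $N) = ('', '', 0);
my $end2 = 0;                                 # number of r2 records written so far
for my $i1 (0 .. $#counts) {
  for my $d1 (sort { $a <=> $b } keys %{ $counts[$i1] }) {   # sorted by $d1
    my $f1 = $counts[$i1]{$d1};
    $r2 .= pack($r2_template, $d1, $f1);
    $N  += $f1;                               # N = sum($f1)
    ++$end2;
  }
  $r1 .= pack($pack_i, $end2);                # [$end2] @ $i1
}

# Hypothetical lookup: all [$d1,$f1] records for one tuple-ID.
sub records_for {
  my ($i1) = @_;
  my $beg = $i1 > 0
    ? unpack($pack_i, substr($r1, ($i1 - 1) * $r1_reclen, $r1_reclen))
    : 0;
  my $end = unpack($pack_i, substr($r1, $i1 * $r1_reclen, $r1_reclen));
  return map { [ unpack($r2_template, substr($r2, $_ * $r2_reclen, $r2_reclen)) ] }
         $beg .. $end - 1;
}

my @recs0 = records_for(0);
my @recs1 = records_for(1);
print "i1=0: ", join(' ', map { "$_->[0]:$_->[1]" } @recs0), "\n";
print "i1=1: ", join(' ', map { "$_->[0]:$_->[1]" } @recs1), "\n";
print "N=$N\n";
```

Because r1 holds one fixed-width record per tuple-ID, the slice of r2 belonging to any $i1 can be found with two constant-time reads. Note that the 'n' template limits dates to unsigned 16-bit values.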

DESTROY

Destructor; implicitly calls close().

API: disk usage

diskFiles
 @files = $obj->diskFiles();

Returns the list of disk storage files; used by du() and timestamp().

I/O: open/close

open
 $ug_or_undef = $ug->open($base,$flags);
 $ug_or_undef = $ug->open($base);
 $ug_or_undef = $ug->open();

Opens underlying index files.

close
 $ug_or_undef = $ug->close();

Closes underlying index files. Implicitly calls flush() if index is opened for writing.

opened
 $bool = $ug->opened();

Returns true iff index is opened.

I/O: header

headerKeys
 @keys = $ug->headerKeys();

Returns the keys to be saved as header data.

loadHeaderData
 $bool = $ug->loadHeaderData($hdr);

Instantiates header data from $hdr; overrides the DiaColloDB::Persistent implementation.

I/O: text

loadTextFh
 $ug = $ug->loadTextFh($fh,%opts);
  • loads from text file as saved by saveTextFh().

  • input fh must be sorted numerically by ($i1,$d1).

  • supports multiple lines for pairs ($i1,$d1) provided the above condition(s) hold.

  • supports loading of $ug->{N} from single-component lines.

  • %opts: override the corresponding keys of %$ug.
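
The input conventions listed above can be sketched with a small stand-alone reader. This is an illustration only, not the code behind loadTextFh(): the function name parse_unigram_text() is hypothetical, fields are assumed to be TAB-separated, and repeated lines for the same ($i1,$d1) pair are assumed to be summed.

```perl
use strict;
use warnings;

# Hypothetical reader for the text format: 1 field = N, 3 fields = FREQ ID1 DATE.
sub parse_unigram_text {
  my ($fh) = @_;
  my ($N, %f);                          # %f: $i1 => { $d1 => $f1 }
  while (defined(my $line = <$fh>)) {
    chomp $line;
    next if $line =~ /^\s*$/;           # ignore empty lines
    my @fields = split /\t/, $line;     # assumption: TAB-separated fields
    if (@fields == 1) {
      $N = $fields[0];                  # single-component line: total N
    }
    elsif (@fields == 3) {
      my ($freq, $i1, $d1) = @fields;
      $f{$i1}{$d1} += $freq;            # multiple lines per pair are summed
    }
    else {
      die "unexpected number of fields in line '$line'";
    }
  }
  return ($N, \%f);
}

# Usage with an in-memory filehandle; the pair (0,1900) occurs twice.
my $text = "10\n" . "3\t0\t1900\n" . "2\t0\t1900\n" . "5\t1\t1910\n";
open(my $fh, '<', \$text) or die "open failed: $!";
my ($N, $f) = parse_unigram_text($fh);
close($fh);
printf "N=%d f(0,1900)=%d f(1,1910)=%d\n", $N, $f->{0}{1900}, $f->{1}{1910};
```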

saveTextFh
 $bool = $ug->saveTextFh($fh,%opts);

Saves the index as text with lines of the form:

 N                 ##-- 1 field : N
 FREQ ID1 DATE     ##-- 3 fields: unigram frequency for (ID1,DATE)

%opts:

 i2s => \&CODE,    ##-- code-ref for formatting indices; called as $s=CODE($i)
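
A minimal sketch of how such output could be produced, including the effect of an i2s callback. The function format_unigram_text(), the TAB separator, and the label table are assumptions made for illustration; only the line layout and the calling convention $s=CODE($i) are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical writer: returns the text form as a single string.
# $f is a hash-ref of the form { $i1 => { $d1 => $f1 } }.
sub format_unigram_text {
  my ($N, $f, %opts) = @_;
  my $i2s   = $opts{i2s};                       # optional code-ref, called as $s = CODE($i)
  my @lines = ($N);                             # 1 field : N
  for my $i1 (sort { $a <=> $b } keys %$f) {
    my $id = $i2s ? $i2s->($i1) : $i1;
    for my $d1 (sort { $a <=> $b } keys %{ $f->{$i1} }) {
      push @lines, join("\t", $f->{$i1}{$d1}, $id, $d1);   # 3 fields: FREQ ID1 DATE
    }
  }
  return join("\n", @lines) . "\n";
}

# Usage: map numeric IDs to hypothetical labels while writing.
my @labels = ('Haus', 'Baum');
my $counts = { 0 => { 1900 => 3, 1910 => 5 }, 1 => { 1905 => 2 } };
my $plain  = format_unigram_text(10, $counts);
my $pretty = format_unigram_text(10, $counts, i2s => sub { $labels[$_[0]] });
print $pretty;
```

Output written with an i2s callback is meant for human inspection; a reader expecting numeric IDs would presumably not accept it without the inverse mapping.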

Relation API: creation

create
 $ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);

Populates the unigram database from $tokdat_file, a tt-style text file with lines of the form:

 TID DATE       ##-- single token
 "\n"           ##-- blank line ~ EOS (hard co-occurrence boundary)

%opts: override the corresponding keys of %$ug.
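
The counting step implied by this input format can be sketched as follows. This is not the implementation of create(), which writes packed index files; the helper count_tokdat() is hypothetical, and whitespace-separated fields are assumed. Blank lines mark sentence boundaries, which matter for co-occurrence relations but not for unigram counts, so the sketch simply skips them.

```perl
use strict;
use warnings;

# Hypothetical counter: returns ($N, \%f) with %f = ( $tid => { $date => $freq } ).
sub count_tokdat {
  my ($fh) = @_;
  my %f;
  my $N = 0;
  while (defined(my $line = <$fh>)) {
    chomp $line;
    next if $line eq '';                              # blank line ~ EOS: no token here
    my ($tid, $date) = split /[ \t]+/, $line;         # assumption: whitespace-separated
    die "malformed line '$line'" unless defined $date;
    ++$f{$tid}{$date};
    ++$N;                                             # N counts every token
  }
  return ($N, \%f);
}

# Usage: two "sentences" separated by a blank line.
my $tokdat = "0 1900\n1 1900\n0 1900\n\n1 1905\n0 1905\n";
open(my $fh, '<', \$tokdat) or die "open failed: $!";
my ($N, $f) = count_tokdat($fh);
close($fh);
printf "N=%d f(0,1900)=%d f(1,1905)=%d\n", $N, $f->{0}{1900}, $f->{1}{1905};
```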

union
 $ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

Merges multiple unigram indices into a new object. @pairs is an array of pairs ([$argug,\@ti2u],...), each consisting of a unigram relation $argug and a tuple-ID map \@ti2u for $argug. Implicitly flushes the new index.

%opts: override the corresponding keys of %$ug.
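
The role of the tuple-ID maps can be shown on plain hashes. The sketch below is an assumption-laden illustration, not the module's merge code: union_counts() is a hypothetical helper, each argument index is represented as a hash-ref { $ti => { $date => $freq } } instead of a DiaColloDB::Relation::Unigrams object, and $ti2u->[$ti] is assumed to give the ID in the union for the argument-local ID $ti.

```perl
use strict;
use warnings;

# Hypothetical merge: @pairs = ([\%f_arg, \@ti2u], ...).
sub union_counts {
  my (@pairs) = @_;
  my %u;                                   # union counts: $ui => { $date => $freq }
  my $N = 0;
  for my $pair (@pairs) {
    my ($f, $ti2u) = @$pair;
    for my $ti (keys %$f) {
      my $ui = $ti2u->[$ti];               # translate local ID to union ID
      die "no union ID for local tuple-ID $ti" unless defined $ui;
      for my $date (keys %{ $f->{$ti} }) {
        $u{$ui}{$date} += $f->{$ti}{$date};
        $N             += $f->{$ti}{$date};
      }
    }
  }
  return ($N, \%u);
}

# Usage: local ID 0 of the second index denotes the same tuple as local ID 1 of the first.
my $fa = { 0 => { 1900 => 3 }, 1 => { 1900 => 1 } };
my $fb = { 0 => { 1900 => 4, 1910 => 2 } };
my ($N, $u) = union_counts([ $fa, [0, 1] ], [ $fb, [1] ]);
printf "N=%d u(1,1900)=%d\n", $N, $u->{1}{1900};
```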

Relation API: default

subprofile1
 \%slice2prf = $ug->subprofile1(\@tids,\%opts);

Get slice-wise unigram profile(s) for tuple-IDs @tids. $ug must be opened. %opts: as for DiaColloDB::Relation::subprofile1().
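
What "slice-wise" means can be sketched on plain data. The following is a simplified illustration under stated assumptions, not the behavior of subprofile1() itself: slice_profile() is a hypothetical helper, a slice is assumed to be a date interval of fixed width labeled by its lower bound, and the result holds one summed frequency per slice rather than profile objects.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the frequencies of the requested tuple-IDs per slice.
#  $f     : { $tid => { $date => $freq } }
#  $tids  : array-ref of tuple-IDs
#  $slice : slice width in date units (assumed); 0 means a single global slice
sub slice_profile {
  my ($f, $tids, $slice) = @_;
  my %slice2f;
  for my $tid (@$tids) {
    my $by_date = $f->{$tid} or next;          # unknown IDs contribute nothing
    for my $date (keys %$by_date) {
      my $key = $slice ? $slice * int($date / $slice) : 0;
      $slice2f{$key} += $by_date->{$date};
    }
  }
  return \%slice2f;
}

# Usage: decade slices for tuple-IDs 0 and 1; tuple-ID 2 is not requested.
my $counts = {
  0 => { 1900 => 3, 1905 => 4, 1910 => 5 },
  1 => { 1905 => 2 },
  2 => { 1912 => 7 },
};
my $profile = slice_profile($counts, [0, 1], 10);
printf "%d\t%d\n", $_, $profile->{$_} for sort { $a <=> $b } keys %$profile;
```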

subextend
 \%slice2prf = $rel->subextend(\%slice2prf,\%opts);

Populate independent collocate frequencies in %slice2prf values. For unigrams, this override simply returns a new, empty DiaColloDB::Profile::Multi object.

qinfo
 \%qinfo = $rel->qinfo($coldb, %opts);

Gets the query-info hash for profile administrivia (ddc hit links).

%opts: as for profile(), additionally:

 qreqs => \@qreqs,      ##-- as returned by $coldb->parseRequest($opts{query})
 gbreq => \%groupby,    ##-- as returned by $coldb->groupby($opts{groupby})

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2020 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...