NAME

DiaColloDB::Relation::Cofreqs - diachronic collocation db, profiling relation: native fixed-window co-frequency index

ALIASES

DiaColloDB::Relation::Cofreqs
DiaColloDB::Cofreqs

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Relation::Cofreqs;
 
 ##========================================================================
 ## Constructors etc.
 
 $cof = $CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## I/O: open/close
 
 $cof_or_undef = $cof->open($base,$flags);
 $cof_or_undef = $cof->close();
 $bool = $cof->opened();
 
 ##========================================================================
 ## I/O: header
 
 @keys = $cof->headerKeys();
 $bool = $cof->loadHeaderData($hdr);
 
 ##========================================================================
 ## I/O: text
 
 $cof  = $cof->loadTextFh($fh,%opts);
 $cof  = $cof->loadTextFile_create($fh,%opts);
 $bool = $cof->saveTextFh($fh,%opts);
 
 ##========================================================================
 ## Relation API: create
 
 $rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
 
 ##========================================================================
 ## Relation API: union
 
 $cof = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
 
 ##========================================================================
 ## Utilities: lookup
 
 $f = $cof->f1( @ids);
 $f12 = $cof->f12($id1,$id2);
 
 ##========================================================================
 ## Relation API: default: profiling
 
 $prf = $cof->subprofile(\@xids, %opts);
 
 ##========================================================================
 ## Relation API: default: query info
 
 \%qinfo = $rel->qinfo($coldb, %opts);
 

DESCRIPTION

DiaColloDB::Relation::Cofreqs is a DiaColloDB::Relation subclass for native indices over collocation frequencies within a fixed-length window of context words. It uses a pair of DiaColloDB::PackedFile objects for low-level index data.

Globals & Constants

Variable: @ISA

DiaColloDB::Relation::Cofreqs inherits from DiaColloDB::Relation.

Constructors etc.

new
 $cof = $CLASS_OR_OBJECT->new(%args);

%args, object structure:

 ##-- user options
 class    => $class,      ##-- optional, useful for debugging from header file
 base     => $basename,   ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.hdr"
 flags    => $flags,      ##-- fcntl flags or open-mode (default='r')
 perms    => $perms,      ##-- creation permissions (default=(0666 &~umask))
 dmax     => $dmax,       ##-- maximum distance for co-occurrences (default=5)
 fmin     => $fmin,       ##-- minimum pair frequency (default=0)
 pack_i   => $pack_i,     ##-- pack-template for IDs (default='N')
 pack_f   => $pack_f,     ##-- pack-template for frequencies (default='N')
 keeptmp  => $bool,       ##-- keep temporary files? (default=false)
 ##
 ##-- size info (after open() or load())
 size1    => $size1,      ##-- == $r1->size()
 size2    => $size2,      ##-- == $r2->size()
 ##
 ##-- low-level data
 r1 => $r1,               ##-- pf: [$end2,$f1] @ $i1
 r2 => $r2,               ##-- pf: [$i2,$f12]  @ end2($i1-1)..(end2($i1)-1)
 N  => $N,                ##-- sum($f12)
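
The r1/r2 layout above can be illustrated with a minimal sketch. It uses plain in-memory strings instead of DiaColloDB::PackedFile objects and assumes the default 'N' templates for both IDs and frequencies; the names fetch_rec() and f12_sketch() and the toy data are hypothetical and not part of the module.

```perl
use strict;
use warnings;

## Hypothetical in-memory stand-ins for the two packed files
## (this is NOT the DiaColloDB::PackedFile API).
##  r1: one record [$end2,$f1] per $i1
##  r2: records [$i2,$f12], grouped by $i1
my $pack_r1 = 'NN';   ##-- assumes pack_i='N', pack_f='N'
my $pack_r2 = 'NN';
my $len1    = length(pack($pack_r1,0,0));
my $len2    = length(pack($pack_r2,0,0));

## toy data: id 0 has collocates 1 (f12=3) and 2 (f12=1);
## id 1 has no collocates; id 2 has collocate 0 (f12=1)
my $r1 = join('', map { pack($pack_r1,@$_) } [2,4],[2,3],[3,1]);
my $r2 = join('', map { pack($pack_r2,@$_) } [1,3],[2,1],[0,1]);

## fetch record number $i from a packed buffer
sub fetch_rec {
  my ($buf,$template,$reclen,$i) = @_;
  return unpack($template, substr($buf, $i*$reclen, $reclen));
}

## joint-frequency lookup: scan r2 records end2($i1-1)..(end2($i1)-1)
sub f12_sketch {
  my ($i1,$i2) = @_;
  my $beg = $i1==0 ? 0 : (fetch_rec($r1,$pack_r1,$len1,$i1-1))[0];
  my $end = (fetch_rec($r1,$pack_r1,$len1,$i1))[0];
  foreach my $pos ($beg..$end-1) {
    my ($j2,$f12) = fetch_rec($r2,$pack_r2,$len2,$pos);
    return $f12 if $j2 == $i2;
  }
  return 0;   ##-- pair not indexed
}
```

Because each r1 record stores only the end offset of its r2 block, the start offset is read from the preceding r1 record (or is 0 for the first ID), so an empty block costs no r2 space.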

DESTROY

Destructor implicitly calls close().

I/O: open/close

open
 $cof_or_undef = $cof->open($base,$flags);
 $cof_or_undef = $cof->open($base);
 $cof_or_undef = $cof->open();

Opens underlying index files.

close
 $cof_or_undef = $cof->close();

Closes underlying index files. Implicitly calls flush() if index is opened for writing.

opened
 $bool = $cof->opened();

Returns true iff index is opened.

I/O: header

See also DiaColloDB::Persistent.

headerKeys
 @keys = $cof->headerKeys();

Returns the list of keys to save as header data.

loadHeaderData
 $bool = $cof->loadHeaderData($hdr);

Instantiates header data from $hdr; overrides the DiaColloDB::Persistent implementation.

I/O: text

loadTextFh
 $cof = $cof->loadTextFh($fh,%opts);
  • loads from text file as saved by saveTextFh()

  • supports semi-sorted input: input fh must be sorted by $i1, and all $i2 for each $i1 must be adjacent (i.e. no intervening $j1 != $i1)

  • supports multiple lines for pairs ($i1,$i2) provided the above conditions hold

  • supports loading of $cof->{N} from single-value lines

  • %opts: clobber %$cof
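
The input conditions above can be sketched as a small standalone parser. This is only an illustration, not the module's loader: parse_cof_text() is a hypothetical helper, TAB-separated fields are assumed, summing repeated ($i1,$i2) lines is an assumption about how multiple lines are combined, and only the adjacency condition is checked (not the sort order of $i1).

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): parse the text format
## into plain Perl data. Assumes TAB-separated "FREQ ID1 ID2" lines;
## a line holding a single value sets N.
sub parse_cof_text {
  my $fh = shift;
  my ($N,%f12,$cur,%seen);
  while (defined(my $line = <$fh>)) {
    chomp($line);
    next if $line =~ /^\s*$/;
    my ($f,$i1,$i2) = split(/\t/,$line);
    if (!defined($i1)) { $N = $f; next; }   ##-- single-value line: N
    if (!defined($cur) || $i1 != $cur) {
      ##-- a new $i1 block starts: it must not have been seen before
      die "input not semi-sorted: id $i1 reappears\n" if $seen{$i1}++;
      $cur = $i1;
    }
    $f12{$i1}{$i2} += $f;   ##-- assumption: repeated pairs are summed
  }
  return ($N,\%f12);
}
```

A filehandle opened on a scalar reference, e.g. open(my $fh,'<',\$text), is enough to try the sketch without touching the file system.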

loadTextFile_create
 $cof = $cof->loadTextFile_create($fh,%opts);

Old, slightly faster version of loadTextFile() which doesn't support {N}, semi-sorted input, or multiple ($i1,$i2) entries; not usable by the union() method.

saveTextFh
 $bool = $cof->saveTextFh($fh,%opts);

Saves to a text file with initial line "N" and subsequent lines of the form:

 FREQ ID1 ID2

%opts:

 i2s => \&CODE,   ##-- code-ref for formatting indices; called as $s=CODE($i)
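
As a rough illustration of the output format and the i2s option, the following sketch writes a plain hash of pair frequencies to a filehandle. save_text_sketch() is a hypothetical function, not the module's implementation, and TAB-separated fields are an assumption.

```perl
use strict;
use warnings;

## Hypothetical writer (not the module's implementation): emits N on
## the first line, then one "FREQ ID1 ID2" line per pair.
## Assumes TAB-separated fields; $opts{i2s} is called as $s=CODE($i).
sub save_text_sketch {
  my ($fh,$N,$f12,%opts) = @_;
  my $i2s = $opts{i2s} || sub { $_[0] };   ##-- default: print raw IDs
  print {$fh} $N, "\n";
  foreach my $i1 (sort { $a <=> $b } keys %$f12) {
    foreach my $i2 (sort { $a <=> $b } keys %{$f12->{$i1}}) {
      print {$fh}
        join("\t", $f12->{$i1}{$i2}, $i2s->($i1), $i2s->($i2)), "\n";
    }
  }
  return 1;
}
```

Passing a code-ref such as sub { $labels[$_[0]] } as i2s would replace numeric IDs by readable strings in the output, which can be convenient for debugging dumps.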

Relation API: create

create
 $rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);

Populates the current index from $tokdat_file, a tt-style text file containing 1 token-id per line with optional blank lines.

%opts: clobber %$rel, also:

 size=>$size,  ##-- set initial size (number of types)
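
To make the idea of fixed-window co-occurrence counting concrete, here is a minimal sketch over such a token-id file. It is not the module's implementation: count_cofreqs() is a hypothetical function, and it assumes that blank lines mark boundaries which a window may not cross and that each pair is counted in both directions.

```perl
use strict;
use warnings;

## Hypothetical sketch of fixed-window co-occurrence counting
## (not the module's implementation).
## Assumptions: blank lines are boundaries a window may not cross;
## pairs within distance $dmax are counted in both directions.
sub count_cofreqs {
  my ($fh,$dmax) = @_;
  my (%f12,@sent);
  my $flush = sub {
    foreach my $p (0..$#sent) {
      my $hi = $p+$dmax < $#sent ? $p+$dmax : $#sent;
      foreach my $q ($p+1..$hi) {
        ++$f12{$sent[$p]}{$sent[$q]};
        ++$f12{$sent[$q]}{$sent[$p]};
      }
    }
    @sent = ();
  };
  while (defined(my $line = <$fh>)) {
    chomp($line);
    if ($line =~ /^\s*$/) { $flush->(); next; }   ##-- boundary
    push(@sent,$line);                            ##-- one token-id
  }
  $flush->();   ##-- final block without a trailing blank line
  return \%f12;
}
```

With the token sequence 0,1,2 and dmax=1 only adjacent tokens are paired; raising dmax to 2 additionally pairs 0 with 2.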

Relation API: union

union
 $cof = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

Merges multiple co-frequency indices from \@pairs into a new object. @pairs is an array of pairs ([$cof,\@xi2u],...) of co-frequency objects $cof and tuple-id maps \@xi2u for $cof; \@xi2u may also be a mapping object supporting a toArray() method. Implicitly flushes the new index.

%opts: clobber %$cof
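
The role of the \@xi2u maps can be sketched with plain hashes standing in for index objects. union_sketch() is a hypothetical function and not the module's implementation; it assumes each \@xi2u maps a source index's local IDs to IDs in the merged index and that frequencies for identical mapped pairs are summed.

```perl
use strict;
use warnings;

## Hypothetical sketch (not the module's implementation): merge pair
## frequency tables given as plain hashes, mapping each source's local
## IDs to shared IDs via its \@xi2u array.
sub union_sketch {
  my @pairs = @_;   ##-- ([\%f12, \@xi2u], ...)
  my %union;
  foreach my $pair (@pairs) {
    my ($f12,$xi2u) = @$pair;
    foreach my $i1 (keys %$f12) {
      foreach my $i2 (keys %{$f12->{$i1}}) {
        ##-- assumption: frequencies of identical mapped pairs add up
        $union{ $xi2u->[$i1] }{ $xi2u->[$i2] } += $f12->{$i1}{$i2};
      }
    }
  }
  return \%union;
}
```

Two sources may number the same tuple differently; after mapping, both contributions land on the same shared pair.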

Utilities: lookup

f1
 $f = $cof->f1( @ids);
 $f = $cof->f1(\@ids);

get total marginal unigram frequency (index must be opened)

f12
 $f12 = $cof->f12($id1,$id2);

return joint frequency for pair ($id1,$id2)

currently UNUSED

Relation API: default: profiling

subprofile
 $prf = $cof->subprofile(\@xids, %opts);

get co-frequency profile for @xids (index must be opened). %opts:

 groupby => \&gbsub,  ##-- key-extractor $key2_or_undef = $gbsub->($i2)
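
How a groupby key-extractor might shape a profile can be sketched as follows. group_collocates() is a hypothetical function, not the module's implementation, and it assumes that collocates for which the extractor returns undef are simply skipped.

```perl
use strict;
use warnings;

## Hypothetical sketch (not the module's implementation): aggregate
## collocate frequencies by the key returned from a groupby code-ref.
## Assumption: collocates whose key is undef are skipped.
sub group_collocates {
  my ($f12_for_target,$gbsub) = @_;   ##-- \%f12_for_target: $i2 => $f12
  my %grouped;
  while (my ($i2,$f) = each %$f12_for_target) {
    my $key = $gbsub->($i2);
    next if !defined($key);
    $grouped{$key} += $f;
  }
  return \%grouped;
}
```

For example, an extractor that maps several tuple IDs to one lemma string would collapse their frequencies into a single profile entry.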

Relation API: default: query info

qinfo
 \%qinfo = $rel->qinfo($coldb, %opts);

get query-info hash for profile administrivia (ddc hit links).

%opts: as for profile(), additionally:

 qreqs => \@qreqs,      ##-- as returned by $coldb->parseRequest($opts{query})
 gbreq => \%groupby,    ##-- as returned by $coldb->groupby($opts{groupby})

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dcdb-create.perl(1), dcdb-query.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), dcdb-dump.perl(1), DiaColloDB(3pm), perl(1), ...