NAME

DiaColloDB::Relation - diachronic collocation db, relation API (abstract & utilities)

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Relation;
 
 ##========================================================================
 ## Constructors etc.
 
 $rel = $CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## Relation API: creation
 
 $rel = $CLASS_OR_OBJECT->create($coldb, $tokdat_file, %opts);
 $rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
 
 ##========================================================================
 ## Relation API: profiling
 
 $mprf = $rel->profile($coldb, %opts);
 $mprf = $rel->extend($coldb, %opts);
 $mpdiff = $rel->compare($coldb, %opts);
 $mpdiff = $rel->diff($coldb, %opts);
 
 ##========================================================================
 ## Relation API: default
 
 \%slice2prf = $rel->subprofile1(\@tids, \%opts);
 \%slice2prf = $rel->subprofile2(\%slice2prf, \%opts);
 \%slice2prf = $rel->subextend(\%slice2prf, \%opts);
 
 \%qinfo = $rel->qinfo($coldb, %opts);
 (\@q1strs,\@q2strs,\@qxstrs,\@fstrs) = $rel->qinfoData($coldb,%opts);

DESCRIPTION

DiaColloDB::Relation is a base class for low-level indices capable of returning raw frequency data suitable for constructing DiaColloDB::Profile::Multi objects. In addition to the API specification, the DiaColloDB::Relation package also provides several common utility methods used by native DiaColloDB index types.

Globals & Constants

Variable: @ISA

DiaColloDB::Relation inherits from DiaColloDB::Persistent.

Constructors etc.

new
 $rel = $CLASS_OR_OBJECT->new(%args);

%args, object structure: none defined by this base class; see subclass documentation for details.

Relation API: creation

create
 $rel = $CLASS_OR_OBJECT->create($coldb, $tokdat_file, %opts);

Populates the relation database from $tokdat_file, a tt-style text file with lines of the form:

 TID DATE       ##-- single token
 "\n"           ##-- blank line ~ EOS (hard co-occurrence boundary)

%opts: clobber %$rel
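
As an illustration of the input format only, the following sketch reads data of this shape into per-sentence lists of [TID, DATE] pairs. It is not the code used by create(): parse_tokdat() is a hypothetical helper, and whitespace-separated fields are an assumption.

```perl
use strict;
use warnings;

## hypothetical helper: parse tt-style token data into sentences
##  + each sentence is an array-ref of [$tid,$date] pairs
##  + ASSUMPTION: fields are separated by whitespace (tab or space)
sub parse_tokdat {
  my $fh = shift;
  my (@sents, @cur);
  while (defined(my $line = <$fh>)) {
    chomp($line);
    if ($line =~ /^\s*$/) {
      ##-- blank line ~ EOS: close the current sentence (if any)
      push(@sents, [@cur]) if @cur;
      @cur = ();
      next;
    }
    my ($tid, $date) = split(' ', $line);
    push(@cur, [$tid, $date]);
  }
  push(@sents, [@cur]) if @cur;  ##-- final sentence without trailing blank line
  return \@sents;
}

##-- usage: read from an in-memory filehandle
my $data = "0 1990\n1 1990\n\n2 1991\n";
open(my $fh, '<', \$data) or die "open failed: $!";
my $sents = parse_tokdat($fh);
close($fh);
printf("%d sentences, %d tokens in first\n", scalar(@$sents), scalar(@{$sents->[0]}));
```

Keeping sentences as separate lists preserves the hard co-occurrence boundary: a consumer that only pairs tokens within one list never counts collocations across a blank line.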

union
 $rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
  • merge multiple co-frequency indices into new object

  • @pairs : array of pairs ([$argrel,\@ti2u],...) of relation-objects $argrel and tuple-id maps \@ti2u for $argrel

  • %opts: clobber %$rel

  • should implicitly flush the new relation index
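
The role of the tuple-id maps can be sketched with plain hashes. The example below is a toy model under stated assumptions, not the union() implementation of any DiaColloDB subclass: union_toy() is a hypothetical function, relations are modelled as nested hashes of co-frequencies, and frequencies that land on the same union-id pair are assumed to be summed.

```perl
use strict;
use warnings;

## hypothetical toy "relation": { $id1 => { $id2 => $f12, ... }, ... }
## union_toy(\@pairs): merge co-frequency tables into a new one
##  + each pair is [$argrel, \@ti2u], where $ti2u->[$local_id] = $union_id
sub union_toy {
  my $pairs = shift;
  my %union;
  foreach my $pair (@$pairs) {
    my ($argrel, $ti2u) = @$pair;
    foreach my $i1 (keys %$argrel) {
      my $u1 = $ti2u->[$i1];
      foreach my $i2 (keys %{$argrel->{$i1}}) {
        my $u2 = $ti2u->[$i2];
        ##-- ASSUMPTION: frequencies for the same union-id pair are summed
        $union{$u1}{$u2} += $argrel->{$i1}{$i2};
      }
    }
  }
  return \%union;
}

##-- usage: local id 0 of $rel_a and local id 1 of $rel_b both map to union id 5
my $rel_a = { 0 => { 1 => 3 } };
my $rel_b = { 1 => { 0 => 4 } };
my $union = union_toy([ [$rel_a, [5, 6]], [$rel_b, [6, 5]] ]);
print "f12(5,6) = $union->{5}{6}\n";
```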

Relation API: profiling

profile
 $mprf = $rel->profile($coldb, %opts);

Get a relation-specific profile for selected items as a DiaColloDB::Profile::Multi object; called by DiaColloDB::profile().

%opts:

 ##-- selection parameters
 query => $query,           ##-- target request ATTR:REQ...
 date  => $date1,           ##-- string or array or range "MIN-MAX" (inclusive) : default=all
 ##
 ##-- aggregation parameters
 slice   => $slice,         ##-- date slice (default=1, 0 for global profile)
 groupby => $groupby,       ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
 ##
 ##-- scoring and trimming parameters
 eps     => $eps,           ##-- smoothing constant (default=0)
 score   => $func,          ##-- scoring function (f|fm|lf|lfm|mi|ld) : default="f"
 kbest   => $k,             ##-- return only $k best collocates per date (slice) : default=-1:all
 cutoff  => $cutoff,        ##-- minimum score
 global  => $bool,          ##-- trim profiles globally (vs. locally for each date-slice)? (default=0)
 ##
 ##-- profiling and debugging parameters
 strings => $bool,          ##-- do/don't stringify (default=do)
 fill    => $bool,          ##-- if true, returned multi-profile will have null profiles inserted for missing slices
 onepass => $bool,          ##-- if true, use fast but incorrect 1-pass method (default=0; Cofreqs subclass only)

The default implementation

  • parses the request and extracts target tuple-ids,

  • calls $rel->subprofile1() to compute slice-wise joint frequency profiles (f12),

  • calls $rel->subprofile2() to compute independent collocate frequencies (f2), and finally

  • collects the result in a DiaColloDB::Profile::Multi object.

Default values for %opts should be set by a higher-level call, e.g. DiaColloDB::profile().
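
The two-pass flow described above can be sketched with a toy class. Everything in this example is hypothetical and simplified: ToyRelation is not part of DiaColloDB, target ids are passed directly rather than parsed from a query, slice labels are assumed to be the lower bound of each date interval, f2 is not tracked per slice, and the result is a plain hash rather than a DiaColloDB::Profile::Multi object.

```perl
use strict;
use warnings;

package ToyRelation;  ##-- hypothetical class, NOT part of DiaColloDB

sub new {
  my ($class, %args) = @_;
  return bless({ f12 => [], f2 => {}, %args }, $class);
}

## first pass: slice-wise joint frequencies (f12) for the target ids
sub subprofile1 {
  my ($rel, $tids, $opts) = @_;
  my %want = map { ($_ => 1) } @$tids;
  my %slice2prf;
  foreach my $rec (@{$rel->{f12}}) {
    my ($date, $t1, $t2, $f) = @$rec;
    next if !$want{$t1};
    ##-- ASSUMPTION: slice label = lower bound of the date interval; 0 = global
    my $slice = $opts->{slice} ? $opts->{slice} * int($date / $opts->{slice}) : 0;
    $slice2prf{$slice}{f12}{$t2} += $f;
  }
  return \%slice2prf;
}

## second pass: independent collocate frequencies (f2)
sub subprofile2 {
  my ($rel, $slice2prf, $opts) = @_;
  foreach my $prf (values %$slice2prf) {
    $prf->{f2}{$_} = $rel->{f2}{$_} // 0 foreach (keys %{$prf->{f12}});
  }
  return $slice2prf;
}

## simplified driver: pass 1, then pass 2
sub profile {
  my ($rel, %opts) = @_;
  my $slice2prf = $rel->subprofile1($opts{tids}, \%opts);
  return $rel->subprofile2($slice2prf, \%opts);
}

package main;

##-- usage: records are [$date, $tid1, $tid2, $f12]
my $rel = ToyRelation->new(
  f12 => [ [1991, 0, 1, 2], [1995, 0, 1, 3], [2003, 0, 2, 1], [1999, 7, 1, 9] ],
  f2  => { 1 => 20, 2 => 5 },
);
my $mprf = $rel->profile(tids => [0], slice => 10);
foreach my $slice (sort { $a <=> $b } keys %$mprf) {
  foreach my $t2 (sort keys %{$mprf->{$slice}{f12}}) {
    print "$slice\t$t2\tf12=$mprf->{$slice}{f12}{$t2}\tf2=$mprf->{$slice}{f2}{$t2}\n";
  }
}
```

Splitting the work this way lets the second pass look up f2 only for collocates that actually occurred with the target in the first pass.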

extend
 $mprf = $rel->extend($coldb, %opts);

Get independent f2 frequencies for $opts{slice2keys} as a DiaColloDB::Profile::Multi object; called by DiaColloDB::extend().

%opts: as for profile(), also:

 slice2keys => \%slice2keys, ##-- target f2-items by slice-label (REQUIRED)

Default implementation calls $rel->subextend().

compare
 $mpdiff = $rel->compare($coldb, %opts);

Get a relation-specific comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object.

%opts:

 ##-- selection parameters
 (a|b)?query => $query,       ##-- target query as for parseRequest()
 (a|b)?date  => $date1,       ##-- string or array or range "MIN-MAX" (inclusive) : default=all
 ##
 ##-- aggregation parameters
 groupby      => $groupby,    ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
 (a|b)?slice  => $slice,      ##-- date slice (default=1, 0 for global profile)
 ##
 ##-- scoring and trimming parameters
 eps     => $eps,           ##-- smoothing constant (default=0)
 score   => $func,          ##-- scoring function (f|fm|lf|lfm|mi|ld) : default="f"
 kbest   => $k,             ##-- return only $k best collocates per date (slice) : default=-1:all
 cutoff  => $cutoff,        ##-- minimum score
 global  => $bool,          ##-- trim profiles globally (vs. locally for each date-slice)? (default=0)
 diff    => $diff,          ##-- low-level score-diff operation (diff|adiff|sum|min|max|avg|havg); default='adiff'
 ##
 ##-- profiling and debugging parameters
 strings => $bool,          ##-- do/don't stringify (default=do)
 onepass => $bool,          ##-- if true, use fast but incorrect 1-pass profiling method (default=0)
 ##
 ##-- subclass abstraction parameters
 _gbparse => $bool,         ##-- if true (default), 'groupby' clause will be parsed only once, using $coldb->groupby() method
 _abkeys  => \@abkeys,      ##-- additional key-suffixes KEY s.t. (KEY=>VAL) gets passed to profile() calls if e.g. (aKEY=>VAL) is in %opts

The default implementation just wraps the profile() method; default values for %opts should be set by higher-level call, e.g. DiaColloDB::compare().
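
One way to read the (a|b)?KEY convention is that each operand's profile() call receives the prefixed value if present and the bare value otherwise. The sketch below illustrates that reading only; ab_opts() is a hypothetical helper, the precedence rule is an assumption, and the query and date values are made up.

```perl
use strict;
use warnings;

## hypothetical helper: derive profile() options for one comparison operand
##  + $ab is the operand prefix ('a' or 'b')
##  + ASSUMPTION: a prefixed option (e.g. "adate") takes precedence over the bare KEY
sub ab_opts {
  my ($ab, $opts, @keys) = @_;
  my %popts = %$opts;
  foreach my $key (@keys) {
    my $val = exists($opts->{"$ab$key"}) ? $opts->{"$ab$key"} : $opts->{$key};
    $popts{$key} = $val if defined($val);
  }
  ##-- drop all prefixed variants so only bare keys reach profile()
  delete @popts{map { ("a$_", "b$_") } @keys};
  return %popts;
}

##-- usage: shared query, operand-specific date ranges
my %opts  = (query => 'Haus', adate => '1900-1949', bdate => '1950-1999', slice => 0);
my @keys  = (qw(query date slice), @{$opts{_abkeys} // []});
my %aopts = ab_opts('a', \%opts, @keys);
my %bopts = ab_opts('b', \%opts, @keys);
print "a: query=$aopts{query} date=$aopts{date}\n";
print "b: query=$bopts{query} date=$bopts{date}\n";
```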

diff
 $mpdiff = $rel->diff($coldb, %opts);

Alias for compare().

Relation API: default

subprofile1
 \%slice2prf = $rel->subprofile1(\@tids,\%opts);

Native index API low-level first-pass profiling function for joint frequency acquisition (f12); default implementation just throws an error.

subprofile2
 \%slice2prf = $rel->subprofile2(\%slice2prf,\%opts);

Native index API low-level second-pass profiling function for independent frequency acquisition (f2); default implementation just returns \%slice2prf, which is appropriate for relations which use a single-pass strategy to populate $prf->{f2} in their implementation of subprofile1().

subextend
 \%slice2prf = $rel->subextend(\%slice2prf,\%opts);

Native index API low-level profile-extension function for slice-wise independent frequency acquisition (f2). Default implementation throws an error.
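
The division of labour between these defaults and a subclass can be sketched as follows. ToyBase and ToyOnePass are hypothetical classes written for this example; they mimic the documented defaults (error, pass-through, error) but are not DiaColloDB code, and the error messages are invented.

```perl
use strict;
use warnings;

package ToyBase;      ##-- hypothetical stand-in for an abstract relation class
sub new         { return bless({}, ref($_[0]) || $_[0]); }
sub subprofile1 { die(ref($_[0]) . "::subprofile1(): abstract method called\n"); }
sub subprofile2 { return $_[1]; }  ##-- pass-through: f2 assumed populated in pass 1
sub subextend   { die(ref($_[0]) . "::subextend(): abstract method called\n"); }

package ToyOnePass;   ##-- hypothetical subclass using a single-pass strategy
our @ISA = ('ToyBase');
sub subprofile1 {
  my ($rel, $tids, $opts) = @_;
  ##-- populate both f12 and f2 at once, so the inherited subprofile2() suffices
  return { 0 => { f12 => { 42 => 1 }, f2 => { 42 => 10 } } };
}

package main;

my $rel = ToyOnePass->new();
my $s2p = $rel->subprofile2($rel->subprofile1([0], {}), {});
print "f2(42) = $s2p->{0}{f2}{42}\n";

##-- subextend() is not overridden, so the inherited default dies
my $ok  = eval { $rel->subextend($s2p, {}); 1 };
my $err = $@;
print "subextend failed: $err" if !$ok;
```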

qinfo
 \%qinfo = $rel->qinfo($coldb, %opts);

Get query-info hash for profile administrivia (DDC KWIC links). %opts: as for profile(), additionally:

 qreqs => \@areqs,      ##-- as returned by $coldb->parseRequest($opts{query})
 gbreq => \%groupby,    ##-- as returned by $coldb->groupby($opts{groupby})

qinfoData
 (\@q1strs,\@q2strs,\@qxstrs,\@fstrs) = $rel->qinfoData($coldb,%opts);

Parses @opts{qw(qreqs gbreq)} into conditions on w1, w2 and metadata filters (for DDC linkup). Call this from subclass qinfo() methods.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2016 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Persistent(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...