NAME

DiaColloDB::Relation::DDC - diachronic collocation db, profiling relation: ddc client

ALIASES

DiaColloDB::Relation::DDC
DiaColloDB::DDC

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Relation::DDC;
 
 ##========================================================================
 ## Constructors etc.
 
 $ddc = $CLASS_OR_OBJECT->new(%args);
 $rel_or_undef = $CLASS_OR_OBJECT->fromDB($coldb,%opts);
 
 ##========================================================================
 ## Relation API: creation
 
 $rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
 $rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);      ##-- SKETCHY!
 
 ##========================================================================
 ## Relation API: profiling
 
 $mprf   = $rel->profile($coldb, %opts);
 $mprf   = $rel->extend($coldb, %opts);
 $mpdiff = $rel->compare($coldb, %opts);
 
 ##========================================================================
 ## Utils: profiling
 
 $dclient = $rel->ddcClient(%opts);
 $results = $rel->ddcQuery($coldb, $query_or_str, %opts);
 $fcoef = $rel->fcoef($cquery);
 $qcount = $rel->countQuery($coldb,\%opts);
 

DESCRIPTION

DiaColloDB::Relation::DDC is a DiaColloDB::Relation subclass using the DDC::Client::Distributed module for acquiring fine-grained collocation frequency profile data from a remote DDC server. It is generally much slower than the native index types DiaColloDB::Relation::Cofreqs and DiaColloDB::Relation::Unigrams, but is much more flexible regarding selection of corpus subsets, collocation targets, and aggregation parameters.

Globals & Constants

Variable: @ISA

DiaColloDB::Relation::DDC inherits from DiaColloDB::Relation.

Constructors etc.

new
 $ddc = $CLASS_OR_OBJECT->new(%args);

%args, object structure:

 ##-- persistent options
 base => $basename,               ##-- configuration header basename (default=undef)
 ##
 ##-- ddc client options
 ddcServer => "$server:$port",    ##-- ddc server (required; default=$coldb->{ddcServer} via fromDB() method)
 ddcTimeout => $timeout,          ##-- ddc timeout; default=300
 ddcLimit   => $limit,            ##-- default limit for ddc queries (default=-1)
 ddcSample  => $sample,           ##-- default sample size for ddc queries (default=-1:all)
 dmax       => $maxDistance,      ##-- default distance for near() queries (default=5; 1=immediate adjacency; ~ ddc CQNear.Dist+1)
 cfmin      => $minFreq,          ##-- default minimum frequency for count() queries (default=2)
 ##
 ##-- low-level data
 dclient   => $ddcClient,         ##-- a DDC::Client::Distributed object

fromDB
 $rel_or_undef = $CLASS_OR_OBJECT->fromDB($coldb,%opts);

The default implementation clobbers the keys listed by $rel->headerKeys() with the corresponding values from %$coldb and %opts.
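
The clobbering order can be pictured as a plain hash merge. The sketch below is a stand-alone illustration, not the module's code: merge_header(), the sample key list, and the server address are hypothetical, and it assumes that values from %$coldb are copied first and that %opts then override them.

```perl
use strict;
use warnings;

## Hypothetical stand-in for the keys reported by $rel->headerKeys()
my @header_keys = qw(ddcServer ddcTimeout ddcLimit dmax cfmin);

## merge_header(\%rel, \%coldb, %opts): copy known header keys from %coldb
## into %rel, then let explicit %opts override whatever is already there
sub merge_header {
  my ($rel, $coldb, %opts) = @_;
  $rel->{$_} = $coldb->{$_} foreach grep { exists $coldb->{$_} } @header_keys;
  @$rel{keys %opts} = values %opts;
  return $rel;
}

my $rel = merge_header({ ddcTimeout => 300, dmax => 5 },                 ## object defaults
                       { ddcServer => 'localhost:50000', dmax => 3 },    ## placeholder $coldb
                       dmax => 8);                                       ## explicit %opts
printf "%s=%s\n", $_, $rel->{$_} foreach sort keys %$rel;
```

Under these assumptions an explicit option always wins, and a key that is absent from both %$coldb and %opts keeps whatever value the object already had.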

Relation API: creation

create
 $rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);

Nothing really interesting happens here; the default implementation just calls fromDB() and saveHeaderFile().

union
 $rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

Merges multiple DDC relations into a new object. @pairs is an array of ARRAY-refs ([$ddc,...],...) whose initial elements are the DiaColloDB::Relation::DDC objects to be merged.

%opts: clobber %$rel

default implementation just calls create(), but should probably create a list of ddc servers to query, which isn't supported yet.

TODO: union() method without a shared DDC server should probably create some kind of temporary server-list and use the DiaColloDB::Client::list routines for querying multiple back-end DDC servers.

Relation API: profiling

profile
 $mprf = $rel->profile($coldb, %opts);

get a relation profile for selected items as a DiaColloDB::Profile::Multi object. %opts: as for DiaColloDB::Relation::profile(), also:

 ##-- sampling options
 limit => $limit,       ##-- maximum number of items to return from ddc; sets $qconfig{limit} (default: query "#limit[N]" or $rel->{ddcLimit})
 sample => $sample,     ##-- ddc sample size; sets $qconfig{qcount} Sample property (default: query "#sample[N]" or $rel->{ddcSample})
 cfmin => $cfmin,       ##-- minimum subcorpus frequency for returned items (default: query "#fmin[N]" or $rel->{cfmin})
 dmax  => $dmax,        ##-- maximum distance for implicit near() queries (default: query "#dmax[N]" or $rel->{dmax})
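
Each sampling option above names two fallbacks: a "#name[N]" directive in the query, then a relation-level default. The following stand-alone sketch shows one way such a chain could be resolved. resolve_opt() is a hypothetical helper, the query string is made up for illustration, and the precedence (explicit option first, then the query directive, then the relation default) is an assumption read off the option table rather than taken from the module source.

```perl
use strict;
use warnings;

## resolve_opt($explicit, $directive, $query, $default): hypothetical helper.
## Returns the explicit option value if defined, else the N from "#directive[N]"
## in the query string, else the relation-level default.
sub resolve_opt {
  my ($explicit, $directive, $query, $default) = @_;
  return $explicit if defined $explicit;
  return $1 if defined($query) && $query =~ /\#\Q$directive\E\[\s*(-?\d+)\s*\]/;
  return $default;
}

my %rel   = (ddcLimit => -1, ddcSample => -1, cfmin => 2, dmax => 5);  ## documented defaults
my %opts  = (sample => 50);
my $query = 'Haus #sample[1000] #dmax[3]';                             ## illustrative only

print 'limit=',  resolve_opt($opts{limit},  'limit',  $query, $rel{ddcLimit}),  "\n";
print 'sample=', resolve_opt($opts{sample}, 'sample', $query, $rel{ddcSample}), "\n";
print 'cfmin=',  resolve_opt($opts{cfmin},  'fmin',   $query, $rel{cfmin}),     "\n";
print 'dmax=',   resolve_opt($opts{dmax},   'dmax',   $query, $rel{dmax}),      "\n";
```

Note that the option name, the query directive, and the relation key are not always spelled alike (cfmin, "#fmin[N]", and $rel->{cfmin}; limit, "#limit[N]", and $rel->{ddcLimit}), which is why the helper takes them as separate arguments.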

extend
 $mprf = $rel->extend($coldb, %opts);

Get independent f2 frequencies for $opts{slice2keys} as a DiaColloDB::Profile::Multi object. Override generates a large approximate DDC batch-query and filters results. May fail for large extension sets if the DDC server's request length limit (CHost.m_maxReceiveBytes = DDC_STATC_BUFLEN ?= 4096) is exceeded.
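
Given the request-length caveat, a caller might want to split a large extension set into smaller batches before calling extend(). The sketch below is a caller-side idea only, not something the module is documented to do: chunk_by_bytes() is a hypothetical helper, the comma separator is arbitrary, and the 4096-byte budget simply reuses the figure quoted above, which the text itself marks as uncertain. It also ignores any overhead the real batch query adds around the keys.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

## chunk_by_bytes(\@items, $max_bytes, $sep): hypothetical helper that groups
## items so that each chunk, joined by $sep, fits within $max_bytes bytes of
## UTF-8. A single item longer than $max_bytes still gets a chunk of its own.
sub chunk_by_bytes {
  my ($items, $max_bytes, $sep) = @_;
  my $sep_len = length(encode_utf8($sep));
  my (@chunks, @cur);
  my $cur_len = 0;
  foreach my $item (@$items) {
    my $len = length(encode_utf8($item));
    my $add = @cur ? $len + $sep_len : $len;
    if (@cur && $cur_len + $add > $max_bytes) {
      push @chunks, [@cur];     ## current chunk is full: start a new one
      @cur     = ();
      $cur_len = 0;
      $add     = $len;
    }
    push @cur, $item;
    $cur_len += $add;
  }
  push @chunks, [@cur] if @cur;
  return \@chunks;
}

my @keys   = map { "lemma$_" } 1 .. 1000;
my $chunks = chunk_by_bytes(\@keys, 4096, ',');
printf "%d keys in %d chunks\n", scalar(@keys), scalar(@$chunks);
```

Each chunk could then be passed as a separate extension request and the resulting profiles combined by the caller.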

compare
 $mpdiff = $rel->compare($coldb, %opts);

Get a comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object.

%opts: as for DiaColloDB::Relation::compare(), also:

 ##-- sampling options
 (a|b)?limit => $limit,       ##-- maximum number of items to return from ddc; sets $qconfig{limit} (default: query "#limit[N]" or $rel->{ddcLimit})
 (a|b)?sample => $sample,     ##-- ddc sample size; sets $qconfig{qcount} Sample property (default: query "#sample[N]" or $rel->{ddcSample})
 (a|b)?cfmin => $cfmin,       ##-- minimum subcorpus frequency for returned items (default: query "#fmin[N]" or $rel->{cfmin})
 (a|b)?dmax  => $dmax,        ##-- maximum distance for implicit near() queries (default: query "#dmax[N]" or $rel->{dmax})
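
The (a|b)? notation means each sampling option may be given bare or with an "a" or "b" prefix for one side of the comparison. As a minimal sketch of how such options could be split per side, assuming that a prefixed key such as alimit takes precedence over the bare limit (side_opts() is a hypothetical helper, not part of the module):

```perl
use strict;
use warnings;

## side_opts($side, \@names, %opts): hypothetical helper returning the options
## that apply to one side ('a' or 'b') of a comparison; a prefixed key is
## assumed to override the bare key of the same name
sub side_opts {
  my ($side, $names, %opts) = @_;
  my %out;
  foreach my $name (@$names) {
    my $val = exists $opts{"$side$name"} ? $opts{"$side$name"} : $opts{$name};
    $out{$name} = $val if defined $val;
  }
  return %out;
}

my @names = qw(limit sample cfmin dmax);
my %opts  = (limit => 100, alimit => 10, bdmax => 3);
my %pa    = side_opts('a', \@names, %opts);
my %pb    = side_opts('b', \@names, %opts);
print 'a: ', join(' ', map { "$_=$pa{$_}" } sort keys %pa), "\n";
print 'b: ', join(' ', map { "$_=$pb{$_}" } sort keys %pb), "\n";
```

Options left undefined on both levels are simply omitted, so the per-side defaults described in the table still apply to them.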

Utils: profiling

ddcClient
 $dclient = $rel->ddcClient(%opts);

Returns the cached $rel->{dclient} if defined; otherwise creates and caches a new client. Fails if ddcServer is not defined.

%opts: clobber %{$rel->{dclient}}
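
The create-on-first-use behaviour described here is a common caching-accessor pattern. The sketch below reproduces it with stand-in classes so that it runs without a DDC server: My::FakeClient, My::Rel, the client() method, and the connect and timeout fields are all hypothetical and do not reflect the real DDC::Client::Distributed interface.

```perl
use strict;
use warnings;

package My::FakeClient {   ## hypothetical stand-in for a DDC client class
  sub new { my ($class, %args) = @_; return bless { %args }, $class; }
}

package My::Rel {          ## hypothetical stand-in for the relation object
  sub new { my ($class, %args) = @_; return bless { %args }, $class; }

  ## client(%opts): return the cached client, creating it on first use
  sub client {
    my ($rel, %opts) = @_;
    if (!defined $rel->{dclient}) {
      die "client(): no ddcServer defined\n" if !defined $rel->{ddcServer};
      $rel->{dclient} = My::FakeClient->new(connect => $rel->{ddcServer},
                                            timeout => $rel->{ddcTimeout});
    }
    @{$rel->{dclient}}{keys %opts} = values %opts;   ## %opts clobber client fields
    return $rel->{dclient};
  }
}

my $rel = My::Rel->new(ddcServer => 'localhost:50000', ddcTimeout => 300);
my $c1  = $rel->client();
my $c2  = $rel->client(timeout => 60);
print(($c1 == $c2 ? 'same' : 'different'), " client object, timeout=$c2->{timeout}\n");
```

Because the client is cached, options passed on a later call modify the shared object rather than creating a second connection.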

ddcQuery
 $results = $rel->ddcQuery($coldb, $query_or_str, %opts);

Returns decoded JSON results for DDC client query $query_or_str, optionally logging the query and tracking errors.

%opts:

 logas => $prefix,   ##-- log prefix (default: 'ddcQuery()')
 loglevel => $level, ##-- log level (default=$coldb->{logProfile})
 limit => $limit,    ##-- set result client limit (default: current client limit, or -1 for limit=>undef)

fcoef
 $fcoef = $rel->fcoef($cquery);

Get expected frequency coefficient for the DDC::XS::CQuery object $cquery. Used to estimate total independent marginal frequencies (f1,f2,N) for profile construction. The default implementation should provide reasonable guesses for common query types.

countQuery
 $qcount = $rel->countQuery($coldb,\%opts);

Creates a DDC::XS::CQCount object for profile() options %opts. Sets the following keys in %opts:

 limit  => $limit,         ##-- hit return limit for ddc query
 dslo   => $dslo,          ##-- minimum date-slice, from @opts{qw(date slice fill)}
 dshi   => $dshi,          ##-- maximum date-slice, from @opts{qw(date slice fill)}
 dlo    => $dlo,           ##-- minimum date request (ddc)
 dhi    => $dhi,           ##-- maximum date request (ddc)
 fcoef  => $fcoef,         ##-- frequency coefficient, parsed from "#coef[N]", auto-generated if not set
 qtemplate => $qtemplate,  ##-- query template for ddc hit link-up
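
To illustrate how the date-slice bounds dslo and dshi relate to a requested date range, here is a small arithmetic sketch. It rests on assumptions that the text above does not confirm: that each slice is labelled by its lowest date, that a slice width of 0 means "no slicing", and that the fill option can be ignored. slice_bounds() is a hypothetical helper.

```perl
use strict;
use warnings;

## slice_bounds($dlo, $dhi, $slice): hypothetical helper mapping a requested
## date range onto the labels of its first and last slices; assumes
## non-negative dates and slices labelled by their lowest date
sub slice_bounds {
  my ($dlo, $dhi, $slice) = @_;
  return (0, 0) if !$slice;                  ## assumed: slice==0 disables slicing
  return (int($dlo / $slice) * $slice, int($dhi / $slice) * $slice);
}

my ($dslo, $dshi) = slice_bounds(1914, 1945, 10);
print "dslo=$dslo dshi=$dshi\n";
```

With a slice width of 10, the request 1914 to 1945 would thus touch the slices labelled 1910 through 1940 under these assumptions.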

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2016 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB(3pm), perl(1), ...