NAME
DiaColloDB::Relation::DDC - diachronic collocation db, profiling relation: ddc client
SYNOPSIS
##========================================================================
## PRELIMINARIES
##========================================================================
## Constructors etc.
$ddc = $CLASS_OR_OBJECT->new(%args);
$rel_or_undef = $CLASS_OR_OBJECT->fromDB($coldb, %opts);
##========================================================================
## Relation API: creation
$rel = $CLASS_OR_OBJECT->create($coldb, $tokdat_file, %opts);
$rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts); ##-- SKETCHY!
##========================================================================
## Relation API: profiling
$mprf = $rel->profile($coldb, %opts);
$mprf = $rel->extend($coldb, %opts);
$mpdiff = $rel->compare($coldb, %opts);
##========================================================================
## Utils: profiling
$dclient = $rel->ddcClient(%opts);
$results = $rel->ddcQuery($coldb, $query_or_str, %opts);
$fcoef = $rel->fcoef($cquery);
$qcount = $rel->countQuery($coldb, \%opts);
DESCRIPTION
DiaColloDB::Relation::DDC is a DiaColloDB::Relation subclass using the DDC::Client::Distributed module for acquiring fine-grained collocation frequency profile data from a remote DDC server. It is generally much slower than the native index types DiaColloDB::Relation::Cofreqs and DiaColloDB::Relation::Unigrams, but is much more flexible regarding selection of corpus subsets, collocation targets, and aggregation parameters.
Globals & Constants
- Variable: @ISA
DiaColloDB::Relation::DDC inherits from DiaColloDB::Relation.
Constructors etc.
- new
$ddc = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
##-- persistent options
base => $basename, ##-- configuration header basename (default=undef)
##
##-- ddc client options
ddcServer => "$server:$port", ##-- ddc server (required; default=$coldb->{ddcServer} via fromDB() method)
ddcTimeout => $timeout, ##-- ddc timeout; default=300
ddcLimit => $limit, ##-- default limit for ddc queries (default=-1)
ddcSample => $sample, ##-- default sample size for ddc queries (default=-1:all)
dmax => $maxDistance, ##-- default distance for near() queries (default=5; 1=immediate adjacency; ~ ddc CQNear.Dist+1)
cfmin => $minFreq, ##-- default minimum frequency for count() queries (default=2)
##
##-- logging options
logTrunc => $nchars, ##-- max length of query strings to log (default=256)
##
##-- low-level data
dclient => $ddcClient, ##-- a DDC::Client::Distributed object
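The documented defaults can be pictured as a plain hash that caller-supplied arguments override. The following is only a sketch in core Perl: the helper name ddc_args() and the server address 'localhost:50000' are made up for illustration, and the real constructor may assemble its defaults differently.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): overlay caller-supplied
## arguments on the defaults listed in the %args table.
sub ddc_args {
  my %args = @_;
  my %defaults = (
    base       => undef, ##-- configuration header basename
    ddcTimeout => 300,   ##-- ddc timeout
    ddcLimit   => -1,    ##-- default limit for ddc queries
    ddcSample  => -1,    ##-- default sample size (-1: all)
    dmax       => 5,     ##-- default distance for near() queries
    cfmin      => 2,     ##-- default minimum frequency for count() queries
    logTrunc   => 256,   ##-- max length of logged query strings
  );
  return (%defaults, %args); ##-- later pairs win, so %args overrides
}

## 'localhost:50000' is a placeholder address, not a real server
my %merged = ddc_args(ddcServer => 'localhost:50000', dmax => 3);
printf "%s => %s\n", $_, $merged{$_} // 'undef' for sort keys %merged;
```

ddcServer has no built-in default in this sketch because the table marks it as required (or inherited from $coldb->{ddcServer} via fromDB()).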
- fromDB
$rel_or_undef = $CLASS_OR_OBJECT->fromDB($coldb, %opts);
The default implementation clobbers the keys listed by $rel->headerKeys() with values from %$coldb and %opts.
Relation API: creation
- create
$rel = $CLASS_OR_OBJECT->create($coldb, $tokdat_file, %opts);
Nothing really interesting happens here; the default implementation just calls fromDB() and saveHeaderFile().
- union
$rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
Merges multiple DDC relations into a new object. @pairs is an array of ARRAY-refs ([$ddc,...],...) whose initial elements are the DiaColloDB::Relation::DDC objects to be merged.
%opts: clobber %$rel
The default implementation just calls create(). It should probably build a list of DDC servers to query, but that is not supported yet.
TODO: union() method without a shared DDC server should probably create some kind of temporary server-list and use the DiaColloDB::Client::list routines for querying multiple back-end DDC servers.
Relation API: profiling
- profile
$mprf = $rel->profile($coldb, %opts);
Get a relation profile for selected items as a DiaColloDB::Profile::Multi object.
%opts: as for DiaColloDB::Relation::profile(), also:
##-- sampling options
limit => $limit, ##-- maximum number of items to return from ddc; sets $qconfig{limit} (default: query "#limit[N]" or $rel->{ddcLimit})
sample => $sample, ##-- ddc sample size; sets $qconfig{qcount} Sample property (default: query "#sample[N]" or $rel->{ddcSample})
cfmin => $cfmin, ##-- minimum subcorpus frequency for returned items (default: query "#fmin[N]" or $rel->{cfmin})
dmax => $dmax, ##-- maximum distance for implicit near() queries (default: query "#dmax[N]" or $rel->{dmax})
- extend
$mprf = $rel->extend($coldb, %opts);
Get independent f2 frequencies for $opts{slice2keys} as a DiaColloDB::Profile::Multi object. May generate multiple DDC requests for large extension sets to avoid exceeding the underlying DDC server's request length limit (CHost.m_maxReceiveBytes = DDC_STATC_BUFLEN ?= 4096).
- compare
$mpdiff = $rel->compare($coldb, %opts);
Get a comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object.
%opts: as for DiaColloDB::Relation::compare(), also:
##-- sampling options
(a|b)?limit => $limit, ##-- maximum number of items to return from ddc; sets $qconfig{limit} (default: query "#limit[N]" or $rel->{ddcLimit})
(a|b)?sample => $sample, ##-- ddc sample size; sets $qconfig{qcount} Sample property (default: query "#sample[N]" or $rel->{ddcSample})
(a|b)?cfmin => $cfmin, ##-- minimum subcorpus frequency for returned items (default: query "#fmin[N]" or $rel->{cfmin})
(a|b)?dmax => $dmax, ##-- maximum distance for implicit near() queries (default: query "#dmax[N]" or $rel->{dmax})
Utils: profiling
- ddcClient
$dclient = $rel->ddcClient(%opts);
Returns the cached $rel->{dclient} if defined; otherwise creates and caches a new client. Fails if ddcServer is not defined.
%opts: clobber %{$rel->{dclient}}
- ddcQuery
$results = $rel->ddcQuery($coldb, $query_or_str, %opts);
Returns decoded JSON results for DDC client query $query_or_str, optionally logging the query and tracking errors.
%opts:
logas => $prefix, ##-- log prefix (default: 'ddcQuery()')
loglevel => $level, ##-- log level (default=$coldb->{logProfile})
limit => $limit, ##-- set result client limit (default: current client limit, or -1 for limit=>undef)
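Logged query strings are subject to the logTrunc option (default=256) described under new(). A minimal sketch of such truncation follows, assuming the cut is marked with a trailing "..."; both the helper name trunc_for_log() and the marker are illustrative choices, not taken from the module.

```perl
use strict;
use warnings;

## Hypothetical helper: keep the first $max characters of $str for logging
## and append "..." if anything was cut off.
sub trunc_for_log {
  my ($str, $max) = @_;
  $max = 256 if !defined($max); ##-- documented logTrunc default
  return $str if length($str) <= $max;
  return substr($str, 0, $max) . '...';
}

print trunc_for_log('Haus #limit[100]'), "\n";    ##-- short query: unchanged
print length(trunc_for_log('x' x 1000)), "\n";    ##-- 256 kept + 3 marker characters
```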
- fcoef
$fcoef = $rel->fcoef($cquery);
Get expected frequency coefficient for the DDC::XS::CQuery object $cquery. Used to estimate total independent marginal frequencies (f1,f2,N) for profile construction. The default implementation should provide reasonable guesses for common query types.
- countQuery
$qcount = $rel->countQuery($coldb, \%opts);
Creates a DDC::XS::CQCount object for profile() options %opts. Sets the following keys in %opts:
gbexprs => $gbexprs, ##-- groupby expressions (DDC::Any::CQCountKeyExprList)
gbrestr => $gbrestr, ##-- groupby item2 restrictions (DDC::Any::CQWith conjunction of token expressions)
gbfilters => \@gbfilters, ##-- groupby filter restrictions (ARRAY-ref of DDC::Any::CQFilter)
gbtitles => \@gbtitles, ##-- groupby column titles (ARRAY-ref of strings)
limit => $limit, ##-- hit return limit for ddc query
dslo => $dslo, ##-- minimum date-slice, from @opts{qw(date slice fill)}
dshi => $dshi, ##-- maximum date-slice, from @opts{qw(date slice fill)}
dlo => $dlo, ##-- minimum date request (ddc)
dhi => $dhi, ##-- maximum date request (ddc)
fcoef => $fcoef, ##-- frequency coefficient, parsed from "#coef[N]", auto-generated for near() queries
qtemplate => $qtemplate, ##-- query template for ddc hit link-up
qcount1 => $qcount1, ##-- count-query for f1 acquisition
fcoef1 => $fcoef1, ##-- f1 coefficient for qcount1
- collocantCountQuery
$qcount1 = $rel->collocantCountQuery($qcount, $matchId)
Maps count-queries returned by countQuery() to ${matchId}-item queries (default $matchId=1).
- itemCountNode
$nod2_or_undef = $rel->itemCountNode($nod, $matchId)
Guts for collocantCountQuery(): maps countQuery() nodes to ${matchId}-query nodes only; simplifies by removing extraneous CQBinOp, CQNear, and CQSeq nodes.
- collocateCountQueries
\@qcounts2 = $rel->collocateCountQueries($qcount, \%slice2prf, \%opts)
Gets a list of DDC::Any::CQCount object(s) for f2-acquisition given profile() options %opts, which are as for countQuery(), DiaColloDB::Relation::DDC::profile(), etc. Sets the following keys in %opts:
needCountsByToken => $bool, ##-- see needCountsByToken()
If $opts{onepass} option is set, generates a single large batch-query as for DiaColloDB <= v0.12.016, otherwise uses an overgenerating MSPA ("most specific projected attribute") query strategy which may return more than 1 query.
- needCountsByToken
$bool = $CLASS_OR_OBJECT->needCountsByToken($qcount)
Returns true iff $qcount groups by any token attributes for match-id=2.
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB(3pm), perl(1), ...