DiaColloDB::Profile::Diff - diachronic collocation db, diff profiles
  ##========================================================================
  ## PRELIMINARIES

  use DiaColloDB::Profile::Diff;

  ##========================================================================
  ## Constructors etc.

  $prf   = $CLASS_OR_OBJECT->new(%args);
  $dprf2 = $dprf->clone();

  ##========================================================================
  ## Basic Access

  ($prf1,$prf2) = $dprf->operands();
  $bool = $dprf->empty();

  ##========================================================================
  ## I/O: JSON

  $obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);

  ##========================================================================
  ## I/O: Text

  undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
  $bool = $prf->saveTextFh($fh, %opts);

  ##========================================================================
  ## I/O: HTML

  $bool = $prf->saveHtmlFile($filename_or_handle, %opts);

  ##========================================================================
  ## Compilation

  $dprf = $dprf->populate();
  $dprf = $dprf->compile($func,%opts);
  $dprf = $dprf->uncompile();

  $opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);
  $opsub  = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);
  $how    = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias);
  $key    = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);

  $diff = diffop_diff($ascore,$bscore);
  $diff = diffop_sum($ascore,$bscore);
  $diff = diffop_min($ascore,$bscore);
  $diff = diffop_max($ascore,$bscore);
  $diff = diffop_avg($ascore,$bscore);
  $diff = diffop_havg($ascore,$bscore);
  $diff = diffop_gavg($ascore,$bscore);
  $diff = diffop_lavg($ascore,$bscore);

  ##========================================================================
  ## Trimming

  \@keys = $dprf->which(%opts);
  $dprf = $dprf->trim(%opts);
  ($pa,$pb) = $CLASS_OR_OBJECT->pretrim($pa,$pb,%opts);

  ##========================================================================
  ## Stringification

  $dprf = $dprf->stringify( $obj);

  ##========================================================================
  ## Binary operations

  $dprf = $dprf->_add($dprf2,%opts);
DiaColloDB::Profile::Diff is a DiaColloDB::Profile subclass for representing low-level collocate frequency comparison data for a single date-slice, as arising from the comparison of two DiaColloDB::Profile objects.
DiaColloDB::Profile::Diff inherits from DiaColloDB::Profile.
Canonical diff-operation names keyed by alias.
  $prf = $CLASS_OR_OBJECT->new(%args);
  $prf = $CLASS_OR_OBJECT->new($prf1,$prf2,%args);
%args, object structure:
  ##-- DiaColloDB::Profile::Diff
  prf1 => $prf1,     ##-- 1st operand
  prf2 => $prf2,     ##-- 2nd operand
  diff => $diff,     ##-- low-level score-diff binary operation (default='adiff')
  ##-- DiaColloDB::Profile keys
  label => $label,   ##-- string label (used by Multi; undef for none(default))
  #N   => $N,        ##-- OVERRIDE:unused: total marginal relation frequency
  #f1  => $f1,       ##-- OVERRIDE:unused: total marginal frequency of target word(s)
  #f2  => \%f2,      ##-- OVERRIDE:unused: total marginal frequency of collocates: ($i2=>$f2, ...)
  #f12 => \%f12,     ##-- OVERRIDE:unused: collocation frequencies, %f12 = ($i2=>$f12, ...)
  ##
  eps   => $eps,     ##-- smoothing constant (default=0: no smoothing)
  score => $func,    ##-- selected scoring function ('f12', 'mi', or 'ld')
  mi    => \%mi12,   ##-- DIFFERENCE: score: mutual information * logFreq a la Wortprofil; requires compile_mi()
  ld    => \%ld12,   ##-- DIFFERENCE: score: log-dice a la Wortprofil; requires compile_ld()
  fm    => \%fm12,   ##-- DIFFERENCE: score: frequency per million; requires compile_fm()
The diff option selects the function used to compute final scores from the operand profiles. The default value is 'adiff'. Currently known values are:
  adiff # $score=$a-$b                # aliases=qw(absolute-difference abs-difference abs-diff adiff adifference a-) ; select=kbesta
  diff  # $score=$a-$b                # aliases=qw(difference diff d minus -)
  sum   # $score=$a+$b                # aliases=qw(sum add plus +)
  min   # $score=min($a,$b)           # aliases=qw(minimum min <)
  max   # $score=max($a,$b)           # aliases=qw(maximum max >)
  avg   # $score=avg($a,$b)           # aliases=qw(average avg mean)
  havg  # $score~=harmonic_avg($a,$b)  # aliases=qw(harmonic-average harmonic-mean havg hmean ha h)
  gavg  # $score~=geometric_avg($a,$b) # aliases=qw(geometric-average geometric-mean gavg gmean ga g)
  lavg  # $score~=log_avg($a,$b)       # aliases=qw(logarithmic-average logarithmic-mean log-average log-mean lavg lmean la l)
To avoid singularities resulting from sparse data, the havg and gavg operations actually compute the arithmetic average of the harmonic (resp. geometric) mean and the raw arithmetic mean; e.g.
  score_havg($a,$b) = ( ($a<0 || $b<0 ? 0 : (2*$a*$b)/($a+$b))  ##-- harmonic mean
                        + ($a+$b)/2                             ##-- arithmetic mean
                      )/2                                       ##-- average of harmonic- and arithmetic-means
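The smoothing described above can be tried out with a small stand-alone sketch. It does not call the module's own diffop_havg() or diffop_gavg(); the helper names are hypothetical, the geometric variant is written by analogy with the harmonic one, and the guard uses "<=" rather than "<" (an assumption of this sketch) so that a pair of zeroes cannot trigger a division by zero.

```perl
use strict;
use warnings;

# Arithmetic mean of two scores.
sub avg2 { my ($x, $y) = @_; return ($x + $y) / 2; }

# Smoothed harmonic average: the mean of the harmonic mean and the
# arithmetic mean. ASSUMPTION: non-positive operands yield a harmonic
# mean of 0, which also avoids dividing by zero when $x+$y == 0.
sub score_havg {
  my ($x, $y) = @_;
  my $hmean = ($x <= 0 || $y <= 0) ? 0 : (2 * $x * $y) / ($x + $y);
  return avg2($hmean, avg2($x, $y));
}

# Smoothed geometric average, assumed analogous to score_havg().
sub score_gavg {
  my ($x, $y) = @_;
  my $gmean = ($x <= 0 || $y <= 0) ? 0 : sqrt($x * $y);
  return avg2($gmean, avg2($x, $y));
}

printf "havg(0,8)=%g havg(4,4)=%g\n", score_havg(0, 8), score_havg(4, 4);
printf "gavg(0,8)=%g gavg(4,4)=%g\n", score_gavg(0, 8), score_gavg(4, 4);
```

With these definitions the uniform pair (4,4) keeps its score of 4, while the non-uniform pair (0,8) drops to 2, half of its plain arithmetic mean.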
The default diff operation is adiff, which selects those items with the greatest absolute differences among the (pre-trimmed) k-best items in its operand profiles. The sum and avg operations return equivalent rankings, but may assign undesirably high score values for non-uniform operand values (e.g. avg(0,8)=avg(4,4)=4, but only the latter configuration indicates similar collocation behavior in the operand profiles). The havg, gavg, and lavg operations attempt to address this shortcoming by penalizing non-uniform score-pairs, and tend to return similar rankings in the range [$a:$b].
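As a rough illustration of ranking by absolute difference, the following sketch uses plain score hashes in place of compiled operand profiles. The function name and the sample data are hypothetical, and only collocates scored in both operands are compared, which is a simplification rather than the module's documented behavior.

```perl
use strict;
use warnings;

# Rank collocates by the absolute value of their score difference.
# Returns a list of [$item, $signed_diff] pairs, largest |diff| first;
# ties are broken alphabetically so that the order is deterministic.
sub rank_by_abs_diff {
  my ($sa, $sb) = @_;
  my %diff;
  $diff{$_} = $sa->{$_} - $sb->{$_} for grep { exists $sb->{$_} } keys %$sa;
  my @ranked = sort { abs($diff{$b}) <=> abs($diff{$a}) || $a cmp $b } keys %diff;
  return map { [$_, $diff{$_}] } @ranked;
}

# Illustrative data only.
my %score_a = (haus => 5.0, hund => 1.0, katze => 3.0);
my %score_b = (haus => 4.5, hund => 6.0, katze => 0.5);
printf "%-6s %+.1f\n", @$_ for rank_by_abs_diff(\%score_a, \%score_b);
```

Note that the signed difference is retained in the result, so an item that is much stronger in the second operand (a large negative difference) still ranks highly.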
  $dprf2 = $dprf->clone();
  $dprf2 = $dprf->clone($keep_compiled);
clones %$dprf; if $keep_compiled is true, compiled data is cloned too.
($prf1,$prf2) = $dprf->operands();
get operand profiles.
$bool = $dprf->empty();
returns true iff both operands are empty
$obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);
guts for loadJsonString(), loadJsonFile()
See also DiaColloDB::Persistent.
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
print column title header for text output.
$bool = $prf->saveTextFh($fh, %opts);
save flat TAB-separated text, format:
Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb SCOREdiff LABEL ITEM2...
%opts:
  label  => $label,  ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
  format => $fmt,    ##-- printf score formatting (default="%.4f")
  header => $bool,   ##-- include header-row? (default=1)
  hlabel => $hlabel, ##-- prefix header item-cells with $hlabel (used by Profile::MultiDiff)
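To make the column layout concrete, here is a minimal sketch that builds one TAB-separated row in the order listed above. format_row() and the keys of its input hash are hypothetical names chosen for illustration; omitting the LABEL column when no label is given is likewise an assumption of this sketch, not a statement about saveTextFh() itself.

```perl
use strict;
use warnings;

# Build one TAB-separated row: frequencies, formatted scores, optional
# label, then the collocate item(s).
sub format_row {
  my ($row, %opts) = @_;
  my $fmt    = $opts{format} // '%.4f';   # same default as documented above
  my @freqs  = map { $row->{$_} } qw(Na Nb F1a F1b F2a F2b F12a F12b);
  my @scores = map { sprintf($fmt, $row->{$_}) } qw(SCOREa SCOREb SCOREdiff);
  my @label  = defined $opts{label} ? ($opts{label}) : ();
  return join("\t", @freqs, @scores, @label, @{ $row->{items} }) . "\n";
}

# Illustrative data only.
my %row = (
  Na => 1000, Nb => 2000, F1a => 10, F1b => 20, F2a => 5, F2b => 8,
  F12a => 2, F12b => 3, SCOREa => 1.5, SCOREb => 0.25, SCOREdiff => 1.25,
  items => ['Haus'],
);
print format_row(\%row, label => '1990');
```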
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);
Save flat HTML table data with rows of the form
SCOREa SCOREb DIFF PREFIX? ITEM2...
If the verbose option is specified and true, the saved table has rows of the form
Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb DIFF PREFIX? ITEM2...
Options %opts:
  table   => $bool,   ##-- include <table>..</table> ? (default=1)
  body    => $bool,   ##-- include <html><body>..</body></html> ? (default=1)
  header  => $bool,   ##-- include header-row? (default=1)
  verbose => $bool,   ##-- include verbose output? (default=0)
  hlabel  => $hlabel, ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
  label   => $label,  ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
  format  => $fmt,    ##-- printf score formatting (default="%.4f")
  $dprf = $dprf->populate();
  $dprf = $dprf->populate($prf1,$prf2);
populates diff-profile by subtracting $prf2 scores from $prf1.
$dprf = $dprf->compile($func,%opts);
compile for score-function $func, one of qw(f fm mi ld); default='f'.
$dprf = $dprf->uncompile();
un-compiles all scores for $dprf
  $opname = $dprf->diffop();
  $opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);
Returns canonical diff operation-name for $opNameOrAlias.
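Alias resolution of this kind amounts to a hash lookup. The sketch below builds an alias table from three of the operations listed earlier; the variable and function names are hypothetical, and returning undef for unknown aliases is an assumption rather than documented behavior of diffop().

```perl
use strict;
use warnings;

# Aliases per canonical operation (abbreviated to three operations).
my %aliases = (
  adiff => [qw(absolute-difference abs-difference abs-diff adiff adifference a-)],
  diff  => [qw(difference diff d minus -)],
  sum   => [qw(sum add plus +)],
);

# Invert to a table of canonical names keyed by alias.
my %canonical;
for my $op (keys %aliases) {
  $canonical{$_} = $op for @{ $aliases{$op} };
}

# Resolve an alias, falling back to 'adiff' (the documented default)
# when no name is given. Unknown aliases yield undef in this sketch.
sub resolve_op {
  my ($name) = @_;
  return $canonical{ $name // 'adiff' };
}

print resolve_op('minus'), "\n";
```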
  \&FUNC = $dprf->diffsub();
  \&FUNC = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);
Returns low-level binary diff operation for diff-operation $opNameOrAlias (default=$dprf->{diff}).
  $how = $dprf->diffpretrim();
  $how = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias);
Returns whether and how a diff operation $opNameOrAlias should pre-trim operand profiles. Returned value is one of:
  'restrict' # intersect defined collocates (min,avg,havg,gavg)
  'kbest'    # union of k-best collocates (diff,adiff,max)
  0          # don't pre-trim at all (everything else)
  $selector = $dprf->diffkbest();
  $selector = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);
Returns 'kbest' selector appropriate for which() or trim() methods.
$diff = diffop_diff($ascore,$bscore)
Low-level diff-operation subs.
$dprf = $dprf->trim(%opts);
trims profile and operands; %opts:
  kbest  => $kbest,  ##-- retain only $kbest items (by score value)
  kbesta => $kbesta, ##-- retain only $kbesta items (by score absolute value)
  cutoff => $cutoff, ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
  keep   => $keep,   ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
  drop   => $drop,   ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)
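The interplay of these options can be sketched on a plain score hash. trim_keys() and as_set() are hypothetical helpers, and the order of application (cutoff, then k-best selection, then keep, then drop) is an assumption of this sketch; the module may combine the options differently.

```perl
use strict;
use warnings;

# Normalize an ARRAY or HASH key specification to a hash-based set.
sub as_set {
  my ($spec) = @_;
  my %set;
  if    (ref($spec) eq 'ARRAY') { %set = map { $_ => 1 } @$spec; }
  elsif (ref($spec) eq 'HASH')  { %set = map { $_ => 1 } keys %$spec; }
  return \%set;
}

# Return the sorted list of keys that survive trimming.
sub trim_keys {
  my ($scores, %opts) = @_;
  my ($keep, $drop) = (as_set($opts{keep}), as_set($opts{drop}));

  my @keys = keys %$scores;
  @keys = grep { $scores->{$_} >= $opts{cutoff} } @keys if defined $opts{cutoff};

  # kbesta ranks by absolute score, kbest by raw score.
  my $by = defined $opts{kbesta} ? sub { abs $_[0] } : sub { $_[0] };
  my $k  = $opts{kbesta} // $opts{kbest};
  if (defined $k) {
    @keys = sort { $by->($scores->{$b}) <=> $by->($scores->{$a}) || $a cmp $b } @keys;
    splice(@keys, $k) if $k < @keys;
  }

  # Explicitly kept keys are restored, explicitly dropped keys removed.
  my %out = map { $_ => 1 } @keys, grep { exists $scores->{$_} } keys %$keep;
  delete @out{ keys %$drop };
  return sort keys %out;
}

my %scores = (w1 => 4, w2 => -6, w3 => 1, w4 => -2);
print join(' ', trim_keys(\%scores, kbest  => 2)), "\n";
print join(' ', trim_keys(\%scores, kbesta => 2)), "\n";
```

The two calls differ only in the selector: kbest keeps the two highest raw scores, whereas kbesta also admits w2, whose score is strongly negative.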
($pa,$pb) = $CLASS_OR_OBJECT->pretrim($pa,$pb,%opts);
Perform pre-trimming on aligned profile pair ($pa,$pb) in the manner indicated by $CLASS_OR_OBJECT->diffpretrim($opts{diff}).
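The two pre-trimming modes returned by diffpretrim() can be sketched with set operations on plain score hashes. All function names here are hypothetical and the hashes stand in for aligned operand profiles; this illustrates the 'restrict' (intersection) and 'kbest' (union of k-best) ideas only, not the module's implementation.

```perl
use strict;
use warnings;

# Return the $k highest-scoring keys of a score hash.
sub kbest_keys {
  my ($scores, $k) = @_;
  my @sorted = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %$scores;
  $k = @sorted if $k > @sorted;
  return @sorted[0 .. $k - 1];
}

# Copy only the entries of $scores whose keys appear in the set $keep.
sub subset {
  my ($scores, $keep) = @_;
  my %out = map { $_ => $scores->{$_} } grep { $keep->{$_} } keys %$scores;
  return \%out;
}

# 'restrict': keep only collocates defined in both operands.
sub pretrim_restrict {
  my ($pa, $pb) = @_;
  my %keep = map { $_ => 1 } grep { exists $pb->{$_} } keys %$pa;
  return (subset($pa, \%keep), subset($pb, \%keep));
}

# 'kbest': keep the union of each operand's k-best collocates.
sub pretrim_kbest {
  my ($pa, $pb, $k) = @_;
  my %keep = map { $_ => 1 } kbest_keys($pa, $k), kbest_keys($pb, $k);
  return (subset($pa, \%keep), subset($pb, \%keep));
}
```

A collocate that is among the k-best of one operand only is retained in whichever operands define it, so the later diff can still report it.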
  $dprf = $dprf->stringify( $obj);
  $dprf = $dprf->stringify(\@key2str);
  $dprf = $dprf->stringify(\&key2str);
  $dprf = $dprf->stringify(\%key2str);
stringifies profile and operands (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
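The four accepted argument types can be told apart with ref(). The sketch below shows one way to dispatch on them; key2str() and stringify_scores() are hypothetical helpers, and unlike the destructive method described above, this version returns a new hash.

```perl
use strict;
use warnings;

# Map an integer key to a string using any of the accepted mapping types.
# The object case assumes only that $map provides an i2s() method.
sub key2str {
  my ($map, $key) = @_;
  my $type = ref $map;
  return $map->[$key]  if $type eq 'ARRAY';
  return $map->{$key}  if $type eq 'HASH';
  return $map->($key)  if $type eq 'CODE';
  return $map->i2s($key);
}

# Re-key a score hash by string.
sub stringify_scores {
  my ($scores, $map) = @_;
  my %out;
  $out{ key2str($map, $_) } = $scores->{$_} for keys %$scores;
  return \%out;
}

my $by_name = stringify_scores({ 0 => 1.5, 1 => -2 }, ['Haus', 'Hund']);
printf "%s\t%s\n", $_, $by_name->{$_} for sort keys %$by_name;
```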
$dprf = $dprf->_add($dprf2,%opts);
adds $dprf2 operand frequency data to $dprf operands (destructive); implicitly un-compiles $dprf. %opts:
  N  => $bool, ##-- whether to add N values (default:true)
  f1 => $bool, ##-- whether to add f1 values (default:true)
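What such an addition might do for a single operand can be sketched with plain hashes. The field names N, f1, f2, f12, mi, ld and fm follow the object structure listed earlier, but add_freqs() itself is a hypothetical helper and the details (in particular which fields are discarded when un-compiling) are assumptions of this sketch.

```perl
use strict;
use warnings;

# Add frequency data from $src into $dst (destructive on $dst).
sub add_freqs {
  my ($dst, $src, %opts) = @_;
  $dst->{N}  += $src->{N}  if $opts{N}  // 1;   # default: true
  $dst->{f1} += $src->{f1} if $opts{f1} // 1;   # default: true
  for my $field (qw(f2 f12)) {
    my $from = $src->{$field} // {};
    $dst->{$field}{$_} += $from->{$_} for keys %$from;
  }
  # Compiled scores no longer match the summed frequencies: discard them.
  delete @$dst{qw(mi ld fm)};
  return $dst;
}

# Illustrative data only.
my $pa = { N => 10, f1 => 2, f2 => { x => 1 }, f12 => { x => 1 }, mi => { x => 0.5 } };
my $pb = { N => 5,  f1 => 3, f2 => { x => 2, y => 4 }, f12 => { y => 1 } };
add_freqs($pa, $pb);
printf "N=%d f1=%d f2{x}=%d f2{y}=%d\n", @$pa{qw(N f1)}, @{ $pa->{f2} }{qw(x y)};
```

In a diff profile the same addition would be applied pairwise, first operand to first operand and second to second.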
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015-2016 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
DiaColloDB::Profile::MultiDiff(3pm), DiaColloDB::Profile(3pm), DiaColloDB(3pm), perl(1), ...