NAME

DiaColloDB::Profile::Diff - diachronic collocation db, diff profiles

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Profile::Diff;
 
 ##========================================================================
 ## Constructors etc.
 
 $prf   = $CLASS_OR_OBJECT->new(%args);
 $dprf2 = $dprf->clone();
 
 ##========================================================================
 ## Basic Access
 
 ($prf1,$prf2) = $dprf->operands();
 $bool = $dprf->empty();
 
 ##========================================================================
 ## I/O: JSON
 
 $obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);
 
 ##========================================================================
 ## I/O: Text
 
 undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
 $bool = $prf->saveTextFh($fh, %opts);
 
 ##========================================================================
 ## I/O: HTML
 
 $bool = $prf->saveHtmlFile($filename_or_handle, %opts);
 
 ##========================================================================
 ## Compilation
 
 $dprf = $dprf->populate();
 $dprf = $dprf->compile($func,%opts);
 $dprf = $dprf->uncompile();
 
 $opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);
 $opsub  = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);
 $how    = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias);
 $key    = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);
 
 $diff   = diffop_diff($ascore,$bscore);
 $diff   = diffop_sum($ascore,$bscore);
 $diff   = diffop_min($ascore,$bscore);
 $diff   = diffop_max($ascore,$bscore);
 $diff   = diffop_avg($ascore,$bscore);
 $diff   = diffop_havg($ascore,$bscore);
 $diff   = diffop_gavg($ascore,$bscore);
 $diff   = diffop_lavg($ascore,$bscore);
 
 ##========================================================================
 ## Trimming
 
 \@keys    = $dprf->which(%opts);
 $dprf     = $dprf->trim(%opts);
 ($pa,$pb) = $CLASS_OR_OBJECT->pretrim($pa,$pb,%opts);
 
 ##========================================================================
 ## Stringification
 
 $dprf = $dprf->stringify( $obj);
 
 ##========================================================================
 ## Binary operations
 
 $dprf = $dprf->_add($dprf2,%opts);
 

DESCRIPTION

DiaColloDB::Profile::Diff is a DiaColloDB::Profile subclass for representing low-level collocate frequency comparison data for a single date-slice, as arising from the comparison of two DiaColloDB::Profile objects.

Globals & Constants

@ISA

DiaColloDB::Profile::Diff inherits from DiaColloDB::Profile.

%DIFFOPS

Canonical diff-operation names keyed by alias.

Constructors etc.

new
 $prf = $CLASS_OR_OBJECT->new(%args);
 $prf = $CLASS_OR_OBJECT->new($prf1,$prf2,%args)

%args, object structure:

 ##-- DiaColloDB::Profile::Diff
 prf1 => $prf1,     ##-- 1st operand
 prf2 => $prf2,     ##-- 2nd operand
 diff => $diff,     ##-- low-level score-diff binary operation (default='adiff')
 ##-- DiaColloDB::Profile keys
 label => $label,   ##-- string label (used by Multi; undef for none(default))
 #N   => $N,         ##-- OVERRIDE:unused: total marginal relation frequency
 #f1  => $f1,        ##-- OVERRIDE:unused: total marginal frequency of target word(s)
 #f2  => \%f2,       ##-- OVERRIDE:unused: total marginal frequency of collocates: ($i2=>$f2, ...)
 #f12 => \%f12,      ##-- OVERRIDE:unused: collocation frequencies, %f12 = ($i2=>$f12, ...)
 ##
 eps => $eps,       ##-- smoothing constant (default=0: no smoothing)
 score => $func,    ##-- selected scoring function ('f12', 'mi', or 'ld')
 mi => \%mi12,      ##-- DIFFERENCE: score: mutual information * logFreq a la Wortprofil; requires compile_mi()
 ld => \%ld12,      ##-- DIFFERENCE: score: log-dice a la Wortprofil; requires compile_ld()
 fm => \%fm12,      ##-- DIFFERENCE: score: frequency per million; requires compile_fm()

The diff option selects the function used to compute final scores from the operand profiles. The default value is 'adiff'. Currently known values are:

 adiff     # $score=$a-$b      # aliases=qw(absolute-difference abs-difference abs-diff adiff adifference a-) ; select=kbesta
 diff      # $score=$a-$b      # aliases=qw(difference diff d minus -)
 sum       # $score=$a+$b      # aliases=qw(sum add plus +)
 min       # $score=min($a,$b) # aliases=qw(minimum min <)
 max       # $score=max($a,$b) # aliases=qw(maximum max >)
 avg       # $score=avg($a,$b) # aliases=qw(average avg mean)
 havg      # $score~=harmonic_avg($a,$b)  # aliases=qw(harmonic-average harmonic-mean havg hmean ha h)
 gavg      # $score~=geometric_avg($a,$b) # aliases=qw(geometric-average geometric-mean gavg gmean ga g)
 lavg      # $score~=log_avg($a,$b)       # aliases=qw(logarithmic-average logarithmic-mean log-average log-mean lavg lmean la l)

To avoid singularities resulting from sparse data, the havg and gavg operations actually compute the arithmetic average of the harmonic (resp. geometric) mean and the raw arithmetic mean; e.g.

 score_havg($a,$b) = ( ($a<0 || $b<0 ? 0 : (2*$a*$b)/($a+$b)) ##-- harmonic mean
                       + ($a+$b)/2                            ##-- arithmetic mean
                     )/2                                     ##-- average of harmonic- and arithmetic-means

The default diff operation is adiff, which selects those items with the greatest absolute differences among the (pre-trimmed) k-best items in its operand profiles. The sum and avg operations return equivalent rankings, but may assign undesirably high score values for non-uniform operand values (e.g. avg(0,8)=avg(4,4)=4, but only the latter configuration indicates similar collocation behavior in the operand profiles). The havg, gavg, and lavg operations attempt to address this shortcoming by penalizing non-uniform score-pairs, and tend to return similar rankings in the range [$a:$b].
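
The smoothed harmonic average described above can be sketched in plain Perl as follows. This is an illustration only, not the module's own diffop_havg(): the sub name havg_sketch is hypothetical, and the guard against a zero sum is an added assumption that the formula above does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's own diffop_havg): averages the
# harmonic mean with the arithmetic mean, following the formula above.
sub havg_sketch {
  my ($x, $y) = @_;
  # Assumption: the harmonic mean counts as 0 for negative scores and for
  # a zero sum; the latter guard also avoids division by zero.
  my $hmean = ($x < 0 || $y < 0 || $x + $y == 0) ? 0 : (2 * $x * $y) / ($x + $y);
  my $amean = ($x + $y) / 2;
  return ($hmean + $amean) / 2;
}

# Uniform and non-uniform pairs with the same arithmetic mean (4):
printf "%g %g\n", havg_sketch(4, 4), havg_sketch(0, 8);   # prints "4 2"
```

Under these assumptions the uniform pair (4,4) keeps its score of 4, while the non-uniform pair (0,8) is penalized down to 2.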

clone
 $dprf2 = $dprf->clone();
 $dprf2 = $dprf->clone($keep_compiled);

clones %$dprf; if $keep_compiled is true, compiled data is cloned too.

Basic Access

operands
 ($prf1,$prf2) = $dprf->operands();

get operand profiles.

empty
 $bool = $dprf->empty();

returns true iff both operands are empty

I/O: JSON

loadJsonData
 $obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);

guts for loadJsonString(), loadJsonFile()

I/O: Text

See also DiaColloDB::Persistent.

saveTextHeader
 undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);

print column title header for text output.

saveTextFh
 $bool = $prf->saveTextFh($fh, %opts);

save flat TAB-separated text, format:

 Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb SCOREdiff LABEL ITEM2...

%opts:

 label => $label,   ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
 format => $fmt,    ##-- printf score formatting (default="%.4f")
 header => $bool,   ##-- include header-row? (default=1)
 hlabel => $hlabel, ##-- prefix header item-cells with $hlabel (used by Profile::MultiDiff)
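
As a rough illustration of the row layout and the format option, the following self-contained sketch builds one TAB-separated row in the column order given above. The helper name text_row_sketch and all sample values are invented; the module's actual output code may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: formats one row in the column order listed above.
# Frequencies are emitted as-is; the three score columns use $fmt.
sub text_row_sketch {
  my ($fmt, $freqs, $scores, $label, @items) = @_;
  return join("\t",
              @$freqs,
              (map { sprintf($fmt, $_) } @$scores),
              $label,
              @items) . "\n";
}

print text_row_sketch("%.4f",
                      [1000, 2000, 50, 60, 7, 9, 3, 4],  # Na Nb F1a F1b F2a F2b F12a F12b
                      [1.5, 0.25, 1.25],                 # SCOREa SCOREb SCOREdiff
                      "2000",                            # LABEL (invented)
                      "Haus", "NN");                     # ITEM2... (invented)
```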

I/O: HTML

saveHtmlFile
 $bool = $prf->saveHtmlFile($filename_or_handle, %opts);

Save flat HTML table data with rows of the form

 SCOREa SCOREb DIFF PREFIX? ITEM2...

If verbose option is specified and true, saved table has the form

 Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb DIFF PREFIX? ITEM2...

Options %opts:

 table   => $bool,    ##-- include <table>..</table> ? (default=1)
 body    => $bool,    ##-- include <html><body>..</body></html> ? (default=1)
 header  => $bool,    ##-- include header-row? (default=1)
 verbose => $bool,    ##-- include verbose output? (default=0)
 hlabel  => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
 label   => $label,   ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
 format  => $fmt,     ##-- printf score formatting (default="%.4f")

Compilation

populate
 $dprf = $dprf->populate();
 $dprf = $dprf->populate($prf1,$prf2);

populates diff-profile by subtracting $prf2 scores from $prf1.

compile
 $dprf = $dprf->compile($func,%opts);

compile for score-function $func, one of qw(f fm mi ld); default='f'.

uncompile
 $dprf = $dprf->uncompile();

un-compiles all scores for $dprf

diffop
 $opname = $dprf->diffop();
 $opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);

Returns canonical diff operation-name for $opNameOrAlias.
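
Alias resolution of this kind amounts to a hash lookup. The sketch below uses a small illustrative table built from some of the aliases listed for the diff option; the names %alias2op_sketch and diffop_sketch are hypothetical, and the module's real %DIFFOPS may be organized differently or handle unknown aliases in another way.

```perl
use strict;
use warnings;

# Illustrative alias table covering only a few of the documented aliases.
my %alias2op_sketch = (
  (map { ($_ => 'adiff') } qw(absolute-difference abs-diff adiff a-)),
  (map { ($_ => 'diff')  } qw(difference diff d minus -)),
  (map { ($_ => 'havg')  } qw(harmonic-average harmonic-mean havg hmean ha h)),
);

# Hypothetical lookup: returns the canonical name, or undef if unknown.
sub diffop_sketch {
  my ($alias) = @_;
  return $alias2op_sketch{$alias};
}

print diffop_sketch('hmean'), "\n";   # prints "havg"
print diffop_sketch('minus'), "\n";   # prints "diff"
```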

diffsub
 \&FUNC = $dprf->diffsub();
 \&FUNC = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);

Returns low-level binary diff operation for diff-operation $opNameOrAlias (default=$dprf->{diff}).

diffpretrim
 $how = $dprf->diffpretrim()
 $how = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias)

Returns whether and how a diff operation $opNameOrAlias should pre-trim operand profiles. Returned value is one of:

 'restrict' # intersect defined collocates (min,avg,havg,gavg)
 'kbest'    # union of k-best collocates (diff,adiff,max)
 0          # don't pre-trim at all (everything else)

diffkbest
 $selector = $dprf->diffkbest();
 $selector = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);

Returns 'kbest' selector appropriate for which() or trim() methods.
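
The difference between the kbest and kbesta selectors (see the trim() options) can be illustrated on a plain score hash. This is a sketch under stated assumptions: kbest_sketch is a hypothetical helper, the scores are invented, and ties are broken alphabetically here purely for reproducibility.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the $k keys with the greatest scores,
# ranked by raw value or, if $by_abs is true, by absolute value.
sub kbest_sketch {
  my ($scores, $k, $by_abs) = @_;
  my $rank = $by_abs ? sub { abs($scores->{$_[0]}) }
                     : sub { $scores->{$_[0]} };
  my @sorted = sort { $rank->($b) <=> $rank->($a) || $a cmp $b } keys %$scores;
  $k = @sorted if $k > @sorted;
  return [ @sorted[0 .. $k - 1] ];
}

my %diff = (Haus => 2.5, Hof => -4.0, Garten => 0.5);   # invented difference scores
print "kbest:  ", join(" ", @{ kbest_sketch(\%diff, 2, 0) }), "\n";   # Haus Garten
print "kbesta: ", join(" ", @{ kbest_sketch(\%diff, 2, 1) }), "\n";   # Hof Haus
```

Ranking by absolute value keeps strongly negative differences such as Hof, which a ranking by raw value would discard first.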

diffop_diff
diffop_sum
diffop_min
diffop_max
diffop_avg
diffop_havg
diffop_gavg
diffop_lavg
  $diff = diffop_diff($ascore,$bscore)

Low-level diff-operation subs.

Trimming

trim
 $dprf = $dprf->trim(%opts);

trims profile and operands; %opts:

 kbest => $kbest,    ##-- retain only $kbest items (by score value)
 kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
 cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
 keep => $keep,      ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
 drop => $drop,      ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)

pretrim
 ($pa,$pb) = $CLASS_OR_OBJECT->pretrim($pa,$pb,%opts);

Perform pre-trimming on aligned profile pair ($pa,$pb) in the manner indicated by $CLASS_OR_OBJECT->diffpretrim($opts{diff}).
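
The two pre-trimming modes named under diffpretrim ('restrict' and 'kbest') can be sketched on plain score hashes of the form ($item => $score). Both helper names below are hypothetical and the data is invented; the actual pretrim() works on profile objects and may differ in detail.

```perl
use strict;
use warnings;

# 'restrict' sketch: keep only the items defined in both profiles.
sub restrict_sketch {
  my ($pa, $pb) = @_;
  my @keep = grep { exists $pb->{$_} } keys %$pa;
  return (+{ map { ($_ => $pa->{$_}) } @keep },
          +{ map { ($_ => $pb->{$_}) } @keep });
}

# 'kbest' sketch: keep the union of each profile's $k best-scoring items.
sub kbest_union_sketch {
  my ($pa, $pb, $k) = @_;
  my %keep;
  for my $p ($pa, $pb) {
    my @best = sort { $p->{$b} <=> $p->{$a} || $a cmp $b } keys %$p;
    splice(@best, $k) if @best > $k;      # truncate to the $k best
    $keep{$_} = 1 for @best;
  }
  my @out;
  for my $p ($pa, $pb) {
    push @out, +{ map { ($_ => $p->{$_}) } grep { exists $p->{$_} } keys %keep };
  }
  return @out;
}

my %pa = (a => 3, b => 2, c => 1);     # invented scores
my %pb = (b => 5, d => 4, c => 0.5);
my ($ra, $rb) = restrict_sketch(\%pa, \%pb);
print join(",", sort keys %$ra), "\n";               # b,c
my ($ka, $kb) = kbest_union_sketch(\%pa, \%pb, 1);
print join(",", sort keys %$ka), " | ", join(",", sort keys %$kb), "\n";   # a,b | b
```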

Stringification

stringify
 $dprf = $dprf->stringify( $obj);
 $dprf = $dprf->stringify(\@key2str)
 $dprf = $dprf->stringify(\&key2str)
 $dprf = $dprf->stringify(\%key2str)

stringifies profile and operands (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
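
The accepted mapper types suggest a dispatch on the reference type of the argument. The following is a minimal sketch of such a dispatch, not the module's code: key2str_sketch is a hypothetical helper, and treating any other argument as an object with an i2s() method is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: maps a numeric key to a string using whichever kind
# of mapper was supplied (ARRAY ref, CODE ref, HASH ref, or object).
sub key2str_sketch {
  my ($mapper, $i2) = @_;
  my $type = ref($mapper);
  return $mapper->[$i2]  if $type eq 'ARRAY';
  return $mapper->($i2)  if $type eq 'CODE';
  return $mapper->{$i2}  if $type eq 'HASH';
  return $mapper->i2s($i2);   # assumption: any other object provides i2s()
}

print key2str_sketch(['zero', 'one'], 1), "\n";          # one
print key2str_sketch(sub { "w$_[0]" }, 7), "\n";         # w7
print key2str_sketch({ 3 => 'three' }, 3), "\n";         # three
```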

Binary operations

_add
 $dprf = $dprf->_add($dprf2,%opts);

adds $dprf2 operand frequency data to $dprf operands (destructive); implicitly un-compiles $dprf. %opts:

 N  => $bool, ##-- whether to add N values (default:true)
 f1 => $bool, ##-- whether to add f1 values (default:true)
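
For a single pair of operands, destructive frequency addition with the N and f1 options might look like the sketch below. It operates on plain hashes using the field names N, f1, f2 and f12 from the DiaColloDB::Profile keys listed under new; the helper add_freqs_sketch is hypothetical, and the real _add() applies to both operands of the diff and also un-compiles scores.

```perl
use strict;
use warnings;

# Hypothetical helper: adds the frequency data of $pb into $pa (destructive).
sub add_freqs_sketch {
  my ($pa, $pb, %opts) = @_;
  $pa->{N}  += $pb->{N}  if $opts{N}  // 1;   # default: add N
  $pa->{f1} += $pb->{f1} if $opts{f1} // 1;   # default: add f1
  for my $field (qw(f2 f12)) {
    $pa->{$field}{$_} += $pb->{$field}{$_} for keys %{ $pb->{$field} };
  }
  return $pa;
}

# Invented sample data.
my $pa = { N => 100, f1 => 10, f2 => { x => 4 },         f12 => { x => 2 } };
my $pb = { N => 200, f1 => 20, f2 => { x => 1, y => 6 }, f12 => { y => 3 } };
add_freqs_sketch($pa, $pb, f1 => 0);
print "$pa->{N} $pa->{f1} $pa->{f2}{x} $pa->{f2}{y} $pa->{f12}{y}\n";   # 300 10 5 6 3
```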

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2016 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Profile::MultiDiff(3pm), DiaColloDB::Profile(3pm), DiaColloDB(3pm), perl(1), ...