NAME

DiaColloDB::Profile - diachronic collocation db, (co-)frequency profile

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Profile;
 
 ##========================================================================
 ## Constructors etc.
 
 $prf = CLASS_OR_OBJECT->new(%args);
 $prf2 = $prf->clone();
 $prf2 = $prf->shadow();
 
 ##========================================================================
 ## Basic Access
 
 $label = $prf->label();
 \@titles_or_undef = $prf->titles();
 @keys = $prf->scoreKeys();
 $bool = $prf->empty();
 
 ##========================================================================
 ## I/O: JSON
 
 *TO_JSON = \&TO_JSON__table;
 
 ##========================================================================
 ## I/O: Text
 
 undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
 $bool = $prf->saveTextFh($fh, %opts);
 
 ##========================================================================
 ## I/O: HTML
 
 $bool = $prf->saveHtmlFile($filename_or_handle, %opts);
 
 ##========================================================================
 ## Compilation
 
 $prf = $prf->compile($func,%opts);
 $prf = $prf->uncompile();
 $prf = $prf->compile_f();
 $prf = $prf->compile_fm();
 $prf = $prf->compile_mi(%opts);
 $prf = $prf->compile_ld(%opts);
 
 ##========================================================================
 ## Trimming
 
 \@keys = $prf->which(%opts);
 $prf   = $prf->trim(%opts);
 
 ##========================================================================
 ## Stringification
 
 $i2s = $prf->stringify_map( $obj);
 $prf = $prf->stringify( $obj);
 
 ##========================================================================
 ## Algebraic operations
 
 $prf = $prf->_add($prf2,%opts);
 $prf3 = $prf1->add($prf2,%opts);
 $psum = $CLASS_OR_OBJECT->_sum(\@profiles,%opts);
 $psum = $CLASS_OR_OBJECT->sum(\@profiles,%opts);
 $diff = $prf1->diff($prf2,%opts);

DESCRIPTION

DiaColloDB::Profile is a class for representing low-level collocate frequency profile data for a single date-slice, as retrieved e.g. from a native index or DDC back-end. It includes methods for compiling profile scores via several score functions (e.g. frequency, frequency per million, pointwise MI * log-frequency, log Dice), k-best trimming, stringification, basic algebraic manipulation, and serialization (text, HTML, or JSON).

Globals & Constants

Variable: @ISA

DiaColloDB::Profile inherits from DiaColloDB::Persistent.

Constructors etc.

new
 $prf = CLASS_OR_OBJECT->new(%args);

%args, object structure:

 label => $label,    ##-- string label (used by Multi; undef for none (default))
 N   => $N,          ##-- total marginal relation frequency
 f1  => $f1,         ##-- total marginal frequency of target word(s)
 f2  => \%f2,        ##-- total marginal frequency of collocates: ($i2=>$f2, ...)
 f12 => \%f12,       ##-- collocation frequencies, %f12 = ($i2=>$f12, ...)
 titles => \@titles, ##-- item group titles (default:undef: unknown)
 ##
 eps => $eps,        ##-- smoothing constant (default=0.5)
 score => $func,     ##-- selected scoring function ('f12', 'fm', 'mi', or 'ld')
 mi => \%mi12,       ##-- score: mutual information * logFreq a la Wortprofil; requires compile_mi()
 ld => \%ld12,       ##-- score: log-dice a la Wortprofil; requires compile_ld()
 fm => \%fm12,       ##-- frequency per million score; requires compile_fm()

clone
 $prf2 = $prf->clone();
 $prf2 = $prf->clone($keep_compiled)

clones the profile $prf. if $keep_compiled is true, compiled data is cloned too.

shadow
 $prf2 = $prf->shadow();
 $prf2 = $prf->shadow($keep_compiled)

shadows %$prf. if $keep_compiled is true, compiled data is shadowed too (all zeroes).

Basic Access

label
 $label = $prf->label();

get profile label

titles
 \@titles_or_undef = $prf->titles();

get item titles

scoreKeys
 @keys = $prf->scoreKeys();

returns known score function keys

empty
 $bool = $prf->empty();

returns true iff profile is empty

I/O: JSON

TO_JSON__table
 $thingy = $obj->TO_JSON__table()

test alternative JSON format (small but slow).

TO_JSON__flat
 $thingy = $obj->TO_JSON__flat()

test alternative JSON format (small but slow).

I/O: Text

See also DiaColloDB::Persistent.

saveTextHeader
 undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);

prints column titles for text output.

saveTextFh
 $bool = $prf->saveTextFh($fh, %opts);

save flat TAB-separated text, format:

 N F1 F2 F12 SCORE LABEL ITEM2...

%opts:

 label => $label,   ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
 format => $fmt,    ##-- printf format for scores (default="%f")
 header => $bool,   ##-- include header-row? (default=1)
 hlabel => $hlabel, ##-- prefix header item-cells with $hlabel (used by Profile::Multi)
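
The row layout above can be illustrated with a small stand-alone sketch. It works on a plain hash with the documented keys rather than on a real profile object. The helper name save_text_rows(), the row ordering (descending score), and the omission of the header row are assumptions made for illustration, not the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: write one TAB-separated row per collocate
# in the order N F1 F2 F12 SCORE LABEL ITEM2.
# ASSUMPTIONS: rows are sorted by descending score; no header row is written.
sub save_text_rows {
    my ($fh, $prf, %opts) = @_;
    my $fmt   = $opts{format} // '%f';
    my $label = $opts{label}  // $prf->{label} // '';
    my $score = $prf->{ $prf->{score} // 'f12' };
    my @items = sort { $score->{$b} <=> $score->{$a} || $a cmp $b } keys %$score;
    for my $item (@items) {
        print {$fh} join("\t",
            $prf->{N}, $prf->{f1}, $prf->{f2}{$item}, $prf->{f12}{$item},
            sprintf($fmt, $score->{$item}), $label, $item), "\n";
    }
    return 1;
}

# Toy data (invented for this example).
my $toy = {
    label => '2000', N => 100, f1 => 10,
    f2  => { foo => 5, bar => 8 },
    f12 => { foo => 3, bar => 1 },
};
save_text_rows(\*STDOUT, $toy, format => '%.2f');
```

With no compiled score selected, the sketch falls back to the raw co-frequencies in f12 as the SCORE column.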

I/O: HTML

saveHtmlFile
 $bool = $prf->saveHtmlFile($filename_or_handle, %opts);

Save flat HTML table data with rows of the form

 N F1 F2 F12 SCORE PREFIX? ITEM2...

%opts:

 table  => $bool,     ##-- include <table>..</table> ? (default=1)
 body   => $bool,     ##-- include <html><body>..</body></html> ? (default=1)
 header => $bool,     ##-- include header-row? (default=1)
 hlabel => $hlabel,   ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
 label  => $label,    ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
 format => $fmt,      ##-- printf score formatting (default="%.4f")

Compilation

compile
 $prf = $prf->compile($func,%opts);

compile for score-function $func, one of qw(f fm mi ld); default='f'

uncompile
 $prf = $prf->uncompile();

un-compiles all scores for $prf

compile_f
 $prf = $prf->compile_f();

just sets $prf->{score} = 'f12'

compile_fm
 $prf = $prf->compile_fm();

computes frequency-per-million in $prf->{fm}; sets $prf->{score}='fm'.
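
As a rough illustration of a frequency-per-million score, the sketch below scales each co-frequency by 1e6/N. It operates on plain hashes. Both the helper name per_million() and the choice of N as the denominator are assumptions for illustration and are not taken from the module source.

```perl
use strict;
use warnings;

# Hypothetical helper: scale raw co-frequencies to frequency per million.
# ASSUMPTION: the denominator is the total relation frequency N.
sub per_million {
    my ($f12, $N) = @_;
    my %fm;
    for my $item (keys %$f12) {
        $fm{$item} = $N ? 1e6 * $f12->{$item} / $N : 0;
    }
    return \%fm;
}

# Toy data (invented for this example).
my $fm = per_million({ haus => 4, baum => 1 }, 2_000_000);
printf "%s\t%.2f\n", $_, $fm->{$_} for sort keys %$fm;
```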

compile_mi
 $prf = $prf->compile_mi(%opts);

computes MI*logF-profile in $prf->{mi} a la Rychly (2008); sets $prf->{score}='mi'. %opts:

 eps => $eps  #-- clobber $prf->{eps}
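
For orientation, the MI * log-frequency score listed in Rychly (2008) multiplies the pointwise mutual information log2(f12*N / (f1*f2)) by the logarithm of the co-frequency. The sketch below is one possible reading of that formula. The helper name mi_logf(), the way eps is added to every count, and the use of the natural logarithm for the frequency factor are assumptions; the module's exact smoothing may differ.

```perl
use strict;
use warnings;

sub log2 { return log($_[0]) / log(2) }

# Hypothetical helper: MI * log-frequency for a single collocate pair.
# ASSUMPTIONS: eps is added to every count, and the frequency factor
# uses the natural logarithm. Undefined cases are mapped to 0.
sub mi_logf {
    my ($N, $f1, $f2, $f12, $eps) = @_;
    $eps //= 0.5;
    my ($sN, $s1, $s2, $s12) = map { $_ + $eps } ($N, $f1, $f2, $f12);
    return 0 unless $sN > 0 && $s1 > 0 && $s2 > 0 && $s12 > 0;
    return log2($s12 * $sN / ($s1 * $s2)) * log($s12);
}

# Toy counts (invented for this example).
printf "%.4f\n", mi_logf(1000, 10, 20, 5);
```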

compile_ld
 $prf = $prf->compile_ld(%opts);

computes log-dice profile in $prf->{ld} a la Rychly (2008); sets $prf->{score}='ld'. %opts:

 eps => $eps  #-- clobber $prf->{eps}
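
The log-Dice score of Rychly (2008) is 14 + log2(2*f12 / (f1+f2)), so a perfectly associated pair scores 14 and each halving of the Dice coefficient costs one point. The following is a minimal sketch of that formula. The helper name log_dice() and the placement of the smoothing constant eps are assumptions, not the module's implementation.

```perl
use strict;
use warnings;

sub log2 { return log($_[0]) / log(2) }

# Hypothetical helper: log-Dice for a single collocate pair.
# ASSUMPTION: eps is added to each of f1, f2 and f12 before scoring.
# Undefined cases are mapped to 0.
sub log_dice {
    my ($f1, $f2, $f12, $eps) = @_;
    $eps //= 0.5;
    my $num = 2 * ($f12 + $eps);
    my $den = ($f1 + $eps) + ($f2 + $eps);
    return 0 unless $num > 0 && $den > 0;
    return 14 + log2($num / $den);
}

# Toy counts (invented for this example).
printf "%.4f\n", log_dice(50, 200, 25);
```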

Trimming

which
 \@keys = $prf->which(%opts);

returns 'good' keys for trimming options %opts:

 cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
 kbest  => $kbest,   ##-- retain only $kbest items
 kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
 return => $which,   ##-- either 'good' (default) or 'bad'
 as     => $as,      ##-- 'hash' or 'array'; default='array'
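
The selection options can be pictured with a stand-alone sketch over a plain score hash. The helper name which_keys(), the order in which cutoff and k-best are combined (cutoff first), and the tie-breaking by key are assumptions for illustration; only the 'good' keys are returned, as an array.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the 'good' keys of a score hash.
# ASSUMPTIONS: cutoff is applied before k-best selection, kbesta takes
# precedence over kbest, and ties are broken alphabetically by key.
sub which_keys {
    my ($score, %opts) = @_;
    my @keys = keys %$score;
    @keys = grep { $score->{$_} >= $opts{cutoff} } @keys
        if defined $opts{cutoff};
    my $use_abs = defined $opts{kbesta};
    my $k       = $use_abs ? $opts{kbesta} : $opts{kbest};
    my $val     = $use_abs
        ? sub { abs($score->{ $_[0] }) }
        : sub { $score->{ $_[0] } };
    @keys = sort { $val->($b) <=> $val->($a) || $a cmp $b } @keys;
    splice(@keys, $k) if defined $k && $k >= 0 && $k < @keys;
    return \@keys;
}

# Toy scores (invented for this example).
my %score = (a => 3, b => -5, c => 1, d => 2);
print join(' ', @{ which_keys(\%score, kbest  => 2) }), "\n";
print join(' ', @{ which_keys(\%score, kbesta => 2) }), "\n";
```

Selecting by absolute value is mainly useful when scores can be negative, since a strongly negative item then still counts as salient.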

trim
 $prf = $prf->trim(%opts);

trim profile to contain only 'good' keys.

%opts:

 kbest => $kbest,    ##-- retain only $kbest items (by score value)
 kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
 cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
 keep => $keep,      ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
 drop => $drop,      ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)

NOTE: this COULD be factored out into something like $prf->trim($prf->which(%opts)), but it's about 15% faster inline.

Stringification

stringify_map
 $i2s = $prf->stringify_map( $obj);
 $i2s = $prf->stringify_map(\@key2str);
 $i2s = $prf->stringify_map(\&key2str);
 $i2s = $prf->stringify_map(\%key2str);

guts for stringify: get a map for stringification

stringify
 $prf = $prf->stringify( $obj);
 $prf = $prf->stringify(\@key2str)
 $prf = $prf->stringify(\&key2str)
 $prf = $prf->stringify(\%key2str)

stringifies profile (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
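
The three reference types and the object case can be handled by a simple dispatch on the kind of mapping. The sketch below shows the idea on plain hashes; the helper name key_map() is hypothetical, and the object branch only assumes the i2s() method named above.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: build a key-to-string map from an object with an
# i2s() method, an ARRAY ref, a CODE ref, or a HASH ref.
sub key_map {
    my ($keys, $map) = @_;
    my %i2s;
    for my $key (@$keys) {
        $i2s{$key}
            = blessed($map)        ? $map->i2s($key)
            : ref($map) eq 'ARRAY' ? $map->[$key]
            : ref($map) eq 'CODE'  ? $map->($key)
            :                        $map->{$key};
    }
    return \%i2s;
}

# Toy data (invented for this example): integer keys mapped to strings.
my %f12 = (0 => 7, 1 => 3);
my $i2s = key_map([keys %f12], ['Haus', 'Baum']);

# Re-key the frequency hash by string.
my %by_string;
$by_string{ $i2s->{$_} } = $f12{$_} for keys %f12;
print "$_\t$by_string{$_}\n" for sort keys %by_string;
```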

Algebraic operations

_add
 $prf = $prf->_add($prf2,%opts);

adds $prf2 frequency data to $prf (destructive); implicitly un-compiles $prf.

%opts:

 N  => $bool, ##-- whether to add N values (default:true)
 f1 => $bool, ##-- whether to add f1 values (default:true)
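
A destructive addition of this kind can be sketched on plain hashes with the documented keys. The helper name add_into() and the way compiled scores are discarded (deleting the score, mi, ld and fm entries) are assumptions for illustration, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: add the frequency data of $src into $dst in place.
# ASSUMPTION: "un-compiling" is modelled by deleting compiled score data.
sub add_into {
    my ($dst, $src, %opts) = @_;
    $dst->{N}  += $src->{N}  if $opts{N}  // 1;   # default: add N
    $dst->{f1} += $src->{f1} if $opts{f1} // 1;   # default: add f1
    for my $which (qw(f2 f12)) {
        my $from = $src->{$which} // {};
        $dst->{$which}{$_} += $from->{$_} for keys %$from;
    }
    delete @$dst{qw(score mi ld fm)};
    return $dst;
}

# Toy data (invented for this example).
my $p1 = { N => 10, f1 => 2, f2 => { 1 => 3 },         f12 => { 1 => 1 } };
my $p2 = { N => 5,  f1 => 1, f2 => { 1 => 1, 2 => 4 }, f12 => { 2 => 2 } };
add_into($p1, $p2);
print "N=$p1->{N} f1=$p1->{f1}\n";
```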

add
 $prf3 = $prf1->add($prf2,%opts);

returns sum of $prf1 and $prf2 frequency data as a new profile $prf3 (non-destructive). %opts: as for _add().

_sum
 $psum = $CLASS_OR_OBJECT->_sum(\@profiles,%opts);

  • returns a profile representing sum of \@profiles, passing %opts to _add().

  • if called as a class method and \@profiles contains only 1 element, that element is returned

  • otherwise, \@profiles are added to the (new) object

sum
 $psum = $CLASS_OR_OBJECT->sum(\@profiles,%opts);

returns a new profile representing sum of \@profiles; see _sum().

diff
 $diff = $prf1->diff($prf2,%opts);

wraps DiaColloDB::Profile::Diff->new($prf1,$prf2,%opts).

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dcdb-create.perl(1), dcdb-query.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), dcdb-dump.perl(1), DiaColloDB(3pm), perl(1), ...