DiaColloDB::Profile - diachronic collocation db, (co-)frequency profile
 ##========================================================================
 ## PRELIMINARIES

 use DiaColloDB::Profile;

 ##========================================================================
 ## Constructors etc.

 $prf  = CLASS_OR_OBJECT->new(%args);
 $prf2 = $prf->clone();
 $prf2 = $prf->shadow();

 ##========================================================================
 ## Basic Access

 $label = $prf->label();
 \@titles_or_undef = $prf->titles();
 @keys = $prf->scoreKeys();
 $bool = $prf->empty();

 ##========================================================================
 ## I/O: JSON

 *TO_JSON = \&TO_JSON__table;

 ##========================================================================
 ## I/O: Text

 undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
 $bool = $prf->saveTextFh($fh, %opts);

 ##========================================================================
 ## I/O: HTML

 $bool = $prf->saveHtmlFile($filename_or_handle, %opts);

 ##========================================================================
 ## Compilation

 $prf = $prf->compile($func,%opts);
 $prf = $prf->uncompile();
 $prf = $prf->compile_f();
 $prf = $prf->compile_fm();
 $prf = $prf->compile_mi(%opts);
 $prf = $prf->compile_ld(%opts);

 ##========================================================================
 ## Trimming

 \@keys = $prf->which(%opts);
 $prf   = $prf->trim(%opts);

 ##========================================================================
 ## Stringification

 $i2s = $prf->stringify_map( $obj);
 $prf = $prf->stringify( $obj);

 ##========================================================================
 ## Algebraic operations

 $prf  = $prf->_add($prf2,%opts);
 $prf3 = $prf1->add($prf2,%opts);
 $psum = $CLASS_OR_OBJECT->_sum(\@profiles,%opts);
 $psum = $CLASS_OR_OBJECT->sum(\@profiles,%opts);
 $diff = $prf1->diff($prf2,%opts);
DiaColloDB::Profile is a class for representing low-level collocate frequency profile data for a single date-slice as retrieved e.g. from a native index or DDC back-end. It includes methods for compiling profile scores via several score functions (e.g. frequency, pointwise mi * log-frequency, log Dice), k-best trimming, stringification, basic algebraic manipulation, and serialization (text, HTML, or JSON).
DiaColloDB::Profile inherits from DiaColloDB::Persistent.
$prf = CLASS_OR_OBJECT->new(%args);
%args, object structure:
 label  => $label,     ##-- string label (used by Multi; undef for none(default))
 N      => $N,         ##-- total marginal relation frequency
 f1     => $f1,        ##-- total marginal frequency of target word(s)
 f2     => \%f2,       ##-- total marginal frequency of collocates: ($i2=>$f2, ...)
 f12    => \%f12,      ##-- collocation frequencies, %f12 = ($i2=>$f12, ...)
 titles => \@titles,   ##-- item group titles (default:undef: unknown)
 ##
 eps    => $eps,       ##-- smoothing constant (default=0.5)
 score  => $func,      ##-- selected scoring function ('f12', 'mi', or 'ld')
 mi     => \%mi12,     ##-- score: mutual information * logFreq a la Wortprofil; requires compile_mi()
 ld     => \%ld12,     ##-- score: log-dice a la Wortprofil; requires compile_ld()
 fm     => \%fm12,     ##-- frequency per million score; requires compile_fm()
 $prf2 = $prf->clone();
 $prf2 = $prf->clone($keep_compiled)
clones the profile $prf. if $keep_compiled is true, compiled data is cloned too.
 $prf2 = $prf->shadow();
 $prf2 = $prf->shadow($keep_compiled)
shadows %$prf. if $keep_compiled is true, compiled data is shadowed too (all zeroes).
$label = $prf->label();
get profile label
\@titles_or_undef = $prf->titles();
get item titles
@keys = $prf->scoreKeys();
returns known score function keys
$bool = $prf->empty();
returns true iff profile is empty
$thingy = $obj->TO_JSON__table()
test alternative JSON format (small but slow).
$thingy = $obj->TO_JSON__flat()
See also DiaColloDB::Persistent.
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
prints column titles for text output.
$bool = $prf->saveTextFh($fh, %opts);
save flat TAB-separated text, format:
N F1 F2 F12 SCORE LABEL ITEM2...
%opts:
 label  => $label,   ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
 format => $fmt,     ##-- printf format for scores (default="%f")
 header => $bool,    ##-- include header-row? (default=1)
 hlabel => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::Multi)
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);
Save flat HTML table data with rows of the form
N F1 F2 F12 SCORE PREFIX? ITEM2...
%opts:
 table  => $bool,    ##-- include <table>..</table> ? (default=1)
 body   => $bool,    ##-- include <html><body>..</body></html> ? (default=1)
 header => $bool,    ##-- include header-row? (default=1)
 hlabel => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
 label  => $label,   ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
 format => $fmt,     ##-- printf score formatting (default="%.4f")
$prf = $prf->compile($func,%opts);
compile for score-function $func, one of qw(f fm mi ld); default='f'
$prf = $prf->uncompile();
un-compiles all scores for $prf
$prf = $prf->compile_f();
just sets $prf->{score} = 'f12'
$prf = $prf->compile_fm();
computes frequency-per-million in $prf->{fm}; sets $prf->{score}='fm'.
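As a rough illustration of what a frequency-per-million score looks like, the following standalone sketch scales each collocation frequency by 1e6/N. The helper name freq_per_million() is hypothetical and not part of DiaColloDB, and the exact normalization used by compile_fm() is an assumption here.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): frequency per million.
## Assumption: fm = 1e6 * f12 / N; the module itself may normalize differently.
sub freq_per_million {
  my ($f12, $N) = @_;   ##-- $f12: HASH-ref ($i2=>$f12, ...), $N: total frequency
  return {} if (!$N);   ##-- avoid division by zero for empty profiles
  return { map { ($_ => 1e6 * $f12->{$_} / $N) } keys %$f12 };
}

my $fm = freq_per_million({ 10 => 5, 11 => 20 }, 1_000_000);
printf("%s\t%.2f\n", $_, $fm->{$_}) foreach (sort keys %$fm);
```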
$prf = $prf->compile_mi(%opts);
computes MI*logF-profile in $prf->{mi} a la Rychly (2008); sets $prf->{score}='mi'. %opts:
eps => $eps #-- clobber $prf->{eps}
$prf = $prf->compile_ld(%opts);
computes log-dice profile in $prf->{ld} a la Rychly (2008); sets $prf->{score}='ld'. %opts:
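For orientation, the log-Dice score of Rychly (2008) is 14 + log2(2*f12 / (f1+f2)). The standalone sketch below computes it for plain numbers; the helper name log_dice() is hypothetical, and the way the smoothing constant $eps enters the formula is an assumption rather than a description of compile_ld().

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): log-Dice after Rychly (2008),
##   logDice = 14 + log2( 2*f12 / (f1+f2) )
## Assumption: $eps is simply added to each frequency before scoring.
## Requires a positive Dice coefficient (f12+eps > 0 and f1+f2+2*eps > 0).
sub log_dice {
  my ($f1, $f2, $f12, $eps) = @_;
  $eps //= 0;
  my $dice = 2 * ($f12 + $eps) / (($f1 + $eps) + ($f2 + $eps));
  return 14 + log($dice) / log(2);   ##-- log() is natural log; divide for log2
}

my $ld_max  = log_dice(100, 100, 100);  ##-- perfect association: dice = 1
my $ld_half = log_dice(100, 100, 50);   ##-- dice = 0.5
printf("%.4f %.4f\n", $ld_max, $ld_half);
```

Because the Dice coefficient is at most 1, the score has a theoretical maximum of 14, and halving the coefficient lowers the score by exactly 1.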
\@keys = $prf->which(%opts);
returns 'good' keys for trimming options %opts:
 cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
 kbest  => $kbest,   ##-- retain only $kbest items (by score value)
 kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
 return => $which,   ##-- either 'good' (default) or 'bad'
 as     => $as,      ##-- 'hash' or 'array'; default='array'
$prf = $prf->trim(%opts);
trims the profile to contain only 'good' keys. %opts:
 kbest  => $kbest,   ##-- retain only $kbest items (by score value)
 kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
 cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
 keep   => $keep,    ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
 drop   => $drop,    ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)
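The cutoff, kbest, and kbesta options can be pictured with a standalone sketch over a plain score hash. The helper good_keys() is hypothetical and not part of DiaColloDB; how the module combines these options, and how it breaks ties, is assumed here rather than taken from its source.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): select 'good' keys from a
## score hash, loosely following the cutoff/kbest/kbesta options.
sub good_keys {
  my ($score, %opts) = @_;
  my @keys = keys %$score;

  ##-- cutoff: keep only items scoring at least $opts{cutoff}
  @keys = grep { $score->{$_} >= $opts{cutoff} } @keys if (defined $opts{cutoff});

  ##-- kbest: keep the $opts{kbest} highest-scoring items (ties broken by key)
  if (defined $opts{kbest}) {
    @keys = sort { $score->{$b} <=> $score->{$a} || $a cmp $b } @keys;
    splice(@keys, $opts{kbest}) if (@keys > $opts{kbest});
  }
  ##-- kbesta: as kbest, but rank by absolute score value
  if (defined $opts{kbesta}) {
    @keys = sort { abs($score->{$b}) <=> abs($score->{$a}) || $a cmp $b } @keys;
    splice(@keys, $opts{kbesta}) if (@keys > $opts{kbesta});
  }
  return \@keys;
}

my %score  = (a => 3.5, b => -7.0, c => 1.2, d => 0.1);
my $kbest  = good_keys(\%score, kbest  => 2);   ##-- highest scores
my $kbesta = good_keys(\%score, kbesta => 2);   ##-- largest magnitudes
my $cut    = good_keys(\%score, cutoff => 1);   ##-- scores >= 1
print join(' ', @$kbest), "\n";
```

The difference between kbest and kbesta matters mostly for signed scores such as those of a difference profile, where a strongly negative item ranks low by value but high by magnitude.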
NOTE: this COULD be factored out into something like $prf->trim($prf->which(%opts)), but it's about 15% faster inline.
 $i2s = $prf->stringify_map( $obj);
 $i2s = $prf->stringify_map(\@key2str);
 $i2s = $prf->stringify_map(\&key2str);
 $i2s = $prf->stringify_map(\%key2str);
guts for stringify: get a map for stringification
 $prf = $prf->stringify( $obj);
 $prf = $prf->stringify(\@key2str)
 $prf = $prf->stringify(\&key2str)
 $prf = $prf->stringify(\%key2str)
stringifies profile (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
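The three reference-based mapping forms (ARRAY, CODE, HASH) can be illustrated on a plain frequency hash. The helper stringify_keys() below is hypothetical and not part of DiaColloDB; unlike stringify() it returns a new hash instead of modifying its argument, and its handling of unmapped or colliding keys is an assumption.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): rename the keys of a
## frequency hash using an ARRAY-, CODE-, or HASH-ref mapping.
sub stringify_keys {
  my ($freq, $key2str) = @_;
  my $type = ref($key2str);
  my $lookup = ($type eq 'ARRAY'  ? sub { $key2str->[$_[0]] }
                : $type eq 'CODE' ? $key2str
                : $type eq 'HASH' ? sub { $key2str->{$_[0]} }
                : die("unsupported mapping type '$type'"));
  my %out;
  foreach my $i2 (keys %$freq) {
    my $s = $lookup->($i2);
    $s = $i2 if (!defined $s);   ##-- assumption: keep unmapped keys as-is
    $out{$s} += $freq->{$i2};    ##-- merge counts if two ids share a string
  }
  return \%out;
}

my %f12    = (0 => 4, 1 => 9);
my $by_arr = stringify_keys(\%f12, ['Haus', 'Baum']);
my $by_sub = stringify_keys(\%f12, sub { "w$_[0]" });
my $by_map = stringify_keys(\%f12, { 0 => 'Haus', 1 => 'Baum' });
print join(' ', map { "$_=$by_arr->{$_}" } sort keys %$by_arr), "\n";
```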
$prf = $prf->_add($prf2,%opts);
adds $prf2 frequency data to $prf (destructive); implicitly un-compiles $prf. %opts:
 N  => $bool,  ##-- whether to add N values (default:true)
 f1 => $bool,  ##-- whether to add f1 values (default:true)
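In-place addition with the N and f1 flags can be sketched on plain hashes shaped like the object structure documented for new(). The helper add_freqs() is hypothetical and not part of DiaColloDB; in particular, which fields the module clears when it un-compiles a profile is an assumption.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB): add the frequency data of
## profile-like HASH-ref $q into $p in place, honouring the N and f1 flags.
sub add_freqs {
  my ($p, $q, %opts) = @_;
  $p->{N}  += $q->{N}  if (!exists($opts{N})  || $opts{N});   ##-- default: true
  $p->{f1} += $q->{f1} if (!exists($opts{f1}) || $opts{f1});  ##-- default: true
  $p->{f2}{$_}  += $q->{f2}{$_}  foreach (keys %{$q->{f2}});
  $p->{f12}{$_} += $q->{f12}{$_} foreach (keys %{$q->{f12}});
  delete @$p{qw(mi ld fm score)};  ##-- assumption: summed counts invalidate old scores
  return $p;
}

my $p = { N => 100, f1 => 10, f2 => { 7 => 3 },         f12 => { 7 => 2 }, score => 'f12' };
my $q = { N => 200, f1 => 20, f2 => { 7 => 1, 8 => 5 }, f12 => { 8 => 4 } };
add_freqs($p, $q, f1 => 0);   ##-- add N but leave f1 alone
print "$p->{N} $p->{f1} $p->{f2}{7} $p->{f12}{8}\n";
```

Skipping f1 is the kind of thing one might want when both profiles describe the same target word(s), so that its marginal frequency is not counted twice.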
$prf3 = $prf1->add($prf2,%opts);
returns sum of $prf1 and $prf2 frequency data as a new profile $prf3 (non-destructive, unlike _add()). %opts: as for _add().
$psum = $CLASS_OR_OBJECT->_sum(\@profiles,%opts);
returns a profile representing sum of \@profiles, passing %opts to _add().
if called as a class method and \@profiles contains only 1 element, that element is returned
otherwise, \@profiles are added to the (new) object
$psum = $CLASS_OR_OBJECT->sum(\@profiles,%opts);
returns a new profile representing sum of \@profiles; see _sum().
$diff = $prf1->diff($prf2,%opts);
wraps DiaColloDB::Profile::Diff->new($prf1,$prf2,%opts).
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
dcdb-create.perl(1), dcdb-query.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), dcdb-dump.perl(1), DiaColloDB(3pm), perl(1), ...