DiaColloDB::Document::DDCTabs - diachronic collocation db, source document, DDC tab-dump
    ##========================================================================
    ## PRELIMINARIES

    use DiaColloDB::Document::DDCTabs;

    ##========================================================================
    ## Constructors etc.

    $doc = CLASS_OR_OBJECT->new(%args);

    ##========================================================================
    ## API: I/O: parse

    $bool = $doc->fromFile($filename_or_fh, %opts);
DiaColloDB::Document::DDCTabs provides a DiaColloDB::Document-compliant API for parsing DDC tab-dump files as produced by ddc_dump --full --tabs; see http://odo.dwds.de/~moocow/software/ddc/ddc_tabs.html for details.
DiaColloDB::Document::DDCTabs inherits from DiaColloDB::Document and supports the DiaColloDB::Document API.
$doc = CLASS_OR_OBJECT->new(%args);
%args, object structure:
    ##-- parsing options
    eosre      => $re,       ##-- EOS regex (empty or undef for file-breaks only; default='^$')
    utf8       => $bool,     ##-- enable utf8 parsing? (default=1)
    trimAuthor => $bool,     ##-- trim "author" meta-attribute (eliminate DTA PNDs)? (default=1)
    trimGenre  => $bool,     ##-- create trimmed "genre" meta-attribute? (default=1)
    trimPND    => $bool,     ##-- create trimmed "pnd" meta-attribute? (default=1)
    foreign    => $bool,     ##-- disable D*-specific hacks (trimAuthor, trimGenre, trimPND)
    ##
    ##-- document data
    date       => $date,     ##-- year
    wf         => $iw,       ##-- index-field for $word attribute (default=0)
    pf         => $ip,       ##-- index-field for $pos attribute (default=1)
    lf         => $il,       ##-- index-field for $lemma attribute (default=2)
    tokens     => \@tokens,  ##-- tokens, including undef for EOS
    meta       => \%meta,    ##-- document metadata (e.g. author, title, collection, ...)
Each token in @tokens is a HASH-ref {w=>$word,p=>$pos,l=>$lemma,...}, or undef for EOS.
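The token layout described above can be walked with plain Perl. The following is a minimal sketch that assumes only the documented shape (HASH-refs with keys w, p, l, and undef for EOS); the sample data and the helper group_sentences() are hypothetical and not part of DiaColloDB.

```perl
use strict;
use warnings;

##-- hypothetical token list in the documented shape: HASH-refs, undef marks EOS
my @tokens = (
  {w=>'This',  p=>'DT',   l=>'this'},
  {w=>'is',    p=>'VBZ',  l=>'be'},
  {w=>'.',     p=>'SENT', l=>'.'},
  undef,
  {w=>'Still', p=>'RB',   l=>'still'},
  undef,
);

##-- group_sentences(\@tokens): hypothetical helper (not a DiaColloDB function);
##   splits a flat token list into sentences at each undef (EOS) entry
sub group_sentences {
  my ($tokens) = @_;
  my (@sents, @cur);
  foreach my $tok (@$tokens) {
    if (defined $tok) { push @cur, $tok; next; }
    push @sents, [@cur] if @cur;   ##-- close the current sentence at EOS
    @cur = ();
  }
  push @sents, [@cur] if @cur;     ##-- flush a trailing sentence without EOS
  return \@sents;
}

my $sents = group_sentences(\@tokens);

##-- print one line of lemmata per sentence
print join(' ', map { $_->{l} } @$_), "\n" foreach @$sents;
```

For the sample data this prints the lemma sequences "this be ." and "still", one sentence per line.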
$bool = $doc->fromFile($filename_or_fh, %opts);
Parses tokens from $filename_or_fh, which may be a filename or an open filehandle. Any %opts given clobber the corresponding entries of %$doc.
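A typical call sequence might look like the following sketch. It uses only the new() and fromFile() calls and the tokens key documented here; the temporary file, its contents, and the $status variable are illustrative assumptions, and the module is loaded conditionally so the script still runs where DiaColloDB is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

##-- write a tiny, purely illustrative tab-dump (TAB-separated word, PoS, lemma)
my ($fh, $filename) = tempfile(SUFFIX => '.tabs', UNLINK => 1);
print {$fh} join("\n",
  '%%$DDC:meta.date_=2016-02-25',
  '%%$DDC:meta.title=test document',
  "This\tDT\tthis",
  "is\tVBZ\tbe",
  ".\tSENT\t.",
  '', '');
close($fh) or die "close failed for $filename: $!";

my $status;
if (eval { require DiaColloDB::Document::DDCTabs; 1 }) {
  ##-- construct a document object and parse the file into it
  my $doc = DiaColloDB::Document::DDCTabs->new(utf8 => 1);
  $doc->fromFile($filename)
    or die "fromFile() failed for $filename";

  ##-- count real tokens, skipping the undef EOS markers
  my $ntok = grep { defined } @{ $doc->{tokens} };
  $status = "parsed $ntok tokens";
} else {
  $status = "skipped: DiaColloDB::Document::DDCTabs is not installed";
}
print "$status\n";
```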
The following is an example file in the format accepted by this module:
    %%$DDC:meta.date_=2016-02-25
    %%$DDC:meta.author=Jurish, Bryan
    %%$DDC:meta.collection=tiny
    %%$DDC:meta.textClass=dummy:test-data
    %%$DDC:meta.title=test document
    %%$DDC:BREAK.s[1]=5
    %%$DDC:BREAK.p[1]=11
    %%$DDC:BREAK.file[1]=17
    %%$DDC:BREAK.textarea[1]=17
    %%$DDC:index[0]=Token w
    %%$DDC:index[1]=Pos p
    %%$DDC:index[2]=Lemma l
    This	DT	this
    is	VBZ	be
    a	DT	a
    test	NN	test
    .	SENT	.

    %%$DDC:BREAK.s[2]=5
    This	DT	this
    is	VBZ	be
    only	RB	only
    a	DT	a
    test	NN	test
    .	SENT	.

    %%$DDC:BREAK.s[3]=11
    %%$DDC:BREAK.p[2]=11
    This	DT	this
    is	VBZ	be
    still	RB	still
    a	DT	a
    test	NN	test
    .	SENT	.
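To illustrate how a file like the one above could map onto the meta and tokens structures, here is a rough pure-Perl sketch. It is not the module's parser: parse_tabs() is a hypothetical helper, and it assumes that "%%$DDC:meta.KEY=VALUE" lines carry metadata, that other "%%" lines can be ignored, that blank lines mark EOS (matching the default eosre of '^$'), and that token fields are TAB-separated in the order word, PoS, lemma.

```perl
use strict;
use warnings;

##-- parse_tabs($text): hypothetical helper, NOT the parser used by DiaColloDB
sub parse_tabs {
  my ($text) = @_;
  my (%meta, @tokens);
  foreach my $line (split /\n/, $text) {
    if ($line =~ /^%%\$DDC:meta\.([^=]+)=(.*)$/) {
      $meta{$1} = $2;                          ##-- metadata comment
    } elsif ($line =~ /^%%/) {
      next;                                    ##-- other comments: ignored in this sketch
    } elsif ($line =~ /^$/) {
      ##-- blank line: EOS, avoiding leading or doubled markers
      push @tokens, undef if @tokens && defined $tokens[-1];
    } else {
      my ($w, $p, $l) = split /\t/, $line;     ##-- assumed field order: word, PoS, lemma
      push @tokens, {w=>$w, p=>$p, l=>$l};
    }
  }
  return (\%meta, \@tokens);
}

##-- shortened sample input in the format shown above
my $text = join("\n",
  '%%$DDC:meta.author=Jurish, Bryan',
  '%%$DDC:meta.title=test document',
  '%%$DDC:index[0]=Token w',
  "This\tDT\tthis",
  "is\tVBZ\tbe",
  ".\tSENT\t.",
  '',
  "This\tDT\tthis",
  ".\tSENT\t.",
);

my ($meta, $tokens) = parse_tabs($text);
printf "author=%s, entries=%d\n", $meta->{author}, scalar(@$tokens);
```

The sketch skips the wf, pf and lf index options as well as the author, genre and PND trimming described earlier; it only shows the basic correspondence between input lines and the resulting structures.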
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
http://odo.dwds.de/~moocow/software/ddc/ddc_tabs.html, DiaColloDB::Document(3pm), DiaColloDB(3pm), perl(1), ...