NAME

DiaColloDB::Document::DDCTabs - diachronic collocation db, source document, DDC tab-dump

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::Document::DDCTabs;
 
 ##========================================================================
 ## Constructors etc.
 
 $doc = CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## API: I/O: parse
 
 $bool = $doc->fromFile($filename_or_fh, %opts);
 

DESCRIPTION

DiaColloDB::Document::DDCTabs provides a DiaColloDB::Document-compliant API for parsing DDC tab-dump files as produced by ddc_dump --full --tabs; see http://odo.dwds.de/~moocow/software/ddc/ddc_tabs.html for details.

Globals & Constants

Variable: @ISA

DiaColloDB::Document::DDCTabs inherits from DiaColloDB::Document and supports the DiaColloDB::Document API.

Constructors etc.

new
 $doc = CLASS_OR_OBJECT->new(%args);

%args, object structure:

 ##-- parsing options
 eosre => $re,        ##-- EOS regex (empty or undef for file-breaks only; default='^$')
 utf8  => $bool,      ##-- enable utf8 parsing? (default=1)
 trimAuthor => $bool, ##-- trim "author" meta-attribute (eliminate DTA PNDs)? (default=1)
 trimGenre  => $bool, ##-- create trimmed "genre" meta-attribute? (default=1)
 trimPND    => $bool, ##-- create trimmed "pnd" meta-attribute? (default=1)
 foreign    => $bool, ##-- disable D*-specific hacks (trimAuthor, trimGenre, trimPND)
 ##
 ##-- document data
 date   =>$date,     ##-- year
 wf     =>$iw,       ##-- index-field for $word attribute (default=0)
 pf     =>$ip,       ##-- index-field for $pos attribute (default=1)
 lf     =>$il,       ##-- index-field for $lemma attribute (default=2)
 tokens =>\@tokens,  ##-- tokens, including undef for EOS
 meta   =>\%meta,    ##-- document metadata (e.g. author, title, collection, ...)

Each token in @tokens is a HASH-ref {w=>$word,p=>$pos,l=>$lemma,...}, or undef for EOS.
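
Because EOS is encoded as undef entries in @tokens, consumers usually need to regroup the flat list into sentences. The following is a minimal sketch of that regrouping on hand-built data; the helper name sentences() is hypothetical and not part of DiaColloDB.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DiaColloDB): split a flat token list
# into sentences, treating each undef entry as an end-of-sentence marker.
sub sentences {
    my ($tokens) = @_;
    my (@sents, @cur);
    for my $tok (@$tokens) {
        if (defined $tok) {
            push @cur, $tok;          # ordinary token: {w=>..., p=>..., l=>...}
        }
        elsif (@cur) {
            push @sents, [@cur];      # EOS: close the current sentence
            @cur = ();
        }
    }
    push @sents, [@cur] if @cur;      # trailing sentence without a final EOS
    return \@sents;
}

# Hand-built sample in the documented token layout
my @tokens = (
    { w => 'This', p => 'DT',  l => 'this' },
    { w => 'is',   p => 'VBZ', l => 'be'   },
    undef,
    { w => 'test', p => 'NN',  l => 'test' },
    undef,
);

for my $sent (@{ sentences(\@tokens) }) {
    print join(' ', map { $_->{l} } @$sent), "\n";
}
```

Consecutive undef entries are collapsed here rather than yielding empty sentences; that is a choice of this sketch, not documented module behavior.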

API: I/O: parse

fromFile
 $bool = $doc->fromFile($filename_or_fh, %opts);

Parses tokens from $filename_or_fh into $doc and returns a boolean result. Any %opts given clobber (override) the corresponding keys of %$doc.
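
A typical call sequence might look like the sketch below. It uses only new(), fromFile() and the object keys documented above; the helper name summarize_ddc_tabs() and the choice to die on a false return value are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: load one tab-dump file and return a small summary.
# The module is loaded lazily so that this file compiles without it.
sub summarize_ddc_tabs {
    my ($path) = @_;
    require DiaColloDB::Document::DDCTabs;

    my $doc = DiaColloDB::Document::DDCTabs->new(utf8 => 1);
    $doc->fromFile($path)
        or die "fromFile() returned false for '$path'";

    # undef entries in {tokens} mark EOS; everything else is a token hash
    my $ntok = grep {  defined } @{ $doc->{tokens} };
    my $neos = grep { !defined } @{ $doc->{tokens} };
    return { date => $doc->{date}, tokens => $ntok, eos => $neos };
}

# Usage: perl summarize.pl FILE
if (@ARGV) {
    my $sum = summarize_ddc_tabs($ARGV[0]);
    printf "date=%s tokens=%d eos=%d\n",
        $sum->{date} // 'undef', $sum->{tokens}, $sum->{eos};
}
```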

EXAMPLE

The following is an example file in the format accepted by this module:

 %%$DDC:meta.date_=2016-02-25
 %%$DDC:meta.author=Jurish, Bryan
 %%$DDC:meta.collection=tiny
 %%$DDC:meta.textClass=dummy:test-data
 %%$DDC:meta.title=test document
 %%$DDC:BREAK.s[1]=5
 %%$DDC:BREAK.p[1]=11
 %%$DDC:BREAK.file[1]=17
 %%$DDC:BREAK.textarea[1]=17
 %%$DDC:index[0]=Token w
 %%$DDC:index[1]=Pos p
 %%$DDC:index[2]=Lemma l
 This   DT      this
 is     VBZ     be
 a      DT      a
 test   NN      test
 .      SENT    .
 
 %%$DDC:BREAK.s[2]=5
 This   DT      this
 is     VBZ     be
 only   RB      only
 a      DT      a
 test   NN      test
 .      SENT    .
 
 %%$DDC:BREAK.s[3]=11
 %%$DDC:BREAK.p[2]=11
 This   DT      this
 is     VBZ     be
 still  RB      still
 a      DT      a
 test   NN      test
 .      SENT    .
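
To make the layout above concrete, here is a rough standalone sketch of how such lines could be read. It is not this module's implementation. It assumes that fields are separated by TAB characters (shown as runs of spaces above), that every line beginning with %% is a comment, and that an empty line marks EOS as with the default eosre; the helper name parse_tab_dump_lines() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DiaColloDB): parse tab-dump lines into
# a structure resembling the documented {meta=>\%meta, tokens=>\@tokens}.
sub parse_tab_dump_lines {
    my (@lines) = @_;
    my (%meta, @tokens);
    for my $line (@lines) {
        chomp(my $l = $line);
        if ($l =~ /^%%\$DDC:meta\.([^=]+)=(.*)$/) {
            $meta{$1} = $2;               # metadata comment: keep key and value
        }
        elsif ($l =~ /^%%/) {
            next;                         # other comments (BREAK, index): ignored here
        }
        elsif ($l eq '') {
            push @tokens, undef;          # empty line: EOS
        }
        else {
            # assumed column order: word, pos, lemma (wf=0, pf=1, lf=2)
            my ($w, $p, $lemma) = split /\t/, $l;
            push @tokens, { w => $w, p => $p, l => $lemma };
        }
    }
    return { meta => \%meta, tokens => \@tokens };
}

my $doc = parse_tab_dump_lines(
    '%%$DDC:meta.title=test document',
    '%%$DDC:index[0]=Token w',
    join("\t", qw(This DT this)),
    join("\t", qw(is VBZ be)),
    '',
);
printf "title=%s, entries=%d\n", $doc->{meta}{title}, scalar @{ $doc->{tokens} };
```

The sketch stores metadata keys verbatim and hard-codes the column order, whereas the module exposes wf, pf and lf for selecting the fields.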
 

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2020 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

http://odo.dwds.de/~moocow/software/ddc/ddc_tabs.html, DiaColloDB::Document(3pm), DiaColloDB(3pm), perl(1), ...