DTA::CAB::Format::TJ - Datum parser: one-token-per-line text; token data as JSON
use DTA::CAB::Format::TJ;

##========================================================================
## Constructors etc.
$fmt = DTA::CAB::Format::TJ->new(%args);

##========================================================================
## Methods: Input
$fmt = $fmt->close();
$fmt = $fmt->fromString($string);
$doc = $fmt->parseDocument();

##========================================================================
## Methods: Output
$fmt = $fmt->flush();
$str = $fmt->toString();
$fmt = $fmt->putToken($tok);
$fmt = $fmt->putSentence($sent);
$fmt = $fmt->putDocument($doc);
DTA::CAB::Format::TJ inherits from DTA::CAB::Format::TT.
DTA::CAB::Format::TJ registers the filename regex:
/\.(?i:tj|cab-tj)$/
with DTA::CAB::Format.
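To make the effect of this pattern concrete, the following minimal sketch applies it to a few file names. The file names are hypothetical and chosen only for illustration; the pattern itself is copied verbatim from above. The sketch uses plain Perl and does not call into DTA::CAB.

```perl
use strict;
use warnings;

# The filename pattern quoted above, copied verbatim.
my $tj_re = qr/\.(?i:tj|cab-tj)$/;

# Hypothetical file names, for illustration only.
my @names   = qw(corpus.tj corpus.CAB-TJ corpus.tt corpus.tj.gz);
my @matched = grep { $_ =~ $tj_re } @names;

# Only names whose final extension is "tj" or "cab-tj" (in any case) match;
# "corpus.tj.gz" does not, because the pattern is anchored at the end.
print "$_\n" for @matched;
```

The `(?i:...)` group makes only the extension case-insensitive, so `corpus.CAB-TJ` is accepted alongside `corpus.tj`.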
$fmt = CLASS_OR_OBJ->new(%args);
%args, %$fmt:
##-- Input
doc => $doc,                    ##-- buffered input document
##
##-- Output
outbuf => $stringBuffer,        ##-- buffered output
#level => $formatLevel,         ##-- n/a
##
##-- Common
encoding => $inputEncoding,     ##-- default: UTF-8, where applicable
@keys = $class_or_obj->noSaveKeys();
Returns list of keys not to be saved. This implementation returns qw(doc outbuf).
$fmt = $fmt->close();
Override: close current input source, if any.
$fmt = $fmt->fromString($string);
Override: select input from string $string.
$fmt = $fmt->parseTJString($str)
Guts for fromString(): parse string $str into local document buffer $fmt->{doc}.
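As a rough illustration of what parsing this format involves, here is a minimal sketch that reads TJ text into a nested Perl structure. It is not the module's implementation: the helper name `parse_tj_sketch` is hypothetical, and the sketch assumes that token lines are tab-separated (text, then JSON), that a blank line ends a sentence, that other lines beginning with `%%` are comments, and that a document holds its sentences under `body` and a sentence holds its tokens under `tokens`. Only the `%%$TJ:SENT=` prefix is taken directly from the example file shown later in this document.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper, not part of DTA::CAB: parse TJ text into
# { body => [ { tokens => [...], <sentence attributes> }, ... ] }.
sub parse_tj_sketch {
    my ($str) = @_;
    my $json = JSON::PP->new;
    my $doc  = { body => [] };
    my $sent;                               # sentence currently being built
    for my $line (split /\r?\n/, $str) {
        if ($line =~ /^\%\%\$TJ:SENT=(.+)$/) {
            # sentence attributes, as on the '%%$TJ:SENT={...}' line
            $sent = { %{ $json->decode($1) }, tokens => [] };
            push @{ $doc->{body} }, $sent;
        }
        elsif ($line =~ /^\%\%/) {
            next;                           # assumption: other '%%' lines are comments
        }
        elsif ($line eq '') {
            undef $sent;                    # assumption: a blank line ends the sentence
        }
        else {
            # assumption: token text and JSON data are separated by a tab
            my ($text, $data) = split /\t/, $line, 2;
            my $tok = (defined $data && $data ne '') ? $json->decode($data) : {};
            $tok->{text} = $text unless defined $tok->{text};
            if (!$sent) {
                $sent = { tokens => [] };
                push @{ $doc->{body} }, $sent;
            }
            push @{ $sent->{tokens} }, $tok;
        }
    }
    return $doc;
}

# Small hand-written input, abbreviated from the style of the example file.
my $tj = join "\n",
    '%%$TJ:SENT={"lang":"de"}',
    qq(wie\t{"text":"wie","msafe":"1"}),
    qq(oede\t{"text":"oede","msafe":"0"}),
    '';
my $doc = parse_tj_sketch($tj);
printf "%d sentence(s), %d token(s)\n",
    scalar @{ $doc->{body} }, scalar @{ $doc->{body}[0]{tokens} };
```

With the real module, the equivalent steps would be `fromString()` followed by `parseDocument()`, as listed in the synopsis.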
$doc = $fmt->parseDocument();
Override: just returns local document buffer $fmt->{doc}.
$fmt = $fmt->flush();
Override: flush accumulated output.
$str = $fmt->toString();
$str = $fmt->toString($formatLevel)
Override: flush buffered output document to byte-string. Just encodes string in $fmt->{outbuf}.
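The difference between the buffered character string and the returned byte-string can be sketched with the core Encode module. This is only an illustration under assumptions: the hash below is a hypothetical stand-in for a format object that models just the documented `outbuf` and `encoding` keys, and the module itself may perform the encoding differently.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for a format object; only the 'outbuf' and
# 'encoding' keys documented above are modelled here.
my $fmt = {
    outbuf   => "oede\t{\"lemma\":\"\x{f6}de\"}\n",   # contains U+00F6 (o with diaeresis)
    encoding => 'UTF-8',
};

# Encode the buffered character string to a byte-string.
my $bytes = encode($fmt->{encoding}, $fmt->{outbuf});

# The byte-string is longer than the character string because
# U+00F6 occupies two bytes in UTF-8.
printf "%d characters, %d bytes\n", length($fmt->{outbuf}), length($bytes);
```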
$fmt = $fmt->putToken($tok);
Override: token output.
$fmt = $fmt->putSentence($sent);
Override: sentence output.
$fmt = $fmt->putDocument($doc);
Override: document output.
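The three output levels can be pictured as nested serialization steps: a token becomes one line, a sentence becomes an attribute line plus its token lines, and a document is the concatenation of its sentences. The sketch below illustrates this with JSON::PP. The helper names are hypothetical and are not the module's methods; the tab separator, the blank line after each sentence, and the `tokens` key are assumptions, and canonical key ordering is used only to keep the output stable.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->canonical(1);    # sorted keys, for stable output

# Hypothetical helper: one token per line, text then JSON data.
sub tj_token_line {
    my ($tok) = @_;
    return $tok->{text} . "\t" . $json->encode($tok) . "\n";
}

# Hypothetical helper: optional attribute line, token lines, blank line.
sub tj_sentence_block {
    my ($sent) = @_;
    my %attrs = %$sent;
    delete $attrs{tokens};                 # tokens are written line by line instead
    my $out = %attrs ? '%%$TJ:SENT=' . $json->encode(\%attrs) . "\n" : '';
    $out .= tj_token_line($_) for @{ $sent->{tokens} };
    return $out . "\n";                    # assumption: blank line ends the sentence
}

my $sent = {
    lang   => 'de',
    tokens => [ { text => 'wie', msafe => '1' }, { text => '!', msafe => '1' } ],
};
my $outbuf = tj_sentence_block($sent);
print $outbuf;
```

With the real module, the corresponding calls would be `putToken()`, `putSentence()` and `putDocument()`, followed by `toString()` to retrieve the result.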
An example file in the format accepted/generated by this module (with very long lines) is:
%%$TJ:SENT={"lang":"de"}
wie	{"errid":"ec","hasmorph":"1","msafe":"1","moot":{"word":"wie","tag":"PWAV","lemma":"wie"},"exlex":"wie","lang":["de"],"xlit":{"latin1Text":"wie","isLatin1":"1","isLatinExt":"1"},"text":"wie"}
oede	{"moot":{"word":"öde","tag":"ADJD","lemma":"öde"},"text":"oede","xlit":{"latin1Text":"oede","isLatin1":"1","isLatinExt":"1"},"msafe":"0"}
!	{"errid":"ec","exlex":"!","msafe":"1","xlit":{"isLatin1":"1","isLatinExt":"1","latin1Text":"!"},"text":"!","moot":{"word":"!","tag":"$.","lemma":"!"}}
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2009-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.