DTA::CAB::Format::TCF - Datum parser|formatter: CLARIN-D TCF (selected features only)
 ##========================================================================
 ## PRELIMINARIES

 use DTA::CAB::Format::TCF;

 ##========================================================================
 ## Constructors etc.

 $fmt = CLASS_OR_OBJ->new(%args);

 ##========================================================================
 ## Methods: Input: Generic API

 $doc = $fmt->parseDocument();

 ##========================================================================
 ## Methods: Output: MIME & HTTP stuff

 $short = $fmt->shortName();
 $type = $fmt->mimeType();
 $ext = $fmt->defaultExtension();

 ##========================================================================
 ## Methods: Output: output selection

 $fmt = $fmt->flush();

 ##========================================================================
 ## Methods: Output: Generic API

 $fmt = $fmt->putDocument($doc);
DTA::CAB::Format::TCF inherits from DTA::CAB::Format::XmlCommon.
$fmt = CLASS_OR_OBJ->new(%args);
object structure: HASH ref
 {
  ##-- new in TCF
  tcfbufr    => \$buf,             ##-- raw TCF buffer, for spliceback mode
  textbufr   => \$text,            ##-- raw text buffer, for spliceback mode
  tcflog     => $level,            ##-- debugging log-level (default: 'off')
  spliceback => $bool,             ##-- (output) if true (default), splice data back into 'tcfbufr' if available; otherwise create new TCF doc
  tcflayers  => $tcf_layer_names,  ##-- layer names to include, space-separated list; known='tei text tokens sentences postags lemmas orthography'
  tcftagset  => $tagset,           ##-- tagset name for POStags element (default='stts')
  logsplice  => $level,            ##-- log level for spliceback messages (default:'none')
  trimtext   => $bool,             ##-- if true (default), waste tokenizer hints will be trimmed from 'text' layer

  ##-- input: inherited from XmlCommon
  xdoc => $xdoc,                   ##-- XML::LibXML::Document
  xprs => $xprs,                   ##-- XML::LibXML parser

  ##-- output: inherited from XmlCommon
  level  => $level,                ##-- output formatting level (OVERRIDE: default=1)
  output => [$how,$arg]            ##-- either ['fh',$fh], ['file',$filename], or ['str',\$buf]
 }
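Since 'tcflayers' is a plain space-separated string, a misspelled layer name is easy to miss. The following is a minimal sketch of checking such a string against the known layer names listed above before passing it to new(). The helper check_tcflayers() is hypothetical and not part of DTA::CAB, and the sketch assumes that layer names are matched exactly as written.

```perl
use strict;
use warnings;

## Known layer names, copied from the 'tcflayers' comment above.
my %KNOWN_LAYER = map { ($_ => 1) }
  qw(tei text tokens sentences postags lemmas orthography);

## check_tcflayers($layers): hypothetical helper, not part of DTA::CAB.
## Splits a space-separated layer string and returns two array-refs:
## the names found in %KNOWN_LAYER and the names that were not.
sub check_tcflayers {
  my ($layers) = @_;
  my (@known, @unknown);
  foreach my $name (grep { length } split(/\s+/, $layers // '')) {
    if ($KNOWN_LAYER{$name}) {
      push(@known, $name);
    } else {
      push(@unknown, $name);
    }
  }
  return (\@known, \@unknown);
}

my ($known, $unknown) = check_tcflayers('tokens sentences postags lemmas lemmata');
print "known:   @$known\n";
print "unknown: @$unknown\n";
```

A caller might warn about anything in the second list and pass join(' ', @$known) as the 'tcflayers' argument.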
$doc = $fmt->parseDocument();
Parses the buffered XML::LibXML::Document in $fmt->{xdoc} and returns the resulting document.
$short = $fmt->shortName();
returns "official" short name for this format; override returns "tcf".
$type = $fmt->mimeType();
returns the MIME type for this format; override returns "text/xml".
$ext = $fmt->defaultExtension();
returns default filename extension for this format; override returns ".tcf.xml".
$fmt = $fmt->flush();
flushes any buffered output to the selected output destination.
$fmt = $fmt->putDocument($doc);
override honors the local 'spliceback' and 'tcflayers' options.
An example file in the format accepted/generated by this module is:
 <?xml version="1.0" encoding="UTF-8"?>
 <D-Spin xmlns="http://www.dspin.de/data" version="0.4">
   <MetaData xmlns="http://www.dspin.de/data/metadata"/>
   <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
     <text>wie oede!</text>
     <tokens>
       <token ID="w1">wie</token>
       <token ID="w2">oede</token>
       <token ID="w3">!</token>
     </tokens>
     <sentences>
       <sentence ID="s1" tokenIDs="w1 w2 w3"/>
     </sentences>
     <lemmas>
       <lemma tokenIDs="w1">wie</lemma>
       <lemma tokenIDs="w2">öde</lemma>
       <lemma tokenIDs="w3">!</lemma>
     </lemmas>
     <POStags tagset="stts">
       <tag tokenIDs="w1">PWAV</tag>
       <tag tokenIDs="w2">ADJD</tag>
       <tag tokenIDs="w3">$.</tag>
     </POStags>
     <orthography>
       <correction tokenIDs="w2" operation="replace">öde</correction>
     </orthography>
   </TextCorpus>
 </D-Spin>
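To show how the layers in this example relate to one another, here is a minimal sketch that reads an abbreviated copy of the document directly with XML::LibXML and joins the 'lemmas' and 'POStags' layers to the tokens via their tokenIDs attributes. It does not use DTA::CAB::Format::TCF and is not meant to reflect how the module parses TCF internally. The helper layer_by_token() and the namespace prefix "tc" are illustrative assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

## Abbreviated copy of the example document (tokens, lemmas, POStags only).
my $tcf = <<'EOT';
<?xml version="1.0" encoding="UTF-8"?>
<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <tokens>
      <token ID="w1">wie</token>
      <token ID="w2">oede</token>
      <token ID="w3">!</token>
    </tokens>
    <lemmas>
      <lemma tokenIDs="w1">wie</lemma>
      <lemma tokenIDs="w2">öde</lemma>
      <lemma tokenIDs="w3">!</lemma>
    </lemmas>
    <POStags tagset="stts">
      <tag tokenIDs="w1">PWAV</tag>
      <tag tokenIDs="w2">ADJD</tag>
      <tag tokenIDs="w3">$.</tag>
    </POStags>
  </TextCorpus>
</D-Spin>
EOT

my $xdoc = XML::LibXML->load_xml(string => $tcf);
my $xpc  = XML::LibXML::XPathContext->new($xdoc);

## TCF layers live in the textcorpus namespace; the prefix "tc" is arbitrary.
$xpc->registerNs(tc => 'http://www.dspin.de/data/textcorpus');

## layer_by_token($xpath): hypothetical helper.
## Returns a hash-ref mapping each referenced token ID to the element's text.
sub layer_by_token {
  my ($xpath) = @_;
  my %by_id;
  foreach my $node ($xpc->findnodes($xpath)) {
    my @ids = split(' ', $node->getAttribute('tokenIDs') // '');
    $by_id{$_} = $node->textContent foreach @ids;
  }
  return \%by_id;
}

my $lemma = layer_by_token('//tc:lemmas/tc:lemma');
my $tag   = layer_by_token('//tc:POStags/tc:tag');

## Print one tab-separated line per token: ID, surface form, lemma, tag.
binmode(STDOUT, ':encoding(UTF-8)');
foreach my $tok ($xpc->findnodes('//tc:tokens/tc:token')) {
  my $id = $tok->getAttribute('ID');
  print join("\t", $id, $tok->textContent, $lemma->{$id} // '', $tag->{$id} // ''), "\n";
}
```

Because every annotation layer refers back to tokens by ID, a layer can be absent from a document without affecting how the others are read.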
If the input contains a 'text' layer but no 'tokens' or 'sentences' layers, the 'text' layer will be tokenized using the DTA::CAB::Format::Raw class.
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2015-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...