NAME
DTA::CAB::Format::Raw::Waste - Datum parser: raw untokenized text (using moot/waste)
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DTA::CAB::Format::Raw::Waste;
##========================================================================
## Constructors etc.
$fmt = CLASS_OR_OBJ->new(%args);
##========================================================================
## Methods: Persistence
@keys = $class_or_obj->noSaveKeys();
##========================================================================
## Methods: Local: model caching
\%wmodel_or_undef = $fmt->ensureModel();
\%config = CLASS_OR_OBJECT->loadModelConfig($wasterc);
##========================================================================
## Methods: Model I/O
$fmt_or_undef = $fmt->ensureLoaded();
$fmt_or_undef = $fmt->loadModel();
##========================================================================
## Methods: Input: Input selection
$fmt = $fmt->close();
## + default calls fromFh();
##========================================================================
## Methods: Input: Generic API
$doc = $fmt->parseDocument();
##========================================================================
## Methods: Output: Generic
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();
DESCRIPTION
DTA::CAB::Format::Raw::Waste is a DTA::CAB::Format subclass for untokenized raw string input, using moot/WASTE as the underlying tokenizer. For output, it inherits from DTA::CAB::Format::Raw::Base.
Globals
- Variable: @ISA
Inherits from DTA::CAB::Format::Raw::Base.
- Variable: @DEFAULT_WASTERC_PATHS
List of default paths to search for waste.rc config files; see mootfiles(5). Default value:
 ($ENV{TOKWRAP_RCDIR} ? "$ENV{TOKWRAP_RCDIR}/waste/waste.rc" : qw()),
 (defined($DTA::TokWrap::Version::VERSION) ? "$DTA::TokWrap::Version::RCDIR/waste/waste.rc" : qw()),
 "$ENV{HOME}/.wasterc",
 "/etc/wasterc",
 "/etc/default/wasterc"
- Variable: $logLoad
- Variable: $logCache
- Variable: $logRun
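As an illustration of how a search list such as @DEFAULT_WASTERC_PATHS might be consulted, here is a minimal sketch that returns the first readable file among a list of candidates. The helper name first_readable_rc is hypothetical and not part of DTA::CAB, and whether the module itself picks the first readable entry is an assumption. The DTA::TokWrap entry of the documented default list is omitted so that the sketch needs only core Perl.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): return the first readable
## regular file among @candidates, or undef if there is none.
sub first_readable_rc {
  my @candidates = @_;
  foreach my $path (@candidates) {
    return $path if defined($path) && -f $path && -r _;
  }
  return;
}

## Candidate list modeled on the documented default value; the entries
## depending on environment variables are included only if those are set.
my @paths = (
  ($ENV{TOKWRAP_RCDIR} ? "$ENV{TOKWRAP_RCDIR}/waste/waste.rc" : ()),
  (defined($ENV{HOME}) ? "$ENV{HOME}/.wasterc" : ()),
  "/etc/wasterc",
  "/etc/default/wasterc",
);

my $rc = first_readable_rc(@paths);
print defined($rc) ? "using $rc\n" : "no waste.rc found\n";
```

Passing an explicit wasterc argument to new() avoids relying on any search order.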
Constructors etc.
- new
$fmt = CLASS_OR_OBJ->new(%args);
object structure: assumed HASH
{
 ##-- Input
 doc => $doc,          ##-- buffered input document
 wasterc => $rcFile,   ##-- waste .rc file; default: searched for in @DEFAULT_WASTERC_PATHS
 ##-- Runtime
 wmodel => \%wmodel,   ##-- waste model; %wmodel=(
                       #   config => \%config,     #-- parsed rcfile (see loadModelConfig())
                       #   loaded => $time,        #-- unix timestamp of last model load
                       #   wscanner => $scanner,   #-- waste scanner
                       #   wlexer => $lexer,       #-- waste lexer
                       #   wtagger => $tagger,     #-- waste tagger
                       #   wdecoder => $decoder,   #-- waste decoder
                       #   wannotator => $wannot,  #-- waste annotator
                       #   wwriter => $wwriter,    #-- native-format writer (hack)
                       # )
 ##-- logging (in order of increasing verbosity)
 logLoad => $level,    # model loading log-level (default=$logLoad)
 logCache => $level,   # cache operation log-level (default=$logCache)
 logRun => $level,     # runtime operation log-level (default=$logRun)
 ##-- Common
 #utf8 => $bool,       ##-- utf8 mode always on
}
Methods: Persistence
- noSaveKeys
@keys = $class_or_obj->noSaveKeys();
Returns the list of keys not to be saved; this override appends qw(doc wmodel wscanner wlexer wtagger wdecoder wannotator wwriter).
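The override pattern described above can be sketched with stand-in classes. The packages My::Format::Base and My::Format::Waste, the inherited key fh, and the helper savable are all hypothetical; the real base class is DTA::CAB::Format::Raw::Base, whose own key list is not reproduced here.

```perl
use strict;
use warnings;

## Stand-in base class, for illustration only.
package My::Format::Base;
sub new { my ($that, %args) = @_; return bless({%args}, ref($that) || $that); }
sub noSaveKeys { return qw(fh); }   ##-- hypothetical inherited key

package My::Format::Waste;
our @ISA = ('My::Format::Base');

## Override: inherited keys plus the runtime-only keys listed above.
sub noSaveKeys {
  my $that = shift;
  return ($that->SUPER::noSaveKeys,
          qw(doc wmodel wscanner wlexer wtagger wdecoder wannotator wwriter));
}

## Hypothetical helper: shallow copy of the object without its no-save keys.
sub savable {
  my $obj  = shift;
  my %skip = map { ($_ => 1) } $obj->noSaveKeys;
  return { map { ($_ => $obj->{$_}) } grep { !$skip{$_} } keys %$obj };
}

package main;
my $fmt  = My::Format::Waste->new(wasterc => '/etc/wasterc', doc => {}, wmodel => {});
my $save = $fmt->savable;
print join(',', sort keys %$save), "\n";
```

Only configuration such as wasterc survives in the copy; the buffered document and the loaded model are dropped, since they can be rebuilt at load time.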
Methods: Local: model caching
- Variable: %MODELS
Cached models ("$wasterc_abspath:$PID" => \%wmodel)
- ensureModel
\%wmodel_or_undef = $fmt->ensureModel();
\%wmodel_or_undef = $fmt->ensureModel($wasterc);
\%wmodel_or_undef = CLASS->ensureModel($wasterc);
Returns the cached model if one is available; otherwise loads the model and stores it in the cache.
- loadModelConfig
\%config = CLASS_OR_OBJECT->loadModelConfig($wasterc);
Loads the rc-file $wasterc and returns its parsed configuration, with keys qw(abbrevs conjunctions stopwords dehyphenate hmm).
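The caching scheme behind %MODELS and ensureModel() can be sketched as follows: a minimal illustration assuming a cache keyed by the absolute rc-file path plus the process ID, as in the documented key format. The functions ensure_model and load_model_stub and the counter $N_LOADS are hypothetical stand-ins; the real method constructs the waste scanner, lexer, tagger and so on, which is not shown. Including the PID in the key presumably keeps a forked child from reusing its parent's model, though the source does not state the rationale.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempfile);

our %MODELS;        ##-- cache: "$wasterc_abspath:$PID" => \%wmodel
our $N_LOADS = 0;   ##-- counts stub loader calls (illustration only)

## Stub loader standing in for the real, expensive model construction.
sub load_model_stub {
  my $rcfile = shift;
  ++$N_LOADS;
  return { config => { rcfile => $rcfile }, loaded => time() };
}

## Return the cached model for $rcfile, loading it on a cache miss.
## Returns undef if the rc-file does not exist.
sub ensure_model {
  my $rcfile = shift;
  my $abs = abs_path($rcfile);
  return undef if !defined($abs) || !-e $abs;
  my $key = "$abs:$$";   ##-- $$ is the current process ID
  $MODELS{$key} = load_model_stub($abs) if !$MODELS{$key};
  return $MODELS{$key};
}

## An empty temporary file stands in for a real waste.rc.
my ($fh, $rcfile) = tempfile(UNLINK => 1);
close($fh);

my $m1 = ensure_model($rcfile);
my $m2 = ensure_model($rcfile);
print "loads=$N_LOADS same=", ($m1 == $m2 ? 1 : 0), "\n";
```

The second call returns the same hash reference without invoking the loader again.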
Methods: Model I/O
- ensureLoaded
$fmt_or_undef = $fmt->ensureLoaded();
Ensures that the model is loaded.
- loadModel
$fmt_or_undef = $fmt->loadModel();
$fmt_or_undef = $fmt->loadModel($rcfile);
Backwards-compatible method; wraps ensureModel().
Methods: Input: Input selection
- close
$fmt = $fmt->close();
(undocumented)
- fromFh
$fmt = $fmt->fromFh($fh);
Selects input from a filehandle.
Methods: Input: Generic API
- parseDocument
$doc = $fmt->parseDocument();
(undocumented)
Methods: Output: Generic
- mimeType
$type = $fmt->mimeType();
Default returns text/plain.
- defaultExtension
$ext = $fmt->defaultExtension();
Returns the default filename extension for this format (.raw).
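The methods documented above might be combined as in the following sketch. It uses only calls listed in this page (new, ensureLoaded, fromFh, parseDocument, close, mimeType, defaultExtension) and assumes that DTA::CAB is installed and that a waste.rc can be found at one of the default locations. Opening the input without an explicit encoding layer is also an assumption, based on the note that utf8 mode is always on. If the module or an input file is missing, the script only prints a usage line.

```perl
use strict;
use warnings;

my $infile   = shift @ARGV;
my $have_cab = eval { require DTA::CAB::Format::Raw::Waste; 1 };

if (!$have_cab || !defined($infile)) {
  print "usage: $0 FILE (requires DTA::CAB::Format::Raw::Waste)\n";
}
else {
  ##-- no wasterc argument: rely on the default search paths
  my $fmt = DTA::CAB::Format::Raw::Waste->new();
  $fmt->ensureLoaded()
    or die "$0: could not load waste model\n";

  open(my $fh, '<', $infile)
    or die "$0: open failed for '$infile': $!\n";
  $fmt->fromFh($fh);                 ##-- select input
  my $doc = $fmt->parseDocument();   ##-- tokenize and parse
  $fmt->close();

  printf "parsed %s into %s (mime type %s, extension %s)\n",
    $infile, (ref($doc) || 'undef'),
    $fmt->mimeType(), $fmt->defaultExtension();
}
```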
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2011-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...