NAME

DTA::CAB::Format::Raw::HTTP - Document parser: raw untokenized text via HTTP tokenizer API

SYNOPSIS

 use DTA::CAB::Format::Raw::HTTP;
 
 ##========================================================================
 ## Methods
 
 $fmt = DTA::CAB::Format::Raw::HTTP->new(%args);
 @keys = $class_or_obj->noSaveKeys();
 $fmt = $fmt->close();
 $fmt = $fmt->parseRawString(\$str);
 $doc = $fmt->parseDocument();
 $type = $fmt->mimeType();
 $ext = $fmt->defaultExtension();
 

DESCRIPTION

DTA::CAB::Format::Raw::HTTP is an input-only DTA::CAB::Format subclass for untokenized raw string input. It uses LWP::UserAgent to send the raw text to a tokenization server via HTTP.
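
The HTTP round trip can be pictured with the sketch below. It is not the module's implementation. It swaps LWP::UserAgent for the core module HTTP::Tiny, and it assumes (unverified) that the raw text is sent as a form-encoded POST parameter named by txtparam and that the server returns the tokenized text in the response body. The helper names build_form_body and tokenize_via_http are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

## Defaults copied from the documented new() arguments.
my $TOKURL   = 'http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc';
my $TXTPARAM = 't';

## Hypothetical helper: form-encode the raw text under the text parameter name.
sub build_form_body {
  my ($text, $param) = @_;
  return HTTP::Tiny->new->www_form_urlencode({ $param => $text });
}

## Hypothetical helper: POST the raw text to the tokenizer and return the body.
## ASSUMPTION: the server accepts a form-encoded POST with the text in $param.
sub tokenize_via_http {
  my ($text, %opts) = @_;
  my $url   = $opts{tokurl}   // $TOKURL;
  my $param = $opts{txtparam} // $TXTPARAM;
  my $http  = HTTP::Tiny->new(timeout => $opts{timeout} // 300);
  my $res   = $http->post_form($url, { $param => $text });
  die "tokenizer request failed: $res->{status} $res->{reason}\n"
    if !$res->{success};
  return $res->{content};
}
```

For example, build_form_body("Hello world.", 't') returns the string "t=Hello+world." (HTTP::Tiny encodes spaces as +). Calling tokenize_via_http() requires network access to the tokenizer server.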

Methods

new
 $fmt = CLASS_OR_OBJ->new(%args);

%$fmt, %args:

 ##-- Input
 doc       => $doc,      ##-- buffered input document
 tokurl    => $url,      ##-- tokenizer (default='http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc')
 txtparam  => $param,    ##-- text query parameter (default='t')
 timeout   => $secs,     ##-- user agent timeout (default=300)
 ua        => $agent,    ##-- underlying LWP::UserAgent
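
The argument list above suggests a constructor that merges caller-supplied %args over per-key defaults. The sketch below shows that pattern in core Perl only. The package name My::RawHTTPSketch is hypothetical, and the real class may set up its user agent differently; here ua simply stays undefined.

```perl
package My::RawHTTPSketch;   # hypothetical stand-in, not DTA::CAB::Format::Raw::HTTP
use strict;
use warnings;

## Documented defaults; caller-supplied %args override them key by key.
my %DEFAULTS = (
  doc      => undef,
  tokurl   => 'http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc',
  txtparam => 't',
  timeout  => 300,
  ua       => undef,   # the real class keeps an LWP::UserAgent here
);

sub new {
  my ($that, %args) = @_;
  my $class = ref($that) || $that;   # allow both CLASS->new and $obj->new
  return bless { %DEFAULTS, %args }, $class;
}

package main;

my $fmt = My::RawHTTPSketch->new(timeout => 60);
print "$fmt->{txtparam} $fmt->{timeout}\n";   # prints "t 60"
```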

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns the list of object keys that should not be saved. This override returns qw(doc ua).
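
One plausible use of noSaveKeys() is to drop transient fields (the buffered document and the user agent) before an object is saved. The helper savable_fields below is hypothetical and only illustrates that filtering; it does not reproduce the DTA::CAB persistence code.

```perl
use strict;
use warnings;

## Mirrors the documented override: keys that should not be saved.
sub noSaveKeys { return qw(doc ua) }

## Hypothetical helper: shallow copy of %$obj without the noSaveKeys().
sub savable_fields {
  my $obj  = shift;
  my %skip = map { ($_ => 1) } noSaveKeys();
  return { map { ($_ => $obj->{$_}) } grep { !$skip{$_} } keys %$obj };
}

my $fmt  = { doc => 'buffered', ua => 'agent', txtparam => 't', timeout => 300 };
my $save = savable_fields($fmt);
print join(',', sort keys %$save), "\n";   # prints "timeout,txtparam"
```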

close
 $fmt = $fmt->close();

Deletes the buffered input document, if any.

fromString
 $fmt = $fmt->fromString($string);

Selects input from the string $string.

parseRawString
 $fmt = $fmt->parseRawString(\$str);

Guts for fromString(): parses the string $str into the local document buffer.

parseDocument
 $doc = $fmt->parseDocument();

Wrapper for $fmt->{doc}: returns the buffered input document.
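
The input flow described above (fromString() hands the text to parseRawString(), which fills $fmt->{doc}; parseDocument() returns that buffer; close() deletes it) can be sketched offline as follows. The class name is hypothetical, and the HTTP tokenizer call is replaced by a naive whitespace split. The document layout used here (body, then sentences, then tokens with a text field) is a simplified assumption, not a faithful copy of DTA::CAB's document model.

```perl
package My::RawBufferSketch;   # hypothetical; not the real DTA::CAB class
use strict;
use warnings;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

## Select input from a string by delegating to parseRawString().
sub fromString {
  my ($fmt, $string) = @_;
  return $fmt->parseRawString(\$string);
}

## Fill the local document buffer. STAND-IN: a whitespace split replaces
## the HTTP tokenizer query made by the real module.
sub parseRawString {
  my ($fmt, $strref) = @_;
  my @toks = map { +{ text => $_ } } grep { length } split /\s+/, $$strref;
  $fmt->{doc} = { body => [ { tokens => \@toks } ] };
  return $fmt;
}

## Return the buffered document.
sub parseDocument { return $_[0]{doc} }

## Drop the buffered document, if any.
sub close { my $fmt = shift; delete $fmt->{doc}; return $fmt }

package main;

my $fmt = My::RawBufferSketch->new();
my $doc = $fmt->fromString("Hello world .")->parseDocument();
print scalar(@{ $doc->{body}[0]{tokens} }), "\n";   # prints 3
$fmt->close();
```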

mimeType
 $type = $fmt->mimeType();

Returns the MIME type for this format; the default is text/plain.

defaultExtension
 $ext = $fmt->defaultExtension();

Returns default filename extension for this format, here '.raw'.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-convert.perl(1), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...