
NAME

DTA::CAB::Analyzer::LangId::Simple - simple language guesser using stopword lists

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DTA::CAB::Analyzer::LangId::Simple;
 
 ##========================================================================
 ## Methods: Constructors etc.
 
 $lid = DTA::CAB::Analyzer::LangId::Simple->new(%args);
 
 ##========================================================================
 ## Methods: Prepare
 
 $bool = $lid->ensureLoaded();
 
 ##========================================================================
 ## Methods: Analysis: v1.x: API
 
 $doc = $anl->analyzeTypes($doc,\%types,\%opts);
 $doc = $anl->analyzeSentences($doc,\%opts);
 

DESCRIPTION

Methods: Constructors etc.

new
 $obj = CLASS_OR_OBJ->new(%args)

Creates a new simple language-guesser object, which inherits from DTA::CAB::Analyzer::Dict::Json. Known options in %args:

 ##-- analysis selection
 label      => 'lang', ##-- analyzer label
 defaultLang => 'de',  ##-- default language (if e.g. known by 'morph')
 defaultCount => 0.1,  ##-- bonus count for default lang (characters)
 minSentLen   => 2,    ##-- minimum number of tokens in sentence required before guessing
 minSentChars => 8,    ##-- minimum number of text characters in sentence required before guessing
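
The options above suggest how the thresholds and the default-language bonus might interact. The following is only a minimal, self-contained sketch of stopword-based guessing, not this module's actual implementation: the function guess_lang(), the tiny %STOPWORDS table, and the scoring rule (summing the character lengths of matched stopwords) are all assumptions made for illustration. Only the option names and their default values are taken from the list above.

```perl
use strict;
use warnings;

# Hypothetical stopword lists for illustration; not the module's data files.
my %STOPWORDS = (
    de => { map { $_ => 1 } qw(der die das und ist nicht ein eine) },
    en => { map { $_ => 1 } qw(the and is not a an of to) },
);

# Guess the language of one sentence, given as an array-ref of token strings.
# Option names mirror the documented defaults; the scoring rule is assumed.
sub guess_lang {
    my ($tokens, %opts) = @_;
    my $default_lang  = $opts{defaultLang}  // 'de';
    my $default_count = $opts{defaultCount} // 0.1;
    my $min_len       = $opts{minSentLen}   // 2;
    my $min_chars     = $opts{minSentChars} // 8;

    # Sentences below either threshold are not guessed at all.
    my $nchars = 0;
    $nchars += length($_) for @$tokens;
    return $default_lang if @$tokens < $min_len || $nchars < $min_chars;

    # Sum stopword characters per language; the default starts with a bonus.
    my %count = ($default_lang => $default_count);
    for my $tok (@$tokens) {
        my $w = lc $tok;
        for my $lang (keys %STOPWORDS) {
            $count{$lang} += length($w) if $STOPWORDS{$lang}{$w};
        }
    }

    # Highest count wins; ties are broken by language name for determinism.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $best;
}

print guess_lang([qw(the cat is not a dog)]), "\n";
print guess_lang([qw(der Hund ist nicht eine Katze)]), "\n";
print guess_lang([qw(the)]), "\n";
```

In this sketch the small defaultCount bonus only matters when no stopword matches or when two languages would otherwise tie, and a sentence shorter than minSentLen tokens or minSentChars characters falls back to defaultLang without any counting.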

Methods: Prepare

ensureLoaded
 $bool = $lid->ensureLoaded();

Ensures analyzer data is loaded from default files.

Methods: Analysis: v1.x: API

analyzeTypes
 $doc = $anl->analyzeTypes($doc,\%types,\%opts);

Performs type-wise analysis of all (text) types in $doc->{types}.

analyzeSentences
 $doc = $anl->analyzeSentences($doc,\%opts);

Performs sentence-wise analysis of all sentences in $doc->{body}.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...