
NAME

DTA::CAB::Analyzer::Automaton - generic analysis automaton API

SYNOPSIS

 use DTA::CAB::Analyzer::Automaton;
 
 ##========================================================================
 ## Constructors etc.
 
 $obj = CLASS_OR_OBJ->new(%args);
 $aut = $aut->clear();
 
 ##========================================================================
 ## Methods: Generic
 
 $class = $aut->fstClass();
 $class = $aut->labClass();
 $bool = $aut->fstOk();
 $bool = $aut->labOk();
 
 ##========================================================================
 ## Methods: I/O
 
 $bool = $aut->ensureLoaded();
 $aut = $aut->load(fst=>$fstFile, lab=>$labFile);
 $aut = $aut->loadFst($fstfile);
 $aut = $aut->loadLabels($labfile);
 $aut = $aut->parseLabels();
 
 ##========================================================================
 ## Methods: Persistence: Perl
 
 @keys = $class_or_obj->noSaveKeys();
 $loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
 
 ##========================================================================
 ## Methods: Analysis
 
 $bool = $anl->canAnalyze();
 $doc = $anl->analyzeTypes($doc,\%types,\%opts);
 

DESCRIPTION

Globals

Variable: @ISA

DTA::CAB::Analyzer::Automaton inherits from DTA::CAB::Analyzer.

Constructors etc.

new
 $aut = CLASS_OR_OBJ->new(%args);

Constructor.

%args, %$aut:

 ##-- Filename Options
 fstFile => $filename,     ##-- source FST file (default: none)
 labFile => $filename,     ##-- source labels file (default: none)
 ##
 ##-- Analysis Output
 analyzeGet     => $code,  ##-- accessor: coderef or string: source text (default=$DEFAULT_ANALYZE_GET; return undef for no analysis)
 analyzeSet     => $code,  ##-- accessor: coderef or string: set analyses (default=$DEFAULT_ANALYZE_SET)
 wantAnalysisLo => $bool,     ##-- set to true to include 'lo'    keys in analyses (default: true)
 wantAnalysisLemma => $bool,  ##-- set to true to include 'lemma' keys in analyses (default: false)
 ##
 ##-- Analysis Options
 eow            => $sym,  ##-- EOW symbol for analysis FST
 check_symbols  => $bool, ##-- check for unknown symbols? (default=1)
 labenc         => $enc,  ##-- encoding of labels file (default='auto': utf8 if valid, else latin1)
 auto_connect   => $bool, ##-- whether to call $result->_connect() after every lookup   (default=0)
 tolower        => $bool, ##-- if true, all input words will be bashed to lower-case (default=0)
 tolowerNI      => $bool, ##-- if true, all non-initial characters of inputs will be lower-cased (default=0)
 toupperI       => $bool, ##-- if true, initial character will be upper-cased (default=0)
 bashWS         => $str,  ##-- if defined, input whitespace will be bashed to '$str' (default='_')
 attInput       => $bool, ##-- if true, respect AT&T lextools-style escapes in input (default=0)
 attOutput      => $bool, ##-- if true, generate AT&T escapes in output (default=1)
 allowTextRegex => $re,   ##-- if defined, only tokens with matching 'text' will be analyzed (default: none)
                          ##   : useful: /(?:^[[:alpha:]\-\x{ac}]*[[:alpha:]]+$)|(?:^[[:alpha:]]+[[:alpha:]\-\x{ac}]+$)/
 ##-- Analysis objects
 fst  => $gfst,      ##-- (child classes only) e.g. a Gfsm::Automaton object (default=new)
 lab  => $lab,       ##-- (child classes only) e.g. a Gfsm::Alphabet object (default=new)
 labh => \%sym2lab,  ##-- (?) label hash:  $sym2lab{$labSym} = $labId;
 laba => \@lab2sym,  ##-- (?) label array:  $lab2sym[$labId]  = $labSym;
 labc => \@chr2lab,  ##-- (?) chr-label array: $chr2lab[ord($chr)] = $labId; indexed by Unicode character number (e.g. unpack('U0U*'))
 ##
 ##-- INHERITED from DTA::CAB::Analyzer
 label => $label,    ##-- analyzer label (default: from analyzer class name)
 typeKeys => \@keys, ##-- type-wise keys to expand
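
The input normalization options above (bashWS, tolower, tolowerNI, toupperI) can be illustrated with a minimal stand-alone sketch. The helper normalize_input() is hypothetical and not part of DTA::CAB; the order in which the steps are applied, and the collapsing of whitespace runs into a single replacement string, are assumptions made for illustration only.

```perl
use strict;
use warnings;

## normalize_input() is a hypothetical helper, not part of DTA::CAB.
## It mimics the documented options; the order of the steps is an assumption.
sub normalize_input {
  my ($word, %opts) = @_;
  $opts{bashWS} = '_' unless exists $opts{bashWS};          ##-- documented default
  $word =~ s/\s+/$opts{bashWS}/g if defined $opts{bashWS};  ##-- replace whitespace runs
  $word = lc($word)                  if $opts{tolower};     ##-- whole word to lower-case
  $word =~ s/^(.)(.*)$/$1\L$2\E/s    if $opts{tolowerNI};   ##-- keep initial, lower-case the rest
  $word = ucfirst($word)             if $opts{toupperI};    ##-- upper-case initial character
  return $word;
}

print normalize_input('New  York'), "\n";               ##-- New_York
print normalize_input('HAUS', tolowerNI => 1), "\n";    ##-- Haus
print normalize_input('haus', toupperI  => 1), "\n";    ##-- Haus
print normalize_input('a b', bashWS => undef), "\n";    ##-- a b (no whitespace bashing)
```

Passing bashWS => undef disables whitespace replacement in this sketch, matching the "if defined" wording of the option.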

clear
 $aut = $aut->clear();

Clears the object.

Methods: Generic

fstClass
 $class = $aut->fstClass();

Returns the default FST class for the loadFst() method. Used by subclasses.

labClass
 $class = $aut->labClass();

Returns the default alphabet class for the loadLabels() method. Used by subclasses.

fstOk
 $bool = $aut->fstOk();

Should return false iff fst is undefined or "empty".

labOk
 $bool = $aut->labOk();

Should return false iff alphabet (label-set) is undefined or "empty".

Methods: I/O

ensureLoaded
 $bool = $aut->ensureLoaded();

Ensures automaton data is loaded from default files.

load
 $aut = $aut->load(fst=>$fstFile, lab=>$labFile);

Loads specified files.

loadFst
 $aut = $aut->loadFst($fstfile);

Loads automaton from $fstfile.

loadLabels
 $aut = $aut->loadLabels($labfile);

Loads labels from $labfile.

parseLabels
 $aut = $aut->parseLabels();

Parses some information from a (newly loaded) alphabet.

  • sets up $aut->{labh}, $aut->{laba}, $aut->{labc}

  • fixes encoding difficulties in $aut->{labh}, $aut->{laba}
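
The three label structures can be pictured with a small stand-alone sketch. It uses a toy alphabet rather than a real Gfsm::Alphabet object, and the rule that only single-character symbols are entered into the character array is an assumption; the actual parseLabels() may differ.

```perl
use strict;
use warnings;

## Toy alphabet (illustrative data only): index = label id, value = label symbol.
my @lab2sym = ('<eps>', 'a', 'b', '[NN]');   ##-- corresponds to $aut->{laba}

my (%sym2lab, @chr2lab);                     ##-- correspond to $aut->{labh}, $aut->{labc}
for my $labId (0 .. $#lab2sym) {
  my $sym = $lab2sym[$labId];
  $sym2lab{$sym} = $labId;
  ## assumption: only single-character symbols are indexed by character number
  $chr2lab[ord($sym)] = $labId if length($sym) == 1;
}

## Map an input word to label ids by character number.
my @labs = map { $chr2lab[ord($_)] } split(//, 'abba');
print "@labs\n";              ##-- 1 2 2 1
print $sym2lab{'[NN]'}, "\n"; ##-- 3
```

An array indexed by character number makes the per-character lookup a plain array access, which is presumably why a separate labc structure is kept alongside the hash.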

Methods: Persistence: Perl

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns the list of keys not to be saved.

This implementation returns:

 qw(dict fst lab laba labc labh result)
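
A minimal sketch of how such a key list might be applied before saving: the object hash and file names below are toy data, and the filtering step is an assumption about how a persistence layer could use noSaveKeys(), not a description of the actual DTA::CAB code.

```perl
use strict;
use warnings;

## Keys documented above as excluded from persistence.
my %nosave = map { ($_ => 1) } qw(dict fst lab laba labc labh result);

## Toy object hash (hypothetical file names, placeholder runtime data).
my %aut = (
  fstFile => 'example.gfst',
  labFile => 'example.lab',
  tolower => 0,
  fst     => {},   ##-- stands in for a loaded automaton
  laba    => [],   ##-- stands in for a parsed label array
);

## Keep only the keys that are not listed in %nosave.
my %saved = map { ($_ => $aut{$_}) } grep { !$nosave{$_} } keys %aut;
print join(' ', sort keys %saved), "\n";   ##-- fstFile labFile tolower
```

Only the configuration survives; the automaton and label data are expected to be reloaded from fstFile and labFile.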

loadPerlRef
 $loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);

Implicitly calls $obj->clear().

Methods: Analysis

canAnalyze
 $bool = $anl->canAnalyze();

Returns true if the analyzer can perform its function (e.g. data is loaded and non-empty). This implementation just returns:

 ($anl->labOk && $anl->fstOk)
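
The contract between fstOk(), labOk() and canAnalyze() can be sketched with a toy class. Toy::Automaton is hypothetical and does not inherit from DTA::CAB::Analyzer; representing the FST as an array ref and the alphabet as a hash ref is an assumption chosen to keep the example self-contained.

```perl
use strict;
use warnings;

package Toy::Automaton;   ##-- hypothetical stand-in for a child class

sub new {
  my ($class, %args) = @_;
  return bless { fst => [], lab => {}, %args }, $class;
}

## false iff the fst is undefined or empty
sub fstOk { my $aut = shift; return (defined($aut->{fst}) && @{$aut->{fst}}) ? 1 : 0; }

## false iff the alphabet is undefined or empty
sub labOk { my $aut = shift; return (defined($aut->{lab}) && %{$aut->{lab}}) ? 1 : 0; }

## same expression as documented above
sub canAnalyze { my $anl = shift; return ($anl->labOk && $anl->fstOk); }

package main;

my $empty  = Toy::Automaton->new();
my $loaded = Toy::Automaton->new(fst => [[0, 1, 'a']], lab => { a => 1 });

print(($empty->canAnalyze  ? 'ready' : 'not ready'), "\n");   ##-- not ready
print(($loaded->canAnalyze ? 'ready' : 'not ready'), "\n");   ##-- ready
```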

analyzeTypes
 $doc = $anl->analyzeTypes($doc,\%types,\%opts);

Perform type-wise analysis of all (text) types in %types (= %{$doc->{types}}).

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), DTA::CAB::Analyzer(3pm), DTA::CAB::Chain(3pm), DTA::CAB(3pm), perl(1), ...