NAME
DTA::CAB::Analyzer::Morph::SMOR - morphological analysis via Gfsm automata, for SMOR-style transducers (e.g. Zmorge)
SYNOPSIS
use DTA::CAB::Analyzer::Morph::SMOR;
$morph = DTA::CAB::Analyzer::Morph::SMOR->new(%args);
$morph->analyze($tok);
DESCRIPTION
DTA::CAB::Analyzer::Morph::SMOR is a subclass of DTA::CAB::Analyzer::Morph::Helsinki::DE suitable for use with SMOR-style transducers, including Zmorge transducers as produced by the SMORLemma grammar.
To produce a GFSM transducer (zmorge.gfst) and vocabulary (zmorge.lab) suitable for use with this module from one of the binary SFST-format transducers available from https://pub.cl.uzh.ch/users/sennrich/zmorge/, do something like the following (on Debian, at least):
sudo apt-get install sfst unzip wget sed gawk
wget https://pub.cl.uzh.ch/users/sennrich/zmorge/transducers/zmorge-20150315-smor_newlemma.a.zip
unzip zmorge-20150315-smor_newlemma.a.zip
fst-print zmorge-20150315-smor_newlemma.a | sed 's/ /_/g;' > zmorge.tfst
cat zmorge.tfst \
| awk -F$'\t' '{ if (NF >= 4) { print $3 "\n" $4 } }' \
| sed 's/^<>$//;' \
| sort -u \
| sed 's/^$/<>/;' \
| awk '{print $1 "\t" NR-1}' \
> zmorge.lab
gfsmcompile -z0 -l zmorge.lab zmorge.tfst | gfsminvert -z0 | gfsmarcsort -l -F zmorge.gfst
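The label-file steps above can be tried on a toy input before running them on the full transducer. The following is only a sketch: the three-line toy.tfst content and the temporary directory are invented for illustration, and awk -F'\t' is used in place of the bash-specific -F$'\t' so that the commands also run under a plain POSIX shell. It shows that the <> symbol is sorted to the front and therefore receives label 0, while every other symbol gets a consecutive number.

```shell
# Toy stand-in for the fst-print output (hypothetical content).
# The pipeline only relies on arc lines having at least four tab-separated
# fields, with the symbols in columns 3 and 4; shorter lines are skipped.
tmpdir=$(mktemp -d)
printf '0\t1\ta\ta\n1\t2\t<>\t<+NN>\n2\n' > "$tmpdir/toy.tfst"

# Same steps as for zmorge.lab, applied to the toy file.
awk -F'\t' '{ if (NF >= 4) { print $3 "\n" $4 } }' "$tmpdir/toy.tfst" \
  | sed 's/^<>$//' \
  | LC_ALL=C sort -u \
  | sed 's/^$/<>/' \
  | awk '{ print $1 "\t" NR-1 }' \
  > "$tmpdir/toy.lab"

# Show the resulting symbol-to-label table.
cat "$tmpdir/toy.lab"
```

Temporarily replacing <> with an empty line is what guarantees its position: an empty line sorts before any non-empty symbol, so it is numbered first regardless of which other symbols occur.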
You can then test the compiled transducer with this module by calling e.g.:
dta-cab-analyze.perl -ac=Morph::SMOR -ao=fstFile=zmorge.gfst -ao=labFile=zmorge.lab -fc=text -w Vermittlungsgespräche
which should produce something like the following output:
Vermittlungsgespräche
+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Dat>][<Sg>][<Old>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Gen>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Dat>][<Sg>][<Old>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Gen>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>
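For quick inspection of such output, the distinct segmentations can be pulled out with standard text tools. The sketch below assumes that analysis lines have the shape shown above (a +[morph] marker, then lemma[_TAG]=segmentation followed by bracketed tags, then a weight); the sample file is a hypothetical excerpt built from three of the lines above, not real tool output captured from a run.

```shell
# Hypothetical sample: three analysis lines copied from the output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Vermittlungsgespräche
+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>
+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
EOF

# Keep the analysis field, drop everything up to '=' and everything from the
# first "[<+" tag onwards, then list each remaining segmentation once.
segs=$(awk '$1 == "+[morph]" { print $2 }' "$sample" \
  | sed 's/^[^=]*=//; s/\[<+.*$//' \
  | LC_ALL=C sort -u)
printf '%s\n' "$segs"
```

Because awk splits on runs of whitespace by default, the same commands also work when the analysis lines are indented.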
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2021 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.