
NAME

Lingua::Foma - XS Bindings to the Foma Finite State Morphology Toolkit

SYNOPSIS

  use Lingua::Foma;

  # Create a new transducer based on regular expressions
  my $fsm = Lingua::Foma->new("{climb}|{jump}|{track}");

  # Modify the transducer using foma operations
  $fsm->unify("{paint}")->concat("{ing}:0");

  # Check some automaton properties
  print $fsm->arc_count;
  # 22

  # Save newly created transducer
  $fsm->save("my_transducer.foma");

  # Load a transducer built with foma
  my $new = Lingua::Foma->load("my_transducer.foma");

  # Iterate through transduced results
  my $i = $new->down("climbing");
  while (defined(my $string = $i->next)) {
    print $string, "\n";
    # climb
  }

DESCRIPTION

Foma is a C library for dealing with finite state automata and transducers, with a main focus on applications in the field of Finite State Morphology. This module is an XS binding to the Foma library and supports a wide range of its API. In most cases you will probably only need to load transducers created with Foma (using LexC and xfst) and apply them to strings, but this module also provides methods to create and modify automata and to retrieve their properties.

This module is a developer release: the API may change without notice!

ATTRIBUTES

arc_count

  print $fsm->arc_count;

Return the number of arcs.

final_count

  print $fsm->final_count;

Return the number of final states.

is_completed

  if ($fsm->is_completed) {
    print "Complete";
  }
  elsif (defined $fsm->is_completed) {
    print "Incomplete";
  }
  else {
    print "Status unknown";
  }

Check whether the automaton is complete. Returns undef if the status is unknown; otherwise returns a true or false value.
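This attribute and the following is_* attributes all share the same three-valued convention, so a small helper can turn any of them into a label. The sub tristate_label below is a hypothetical helper, not part of Lingua::Foma; it only assumes the true/false/undef convention described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Foma): map the three-valued
# result of an is_* attribute to a readable label.
sub tristate_label {
    my ($value, $yes, $no) = @_;
    return 'Status unknown' unless defined $value;  # undef: status unknown
    return $value ? $yes : $no;                     # otherwise true or false
}

# With a real automaton this could be called as:
#   print tristate_label($fsm->is_completed, 'Complete', 'Incomplete'), "\n";
print tristate_label(1,     'Complete', 'Incomplete'), "\n";  # Complete
print tristate_label(0,     'Complete', 'Incomplete'), "\n";  # Incomplete
print tristate_label(undef, 'Complete', 'Incomplete'), "\n";  # Status unknown
```

Checking definedness first is what separates "unknown" from "false", since undef is itself a false value in Perl.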

is_deterministic

  if ($fsm->is_deterministic) {
    print "Deterministic";
  }
  elsif (defined $fsm->is_deterministic) {
    print "Nondeterministic";
  }
  else {
    print "Status unknown";
  }

Check whether the automaton is deterministic. Returns undef if the status is unknown; otherwise returns a true or false value.

is_epsilon_free

  if ($fsm->is_epsilon_free) {
    print "Has no epsilon arcs";
  }
  elsif (defined $fsm->is_epsilon_free) {
    print "Has epsilon arcs";
  }
  else {
    print "Status unknown";
  }

Check whether the automaton is free of epsilon arcs. Returns undef if the status is unknown; otherwise returns a true or false value.

is_loop_free

  if ($fsm->is_loop_free) {
    print "Has no loops";
  }
  elsif (defined $fsm->is_loop_free) {
    print "Has loops";
  }
  else {
    print "Status unknown";
  }

Check whether the automaton is free of loops. Returns undef if the status is unknown; otherwise returns a true or false value.

is_minimized

  if ($fsm->is_minimized) {
    print "Is minimized";
  }
  elsif (defined $fsm->is_minimized) {
    print "Is not minimized";
  }
  else {
    print "Status unknown";
  }

Check whether the automaton is minimized. Returns undef if the status is unknown; otherwise returns a true or false value.

is_pruned

  if ($fsm->is_pruned) {
    print "Is pruned";
  }
  elsif (defined $fsm->is_pruned) {
    print "Is not pruned";
  }
  else {
    print "Status unknown";
  }

Check whether the automaton is pruned. Returns undef if the status is unknown; otherwise returns a true or false value.

state_count

  print $fsm->state_count;

Return the number of states.

CONSTRUCTION METHODS

new

  my $fsm = Lingua::Foma->new('{cat}');

Create a new automaton based on a regular expression.

clone

  my $fsm2 = $fsm1->clone;

Create an exact copy of a finite state transducer.

AUTOMATON METHODS

Automaton methods modify existing automata. They are destructive: the invocant is modified in place, without making a copy, so use clone first if the original automaton must be kept. All automaton methods return their invocant, so calls can be chained.
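Because Perl objects are references, assigning an automaton to another variable does not copy it: a destructive call through either variable changes the one shared object, while clone yields an independent copy. The sketch below models this with Toy::FSM, a hypothetical stand-in that is not Lingua::Foma; it only records an equivalent regular expression string and builds no automaton.

```perl
use strict;
use warnings;

# Toy stand-in (hypothetical, not Lingua::Foma): it only records the
# equivalent regular expression to show destructive, chainable methods.
package Toy::FSM;

sub new   { my ($class, $re) = @_; return bless { re => $re }, $class }
sub regex { $_[0]{re} }

# Destructive: changes the invocant in place, then returns it for chaining.
sub unify {
    my ($self, @others) = @_;
    $self->{re} = join '|', $self->{re}, map { ref($_) ? $_->regex : $_ } @others;
    return $self;
}

# Non-destructive copy, analogous to clone.
sub clone { my $self = shift; return bless { %$self }, ref $self }

package main;

my $fsm   = Toy::FSM->new('{climb}');
my $alias = $fsm;          # copies the reference, not the automaton
my $copy  = $fsm->clone;   # independent copy

$fsm->unify('{jump}')->unify('{track}');

print $alias->regex, "\n";  # {climb}|{jump}|{track}
print $copy->regex,  "\n";  # {climb}
```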

unify

  # Regular Expression: A | B

  my $fsm_1 = Lingua::Foma->new('{climb}');
  my $fsm_2 = Lingua::Foma->new('{jump}');

  # Unify one transducer with another
  $fsm_1 = $fsm_1->unify($fsm_2);
  # Regex: {climb}|{jump}

  # Unify multiple automata
  $fsm_1->unify('{check}', '{restrict}');
  # Regex: {climb}|{jump}|{check}|{restrict}

Build the union of the invocant and one or more further automata. The result may be nondeterministic and nonminimal. Automata can be passed as Lingua::Foma objects or as regular expressions. Returns the invocant for chaining.

concat

  # Regular Expression: A B

  my $fsm_1 = Lingua::Foma->new('{climb}');
  my $fsm_2 = Lingua::Foma->new('{ing}');

  # Concatenate one transducer with another
  $fsm_1 = $fsm_1->concat($fsm_2);
  # Regex: {climbing}

  # Concat multiple automata
  $fsm_1->concat('{tool}', 's');
  # Regex: {climbingtools}

Concatenate the invocant with one or more further automata, in the given order. The result may be nondeterministic and nonminimal. Automata can be passed as Lingua::Foma objects or as regular expressions. Returns the invocant for chaining.
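When unify and concat are chained, each call applies to the whole invocant built so far. The sketch below only builds the equivalent foma regular expression as a string; grp, union_re and concat_re are hypothetical helpers, not part of Lingua::Foma. Every operand is wrapped in foma's grouping brackets [ ], because in foma concatenation binds tighter than union and an unbracketed union would otherwise be regrouped.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Lingua::Foma). They only build the foma
# regular expression equivalent to a unify or concat call; every operand is
# wrapped in [ ] so that operator precedence cannot change its meaning.
sub grp       { return '[' . $_[0] . ']' }
sub union_re  { return join '|', map { grp($_) } @_ }
sub concat_re { return join ' ', map { grp($_) } @_ }

# Equivalent of the SYNOPSIS chain:
#   Lingua::Foma->new('{climb}|{jump}|{track}')->unify('{paint}')->concat('{ing}:0')
my $re = concat_re(union_re('{climb}|{jump}|{track}', '{paint}'), '{ing}:0');
print $re, "\n";  # [[{climb}|{jump}|{track}]|[{paint}]] [{ing}:0]
```

Without the outer brackets, "{climb}|{jump}|{track}|{paint} {ing}:0" would attach the suffix to {paint} only.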

I/O METHODS

The supported file format for loading and saving is the binary format of Foma.

load

  my $fsm = Lingua::Foma->load("my_automaton.foma");

Load an automaton by passing a filename. Automata can be saved using save.

Files with multiple transducers are currently not supported.

save

  $fsm = $fsm->save("my_automaton.foma");

Save an automaton by passing a filename. Automata can be loaded using load. Returns the invocant for chaining.

APPLICATION METHODS

up

  # Get all interpretations of "climbing"
  my @words = $fsm->up("climbing");

  # Iterate over all interpretations
  my $i = $fsm->up("climbing");
  while (defined(my $string = $i->next)) {
    print $string, "\n";
  }

Apply a word to the transducer, mapping it from the lower language to the upper language. In list context, returns all possible results; in scalar context, returns an iterator.

down

  # Get all results for "climb"
  my @words = $fsm->down("climb");

  # Iterate over all results
  my $i = $fsm->down("climb");
  while (defined(my $string = $i->next)) {
    print $string, "\n";
  }

Apply a word to the transducer, mapping it from the upper language to the lower language. In list context, returns all possible results; in scalar context, returns an iterator.
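The list-versus-scalar behavior of up and down follows Perl's usual wantarray idiom. The sketch below is not the module's implementation: toy_down, Toy::Iterator and the lookup table %lower_of are hypothetical stand-ins that only illustrate the documented contract.

```perl
use strict;
use warnings;

# Toy iterator (hypothetical): next returns results one by one, then undef.
package Toy::Iterator;
sub new  { my ($class, @results) = @_; return bless { results => [@results] }, $class }
sub next { my $self = shift; return shift @{ $self->{results} } }

package main;

# A lookup table plays the role of the transducer's upper-to-lower mapping.
my %lower_of = (climbing => ['climb']);

# List context returns all results; scalar context returns an iterator.
sub toy_down {
    my ($word) = @_;
    my @results = @{ $lower_of{$word} || [] };
    return wantarray ? @results : Toy::Iterator->new(@results);
}

my @all  = toy_down('climbing');   # list context: ('climb')
my $iter = toy_down('climbing');   # scalar context: an iterator
while (defined(my $string = $iter->next)) {
    print $string, "\n";           # climb
}
```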

ITERATOR METHODS

In scalar context, application methods return an iterator object for iterating over results. The iterator object provides the following method.

next

  # Create an iterator object
  my $iter = $fsm->up("tree");

  # Iterate over all results
  while (defined(my $string = $iter->next)) {
    print $string, "\n";
  }

Returns the current result and advances the iterator to the next one. Returns undef when no further result can be found.
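Since next signals exhaustion with undef, a result that is defined but false in Perl (an empty string, for instance when a word maps entirely to epsilon, or the string "0") would end a plain truth-tested loop early. The sketch below uses a hypothetical closure-based iterator, make_iter, which only mimics the documented contract of next, to compare a truth test with a definedness test.

```perl
use strict;
use warnings;

# Hypothetical iterator over a fixed result list; it mimics the documented
# contract of next: return the current result, then undef when exhausted.
sub make_iter {
    my @results = @_;
    return sub { return shift @results };
}

# Results that are defined but false: an empty string and "0".
my $next = make_iter('', '0', 'climb');
my @truthy;
while (my $s = $next->()) { push @truthy, $s }            # stops at '' at once

$next = make_iter('', '0', 'climb');
my @all;
while (defined(my $s = $next->())) { push @all, $s }      # sees all three

printf "truthy loop: %d results, defined loop: %d results\n",
    scalar @truthy, scalar @all;  # truthy loop: 0 results, defined loop: 3 results
```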

KNOWN BUGS AND CAVEATS

Currently, an iterator may lose its transducer if that transducer is modified by a destructive operation. Avoid modifying a transducer while iterating over its results.

There are major memory leaks when saving and loading transducers.

As the bundled Foma library is unmodified, its known bugs ("many") are shipped as well.

AVAILABILITY

  https://github.com/Akron/Lingua-Foma

COPYRIGHT AND LICENSE

Lingua::Foma

Copyright (C) 2014, Nils Diewald.

This program is free software, you can redistribute it and/or modify it under the terms of the GNU General Public License version 2.

Foma 0.9.17alpha (bundled)

Copyright (C) 2008-2012, Mans Hulden.

Licensed under the terms of the GNU General Public License version 2.