The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

Name

Data::NFA - Non deterministic finite state machine from regular expression.

Synopsis

Create a non deterministic finite state machine from a regular expression which can then be converted into a deterministic finite state machine by Data::DFA and used to parse sequences of symbols.

For example, the regular expression:

  ((a|b)*)**4

produces the following machine:

  use Data::NFA qw(:all);
  use Data::Table::Text qw(:all);
  use Test::More qw(no_plan);

  my $N = 4;

  my $s = q(zeroOrMore(choice(element("a"), element("b"))));

  my $nfa = eval qq(fromExpr(($s)x$N));

  ok $nfa->printNws("((a|b)*)**$N: ") eq nws <<END;
((a|b)*)**4:
Location  F  Transitions  Jumps
       0  1  { a => 1 }   [2, 4, 6, 8, 10, 12, 14, 16]
       1  1  undef        [0, 2, 3, 4, 6, 8, 10, 12, 14, 16]
       2  0  { b => 3 }   undef
       3  1  undef        [0, 2, 4, 6, 8, 10, 12, 14, 16]
       4  1  { a => 5 }   [6, 8, 10, 12, 14, 16]
       5  1  undef        [4, 6, 7, 8, 10, 12, 14, 16]
       6  0  { b => 7 }   undef
       7  1  undef        [4, 6, 8, 10, 12, 14, 16]
       8  1  { a => 9 }   [10, 12, 14, 16]
       9  1  undef        [8, 10, 11, 12, 14, 16]
      10  0  { b => 11 }  undef
      11  1  undef        [8, 10, 12, 14, 16]
      12  1  { a => 13 }  [14, 16]
      13  1  undef        [12, 14, 15, 16]
      14  0  { b => 15 }  undef
      15  1  undef        [12, 14, 16]
      16  1  undef        undef
END

Description

Non deterministic finite state machine from regular expression.

Version 20200621.

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Construct regular expression

Construct a regular expression that defines the language to be parsed using the following combining operations which can all be imported:

element($label)

One element. An element can also be represented by a string or number

     Parameter  Description
  1  $label     Transition symbol

Example:

    my $nfa = fromExpr(𝗲𝗹𝗲𝗺𝗲𝗻𝘁("a"));
    ok $nfa->print("Element: a") eq <<END;
  Element: a
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2  1  undef        undef\
  END
    ok  $nfa->isFinal(2);
    ok !$nfa->isFinal(0);
    ok  $nfa->parse(qw(a));
    ok !$nfa->parse(qw(a b));
    ok !$nfa->parse(qw(b));
    ok !$nfa->parse(qw(b a));

This is a static method and so should either be imported or invoked as:

  Data::NFA::element

sequence(@elements)

Sequence of elements and/or symbols.

     Parameter  Description
  1  @elements  Elements

Example:

    my $nfa = fromExpr(qw(a b));
    is_deeply $nfa->print("ab"), <<END;
  ab
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3]
         3     { b => 4 }   undef
         4  1  undef        undef
  END
    ok !$nfa->parse(qw());
    ok  $nfa->parse(qw(a b));
    ok !$nfa->parse(qw(b a));
    ok !$nfa->parse(qw(a));
    ok !$nfa->parse(qw(b));

This is a static method and so should either be imported or invoked as:

  Data::NFA::sequence

optional(@element)

An optional sequence of elements and/or symbols.

     Parameter  Description
  1  @element   Elements

Example:

    my $nfa = fromExpr("a", 𝗼𝗽𝘁𝗶𝗼𝗻𝗮𝗹("b"), "c");
    is_deeply $nfa->print("ab?c"), <<END;
  ab?c
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3 .. 6]
         3     undef        [4, 5, 6]
         4     { b => 5 }   undef
         5     undef        [6]
         6     { c => 7 }   undef
         7  1  undef        undef
  END
    ok !$nfa->parse(qw(a));
    ok  $nfa->parse(qw(a b c));
    ok  $nfa->parse(qw(a c));
    ok !$nfa->parse(qw(a c b));

This is a static method and so should either be imported or invoked as:

  Data::NFA::optional

zeroOrMore(@element)

Zero or more repetitions of a sequence of elements and/or symbols.

     Parameter  Description
  1  @element   Elements

Example:

    my $nfa = fromExpr("a", 𝘇𝗲𝗿𝗼𝗢𝗿𝗠𝗼𝗿𝗲("b"), "c");
    is_deeply $nfa->print("ab*c"), <<END;
  ab*c
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4, 6, 7]
         3     undef        [4, 6, 7]
         4     { b => 5 }   undef
         5     undef        [3, 4, 6, 7]
         6     undef        [7]
         7     { c => 8 }   undef
         8  1  undef        undef
  END
    ok  $nfa->parse(qw(a c));
    ok  $nfa->parse(qw(a b c));
    ok  $nfa->parse(qw(a b b c));
    ok !$nfa->parse(qw(a b b d));

    my $nfa = fromExpr("a",
                       𝘇𝗲𝗿𝗼𝗢𝗿𝗠𝗼𝗿𝗲(choice("a",
                       "a")),
                       "a");
    is_deeply $nfa->print("(a(a|a)*a"), <<END;
  (a(a|a)*a
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4, 5, 7, 8, 10, 11]
         3     undef        [4, 5, 7, 8, 10, 11]
         4     undef        [5, 7, 8]
         5     { a => 6 }   undef
         6     undef        [3, 4, 5, 7 .. 11]
         7     undef        [8]
         8     { a => 9 }   undef
         9     undef        [3, 4, 5, 7, 8, 10, 11]
        10     undef        [11]
        11     { a => 12 }  undef
        12  1  undef        undef
  END

    ok !$nfa->parse(qw(a));
    ok  $nfa->parse(qw(a a));
    ok  $nfa->parse(qw(a a a));
    ok !$nfa->parse(qw(a b a));

This is a static method and so should either be imported or invoked as:

  Data::NFA::zeroOrMore

oneOrMore(@element)

One or more repetitions of a sequence of elements and/or symbols.

     Parameter  Description
  1  @element   Elements

Example:

    my $nfa = fromExpr("a", 𝗼𝗻𝗲𝗢𝗿𝗠𝗼𝗿𝗲("b"), "c");

    is_deeply $nfa->print("One or More: ab+c"), <<END;
  One or More: ab+c
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4]
         3     undef        [4]
         4     { b => 5 }   undef
         5     undef        [3, 4, 6, 7]
         6     undef        [7]
         7     { c => 8 }   undef
         8  1  undef        undef
  END

    ok !$nfa->parse(qw(a c));
    ok  $nfa->parse(qw(a b c));
    ok  $nfa->parse(qw(a b b c));
    ok !$nfa->parse(qw(a b b d));

This is a static method and so should either be imported or invoked as:

  Data::NFA::oneOrMore

choice(@elements)

Choice from amongst one or more elements and/or symbols.

     Parameter  Description
  1  @elements  Elements to be chosen from

Example:

    my $nfa = fromExpr("a",
                       𝗰𝗵𝗼𝗶𝗰𝗲(qw(b c)),
                       "d");
    is_deeply $nfa->print("(a(b|c)d"), <<END;
  (a(b|c)d
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4, 6, 7]
         3     undef        [4, 6, 7]
         4     { b => 5 }   undef
         5     undef        [8, 9]
         6     undef        [7]
         7     { c => 8 }   undef
         8     undef        [9]
         9     { d => 10 }  undef
        10  1  undef        undef
  END

    ok  $nfa->parse(qw(a b d));
    ok  $nfa->parse(qw(a c d));
    ok !$nfa->parse(qw(a b c d));

This is a static method and so should either be imported or invoked as:

  Data::NFA::choice

except(@elements)

Choice from amongst all symbols except the ones mentioned

     Parameter  Description
  1  @elements  Elements not to be chosen from

Example:

    my $nfa = fromExpr(choice(qw(a b c)), 𝗲𝘅𝗰𝗲𝗽𝘁(qw(c x)), choice(qw(a b c)));

    is_deeply $nfa->print("(a|b|c)(c!x)(a|b|c)"), <<END;
  (a|b|c)(c!x)(a|b|c)
  Location  F  Transitions  Jumps
         0     undef        [1, 2, 4, 5, 7, 8]
         1     undef        [2, 4, 5, 7, 8]
         2     { a => 3 }   undef
         3     undef        [9, 10, 11, 13, 14]
         4     undef        [5]
         5     { b => 6 }   undef
         6     undef        [9, 10, 11, 13, 14]
         7     undef        [8]
         8     { c => 9 }   undef
         9     undef        [10, 11, 13, 14]
        10     undef        [11, 13, 14]
        11     { a => 12 }  undef
        12     undef        [15, 16, 17, 19, 20, 22, 23]
        13     undef        [14]
        14     { b => 15 }  undef
        15     undef        [16, 17, 19, 20, 22, 23]
        16     undef        [17, 19, 20, 22, 23]
        17     { a => 18 }  undef
        18  1  undef        [24]
        19     undef        [20]
        20     { b => 21 }  undef
        21  1  undef        [24]
        22     undef        [23]
        23     { c => 24 }  undef
        24  1  undef        undef
  END

    ok !$nfa->parse(qw(a a));
    ok  $nfa->parse(qw(a a a));
    ok !$nfa->parse(qw(a c a));

This is a static method and so should either be imported or invoked as:

  Data::NFA::except

Non deterministic finite state machine

Create a non deterministic finite state machine to represent a regular expression.

fromExpr(@expression)

Create an NFA from a regular @expression.

     Parameter    Description
  1  @expression  Regular expressions

Example:

    my $nfa = 𝗳𝗿𝗼𝗺𝗘𝘅𝗽𝗿
     ("a",
      oneOrMore(choice(qw(b c))),
      optional("d"),
      element("e")
     );

    is_deeply $nfa->print("a(b|c)+d?e"), <<END;
  a(b|c)+d?e
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4, 5, 7, 8]
         3     undef        [4, 5, 7, 8]
         4     undef        [5, 7, 8]
         5     { b => 6 }   undef
         6     undef        [3, 4, 5, 7 .. 14]
         7     undef        [8]
         8     { c => 9 }   undef
         9     undef        [3, 4, 5, 7, 8, 10 .. 14]
        10     undef        [11 .. 14]
        11     undef        [12, 13, 14]
        12     { d => 13 }  undef
        13     undef        [14]
        14     { e => 15 }  undef
        15  1  undef        undef
  END

    is_deeply ['a'..'e'], [$nfa->symbols];

    ok !$nfa->parse(qw(a e));
    ok !$nfa->parse(qw(a d e));
    ok  $nfa->parse(qw(a b c e));
    ok  $nfa->parse(qw(a b c d e));

This is a static method and so should either be imported or invoked as:

  Data::NFA::fromExpr

print($states, $title)

Print the current $states of the non deterministic finite state automaton using the specified $title. If it is non deterministic, the non deterministic jumps will be shown as well as the transitions table. If deterministic, only the transitions table will be shown.

     Parameter  Description
  1  $states    States
  2  $title     Title

Example:

    my $nfa = fromExpr
     ("a",
      oneOrMore(choice(qw(b c))),
      optional("d"),
      element("e")
     );

    is_deeply $nfa->𝗽𝗿𝗶𝗻𝘁("a(b|c)+d?e"), <<END;
  a(b|c)+d?e
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4, 5, 7, 8]
         3     undef        [4, 5, 7, 8]
         4     undef        [5, 7, 8]
         5     { b => 6 }   undef
         6     undef        [3, 4, 5, 7 .. 14]
         7     undef        [8]
         8     { c => 9 }   undef
         9     undef        [3, 4, 5, 7, 8, 10 .. 14]
        10     undef        [11 .. 14]
        11     undef        [12, 13, 14]
        12     { d => 13 }  undef
        13     undef        [14]
        14     { e => 15 }  undef
        15  1  undef        undef
  END

    is_deeply ['a'..'e'], [$nfa->symbols];

    ok !$nfa->parse(qw(a e));
    ok !$nfa->parse(qw(a d e));
    ok  $nfa->parse(qw(a b c e));
    ok  $nfa->parse(qw(a b c d e));

symbols($states)

Return an array of all the transition symbols.

     Parameter  Description
  1  $states    States

Example:

    my $nfa = fromExpr
     ("a",
      oneOrMore(choice(qw(b c))),
      optional("d"),
      element("e")
     );

    is_deeply $nfa->print("a(b|c)+d?e"), <<END;
  a(b|c)+d?e
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2     undef        [3, 4, 5, 7, 8]
         3     undef        [4, 5, 7, 8]
         4     undef        [5, 7, 8]
         5     { b => 6 }   undef
         6     undef        [3, 4, 5, 7 .. 14]
         7     undef        [8]
         8     { c => 9 }   undef
         9     undef        [3, 4, 5, 7, 8, 10 .. 14]
        10     undef        [11 .. 14]
        11     undef        [12, 13, 14]
        12     { d => 13 }  undef
        13     undef        [14]
        14     { e => 15 }  undef
        15  1  undef        undef
  END

    is_deeply ['a'..'e'], [$nfa->𝘀𝘆𝗺𝗯𝗼𝗹𝘀];

    ok !$nfa->parse(qw(a e));
    ok !$nfa->parse(qw(a d e));
    ok  $nfa->parse(qw(a b c e));
    ok  $nfa->parse(qw(a b c d e));

isFinal($states, $state)

Whether, in the $states specifying an NFA the named state $state is a final state.

     Parameter  Description
  1  $states    States
  2  $state     Name of state to test

Example:

    my $nfa = fromExpr(element("a"));
    ok $nfa->print("Element: a") eq <<END;
  Element: a
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2  1  undef        undef\
  END
    ok  $nfa->𝗶𝘀𝗙𝗶𝗻𝗮𝗹(2);
    ok !$nfa->𝗶𝘀𝗙𝗶𝗻𝗮𝗹(0);
    ok  $nfa->parse(qw(a));
    ok !$nfa->parse(qw(a b));
    ok !$nfa->parse(qw(b));
    ok !$nfa->parse(qw(b a));

allTransitions($states)

Return all transitions in the NFA specified by $states as {stateName}{symbol} = [reachable states].

     Parameter  Description
  1  $states    States

Example:

    my $s = q(zeroOrMore(choice("a")));

    my $nfa = eval qq(fromExpr(sequence($s,$s)));

    is_deeply $nfa->print("a*"), <<END;
  a*
  Location  F  Transitions  Jumps
         0  1  undef        [1 .. 4, 6 .. 9, 11]
         1  1  undef        [2, 3, 4, 6 .. 9, 11]
         2  1  undef        [3, 4, 6 .. 9, 11]
         3     undef        [4]
         4     { a => 5 }   undef
         5  1  undef        [2, 3, 4, 6 .. 9, 11]
         6  1  undef        [7, 8, 9, 11]
         7  1  undef        [8, 9, 11]
         8     undef        [9]
         9     { a => 10 }  undef
        10  1  undef        [7, 8, 9, 11]
        11  1  undef        undef
  END

    ok  $nfa->parse(qw());
    ok  $nfa->parse(qw(a));
    ok !$nfa->parse(qw(b));
    ok  $nfa->parse(qw(a a));
    ok !$nfa->parse(qw(b b));
    ok !$nfa->parse(qw(a b));
    ok !$nfa->parse(qw(b a));
    ok !$nfa->parse(qw(c));

    is_deeply $nfa->𝗮𝗹𝗹𝗧𝗿𝗮𝗻𝘀𝗶𝘁𝗶𝗼𝗻𝘀, {
    "0"  => { a => [10, 11, 2 .. 9] },
    "1"  => { a => [10, 11, 2 .. 9] },
    "2"  => { a => [10, 11, 2 .. 9] },
    "3"  => { a => [11, 2 .. 9] },
    "4"  => { a => [11, 2 .. 9] },
    "5"  => { a => [10, 11, 2 .. 9] },
    "6"  => { a => [10, 11, 7, 8, 9] },
    "7"  => { a => [10, 11, 7, 8, 9] },
    "8"  => { a => [10, 11, 7, 8, 9] },
    "9"  => { a => [10, 11, 7, 8, 9] },
    "10" => { a => [10, 11, 7, 8, 9] },
    "11" => { a => [] },
  };

    is_deeply $nfa->print("a*a* 2"), <<END;
  a*a* 2
  Location  F  Transitions  Jumps
         0  1  undef        [1 .. 4, 6 .. 9, 11]
         1  1  undef        [2, 3, 4, 6 .. 9, 11]
         2  1  undef        [3, 4, 6 .. 9, 11]
         3     undef        [4]
         4     { a => 5 }   undef
         5  1  undef        [2, 3, 4, 6 .. 9, 11]
         6  1  undef        [7, 8, 9, 11]
         7  1  undef        [8, 9, 11]
         8     undef        [9]
         9     { a => 10 }  undef
        10  1  undef        [7, 8, 9, 11]
        11  1  undef        undef
  END

parse($states, @symbols)

Parse, using the NFA specified by $states, the list of symbols in @symbols.

     Parameter  Description
  1  $states    States
  2  @symbols   Array of symbols

Example:

    my $nfa = fromExpr(element("a"));
    ok $nfa->print("Element: a") eq <<END;
  Element: a
  Location  F  Transitions  Jumps
         0     undef        [1]
         1     { a => 2 }   undef
         2  1  undef        undef\
  END
    ok  $nfa->isFinal(2);
    ok !$nfa->isFinal(0);
    ok  $nfa->𝗽𝗮𝗿𝘀𝗲(qw(a));
    ok !$nfa->𝗽𝗮𝗿𝘀𝗲(qw(a b));
    ok !$nfa->𝗽𝗮𝗿𝘀𝗲(qw(b));
    ok !$nfa->𝗽𝗮𝗿𝘀𝗲(qw(b a));

Data::NFA::State Definition

NFA State

Output fields

final - Whether this state is final

jumps - {to => 1} : jumps from this state not consuming any input symbols

transitions - {symbol => state} : transitions from this state consuming one input symbol

Private Methods

newNfa(%options)

Create a new NFA

     Parameter  Description
  1  %options   Options

newNfaState(%options)

Create a new NFA state.

     Parameter  Description
  1  %options   Options

addNewState($nfa)

Create a new NFA state and add it to an NFA created with newNfa.

     Parameter  Description
  1  $nfa       Nfa

fromExpr2($states, $expr, $symbols)

Create an NFA from a regular expression.

     Parameter  Description
  1  $states    States
  2  $expr      Regular expression constructed from L<element|/element> L<sequence|/sequence> L<optional|/optional> L<zeroOrMore|/zeroOrMore> L<oneOrMore|/oneOrMore> L<choice|/choice>
  3  $symbols   Set of symbols used by the NFA.

propagateFinalState($states)

Mark the $states that can reach the final state with a jump as final.

     Parameter  Description
  1  $states    States

statesReachableViaJumps($states, $StateName)

Find the names of all the $states that can be reached from a specified $stateName via jumps alone.

     Parameter   Description
  1  $states     States
  2  $StateName  Name of start state

removeEmptyFields($states)

Remove empty fields from the states representing an NFA.

     Parameter  Description
  1  $states    States

printFinalState($state)

Print the final field of the specified $state.

     Parameter  Description
  1  $state     State

printWithJumps($states, $title)

Print the current $states of an NFA with jumps using the specvified $title.

     Parameter  Description
  1  $states    States
  2  $title     Optional title

printWithOutJumps($states, $title)

Print the current $states of an NFA without jumps using the specified $title.

     Parameter  Description
  1  $states    States
  2  $title     Title.

statesReachableViaSymbol($states, $StateName, $symbol, $cache)

Find the names of all the states that can be reached from a specified state via a specified symbol and all the jumps available.

     Parameter   Description
  1  $states     States
  2  $StateName  Name of start state
  3  $symbol     Symbol to reach on
  4  $cache      A hash to be used as a cache

parse2($states, $stateName, @symbols)

Parse an array of symbols

     Parameter   Description
  1  $states     States
  2  $stateName  Current state
  3  @symbols    Remaining symbols

Index

1 addNewState - Create a new NFA state and add it to an NFA created with newNfa.

2 allTransitions - Return all transitions in the NFA specified by $states as {stateName}{symbol} = [reachable states].

3 choice - Choice from amongst one or more elements and/or symbols.

4 element - One element.

5 except - Choice from amongst all symbols except the ones mentioned

6 fromExpr - Create an NFA from a regular @expression.

7 fromExpr2 - Create an NFA from a regular expression.

8 isFinal - Whether, in the $states specifying an NFA the named state $state is a final state.

9 newNfa - Create a new NFA

10 newNfaState - Create a new NFA state.

11 oneOrMore - One or more repetitions of a sequence of elements and/or symbols.

12 optional - An optional sequence of elements and/or symbols.

13 parse - Parse, using the NFA specified by $states, the list of symbols in @symbols.

14 parse2 - Parse an array of symbols

15 print - Print the current $states of the non deterministic finite state automaton using the specified $title.

16 printFinalState - Print the final field of the specified $state.

17 printWithJumps - Print the current $states of an NFA with jumps using the specvified $title.

18 printWithOutJumps - Print the current $states of an NFA without jumps using the specified $title.

19 propagateFinalState - Mark the $states that can reach the final state with a jump as final.

20 removeEmptyFields - Remove empty fields from the states representing an NFA.

21 sequence - Sequence of elements and/or symbols.

22 statesReachableViaJumps - Find the names of all the $states that can be reached from a specified $stateName via jumps alone.

23 statesReachableViaSymbol - Find the names of all the states that can be reached from a specified state via a specified symbol and all the jumps available.

24 symbols - Return an array of all the transition symbols.

25 zeroOrMore - Zero or more repetitions of a sequence of elements and/or symbols.

Installation

This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:

  sudo cpan install Data::NFA

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2019 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.