The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

Name

Data::DFA - Deterministic finite state parser from regular expression

Synopsis

Create a deterministic finite state parser over the symbols 'a'..'e' for the regular expression: a (b|c)+ d? e

# Construct a deterministic finite state automaton from the regular expression:

  use Data::DFA qw(:all);
  use Data::Table::Text qw(:all);
  use Test::More qw(no_plan);

  my $dfa = dfaFromExpr
   (element("a"),
    oneOrMore(choice(element("b"), element("c"))),
    optional(element("d")),
    element("e")
   );

# Print the symbols used and the transitions table:

  is_deeply ['a'..'e'], [$dfa->symbols];

  ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { b => 2, c => 2, d => 6, e => 7 }
       6  0  { e => 7 }
       7  1  undef
END

# Create a parser and parse a string of symbols with the returned sub:

  my ($parser, $end, $next, $processed) = $dfa->parser;                         # New parser

  eval { &$parser($_) } for(qw(a b a));                                         # Try to parse a b a

  say STDERR $@;                                                                # Error message
#   Already processed: a b
#   Expected one of  : b c d e
#   But was given    : a

  is_deeply [&$next],      [qw(b c d e)];                                       # Next acceptable symbol
  is_deeply [&$processed], [qw(a b)];                                           # Symbols processed
  ok !&$end;                                                                    # Not at the end

Description

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Construct regular expression

Construct a regular expression that defines the language to be parsed using the following combining operations which can all be imported:

element($)

One element.

     Parameter  Description       
  1  $label     Transition label  

Example:

  ok dfaFromExpr(element("a"))->print("Element: a") eq nws <<END;
  Element: a
  Location  F  Transitions
         0  0  { a => 1 }
         1  1  undef
  END
  

This is a static method and so should be invoked as:

  Data::DFA::element

sequence(@)

Sequence of elements.

     Parameter  Description  
  1  @elements  Elements     

Example:

  ok dfaFromExpr(sequence(element("a"), element("b")))
  
  ->print("Sequence: ab") eq nws <<END;
  Sequence: ab
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2 }
         2  1  undef
  END
  

This is a static method and so should be invoked as:

  Data::DFA::sequence

optional($)

Zero or one of an element.

     Parameter  Description  
  1  $element   Element      

Example:

  ok dfaFromExpr(element("a"), optional(element("b")), element("c"))
  
  ->print("Optional: ab?c") eq nws <<END;
  Optional: ab?c
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2, c => 3 }
         2  0  { c => 3 }
         3  1  undef
  END
  
  my $dfa = dfaFromExpr
  
  (element("a"),
  
  oneOrMore(choice(element("b"), element("c"))),
  
  optional(element("d")),
  
  element("e")
  
  );
  
  ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
  Dfa for a(b|c)+d?e :
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2, c => 2 }
         2  0  { b => 2, c => 2, d => 6, e => 7 }
         6  0  { e => 7 }
         7  1  undef
  END
  

This is a static method and so should be invoked as:

  Data::DFA::optional

zeroOrMore($)

Zero or more repetitions of an element.

     Parameter  Description             
  1  $element   Element to be repeated  

Example:

  ok dfaFromExpr(element("a"), zeroOrMore(element("b")), element("c"))
  
  ->print("Zero Or More: ab*c") eq nws <<END;
  Zero Or More: ab*c
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 1, c => 4 }
         4  1  undef
  END
  

This is a static method and so should be invoked as:

  Data::DFA::zeroOrMore

oneOrMore($)

One or more repetitions of an element.

     Parameter  Description             
  1  $element   Element to be repeated  

Example:

  my $dfa = dfaFromExpr(element("a"), oneOrMore(element("b")), element("c"));
  
  ok $dfa->print("One or More: ab+c") eq nws <<END;
  One or More: ab+c
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2 }
         2  0  { b => 2, c => 4 }
         4  1  undef
  END
  

This is a static method and so should be invoked as:

  Data::DFA::oneOrMore

choice(@)

Choice from amongst one or more elements.

     Parameter  Description                 
  1  @elements  Elements to be chosen from  

Example:

  my $dfa = dfaFromExpr(element("a"),
  
  choice(element("b"), element("c")),
  
  element("d"));
  
  ok $dfa->print("Choice: (a(b|c)d") eq nws <<END;
  Choice: (a(b|c)d
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2, c => 2 }
         2  0  { d => 5 }
         5  1  undef
  END
  

This is a static method and so should be invoked as:

  Data::DFA::choice

Deterministic finite state parser

Create a deterministic finite state automaton to parse sequences of symbols in the language defined by a regular expression.

print($$$)

Print the current state of the finite automaton. If it is non deterministic, the non deterministic jumps will be shown as well as the transitions table. If deterministic, only the transitions table will be shown.

     Parameter  Description                             
  1  $states    States                                  
  2  $title     Title                                   
  3  $print     Print to STDERR if 2 or to STDOUT if 1  

Example:

  my $dfa = dfaFromExpr
  
  (element("a"),
  
  oneOrMore(choice(element("b"), element("c"))),
  
  optional(element("d")),
  
  element("e")
  
  );
  
  ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
  Dfa for a(b|c)+d?e :
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2, c => 2 }
         2  0  { b => 2, c => 2, d => 6, e => 7 }
         6  0  { e => 7 }
         7  1  undef
  END
  

dfaFromExpr(@)

Create a DFA from a regular expression.

     Parameter  Description  
  1  @expr      Expression   

This is a static method and so should be invoked as:

  Data::DFA::dfaFromExpr

parser(@)

Create a parser from a deterministic finite state automaton constructed from a regular expression.

     Parameter  Description                                                        
  1  $dfa       Deterministic finite state automaton generated from an expression  

Example:

  my $dfa = dfaFromExpr
  
  (element("a"),
  
  oneOrMore(choice(element("b"), element("c"))),
  
  optional(element("d")),
  
  element("e")
  
  );
  
  ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
  Dfa for a(b|c)+d?e :
  Location  F  Transitions
         0  0  { a => 1 }
         1  0  { b => 2, c => 2 }
         2  0  { b => 2, c => 2, d => 6, e => 7 }
         6  0  { e => 7 }
         7  1  undef
  END
  
  my ($parser, $end) = $dfa->parser;
  
  &$parser($_) for qw(a b);
  
  ok !&$end;
  
  my ($parser, $end) = $dfa->parser;
  
  &$parser($_) for qw(a b b c e);
  
  ok &$end;
  
  my ($parser, $end, $next, $processed) = $dfa->parser;
  
  eval{&$parser($_)} for(qw(a b a));
  
  ok !index(nws($@), nws <<END);
  Already processed: a b
  Expected one of  : b c d e
  But was given    : a
  END
  
  is_deeply [&$next],      [qw(b c d e)];
  
  is_deeply [&$processed], [qw(a b)];
  
  is_deeply [&$processed], [qw(a b)];
  

This is a static method and so should be invoked as:

  Data::DFA::parser

($)

Accept the next symbol drawn from the symbol set if possible by moving to a new state otherwise confessing with a helpful message

     Parameter  Description                                                
  1  $symbol    Next symbol to be processed by the finite state automaton  

()

Returns whether we are currently in a final state or not

()

Returns an array of symbols that would be accepted in the current state

()

Returns the array of symbols processed so far by this parser

Private Methods

nfaFromExpr2($$)

Create a DFA from an expression by pushing it on to the array of state transitions and connecting it up to existing states with jumps.

     Parameter  Description                     
  1  $states    States                          
  2  $expr      Expression to convert to a DFA  

nfaFromExpr(@)

Create an NFA from an expression.

     Parameter  Description  
  1  @expr      Expressions  

This is a static method and so should be invoked as:

  Data::DFA::nfaFromExpr

printNfa($$$)

Print the current state of a NFA.

     Parameter  Description                             
  1  $states    States                                  
  2  $title     Title                                   
  3  $print     Print to STDERR if 2 or to STDOUT if 1  

printDfa($$$)

Print the current state of a DFA.

     Parameter  Description                             
  1  $states    States                                  
  2  $title     Title                                   
  3  $print     Print to STDERR if 2 or to STDOUT if 1  

symbols($)

Return an array of all the transition symbols.

     Parameter  Description  
  1  $states    States       

reachableStates($$$$)

Find the names of all the states that can be reached from a specified state via a specified symbol and all the jumps available.

     Parameter   Description                                           
  1  $states     States                                                
  2  $stateName  Name of start state                                   
  3  $symbol     Symbol                                                
  4  $targets    Optional array reference of reachable targets so far  

removeJumpsFromState($$)

Remove the jumps from a state

     Parameter   Description                        
  1  $states     States                             
  2  $stateName  Name of the state to be dejumped.  

reachableFrom($$$)

Find the names of all the states that can be reached from a specified state using any symbol.

     Parameter   Description                                          
  1  $states     States                                               
  2  $stateName  Name of start state                                  
  3  $targets    Optional hash reference of reachable targets so far  

removeJumpsFromAllStates($)

Remove the jumps from every state.

     Parameter  Description  
  1  $states    States       

removeDuplicateStates($)

Remove any states with duplicate transition sets redirecting transitions to the surviving state.

     Parameter  Description  
  1  $states    States       

Index

1

2 choice

3 dfaFromExpr

4 element

5 nfaFromExpr

6 nfaFromExpr2

7 oneOrMore

8 optional

9 parser

10 print

11 printDfa

12 printNfa

13 reachableFrom

14 reachableStates

15 removeDuplicateStates

16 removeJumpsFromAllStates

17 removeJumpsFromState

18 sequence

19 symbols

20 zeroOrMore

Installation

This module is written in 100% Pure Perl and, thus, it is easy to read, use, modify and install.

Standard Module::Build process for building and installing modules:

  perl Build.PL
  ./Build
  ./Build test
  ./Build install

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2018 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.