NAME
Parse::Marpa::Doc::Plumbing - The Plumbing Interface
DESCRIPTION
This document describes Marpa's plumbing interface. The plumbing is the low-level interface used by all the porcelain interfaces. The plumbing can also be used directly. It consists of a short list of named arguments to the Parse::Marpa::Grammar::new(), Parse::Marpa::Grammar::set(), and Parse::Marpa::Recognizer::new() methods.
The plumbing and porcelain interfaces may not be used to build the same grammar. Marpa throws an exception if the user attempts to use a porcelain interface with any of the plumbing's named arguments, other than the start argument.
Plumbing Symbol Names
Each interface has its own rules for symbol names. The plumbing's conventions are designed to allow flexibility for the porcelain. Any valid Perl string not ending in a right square bracket is an acceptable plumbing symbol name. Plumbing symbol names which end in right square brackets are reserved for Marpa internal use.
Unlike MDL symbols, plumbing symbols are considered identical only if their names match exactly. Unless stated otherwise, any reference to a symbol name in this document means a plumbing symbol name.
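The naming rule above can be checked mechanically. The following is a minimal sketch; is_user_symbol_name is a hypothetical helper written for this example and is not part of Marpa.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: true if $name is a defined
# string that does not end in a right square bracket, that is, if it
# is not one of the names reserved for Marpa internal use.
sub is_user_symbol_name {
    my ($name) = @_;
    return 0 if not defined $name;

    # \z anchors at the very end of the string, so a name ending in
    # "]\n" is not mistaken for a name ending in "]".
    return $name =~ /\]\z/ ? 0 : 1;
}

for my $name ( 'number', 'op[minus] expr', 'number[]' ) {
    printf "%-16s %s\n", $name,
        is_user_symbol_name($name) ? 'allowed' : 'reserved';
}
```

A bracket anywhere other than the final position is acceptable; only the last character matters.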
METHOD
Parse::Marpa::Grammar::get_symbol
my $minus = $grammar->get_symbol("minus");
my $number
    = Parse::Marpa::Grammar::get_symbol($grammar, "number");
Given a symbol's plumbing name, returns the symbol's cookie. It returns undef if no symbol with that name exists. If you are using MDL to define your grammar, use Parse::Marpa::MDL::get_symbol instead.
Symbol cookies are used primarily when calling the Parse::Marpa::Recognizer::earleme method. To get the cookie for a symbol using its porcelain name, see the documentation for the individual porcelain interface.
NAMED ARGUMENTS
The rules Named Argument
The rules named argument is available with both the Parse::Marpa::Grammar::new and Parse::Marpa::Grammar::set methods. The rules named argument may be specified multiple times, adding new rules to the grammar each time. New rules may be added until the grammar is precomputed.
The value of the rules named argument must be a reference to an array, and each element of the array must be a reference to a description of a rule. Rule descriptions can be either arrays (the short form) or hashes (the long form).
Short Form
The short form description of a rule is an array with 4 elements: lhs, rhs, action and priority. The last two of these are optional.
The lhs element must be the name of the left hand side symbol. The rhs element must be a reference to an array of right hand side symbol names. In the case of an empty rule, rhs must be a reference to a zero length array.
The action element, if present, must be a string describing the rule's action in the current Marpa semantics. Right now, the only available semantics is Perl 5. If the action for a rule is not explicitly set, it will be the value of Marpa's default_action option.
The priority element, if present, must be an integer. It can be negative. It will be the priority of the rule. If undefined, priority defaults to zero.
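As an illustration of the short form, the sketch below builds a value suitable for the rules named argument. The symbol names and the action strings are made up for this example; the actions are placeholders and do not show how a real action receives its arguments.

```perl
use strict;
use warnings;

# Rule descriptions in the short form: [ lhs, rhs, action, priority ].
# Symbol names and action strings are illustrative only.
my $rules = [

    # lhs, rhs, action and priority all given
    [ 'expression', [qw(expression minus number)], q{'subtraction'}, 1 ],

    # action given, priority omitted (defaults to zero)
    [ 'expression', ['number'], q{'number'} ],

    # empty rule: rhs is a reference to a zero length array;
    # action and priority are both omitted
    [ 'optional_sign', [] ],
];

for my $rule ( @{$rules} ) {
    my ( $lhs, $rhs, $action, $priority ) = @{$rule};
    printf "%s ::= %s (priority %d)\n", $lhs,
        ( @{$rhs} ? join( q{ }, @{$rhs} ) : '/* empty */' ),
        ( defined $priority ? $priority : 0 );
}
```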
Long Form
The long form description of a rule is a hash of rule options, with the option names as the hash keys, and the option values as the hash values. The available rule options are:
lhs, rhs, action, and priority
    The values of the lhs, rhs, action, and priority rule options are as described above for the corresponding elements of the short form.
min and max
    If defined, min must be a non-negative integer. If defined, max must be an integer greater than zero, and greater than or equal to min.
    min and max both undefined
        The rule is an ordinary BNF production.
    min defined, but max undefined
        The rule is a sequence production. Only one symbol is allowed on the right hand side. It is not allowed to be a nullable symbol. The rhs symbol must be repeated at least min times and may be repeated an unlimited number of times.
    min undefined, but max defined
        The rule is not valid. An exception is thrown.
    min and max both defined and both equal to 1
        The rule is an ordinary BNF production.
    min and max both defined, but not both equal to 1
        The rule is a counted sequence production. Only one symbol is allowed on the right hand side. It is not allowed to be a nullable symbol. It may be repeated anywhere from min to max times.
    For an introduction to sequence productions, see the MDL document.
separator
    Any sequence production may have a separator defined. The value must be a symbol name. Marpa allows trailing separators, Perl style. The separator must not be a nullable symbol.
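The min and max cases above can be summarized in code. This is a minimal sketch, not Marpa's implementation: rule_kind is a hypothetical helper, the symbol names are invented, and the helper assumes that min and max, when defined, are already integers.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: classify a long form rule
# description according to its min and max options.
sub rule_kind {
    my ($rule) = @_;
    my ( $min, $max ) = @{$rule}{qw(min max)};

    return 'BNF production' if !defined $min && !defined $max;
    return 'invalid' if !defined $min;    # max defined without min
    return 'invalid' if $min < 0;         # min must be non-negative
    return 'sequence production' if !defined $max;
    return 'invalid' if $max < 1 || $max < $min;
    return 'BNF production' if $min == 1 && $max == 1;
    return 'counted sequence production';
}

# Long form rule descriptions: hashes of rule options.
my @rules = (
    { lhs => 'sum',    rhs => [qw(sum plus term)], action => q{'sum'} },
    { lhs => 'list',   rhs => ['item'],  min => 1, separator => 'comma' },
    { lhs => 'triple', rhs => ['digit'], min => 3, max => 3 },
    { lhs => 'broken', rhs => ['digit'], max => 2 },
);

for my $rule (@rules) {
    printf "%-8s %s\n", $rule->{lhs}, rule_kind($rule);
}
```

Where this sketch returns 'invalid' for a max without a min, Marpa itself throws an exception, as described above.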
Duplicate Rules
Marpa throws an exception if a duplicate rule is added. For BNF productions, a rule is considered a duplicate if it has the same left hand side symbol, and the same symbols in the same order on the right hand side.
For sequences, a rule is considered a duplicate if it has the same left hand side symbol, the same right hand side symbol, and the same separator. It's possible that some of the sequences banned by this rule are not pathological, but this restriction can be worked around by writing them as BNF productions. Even if the banned sequences are not pathological, their semantics are probably tricky, and writing them as BNF productions may be for the best.
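One way to picture the duplicate test is as a key computed from each rule, so that two rules collide exactly when they are duplicates in the sense just described. The sketch below is an illustration only and says nothing about how Marpa detects duplicates internally. duplicate_key is a hypothetical helper, and the sketch assumes that BNF productions and sequences are compared only with rules of their own kind.

```perl
use strict;
use warnings;

# Hypothetical helper, not Marpa's implementation: build a key from a
# long form rule description. min, max, action and priority are
# deliberately left out of the key.
sub duplicate_key {
    my ($rule) = @_;
    my ( $min, $max ) = @{$rule}{qw(min max)};

    # min and max both equal to 1 is an ordinary BNF production.
    my $is_sequence = defined $min
        && !( defined $max && $min == 1 && $max == 1 );

    my @parts;
    if ($is_sequence) {
        @parts = ( 'seq', $rule->{lhs}, $rule->{rhs}[0] );
        push @parts,
            defined $rule->{separator}
            ? ( 'sep', $rule->{separator} )
            : ('nosep');
    }
    else {
        @parts = ( 'bnf', $rule->{lhs}, @{ $rule->{rhs} } );
    }

    # Symbol names may be almost any Perl string, so each part is
    # prefixed with its length instead of relying on a separator
    # character that might itself occur in a name.
    return join q{}, map { length($_) . q{:} . $_ } @parts;
}

my @rules = (
    { lhs => 'sum',  rhs => [qw(sum plus term)] },
    { lhs => 'sum',  rhs => [qw(sum plus term)], priority => 5 },
    { lhs => 'list', rhs => ['item'], min => 0, separator => 'comma' },
    { lhs => 'list', rhs => ['item'], min => 1, separator => 'comma' },
);

my %seen;
for my $rule (@rules) {
    print "duplicate rule for $rule->{lhs}\n"
        if $seen{ duplicate_key($rule) }++;
}
```

Note that the two list rules differ in min yet still count as duplicates, which is the kind of sequence the text suggests rewriting as BNF productions.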
The terminals Named Argument
The value of the terminals named argument must be a reference to an array of references to terminal descriptions. Terminal descriptions are arrays of two elements. The first element is the symbol name of the terminal. The second element must be a reference to a hash of terminal options, with option names as hash keys and option values as hash values.
Terminal Options
regex
    The value of the regex terminal option must be a regular expression. It will be used to match the terminal in the input text. Only one of the regex and action terminal options may be specified. See the MDL document for details on writing terminal regexes.
action
    The value of the action terminal option must be a string with code in the current semantics. Right now the only available semantics is Perl 5. The code will be interpreted as a lex action, which will be used to match the terminal in the input text. Only one of the regex and action terminal options may be specified. See the MDL document for details on writing lex actions.
prefix
    The value of the prefix terminal option must be a regular expression. It will be used to match and discard text from the input before any attempt is made to match the terminal itself. The most common use is to discard leading whitespace.
priority
    The value of the priority terminal option must be an integer. It can be negative. It will control the order in which terminal matches are attempted.
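A value for the terminals named argument might look like the following sketch. The symbol names and patterns are invented for illustration, and the loop is a hypothetical sanity check written for this example, not something Marpa provides.

```perl
use strict;
use warnings;

# Terminal descriptions: [ symbol name, { terminal options } ].
# Symbol names and patterns are illustrative only.
my $terminals = [
    [ 'number' => { regex => qr/\d+/, prefix => qr/\s*/ } ],
    [ 'minus'  => { regex => qr/-/, prefix => qr/\s*/, priority => 1 } ],
];

# Hypothetical check: regex and action may not both be specified.
for my $terminal ( @{$terminals} ) {
    my ( $name, $options ) = @{$terminal};
    die "terminal $name has both regex and action\n"
        if exists $options->{regex} && exists $options->{action};
    print "terminal $name: ", join( q{, }, sort keys %{$options} ), "\n";
}
```

Here the prefix option discards leading whitespace before each terminal is matched, the common use mentioned above.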
The start Named Argument
The value of the start named argument must be a plumbing symbol name. It will be used as the start symbol for the grammar. Most of the plumbing named arguments may not be used in combination with a porcelain interface. The start named argument is an exception. It may be used to set the default for, or to override the choice of, the start symbol in the porcelain.
If you use the start named argument to specify a porcelain symbol, you must be careful to use the plumbing symbol name. The documentation for the porcelain should describe how its symbol names can be converted to plumbing symbol names.
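Putting the three grammar-building named arguments together, a complete set for a small grammar could be sketched as below. The grammar is invented for illustration. How the arguments are handed to Parse::Marpa::Grammar::new() is not covered in this document, so the sketch stops at building the data. The final check is a sanity test of this example only, not a documented Marpa requirement.

```perl
use strict;
use warnings;

# Illustrative named arguments for a tiny subtraction grammar.
my %grammar_args = (
    start => 'expression',
    rules => [
        [ 'expression', [qw(expression minus number)] ],
        [ 'expression', ['number'] ],
    ],
    terminals => [
        [ 'number' => { regex => qr/\d+/ } ],
        [ 'minus'  => { regex => qr/-/ } ],
    ],
);

# Sanity check for this sketch: the start symbol should be the left
# hand side of at least one rule.
my %is_lhs = map { $_->[0] => 1 } @{ $grammar_args{rules} };
die "start symbol is not the lhs of any rule\n"
    unless $is_lhs{ $grammar_args{start} };
print "start symbol: $grammar_args{start}\n";
```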
SUPPORT
See the support section in the main module.
AUTHOR
Jeffrey Kegler
COPYRIGHT
Copyright 2007 - 2008 Jeffrey Kegler
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.