NAME
Parse::Marpa::Doc::Plumbing - Marpa's Plumbing Interface
OVERVIEW
This document describes Marpa's Plumbing Interface. It is the low-level interface used by all others. The Plumbing can be used directly. It is a short list of method options to the Parse::Marpa::new(), Parse::Marpa::set(), and Parse::Marpa::Recognizer::new() methods.
With one minor exception, described below under the start option, Marpa does not allow the same grammar to be built using both the raw interface and a high-level interface. Marpa throws an exception if the user attempts to do so.
METHODS
get_symbol
my $minus = $grammar->get_symbol("minus");
my $number = Parse::Marpa::Grammar::get_symbol($grammar, "number");
Given a symbol's raw interface name, this Parse::Marpa::Grammar method returns the symbol's "cookie". It returns undef if no symbol with that name exists.
Symbol cookies are used primarily when calling the Parse::Marpa::Recognizer::earleme method. To get the cookie for a symbol using a high-level interface symbol name, see the documentation for the individual high-level interface.
If you are using MDL to define your grammar, you probably want to use Parse::Marpa::MDL::get_symbol instead, so that the conversion from MDL name to raw interface name is handled for you.
RAW INTERFACE SYMBOL NAMES
Each interface has its own rules for symbol names. The raw interface's conventions are designed to allow the most flexibility to higher level interfaces.
Any valid Perl string not ending in a right square bracket is an acceptable raw interface symbol name. Raw interface symbol names which end in right square brackets are reserved for Marpa internal use.
Unlike MDL symbols, raw interface symbols are considered identical only if their names match exactly. Unless stated otherwise, any reference to a "symbol name" in this document means its raw interface symbol name.
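The naming rule above can be sketched in a few lines of Perl. The helpers is_user_symbol_name and same_raw_symbol are hypothetical names invented for this illustration and are not part of Marpa. They only restate the two conventions: names ending in a right square bracket are reserved, and raw names are compared exactly.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: true if $name may be used
# as a raw interface symbol name by a user or a high-level interface.
sub is_user_symbol_name {
    my ($name) = @_;
    return 0 if not defined $name;

    # Names ending in a right square bracket are reserved for Marpa.
    return $name =~ /\]\z/ ? 0 : 1;
}

# Hypothetical helper: raw interface symbols are the same symbol
# only when their names match exactly, character for character.
sub same_raw_symbol {
    my ( $first, $second ) = @_;
    return $first eq $second ? 1 : 0;
}

for my $name ( 'minus', 'a[b]c', 'minus[]' ) {
    printf "%-8s %s\n", $name,
        is_user_symbol_name($name) ? 'acceptable' : 'reserved';
}
```

A bracket in the middle of a name is acceptable under this rule. Only the final character matters.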
THE RULES METHOD OPTION
The rules option is available with both Parse::Marpa::new() and Parse::Marpa::set(). The rules option may be specified multiple times, adding new rules to the grammar each time. New rules may continue to be added until the grammar is precomputed.
The value of the rules option must be a reference to an array, and each element of the array must be a reference to a description of a rule. Rule descriptions can be either "short form" (in which case they are arrays) or "long form" (in which case they are hashes).
Short Form Rules
The short form description of a rule is an array of up to four elements. The first element must be the name of the left hand side symbol. The second element must be a reference to an array of right hand side symbol names. In the case of an empty rule, the right hand side symbol array is zero length.
The third and fourth elements of the short form rule array are optional. The third element, if present, must be a string describing the rule's action in the current Marpa semantics. Right now, the only available semantics is Perl 5. If the action is undefined, the "default action" (returning an undefined value) will be used. The default action is a Marpa predefined and can be reset.
The fourth and last element, if present, must be an integer. It can be negative. It will be the priority of the rule. If undefined, a rule's priority defaults to zero.
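As an illustration of the short form, here is a sketch of a value that could be given to the rules option. The symbol names and the placeholder action string are invented for this example. The helper expand_short_rule is hypothetical, not part of Marpa, and only makes the documented priority default explicit.

```perl
use 5.010;
use strict;
use warnings;

# Value for the rules option: a reference to an array of rule descriptions.
# Symbol names and the action string are invented for illustration.
my $rules = [

    # lhs, rhs, action, priority
    [ 'expression', [ 'expression', 'minus', 'expression' ], q{ 'subtraction' }, 1 ],

    # action and priority omitted: default action, priority zero
    [ 'expression', ['number'] ],

    # empty rule: zero-length right hand side
    [ 'optional_sign', [] ],
];

# Hypothetical helper (not part of Marpa): spell out a short form rule
# as a hash, filling in the documented default priority.
sub expand_short_rule {
    my ($rule) = @_;
    my ( $lhs, $rhs, $action, $priority ) = @{$rule};
    return {
        lhs      => $lhs,
        rhs      => $rhs,
        action   => $action,           # undef means the default action
        priority => $priority // 0,    # undefined priority defaults to zero
    };
}

for my $rule ( @{$rules} ) {
    my $expanded = expand_short_rule($rule);
    printf "%s -> %s (priority %d)\n", $expanded->{lhs},
        join( q{ }, @{ $expanded->{rhs} } ), $expanded->{priority};
}
```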
Long Form Rules
The long form description of a rule is a hash, which is treated as option, value pairs. The available long form rules options are:
lhs, rhs, action, and priority
The values of the lhs, rhs, action, and priority rules options follow the same rules as the corresponding elements of the short form array description, described above.
min and max
The values of the min and max options must be non-negative integers. Either max alone or both min and max may be undefined, but if max is defined, min must also be defined. If defined, max cannot be zero, and must be greater than or equal to min.
min and max determine whether the production is a sequence production or a BNF production. If min and max are both undefined, or are both 1, the rule is a BNF production. If both min and max are defined and greater than one, the rule is a "counted" sequence, and the right hand side symbol may be repeated anywhere from min to max times. If min is defined but max is not, the rule is a potentially infinite sequence where the right hand side must be repeated at least min times.
Only one symbol is allowed on the right hand side of a sequence production, and it is repeated according to min and max. The right hand side symbol is not allowed to be a nullable symbol. For an introduction to sequence productions, see the MDL document.
separator
Any sequence production may have a separator defined. The value must be a symbol name. Marpa allows trailing separators, Perl style. The separator is not allowed to be a nullable symbol.
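The long form options can be sketched as follows. The rules shown use invented symbol names, and check_min_max is a hypothetical helper, not part of Marpa, that only restates the min and max constraints listed above. Marpa's own checks and error messages may well differ.

```perl
use strict;
use warnings;

# Long form rule descriptions; symbol names are invented for illustration.
my @long_form_rules = (

    # An ordinary BNF production with an explicit priority.
    { lhs => 'sum', rhs => [ 'number', 'plus', 'number' ], priority => 1 },

    # A potentially infinite sequence: at least one argument, comma separated.
    { lhs => 'arguments', rhs => ['argument'], min => 1, separator => 'comma' },

    # A counted sequence: two or three items.
    { lhs => 'short_list', rhs => ['item'], min => 2, max => 3 },
);

# Hypothetical helper: returns a list of problems found in the
# min and max options of one long form rule (empty if none).
sub check_min_max {
    my ($rule) = @_;
    my ( $min, $max ) = @{$rule}{qw(min max)};
    my @problems;
    for my $name (qw(min max)) {
        my $value = $rule->{$name};
        next if not defined $value;
        push @problems, "$name must be a non-negative integer"
            if $value !~ /\A[0-9]+\z/;
    }
    return @problems if @problems;
    if ( defined $max ) {
        push @problems, 'min must be defined when max is defined'
            if not defined $min;
        push @problems, 'max cannot be zero' if $max == 0;
        push @problems, 'max must be greater than or equal to min'
            if defined $min and $max < $min;
    }
    return @problems;
}

for my $rule (@long_form_rules) {
    my @problems = check_min_max($rule);
    print "$rule->{lhs}: ", ( @problems ? join( '; ', @problems ) : 'ok' ), "\n";
}
```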
Duplicate Rules
Marpa throws an exception if a duplicate rule is added. For BNF productions, a rule is considered a duplicate if it has the same left hand side symbol, and the same symbols in the same order on the right hand side.
For sequences, a rule is considered a duplicate if it has the same left hand side symbol, the same right hand side symbol, and the same separator. It's possible this prevents some non-pathological uses of sequences, but that can be worked around by writing them as BNF productions. Since the semantics of sequences in such cases would be tricky, writing them as BNF productions may be the best thing anyway.
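The duplicate test can be sketched by reducing each long form rule to a comparison key. This is an illustration of the criteria stated above, not Marpa's implementation. The helpers encode_fields, duplicate_key, and find_duplicates are hypothetical. Treating a BNF production and a sequence with the same symbols as distinct is an assumption of this sketch, since the text does not address that case.

```perl
use strict;
use warnings;

# Length-prefix each field so that arbitrary symbol names cannot collide.
sub encode_fields {
    return join q{}, map { length($_) . q{:} . $_ } @_;
}

# Hypothetical helper: two rules with equal keys are duplicates
# under the criteria described above. Actions, priorities, and the
# values of min and max play no part in the key.
sub duplicate_key {
    my ($rule) = @_;
    my ( $min, $max ) = @{$rule}{qw(min max)};
    my $is_bnf = ( !defined $min && !defined $max )
        || ( defined $min && defined $max && $min == 1 && $max == 1 );
    if ($is_bnf) {

        # BNF: same left hand side, same right hand side in the same order.
        return encode_fields( 'bnf', $rule->{lhs}, @{ $rule->{rhs} } );
    }

    # Sequence: same left hand side, right hand side symbol, and separator.
    my $separator = $rule->{separator};
    return encode_fields( 'seq', $rule->{lhs}, $rule->{rhs}->[0],
        defined $separator ? ( 'separator', $separator ) : ('no separator') );
}

# Returns every rule whose key was already seen earlier in the list.
sub find_duplicates {
    my @rules = @_;
    my %seen;
    my @duplicates;
    for my $rule (@rules) {
        my $key = duplicate_key($rule);
        push @duplicates, $rule if $seen{$key}++;
    }
    return @duplicates;
}
```

Note that under these criteria two sequence rules that differ only in min or max still count as duplicates, which is the restriction the paragraph above alludes to.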
THE TERMINALS METHOD OPTION
The value of the terminals option must be a reference to an array of references to terminal descriptions. Terminal descriptions are arrays of two elements. The first element is the symbol name for the terminal. The second element must be a reference to a hash of terminals suboptions, as suboption, value pairs.
Suboptions for the terminals option
regex
The value of the regex suboption must be a regular expression. It will be used to match the terminal in the input stream.
Only one of the regex and action suboptions may be specified. See the MDL document for details on writing regexes.
action
The value of the action suboption must be a string with code in the current semantics. Right now the only available semantics is Perl 5. The lex action will be used to match the terminal in the input stream.
Only one of the regex and action suboptions may be specified. See the MDL document for details on writing lex actions.
prefix
The value of the prefix suboption must be a regular expression. It will be used to match and discard text from the input stream before any attempt is made to match the terminal itself. The most common use is to discard leading whitespace.
priority
The value of the priority suboption must be an integer. It can be negative. It will control the order in which terminal matches are attempted.
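A value for the terminals option might look like the sketch below. The terminal names are invented, the regular expressions are written as qr// objects on the assumption that this is an acceptable way to supply them, and the action string is only a placeholder, since the form of a lex action is described in the MDL document rather than here. The helper terminal_problems is hypothetical and restates the suboption constraints above.

```perl
use strict;
use warnings;

# Value for the terminals option: [ symbol name, { suboption => value, ... } ]
# Names, patterns, and the action placeholder are invented for illustration.
my $terminals = [
    [ 'number', { regex => qr/\d+/, prefix => qr/\s*/ } ],
    [ 'minus',  { regex => qr/[-]/, prefix => qr/\s*/, priority => 1 } ],
    [ 'word',   { action => q{ # lex action code would go here } } ],
];

# Hypothetical helper (not part of Marpa): returns a list of problems
# found in one terminal description (empty if none).
sub terminal_problems {
    my ($terminal) = @_;
    my ( $name, $suboptions ) = @{$terminal};
    my @problems;

    # regex and action are mutually exclusive.
    push @problems, "$name: only one of regex and action may be specified"
        if exists $suboptions->{regex} and exists $suboptions->{action};

    # priority must be an integer, possibly negative.
    my $priority = $suboptions->{priority};
    push @problems, "$name: priority must be an integer"
        if defined $priority and $priority !~ /\A-?[0-9]+\z/;
    return @problems;
}

for my $terminal ( @{$terminals} ) {
    my @problems = terminal_problems($terminal);
    print "$terminal->[0]: ", ( @problems ? join( '; ', @problems ) : 'ok' ), "\n";
}
```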
THE START OPTION
The value of the start option must be a raw interface symbol name. It will be used as the start symbol for the grammar. Unlike most of the raw interface options, which may not be used in combination with a high-level interface or the source option, the start option may be used to set the default for, or to override the choice of, start symbol in a high-level interface grammar.
If you use the start option to specify a symbol from a high-level grammar, you must be careful to use the raw interface symbol name. The documentation for the high-level interface should describe how its symbol names can be converted to raw interface symbol names.