
NAME

Parse::Marpa::Doc::Plumbing - The Plumbing Interface

DESCRIPTION

This document describes Marpa's plumbing interface. The plumbing is the low-level interface used by all the porcelain interfaces. The plumbing can also be used directly. It consists of a short list of named arguments to the Parse::Marpa::Grammar::new(), Parse::Marpa::Grammar::set(), and Parse::Marpa::Recognizer::new() methods.

The plumbing and porcelain interfaces may not be used to build the same grammar. Marpa throws an exception if the user attempts to use a porcelain interface with any of the plumbing's named arguments, other than the start argument.

Plumbing Symbol Names

Each interface has its own rules for symbol names. The plumbing's conventions are designed to allow flexibility for the porcelain. Any valid Perl string not ending in a right square bracket is an acceptable plumbing symbol name. Plumbing symbol names which end in right square brackets are reserved for Marpa internal use.

Unlike MDL, plumbing symbols are not considered identical unless their names match exactly. Unless stated otherwise, any reference to a symbol name in this document means a plumbing symbol name.
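The naming rule above can be illustrated with a minimal sketch. The helper is_plumbing_symbol_name is hypothetical and is not part of Marpa. It only restates the rule that a name ending in a right square bracket is reserved.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: returns 1 if $name is acceptable
# as a user-supplied plumbing symbol name, 0 otherwise.
sub is_plumbing_symbol_name {
    my ($name) = @_;
    return 0 if not defined $name;

    # Names ending in "]" are reserved for Marpa internal use.
    return $name =~ /\]\z/ ? 0 : 1;
}

for my $name ( 'minus', 'op[add', 'number[]' ) {
    printf "%-10s %s\n", $name,
        is_plumbing_symbol_name($name) ? 'ok' : 'reserved';
}
```

Only the final character matters, so a bracket elsewhere in the name is acceptable.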

METHOD

Parse::Marpa::Grammar::get_symbol

    my $minus = $grammar->get_symbol("minus");

    my $number
        = Parse::Marpa::Grammar::get_symbol($grammar, "number");

Given a symbol's plumbing name, returns the symbol's cookie. Returns undef if no symbol with that name exists. If you are using MDL to define your grammar, use Parse::Marpa::MDL::get_symbol instead.

Symbol cookies are used primarily when calling the Parse::Marpa::Recognizer::earleme method. To get the cookie for a symbol using its porcelain name, see the documentation for the individual porcelain interface.

NAMED ARGUMENTS

The rules Named Argument

The rules named argument is available with both the Parse::Marpa::Grammar::new and Parse::Marpa::Grammar::set methods. The rules named argument may be specified multiple times, adding new rules to the grammar each time. New rules may be added until the grammar is precomputed.

The value of the rules named argument must be a reference to an array, and each element of the array must be a reference to a description of a rule. Rule descriptions can be either arrays (the short form) or hashes (the long form).

Short Form

The short form description of a rule is an array with 4 elements: lhs, rhs, action and priority. The last two of these are optional.

The lhs element must be the name of the left hand side symbol. The rhs element must be a reference to an array of the names of the right hand side symbols. In the case of an empty rule, rhs must be a reference to a zero length array.

The action element, if present, must be a string describing the rule's action in the current Marpa semantics. Right now, the only available semantics is Perl 5. If the action for a rule is not explicitly set, it will be the value of Marpa's default_action option.

The priority element, if present, must be an integer. It can be negative. It will be the priority of the rule. If undefined, priority defaults to zero.
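As a sketch of the short form, the following builds an array of rule descriptions of the kind that could be supplied as the value of the rules named argument. The symbol names and action strings are placeholders invented for illustration. The loop only inspects the data and does not call Marpa.

```perl
use strict;
use warnings;

# Each rule is [ lhs, rhs, action, priority ]; action and priority are optional.
# Symbol names and action strings are placeholders, not taken from Marpa.
my $rules = [
    [ 'E', [ 'E', 'plus', 'E' ], q{ 'sum' }, 1 ],    # all four elements
    [ 'E', ['number'], q{ 'number' } ],              # priority omitted: defaults to zero
    [ 'E', [ 'E', 'minus', 'E' ] ],                  # action omitted: default_action applies
    [ 'optional_sign', [] ],                         # empty rule: zero length rhs
];

for my $rule ( @{$rules} ) {
    my ( $lhs, $rhs, $action, $priority ) = @{$rule};
    printf "%s -> %s (priority %d)\n",
        $lhs,
        ( @{$rhs} ? join( q{ }, @{$rhs} ) : '(empty)' ),
        $priority // 0;
}
```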

Long Form

The long form description of a rule is a hash of rule options, with the option names as the hash keys, and the option values as the hash values. The available rule options are:

lhs, rhs, action, and priority

The values of the lhs, rhs, action, and priority rule options are as described above for the corresponding elements of the short form.
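The correspondence between the two forms can be sketched as a conversion. The helper short_to_long is hypothetical and not part of Marpa. It maps the four positional elements of a short form description onto the like-named long form options.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: convert a short form rule
# description (array reference) into a long form one (hash reference).
sub short_to_long {
    my ($short) = @_;
    my ( $lhs, $rhs, $action, $priority ) = @{$short};
    my %long = ( lhs => $lhs, rhs => $rhs );

    # Optional elements become options only when they were supplied.
    $long{action}   = $action   if defined $action;
    $long{priority} = $priority if defined $priority;
    return \%long;
}

my $long = short_to_long( [ 'E', [ 'E', 'plus', 'E' ], q{ 'sum' }, 1 ] );
print join( ', ', sort keys %{$long} ), "\n";
```

The long form is the one to use when a rule needs options that the short form has no position for, such as min, max and separator.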

min and max

If defined, min must be a non-negative integer. If defined, max must be an integer greater than zero, and greater than or equal to min.

  • min and max both undefined

    The rule is an ordinary BNF production.

  • min defined, but max undefined

    The rule is a sequence production. Only one symbol is allowed on the right hand side. That symbol is not allowed to be nullable. The rhs must be repeated at least min times and may be repeated an unlimited number of times.

  • min undefined, but max defined

    The rule is not valid. An exception is thrown.

  • min and max both defined and both equal to 1

    The rule is an ordinary BNF production.

  • min and max both defined, but not both equal to 1

    The rule is a counted sequence production. Only one symbol is allowed on the right hand side. That symbol is not allowed to be nullable. It may be repeated anywhere from min to max times.

For an introduction to sequence productions, see the MDL document.
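The cases listed above can be summarized in a short sketch. The function classify_rule is a hypothetical helper written for this illustration. It restates the documented cases and is not Marpa's own validation code, and its exception messages are invented.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: report which kind of production
# a given min/max pair describes, dying in the cases described as invalid.
sub classify_rule {
    my ( $min, $max ) = @_;
    die "min must be a non-negative integer\n"
        if defined $min and $min !~ /\A\d+\z/;
    die "max must be an integer greater than zero\n"
        if defined $max and ( $max !~ /\A\d+\z/ or $max < 1 );

    if ( not defined $min ) {
        die "max defined without min: rule is not valid\n" if defined $max;
        return 'BNF production';
    }
    return 'sequence production' if not defined $max;
    die "max must be greater than or equal to min\n" if $max < $min;
    return 'BNF production' if $min == 1 and $max == 1;
    return 'counted sequence production';
}

for my $case ( [ undef, undef ], [ 0, undef ], [ 1, 1 ], [ 2, 5 ] ) {
    my ( $min, $max ) = @{$case};
    printf "min=%s max=%s: %s\n",
        $min // 'undef', $max // 'undef', classify_rule( $min, $max );
}
```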

separator

Any sequence production may have a separator defined. The value must be a symbol name. Marpa allows trailing separators, Perl style. The separator must not be a nullable symbol.
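A sketch of long form descriptions for sequence productions follows. The symbol names are placeholders. The loop is only a sanity check on the data, using an invented error message, and does not call Marpa.

```perl
use strict;
use warnings;

my $rules = [

    # One or more items separated by commas. Per the text above,
    # a trailing comma after the last item is also accepted.
    {   lhs       => 'list',
        rhs       => ['item'],
        min       => 1,
        separator => 'comma',
    },

    # Zero or more statements, with no separator.
    { lhs => 'block', rhs => ['statement'], min => 0 },
];

for my $rule ( @{$rules} ) {
    die "sequence rule for $rule->{lhs} needs exactly one rhs symbol\n"
        if scalar @{ $rule->{rhs} } != 1;
    printf "%s: sequence of %s, min %d%s\n",
        $rule->{lhs}, $rule->{rhs}[0], $rule->{min},
        defined $rule->{separator} ? ", separated by $rule->{separator}" : q{};
}
```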

Duplicate Rules

Marpa throws an exception if a duplicate rule is added. For BNF productions, a rule is considered a duplicate if it has the same left hand side symbol, and the same symbols in the same order on the right hand side.

For sequences, a rule is considered a duplicate if it has the same left hand side symbol, the same right hand side symbol, and the same separator. It's possible that some of the sequences banned by this rule are not pathological, but this restriction can be worked around by writing them as BNF productions. Even if the banned sequences are not pathological, their semantics are probably tricky, and writing them as BNF productions may be for the best.
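The duplicate test can be sketched as a comparison of keys. The helper duplicate_key is hypothetical and is not how Marpa is known to implement the check. It assumes long form rule descriptions, and it treats a missing separator and an empty-string separator alike. Each name is prefixed with its length so that arbitrary Perl strings cannot produce colliding keys.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa: two rules get the same key
# exactly when the text above would call them duplicates.
sub duplicate_key {
    my ($rule) = @_;
    my $is_bnf = !defined $rule->{min}
        || ( defined $rule->{max} && $rule->{min} == 1 && $rule->{max} == 1 );
    my @parts =
        $is_bnf
        ? ( 'bnf', $rule->{lhs}, @{ $rule->{rhs} } )
        : ( 'seq', $rule->{lhs}, $rule->{rhs}[0], $rule->{separator} // q{} );

    # Length-prefix every part so the joined key is unambiguous.
    return join q{}, map { length($_) . q{:} . $_ } @parts;
}

my %seen;
for my $rule (
    { lhs => 'list', rhs => ['item'], min => 0, separator => 'comma' },
    { lhs => 'list', rhs => ['item'], min => 1, separator => 'comma' },
    )
{
    print "duplicate rule for $rule->{lhs}\n" if $seen{ duplicate_key($rule) }++;
}
```

Note that min and max are not part of the key for sequences, so the two rules in the usage loop count as duplicates even though their repetition counts differ.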

The terminals Named Argument

The value of the terminals named argument must be a reference to an array of references to terminal descriptions. Each terminal description is an array of two elements. The first element is the symbol name of the terminal. The second element must be a reference to a hash of terminal options, with option names as hash keys and option values as hash values.

Terminal Options

regex

The value of the regex terminal option must be a regular expression. It will be used to match the terminal in the input text. Only one of the regex and action terminal options may be specified. See the MDL document for details on writing terminal regexes.

action

The value of the action terminal option must be a string with code in the current semantics. Right now the only available semantics is Perl 5. The code will be interpreted as a lex action, which will be used to match the terminal in the input text. Only one of the regex and action terminal options may be specified. See the MDL document for details on writing lex actions.

prefix

The value of the prefix terminal option must be a regular expression. It will be used to match and discard text from the input before any attempt is made to match the terminal itself. The most common use is to discard leading whitespace.

priority

The value of the priority terminal option must be an integer. It can be negative. It will control the order in which terminal matches are attempted.
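The terminal options described above might be combined as in the following sketch. The terminal names and patterns are placeholders, and supplying the regexes as qr// objects is an assumption. The loop checks the documented restriction that regex and action are mutually exclusive, with an invented error message, and does not call Marpa.

```perl
use strict;
use warnings;

# Each terminal description is [ symbol_name, { option => value, ... } ].
# Names and patterns are placeholders chosen for illustration.
my $terminals = [
    [ 'number', { regex => qr/\d+/, prefix => qr/\s*/ } ],
    [ 'plus',   { regex => qr/[+]/, prefix => qr/\s*/, priority => 1 } ],
];

for my $terminal ( @{$terminals} ) {
    my ( $name, $options ) = @{$terminal};

    # Only one of the regex and action options may be specified.
    die "terminal $name: specify only one of regex and action\n"
        if exists $options->{regex} and exists $options->{action};
    print "$name: ", join( ', ', sort keys %{$options} ), "\n";
}
```

Here the prefix pattern discards leading whitespace before each terminal is matched, which is the common use mentioned above.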

The start Named Argument

The value of the start named argument must be a plumbing symbol name. It will be used as the start symbol for the grammar. Most of the plumbing named arguments may not be used in combination with a porcelain interface. The start named argument is an exception. It may be used to set the default for, or to override the choice of, the start symbol in the porcelain.

If you use the start named argument to specify a porcelain symbol, you must be careful to use the plumbing symbol name. The documentation for the porcelain should describe how its symbol names can be converted to plumbing symbol names.

SUPPORT

See the support section in the main module.

AUTHOR

Jeffrey Kegler

COPYRIGHT

Copyright 2007 - 2008 Jeffrey Kegler

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.