NAME

Marpa::R2::Advanced::Thin - Direct access to Libmarpa

About this document

At this moment, this document is INCOMPLETE and, for that reason, NOT 100% RELIABLE.

Most Marpa users can ignore this document. It describes Marpa's "thin" interface. The "thin" interface provides efficient access to Marpa's core library, Libmarpa. It provides the ultimate in Marpa speed, power and flexibility.

The "thin" interface is very low-level and NOT convenient to use -- user-friendliness is expected to be provided by an upper layer. The "thin" interface is intended for those writing upper layers for Marpa. It is also for those writing applications, when they want to eliminate the overhead of an upper layer, or when they want the flexibility provided by direct access to Libmarpa.

This document assumes that the reader is familiar with the other Marpa::R2 documentation, as well as the Libmarpa API document. This means the reader will have to know some C language, enough to understand C function prototypes.

How this document is written

The Libmarpa interface is described in the Libmarpa API document, and this document avoids duplicating the material there. This document states general rules for the "thin" interface. Methods that do not depart from the general rules are not specifically mentioned.

While this style and level of documentation is efficient, and the standard for C library interfaces to Perl, it is, admittedly, very terse. As an aid to the reader, an example of the usage of the thin interface is presented below. While small, the example is non-trivial. It includes a full logic flow, starting with the definition of the grammar and continuing all the way to the iteration of the values of an ambiguous parse.

Methods in the thin interface

As of this writing, the thin interface has no methods of its own. Each of its methods is a wrapper for a method from the Libmarpa interface.

Not all Libmarpa methods have thin interface wrappers. None of Libmarpa's internal methods are included in the thin interface. Additionally, some of Libmarpa's external methods provide services that are handled internally by the thin interface, and wrappers to those methods are therefore not included in the actual interface. When an external Libmarpa method is omitted, this will be specifically stated, with the reason for the omission.

Whenever an external Libmarpa method is not mentioned in this document, the reader can assume that it has a wrapper that is implemented according to the general guidelines, as given below. Where the implementation of an external Libmarpa method is an exception to the guidelines, or has other peculiarities, that will be explicitly stated.

Libmarpa time classes

As a reminder, the classes of Libmarpa's time objects are, in sequence, grammar, recognizer, bocage, ordering, tree and value. The one-letter abbreviations for these are, respectively, g, r, b, o, t and v.

Libmarpa methods not in the Marpa thin interface

No internal Libmarpa method is part of the Marpa thin interface. Additionally, several external Libmarpa methods are omitted because their function is performed by the Marpa thin interface.

No thin interface method corresponds to the marpa_check_version() static method, because the Marpa thin interface handles its own version matching. No thin interface method corresponds to any of the Libmarpa config class methods, and no Marpa thin interface object corresponds to Libmarpa's config objects. Configuration in the Marpa thin interface is done with Perl variables.

No Marpa thin interface method corresponds to the marpa_g_ref() and marpa_g_unref() methods because the thin interface handles the reference counting of the Libmarpa objects it creates. The application can rely on Libmarpa objects being cleaned up properly as part of Perl's garbage collection. For the same reason, no Marpa thin interface method corresponds to the "ref" and "unref" methods of the other Libmarpa time classes.

Libmarpa time objects and constructors

The thin interface implements a Perl class corresponding to each of the Libmarpa time classes. Objects in the thin Marpa classes should be treated as opaque scalars. No application should define new elements for a thin Marpa class, redefine, overload or remove existing elements, or subclass the class itself. The only operations an application should perform on objects blessed into the thin interface's classes are to assign them, to use them to call methods in their class, and to pass them as arguments where appropriate.

    Marpa_Grammar       Marpa::R2::Thin::G
    Marpa_Recognizer    Marpa::R2::Thin::R
    Marpa_Bocage        Marpa::R2::Thin::B
    Marpa_Ordering      Marpa::R2::Thin::O
    Marpa_Tree          Marpa::R2::Thin::T
    Marpa_Value         Marpa::R2::Thin::V

Constructors for the time objects may be called using the new method of the corresponding Perl class. For example,

    my $recce = Marpa::R2::Thin::R->new($grammar);

The thin interface takes care of Libmarpa's reference counting for the user. Marpa thin interface's time objects should be destroyed implicitly by undefining them, or by letting them go out of scope.

The general pattern

The thin Marpa methods often follow a general pattern, based on their corresponding Libmarpa time class method. External class instance methods for Libmarpa's time classes have names of the form marpa_g_start_symbol_set. The name begins with a fixed six-character prefix marpa_, followed by a single letter (in this case "g"), and another underscore. The single letter is one of Libmarpa's time class abbreviations, and indicates which class the method belongs to.

In the general pattern of Marpa's thin interface, the corresponding Marpa thin Perl method would be a method in the appropriate Marpa thin class, whose name is the same except for the 8-character prefix. For example, the Marpa thin method corresponding to marpa_g_start_symbol_set would be named start_symbol_set and would be a method of the Marpa::R2::Thin::G Perl class.
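The naming rule above can be sketched in plain Perl. The helper thin_name_for() is hypothetical and is not part of Marpa::R2. It only illustrates how the prefix is stripped, and it does not know which Libmarpa methods are omitted from the thin interface or depart from the general pattern.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa::R2: derive the thin-interface
# class and method name from the name of a Libmarpa C function.
my %THIN_CLASS_FOR =
    map { $_ => 'Marpa::R2::Thin::' . uc $_ } qw(g r b o t v);

sub thin_name_for {
    my ($c_name) = @_;

    # "marpa_", one time-class letter, "_", then the method name proper.
    my ( $letter, $method ) =
        $c_name =~ m/\A marpa_ ([grbotv]) _ (\w+) \z/xms
        or return;
    return ( $THIN_CLASS_FOR{$letter}, $method );
}

my ( $class, $method ) = thin_name_for('marpa_g_start_symbol_set');
printf "%s->%s\n", $class, $method;
```

A name such as marpa_check_version, which has no time-class letter, yields an empty list.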

When a Libmarpa method returns -1 to indicate failure, a Marpa thin interface method following the general pattern returns a Perl undef. When a Libmarpa method returns -2 to indicate failure, a Marpa thin interface method following the general pattern throws a Perl exception.

Libmarpa's class instance method prototypes have an object of the appropriate class as their first ("self") argument. Zero or more other non-self arguments follow this first time class argument. In the corresponding thin Marpa method, if it follows the general pattern, the arguments to the Perl method are the arguments of the C function in the same order, converted to Perl variables as described next. (Here I am following the convention in perlobj of considering the "self" object to be a Perl method's first argument.)

In the general pattern, every return value or argument whose type is one of Libmarpa's time classes is converted to the corresponding Marpa thin interface class. Return values and arguments of Libmarpa's numbered classes (Marpa_Rule_ID and Marpa_Symbol_ID) are converted to Perl scalar integers. C language int's are also converted to Perl scalar integers.

Note that the thin interface will NOT convert a Perl true to a 1 or a Perl false to a 0. The thin interface expects even those arguments which Libmarpa interprets as booleans to be numbers, as specified in the Libmarpa API. This usually means that the arguments must be either a NUMERIC one or a NUMERIC zero. This allows for future extensions to the Libmarpa interface that accept and interpret other numeric values.
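Because no truth-value conversion is done for the caller, an application that holds ordinary Perl booleans may want to normalize them itself. The following is a minimal sketch. The helper as_flag() is hypothetical and is not provided by Marpa::R2.

```perl
use strict;
use warnings;

# Hypothetical helper: turn any Perl truth value into the numeric
# 1 or 0 that the thin interface expects for boolean-like arguments.
sub as_flag {
    my ($value) = @_;
    return $value ? 1 : 0;
}

# Perl "true" and "false" come in many forms; each maps to 1 or 0 here.
my @flags = map { as_flag($_) } ( 'yes', 42, q{}, undef, '0' );
print "@flags\n";
```

The result of as_flag() could then be passed wherever a thin interface method expects a numeric one or zero.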

Here is an example of a Libmarpa function whose corresponding Marpa thin method follows the general pattern.

  marpa_g_start_symbol_set (grammar, symbol_S);

and here is the corresponding thin Marpa call:

    $grammar->start_symbol_set($symbol_S);

Error methods

The thin interface to Libmarpa provides error methods more appropriate to the Perl environment than Libmarpa's own.

$g->error()

    my ( $error_code, $error_description ) = $grammar->error();
    my @error_names = Marpa::R2::Thin::error_names();
    my $error_name = $error_names[$error_code];

In scalar context, the error() method returns the error description, a string that describes the most recent error. In array context, it returns a 2-element array. The first element of the array is the error code, and the second element is the same error description that is returned in scalar context.

If a method follows the general error handling pattern, then it does the following:

  • On a Libmarpa failure, the method sets the error code to the Libmarpa error code. This is an integer. The method always sets the error description to a string which describes the Libmarpa failure.

  • On other failures, the method sets the error code to a Perl undef. It sets the error description to a string which describes the failure.

  • On success, the method may do one of two things. First, it may clear the error code and error description. Second, it may leave them as is. Which of the two will happen is unspecified, unless stated in the description of that method in this document.

All Marpa thin interface methods follow the general error handling pattern, unless it is stated otherwise in that method's description in this document. Methods which follow the general pattern always follow the general error handling pattern.

The "error description" is a string that describes the most recent error. Error descriptions are subject to wording and punctuation changes, and applications should not rely on their text.

The error code may be undefined or an integer. If it is an integer, it is the Libmarpa error code and can be used as an index into the array of error names returned by the $g->error_names() method. The programmer can expect error codes and error names to be kept stable.

If the error code is a Perl undef, the most recent error was in the Marpa thin layer, and the error description will be of that error. Note that a Marpa thin layer error will not persist after the call to error(). It is up to an application, if it wants to refer to the error code or description again, to store them.

While in the current implementation Libmarpa error codes and descriptions can be more persistent, applications should not rely on this. Applications should assume, for all classes of errors returned by the error() method, that they will not persist into a call to any other Marpa thin method, including a second call to the error() method itself.
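Since error information should be assumed not to survive another thin method call, one approach is to copy it in a single step. The sketch below assumes a hypothetical helper, snapshot_error(), which is not part of Marpa::R2. It calls error() once, in array context, and takes the error names as an array reference supplied by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: call error() exactly once, in array context, and
# copy what it returns before any other thin method is called.
# $error_names is a reference to an array indexed by Libmarpa error code.
sub snapshot_error {
    my ( $thin_object, $error_names ) = @_;
    my ( $code, $description ) = $thin_object->error();

    # An undefined code means the error came from the thin layer itself,
    # so there is no Libmarpa error name to look up.
    my $name = defined $code ? $error_names->[$code] : undef;
    return { code => $code, name => $name, description => $description };
}
```

The returned hash is an ordinary Perl copy, so it remains usable after later calls to thin interface methods.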

$g->error_names()

For a synopsis, see the section on the $g->error() method. The error_names() method returns a reference to an array of error names, indexed by Libmarpa error code. Error names are intended as mnemonic codes -- as a convenient alternative to the numeric error codes. They are not intended as descriptions of the error condition. The programmer can expect error codes and error names to be kept stable.

$g->throw_set()

    $grammar->throw_set(0);

The throw_set() method turns the throw flag for the grammar on or off, according to whether its argument is 1 or 0.

If the argument of throw_set() is not a numeric 0 or 1, the method always throws an exception, regardless of the setting of any throw flags or variables.

Configuration methods and variables

The Marpa thin interface has one configuration variable:

  • $Marpa::R2::Thin::THROW

    If a Perl true, the Marpa thin interface throws failures as exceptions. If a Perl false, the Marpa thin interface methods return failure, as described for each method. Defaults to true.

    Each grammar has its own "throw" flag. This variable controls only the initial setting of that flag. A grammar's "throw" flag can be reset using the $g->throw_set() method, after which its setting is independent of the $Marpa::R2::Thin::THROW variable.

    Each grammar's "throw" flag is intended to centralize control of the throwing of exceptions in the time objects descended from that base grammar. But there are exceptions. If the arguments to the throw_set() method itself are not sane, it assumes all bets are off and throws an exception. And for the $r->alternative() method, the "Ruby Slippers" flag also affects which issues are thrown as exceptions. For details, see its description.

Grammar methods

Marpa::R2::Thin::G->new()

    my $grammar  = Marpa::R2::Thin::G->new();

There are no arguments to the Marpa thin interface's grammar constructor. A failure occurs if there is a version mismatch, which should not happen -- it indicates a problem with the way that the library was built. On success, its return value is a thin interface grammar object. On failure in scalar context, its return value is a Perl undef. On failure in array context, its return value is a 2-element array whose first element is a Perl undef and whose second element is the error code.

$g->event()

    my ( $event_type, $value ) = $grammar->event( $event_ix++ );

The event() method returns a two-element array on success. The first element is a string naming the event type, and the second is a scalar representing its value. The string for an event type is its macro name, as given in the Libmarpa API document.

Some event types have an event "value". All event values are numeric Perl scalars. The number is either a symbol ID or a count, as described in the Libmarpa API document.

The permissible range of event indexes can be found with the Marpa thin interface's event_count() grammar method, which corresponds to Libmarpa's marpa_g_event_count() method. The thin interface's event_count() method follows the general pattern.

Since event() returns the event value whenever it exists, the Libmarpa marpa_g_event_value() method is unneeded. The Libmarpa marpa_g_event_value() method has no corresponding Marpa thin interface method.
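As an illustration of how event_count() and event() fit together, here is a minimal sketch of an event loop. The helper collect_events() is hypothetical. It assumes that valid event indexes run from zero to one less than the event count, and it treats an undefined or negative count as a failure.

```perl
use strict;
use warnings;

# Hypothetical helper: gather every pending event from a thin grammar
# object as [ $event_type, $value ] pairs.
sub collect_events {
    my ($grammar) = @_;
    my @events;
    my $event_count = $grammar->event_count();

    # Assumption: an undefined or negative count signals an unthrown failure.
    return @events if not defined $event_count or $event_count < 0;

    for my $event_ix ( 0 .. $event_count - 1 ) {
        my ( $event_type, $value ) = $grammar->event($event_ix);
        push @events, [ $event_type, $value ];
    }
    return @events;
}
```

Copying the events into a Perl array lets the application examine them after further calls on the grammar.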

$g->rule_new()

    my $start_rule_id = $grammar->rule_new( $symbol_S, [$symbol_E] );

The rule_new() grammar method is the Marpa thin interface method corresponding to the marpa_g_rule_new() method. It takes two arguments, both required. The first argument is a symbol ID representing the rule's LHS, and the second argument is a reference to an array of symbol ID's. The symbol ID's in the array represent the RHS. On success, the return value is the ID of the new rule.

$g->sequence_new()

    my $sequence_rule_id = $grammar->sequence_new(
            $symbol_S,
            $symbol_a,
            {   separator => $symbol_sep,
                proper    => 0,
                min       => 1
            }
        );

The sequence_new() grammar method is the Marpa thin interface method corresponding to the marpa_g_sequence_new() method. It takes three arguments, all required. The first argument is a symbol ID representing the sequence's LHS. The second argument is a symbol ID representing the sequence's RHS. The third argument is a reference to a hash of named arguments.

The hash of named arguments may be empty. If not empty, its keys, and their values, must be one of the following:

separator

The value of the separator named argument will be treated as an integer, and passed as the separator ID argument to the marpa_g_sequence_new() method. It defaults to -1.

proper

If the value of the proper named argument is a Perl true value, the MARPA_PROPER_SEPARATION flag will be set in the flags passed to the marpa_g_sequence_new() method. Otherwise, the MARPA_PROPER_SEPARATION flag will not be set.

min

The value of the min named argument will be treated as an integer, and passed as the min argument to the marpa_g_sequence_new() method. It defaults to 0.

On success, the return value is the ID of the new sequence. sequence_new() obeys the throw setting. On unthrown Libmarpa failure, it returns -2. On other unthrown failure, it returns a Perl undef.

Users should be aware that all sequences at the Marpa thin interface level are "keep separation". This differs from the higher-level interface, which discards separators by default. At the Marpa thin interface level, it is up to the programmer to discard separators, if that is what is wanted.
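Discarding separators can be done in ordinary Perl once the child values of a sequence rule have been gathered. The sketch below uses a hypothetical helper, discard_separators(). It assumes the sequence was declared with a separator and that the child values strictly alternate between items and separators, starting with an item.

```perl
use strict;
use warnings;

# Hypothetical helper: given the child values of a separated sequence,
# in order, keep the items and drop the separators.
# Assumption: items sit at even offsets and separators at odd offsets.
sub discard_separators {
    my @children = @_;
    return @children[ grep { $_ % 2 == 0 } 0 .. $#children ];
}

my @items = discard_separators( 'a', q{,}, 'b', q{,}, 'c' );
print "@items\n";
```

A trailing separator, which is possible when proper separation is not required, falls at an odd offset and is dropped as well.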

$g->precompute()

    $grammar->precompute();

The precompute() method follows the general pattern. In addition to errors, precompute() also reports events. Events are queried using the grammar's event() method.

On success, precompute() returns an event count. But, even when there is an error, precompute() often reports one or more events. It is not safe to assume that no events occurred unless precompute() succeeds and reports an event count of zero.

Omitted methods

The marpa_g_ref() and marpa_g_unref() methods are omitted because the Marpa thin interface performs their function. The marpa_g_event_value() method is omitted because its function is absorbed into the thin interface's event() grammar method.

General pattern methods

All grammar methods that are part of the Libmarpa external interface, but that are not mentioned explicitly in this document, are implemented following the general pattern, as described above.

Recognizer methods

Marpa::R2::Thin::R->new()

    my $recce = Marpa::R2::Thin::R->new($grammar);

The new() method takes a Marpa thin grammar object as its one argument. On success, it returns a Marpa thin recognizer object. On an unthrown failure, it returns undef.

$r->ruby_slippers_set()

    $recce->ruby_slippers_set(1);

With an argument of 1, the ruby_slippers_set() method enables "Ruby Slippers" mode. With an argument of 0, the ruby_slippers_set() method disables "Ruby Slippers" mode. This is the default. Note that the default in this interface (off) is the opposite of the default in the higher level Marpa::R2 interface.

The alternative() method will only throw exceptions when "Ruby Slippers" mode is OFF and the throw flag is ON. One way of describing Ruby Slippers mode is to say it is an override of the throw flag, one which only applies to the alternative() method.

The ruby_slippers_set() method itself does not obey the throw setting. All failures by ruby_slippers_set() are thrown as exceptions.

$r->alternative()

    $recce->alternative( $symbol_number, 2, 1 );

The thin interface's alternative() method follows the general pattern, with one exception. The Ruby Slippers flag overrides the base grammar's throw setting.

The alternative() method will not throw an exception if the Ruby Slippers flag is on. More precisely, the alternative() method will throw failures as exceptions if and only if the base grammar's throw flag is on and the Ruby Slippers flag is off.
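The rule just stated can be written as a small predicate. This is only a restatement of the rule in Perl, not code from Marpa::R2, and the name alternative_throws() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical predicate restating the rule above: alternative() throws
# a failure as an exception if and only if the base grammar's throw flag
# is on and the Ruby Slippers flag is off.
sub alternative_throws {
    my ( $throw_flag, $ruby_slippers ) = @_;
    return ( $throw_flag && !$ruby_slippers ) ? 1 : 0;
}

# Print all four combinations of the two flags.
for my $throw_flag ( 0, 1 ) {
    for my $ruby_slippers ( 0, 1 ) {
        printf "throw=%d ruby_slippers=%d => %s\n",
            $throw_flag, $ruby_slippers,
            alternative_throws( $throw_flag, $ruby_slippers )
            ? 'failure is thrown'
            : 'failure is returned';
    }
}
```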

$r->terminals_expected()

    my @terminals = $recce->terminals_expected();

The terminals_expected() method takes no arguments. On success, it returns an array containing the symbol ID's of the expected terminals. Note that the array of expected terminal ID's may be empty, so that an empty array is NOT a failure indicator. terminals_expected() obeys the throw setting. On unthrown failure, terminals_expected() sets the error code and returns a Perl undef.

$r->progress_item()

    my $ordinal = $recce->latest_earley_set();
    $recce->progress_report_start($ordinal);
    ITEM: while (1) {
        my ($rule_id, $dot_position, $origin) = $recce->progress_item();
        last ITEM if not defined $rule_id;
        push @{$report}, [$rule_id, $dot_position, $origin];
    }
    $recce->progress_report_finish();

The progress_item() method takes no arguments. On success, it returns an array of 3 elements: the rule ID, the dot position, and the Earley set ID of the origin. If there are no more items, progress_item() returns a Perl undef.

progress_item() obeys the throw setting. On unthrown failure, progress_item() returns a rule ID of -2.

Omitted methods

Because the Marpa thin interface handles reference counting internally, it does not implement methods directly corresponding to Libmarpa's marpa_r_ref() and marpa_r_unref() methods.

Methods not mentioned

All recognizer methods that are part of the Libmarpa external interface, but that are not mentioned explicitly in this document, are implemented following the general pattern, as described above.

Example

    my $grammar  = Marpa::R2::Thin::G->new();
    my $symbol_S = $grammar->symbol_new();
    my $symbol_E = $grammar->symbol_new();
    $grammar->start_symbol_set($symbol_S);
    my $symbol_op     = $grammar->symbol_new();
    my $symbol_number = $grammar->symbol_new();
    my $start_rule_id = $grammar->rule_new( $symbol_S, [$symbol_E] );
    my $op_rule_id =
        $grammar->rule_new( $symbol_E, [ $symbol_E, $symbol_op, $symbol_E ] );
    my $number_rule_id = $grammar->rule_new( $symbol_E, [$symbol_number] );
    $grammar->precompute();

    my $recce = Marpa::R2::Thin::R->new($grammar);
    $recce->start_input();

    # The numbers from 1 to 3 are themselves --
    # that is, they index their own token value.
    # Important: zero cannot be itself!

    my @token_values         = ( 0 .. 3 );
    my $zero                 = -1 + +push @token_values, 0;
    my $minus_token_value    = -1 + push @token_values, q{-};
    my $plus_token_value     = -1 + push @token_values, q{+};
    my $multiply_token_value = -1 + push @token_values, q{*};

    $recce->alternative( $symbol_number, 2, 1 );
    $recce->earleme_complete();
    $recce->alternative( $symbol_op, $minus_token_value, 1 );
    $recce->earleme_complete();
    $recce->alternative( $symbol_number, $zero, 1 );
    $recce->earleme_complete();
    $recce->alternative( $symbol_op, $multiply_token_value, 1 );
    $recce->earleme_complete();
    $recce->alternative( $symbol_number, 3, 1 );
    $recce->earleme_complete();
    $recce->alternative( $symbol_op, $plus_token_value, 1 );
    $recce->earleme_complete();
    $recce->alternative( $symbol_number, 1, 1 );
    $recce->earleme_complete();

    my $latest_earley_set_ID = $recce->latest_earley_set();
    my $bocage        = Marpa::R2::Thin::B->new( $recce, $latest_earley_set_ID );
    my $order         = Marpa::R2::Thin::O->new($bocage);
    my $tree          = Marpa::R2::Thin::T->new($order);
    my @actual_values = ();
    while ( $tree->next() ) {
        my $valuator = Marpa::R2::Thin::V->new($tree);
        $valuator->rule_is_valued_set( $op_rule_id,     1 );
        $valuator->rule_is_valued_set( $start_rule_id,  1 );
        $valuator->rule_is_valued_set( $number_rule_id, 1 );
        my @stack = ();
        STEP: while ( my ( $type, @step_data ) = $valuator->step() ) {
            last STEP if not defined $type;
            if ( $type eq 'MARPA_STEP_TOKEN' ) {
                my ( undef, $token_value_ix, $arg_n ) = @step_data;
                $stack[$arg_n] = $token_values[$token_value_ix];
                next STEP;
            }
            if ( $type eq 'MARPA_STEP_RULE' ) {
                my ( $rule_id, $arg_0, $arg_n ) = @step_data;
                if ( $rule_id == $start_rule_id ) {
                    my ( $string, $value ) = @{ $stack[$arg_n] };
                    $stack[$arg_0] = "$string == $value";
                    next STEP;
                }
                if ( $rule_id == $number_rule_id ) {
                    my $number = $stack[$arg_0];
                    $stack[$arg_0] = [ $number, $number ];
                    next STEP;
                }
                if ( $rule_id == $op_rule_id ) {
                    my $op = $stack[ $arg_0 + 1 ];
                    my ( $right_string, $right_value ) = @{ $stack[$arg_n] };
                    my ( $left_string,  $left_value )  = @{ $stack[$arg_0] };
                    my $value;
                    my $text = '(' . $left_string . $op . $right_string . ')';
                    if ( $op eq q{+} ) {
                        $stack[$arg_0] = [ $text, $left_value + $right_value ];
                        next STEP;
                    }
                    if ( $op eq q{-} ) {
                        $stack[$arg_0] = [ $text, $left_value - $right_value ];
                        next STEP;
                    }
                    if ( $op eq q{*} ) {
                        $stack[$arg_0] = [ $text, $left_value * $right_value ];
                        next STEP;
                    }
                    die "Unknown op: $op";
                } ## end if ( $rule_id == $op_rule_id )
                die "Unknown rule $rule_id";
            } ## end if ( $type eq 'MARPA_STEP_RULE' )
            die "Unexpected step type: $type";
        } ## end while ( my ( $type, @step_data ) = $valuator->step() )
        push @actual_values, $stack[0];
    } ## end while ( $tree->next() )

Copyright and License

  Copyright 2012 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.

  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.