The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

XS::Parse::Infix - XS functions to assist in parsing infix operators

DESCRIPTION

This module provides some XS functions to assist in writing syntax modules that provide new infix operators as perl syntax, primarily for authors of syntax plugins. It is unlikely to be of much use to anyone else; and highly unlikely to be of any use when writing perl code using these. Unless you are writing a syntax plugin using XS, this module is not for you.

This module is also currently experimental, and the design is still evolving and subject to change. Later versions may break ABI compatibility, requiring changes or at least a rebuild of any module that depends on it.

In addition, the places this functionality can be used are relatively small. No current release of perl actually supports custom infix operators, though I have a branch where I am currently experimenting with such support:

https://github.com/leonerd/perl5/tree/infix-plugin

In addition, the various XPK_INFIX_* token types of XS::Parse::Keyword support querying on this module, so some syntax provided by other modules may be able to make use of these new infix operators.

CONSTANTS

HAVE_PL_INFIX_PLUGIN

   if( XS::Parse::Infix::HAVE_PL_INFIX_PLUGIN ) { ... }

This constant is true if built on a perl that supports the PL_infix_plugin extension mechanism, meaning that custom infix operators registered with this module will actually be recognised by the perl parser.

No actual production or development releases of perl yet support this feature, but see above for details of a branch which does.

XS FUNCTIONS

boot_xs_parse_infix

  void boot_xs_parse_infix(double ver);

Call this function from your BOOT section in order to initialise the module and parsing hooks.

ver should either be 0 or a decimal number for the module version requirement; e.g.

   boot_xs_parse_infix(0.14);

xs_parse_infix_new_op

   OP *xs_parse_infix_new_op(const struct XSParseInfixInfo *info, U32 flags,
      OP *lhs, OP *rhs);

This function constructs a new optree fragment to represent invoking the infix operator with the given operands. It should be used much the same as core perl's newBINOP function.

The info structure pointer would be obtained from the infix field of the result of invoking the various XPK_INFIX_* token types from XS::Parse::Keyword.

register_xs_parse_infix

   void register_xs_parse_infix(const char *opname,
      const struct XSParseInfixHooks *hooks, void *hookdata);

This function installs a set of parsing hooks to be associated with the given operator name. This new operator will then be available via XS::Parse::Keyword by the various XPK_INFIX_* token types, or to core perl's PL_infix_plugin if availble.

These tokens will all yield an info structure, with the following fields:

   struct XSParseInfixInfo {
      const char *opname;
      OPCODE opcode;  /* for built-in operators, or OP_CUSTOM for 
                         custom-registered ones */

      struct XSParseInfixHooks *hooks;
      void                     *hookdata;
   };

If the operator name contains any non-ASCII characters they are presumed to be in UTF-8 encoding. This will matter for deparse purposes.

PARSE HOOKS

The XSParseInfixHooks structure provides the following fields which are used at various stages of parsing.

   struct XSParseInfixHooks {
      U16 flags; /* currently ignored */
      U8 lhs_flags;
      U8 rhs_flags;
      enum XSParseInfixClassification cls;

      const char *wrapper_func_name;

      const char *permit_hintkey;
      bool (*permit)(pTHX_ void *hookdata);

      OP *(*new_op)(pTHX_ U32 flags, OP *lhs, OP *rhs, void *hookdata);
      OP *(*ppaddr)(pTHX);
   };

Flags

The flags field is currently ignored. It is defined simply to reserve the space in case used in a later version. It should be set to zero.

The rhs_flags field gives details on how to parse and handle the right-hand side of the operator syntax. It should be set to one of the following constants:

XPI_OPERAND_TERM (0)

Default. The operand is a term expression.

XPI_OPERAND_TERM_LIST

The operand is a term expression. It will be foced into list context, preserving the OP_PUSHMARK at the beginning. This means that the ppfunc for this infix operator will have to POPMARK to find that.

XPI_OPERAND_LIST

The operand is a list expression. It will be forced into list context, the same as above.

In addition the following extra bitflags are defined:

XPI_OPERAND_ONLY_LOOK

If set, the operator function promises that it will not mutate any of its passed values, nor allow leaking of direct alias pointers to them via return value or other locations.

This flag is optional; omitting it when applicable will not change any observed behaviour. Setting it may enable certain optimisations to be performed.

Currently, this flag simply enables an optimisation in the call-checker for infix operator wrapper functions that take list-shaped operands. This optimisation discards an OP_ANONLIST operation which would create a temporary anonymous array reference for its operand values, allowing a slight saving of memory use and CPU time. This optimisation is only safe to perform if the operator does not mutate or retain aliases of any of the arguments, as otherwise the caller might see unexpected modifications or value references to the values passed.

The lhs_flags field gives details on how to handle the left-hand side of the operator syntax. It takes similar values to rhs_flags, except that it does not accept the XPI_OPERAND_LIST value. Parsing always happens on just a term expression, though it may be placed into list context (which therefore still permits things like parenthesized lists, or array variables).

The Selection Stage

The cls field gives a "classification" of the operator, suggesting what sort of operation it provides. This is used as a filter by the various XS::Parse::Keyword selection macros.

The classification should be one of the XPI_CLS_* constants found and described further in the main XSParseInfix.h file.

The permit Stage

As a shortcut for the common case, the permit_hintkey may point to a string to look up from the hints hash. If the given key name is not found in the hints hash then the keyword is not permitted. If the key is present then the permit function is invoked as normal.

If not rejected by a hint key that was not found in the hints hash, the function part of the stage is called next and should inspect whether the keyword is permitted at this time perhaps by inspecting other lexical clues, and return true only if the keyword is permitted.

Both the string and the function are optional. Either or both may be present. If neither is present then the keyword is always permitted - which is likely not what you wanted to do.

The Op Generation Stage

If the infix operator is going to be used, then one of the new_op or the ppaddr fields explain how to create a new optree fragment.

If new_op is defined then it will be used, and is expected to return an optree fragment that consumes the LHS and RHS arguments to implement the semantics of the operator. If this is not present, then the ppaddr will be used instead to construct a new BINOP of the OP_CUSTOM type.

The Wrapper Function

Additionally, if the wrapper_func_name field is set to a string, this gives the (fully-qualified) name for a function to be generated as part of registering the operator. This newly-generated function will act as a wrapper for the operator.

For operators whose RHS is a scalar, the wrapper function is assumed to take two simple scalar arguments. The result of invoking the function on those arguments will be determined by using the operator code.

   $result = $lhs OP $rhs;

   $result = WRAPPERFUNC( $lhs, $rhs );

For operators whose RHS is a list, the wrapper function takes at least one argument, possibly more. The first argument is the scalar on the LHS, and the remaining arguments, however many there are, form the RHS:

   $result = $lhs OP @rhs;

   $result = WRAPPERFUNC( $lhs, @rhs );

For operators whose LHS and RHS is a list, the wrapper function takes two arguments which must be array references containing the lists.

   $result = @lhs OP @rhs;

   $result = WRAPPERFUNC( \@lhs, \@rhs );

This creates a convenience for accessing the operator from perls that do not support PL_infix_plugin.

In the case of scalar infix operators, the wrapper function also includes a call-checker which attempts to inline the operator directly into the callsite. Thus, in simple cases where the function is called directly on exactly two scalar arguments (such as in the following), no ENTERSUB overhead will be incurred and the generated optree will be identical to that which would have been generated by using infix operator syntax directly:

   WRAPPERFUNC( $lhs, $rhs );
   WRAPPERFUNC( $lhs, CONSTANT );
   WRAPPERFUNC( $args[0], $args[1] );
   WRAPPERFUNC( $lhs, scalar otherfunc() );

The checker is very pessimistic and will only rewrite callsites where it determines this can be done safely. It will not rewrite any of the following forms:

   WRAPPERFUNC( $onearg );            # not enough args
   WRAPPERFUNC( $x, $y, $z );         # too many args
   WRAPPERFUNC( @args[0,1] );         # not a scalar
   WRAPPERFUNC( $lhs, otherfunc() );  # not a scalar

The wrapper function for infix operators which take lists on both sides also has a call-checker which will attempt to inline the operator in similar circumstances. In addition to the optimisations described above for scalar operators, this checker will also inline an array-reference operator and omit the resulting dereference behaviour. Thus, the two following lines emit the same optree, without an OP_SREFGEN or OP_RV2AV:

   @lhs OP @rhs;
   WRAPPERFUNC( \@lhs, \@rhs );

Note that technically, this optimisation isn't strictly transparent in the odd cornercase that one of the referenced arrays is also the backing store for a blessed object reference, and that object class has a @{} overload.

   my @arr;
   package SomeClass {
      use overload '@{}' => sub { return ["values", "go", "here"]; };
   }
   bless \@arr, "SomeClass";

   # this will not actually invoke the overload operator
   WRAPPERFUNC( \@arr, [4, 5, 6] );

As this cornercase relates to taking duplicate references to the same blessed object's backing store variable, it should not matter to any real code; regular objects that are passed by reference into the wrapper function will run their overload methods as normal.

The callchecker for list operands can optionally also discard an op of the OP_ANONLIST type, which is used by anonymous array-ref construction:

   ($u, $v, $w) OP ($x, $y, $z);
   WRAPPERFUNC( [$u, $v, $w], [$x, $y, $z] );

This optimisation is only performed if the operator declared it safe to do so, via the XPI_OPERAND_ONLY_LOOK flag.

DEPARSE

This module operates with B::Deparse in order to automatically provide deparse support for infix operators. Every infix operator that is implemented as a custom op (and thus has the ppaddr hook field set) will have deparse logic added. This will allow it to deparse to either the named wrapper function, or to the infix operator syntax if on a PL_infix_plugin-enabled perl and the appropriate lexical hint is enabled at the callsite.

In order for this to work, it is important that your custom operator is not registered as a custom op using the Perl_register_custom_op() function. This registration will be performed by XS::Parse::Infix itself at the time the infix operator is registered.

TODO

  • Have the entersub checker for list/list operators unwrap arrayref or anon-array argument forms (WRAPPERFUNC( \@lhs, \@rhs ) or WRAPPERFUNC( [LHS], [RHS] )).

AUTHOR

Paul Evans <leonerd@leonerd.org.uk>