The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

DRAFT

This document is more or less the full length source of the POD that is in Sub::Genius. As this module gets forged through the fire on CPAN, the documentation will also go through several revision. The goal is to expand on the subjects presented her in seperate documentation.

NAME

Sub::Genius - module for facilitating concurrent Perl semantics in the uniprocess execution model of perl.

Another way to say this, is that it introduces all the joys and pains of multi-threaded, shared memory programming to the uniprocess environment that is perl.

THIS MODULE IS EXPERIMENTAL

Until further noted, this module subject to extreme fluxuations in interfaces and implied approaches. The hardest part about this will be managing all the cool and bright ideas stemming from it.

And try as we might, it is impossible to account for all eventualities leading to the outcome of this module or approach. It is the hope of this author that this is simply a reference exploration or implemention. Provin that such a thing can be accomplished in the perl userland (e.g., not core) is part of the goals of this project.

As noted by an early reviewer of this work, it might seem beneficial to provide support for a finite automata engine for uses such as the application presented in this module, but that determination is certainly one that should address an actual need. The need, if one exists, may come from demonstration modules such as this one.

Offline versus Online Execution

The current use of this module can be considered offline, meaning that a sequentially consistent serialization of the presented concurrent semantics is generated first, then it is executed. There may be uses where a valid next subroutine is generated during run time - this can be called online execution of processes defined by the ordering. Will this module support online execution? The answer is, it depends on the actual utility this approach proves to give Perl programmers.

PURPOSE

The main purpose of this module is to expose a high level Perl interface to ensuring sequential consistency among sets of subroutines, which is an essential concept when discussing anything related to concurrent semantics and correct execution ordering in perl's inherently uniprocess environment.

It leverages well known algorithms for converting Regular Languages expressed as a formal (not perl) Regular Expression using the traditional operators, with the addition of the shuffle operator:

From [1],

    A shuffle w of u and v can be loosely defined as a word that is obtained
    by first decomposing u and v into individual pieces, and then combining
    (by concatenation) the pieces to form w, in a way that the order of the
    pieces in each of u and v is preserved.

This means that it preserves the total ordering required by regular languages u and v, but admits the partial ordering - or shuffling - of the languages of both. This ultimately means that a valid string resulting from this combination is necessarily sequentially consistent. Which, from [2],

    ... the result of any execution is the same as if the operations of all the
    processors were executed in some sequential order, and the operations of
    each individual processor appear in this sequence in the order specified
    by its program.

SYNOPSIS

    my $pre = q{( A B    &     C D )       Z};
    #             \ /          \ /         |
    #>>>>>>>>>>>  L1  <shuff>  L2   <cat>  L3

    my $sq = Sub::Genius->new(pre => $pre);

    # sub declaration order has no bearing on anything
    sub A { print qq{A} }
    sub B { print qq{B} }
    sub C { print qq{C} }
    sub D { print qq{D} }
    sub Z { print qq{\n}}

    $sq->run_once();
    print qq{\n};

The following expecity execution of the defined subroutines are all valid according to the PRE description above:

    # valid order 1
      A(); B(); C(); D(); Z();
    
    # valid order 2
      A(); C(); B(); D(); Z();
    
    # valid order 3
      A(); C(); D(); B(); Z();
    
    # valid order 4
      C(); A(); D(); B(); Z();
    
    # valid order 5
      C(); D(); A(); B(); Z();

In the example above, using a PRE to describe the relationship among subroutine names (these are just multicharacter symbols); we are expressing the following constraints:

sub A must run before sub B
sub C must run before sub D
sub Z is always called last

Sub::Genius uses FLAT's functionality to generate any of a number of valid "strings" or correct symbol orderings that are accepted by the Regular Language described by the shuffle two regular languages.

Meaningful Subroutine Names

FLAT allows single character symbols to be expressed with out any decorations;

    my $pre = q{ s ( A (a b) C & D E F) f };

The concatentaion of single symbols is implied, and spaces between symbols doesn't even matter. The following is equivalent to the PRE above,

    my $pre = q{s(A(ab)C&DEF)f};

It's important to note immediately after the above example, that the PRE may contain symbols that are made up of more than one character. This is done using square brackets ([...]), e.g.:

    my $pre = q{[s]([A]([a][b])[C]&[D][E][F])[f]};

But this is a mess, so we can use longer subroutine names as symbols and break it up in a more familar way:

    my $pre = q{
      [start]
        (
          [sub_A]
          (
            [sub_a]
            [sub_b]
          )
          [sub_C]
        &
          [sub_D]
          [sub_E]
          [sub_F]
        )
      [fin]
    };

This is much nicer and starting to look like a more natural expression of concurrent semantics.

Notes

  • All strings in brackets are considered a single symbol that is part of the set of all symbols accepted by the Regular Language that this PRE describes.

  • A lot is inevitably going to change with some nuances of how the PRE is expressed, and a lot of this has to do with necessary changes that will be needed in FLAT to facilitate this sort of application; e.g., below the white space immediately after the [ is considered part of the symbol defined within [..]. A valid string with therefore be the concatenation of all symbols; this will necessarily provide spaces in the string. The string, is split (effectively trim'ing each subroutine name. The subroutines may then be string eval'd with the guarantee that they are going to be run in a correct sequential order. This is known because -valid_string> returns one of potential many (or infinite) valid strings depending on the PRE based description.

Development Workflows

When first starting out with a Perl program using this approach, it might be helpful to be able to generate a stub Perl program using a PRE.

Beyond that, this module makes little attempt to solve the external details of managing a code base that might utlize this approach. An indication of this module's success will be the outcry for these kind of features, or by a few kind souls, contributions of tools or features that do help in this area.

For now, it is the purpose of Sub::Genius::Util::plan2perl to generate stub Perl code. An interesting future capability would be to provide additional annotations inside of the [...], and generate immediately useful Perl programs in much the same way that the Ragel State Machine compiler works. But this module is not meant to be anything like the RSM; one will see no goto statements anywhere around these parts.

PERL's UNIPROCESS MEMORY MODEL AND ITS EXECUTION ENVIRONMENT

While the language Perl is not necessarily constrained by a uniprocess execution model, the runtime provided by perl is. This has necessarily restricted the expressive semantics that can very easily be extended to DWIM in a concurrent execution model. The problem is that perl has been organically grown over the years to run as a single process. It is not immediately obvious to many, even seasoned Perl programmers, why after all of these years does perl not have real threads or admit real concurrency and semantics. Accepting the truth of the uniprocess model makes it clear and brings to it a lot of freedom. This module is meant to facilitate shared memory, multi-process reasoning to perl's fixed uniprocess reality.

In the Formal Language Theory area of Computer Science where the study of Regular Languages exists, REs are typically modeled as Finite Automata. Many are familiar with this concept, and this is the starting point for the memory model established with this approach.

Each named symbol in the PRE represents a state the equivalent FSM. And while Finite State Machines have no concept of memory, it is the state itself that presents the validity of a string; the perl environment and all the capabilities of subroutines may be leveraged to facilitate several levels of memory that present some interesting options.

Earlier in this SYNOPSIS, there were some subroutine definitions associated with the PRE, AB & CD. And the subroutines were defined as so:

    # sub declaration order has no bearing on anything
    
    sub A { print qq{A} }
    sub B { print qq{B} }
    sub C { print qq{C} }
    sub D { print qq{D} }

By leveraging the consistent memory model [3] of the perl uniprocess, the actual memory envronment may be safely expanded using any combination of state and global variable (i.e., my, our):

    # NOTE: sub declaration order has no bearing on anything

    # Shared, global memory
    my $GLOBAL_STATE = {};
    
    sub A {
      my $scope = shift;
      state $private_persist = {}; # persistent "state" memory local
      my $private_temp    = {}; # temporary memory

      # Note: use of 'return' has not yet been # explored in the context
      # of this approach return;
       
      return $scope;
    }

    sub B {
      my $scope = shift;
      state $private_persist = {}; # persistent "state" memory local
      my $private_temp    = {}; # temporary memory

      # Note: use of 'return' has not yet been # explored in the context
      # of this approach return;
       
      return $scope;
    }

    sub C {
      my $scope = shift;
      state $private_persist = {}; # persistent "state" memory local
      my $private_temp    = {}; # temporary memory

      # Note: use of 'return' has not yet been # explored in the context
      # of this approach return;
       
      return $scope;
    }

    sub D {
      my $scope = shift;
      state $private_persist = {}; # persistent "state" memory local
      my $private_temp    = {}; # temporary memory

      # Note: use of 'return' has not yet been # explored in the context
      # of this approach return;
       
      return $scope;
    }

The concept of GLOBAL and state scopes available to the uniprocess seems to beg some form of federated memory among 2 or more distinct perl uniprocesses, but this is not in the scope of the purpose of this module. Considering fork, it's an interesting avenue to explore in the future.

Indeed this paradigm moves slightly away from passing everything into a subroutine via parameter list, and embraces the natural availablitly of global variables. The pendulum must move back eventually, maybe this is such the time.

Note: run_once (and by extension, run_any) provides for execution time globaly, this can be initialized when called run_once. This is covered in detail in the METHODS section.

Atomics and Barriers

When speaking of concurrent semantics in Perl, the topic of atomic primatives often comes up, because in a truly multi-process execution environment, they are very important to coordinating the competitive access of resources such as files and shared memory. Since this execution model necessarily serializes parallel semantics in a sequentially consistent way, there is no need for any of these things. Singular lines of execution need no coordination because there is no competition for any resource (e.g., a file, memory, network port, etc).

USE CASES

See DESCRIPTION for more indepth treatment of sequentially consistent execution and Perl, but the following use cases provide examples of why a human Perl programmer would want to use this module. This section might be more words than code examples for the time being while it gets worked out.

Use Case 1 - static code stub generation

The scenario is that a developer wishes to describe on a high level partial and total ordering among subroutines. For doing this easily in Perl and executing it correctly in perl's uniprocess execution model.

Use Case 2 - run time sequentialization

This case utlizes a PRE during runtime to generate one or more sequentially consistent execution schedules. The developer has whatever degrees of freedom a PRE desription offers, augmented with perl support for global variables, subroutine specific state variables, and private my variables.

Use Case 3 - low level run time support for concurrent semantics

This has yet to be demostrated in any way (even unmeaningfully). But it stands to reason that this approach can be sufficiently developed to be the basis - or POC of a basis - to manage concurrent semantics in the uniprocess model the perl runtime presents. What this might look like, is an implementation of fork/join or async/await/future semantics that silently serialize correctly for sequential execution in a single perl process.

Use Case 4 - general experimentation with effects of serialization

A generalization of the activities necessary to prove this module's usefulness for Use Case 3 (above) or other things that necessarily need to correctly serialized non-squential actions in a correct way.

DESCRIPTION

This section has been intentionally left blank given the wall of text provided above.

RUNTIME METHODS

A minimal set of methods is provided, more so to not suggest the right way to use this module.

new

Constructor, requires a single scalar string argument that is a valid PRE accepted by FLAT.

    my $pre = q{
      [start]
        (
          [subA] (
            [subB_a] [subB_b]
          ) [subC]
        &
          [subD] [subE] [subF]
        )
      [finish]
    };

    my $sq = Sub::Genius->new(pre => $pre);

Note: due to the need to explore the advantages of supporting infinite languages, i.e., PREs that contain a Kleene star; init_plan will die after it compiles the PRE into a min DFA. It checks this using the FLAT::DFA::is_finite subroutine, which simply checks for the presence of cycles. Once this is understood more clearly, this restriction may be lifted. This module is all about correctness, and only finite languages are being considered at this time.

The reference, if captured by a scalar, can be wholly reset using the same parameters as new but calling the plan_nein methods. It's a minor convenience, but one all the same.

plan_nein

Using an existing reference instantiation of Sub::Genius, resets everything about the instance. It's effectively link calling new on the instance without having to recapture it.

init_plan

This takes the PRE provided in the new constructure, and runs through the conversion process provded by FLAT to an equivalent mininimzed DFA. It's this DFA that is then used to generate the (currently) finite set of strings, or plans that are acceptible for the algorithm or steps being implemented.

    my $pre = q{
      [start]
        (
          [subA] (
            [subB_a] [subB_b]
          ) [subC]
        &
          [subD] [subE] [subF]
        )
      [finish]
    };
    my $sq = Sub::Genius->new(pre => $pre);
    $sq->init_plan;
run_once

Returns scope as affected by the assorted subroutines.

Accepts two parameters, both are optional:

ns => q{My::Subsequentializer::Oblivious::Funcs}

Defaults to main::, allows one to specify a namespace that points to a library of subroutines that are specially crafted to run in a sequentialized environment. Usually, this points to some sort of willful obliviousness, but might prove to be useful nonetheless.

scope => {}

Allows one to initiate the execution scoped memory, and may be used to manage a data flow pipeline. Useful and consistent only in the context of a single plan execution. If not provided, scope is initialized as an empty anonymous hash reference:

    my $final_scope = $sq->run_once(
                           scope   => {},
                           verbose => undef,
                      );
verbose => 1|0

Default is falsy, or off. When enabled, outputs arguably useless diagnostic information.

Runs the execution plan once, returns whatever the last subroutine executed returns:

    my $pre = join(q{&},(a..z));
    my $sq = Sub::Genius->new(pre => $pre);
    $plan = $sq->init_plan;
    my $final_scope = $sq->run_once;
next

FLAT provides some utility methods to pump FAs for valid strings; effectively, its the enumeration of paths that exist from an initial state to a final state. There is nothing magical here. The underlying method used to do this actually creates an interator.

When next is called the first time, an interator is created, and the first string is returned. There is currently no way to specify which string (or plan) is returned first, which is why it is important that the concurrent semantics declared in the PRE are done in such a way that any valid string presented is considered to be sequentially consistent with the memory model used in the implementation of the subroutines. Perl provides the access to these memories by use of their lexical variable scoping (my) and the convenient way it allows one to make a subroutine maintain persistent memory (i.e., make it a coroutine) using the state keyword. See more about PERL's UNIPROCESS MEMORY MODEL AND ITS EXECUTION ENVIRONMENT in the section above of the same name.

An example of iterating over all valid strings in a loop follows:

    while (my $plan = $sq->next_plan()) {
      print qq{Plan: $plan\n};
      $sq->run_once;
    }

Note, in the above example, the concept of pipelining is violated since the loop is running each plan ( with no guaranteed ordering ) in turn. $scope is only meaningful within each execution context. Dealing with multiple returned final scopes is not part of this module, but can be captured during each iteration for future processessing:

    my @all_final_scopes = ();
    while (my $plan = $sq->next_plan()) {
      print qq{Plan: $plan\n};
      my $final_scope = $sq->run_once;
      push @all_final_scopes, { $plan => $final_scope };
    }
    # now do something with all the final scopes collected
    # by @all_final_scopes

At this time Sub::Genius only permits finite languages, therefore there is always a finite list of accepted strings. The list may be long, but it's finite.

As an example, the following admits a large number of orderings in a realtively compact DFA, in fact there are 26! (factorial) such valid orderings:

    my $pre = join(q{&},(a..z));
    my $final_scope = Sub::Genius->new(pre => $pre)->run_once;

Thus, the following will take long time to complete; but it will complete:

    my $ans; # global to all subroutines executed
    while ($my $plan = $sq->next_plan()) {
      $sq->run_once;
    }
    print qq{ans: $ans\n};

Done right, the output after 26! iterations may very well be:

    ans: 42

A formulation of 26 subroutines operating over shared memory in which all cooperative execution of all 26! orderings reduces to 42 is left as an excercise for the reader.

run_any

For convenience, this wraps up the steps of plan, init_plan, next, and run_once. It presents a simple one line interfaces:

    my $pre = q{
      [start]
        (
          [subA] (
            [subB_a] [subB_b]
          ) [subC]
        &
          [subD] [subE] [subF]
        )
      [finish]
    };
    Sub::Genius->new(pre => $pre)->run_any();

STATIC CODE UTILITY METHODS

Sub::Genius supplies a very basic and dumb way to generate stub code based on a provided PRE. It's only meant to use once, and is not recommended for managing code dynamically or as part of a development workflow.

Managing generated code is a pain and a very fragile process. Perl modules such as CodeGen::Protection address this issue directly. To remain minimalistic in the offered functionality, there is a single helper utility that simply generates a set of stub methods for each symbol present in the alphabet implied by the PRE. Also generated is a global shared memory hash; and each subroutine has in it a state my hash to serve as persistent memory between subsequent calls and as temporary memory available each time a subroutine is executed.

OPEN QUESTIONS

Usually the term, open questions, is used to imply addition research is needed in the academic sense. Fortunately, this module encapsulates two extremely well researched topics: Regular Languages, Finite Automata, and uniprocess execution environments (particularly in regards to operating systems research).

This section is focused questions as they relate to the search for proper idiomatic uses of this module to explore concurrent programming semantics that may be expressed in Perl (the language) and the uniprocess execution and memory model that perl exposes in such a robust and fantic way.

Some open questions are pressing and required for an initial release. Others are more future oriented.

What sort of run loop methods would be useful
It is pretty clear how concatenation, union, and shuffle can be leveraged for declaring concurrent semantics dicussed above. How can support for the kleene star and negation be leveraged to bring value? See FLAT::Regex::WithExtraOps for more information on how FLAT supports these Regular Language Expression operations
Given speculation in point above, it might lead to some interesting options to look at https://metacpan.org/source/ESTRABD/FLAT-1.0.2/lib%2FFLAT%2FDFA.pm#L394, which is the iterator inplementing the "deep_dft" traversal used to "pump" cycles necessarily created by kleene stars, but to a limit. It may lead to an approach that executes subs as it's doing the traversal, offering options to choose among valid 'run time' decisions based on the current DFT state of the traversal being walked for a valid string
Are there any other operations that are closed under Regular Languages that could be introduced to FLAT that would provide additional value.
What ways of getting a valid sequential string would be most helpful>? Currently, string returns any valid string in a non-reprucible way. This means that getting the same string between subsequent calls is not support. Specifying specific characteristcs of the the sequentially consistent string form is also not supported (e.g., longest, etc).
What utlitiy methods would provide the most value to anyone statically generating programs utilizing this approach? How can regeneration of code be used best in a manageable development workflow.

SEE ALSO

Pipeworks, Sub::Pipeline, Process::Pipeline, FLAT, Graph::PetriNet

COPYRIGHT AND LICENSE

Same terms as perl itself.

AUTHOR

OODLER 577 <oodler@cpan.org<gt>

REFERENCES