Unisyn::Parse - Parse a Unisyn expression.
Parse the Unisyn expression:
𝒂 ❴ 𝒃 ⟦𝒄⟨ 𝗮 𝑒𝑞𝑢𝑎𝑙𝑠 𝒅 𝗯 𝙙 𝐭𝐢𝐦𝐞𝐬 ⟪𝗰 𝐩𝐥𝐮𝐬 𝗱⟫⟢ 𝗲 𝑎𝑠𝑠𝑖𝑔𝑛 𝗳 𝐬𝐮𝐛 𝗴 𝙝⟩ 𝙘 ⟧ 𝙗 ❵ 𝙖
To get:
Suffix: 𝙖
  Term
    Prefix: 𝒂
      Term
        Brackets: ⦇⦈
          Term
            Term
              Suffix: 𝙗
                Term
                  Prefix: 𝒃
                    Term
                      Brackets: ⦋⦌
                        Term
                          Term
                            Suffix: 𝙘
                              Term
                                Prefix: 𝒄
                                  Term
                                    Brackets: ⦏⦐
                                      Term
                                        Term
                                          Semicolon
                                            Term
                                              Assign: 𝑒𝑞𝑢𝑎𝑙𝑠
                                                Term
                                                  Variable: 𝗮
                                                Term
                                                  Dyad: 𝐭𝐢𝐦𝐞𝐬
                                                    Term
                                                      Suffix: 𝙙
                                                        Term
                                                          Prefix: 𝒅
                                                            Term
                                                              Variable: 𝗯
                                                    Term
                                                      Brackets: ⦓⦔
                                                        Term
                                                          Term
                                                            Dyad: 𝐩𝐥𝐮𝐬
                                                              Term
                                                                Variable: 𝗰
                                                              Term
                                                                Variable: 𝗱
                                            Term
                                              Assign: 𝑎𝑠𝑠𝑖𝑔𝑛
                                                Term
                                                  Variable: 𝗲
                                                Term
                                                  Dyad: 𝐬𝐮𝐛
                                                    Term
                                                      Variable: 𝗳
                                                    Term
                                                      Suffix: 𝙝
                                                        Term
                                                          Variable: 𝗴
Then traverse the parse tree printing the type of each node:
variable variable prefix_d suffix_d variable variable plus times equals variable variable variable sub assign semiColon brackets_3 prefix_c suffix_c brackets_2 prefix_b suffix_b brackets_1 prefix_a suffix_a
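The post-order listing above can be reproduced in miniature with ordinary Perl data structures. The following is only a sketch: the module itself builds its tree in memory managed by generated assembler code, whereas the nested-hash node layout and the post_order helper here are assumptions made purely for illustration. It shows why operands are reported before the operators that consume them: every child is visited before its parent.

```perl
use strict;
use warnings;

# Hypothetical node layout: { type => 'name', kids => [ child nodes ] }.
sub post_order {
    my ($node, $visit) = @_;
    post_order($_, $visit) for @{ $node->{kids} // [] };   # Children first
    $visit->($node);                                       # Then the node itself
}

# Illustrative tree for:  a equals b plus c
my $tree = {
    type => 'equals',
    kids => [
        { type => 'variable' },
        { type => 'plus',
          kids => [ { type => 'variable' }, { type => 'variable' } ] },
    ],
};

my @order;
post_order($tree, sub { push @order, $_[0]{type} });
print "$_\n" for @order;
```

Collecting the visited types into an array, rather than printing them directly, keeps the traversal reusable for other per-node actions.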
Parse a Unisyn expression.
Version "20211013".
The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.
Create a Unisyn parse of a utf8 string.
Create a new unisyn parse from a utf8 string.
   Parameter  Description
1  $address   Address of a zero terminated utf8 source string to parse as a variable
2  %options   Parse options.
Example:
create (K(address, Rutf8 $Lex->{sampleText}{vav}))->print;   # Create parse tree from source terminated with zero   # 𝗘𝘅𝗮𝗺𝗽𝗹𝗲

ok Assemble(debug => 0, eq => <<END);
Assign: 𝑎
  Term
    Variable: 𝗮
  Term
    Variable: 𝗯
END
Parse Unisyn expressions
Traverse the parse tree
Traverse the terms in parse tree in post order and call the operator subroutine associated with each term.
   Parameter  Description
1  $parse     Parse tree
my $s = Rutf8 $Lex->{sampleText}{Adv};                       # Ascii
my $p = create K(address, $s), operators => \&printOperatorSequence;

K(address, $s)->printOutZeroString;
$p->dumpParseTree;
$p->print;
$p->traverseParseTree;                                       # 𝗘𝘅𝗮𝗺𝗽𝗹𝗲

Assemble(debug => 0, eq => <<END)
𝗮𝗮𝑒𝑞𝑢𝑎𝑙𝑠abc 123 𝐩𝐥𝐮𝐬𝘃𝗮𝗿
Tree at: 0000 0000 0000 10D8 length: 0000 0000 0000 000B
  Keys: 0000 1118 0500 000B 0000 0000 0000 0000 0000 0000 0000 000D 0000 000C 0000 0009 0000 0008 0000 0007 0000 0006 0000 0005 0000 0004 0000 0002 0000 0001 0000 0000
  Data: 0000 0000 0000 0016 0000 0000 0000 0000 0000 0000 0000 0F18 0000 0009 0000 0AD8 0000 0009 0000 0004 0000 0006 0000 0002 0000 0005 0041 26A4 0000 0003 0000 0009
  Node: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    index: 0000 0000 0000 0000 key: 0000 0000 0000 0000 data: 0000 0000 0000 0009
    index: 0000 0000 0000 0001 key: 0000 0000 0000 0001 data: 0000 0000 0000 0003
    index: 0000 0000 0000 0002 key: 0000 0000 0000 0002 data: 0000 0000 0041 26A4
    index: 0000 0000 0000 0003 key: 0000 0000 0000 0004 data: 0000 0000 0000 0005
    index: 0000 0000 0000 0004 key: 0000 0000 0000 0005 data: 0000 0000 0000 0002
    index: 0000 0000 0000 0005 key: 0000 0000 0000 0006 data: 0000 0000 0000 0006
    index: 0000 0000 0000 0006 key: 0000 0000 0000 0007 data: 0000 0000 0000 0004
    index: 0000 0000 0000 0007 key: 0000 0000 0000 0008 data: 0000 0000 0000 0009
    index: 0000 0000 0000 0008 key: 0000 0000 0000 0009 data: 0000 0000 0000 0AD8 subTree
    index: 0000 0000 0000 0009 key: 0000 0000 0000 000C data: 0000 0000 0000 0009
    index: 0000 0000 0000 000A key: 0000 0000 0000 000D data: 0000 0000 0000 0F18 subTree
  Tree at: 0000 0000 0000 0AD8 length: 0000 0000 0000 0007
    Keys: 0000 0B18 0000 0007 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0007 0000 0006 0000 0005 0000 0004 0000 0002 0000 0001 0000 0000
    Data: 0000 0000 0000 000E 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0002 0000 0000 0000 0006 0041 176C 0000 0001 0000 0009
    Node: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
      index: 0000 0000 0000 0000 key: 0000 0000 0000 0000 data: 0000 0000 0000 0009
      index: 0000 0000 0000 0001 key: 0000 0000 0000 0001 data: 0000 0000 0000 0001
      index: 0000 0000 0000 0002 key: 0000 0000 0000 0002 data: 0000 0000 0041 176C
      index: 0000 0000 0000 0003 key: 0000 0000 0000 0004 data: 0000 0000 0000 0006
      index: 0000 0000 0000 0004 key: 0000 0000 0000 0005 data: 0000 0000 0000 0000
      index: 0000 0000 0000 0005 key: 0000 0000 0000 0006 data: 0000 0000 0000 0002
      index: 0000 0000 0000 0006 key: 0000 0000 0000 0007 data: 0000 0000 0000 0000
  end
  Tree at: 0000 0000 0000 0F18 length: 0000 0000 0000 000B
    Keys: 0000 0F58 0500 000B 0000 0000 0000 0000 0000 0000 0000 000D 0000 000C 0000 0009 0000 0008 0000 0007 0000 0006 0000 0005 0000 0004 0000 0002 0000 0001 0000 0000
    Data: 0000 0000 0000 0016 0000 0000 0000 0000 0000 0000 0000 0DD8 0000 0009 0000 0C18 0000 0009 0000 0003 0000 0004 0000 0013 0000 0003 0041 2E40 0000 0003 0000 0009
    Node: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
      index: 0000 0000 0000 0000 key: 0000 0000 0000 0000 data: 0000 0000 0000 0009
      index: 0000 0000 0000 0001 key: 0000 0000 0000 0001 data: 0000 0000 0000 0003
      index: 0000 0000 0000 0002 key: 0000 0000 0000 0002 data: 0000 0000 0041 2E40
      index: 0000 0000 0000 0003 key: 0000 0000 0000 0004 data: 0000 0000 0000 0003
      index: 0000 0000 0000 0004 key: 0000 0000 0000 0005 data: 0000 0000 0000 0013
      index: 0000 0000 0000 0005 key: 0000 0000 0000 0006 data: 0000 0000 0000 0004
      index: 0000 0000 0000 0006 key: 0000 0000 0000 0007 data: 0000 0000 0000 0003
      index: 0000 0000 0000 0007 key: 0000 0000 0000 0008 data: 0000 0000 0000 0009
      index: 0000 0000 0000 0008 key: 0000 0000 0000 0009 data: 0000 0000 0000 0C18 subTree
      index: 0000 0000 0000 0009 key: 0000 0000 0000 000C data: 0000 0000 0000 0009
      index: 0000 0000 0000 000A key: 0000 0000 0000 000D data: 0000 0000 0000 0DD8 subTree
    Tree at: 0000 0000 0000 0C18 length: 0000 0000 0000 0007
      Keys: 0000 0C58 0000 0007 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0007 0000 0006 0000 0005 0000 0004 0000 0002 0000 0001 0000 0000
      Data: 0000 0000 0000 000E 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0007 0000 0008 0000 0002 0041 53FE 0000 0001 0000 0009
      Node: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
        index: 0000 0000 0000 0000 key: 0000 0000 0000 0000 data: 0000 0000 0000 0009
        index: 0000 0000 0000 0001 key: 0000 0000 0000 0001 data: 0000 0000 0000 0001
        index: 0000 0000 0000 0002 key: 0000 0000 0000 0002 data: 0000 0000 0041 53FE
        index: 0000 0000 0000 0003 key: 0000 0000 0000 0004 data: 0000 0000 0000 0002
        index: 0000 0000 0000 0004 key: 0000 0000 0000 0005 data: 0000 0000 0000 0008
        index: 0000 0000 0000 0005 key: 0000 0000 0000 0006 data: 0000 0000 0000 0007
        index: 0000 0000 0000 0006 key: 0000 0000 0000 0007 data: 0000 0000 0000 0001
    end
    Tree at: 0000 0000 0000 0DD8 length: 0000 0000 0000 0007
      Keys: 0000 0E18 0000 0007 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0007 0000 0006 0000 0005 0000 0004 0000 0002 0000 0001 0000 0000
      Data: 0000 0000 0000 000E 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0002 0000 0003 0000 0017 0000 0006 0041 176C 0000 0001 0000 0009
      Node: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
        index: 0000 0000 0000 0000 key: 0000 0000 0000 0000 data: 0000 0000 0000 0009
        index: 0000 0000 0000 0001 key: 0000 0000 0000 0001 data: 0000 0000 0000 0001
        index: 0000 0000 0000 0002 key: 0000 0000 0000 0002 data: 0000 0000 0041 176C
        index: 0000 0000 0000 0003 key: 0000 0000 0000 0004 data: 0000 0000 0000 0006
        index: 0000 0000 0000 0004 key: 0000 0000 0000 0005 data: 0000 0000 0000 0017
        index: 0000 0000 0000 0005 key: 0000 0000 0000 0006 data: 0000 0000 0000 0003
        index: 0000 0000 0000 0006 key: 0000 0000 0000 0007 data: 0000 0000 0000 0002
    end
  end
end
Assign: 𝑒𝑞𝑢𝑎𝑙𝑠
  Term
    Variable: 𝗮𝗮
  Term
    Dyad: 𝐩𝐥𝐮𝐬
      Term
        Ascii: abc 123
      Term
        Variable: 𝘃𝗮𝗿
variable
ascii
variable
plus
equals
END

my $s = Rutf8 $Lex->{sampleText}{ws};
my $p = create (K(address, $s), operators => \&printOperatorSequence);

K(address, $s)->printOutZeroString;                          # Print input string
$p->print;                                                   # Print parse
$p->traverseParseTree;                                       # Traverse tree printing terms   # 𝗘𝘅𝗮𝗺𝗽𝗹𝗲

Assemble(debug => 0, eq => <<END)
𝗮𝑎𝑠𝑠𝑖𝑔𝑛⌊〈❨𝗯𝗽❩〉𝐩𝐥𝐮𝐬❪𝘀𝗰❫⌋⟢𝗮𝗮𝑎𝑠𝑠𝑖𝑔𝑛❬𝗯𝗯𝐩𝐥𝐮𝐬𝗰𝗰❭⟢
Semicolon
  Term
    Assign: 𝑎𝑠𝑠𝑖𝑔𝑛
      Term
        Variable: 𝗮
      Term
        Brackets: ⌊⌋
          Term
            Term
              Dyad: 𝐩𝐥𝐮𝐬
                Term
                  Brackets: ❨❩
                    Term
                      Term
                        Brackets: ❬❭
                          Term
                            Term
                              Variable: 𝗯𝗽
                Term
                  Brackets: ❰❱
                    Term
                      Term
                        Variable: 𝘀𝗰
  Term
    Assign: 𝑎𝑠𝑠𝑖𝑔𝑛
      Term
        Variable: 𝗮𝗮
      Term
        Brackets: ❴❵
          Term
            Term
              Dyad: 𝐩𝐥𝐮𝐬
                Term
                  Variable: 𝗯𝗯
                Term
                  Variable: 𝗰𝗰
variable
variable
variable
plus
assign
variable
variable
variable
plus
assign
semiColon
END
Traverse the parse tree in post order to create an execution chain.
Print a parse tree
Print a parse tree.
Dump the parse tree.
Associate methods with each operator via a set of quarks describing the method to be called for each lexical operator.
Map a lexical item to a processing subroutine.
   Parameter  Description
1  $parse     Sub quarks
2  $alphabet  The alphabet number
3  $op        The operator name in that alphabet
4  $sub       Subroutine definition
Define a method for a dyadic operator.
   Parameter  Description
1  $parse     Sub quarks
2  $text      The name of the operator as a utf8 string
3  $sub       Associated subroutine definition
Define a method for an assign operator.
Define a method for a prefix operator.
Define a method for a suffix operator.
Define a method for ascii text.
   Parameter  Description
1  $parse     Sub quarks
2  $sub       Associated subroutine definition
Define a method for the semicolon operator, which comes in two forms: the explicit semicolon and the new line semicolon.
Define a method for a variable.
Define a method for a bracket operator.
   Parameter  Description
1  $parse     Sub quarks
2  $open      Opening parenthesis
3  $sub       Associated subroutine
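The idea of associating a subroutine with each operator can be illustrated with a plain Perl dispatch table. This is a sketch only: the module registers its methods through quarks and generated assembler code, so the %methods hash and the define_method and call_method helpers below are hypothetical stand-ins, not part of the module's interface.

```perl
use strict;
use warnings;

# Hypothetical registry: lexical type => operator text => code reference.
my %methods;

sub define_method {
    my ($type, $text, $sub) = @_;
    $methods{$type}{$text} = $sub;
}

sub call_method {
    my ($type, $text, @args) = @_;
    my $sub = $methods{$type}{$text}
        or die "No method defined for $type '$text'\n";
    return $sub->(@args);
}

# Register two dyadic operators and then look them up by name.
define_method(dyad => 'plus',  sub { $_[0] + $_[1] });
define_method(dyad => 'times', sub { $_[0] * $_[1] });

print call_method(dyad => 'plus',  2, 3), "\n";
print call_method(dyad => 'times', 2, 3), "\n";
```

Keying the table first by lexical type and then by operator text mirrors the way the methods above are defined separately for dyads, assigns, prefixes, suffixes and brackets.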
Translate between alphabets.
Translate ascii to the corresponding letters in the assign latin alphabet.
   Parameter  Description
1  $in        A string of ascii
Translate ascii to the corresponding letters in the assign greek alphabet.
Translate ascii to the corresponding letters in the dyad latin alphabet.
Translate ascii to the corresponding letters in the dyad greek alphabet.
Translate ascii to the corresponding letters in the prefix latin alphabet.
Translate ascii to the corresponding letters in the prefix greek alphabet.
Translate ascii to the corresponding letters in the suffix latin alphabet.
Translate ascii to the corresponding letters in the suffix greek alphabet.
Translate ascii to the corresponding letters in the escaped ascii alphabet.
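As an illustration of what such a translation involves, the sketch below maps ascii letters to the Unicode Mathematical Bold letters. It rests on an assumption: that the dyad latin alphabet corresponds to that Unicode block, as operators such as 𝐩𝐥𝐮𝐬 in the examples suggest. The function name ascii_to_bold is hypothetical and the module's own tables may differ.

```perl
use strict;
use warnings;

# Assumption: the dyad latin alphabet corresponds to the Unicode
# "Mathematical Bold" letters, U+1D400 for 'A' and U+1D41A for 'a'.
sub ascii_to_bold {
    my ($in) = @_;
    my $out = '';
    for my $c (split //, $in) {
        if    ($c =~ /[A-Z]/) { $out .= chr(0x1D400 + ord($c) - ord('A')) }
        elsif ($c =~ /[a-z]/) { $out .= chr(0x1D41A + ord($c) - ord('a')) }
        else                  { $out .= $c }   # Leave other characters unchanged
    }
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print ascii_to_bold('plus'), "\n";
```

The same offset arithmetic would apply to any other contiguous block of styled letters; blocks with gaps, such as Mathematical Italic with its missing small h, would need a lookup table instead.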
Print the operator calling sequence.
   Parameter  Description
1  $parse     Parse
Description of parse
Address of source string as utf8
Arena containing tree
Number of failures encountered in this parse
Methods implementing each lexical operator
Offset to the head of the parse tree
Quarks representing the strings used in this parse
Size of source string as utf8
Source text as utf32
Length of utf32 string
Size of utf32 allocation
Size of entries in exec chain
Load the position of a lexical item in its alphabet from the current character.
   Parameter  Description
1  $register  Register to load
2  $address   Address of start of string
3  $index     Index into string
Load the lexical code of the current character in memory into the specified register.
Put the specified lexical code into the current character in memory.
   Parameter  Description
1  $register  Register used to load code
2  $address   Address of string
3  $index     Index into string
4  $code      Code to put
Load the details of the character currently being processed so that we have the index of the character in the upper half of the current character and the lexical type of the character in the lowest byte.
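The layout described above can be pictured with a little bit arithmetic. The sketch below assumes a 64 bit element, with the character index in the upper 32 bits and the lexical type in the lowest byte; the helper names are hypothetical, and a Perl built with 64 bit integers is required. It is meant only to show the packing, not how the module loads its registers.

```perl
use strict;
use warnings;

# Pack: character index in the upper 32 bits, lexical type in the lowest byte.
sub pack_element {
    my ($index, $type) = @_;
    return ($index << 32) | ($type & 0xFF);
}

# Unpack: recover the index and the lexical type from a packed element.
sub unpack_element {
    my ($element) = @_;
    return ($element >> 32, $element & 0xFF);
}

my $element = pack_element(7, 0x06);
printf "%016X\n", $element;

my ($index, $type) = unpack_element($element);
print "index=$index type=$type\n";
```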
Check that we have at least the specified number of elements on the stack.
   Parameter  Description
1  $depth     Number of elements required on the stack
Push the current element on to the stack.
Push the empty element on to the stack.
Lexical name for a lexical item described by its letter.
   Parameter  Description
1  $l         Letter of the lexical item
Lexical number for a lexical item described by its letter.
Put the length of a lexical item into variable size.
   Parameter  Description
1  $source32  Address of utf32 source representation
2  $offset    Offset to lexical item in utf32
Create a new term in the parse tree rooted on the stack.
   Parameter     Description
1  $depth        Stack depth to be converted
2  $description  Text reason why we are creating a new term
Write an error message and stop.
   Parameter  Description
1  $message   Error message
Test a set of items, setting the zero flag if one matches, else clearing the zero flag.
   Parameter  Description
1  $set       Set of lexical letters
2  $register  Register to test
Check that one of a set of items is on the top of the stack or complain if it is not.
   Parameter  Description
1  $set       Set of lexical letters
Convert the longest possible expression on top of the stack into a term at the specified priority.
   Parameter  Description
1  $priority  Priority of the operators to reduce
Reduce existing operators on the stack.
accept_a - Assign.
accept_b - Open.
accept_B - Closing parenthesis.
accept_d - Infix but not assign or semi-colon.
accept_p - Prefix.
accept_q - Post fix.
accept_s - Semi colon.
accept_v - Variable.
Parse the string of classified lexical items addressed by register $start of length $length. The resulting parse tree (if any) is returned in r15.
Replace the low three bytes of a utf32 bracket character with 24 bits of offset to the matching opening or closing bracket. Opening brackets have even codes from 0x10 to 0x4e while the corresponding closing bracket has a code one higher.
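The matching step can be sketched with a stack in plain Perl. The code below is an illustration under stated assumptions, not the module's implementation: it takes one lexical code per character, places that code in the high byte of a 32 bit value, and stores the absolute position of the partner bracket in the low three bytes (the module may store a different kind of offset). The helper names, and the use of 0x06 as an arbitrary non-bracket code, are hypothetical.

```perl
use strict;
use warnings;

# Opening brackets have even codes from 0x10 to 0x4e; each closing
# bracket has the code one higher than its opening bracket.
sub is_open  { my ($c) = @_; $c >= 0x10 && $c <= 0x4e && $c % 2 == 0 }
sub is_close { my ($c) = @_; $c >= 0x11 && $c <= 0x4f && $c % 2 == 1 }

sub match_brackets {
    my @codes = @_;                          # Lexical codes, one per character
    my @out   = map { $_ << 24 } @codes;     # Code in the high byte
    my @stack;                               # Positions of unmatched opens
    for my $i (0 .. $#codes) {
        my $c = $codes[$i];
        if (is_open($c)) {
            push @stack, $i;
        }
        elsif (is_close($c)) {
            my $o = pop @stack;
            die "Unmatched closing bracket at $i\n" unless defined $o;
            die "Mismatched brackets at $o and $i\n" unless $codes[$o] + 1 == $c;
            $out[$o] |= $i;                  # Open records position of its close
            $out[$i] |= $o;                  # Close records position of its open
        }
    }
    die "Unmatched opening bracket at $stack[-1]\n" if @stack;
    return @out;
}

# 0x10/0x11 and 0x12/0x13 are bracket pairs; 0x06 is an arbitrary other code.
my @matched = match_brackets(0x10, 0x12, 0x06, 0x13, 0x11);
printf "%08X\n", $_ for @matched;
```

Because each bracket carries the position of its partner, a later pass can jump across a bracketed span without rescanning it, provided positions fit in 24 bits.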
   Parameter    Description
1  @parameters  Parameters
Scan input string looking for opportunities to convert new lines into semi colons.
Classify white space per: "lib/Unisyn/whiteSpace/whiteSpaceClassification.pl".
Reload the variables associated with a parse.
   Parameter    Description
1  $parse       Parse
2  $parameters  Hash of variable parameters
Parse a unisyn expression encoded as utf8 and return the parse tree.
   Parameter    Description
1  $parse       Parse
2  @parameters  Parameters
Print the execution chain for a parse.
Print the utf8 string corresponding to a lexical item at a variable offset.
   Parameter  Description
1  $parse     Parse tree
2  $source32  Address of utf32 source representation
3  $offset    Offset to lexical item in utf32
4  $size      Size in utf32 chars of item
Show an alphabet.
   Parameter  Description
1  $alphabet  Alphabet name
Parse some text and dump the results.
   Parameter  Description
1  $key       Key of text to be parsed
2  $expected  Expected result
3  %options   Options
Parse some text and print the results.
1 accept_a - Assign.
2 accept_B - Closing parenthesis.
3 accept_b - Open.
4 accept_d - Infix but not assign or semi-colon.
5 accept_p - Prefix.
6 accept_q - Post fix.
7 accept_s - Semi colon.
8 accept_v - Variable.
9 ascii - Define a method for ascii text.
10 asciiToAssignGreek - Translate ascii to the corresponding letters in the assign greek alphabet.
11 asciiToAssignLatin - Translate ascii to the corresponding letters in the assign latin alphabet.
12 asciiToDyadGreek - Translate ascii to the corresponding letters in the dyad greek alphabet.
13 asciiToDyadLatin - Translate ascii to the corresponding letters in the dyad latin alphabet.
14 asciiToEscaped - Translate ascii to the corresponding letters in the escaped ascii alphabet.
15 asciiToPrefixGreek - Translate ascii to the corresponding letters in the prefix greek alphabet.
16 asciiToPrefixLatin - Translate ascii to the corresponding letters in the prefix latin alphabet.
17 asciiToSuffixGreek - Translate ascii to the corresponding letters in the suffix greek alphabet.
18 asciiToSuffixLatin - Translate ascii to the corresponding letters in the suffix latin alphabet.
19 asciiToVariableGreek - Translate ascii to the corresponding letters in the variable greek alphabet.
20 asciiToVariableLatin - Translate ascii to the corresponding letters in the variable latin alphabet.
21 assign - Define a method for an assign operator.
22 bracket - Define a method for a bracket operator.
23 C - Parse some text and print the results.
24 checkSet - Check that one of a set of items is on the top of the stack or complain if it is not.
25 checkStackHas - Check that we have at least the specified number of elements on the stack.
26 ClassifyNewLines - Scan input string looking for opportunities to convert new lines into semi colons.
27 ClassifyWhiteSpace - Classify white space per: "lib/Unisyn/whiteSpace/whiteSpaceClassification.pl".
28 create - Create a new unisyn parse from a utf8 string.
29 dumpParseTree - Dump the parse tree.
30 dyad - Define a method for a dyadic operator.
31 error - Write an error message and stop.
32 executeOperator - Print the operator calling sequence.
33 getAlpha - Load the position of a lexical item in its alphabet from the current character.
34 getLexicalCode - Load the lexical code of the current character in memory into the specified register.
35 lexicalItemLength - Put the length of a lexical item into variable size.
36 lexicalNameFromLetter - Lexical name for a lexical item described by its letter.
37 lexicalNumberFromLetter - Lexical number for a lexical item described by its letter.
38 lexToSub - Map a lexical item to a processing subroutine.
39 loadCurrentChar - Load the details of the character currently being processed so that we have the index of the character in the upper half of the current character and the lexical type of the character in the lowest byte.
40 makeExecutionChain - Traverse the parse tree in post order to create an execution chain.
41 MatchBrackets - Replace the low three bytes of a utf32 bracket character with 24 bits of offset to the matching opening or closing bracket.
42 new - Create a new term in the parse tree rooted on the stack.
43 parseExpression - Parse the string of classified lexical items addressed by register $start of length $length.
44 parseUtf8 - Parse a unisyn expression encoded as utf8 and return the parse tree.
45 prefix - Define a method for a prefix operator.
46 print - Print a parse tree.
47 printExecChain - Print the execution chain for a parse.
48 printLexicalItem - Print the utf8 string corresponding to a lexical item at a variable offset.
49 printOperatorSequence - Print the operator calling sequence.
50 pushElement - Push the current element on to the stack.
51 pushEmpty - Push the empty element on to the stack.
52 putLexicalCode - Put the specified lexical code into the current character in memory.
53 reduce - Convert the longest possible expression on top of the stack into a term at the specified priority.
54 reduceMultiple - Reduce existing operators on the stack.
55 reload - Reload the variables associated with a parse.
56 semiColon - Define a method for the semicolon operator, which comes in two forms: the explicit semicolon and the new line semicolon.
57 semiColonChar - Translate ascii to the corresponding letters in the escaped ascii alphabet.
58 showAlphabet - Show an alphabet.
59 suffix - Define a method for a suffix operator.
60 T - Parse some text and dump the results.
61 testSet - Test a set of items, setting the zero flag if one matches, else clearing the zero flag.
62 traverseParseTree - Traverse the terms in parse tree in post order and call the operator subroutine associated with each term.
63 variable - Define a method for a variable.
This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:
sudo cpan install Unisyn::Parse
philiprbrenan@gmail.com
http://www.appaapps.com
Copyright (c) 2016-2021 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.