NAME
Parse::Token - Class defining the tokens used by Parse::Lex.pm (Beta 2.02).
SYNOPSIS
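    require 5.000;
    use Parse::Lex;
    # Token specifications: a symbolic name followed by its regular expression
    @token = qw(
        ADDOP    [-+]
        INTEGER  [1-9][0-9]*
    );
    # Creates the lexer and the $ADDOP and $INTEGER token objects
    $lexer = Parse::Lex->new(@token);
    $lexer->from(\*DATA);
    # next returns the recognized text; status reports whether the search succeeded
    $content = $INTEGER->next;
    if ($INTEGER->status) {
        print "$content\n";
    }
    $content = $ADDOP->next;
    if ($ADDOP->status) {
        print "$content\n";
    }
    # isnext returns the status and stores the text through the reference
    if ($INTEGER->isnext(\$content)) {
        print "$content\n";
    }
    __END__
    1+2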
DESCRIPTION
The Parse::Token package defines the lexemes used by Parse::Lex or Parse::CLex. The new method of the Parse::Lex package indirectly creates an object of class Parse::Token for each recognized lexeme. The next and isnext methods of Parse::Token make it easy to interface the lexical analyzer with a recursive-descent syntactic analyzer (parser). For interfacing with byacc, see the Parse::YYLex package.
The Parse::Token package is not intended to be used directly. It is loaded via use Parse::Lex.
Methods
- action
-
Returns the anonymous subroutine defined within the Parse::Token object.
- factory LIST
-
Creates a list of Parse::Token objects from a list of token specifications. The list can also include objects of class Parse::Token or of a class derived from it. Can be used as a class method or instance method.
The factory(LIST) method can be used to create a set of tokens which are not within the analysis automaton. This method carries out two operations: 1) it creates the objects based on the specifications given in LIST (see the new() method), and 2) it imports the created objects into the calling package.
You could for example write:
    %keywords = qw (
        PROC   undef
        FUNC   undef
        RETURN undef
        IF     undef
        ELSE   undef
        WHILE  undef
        PRINT  undef
        READ   undef
    );
    Parse::Token->factory(%keywords);
and install these tokens in a symbol table in the following manner:
    foreach $name (keys %keywords) {
        $symbol{"\L$name"} = [${$name}, ''];
    }
${$name} is the Token object.
During the lexical analysis phase, you can use the tokens in the following manner:
    qw(IDENT [a-zA-Z][a-zA-Z0-9]*), sub {
        $symbol{$_[1]} = [] unless defined $symbol{$_[1]};
        my $type = $symbol{$_[1]}[0];
        $lexer->setToken((not defined $type) ? $VAR : $type);
        $_[1]; # THE TOKEN TEXT
    }
This permits indicating that any symbol of unknown type is a variable.
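The keyword-or-variable lookup described above can be tried out in plain Perl without Parse::Lex. The following is only a sketch under stated assumptions: plain strings stand in for the Token objects, and the classify helper is a hypothetical name introduced for illustration, not part of the module.

```perl
use strict;
use warnings;

# Keyword table in the same shape as the example above. Note that qw()
# produces the string 'undef' here, not the undefined value.
my %keywords = qw(PROC undef FUNC undef RETURN undef IF undef);

# Assumption: strings play the role of the Token objects.
my %symbol;
foreach my $name (keys %keywords) {
    $symbol{"\L$name"} = [$name, ''];    # "\L" lowercases the key
}

# Hypothetical helper mirroring the IDENT subroutine:
# identifiers with no recorded type default to VAR.
sub classify {
    my ($text) = @_;
    $symbol{$text} = [] unless defined $symbol{$text};
    my $type = $symbol{$text}[0];
    return defined $type ? $type : 'VAR';
}

print classify('if'), "\n";         # IF
print classify('counter'), "\n";    # VAR
```

Because the table keys are lowercased with "\L", this sketch treats keywords as lowercase words in the analyzed text.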
In this example we have used $_[1] which corresponds to the text recognized by the regular expression. This text is what is returned by the anonymous subroutine.
- get EXPR
-
get obtains the value of the attribute named by the result of evaluating EXPR. You can also use the name of the attribute as a method name.
- getText
-
Returns the character string that was recognized by means of this Parse::Token object.
Same as the text() method.
- isnext EXPR
- isnext
-
Returns the status of the token. The consumed string is put into EXPR if it is a reference to a scalar.
- name
-
Returns the symbolic name of the Parse::Token object.
- next
-
Activates the search for the lexeme defined by the regular expression contained in the object. If this lexeme is recognized in the character stream being analyzed, next returns the string found and sets the status of the object to true.
- new SYMBOL_NAME, REGEXP, SUB
-
Creates an object of the Parse::Token type. The arguments of the new method are: a symbolic name, a regular expression, and an anonymous subroutine.
REGEXP is either a simple regular expression, or a reference to an array containing from one to three regular expressions. In the latter case the lexeme can span several lines. For example, it can be a character string delimited by quotation marks, comments in a C program, etc. The regular expressions are used to recognize:
1. The beginning of the lexeme,
2. The "body" of the lexeme; if this second expression is missing, Parse::Lex uses "(?:.*?)",
3. The end of the lexeme; if this last expression is missing then the first one is used. (Note! The end of the lexeme cannot span several lines).
Example:
    qw(STRING), [qw(" (?:[^"\\\\]+|\\\\(?:.|\n))* ")],
These regular expressions can recognize multi-line strings delimited by quotation marks, where the backslash is used to quote the quotation marks appearing within the string. Notice the quadrupling of the backslash.
Here is a variation of the previous example which uses the s option to include newline in the characters recognized by ".":
    qw(STRING), [qw(" (?s)(?:[^"\\\\]+|\\\\.)* ")],
(Note: it is possible to write regular expressions which are more efficient in terms of execution time, but this is not our objective with this example.)
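To see what the three parts of the STRING specification match, they can be tested in plain Perl without Parse::Lex. This is a minimal sketch: joining the parts into a single qr// pattern is an assumption made for illustration and is not necessarily how Parse::Lex processes them while reading its input.

```perl
use strict;
use warnings;

# The three parts from the STRING example: beginning, body, end.
# Inside qw(), "\\\\" yields two backslashes, which the regex engine
# then reads as one literal backslash.
my ($begin, $body, $end) = qw(" (?:[^"\\\\]+|\\\\(?:.|\n))* ");

# Assumption for illustration: concatenate the parts into one pattern.
my $string_re = qr/$begin$body$end/;

# Input containing escaped quotation marks and an embedded newline.
my $input = q{say "a \"b\"} . "\n" . q{c" now};

my ($lexeme) = $input =~ /($string_re)/;
print "matched: $lexeme\n" if defined $lexeme;
```

The match runs from the first quotation mark to the closing one on the second line, and the backslash-quoted quotation marks do not end it.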
The anonymous subroutine is called when the lexeme is recognized by the lexical analyzer. This subroutine takes two arguments: $_[0] contains the Parse::Token object, and $_[1] contains the string recognized by the regular expression. The scalar returned by the anonymous subroutine defines the character string memorized in the Parse::Token object.
In the anonymous subroutine you can use the positional variables $1, $2, etc. which correspond to the groups of parentheses in the regular expression.
- regexp
-
Returns the regular expression of the Token object.
- set LIST
-
Allows marking a Token object with a list of attribute-value pairs.
An attribute name can be used as a method name.
- setText EXPR
-
The value of EXPR defines the character string associated with the lexeme.
Same as the text(EXPR) method.
- status EXPR
- status
-
Indicates whether the last search for the lexeme succeeded or failed. status EXPR overrides the existing value and sets it to the value of EXPR.
- text EXPR
- text
-
text() returns the character string recognized by means of the Token object. text(EXPR) sets the character string associated with the lexeme to the value of EXPR.
- trace OUTPUT
- trace
-
Class method which activates/deactivates a trace of the lexical analysis. OUTPUT can be a file name or a reference to a filehandle to which the trace will be directed.
ERROR HANDLING
To handle unrecognized lexemes, you can define a special Token object at the end of the list of tokens that defines the lexical analyzer. Because it comes last, it is tried only after every other token has failed; if the search for this token succeeds, a subroutine reserved for error handling can then be called.
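The catch-all idea can be illustrated with a small hand-written scanner in plain Perl. This is a sketch, not Parse::Lex itself: the @tokens table, the scan function, and the convention of passing the token name as the first argument are all hypothetical and chosen only to mirror the name, regexp, subroutine triple described for new.

```perl
use strict;
use warnings;

# Hypothetical token table: [name, regexp, optional subroutine], tried in order.
# The last entry is the catch-all that signals an unrecognized lexeme.
my @tokens = (
    [ ADDOP   => qr/[-+]/ ],
    [ INTEGER => qr/[1-9][0-9]*/ ],
    [ ERROR   => qr/.+/, sub { die qq{can't analyze: "$_[1]"\n} } ],
);

# Simplified scanner: returns [name, text] pairs, or dies in the ERROR handler.
sub scan {
    my ($input) = @_;
    my @found;
    pos($input) = 0;
    while (pos($input) < length $input) {
        next if $input =~ /\G\s+/gc;                 # skip whitespace
        for my $tok (@tokens) {
            my ($name, $re, $action) = @$tok;
            next unless $input =~ /\G($re)/gc;       # try the next token
            my $text = $1;
            $text = $action->($name, $text) if $action;
            push @found, [$name, $text];
            last;
        }
    }
    return @found;
}

for my $pair (scan('1+2')) {
    print "$pair->[0] $pair->[1]\n";
}

eval { scan('1 + x') };
print "error: $@" if $@;
```

Placing ERROR last matters: if it came first, its pattern would match valid input before ADDOP or INTEGER were tried.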
AUTHOR
Philippe Verdret. Documentation translated to English by Vladimir Alexiev and Ocrat.
ACKNOWLEDGMENTS
Version 2.0 owes much to suggestions made by Vladimir Alexiev. Ocrat has significantly contributed to improving this documentation.
COPYRIGHT
Copyright (c) 1995-1998 Philippe Verdret. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.