NAME

Text::TokenStream - lexer to break text up into user-defined tokens

SYNOPSIS

    my $lexer = Text::TokenStream::Lexer->new(
        whitespace => [qr/\s+/],
        rules => [
            word => qr/\w+/,
            sym => qr/[^\w\s]+/,
        ],
    );

    my $stream = Text::TokenStream->new(
        lexer => $lexer,
        input => "foo *",
    );

    my $tok1 = $stream->next; # --> "word" token containing "foo"
    my $tok2 = $stream->next; # --> "sym" token containing "*"

DESCRIPTION

This class is part of a collection of classes that act together to lex (aka scan) an input text into a stream of tokens.

This token stream class provides the stream interface, along with a notion of the "current position" in the input text, and position-aware error reporting. It composes Text::TokenStream::Role::Stream; that role lists the methods this class provides, so that you can easily write a parser class that holds a token stream and delegates the tokenizer methods to it.

The basic lexer machinery is found in Text::TokenStream::Lexer; it is separated out from the token stream so that it can be reused across many inputs.

Tokens are instances of a class, Text::TokenStream::Token by default.
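
As an illustration of the parser-class arrangement described above, here is a minimal sketch that delegates an explicit list of stream methods with Moo's handles option. The My::Parser class and its stream attribute are hypothetical names invented for this example; only the delegated method names come from this documentation.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

# Hypothetical parser class: it owns a token stream and forwards the
# tokenizer methods to it.
package My::Parser {
    use Moo;

    has stream => (
        is       => 'ro',
        required => 1,
        # Each listed method is delegated to the wrapped stream object.
        handles  => [qw(next peek looking_at next_of skip_optional err token_err)],
    );
}

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $parser = My::Parser->new(
    stream => Text::TokenStream->new(lexer => $lexer, input => "foo *"),
);

# Equivalent to $parser->stream->next
my $first = $parser->next;
```

Listing the methods explicitly keeps the delegation visible in the parser class; whether that is preferable to delegating by role name is a design choice left to the reader.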

CONSTRUCTOR

This class uses Moo, and inherits the standard new constructor.

ATTRIBUTES

lexer

An instance of Text::TokenStream::Lexer; required; read-only. Will be used to find tokens in the input.

input

Str; required; read-only. The text that will be lexed into a stream of tokens.

input_name

A Maybe[Path]; read-only. Can be coerced from a string. If a defined value is present, it should contain the name of the file that the input was read from, and that name will be used in any error messages.

token_class

The name of a class that inherits from Text::TokenStream::Token; defaults to Text::TokenStream::Token itself; read-only. Tokens found in the input will be constructed as instances of this class.
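
A minimal sketch of supplying a custom token class follows. My::Token and its is_word method are hypothetical, and the sketch assumes the base token class exposes a type accessor holding the rule name, which this page does not itself document.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

# Hypothetical subclass adding one convenience method.
package My::Token {
    use Moo;
    extends 'Text::TokenStream::Token';

    # Assumes a "type" accessor on the base class (not documented here).
    sub is_word { $_[0]->type eq 'word' }
}

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(
    lexer       => $lexer,
    input       => "foo *",
    token_class => 'My::Token',
);

# Tokens are now constructed as My::Token instances.
my $tok = $stream->next;
```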

OTHER METHODS

collect_all

Takes no arguments. Returns a list of all remaining tokens found in the input.

In the current implementation, this method is provided by Text::TokenStream::Role::Stream.

collect_upto

Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches. Scans through the input until it finds a token that matches the argument, and returns a list of all tokens before the matching one. If no remaining token in the input matches the argument, behaves as "collect_all".

In the current implementation, this method is provided by Text::TokenStream::Role::Stream.
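
The following sketch collects the tokens that precede a statement separator. It assumes that a plain string argument is matched against the token's type name; see Text::TokenStream::Token#matches for the authoritative rules.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(
    lexer => $lexer,
    input => "a b ; c d",
);

# Assumption: the string 'sym' selects tokens produced by the "sym" rule.
# If nothing matched, this would behave like collect_all instead.
my @before = $stream->collect_upto('sym');
```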

create_token

Takes a listified hash of token attributes, and creates a token instance. The token object is created by calling:

    $self->token_class->new(%data);

If you have particularly complex needs, you may wish to override this method in a subclass.

current_position

Takes no arguments. Returns the 0-based position of the first input character that hasn't yet been returned by "next".

err

Takes any number of arguments, which are concatenated into an error message. (If no arguments are supplied, acts as if you'd supplied the string "Something's wrong".) Throws an exception, reporting the locus of the error as the current input position (using 1-based line and column numbers).
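
Because err throws, callers that want to recover need to trap the exception. A minimal sketch using a plain eval block is shown below; the exact text of the message is not specified on this page, so the example only prints whatever was thrown.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(
    lexer => $lexer,
    input => "foo\nbar baz",
);

$stream->next;    # consume "foo" so the current position moves forward

# The arguments are concatenated into a single message before throwing.
my $ok = eval { $stream->err("unexpected ", "input"); 1 };
my $error = $@;
print "caught: $error\n" if !$ok;
```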

fill

Takes a single positive-integer argument. Attempts to fill an internal buffer of already-lexed tokens so that it contains that many tokens. Returns a boolean that is true iff there were enough tokens to do that.
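
As a small sketch of the return value, the input below contains exactly two tokens, so asking for two should succeed and asking for three should not.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(
    lexer => $lexer,
    input => "foo *",
);

my $has_two   = $stream->fill(2);    # two tokens are available
my $has_three = $stream->fill(3);    # the input runs out first
```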

looking_at

Takes zero or more arguments, each of which indicates a token to match, as with Text::TokenStream::Token#matches. Returns a boolean that is true iff there's at least one more token in the input, and it matches the argument.

next

Takes no arguments. Returns the next token found in the input, and advances the current position past it; if no tokens remain, returns undef. The token instance is created by "create_token".

next_of

Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches, and an optional string argument describing the current position (for example, "in expression", or "after keyword"). If there are no more tokens in the input, reports an error at the current position, using "err". Otherwise, if the next token doesn't match the argument, reports an error at the position of that token, using "token_err". Otherwise, the next token matches what is being looked for, so that token is returned.

peek

Takes no arguments. Returns the next token that would be returned by "next", but doesn't advance the current input position, so a subsequent "next" call will return the same token.

An internal buffer is used to ensure that every token is lexed only once.
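
A short sketch of the relationship between peek and next follows. It assumes tokens expose a text accessor for the matched string, which is documented in Text::TokenStream::Token rather than here.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(
    lexer => $lexer,
    input => "foo *",
);

my $peeked = $stream->peek;    # look ahead without consuming
my $taken  = $stream->next;    # consumes the token that was peeked
my $after  = $stream->next;    # moves on to the following token

# Assumption: "text" returns the matched input string.
printf "peeked=%s taken=%s after=%s\n",
    $peeked->text, $taken->text, $after->text;
```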

skip_optional

Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches. If there are no more tokens in the input, or the next token doesn't match the argument, returns false; otherwise, advances past the next token, and returns true.
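
Together, next_of and skip_optional are enough for a simple separated-list parser. The sketch below reads a comma-separated list of words; the comma rule name is invented for this example, and the code assumes that string arguments are matched against token type names.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [
        word  => qr/\w+/,
        comma => qr/,/,    # hypothetical rule name for this example
    ],
);

my $stream = Text::TokenStream->new(
    lexer => $lexer,
    input => "a, b, c",
);

my @items;

# The first item is mandatory; next_of throws if it is missing.
push @items, $stream->next_of('word', 'at start of list');

# Each further item must be introduced by a comma.
while ($stream->skip_optional('comma')) {
    push @items, $stream->next_of('word', 'after comma');
}
```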

token_err

Takes a token as its first argument, followed by any number of arguments, which are concatenated into an error message. (If no non-token arguments are supplied, acts as if you'd supplied the string "Something's wrong".) Throws an exception, reporting the locus of the error as the position of the token (using 1-based line and column numbers).

AUTHOR

Aaron Crane, <arc@cpan.org>

COPYRIGHT

Copyright 2021 Aaron Crane.

LICENCE

This library is free software and may be distributed under the same terms as perl itself. See http://dev.perl.org/licenses/.