NAME

Text::TokenStream::Lexer - reusable lexer for token-stream scanning

SYNOPSIS

    my $lexer = Text::TokenStream::Lexer->new(
        whitespace => [qr/\s+/, qr/\# [^\n]* (?:\n|\z)/x],
        rules => [
            word => qr/\w+/,
            sym => qr/[^\w\s\#]+/,
        ],
    );

    my $token = $lexer->next_token(\$input_text);

DESCRIPTION

A lexer instance is constructed by specifying regexes that match individual parts of the input text. Each regex is associated with a token type that will be used to distinguish the tokens found. The regexes are tried in the order they're given in the "rules" attribute; this means, for example, that you can have a keyword rule that matches any of a list of specified keywords, followed by an identifier rule that matches arbitrary identifiers, even if keywords have the same syntax as identifiers.
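
As an illustration of rule ordering, here is a minimal sketch with a keyword rule placed before an identifier rule. The rule names `keyword` and `ident`, the keyword list, and the input string are invented for this example. It also assumes that `next_token` consumes each token from the referenced string, as the SYNOPSIS suggests.

```perl
use strict;
use warnings;
use Text::TokenStream::Lexer;

# Hypothetical rules: keywords are listed first, so they take priority
# over the more general identifier rule.
my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        keyword => qr/(?:if|else|while)\b/,
        ident   => qr/[A-Za-z_]\w*/,
    ],
);

my $input = 'if iffy';
my @seen;
while (defined(my $token = $lexer->next_token(\$input))) {
    push @seen, "$token->{type}:$token->{text}";
}
print "@seen\n";
```

The `\b` in the keyword rule keeps `iffy` from being split into the keyword `if` followed by `fy`. With it in place, `if` should be reported as a `keyword` token and `iffy` as an `ident` token.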

(In actual fact, the regexes are preprocessed into a form that the regex engine can handle more easily, and only one regex match operation is performed to extract each token. This should be completely transparent to the caller.)
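
The details of that preprocessing are internal to the module. Purely as an illustration of how several rules can be folded into a single match, the following sketch uses Perl's `(*MARK:NAME)` verb and the `$REGMARK` variable. This is an assumption about one possible technique, not a description of this module's implementation; the rule list and variable names are hypothetical.

```perl
use strict;
use warnings;

# $REGMARK is a plain package variable set by the regex engine.
our $REGMARK;

# Hypothetical rule list, in the same (identifier, regex) shape as "rules".
my @rules = (
    keyword => qr/(?:if|else)\b/,
    ident   => qr/[A-Za-z_]\w*/,
);

# Join the rules into one ordered alternation. Each branch ends with a
# (*MARK:name) verb, so a successful match records which branch was taken.
my @branches;
for (my $i = 0; $i < @rules; $i += 2) {
    my ($name, $rx) = @rules[$i, $i + 1];
    push @branches, "(?:$rx)(*MARK:$name)";
}
my $pattern  = join '|', @branches;
my $combined = qr/\A(?:$pattern)/;

my @found;
for my $text ('if x', 'iffy') {
    if ($text =~ $combined) {
        # $+[0] is the end offset of the whole match.
        push @found, "$REGMARK:" . substr($text, 0, $+[0]);
    }
}
print "$_\n" for @found;
```

Because the alternation is tried left to right, earlier rules still take priority, and a single match operation yields both the token text and its type.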

A lexer will attempt to skip whitespace before scanning each token; to do that, it uses a separate set of regexes, in the "whitespace" attribute.

CONSTRUCTOR

This class uses Moo, and inherits the standard `new` constructor.

ATTRIBUTES

`rules`

Required; read-only. Array ref of (identifier, rule) pairs. Each rule is a regex (or a literal string) that will be matched at the current position in the input; if the rule matches, the preceding identifier is used as the type of the token.

If a rule regex has any named captures, the contents of those captures will be preserved in the value returned by "next_token".
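
For example, a rule with named captures might look like the following sketch. The rule name `assign`, the capture names, and the input are hypothetical; the `captures` key is the one documented under "next_token".

```perl
use strict;
use warnings;
use Text::TokenStream::Lexer;

# Hypothetical rule that splits a "name=value" token into two named parts.
my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        assign => qr/(?<name>\w+)=(?<value>\d+)/,
    ],
);

my $input = 'width=80';
my $token = $lexer->next_token(\$input);

# The named captures are available under the "captures" key.
printf "%s is set to %s\n",
    $token->{captures}{name}, $token->{captures}{value};
```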

The regexes will be implicitly anchored to the next match position in the string being examined, so you should not add any initial anchor.

It is the caller's responsibility to ensure that the rules match every possible input.

`whitespace`

Read-only; defaults to empty array ref. Array ref of rules (with no identifiers, as in the SYNOPSIS), where each rule is a regex (or literal string) that will be treated as whitespace. If your language has comments, it is usually a good idea to include them in this attribute.

The regexes will be implicitly anchored to the next match position in the string being examined, so you should not add any initial anchor.
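
A short sketch of comment skipping, reusing the whitespace patterns from the SYNOPSIS. The `word` rule and the input text are illustrative only, and the loop again assumes that `next_token` consumes what it matches.

```perl
use strict;
use warnings;
use Text::TokenStream::Lexer;

# Whitespace covers both blank space and "#" comments running to end of line.
my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/, qr/\# [^\n]* (?:\n|\z)/x],
    rules => [
        word => qr/\w+/,
    ],
);

my $input = "alpha # a comment\nbeta # trailing";
my @words;
while (defined(my $token = $lexer->next_token(\$input))) {
    push @words, $token->{text};
}
print "@words\n";
```

Only `alpha` and `beta` should come back as tokens; the comments never reach the rules.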

OTHER METHODS

`next_token`

Takes one argument, which is a reference to a string. First attempts to "skip_whitespace" on the referenced string, and returns undef if the string is empty after any whitespace. Then attempts to match each of the "rules" against the remaining part of the string. If no rule matches, throws an exception. Otherwise, returns a hashref containing the following elements:

type: The identifier corresponding to the rule that matched
text: The text matched by the regex
cuddled: A boolean value, true iff the token was not preceded by whitespace
captures: A hashref of any named captures matched by the regex
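
The following sketch walks a short input and inspects the returned hashrefs, then shows the exception case. The rules, inputs, and variable names are hypothetical, and the loop assumes that `next_token` consumes each token from the referenced string.

```perl
use strict;
use warnings;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

# Collect type, text, and whether each token was preceded by whitespace.
my $input = 'a +b';
my @report;
while (defined(my $token = $lexer->next_token(\$input))) {
    push @report, sprintf('%s %s %s',
        $token->{type}, $token->{text},
        ($token->{cuddled} ? 'cuddled' : 'spaced'));
}
print "$_\n" for @report;

# If no rule matches, next_token throws, so wrap the call in eval.
my $narrow = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules      => [word => qr/[a-z]+/],
);
my $bad = '123';
my $ok  = eval { $narrow->next_token(\$bad); 1 };
print "no rule matched\n" if !$ok;
```

Here `a` and `b` should be reported as cuddled and `+` as not, because only `+` has whitespace directly before it.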

`skip_whitespace`

Takes one argument, which is a reference to a string. If none of the "whitespace" patterns match at the start of the referenced string, returns false. Otherwise, removes as many leading whitespace sequences as it can from the beginning of the referenced string, and returns true.
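
A minimal sketch of calling `skip_whitespace` directly; the lexer configuration and input are invented for the example.

```perl
use strict;
use warnings;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/, qr/\# [^\n]* (?:\n|\z)/x],
    rules      => [word => qr/\w+/],
);

my $input = "  \n# note\nrest";

# The first call strips the blank space and the comment in one go.
my $first = $lexer->skip_whitespace(\$input);

# The second call finds nothing to strip and reports false.
my $second = $lexer->skip_whitespace(\$input);

print "remaining: $input\n";
```

After the first call the referenced string should hold just `rest`, which is why the second call returns false.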

AUTHOR

Aaron Crane, <arc@cpan.org>

COPYRIGHT

LICENCE

This library is free software and may be distributed under the same terms as Perl itself. See http://dev.perl.org/licenses/.
