NAME

Zoidberg::StringParser - simple string parser

SYNOPSIS

        my $base_gram = {
            esc => '\\',
            quotes => {
                q{"} => q{"},
                q{'} => q{'},
            },
        };

        my $parser = Zoidberg::StringParser->new($base_gram);

        my @blocks = $parser->split(
            qr/\|/, 
            qq{ls -al | cat > "somefile with a pipe | in it"} );

        # @blocks now is: 
        # ('ls -al ', ' cat > "somefile with a pipe | in it"');
        # So it worked like split, but it respected quotes

DESCRIPTION

This module is a simple syntax parser. It was originally designed to work like the built-in split function while respecting quotes. The current version is a little more advanced: it uses user-defined grammars to deal with delimiters, an escape character, quotes and braces. These grammars can also contain hooks that add meta information to each split block of text. The parser has a 'pull' mechanism that allows line-by-line parsing and lets you define callbacks for situations such as an unmatched bracket.
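To make the "split, but respect quotes" idea concrete, here is a minimal pure-Perl sketch of the technique. It is not the module's implementation: the function name split_respecting_quotes is hypothetical, it hard-codes a backslash escape and the two quote characters instead of reading a grammar, and it assumes the delimiter regex never matches the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Zoidberg::StringParser.
# Splits $string on the regex $delim, but ignores delimiters that appear
# inside '...' or "..." or directly after a backslash.
# Assumption: $delim never matches the empty string.
sub split_respecting_quotes {
    my ($delim, $string) = @_;
    my @blocks = ('');
    my $quote  = '';                      # currently open quote char, if any
    pos($string) = 0;
    while (pos($string) < length $string) {
        if ($string =~ /\G(\\.)/gcs) {    # escaped char: copy both chars as-is
            $blocks[-1] .= $1;
        }
        elsif ($quote) {                  # inside quotes: copy until it closes
            $string =~ /\G(.)/gcs;
            $blocks[-1] .= $1;
            $quote = '' if $1 eq $quote;
        }
        elsif ($string =~ /\G(["'])/gc) { # opening quote
            $quote = $1;
            $blocks[-1] .= $1;
        }
        elsif ($string =~ /\G$delim/gc) { # unquoted delimiter: start new block
            push @blocks, '';
        }
        else {                            # ordinary character
            $string =~ /\G(.)/gcs;
            $blocks[-1] .= $1;
        }
    }
    return @blocks;
}

my @blocks = split_respecting_quotes(
    qr/\|/, q{ls -al | cat > "somefile with a pipe | in it"});
print "[$_]\n" for @blocks;
```

Unlike this sketch, the real parser takes the escape character and quote pairs from the grammar and removes escape characters unless "no_esc_rm" is set.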

All grammars and collections of grammars should be considered PRIVATE when used by a Z::SP object.

EXPORT

None by default.

GRAMMARS

TODO

esc

FIXME

If this is a Regexp ref, no double-escape removal is done. If you use a Regexp ref as the escape, you probably also want to set "no_esc_rm".

no_esc_rm

Boolean that tells the parser not to remove the escape char when an escaped token is encountered. Double escapes won't be replaced either. Useful when a string needs to go through a chain of parsers.

Collection

The collection hash is simply a hash of grammars with the grammar names as keys. When a collection is given all methods can use a grammar name instead of a grammar.

Base grammar

This can be seen as the default grammar; to use it, leave the grammar undefined when calling a method. If a base grammar is defined and you specify a grammar in a method call, the specified grammar will overload the base grammar.
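One way to picture the overloading is as a hash overlay. The sketch below is only a mental model: it assumes that keys in the specified grammar simply take precedence over the same keys in the base grammar (a shallow merge). The module's actual lookup may work differently, for example by merging nested hashes. The variable names are hypothetical; the grammar keys are the ones documented above.

```perl
use strict;
use warnings;

# Base grammar, as in the SYNOPSIS.
my %base_gram = (
    esc    => '\\',
    quotes => { q{"} => q{"}, q{'} => q{'} },
);

# Grammar given at a method call (hypothetical example values).
my %call_gram = (
    no_esc_rm => 1,
    quotes    => { q{"} => q{"} },    # only double quotes
);

# ASSUMPTION: shallow overlay, later keys win. This is an illustration,
# not the module's documented merge behaviour.
my %effective = (%base_gram, %call_gram);

print join(', ', sort keys %effective), "\n";
```

Under this assumption "esc" is inherited from the base grammar, while "quotes" is replaced as a whole by the value from the specified grammar.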

METHODS

new(\%base_grammar, \%collection, \%settings)

Simple constructor. See "Collection", "Base grammar" and "settings" for explanation of the arguments.

set($grammar, @input_methods)

Sets the initial state of the parser. $grammar can either be a hash ref containing a grammar or be the name (key) of a grammar in %collection. See "input methods" for possible values of @input_methods.

reset()

Remove all state information from the parser. Also removes any error messages.

more()

Test for more input. Can trigger the pull mechanism.

Intended usage:

        $p->set($grammar, @input);
        while ($p->more) {
                ($block, $token) = $p->get();
        }

get()

Get the next block from input. Intended for atomic use; for most situations either split or getline will do.

next_line()

Loads next line of input from "input methods". This method is called internally by the pull mechanism. Intended for atomic use.

split($grammar, @input_methods)

Get all blocks until input returns undef. Arguments are passed directly to set(). By default, blocks are returned as scalar refs (unless the grammar's meta function altered them) and tokens as scalars. To be somewhat compatible with CORE::split, all items (blocks and tokens) are returned as plain scalars if $grammar is or was a Regexp reference. (This behaviour can be faked by giving your grammar a value called 'was_regexp'.) This behaviour is turned off by the "no_split_intel" setting.

getline($grammar, @input_methods)

Like split but gets only one line from input and without the "intelligent" behaviour. Will try to get more input when the syntax is incomplete unless "allow_broken" is set.

error()

Returns parser error if any. Returns undef if all is well.

input methods

FIXME

settings

The %settings hash contains options that control the general behaviour of the parser. Supported settings are:

allow_broken

If this value is set, the parser will not automatically pull from input when broken syntax is encountered. Very useful in combination with the getline() method to make sure just one line is read and parsed, even if this leaves us with broken syntax.

raise_error

Boolean that controls whether the parser dies when an error is encountered - see "DIAGNOSTICS".

no_split_intel

Boolean, disables "intelligent" behaviour of split() when set.

DIAGNOSTICS

By default this module will croak only for fatal errors such as wrong argument types. For less-fatal errors it sets an error message that can be retrieved with error(). Notice that some of these "less-fatal" errors may turn out to be fatal after all. If the raise_error setting is set, all errors will raise an exception.
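Because an error can surface either as an exception or through error(), a caller may want to check both. The following is a minimal sketch of such a wrapper. The function name split_checked is hypothetical, it uses only the split() and error() methods documented above, and it assumes that the error message is still available from error() after split() returns.

```perl
use strict;
use warnings;

# Hypothetical wrapper, NOT part of Zoidberg::StringParser.
# Returns (\@blocks, undef) on success or (undef, $message) on failure.
# ASSUMPTION: error() still reports the problem after split() returns.
sub split_checked {
    my ($parser, $grammar, @input) = @_;
    my @blocks;

    # Fatal errors (or any error when raise_error is set) arrive as exceptions.
    my $ok = eval { @blocks = $parser->split($grammar, @input); 1 };
    return (undef, "exception: $@") unless $ok;

    # Less-fatal errors are only reported through error().
    my $err = $parser->error;
    return (undef, "parser error: $err") if defined $err;

    return (\@blocks, undef);
}

# Usage with a parser object created as in the SYNOPSIS:
#   my ($blocks, $problem) = split_checked($parser, qr/\|/, $line);
#   warn $problem if defined $problem;
```

With raise_error set, the second check should never trigger, since every error is then expected to raise an exception.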

FIXME splain error messages

AUTHOR

Jaap Karssenberg || Pardus [Larus] <pardus@cpan.org>

Copyright (c) 2003 Jaap G Karssenberg. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Contains some code derived from Tie-Hash-Stack-0.09 by Michael K. Neylon.

SEE ALSO

Zoidberg