NAME

Parser::Combinators - A library of building blocks for parsing text

SYNOPSIS

    use Parser::Combinators;
    my $parser = < a combination of the parser building blocks from Parser::Combinators >
    (my $status, my $rest, my $matches) = $parser->($str);
    my $parse_tree = getParseTree($matches);

DESCRIPTION

Parser::Combinators is a library of parser building blocks ('parser combinators'), inspired by the Parsec parser combinator library in Haskell (http://legacy.cs.uu.nl/daan/download/parsec/parsec.html). The idea is that you build a parser not by specifying a grammar (as in yacc/lex or Parse::RecDescent), but by combining a set of small parsers that each parse a well-defined item.

Usage

Each parser in this library, e.g. word or symbol, is a function that returns a function (actually, a closure) that parses a string. You can combine these parsers by using special parsers like sequence and choice. For example, a JavaScript variable declaration

    var res = 42;

could be parsed as:

    my $p =
        sequence [
            symbol('var'),
            word,
            symbol('='),
            natural,
            semi
        ];
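To make the "function that returns a closure" idea concrete, here is a minimal, self-contained sketch of how such building blocks could be written. It does not use Parser::Combinators and is not its implementation: the helper names lit, pat and seq are hypothetical, and the choice to fail without consuming input is an assumption. Only the ($status, $rest, $matches) return convention is taken from this documentation.

```perl
use strict;
use warnings;

# Hypothetical primitive: match a literal token at the start of the
# string and skip trailing whitespace.
sub lit {
    my ($token) = @_;
    return sub {
        my ($str) = @_;    # local copy, so s/// does not touch the caller's string
        return (1, $str, [$token]) if $str =~ s/^\Q$token\E\s*//;
        return (0, $str, []);
    };
}

# Hypothetical primitive: match a regex at the start of the string
# and skip trailing whitespace.
sub pat {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return (1, $str, [$1]) if $str =~ s/^($re)\s*//;
        return (0, $str, []);
    };
}

# Hypothetical sequence combinator: each parser runs on whatever the
# previous one left over; the whole sequence fails if any step fails.
sub seq {
    my ($parsers) = @_;
    return sub {
        my ($str) = @_;
        my $rest = $str;
        my @matches;
        for my $p (@$parsers) {
            my ($status, $r, $m) = $p->($rest);
            return (0, $str, []) unless $status;    # assumption: fail without consuming
            $rest = $r;
            push @matches, $m;
        }
        return (1, $rest, \@matches);
    };
}

my $p = seq([ lit('var'), pat(qr/\w+/), lit('='), pat(qr/\d+/), lit(';') ]);
my ($status, $rest, $matches) = $p->('var res = 42;');
```

Each primitive returns its match wrapped in an array ref, so in this sketch $matches ends up as an array of one-element arrays.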

If you want to express that the assignment is optional, i.e. that var res; is also valid, you can use maybe():

    my $p =
        sequence [
            symbol('var'),
            word,
            maybe(
                sequence [
                    symbol('='),
                    natural
                ]
            ),
            semi
        ];

If you want to parse alternatives you can use choice(). For example, to express that either a bare number or the keyword return followed by a number in parentheses is valid, you can write

    my $p = choice( number, sequence [ symbol('return'), parens( number ) ] );

This example also illustrates the `parens()` parser, which parses anything enclosed in parentheses.
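The behaviour of choice() and maybe() described above can be illustrated with a small stand-alone sketch. The names pat, alt and opt are hypothetical and the code is not taken from the library; it only assumes the ($status, $rest, $matches) convention and that a failed alternative leaves the input untouched.

```perl
use strict;
use warnings;

# Hypothetical primitive: match a regex at the start of the string
# and skip trailing whitespace.
sub pat {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return (1, $str, [$1]) if $str =~ s/^($re)\s*//;
        return (0, $str, []);
    };
}

# Hypothetical choice: try the parsers in order and return the result
# of the first one that succeeds.
sub alt {
    my @parsers = @_;
    return sub {
        my ($str) = @_;
        for my $p (@parsers) {
            my ($status, $rest, $m) = $p->($str);
            return (1, $rest, $m) if $status;
        }
        return (0, $str, []);
    };
}

# Hypothetical maybe: a failure of the wrapped parser is reported as
# success with no matches and nothing consumed.
sub opt {
    my ($p) = @_;
    return sub {
        my ($str) = @_;
        my ($status, $rest, $m) = $p->($str);
        return $status ? (1, $rest, $m) : (1, $str, []);
    };
}

my $num_or_word = alt( pat(qr/\d+/), pat(qr/[a-z]+/) );
my ($s1, $r1, $m1) = $num_or_word->('42 rest');    # first alternative matches
my ($s2, $r2, $m2) = $num_or_word->('abc 7');      # second alternative matches
my ($s3, $r3, $m3) = opt( pat(qr/\d+/) )->('abc'); # no match, still success
```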

Provided Parsers

The library is not complete in the sense that not all Parsec combinators have been implemented. Currently, it contains:

    whiteSpace : parses any white space, always returns success.

    * Lexeme parsers (they remove trailing whitespace):
    word : (\w+)
    natural : (\d+)
    symbol : parses a given symbol, e.g. symbol('int')
    comma : parses a comma
    semi : parses a semicolon

    char : parses a given character

    * Combinators:
    sequence( [ $parser1, $parser2, ... ], $optional_sub_ref )
    choice( $parser1, $parser2, ... ) : tries the specified parsers in order
    try : normally, a parser consumes matching input. try() stops a parser from consuming the string
    maybe : is like try() but always reports success
    parens( $parser ) : parses '(', then applies $parser, then ')'
    many( $parser ) : applies $parser zero or more times
    many1( $parser ) : applies $parser one or more times
    sepBy( $separator, $parser ) : parses a list of $parser separated by $separator
    oneOf( [ $patt1, $patt2, ... ] ) : like symbol() but parses the patterns in order

    * Dangerous: the following parsers take a regular expression, so you can mix regexes and other combinators ...
    upto( $patt )
    greedyUpto( $patt )
    regex( $patt )
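As an illustration of what the repetition combinators in the list above do, the following stand-alone sketch shows one possible way to write a many-style and a sepBy-style combinator. The names pat, many_of and sep_by are hypothetical, the separator is assumed to be a plain string, and the real many() and sepBy() may differ in details such as the shape of the returned matches.

```perl
use strict;
use warnings;

# Hypothetical primitive: match a regex at the start of the string
# and skip trailing whitespace.
sub pat {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return (1, $str, [$1]) if $str =~ s/^($re)\s*//;
        return (0, $str, []);
    };
}

# Hypothetical many: apply $p until it fails; zero matches is still success.
sub many_of {
    my ($p) = @_;
    return sub {
        my ($str) = @_;
        my @matches;
        while (1) {
            my ($status, $rest, $m) = $p->($str);
            last unless $status;
            last if length($rest) == length($str);    # guard: $p consumed nothing
            $str = $rest;
            push @matches, $m;
        }
        return (1, $str, \@matches);
    };
}

# Hypothetical sepBy: one item, then zero or more (separator, item) pairs.
sub sep_by {
    my ($sep, $p) = @_;
    my $sep_p = pat(qr/\Q$sep\E/);
    return sub {
        my ($str) = @_;
        my ($status, $rest, $m) = $p->($str);
        return (0, $str, []) unless $status;
        my @items = ($m);
        while (1) {
            my ($s1, $r1) = $sep_p->($rest);
            last unless $s1;
            my ($s2, $r2, $m2) = $p->($r1);
            last unless $s2;    # a dangling separator is left unconsumed
            $rest = $r2;
            push @items, $m2;
        }
        return (1, $rest, \@items);
    };
}

my ($st_many, $rest_many, $m_many) = many_of( pat(qr/\d+/) )->('1 2 3 x');
my ($st_sep,  $rest_sep,  $m_sep)  = sep_by( ',', pat(qr/\w+/) )->('u, v,w :: x');
```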

Labeling

You can label any parser in a sequence using an anonymous hash, for example:

    sub type_parser {
        sequence [
            {Type => word},
            maybe parens choice(
                {Kind => natural},
                sequence [
                    symbol('kind'),
                    symbol('='),
                    {Kind => natural}
                ]
            )
        ]
    }

Applying this parser returns a tuple as follows:

    my $str = 'integer(kind=8), ';
    # type_parser() returns the parser closure, which is then applied to $str
    (my $status, my $rest, my $matches) = type_parser()->($str);

Here, $status is 0 if the match failed and 1 if it succeeded. $rest contains the rest of the string. The actual matches are stored in the array ref $matches. As every parser returns its results as an array ref, $matches contains the concrete parsed syntax, i.e. a nested array of arrays of strings, with each labeled match appearing as a single-key hash.

    show($matches) ==> [{'Type' => 'integer'},['kind','\\=',{'Kind' => '8'}]]

You can remove the unlabeled matches and convert the raw tree into nested hashes using getParseTree:

    my $parse_tree = getParseTree($matches);
    show($parse_tree) ==> {'Type' => 'integer','Kind' => '8'}
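To show the kind of transformation involved, here is a simplified stand-alone sketch that walks a nested match structure, keeps the labeled hashes and drops the unlabeled strings. The function collect_labels is hypothetical and is not getParseTree: it only handles the flat case shown above and makes no attempt to reproduce the library's handling of nested labels or list values.

```perl
use strict;
use warnings;

# Hypothetical helper: merge every labeled match (a hash) found in a
# nested array into one hash; unlabeled strings are ignored.
sub collect_labels {
    my ($node, $tree) = @_;
    $tree //= {};
    if (ref $node eq 'ARRAY') {
        collect_labels($_, $tree) for @$node;
    }
    elsif (ref $node eq 'HASH') {
        $tree->{$_} = $node->{$_} for keys %$node;
    }
    return $tree;
}

# The raw matches from the example above, written out by hand.
my $matches = [ { Type => 'integer' }, [ 'kind', '\\=', { Kind => '8' } ] ];
my $tree    = collect_labels($matches);
```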

A more complete example

I wrote this library because I needed to parse argument declarations of Fortran-95 code. Some examples of valid declarations are:

    integer(kind=8), dimension(0:ip, -1:jp+1, kp) , intent( In ) :: u, v,w
    real, dimension(0:7) :: f
    real(8), dimension(0:7,kp) :: f,g

I want to extract the type and kind, the dimension and the list of variable names. For completeness I'm parsing the `intent` attribute as well. The parser is a sequence of four separate parsers: type_parser, dim_parser, intent_parser and arglist_parser. All the optional fields are wrapped in a maybe().

    my $F95_arg_decl_parser =
        sequence [
            whiteSpace,
            {TypeTup => &type_parser},
            maybe(
                sequence [
                    comma,
                    &dim_parser
                ],
            ),
            maybe(
                sequence [
                    comma,
                    &intent_parser
                ],
            ),
            &arglist_parser
        ];
    # where
    sub type_parser {
        sequence [
            {Type => word},
            maybe parens choice(
                {Kind => natural},
                sequence [
                    symbol('kind'),
                    symbol('='),
                    {Kind => natural}
                ]
            )
        ]
    }
    sub dim_parser {
        sequence [
            symbol('dimension'),
            {Dim => parens sepBy(',', regex('[^,\)]+')) }
        ]
    }
    sub intent_parser {
        sequence [
            symbol('intent'),
            {Intent => parens word}
        ]
    }
    sub arglist_parser {
        sequence [
            symbol('::'),
            {Vars => sepBy(',',&word)}
        ]
    }

Running the parser and calling getParseTree() on the first string results in

    {
        'TypeTup' => {
            'Type' => 'integer',
            'Kind' => '8'
        },
        'Dim' => ['0:ip','-1:jp+1','kp'],
        'Intent' => 'In',
        'Vars' => ['u','v','w']
    }

See the test fortran95_argument_declarations.t for the source code.

No Monads?!

As this library is inspired by a monadic parser combinator library from Haskell, I have also implemented bindP() and returnP() for those who like monads ^_^ So instead of saying

    my $pp = sequence [ $p1, $p2, $p3 ];

you can say

    my $pp = bindP(
        $p1,
        sub { (my $x) = @_;
            bindP(
                $p2,
                sub { (my $y) = @_;
                    bindP(
                        $p3,
                        sub { (my $z) = @_;
                            returnP->($z);
                        }
                    )->($y)
                }
            )->($x);
        }
    );

which is obviously so much better :-)
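For readers unfamiliar with the monadic style, the following stand-alone sketch shows the textbook form of bind and return for parsers that follow the ($status, $rest, $matches) convention. The names pat, bind_p and return_p are hypothetical, and the calling convention here is an assumption that may well differ from the library's own bindP() and returnP().

```perl
use strict;
use warnings;

# Hypothetical primitive: match a regex at the start of the string
# and skip trailing whitespace.
sub pat {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return (1, $str, [$1]) if $str =~ s/^($re)\s*//;
        return (0, $str, []);
    };
}

# Hypothetical return: a parser that consumes nothing and yields $value.
sub return_p {
    my ($value) = @_;
    return sub {
        my ($str) = @_;
        return (1, $str, $value);
    };
}

# Hypothetical bind: run $p, hand its matches to $f, then run the
# parser that $f returns on the remaining input.
sub bind_p {
    my ($p, $f) = @_;
    return sub {
        my ($str) = @_;
        my ($status, $rest, $m) = $p->($str);
        return (0, $str, []) unless $status;
        return $f->($m)->($rest);
    };
}

# Parse a word followed by a number and combine both matches.
my $pair = bind_p( pat(qr/[a-z]+/), sub {
    my ($x) = @_;
    bind_p( pat(qr/\d+/), sub {
        my ($y) = @_;
        return_p( [ @$x, @$y ] );
    });
});
my ($status, $rest, $matches) = $pair->('abc 42 tail');
```

The point of the pattern is that each later parser can be chosen based on the matches of the earlier ones, which a plain sequence cannot express.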

AUTHOR

Wim Vanderbauwhede <Wim.Vanderbauwhede@gmail.com>

COPYRIGHT

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
