The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Parse::Marpa::Doc::Bibliography - A Marpa Bibliography

Aho and Ullman 1972

The Theory of Parsing, Translation and Compiling, Volume I: Parsing by Alfred Aho and Jeffrey Ullman (Prentice-Hall: Englewood Cliffs, New Jersey, 1972). I think this was the standard source for Earley's algorithm for decades. It certainly was my standard source. The account of Earley's algorithm is on pages 320-330.

Aycock and Horspool 2002

Marpa is based on and derived from the parser described in John Aycock and R. Nigel Horspool's "Practical Earley Parsing", The Computer Journal, Vol. 45, No. 6, 2002, pp. 620-630. The idea of doing LR(0) precomputation for Earley's general parsing algorithm, as well as details of how to do it, came to me from this article.

The Aycock and Horspool paper summarizes Earley's very nicely and is available on the web: http://www.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf, unlike "Earley 1970". Aycock and Horspool 2002 is not, however, easy reading. I have been following this particular topic on and off for years. Nonetheless I found this paper very heavy going. Readers who want to get into might find the Marpa documentation, up to and including Parse::Marpa::Doc::Internals, the easiest approach to the Aycock and Horspool paper, though I can't promise you there won't be leaps of insight required at several points.

Dominus 2005

Although my approach to parsing is not heavily influenced by Mark Jason Dominus's Higher Order Perl, just about every page of the rest of the book opened my eyes to new ideas. I wish I'd gotten his book earlier in my coding.

Both the Perl and the English are examples of good writing, and the book is dense with ideas. Mark's discussion on memoization in Chapter 3 is the best I've seen.

Earley 1970

Jay Earley's most accessible paper on his algorithm is "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

Ordinarily, I'd not bother pointing out 35-year old nits in a brilliant and historically important article. But more than a few people cite this article as not just the first word in Earley parsing, but also the last. They need to be aware of two issues.

First, the parse engine itself, as described, has a serious bug. There's an easy fix, but one that greatly slows down an algorithm whose main problem, in its original form, was speed. The whole matter is well laid out by Aycock and Horspool in their article.

Second, according to Tomita there is a mistake in the parse tree representation. See page 153 of "Grune and Jacobs 1990", page 210 of "Grune and Jacobs 2008", and the bibliography entry for Earley 1970 in "Grune and Jacobs 2008". In the printed edition of the 2008 bibliography, the entry is on page 578, and on the web (ftp://ftp.cs.vu.nl/pub/dick/PTAPG_2nd_Edition/CompleteList.pdf), it's on pp. 583-584. My methods for getting the parses out of Earley sets have come from Aho and Ullman, from Aycock and Horspool, or been of my own device, so I am taking Tomita's word on this one.

Grune and Jacobs 1990

Parsing Techniques: A Practical Guide, by Dick Grune Ceriel Jacobs, (Ellis Horwood Limited: Chichester, West Sussex, England, 1990). This book is available on the Web: http://www.cs.vu.nl/~dick/PTAPG.html

Grune and Jacobs 2008

Parsing Techniques: A Practical Guide, by Dick Grune Ceriel Jacobs, 2nd Edition. (Springer: New York NY, 2008).

This is "Grune and Jacobs 1990" updated. The bibliography for this book is available in enlarged form on the web: ftp://ftp.cs.vu.nl/pub/dick/PTAPG_2nd_Edition/CompleteList.pdf.

Wikipedia

Wikipedia's article on Backus-Naur form is http://en.wikipedia.org/wiki/Backus-Naur_form. It's a great place to start if you don't know the basics of grammars and parsing.

As Wikipedia points out, BNF might better be called Panini-Backus Form. The grammarian Panini gave a precise description of Sanskirt more than 23 centuries earlier in India using a similar notation.