Marpa::Advanced::Bibliography - A Marpa Bibliography


Aho and Ullman 1972

The Theory of Parsing, Translation and Compiling, Volume I: Parsing by Alfred Aho and Jeffrey Ullman (Prentice-Hall: Englewood Cliffs, New Jersey, 1972). I think this was the standard source for Earley's algorithm for decades. It certainly was my standard source. The account of Earley's algorithm is on pages 320-330.

Aycock and Horspool 2002

Marpa is based on ideas from John Aycock and R. Nigel Horspool's "Practical Earley Parsing", The Computer Journal, Vol. 45, No. 6, 2002, pp. 620-630. The idea of doing LR(0) precomputation for Earley's general parsing algorithm, and Marpa's approach to handling nullable symbols and rules, both came from this article.

The Aycock and Horspool paper summarizes Earley's very nicely and is available on the web: Unlike "Earley 1970", Aycock and Horspool 2002 is not easy reading. I have been following this particular topic on and off for years and nonetheless found this paper very heavy going. Readers who want to delve into Aycock and Horspool 2002 might find it helpful to read the Marpa documentation first.

Dominus 2005

Although my approach to parsing is not influenced by Mark Jason Dominus's Higher Order Perl, Mark's treatment of parsing is an excellent introduction to parsing, especially in a Perl context. His focus on just about every other technique except general parsing is pretty much standard, and will help a beginner understand how unconventional Marpa's approach is.

Mark's book opened my eyes to many new ideas. Both Mark's Perl and his English are examples of good writing, and the book is dense with insights. Mark's discussion on memoization in Chapter 3 is the best I've seen. I wish I'd bought his book earlier in my coding.

Mark's book is available on-line. You can download chapter-by-chapter or the whole thing at once, and you can take your pick of his original sources or PDF, at A PDF of the parsing chapter is at

Earley 1970

Of Jay Earley's papers on his general parsing algorithm, the most readily available is "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

Ordinarily, I'd not bother pointing out 35-year old nits in a brilliant and historically important article. But more than a few people treat this article as not just the first word in Earley parsing, but the last as well. Many implementations of Earley's algorithm come, directly and unaltered, from his paper. These implementers and their users need to be aware of two issues.

First, the recognition engine itself, as described, has a serious bug. There's an easy fix, but one that greatly slows down an algorithm whose main problem, in its original form, was speed. This issue is well laid out by Aycock and Horspool in their article.

Second, according to Tomita there is a mistake in the parse tree representation. See page 153 of "Grune and Jacobs 1990", page 210 of "Grune and Jacobs 2008", and the bibliography entry for Earley 1970 in "Grune and Jacobs 2008". In the printed edition of the 2008 bibliography, the entry is on page 578, and on the web (, it's on pp. 583-584. My methods for producing parse results from Earley sets do not come from Earley 1970, so I am taking Tomita's word on this one.

Grune and Jacobs 1990

Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel Jacobs, (Ellis Horwood Limited: Chichester, West Sussex, England, 1990). This book is available on the Web:

Grune and Jacobs 2008

Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel Jacobs, 2nd Edition. (Springer: New York NY, 2008). This is the most authoritative and comprehensive introduction to parsing I know of. In theory it requires no mathematics, only a programming background, but even so it is moderately difficult reading.

This is "Grune and Jacobs 1990" updated. The bibliography for this book is available in enlarged form on the web:

Leo 1991

Marpa's handling of right-recursion uses the ideas in Joop M.I.M. Leo's "A General Context-Free Parsing Algorithm Running in Linear Time on Every LR(k) Grammar Without Using Lookahead", Theoretical Computer Science, Vol. 82, No. 1, 1991, pp 165-176. Unfortunately, there is no copy of this paper on-line.

This is a difficult paper. At a minimum, readers will want to already be familiar with Earley's algorithm.


Wikipedia's article on Backus-Naur form is It's a great place to start if you don't know the basics of grammars and parsing. As Wikipedia points out, BNF might better be called Panini-Backus Form. The grammarian Panini gave a precise description of Sanskirt more than 23 centuries earlier in India using a similar notation.


Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.