Jeffrey Kegler


Marpa - Generate Parsers for any BNF Grammar


Marpa generates parsers from any BNF grammar. That includes recursive grammars, ambiguous grammars, infinitely ambiguous grammars, and grammars with useless or empty productions. If you can write it in BNF, Marpa parses it.

This is a branch of Parse::Marpa. It started out as experimental and is still alpha, but in the near future it will replace Parse::Marpa as the main branch.


urhtml_fmt formats HTML documents, indenting them according to their structure. It supplies missing start and end tags. I find urhtml_fmt quite handy: it gives me a quick look at the structure of an HTML document, and the formatted version is easier and safer to transform using regular expressions.
urhtml_score computes a "complexity" score and other statistics for HTML documents. The complexity score is the average depth, in the element structure, of the document's characters, divided by the log of the document's length. It's an interesting number.
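
The complexity score can be sketched in a few lines of Python. This is not urhtml_score's code, and several details are assumptions: the input is taken to be well-formed (no missing tags are supplied), only text characters are counted when averaging depth, the "document's length" is the length of the raw input, and the log is the natural logarithm.

```python
import math
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DepthCounter(HTMLParser):
    """Accumulates the element depth of every text character."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chars = 0
        self.depth_sum = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        # Each character of this text run sits at the current depth.
        self.chars += len(data)
        self.depth_sum += self.depth * len(data)


def complexity_score(html: str) -> float:
    """Average text-character depth divided by ln(len(html)) (assumed formula)."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    if counter.chars == 0 or len(html) < 2:
        return 0.0  # avoid dividing by zero or by ln(1) == 0
    average_depth = counter.depth_sum / counter.chars
    return average_depth / math.log(len(html))
```

For instance, in `<div><p>ab</p>cd</div>` the characters "ab" sit at depth 2 and "cd" at depth 1, so the average depth is 1.5, which is then divided by the log of the 22-character input.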

These scripts are demos of Marpa::UrHTML, a high-level HTML parser. "High-level" means that it finds the structure of an HTML document, as opposed to HTML::Parser, which does tokenization, or "low-level parsing". (Marpa::UrHTML uses HTML::Parser as its low-level parser.)
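
To make the high-level/low-level distinction concrete, here is a minimal sketch in Python. The standard library's html.parser plays the tokenizing role that HTML::Parser plays for Marpa::UrHTML, and a stack on top of it rebuilds the element structure and indents by depth. It is not urhtml_fmt: the name fmt is hypothetical, it supplies only missing end tags (closing elements left open when an enclosing end tag or the end of input arrives), and it knows none of HTML's implicit-close rules, so, for example, an unclosed <li> will nest the next <li> inside it.

```python
from html.parser import HTMLParser

# Void elements have no content and no end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IndentFormatter(HTMLParser):
    """Builds element structure from the token stream and indents it."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.lines = []

    def _emit(self, text):
        self.lines.append("  " * len(self.stack) + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: nothing open to close
        # Supply end tags for any elements left open inside this one.
        while self.stack:
            open_tag = self.stack.pop()
            self._emit(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if data.strip():
            self._emit(data.strip())

    def close(self):
        super().close()
        while self.stack:  # close whatever is still open at end of input
            self._emit(f"</{self.stack.pop()}>")


def fmt(html: str) -> str:
    formatter = IndentFormatter()
    formatter.feed(html)
    formatter.close()
    return "\n".join(formatter.lines)
```

For example, fmt("<div><p>text</div>") closes the unterminated <p> just before </div>, producing <div>, <p>, text, </p> and </div> on separate lines, indented two spaces per level.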


Marpa's documentation is being brought up to date. What is ready at the moment is helpful only to readers who are familiar with other parsing algorithms and who are interested in learning more about the approach I used to write the Marpa parse engine. Marpa::Algorithm describes the Marpa algorithm itself. My sources and other useful references are described in Marpa::Bibliography.

Marpa::Parse_Terms is intended as a quick refresher of parsing terminology. It's all standard, but the emphasis is on the meanings of the terms as they will be used in the (not yet released) documents.

Support, on a volunteer basis, is provided via the standard mechanisms for CPAN modules. The Support document has details.


Jeffrey Kegler

Why is it Called "Marpa"?

Marpa is the name of the greatest of the Tibetan "translators". In his time (the 11th century AD) Indian Buddhism was at its height. A generation of scholars was devoting itself to producing Tibetan versions of Buddhism's Sanskrit scriptures. Marpa became the greatest of them, and today is known as Marpa Lotsawa: "Marpa the Translator".

Translation in the 11th century was not a job for the indoors type. A translator needed to study in India, with the teachers who had the texts and could explain them. From Marpa's home in Tibet's Lhotrak Valley, the best way across the Himalayas to India was over the Khala Chela Pass. To reach the Khala Chela's three-mile-high summit, Marpa had to cross two hundred lawless miles of Tibet. Once a pilgrim crested the Himalayas, the road to Nalanda University was all downhill. Eager to reach their destination, the first travelers from Tibet had descended the four hundred miles straight to the hot plains.

The last part of the journey had turned out to be by far the most deadly. Almost no germs live in the cold, thin air of Tibet. Pilgrims who didn't stop to acclimatize themselves reached the great Buddhist center with no immunity to India's diseases. Several large expeditions reached Nalanda only to have every single member die within weeks.

Blatant Plug

There's more about Marpa in my novel, The God Proof, in which his studies, travels and adventures are a subplot. The God Proof centers around Kurt Gödel's proof of God's existence. Yes, that Kurt Gödel, and yes, he really did work out a God Proof (it's in his Collected Works, Vol. 3, pp. 403-404). The God Proof is available as a free download (http://www.lulu.com/content/933192) and in print form at Amazon.com: http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355.


Marpa is derived from the parser described in Aycock and Horspool 2002. I've made significant changes to it, which are documented separately (Marpa::Algorithm). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley.
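
Earley's algorithm is what lets this family of parsers accept left recursion, ambiguity and empty productions. As a rough illustration, here is a minimal Earley recognizer in Python that handles nullable symbols in the way Aycock and Horspool propose: when a nullable nonterminal is predicted, the predicting item is also advanced past it. This is a sketch, not Marpa's engine: it only answers yes or no, builds no parse trees, and its grammar format (a dict mapping each nonterminal to a list of right-hand-side tuples) is a hypothetical choice for this example.

```python
def nullable_symbols(grammar):
    """Return the set of nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            if lhs not in nullable and any(
                    all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` is a sentence of `grammar` derived from `start`.

    Any symbol that is not a key of `grammar` is a terminal.
    An Earley item is a tuple (lhs, rhs, dot, origin).
    """
    nullable = nullable_symbols(grammar)
    sets = [set() for _ in range(len(tokens) + 1)]
    sets[0].update((start, rhs, 0, 0) for rhs in grammar[start])

    for i in range(len(tokens) + 1):
        agenda = list(sets[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new_items = []
            if dot == len(rhs):
                # Completer: advance items in the origin set that expect lhs.
                new_items = [(l, r, d + 1, o) for (l, r, d, o) in sets[origin]
                             if d < len(r) and r[d] == lhs]
            elif rhs[dot] in grammar:
                sym = rhs[dot]
                # Predictor: start every rule for the expected nonterminal.
                new_items = [(sym, r, 0, i) for r in grammar[sym]]
                # Aycock-Horspool: a nullable nonterminal may also be skipped.
                if sym in nullable:
                    new_items.append((lhs, rhs, dot + 1, origin))
            elif i < len(tokens) and tokens[i] == rhs[dot]:
                # Scanner: the next token matches the expected terminal.
                sets[i + 1].add((lhs, rhs, dot + 1, origin))
            for item in new_items:
                if item not in sets[i]:
                    sets[i].add(item)
                    agenda.append(item)

    return any(l == start and d == len(r) and o == 0
               for (l, r, d, o) in sets[-1])
```

For example, with the grammar {"E": [("E", "+", "E"), ("n",)]}, which is both left-recursive and ambiguous, earley_recognize accepts ["n", "+", "n", "+", "n"]. A grammar whose nonterminal has an empty right-hand side, written as (), is handled the same way.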

I'm grateful to Randal Schwartz for his support over the years that I've been working on Marpa. My contacts with Larry Wall have been few and brief, but his openness to new ideas has been a major encouragement and his insight into the relationship between "natural language" and computer language has been a major influence. More recently, Allison Randal and Patrick Michaud have been generous with their very valuable time. They might have preferred that I volunteered as a Parrot cage-cleaner, but if so, they were too polite to say.

Many at perlmonks.org answered questions for me. I used answers from chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among others, in writing this module. I'm just as grateful to those whose answers I didn't use. My inquiries were made while I was thinking out the code and it wasn't always 100% clear what I was after. If the butt is moved after the round, it shouldn't count against the archer.

In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util). Adam Kennedy patiently instructed me in module writing, both on the finer points and on issues about which I really should have known better.


Copyright 2007-2009 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.