Jeffrey Kegler
and 1 contributor


Marpa - Parse any Language You Can Describe in BNF


This is ALPHA software

This is alpha software. There may be bugs. The interface may change. Please be careful. Do not rely on it for anything mission-critical.

General BNF Parsing

Marpa parses any language whose grammar can be written in BNF. That includes recursive grammars, ambiguous grammars, infinitely ambiguous grammars and grammars with useless or empty productions.


Marpa is a branch from Parse::Marpa. Marpa started as an experiment, but is now alpha. Marpa will, in the near future, replace Parse::Marpa.


The Marpa package contains


The API for the Marpa parser generator is documented in the Marpa::API namespace. The Marpa::API document is a semi-tutorial overview. It contains a guide to the rest of the parser generator's documentation.


Marpa::UrHTML was written as a proof-of-concept of Marpa -- a practical application of Marpa to a difficult problem. In addition to the Marpa::UrHTML module itself, there are two handy utilities: urhtml_fmt and urhtml_score.


Marpa::UrHTML is a high-level HTML parser, based on the Marpa parse engine. It finds the structure of an HTML document. A CSS-like specifier syntax allows the user to specify semantic actions written in Perl for elements, classes and terminals. Marpa::UrHTML uses HTML::Parser as its tokenization layer.


urhtml_fmt formats HTML documents, indenting them according to their structure. It supplies missing start and end tags. urhtml_fmt is handy for getting a quick overview of the structure of an HTML document. Once an HTML document has been reformatted according to its structure by urhtml_fmt, it can be easier to transform using other programs. urhtml_fmt uses Marpa::UrHTML to do the HTML parsing.


urhtml_score computes a "complexity" score and other statistics for HTML documents. The complexity score is the average depth in the element structure of the characters, divided by the log of the document's length. It's an interesting number. urhtml_score uses Marpa::UrHTML to do the HTML parsing.
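The arithmetic behind the complexity score can be sketched in a few lines. This is an illustration only, not the code urhtml_score actually runs (that is Perl, built on Marpa::UrHTML). The function name complexity_score and its input format are hypothetical. It assumes the natural log, and that the document's length is counted in characters; the description above does not pin down either choice.

```python
import math


def complexity_score(char_depths):
    """Average element depth per character, divided by log(length).

    char_depths is a hypothetical input: one integer per character,
    giving how many elements enclose that character.
    """
    length = len(char_depths)
    if length < 2:
        # log(1) is 0 and log(0) is undefined, so no score exists here.
        raise ValueError("need at least two characters")
    average_depth = sum(char_depths) / length
    # Assumption: natural log of the length in characters.
    return average_depth / math.log(length)


# Eight characters: four at depth 1 and four at depth 2.
depths = [1, 1, 2, 2, 2, 2, 1, 1]
print(f"{complexity_score(depths):.3f}")  # prints 0.721
```

Dividing by the log of the length keeps a long document from scoring high on size alone. In this sketch, a deeply nested short document can outscore a flat long one.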


A few of the technical documents are not specific to the API or to the HTML parser.

Marpa::Parse_Terms is intended as a quick refresher in parsing terminology. My sources, and other useful references, are described in Marpa::Bibliography.

Marpa::Algorithm describes the Marpa algorithm itself. It will only be of interest to those with a theoretical bent.


Jeffrey Kegler

Why is it Called "Marpa"?

Marpa is the name of the greatest of the Tibetan "translators". In his time (the 11th century AD) Indian Buddhism was at its height. A generation of scholars was devoting itself to producing Tibetan versions of Buddhism's Sanskrit scriptures. Marpa became the greatest of them, and today is known as Marpa Lotsawa: "Marpa the Translator".

Translation in the 11th century was not a job for the indoors type. A translator needed to study in India, with the teachers who had the texts and could explain them. From Marpa's home in Tibet's Lhotrak Valley, the best way across the Himalayas to India was over the Khala Chela Pass. To reach the Khala Chela's three-mile high summit, Marpa had to cross two hundred lawless miles of Tibet. Once a pilgrim crested the Himalayas, the road to Nalanda University was all downhill.

Eager to reach their destination, the first travelers from Tibet had immediately descended the four hundred miles. For Marpa's predecessors, this last part of the journey had turned out to be by far the most deadly. Almost no germs live in the cold, thin air of Tibet. Marpa would spend two years in the foothills of Nepal, acclimatizing himself.

The first Tibetan pilgrims were unaware of the danger, and eager to find teachers in the hot plains of India. They reached the great Buddhist university at Nalanda with no immunity to India's many diseases. Their hosts could only watch as every single member of several large expeditions died within weeks of arrival.

Blatant Plug

There's more about Marpa in my novel, The God Proof, in which his studies, travels and adventures are a subplot. The God Proof centers around Kurt Gödel's proof of God's existence. Yes, that Kurt Gödel, and yes, he really did work out a God Proof (it's in his Collected Works, Vol. 3, pp. 403-404). The God Proof is available as a free download (http://www.lulu.com/content/933192). It can be purchased in print form at Amazon.com: http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355.


Marpa is derived from the parser described in Aycock and Horspool 2002. I've made significant changes to it, which are documented separately (Marpa::Algorithm). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley.

I'm grateful to Randal Schwartz for his support over the years that I've been working on Marpa. My contacts with Larry Wall have been few and brief, but his openness to new ideas has been a major encouragement and his insight into the relationship between "natural language" and computer language has been a major influence. More recently, Allison Randal and Patrick Michaud have been generous with their very valuable time. They might have preferred that I volunteered as a Parrot cage-cleaner, but if so, they were too polite to say.

Many at perlmonks.org answered questions for me. I used answers from chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among others, in writing this module. I'm just as grateful to those whose answers I didn't use. My inquiries were made while I was thinking out the code and it wasn't always 100% clear what I was after. If the butt is moved after the round, it shouldn't count against the archer.

In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util). Adam Kennedy patiently instructed me in module writing, both on the finer points and on issues about which I really should have known better.


Marpa comes without warranty. Support is provided on a volunteer basis through the standard mechanisms for CPAN modules. The Support document has details.


Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.