
NAME

Marpa - Parse any Language You Can Describe in BNF

DESCRIPTION

This is ALPHA software

This is alpha software. There may be bugs. The interface may change. Please be careful. Do not rely on it for anything mission-critical.

General BNF Parsing

Marpa parses any language whose grammar can be written in BNF. That includes recursive grammars, ambiguous grammars, infinitely ambiguous grammars and grammars with useless or empty productions.

Status

Marpa is a branch from Parse::Marpa. Marpa started as an experiment, but is now alpha. Marpa will, in the near future, replace Parse::Marpa.

Documentation is only partially complete. An HTML parser based on the Marpa parse engine is finished and documented. The documentation for the interface to the Marpa parse engine itself is in progress.

WHAT'S ALREADY DOCUMENTED

urhtml_fmt

urhtml_fmt formats HTML documents, indenting them according to their structure. It supplies missing start and end tags. urhtml_fmt is handy for getting a quick overview of the structure of an HTML document. Once an HTML document has been reformatted according to its structure by urhtml_fmt, it can be easier to transform using other programs. urhtml_fmt uses Marpa::UrHTML to do the HTML parsing.

urhtml_score

urhtml_score computes a "complexity" score and other statistics for HTML documents. The complexity score is the average depth in the element structure of the characters, divided by the log of the document's length. It's an interesting number. urhtml_score uses Marpa::UrHTML to do the HTML parsing.
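To make the complexity formula concrete, here is a minimal sketch in Python. It is not the urhtml_score implementation and does not use Marpa::UrHTML. The names DepthCounter and complexity_score are hypothetical, and several details are assumptions the description above leaves open: only text characters are counted when averaging depth, "length" is the character count of the whole source, and the logarithm is the natural log. Unlike Marpa::UrHTML, the standard-library tokenizer used here does not supply missing tags, so the sketch is only meaningful for well-formed input.

```python
import math
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DepthCounter(HTMLParser):
    """Accumulates the element depth of every text character (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of currently open elements
        self.depth_sum = 0    # sum of depths over all text characters
        self.char_count = 0   # number of text characters seen

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Every character in this run of text sits at the current depth.
        self.depth_sum += self.depth * len(data)
        self.char_count += len(data)


def complexity_score(html):
    """Average text-character depth divided by ln(document length).

    Assumptions: natural log, length measured in characters of the source.
    Returns 0.0 when there is no text or the document is too short to score.
    """
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    if counter.char_count == 0 or len(html) < 2:
        return 0.0
    average_depth = counter.depth_sum / counter.char_count
    return average_depth / math.log(len(html))
```

For example, in `<div><p>ab</p>cd</div>` the characters "ab" sit at depth 2 and "cd" at depth 1, so the average depth is 1.5, and the score is 1.5 divided by the natural log of the 22-character source length.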

Marpa::UrHTML

Marpa::UrHTML is a high-level HTML parser, based on the Marpa parse engine. It finds the structure of an HTML document. A CSS-like specifier syntax allows the user to specify semantic actions written in Perl for elements, classes and terminals. Marpa::UrHTML uses HTML::Parser as its tokenization layer.

MARPA PARSE ENGINE DOCUMENTATION

The documentation for the Marpa parse engine itself is in progress. The few documents which are ready at the moment are of interest only to readers who are familiar with other parsing algorithms, and who want to know more about the approach used in the Marpa parse engine. Marpa::Algorithm describes the Marpa algorithm itself. My sources and other useful references are described in Marpa::Bibliography. Marpa::Parse_Terms is intended as a quick refresher of parsing terminology.

AUTHOR

Jeffrey Kegler

Why is it Called "Marpa"?

Marpa is the name of the greatest of the Tibetan "translators". In his time (the 11th century AD) Indian Buddhism was at its height. A generation of scholars was devoting itself to producing Tibetan versions of Buddhism's Sanskrit scriptures. Marpa became the greatest of them, and today is known as Marpa Lotsawa: "Marpa the Translator".

Translation in the 11th century was not a job for the indoors type. A translator needed to study in India, with the teachers who had the texts and could explain them. From Marpa's home in Tibet's Lhotrak Valley, the best way across the Himalayas to India was over the Khala Chela Pass. To reach the Khala Chela's three-mile-high summit, Marpa had to cross two hundred lawless miles of Tibet. Once a pilgrim crested the Himalayas, the road to Nalanda University was all downhill. Eager to reach their destination, the first travelers from Tibet had descended the four hundred miles straight to the hot plains.

The last part of the journey had turned out to be by far the most deadly. Almost no germs live in the cold, thin air of Tibet. Pilgrims who didn't stop to acclimatize themselves reached the great Buddhist center with no immunity to India's diseases. Several large expeditions reached Nalanda only to have every single member die within weeks.

Blatant Plug

There's more about Marpa in my novel, The God Proof, in which his studies, travels and adventures are a subplot. The God Proof centers around Kurt Gödel's proof of God's existence. Yes, that Kurt Gödel, and yes, he really did work out a God Proof (it's in his Collected Works, Vol. 3, pp. 403-404). The God Proof is available as a free download (http://www.lulu.com/content/933192) and in print form at Amazon.com: http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355.

ACKNOWLEDGMENTS

Marpa is derived from the parser described in Aycock and Horspool 2002. I've made significant changes to it, which are documented separately (Marpa::Algorithm). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley.

I'm grateful to Randal Schwartz for his support over the years that I've been working on Marpa. My contacts with Larry Wall have been few and brief, but his openness to new ideas has been a major encouragement and his insight into the relationship between "natural language" and computer language has been a major influence. More recently, Allison Randal and Patrick Michaud have been generous with their very valuable time. They might have preferred that I volunteered as a Parrot cage-cleaner, but if so, they were too polite to say.

Many at perlmonks.org answered questions for me. I used answers from chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among others, in writing this module. I'm just as grateful to those whose answers I didn't use. My inquiries were made while I was thinking out the code and it wasn't always 100% clear what I was after. If the butt is moved after the round, it shouldn't count against the archer.

In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util). Adam Kennedy patiently instructed me in module writing, both on the finer points and on issues about which I really should have known better.

SUPPORT

Marpa comes without warranty. Support is provided on a volunteer basis through the standard mechanisms for CPAN modules. The Support document has details.

LICENSE AND COPYRIGHT

Copyright 2007-2009 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.