NAME

Marpa - Parse any Language You Can Describe in BNF

DESCRIPTION

This is ALPHA software

This is alpha software. There may be bugs. The interface may change. Please be careful. Do not rely on it for anything mission-critical.

General BNF Parsing

Marpa parses any language whose grammar can be written in BNF. That includes recursive grammars, ambiguous grammars, infinitely ambiguous grammars and grammars with useless or empty productions.

History

Marpa is a branch from Parse::Marpa. Marpa will, in the near future, replace Parse::Marpa.

WHAT MARPA CONTAINS

The Marpa package contains the following components.

THE MARPA PARSER GENERATOR

The Marpa::API document is a semi-tutorial overview of the Marpa parser generator and its API. Marpa::API contains a guide to the rest of the parser generator's documentation.

MARPA'S HTML PARSER

Marpa::UrHTML was written as a proof-of-concept of Marpa -- an application of Marpa to a difficult real-life problem. In addition to the Marpa::UrHTML module itself, there are two utilities: urhtml_fmt and urhtml_score.

Marpa::UrHTML

Marpa::UrHTML is a high-level HTML parser, based on the Marpa parse engine. It finds the structure of an HTML document. A CSS-like specifier syntax allows the user to specify semantic actions written in Perl for elements, classes and terminals. Marpa::UrHTML uses HTML::Parser as its tokenization layer.

urhtml_fmt

urhtml_fmt formats HTML documents, indenting them according to their structure. It supplies missing start and end tags. urhtml_fmt is handy for getting a quick overview of the structure of an HTML document. Once an HTML document has been reformatted according to its structure by urhtml_fmt, it can be easier to transform using other programs. urhtml_fmt uses Marpa::UrHTML to do the HTML parsing.

urhtml_score

urhtml_score computes a "complexity" score and other statistics for HTML documents. The complexity score is the average depth in the element structure of the characters, divided by the logarithm of the document's length. urhtml_score uses Marpa::UrHTML to do the HTML parsing.
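
The complexity score described above can be written out as a formula. This is only a sketch of the prose definition, not a statement of how urhtml_score is implemented. The symbols N (the document's length, assumed here to be counted in characters) and d_i (the element-nesting depth of the i-th character) are names introduced for illustration. The base of the logarithm is not stated in the description, so it is left unspecified.

```latex
\[
  \text{complexity}
  \;=\;
  \frac{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N} d_i}{\log N}
\]
```

As a worked example under one further assumption, the natural logarithm: a document of N = 1000 characters with an average depth of 6 would score about 6 / 6.908, or roughly 0.87. Dividing by the logarithm of the length keeps long documents from scoring high merely because they are long.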

OTHER TECHNICAL DOCUMENTATION

A few of the technical documents are not specific to the API or to the HTML parser.

Marpa::Parse_Terms is intended as a quick refresher in parsing terminology. My sources, and other useful references, are described in Marpa::Bibliography.

Marpa::Algorithm describes the Marpa algorithm itself. It will only be of interest to those with a theoretical bent.

AUTHOR

Jeffrey Kegler

Why is it Called "Marpa"?

Marpa is the name of the greatest of the Tibetan "translators". In his time (the 11th century AD) Indian Buddhism was at its height. A generation of scholars was devoting itself to producing Tibetan versions of Buddhism's Sanskrit scriptures. Marpa became the greatest of them, and today is known as Marpa Lotsawa: "Marpa the Translator".

Translation in the 11th century was not a job for the indoors type. A translator needed to study in India, with the teachers who had the texts and could explain them. From Marpa's home in Tibet's Lhotrak Valley, the best way across the Himalayas was over the Khala Chela Pass. To reach the Khala Chela's three-mile high summit, Marpa had to cross two hundred lawless miles of Tibet. Once a pilgrim crested the Himalayas, Nalanda University was downhill, four hundred miles beyond.

Marpa spent three years in Nepal, acclimatizing himself in the foothills. Tibetans had learned the hard way not to march straight on to Nalanda. Almost no germs live in the cold, thin air of Tibet. Tibetans arriving directly in the lowlands had no immunities. Whole expeditions had perished within weeks of arrival on the hot plains.

Blatant Plug

There's more about Marpa in my novel, The God Proof, in which his studies, travels and adventures are a subplot. The God Proof centers around Kurt Gödel's proof of God's existence. Yes, that Kurt Gödel, and yes, he really did work out a God Proof (it's in his Collected Works, Vol. 3, pp. 403-404). The God Proof is available as a free download (http://www.lulu.com/content/933192). It can be purchased in print form at Amazon.com: http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355.

ACKNOWLEDGMENTS

Marpa is derived from the parser described in Aycock and Horspool 2002. I've made significant changes to it, which are documented separately (Marpa::Algorithm). Aycock and Horspool, for their part, built on the algorithm discovered by Jay Earley.

I'm grateful to Randal Schwartz for his support over the years that I've been working on Marpa. My chats with Larry Wall have been few and brief, but his openness to new ideas has been a major encouragement and his insight into the relationship between "natural language" and computer language has been a major influence. More recently, Allison Randal and Patrick Michaud have been generous with their very valuable time. They might have preferred that I volunteered as a Parrot cage-cleaner, but if so, they were too polite to say.

Many at perlmonks.org answered questions for me. I used answers from chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among others, in writing this module. I'm just as grateful to those whose answers I didn't use. My inquiries were made while I was thinking out the code and it wasn't always 100% clear what I was after. If the butt is moved after the round, it shouldn't count against the archer.

In writing the Pure Perl version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util). Adam Kennedy patiently instructed me in module writing, both on the finer points and on issues about which I really should have known better.

SUPPORT

Marpa comes without warranty. Support is provided on a volunteer basis through the standard mechanisms for CPAN modules. The Support document has details.

LICENSE AND COPYRIGHT

Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.