Lingua::Treebank - Perl extension for manipulating the Penn Treebank format River stage one • 1 direct dependent • 1 total dependent

This class knows how to read two treebank formats, the Penn format and the Chomsky Normal Form (CNF) format. These formats differ in how they handle terminal nodes. The Penn format places pre-terminal part of speech tags in the left-hand position of ...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT

Lingua::Treebank::Const - Object modeling constituent from a treebank

Module for describing simple constituents of the Penn Treebank. Recursive behaviors are implied. Note assumption that terminal nodes (those with defined "word" values) will not have "children", and vice versa. This assumption is currently unchecked b...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT

Lingua::Treebank::HeadFinder - Head-finding in Lingua::Treebank

The L::TB::HeadFinder object is initialized from a list like the one in To do...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT

get_words - given collapsed treebank, print words only

Reads Penn-style trees, one per line, from input files (or STDIN) and prints only the words, one tree per line. Providing the "-sgml" option makes the output pseudo-SGML by including angle-bracketed "<s>" and "</s>" tokens at the beginning and end of...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT
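
As a rough illustration of what the get_words description means, the following Python sketch pulls the terminal words out of a one-line Penn-style tree. It is not the script's actual implementation (which is Perl and not shown here). The function names, the `sgml` parameter, and the choice to wrap each output line in "<s>" and "</s>" are assumptions, since the abstract above is truncated.

```python
import re

# A terminal is assumed to look like "(TAG word)": a label followed by
# exactly one token that is not itself a bracketed constituent.
TERMINAL = re.compile(r"\(\s*([^\s()]+)\s+([^\s()]+)\s*\)")

def words(line):
    """Return the words of one bracketed tree, in left-to-right order."""
    return [word for _tag, word in TERMINAL.findall(line)]

def format_words(line, sgml=False):
    """Join the words with spaces; optionally add <s> ... </s> markers.

    The `sgml` parameter is hypothetical and only mimics the "-sgml"
    behavior described in the abstract.
    """
    tokens = words(line)
    if sgml:
        tokens = ["<s>"] + tokens + ["</s>"]
    return " ".join(tokens)
```

For example, `format_words("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")` yields the three words separated by spaces, and passing `sgml=True` adds the sentence markers around them.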

list-edges - reads Penn treebanks, prints out all edges found in each tree, one tree per line

This program lists all edges in the trees presented, one tree per line. Edges are LABEL,INDEX,INDEX where INDEX values come from between the words (0-based). Caveats: the trees must be in Penn treebank format. To do: none that I know of....

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT
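
The LABEL,INDEX,INDEX edge notation in the list-edges abstract can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the Perl script itself: it assumes every bracket carries a label, it counts part-of-speech preterminals as edges, and it returns edges in post-order. None of these details are confirmed by the abstract, and all function names are hypothetical.

```python
import re

def tokenize(line):
    """Split a bracketed tree into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", line)

def parse(tokens, pos=0):
    """Parse the constituent opening at tokens[pos]; return (node, next_pos).

    A node is (label, children); each child is a node or a plain word string.
    """
    assert tokens[pos] == "("
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])  # a terminal word
            pos += 1
    return (label, children), pos + 1

def collect_edges(node, start, out):
    """Append (label, start, end) for node and its descendants; return end.

    Indices name the gaps between words, 0-based: the first word spans 0..1.
    """
    label, children = node
    end = start
    for child in children:
        if isinstance(child, str):
            end += 1
        else:
            end = collect_edges(child, end, out)
    out.append((label, start, end))
    return end

def list_edges(line):
    """Return every edge of one tree as (label, start, end) tuples."""
    tree, _ = parse(tokenize(line))
    out = []
    collect_edges(tree, 0, out)
    return out
```

For the tree `(S (NP (DT the) (NN dog)) (VP (VBZ barks)))`, the NP covers the gap positions 0 to 2 and the whole sentence covers 0 to 3, so the result includes `("NP", 0, 2)` and `("S", 0, 3)`.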

vocabulary - extract vocabularies from Penn treebank files

Given a list of Penn treebank files, this script extracts the words, parts of speech, and non-terminal node names and emits each in a separate file in order of frequency. Note that giving a "-" argument for any of ntfile, posfile, or wordfile causes ...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT

tree-inflate - transform a one-tree-per-line treebank into something human-readable

Reads one-tree-per-line from STDIN or indicated files, reformats the trees according to a Penn standard (spreading daughters to the next line, applying indenting, etc) and prints them to STDOUT. Handy with *less* etc for spot-checking trees stored in...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT

tree-collapse - reads multi-line Penn trees from files or STDIN and outputs trees one per line.

Reads inflated Penn treebank-format trees, with children indented and possibly on different lines, and outputs intact trees, one tree per line with whitespace as input....

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT

list-rewrites - reads Penn treebanks, prints out all rewrites found

This program lists all rewrites in all trees presented by file or on STDIN to this script. Caveats: the trees must be in Penn treebank format. The rewrites will not necessarily be unique; if you want them to be unique, you will have to pipe the output...

KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 GMT
