Text::Document - a text document subject to statistical analysis

"Text::Document" lets you perform simple information-retrieval-oriented statistics on plain-text documents. Text can be added in chunks, so the document can be built incrementally, for instance by a class like "HTML::Parser". A simple algorithm ...

ASPINELLI/Text-Document-1.07 - 04 Feb 2002 18:00:51 GMT

Text::Amuse::Document
MELMOTHX/Text-Amuse-0.22 - 21 Mar 2015 10:41:49 GMT

Text::Corpus::CNN::Document - Parse CNN article for research.

"Text::Corpus::CNN::Document" provides methods for accessing specific portions of CNN news articles for personal research and testing of information processing methods. Read the CNN Interactive Service Agreement to ensure you abide by their Ser...

KUBINA/Text-Corpus-CNN-1.02 - 21 Aug 2010 16:33:02 GMT

Text::Mining::Corpus::Document - Provenance and Representations for Documents

CONFIGURATION AND ENVIRONMENT Text::Mining::Corpus::Document requires...

ROGERHALL/Text-Mining-0.08 - 15 Mar 2009 17:06:03 GMT

Text::Corpus::Inspec::Document - Parse Inspec abstract for research.

"Text::Corpus::Inspec::Document" provides methods for accessing specific portions of Inspec abstracts for researching and testing information processing methods. CONSTRUCTOR "new" The method "new" creates an instance of the "Text::Corpus::Inspec" ...

KUBINA/Text-Corpus-Inspec-1.00 - 09 Dec 2009 03:41:43 GMT

Text::Corpus::NewYorkTimes::Document - Parse NYT article for research.

"Text::Corpus::NewYorkTimes::Document" provides methods for accessing specific portions of news articles from the New York Times corpus. CONSTRUCTOR "new" The constructor "new" creates an instance of the "Text::Corpus::NewYorkTimes::Document" class w...

KUBINA/Text-Corpus-NewYorkTimes-1.01 - 09 Dec 2009 03:41:31 GMT

Text::Corpus::VoiceOfAmerica::Document - Parse a VOA article for research.

"Text::Corpus::VoiceOfAmerica::Document" provides methods for accessing the content of VOA news articles for researching and testing information processing techniques. Read the Voice of America's Terms of Use statement to ensure you abide by i...

KUBINA/Text-Corpus-VoiceOfAmerica-1.03 - 24 Aug 2010 14:15:50 GMT

Document::Writer::TextArea - A page in a document
GPHAT/Document-Writer-0.13 - 01 May 2011 14:17:41 GMT

XML::XQL::DOM - Adds XQL support to XML::DOM nodes

XML::XQL::DOM adds methods to XML::DOM nodes to support XQL queries on XML::DOM document structures. See XML::XQL and XML::XQL::Query for more details. XML::DOM::Node describes the xql() method. ...

ENNO/libxml-enno-1.02 - 27 Mar 2000 16:23:22 GMT

XML::DOM::XPath - Perl extension to add XPath support to XML::DOM, using XML::XPath engine

XML::DOM::XPath allows you to use XML::XPath methods to query a DOM. This is often much easier than relying only on getElementsByTagName. It lets you use all of the XML::DOM methods. METHODS These methods can be applied to a whole DOM object or to a ...

MIROD/XML-DOM-XPath-0.14 - 15 Apr 2008 15:35:41 GMT

Template::Plugin::XML::LibXML - XML::LibXML Template Toolkit Plugin

This module provides a plugin for the XML::LibXML module. It can be used like any other Template Toolkit plugin, by using a USE statement from within a Template. The USE statement will return a reference to the root node of the parsed document ...

MARKF/Template-Plugin-XML-LibXML-1.07 - 12 Aug 2004 10:10:32 GMT

XML::DOM - A Perl module for building DOM Level 1 compliant document structures

This module extends the XML::Parser module by Clark Cooper. The XML::Parser module is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library. XML::DOM::Parser is derived from XML::Parser. It parses XML str...

TJMATHER/XML-DOM-1.44 (2 reviews) - 26 Jul 2005 01:06:30 GMT

XML::LibXML::Augment - extend XML::LibXML::{Attr,Element,Document} on a per-namespace/element basis

XML::LibXML is super-awesome. However, I don't know about you, but sometimes I wish it had some domain-specific knowledge. For example, if I have an XML::LibXML::Element which represents an HTML "<form>" element, why can't it have a "submit" method? ...

TOBYINK/XML-LibXML-Augment-0.004 - 16 Sep 2014 19:01:14 GMT

XML::Sablotron::DOM - The DOM interface to Sablotron's internal structures

Sablotron internally uses DOM-like data structures to represent parsed XML trees. The "sdom.h" header file defines a subset of functions allowing DOM access to these structures. What is it good for? You may find this module useful if you...

PAVELH/XML-Sablotron-1.01 - 26 May 2005 08:48:46 GMT

Pod::Clipper - Extract blocks of POD from a text document

This module allows you to divide a document/string into POD and non-POD blocks of text. This is useful for extracting POD data (or code) from a "mixed" document, like most perl modules on CPAN. POD data is identified as per the perlpodspec manpage. I...

YHA/Pod-Clipper-0.01 - 01 Jun 2010 03:19:33 GMT

LibXML.pm
SHLOMIF/XML-LibXML-2.0118 (14 reviews) - 05 Feb 2015 10:57:03 GMT

pdf2ocr - get text content of images within a PDF document

The argument is a PDF file. This script assumes that each page in the PDF is one 8.5x11 page holding ONE image; that's what the calculations are set up for. USAGE EXAMPLES OPTION FLAGS -h help -d debug -v version -s cache by sum on -n don't do anything, just sh...

LEOCHARRE/PDF-OCR-1.11 - 20 Apr 2009 13:01:05 GMT
  • PDF::OCR::Thorough - DEPRECATED extract text from PDF document, resorting to OCR as needed