HTML::Extract - Perl extension for getting text and HTML snippets out of HTML pages in general. River stage one • 1 direct dependent • 1 total dependent

This is a pretty simple little Perl module for getting text out of HTML pages. It's really designed so that you can call it in anything where you would otherwise be looking for a way of stripping part of web pages away (for example, if you are extrac...

CSELT/HTML-Extract-0.25 - 15 Feb 2007 22:27:44 UTC

File::Extract::HTML - Extract Text From HTML Files River stage zero No dependents

DMAKI/File-Extract-0.07000 - 18 Nov 2007 13:35:51 UTC

HTML::Extract::CPANModules - Extract CPAN module names from an HTML document River stage one • 3 direct dependents • 3 total dependents

PERLANCAR/HTML-Extract-CPANModules-0.04 - 07 Mar 2016 04:27:25 UTC

Locale::TextDomain::OO::Extract::HTML River stage one • 1 direct dependent • 1 total dependent

This module extracts internationalization data from HTML. Implemented rules: Gettext::Loc <any_tag ... class="... loc_ ..." ... >text to extract< <any_tag ... class="... loc_ ..." ... >context{CONTEXT_SEPARATOR}text to extract< Gettext <any_tag ... c...

STEFFENW/Locale-TextDomain-OO-Extract-2.015 - 21 Sep 2018 13:11:01 UTC

HTML::ExtractText - extract multiple text strings from HTML content, using CSS selectors River stage one • 1 direct dependent • 1 total dependent

The module allows you to extract [multiple] text strings from HTML documents, using CSS selectors to declare what text needs extracting. The module can either return the results as a hashref or automatically call setter methods on a provided object. If y...

ZOFFIX/HTML-ExtractText-1.002004 - 30 Oct 2016 17:06:41 UTC

HTML::ExtractMeta - Helper class for extracting useful meta data from HTML pages. River stage zero No dependents

HTML::ExtractMeta is a helper class for extracting useful metadata from HTML pages, like their title, description, authors etc....

TOREAU/HTML-ExtractMeta-0.21 - 11 Oct 2016 05:44:34 UTC

HTML::ExtractMain - Extract the main content of a web page River stage one • 1 direct dependent • 1 total dependent

ANIRVAN/HTML-ExtractMain-0.63 - 19 May 2013 15:39:27 UTC

HTML::ExtractContent - An HTML content extractor with scoring heuristics River stage one • 1 direct dependent • 1 total dependent

HTML::ExtractContent is a module for extracting content from HTML with scoring heuristics. It guesses which block of HTML looks like content according to scores depending on the amount of punctuation marks and the lengths of non-tag texts. It also gu...

TARAO/HTML-ExtractContent-0.12 - 30 Nov 2015 08:32:54 UTC

HTML::TableExtract - Perl module for extracting the content contained in tables within an HTML document, either as text or encoded element trees. River stage two • 24 direct dependents • 45 total dependents

HTML::TableExtract is a subclass of HTML::Parser that serves to extract the information from tables of interest contained within an HTML document. The information from each extracted table is stored in table objects. Tables can be extracted as text, ...

MSISK/HTML-TableExtract-2.15 - 25 May 2017 13:49:56 UTC

HTML::ExtractText::Extra - extra useful HTML::ExtractText River stage zero No dependents

The module offers extra options and post-processing that the vanilla HTML::ExtractText does not provide....

ZOFFIX/HTML-ExtractText-Extra-1.001003 - 20 Apr 2015 13:03:46 UTC

lib/HTML/ExtractContent/ River stage one • 1 direct dependent • 1 total dependent

TARAO/HTML-ExtractContent-0.12 - 30 Nov 2015 08:32:54 UTC

HTML::Quoted - extract structure of quoted HTML mail message River stage zero No dependents

Parses and extracts quotation structure out of an HTML message. Purpose and returned structures are very similar to Text::Quoted....

BPS/HTML-Quoted-0.05 - 11 Jul 2023 19:48:27 UTC

HTML::Feature - Extract Feature Sentences From HTML Documents River stage one • 1 direct dependent • 3 total dependents

This module extracts blocks of feature sentences out of an HTML document. As of version 3.0, it provides three engines. 1. LDRFullFeed Use wedata's database that is compatible with LDR Full Feed. see -> ( Japanese only ) 2. Googl...

MIKI/HTML-Feature-3.00011 - 13 May 2010 07:33:28 UTC

HTML::LinkExtor - Extract links from an HTML document River stage four • 548 direct dependents • 3510 total dependents

*HTML::LinkExtor* is an HTML parser that extracts links from an HTML document. The *HTML::LinkExtor* is a subclass of *HTML::Parser*. This means that the document should be given to the parser by calling the $p->parse() or $p->parse_file() methods....

OALDERS/HTML-Parser-3.81 - 31 Jan 2023 03:13:18 UTC

HTML::RelExtor - Extract "rel" and "rev" information from LINK and A tags. River stage zero No dependents

HTML::RelExtor is an HTML parser module to extract relationship information from "A" and LINK HTML tags....

MIYAGAWA/HTML-RelExtor-0.03 - 12 Apr 2009 03:20:12 UTC

YAPE::HTML - Yet Another Parser/Extractor for HTML River stage zero No dependents

This module is yet another parser and tree-builder for HTML documents. It is designed to make extraction and modification of HTML documents simplistic. The API allows for easy custom additions to the document being parsed, and allows very specific ta...

PINYAN/YAPE-HTML-1.11 - 06 Feb 2001 06:23:48 UTC

HTML::Miner - This Module 'Mines' (hopefully) useful information for an URL or HTML snippet. River stage zero No dependents

TMHARISH/HTML-Miner-1.03 - 20 Jan 2013 08:53:50 UTC

HTML::Query - jQuery-like selection queries for HTML::Element River stage one • 2 direct dependents • 2 total dependents

The "HTML::Query" module is an add-on for the HTML::Tree module set. It provides a simple way to select one or more elements from a tree using a query syntax inspired by jQuery. This selector syntax will be reassuringly familiar to anyone who has eve...

KAMELKEV/HTML-Query-0.09 - 03 Sep 2014 03:06:39 UTC

HTML::Blitz - high-performance, selector-based, content-aware HTML template engine River stage zero No dependents

HTML::Blitz is a high-performance, CSS-selector-based, content-aware template engine for HTML5. Let's unpack that: * You want to generate web pages. Those are written in HTML5. * Your HTML documents are mostly static in nature, but some parts need to...

MAUKE/HTML-Blitz-0.09 - 03 Aug 2023 23:19:26 UTC

HTML::DublinCore - Extract Dublin Core metadata from HTML River stage zero No dependents

HTML::DublinCore is a module for easily extracting Dublin Core metadata that is embedded in HTML documents. The Dublin Core is a small set of metadata elements for describing information resources. Dublin Core is typically stored in the <HEAD> of and...

ESUMMERS/HTML-DublinCore-0.4 - 15 Nov 2004 19:02:36 UTC
