HTML::ContentExtractor - extract the main content from a web page by analysing the DOM tree

Web pages often contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. This module is used to reduce the noise content in web pages and thus identify the content...

JZHANG/HTML-ContentExtractor-0.03 - 23 Jun 2007 01:36:57 GMT

HTML::Content::ContentExtractor - Perl module for extracting content from HTML documents.

HTML::Content::ContentExtractor attempts to extract the content from HTML documents. It attempts to remove tags, scripts and boilerplate text from the documents by trying to find the region of the HTML document that has the maximum ratio of words to ...

JTAVERNI/HTML-Content-Extractor-0.01 - 22 Aug 2005 03:38:43 GMT
