HTML::ContentExtractor - extract the main content from a web page by analyzing the DOM tree!

Web pages often contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts the user from the actual content. This module reduces the noise in web pages and thus identifies the content...

JZHANG/HTML-ContentExtractor-0.03 - 23 Jun 2007 01:36:57 GMT
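The listing does not say how HTML::ContentExtractor scores the DOM tree. As a rough illustration only, the following is a minimal Python sketch of one common noise-reduction heuristic: build a small tree, give each container element (`div`, `td`, and so on) a score equal to the amount of non-link text it holds directly, and keep the highest-scoring container. It is not this module's algorithm or API; the names `extract_main_text`, `BLOCK_TAGS` and the scoring rule are hypothetical choices made for the sketch.

```python
from html.parser import HTMLParser

# Hypothetical tuning choices for this sketch; not taken from HTML::ContentExtractor.
BLOCK_TAGS = {"body", "div", "td", "article", "section"}  # candidate containers
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # never closed
SKIP_TAGS = {"script", "style"}                           # text ignored


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node objects and text strings, in document order
        # A node counts as link text if it is an <a> or sits inside one.
        self.in_link = parent is not None and (parent.in_link or tag == "a")


class TreeBuilder(HTMLParser):
    """Build a small DOM-like tree from possibly sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS and data.strip():
            self.current.children.append(data.strip())


def local_stats(node):
    """Return (chars, linked_chars) under node, not descending into nested containers."""
    total = linked = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            if node.in_link:
                linked += len(child)
        elif child.tag not in BLOCK_TAGS:
            t, l = local_stats(child)
            total += t
            linked += l
    return total, linked


def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def text_of(node):
    """Concatenate all text under node in document order."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best, best_score = None, 0
    for node in iter_nodes(builder.root):
        if node.tag in BLOCK_TAGS:
            total, linked = local_stats(node)
            score = total - linked  # characters of text that are not links
            if score > best_score:
                best, best_score = node, score
    return text_of(best) if best else ""


if __name__ == "__main__":
    page = ("<body><div><a href='/'>Home</a> <a href='/n'>News</a></div>"
            "<div><p>Article body text here.</p><p>More text.</p></div></body>")
    print(extract_main_text(page))
```

The navigation `div` scores zero because all of its text is link text, so the second `div` is chosen. Because nested containers are scored separately, `body` does not automatically win just by enclosing everything.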

HTML::Content::Extractor - Receiving the main text of a publication from an HTML page, along with the main media content bound to that text

This module analyzes an HTML document and extracts the main text (for example, the contents of a front-page article on a news site) and all related images...

LASTMAC/HTML-Content-Extractor-0.17 - 03 Nov 2013 20:04:19 GMT
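The abstract does not describe how HTML::Content::Extractor decides which images are "bound to" the text. One simple interpretation, shown here as a hypothetical Python sketch rather than the module's actual method, is to credit every piece of text and every `<img>` to the innermost open container element, pick the container holding the most text, and return that container's images. The class `ImageBinder`, the function `main_images` and the `CONTAINERS` set are illustrative names, not part of the module.

```python
from html.parser import HTMLParser

# Hypothetical container set for this sketch; not taken from HTML::Content::Extractor.
CONTAINERS = {"div", "article", "section", "td"}


class ImageBinder(HTMLParser):
    """Credit text and <img> sources to the innermost open container element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, index into self.blocks) for open containers
        self.blocks = []  # one {"chars": int, "images": [src, ...]} per container
        self.skip = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.blocks.append({"chars": 0, "images": []})
            self.stack.append((tag, len(self.blocks) - 1))
        elif tag == "img" and self.stack:
            src = dict(attrs).get("src")
            if src:
                self.blocks[self.stack[-1][1]]["images"].append(src)
        elif tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in CONTAINERS:
            # Pop back to the most recent open container with this tag name.
            for i in range(len(self.stack) - 1, -1, -1):
                if self.stack[i][0] == tag:
                    del self.stack[i:]
                    break

    def handle_data(self, data):
        if self.stack and not self.skip:
            self.blocks[self.stack[-1][1]]["chars"] += len(data.strip())


def main_images(html):
    """Return the image sources of the container with the most text."""
    parser = ImageBinder()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return []
    best = max(parser.blocks, key=lambda b: b["chars"])
    return best["images"]


if __name__ == "__main__":
    page = ('<div class="side"><img src="/ad.gif"> Ad</div>'
            '<div class="article"><p>Long article text goes here.</p>'
            '<img src="/photo.jpg" alt="Photo"><p>More article text.</p></div>')
    print(main_images(page))
```

Here the sidebar banner is dropped because its container holds almost no text, while the photo inside the article container is kept. A real extractor would likely also weigh link density, image size or captions; this sketch deliberately uses text length alone.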