NAME

FileMetadata::Miner::HTML

SYNOPSIS

  use FileMetadata::Miner::HTML;

  my $miner = FileMetadata::Miner::HTML->new ({});

  my $meta = {};

  # Keys are prefixed with the package name (see "mine" below)
  print "TITLE: $meta->{'FileMetadata::Miner::HTML::title'}\n"
    if $miner->mine ('ex.html', $meta);

DESCRIPTION

This module extracts metadata from HTML files. The only tags of interest are the <TITLE> tag and the <META> tags within the <HEAD> element of the <HTML> element. A META tag's HTTP-EQUIV attribute describes metadata with operational significance to the HTTP protocol, so such tags are ignored by this module.
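The extraction rules above can be sketched with HTML::Parser's version 3 handler API. This is a minimal illustration, not the module's own code: the helper mine_html_string, its argument order and its return value are hypothetical, and for brevity it tracks only the <HEAD> element rather than also checking the enclosing <HTML> element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3;

# Hypothetical helper (not part of FileMetadata::Miner::HTML): fills the
# hash ref $meta from an HTML string using the key scheme documented here.
sub mine_html_string {
    my ($html, $meta) = @_;
    my $prefix   = 'FileMetadata::Miner::HTML::';
    my $in_head  = 0;    # inside <head> ... </head>
    my $in_title = 0;    # inside <title> ... </title> within <head>
    my $title    = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # HTML::Parser reports tag and attribute names in lower case.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'head') {
                $in_head = 1;
            }
            elsif ($in_head && $tag eq 'title') {
                $in_title = 1;
                $title    = '';
            }
            elsif ($in_head && $tag eq 'meta'
                   && defined $attr->{name}
                   && !exists $attr->{'http-equiv'}) {
                # <meta name="X" content="Y"> becomes prefix . X => Y
                $meta->{ $prefix . $attr->{name} } = $attr->{content};
            }
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'head') {
                $in_head = 0;
            }
            elsif ($tag eq 'title' && $in_title) {
                $in_title = 0;
                $meta->{ $prefix . 'title' } = $title;
            }
        }, 'tagname' ],
        # Text may arrive in several chunks, so accumulate it.
        text_h => [ sub { $title .= shift if $in_title }, 'dtext' ],
    );
    $parser->parse($html);
    $parser->eof;
    return 1;
}

my $meta = {};
mine_html_string(<<'HTML', $meta);
<html><head>
<title>Example Page</title>
<meta name="author" content="Jane Doe">
<meta http-equiv="refresh" content="30">
</head><body><title>ignored</title></body></html>
HTML

print "$_ = $meta->{$_}\n" for sort keys %$meta;
```

Run against the sample document, the sketch records the author and title keys, skips the HTTP-EQUIV tag, and ignores the stray <title> in the body because it lies outside <HEAD>.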

This module implements the interfaces of the FileMetadata framework but can also be used independently.

METHODS

new

See "new" in FileMetadata::Miner

This module does not accept any config options.

mine

See "mine" in FileMetadata::Miner

The mine method extracts the 'title' and 'meta' information from an HTML document. The following keys are inserted into the meta hash:

1. FileMetadata::Miner::HTML::title - Text enclosed by the <title> tags

2. FileMetadata::Miner::HTML::* - where * is the value of a meta tag's 'name' attribute; the value of this key is the value of that tag's 'content' attribute.

If a meta tag is present whose 'name' attribute is 'title', the value of the FileMetadata::Miner::HTML::title key is taken from whichever of the two (the <title> tag or that meta tag) occurs later in the document.
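The key naming and the "later occurrence wins" rule follow from storing every field in one hash, as this core-Perl sketch illustrates. The list of fields is made-up sample input standing in for what a parser would report in document order; nothing here is the module's own code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $prefix = 'FileMetadata::Miner::HTML::';

# Hypothetical sample input: fields in the order they appear in a document,
# first the <title> text, later a <meta name="title"> tag.
my @fields_in_document_order = (
    [ title  => 'Text from the title tag' ],
    [ author => 'Jane Doe' ],
    [ title  => 'Text from the meta tag' ],
);

my $meta = {};

# Each field is stored under its prefixed key; a later duplicate simply
# overwrites the earlier value, so the last occurrence wins.
$meta->{ $prefix . $_->[0] } = $_->[1] for @fields_in_document_order;

# Select this miner's keys and strip the prefix for display.
my @html_keys = grep { index($_, $prefix) == 0 } keys %$meta;
for my $key (sort @html_keys) {
    printf "%s: %s\n", substr($key, length $prefix), $meta->{$key};
}
```

With this input the title key holds the meta tag's value, because that tag appears after the <title> tag.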

VERSION

1.0 - This is the first release

REQUIRES

HTML::Parser

AUTHOR

Midh Mulpuri midh@enjine.com

LICENSE

This software can be used under the terms of any Open Source Initiative approved license. A list of these licenses is available at the OSI site - http://www.opensource.org/licenses/