NAME

HTML::RobotsMETA - Parse HTML For Robots Exclusion META Markup

SYNOPSIS

use HTML::RobotsMETA;

my $p = HTML::RobotsMETA->new;
my $r = $p->parse_rules($html);    # $html holds the page source as a string
if ($r->can_follow) {
    # follow links here!
} else {
    # can't follow...
}

DESCRIPTION

HTML::RobotsMETA is a simple HTML::Parser subclass that extracts robots exclusion information from meta tags. There's not much more to it ;)

DIRECTIVES

Currently HTML::RobotsMETA understands the following directives:

ALL
NONE
INDEX
NOINDEX
FOLLOW
NOFOLLOW
ARCHIVE
NOARCHIVE
SERVE
NOSERVE
NOIMAGEINDEX
NOIMAGECLICK
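
To make the list above concrete, here is a minimal sketch of how a comma-separated content value such as "NOINDEX, follow" could be folded into permission flags. It is not the module's actual implementation: directives_to_flags and the flag names are hypothetical, and treating ALL as "INDEX, FOLLOW" and NONE as "NOINDEX, NOFOLLOW" follows the usual robots META convention rather than anything stated in this document.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::RobotsMETA: turn the content
# attribute of a robots META tag into a hash of permission flags.
sub directives_to_flags {
    my ($content) = @_;

    # Assumption: everything is allowed until a directive says otherwise.
    my %flags = map { $_ => 1 }
        qw(index follow archive serve imageindex imageclick);

    # Directive names are compared case-insensitively.
    for my $directive (split /\s*,\s*/, uc $content) {
        $directive =~ s/^\s+|\s+$//g;    # trim stray whitespace

        if ($directive eq 'ALL') {
            @flags{qw(index follow)} = (1, 1);
        }
        elsif ($directive eq 'NONE') {
            @flags{qw(index follow)} = (0, 0);
        }
        elsif ($directive =~ /^(NO)?(INDEX|FOLLOW|ARCHIVE|SERVE|IMAGEINDEX|IMAGECLICK)$/) {
            # A leading "NO" switches the flag off; the bare name switches it on.
            $flags{ lc $2 } = $1 ? 0 : 1;
        }
        # Unknown directives are silently ignored.
    }

    return \%flags;
}

my $flags = directives_to_flags('NOINDEX, follow, noarchive');
printf "%s=%d\n", $_, $flags->{$_} for sort keys %$flags;
```

In this sketch later directives override earlier ones, so "NONE, FOLLOW" would leave following allowed; whether HTML::RobotsMETA resolves conflicts the same way is not specified here.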

METHODS

new

Creates a new HTML::RobotsMETA parser. Takes no arguments.

parse_rules

Parses an HTML string for META tags and returns an HTML::RobotsMETA::Rules object, which you can query in conditionals later (for example, $r->can_follow).

parser

Returns the HTML::Parser instance to use.

get_parser_callbacks

Returns the callback specs to be passed to the HTML::Parser constructor.
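
For readers unfamiliar with HTML::Parser callback specs, the sketch below shows the general shape such a spec takes: an event name mapped to a handler and an argspec string. It uses HTML::Parser directly; the %callbacks hash and the @found array are illustrative names, and the exact structure returned by get_parser_callbacks is an assumption, not something confirmed by this document.

```perl
use strict;
use warnings;
use HTML::Parser;

my @found;    # content values of the robots META tags seen so far

# A callback spec in the form the HTML::Parser constructor accepts:
# event name => [ handler, argspec ].
my %callbacks = (
    start_h => [
        sub {
            my ($tagname, $attr) = @_;

            # HTML::Parser lowercases tag and attribute names by default.
            return unless $tagname eq 'meta';
            return unless lc($attr->{name} // '') eq 'robots';

            push @found, $attr->{content} // '';
        },
        'tagname, attr',
    ],
);

my $parser = HTML::Parser->new(api_version => 3, %callbacks);
$parser->parse('<html><head><meta name="ROBOTS" content="noindex, nofollow"></head></html>');
$parser->eof;

print "$_\n" for @found;
```

A subclass that needs to react to additional tags could plausibly override get_parser_callbacks to add more handlers of this form, though that extension point is inferred from the method's description.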

TODO

Tags that specify the crawler name (e.g. <META NAME="Googlebot">) are not handled yet.

There also might be more obscure directives that I'm not aware of.

AUTHOR

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html

To install HTML::RobotsMETA, copy and paste the appropriate command into your terminal.

cpanm

cpanm HTML::RobotsMETA

CPAN shell

perl -MCPAN -e shell
install HTML::RobotsMETA
