NAME
HTML::WordTagRatio::RelativeRatio - Perl module for determining the ratio of words to tags in a range of tokens in an HTML document.
SYNOPSIS
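    use HTML::Content::HTMLTokenizer;
    use HTML::WordTagRatio::RelativeRatio;

    my $tokenizer = new HTML::Content::HTMLTokenizer('TAG', 'WORD');

    # Read the whole document into one string
    open(HTML, "index.html") or die "Cannot open index.html: $!";
    my $doc = join("", <HTML>);
    close(HTML);

    my ($word_count_arr_ref, $tag_count_arr_ref, $token_type_arr_ref, $token_hash_ref)
        = $tokenizer->Tokenize($doc);

    my $ratio = new HTML::WordTagRatio::RelativeRatio();

    # $end is the last token index; a bare @$word_count_arr_ref here would be
    # flattened into the argument list instead of supplying a single index
    my $value = $ratio->RangeValue(0, $#{$word_count_arr_ref},
                                   $word_count_arr_ref, $tag_count_arr_ref);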
DESCRIPTION
HTML::WordTagRatio::RelativeRatio computes a ratio of words to tags for a given range of tokens. In pseudocode, the ratio is
    Words / (Words + Tags)
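As a minimal sketch of the pseudocode above, the ratio can be computed directly from a list of token types. The relative_ratio helper and the 'WORD'/'TAG' labels are assumptions made for illustration and are not part of the module's API; returning 0 for a range with no tokens is likewise an assumed convention.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::WordTagRatio::RelativeRatio):
# computes Words / (Words + Tags) over a list of token type labels.
sub relative_ratio {
    my @types = @_;
    my $words = grep { $_ eq 'WORD' } @types;   # grep in scalar context counts matches
    my $tags  = grep { $_ eq 'TAG' } @types;
    my $total = $words + $tags;
    # Assumed convention: an empty range yields 0 rather than dividing by zero.
    return $total ? $words / $total : 0;
}

# Three words and one tag: 3 / (3 + 1)
my $example = relative_ratio('TAG', 'WORD', 'WORD', 'WORD');
print "$example\n";   # prints 0.75
```

A range made up only of words gives 1, and a range made up only of tags gives 0, so the value always lies between 0 and 1.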
Methods
my $ratio = new HTML::WordTagRatio::RelativeRatio()
Initializes a new HTML::WordTagRatio::RelativeRatio object.
my $value = $ratio->RangeValue($start, $end, \@WordCount, \@TagCount)
$value is computed as follows:
    ($WordCount[$end] - $WordCount[$start]) /
        (($WordCount[$end] - $WordCount[$start]) + ($TagCount[$end] - $TagCount[$start]))
$WordCount[$i] is the number of word tokens before or at the ith token in the input HTML document. $TagCount[$i] is the number of tag tokens before or at the ith token in the input HTML document.
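The cumulative counts described above can be sketched in plain Perl. This is an illustration under stated assumptions, not the module's implementation: cumulative_counts and range_value are hypothetical helpers, the token type labels 'WORD' and 'TAG' are assumed, and returning 0 when the range contains no counted tokens is an assumed convention.

```perl
use strict;
use warnings;

# Hypothetical helper: builds running word and tag counts, so that
# $word_count[$i] is the number of word tokens before or at token $i.
sub cumulative_counts {
    my @types = @_;
    my (@word_count, @tag_count);
    my ($words, $tags) = (0, 0);
    for my $type (@types) {
        $words++ if $type eq 'WORD';
        $tags++  if $type eq 'TAG';
        push @word_count, $words;
        push @tag_count,  $tags;
    }
    return (\@word_count, \@tag_count);
}

# Hypothetical stand-in that applies the documented formula.
sub range_value {
    my ($start, $end, $word_count, $tag_count) = @_;
    my $words = $word_count->[$end] - $word_count->[$start];
    my $tags  = $tag_count->[$end]  - $tag_count->[$start];
    my $total = $words + $tags;
    return $total ? $words / $total : 0;   # assumed: 0 when nothing is counted
}

my @types = ('TAG', 'WORD', 'WORD', 'TAG', 'WORD', 'TAG');
my ($wc, $tc) = cumulative_counts(@types);
# @$wc is (0, 1, 2, 2, 3, 3) and @$tc is (1, 1, 1, 2, 2, 3)
my $value = range_value(0, $#types, $wc, $tc);
print "$value\n";   # prints 0.6
```

Because the formula subtracts the counts at $start, the token at index $start itself does not contribute: here the leading tag is excluded, leaving 3 words and 2 tags, so the result is 3 / (3 + 2).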
AUTHOR
Jean Tavernier (jj.tavernier@gmail.com)
COPYRIGHT
Copyright 2005 Jean Tavernier. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
ContentExtractorDriver.pl (1), HTML::Content::ContentExtractor (3), HTML::Content::HTMLTokenizer (3), HTML::WordTagRatio::Ratio (3), HTML::WordTagRatio::WeightedRatio (3), HTML::WordTagRatio::SmoothedRatio (3), HTML::WordTagRatio::ExponentialRatio (3), HTML::WordTagRatio::NormalizedRatio (3).