urhtml_score - Show complexity metric and other stats for web page
urhtml_score [--html] [uri|file]
urhtml_score http://perl.org
urhtml_score --html http://perl6.org
Given a URI or a file name, treats its referent as HTML and prints a complexity metric, the maximum element depth, and per-element statistics. The per-element statistics appear in rows, one per tag name. For each tag name, its row contains:
The maximum nesting depth of elements with that tag name. This is per-tag-name nesting depth, and does not take into account nesting within other elements with other tag names.
A count of the elements with that tag name in the document.
The total number of characters in elements with that tag name. Characters in nested elements are counted multiple times. For example, if a page contains a table within a table, characters in the inner table will be counted twice.
The average size of elements with this tag name, in characters.
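The four per-tag columns described above can be sketched in Python with the standard library's html.parser. This is only an illustration, not the program's Perl code: the names TagStats and tag_rows are hypothetical, the sketch assumes that "characters" means text characters rather than markup, and it assumes the average is rounded down.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagStats(HTMLParser):
    """Collect per-tag-name nesting depth, element count, and text size."""

    def __init__(self):
        super().__init__()
        self.open_tags = []                  # stack of currently open tag names
        self.max_nesting = defaultdict(int)  # deepest same-name nesting seen
        self.count = defaultdict(int)        # number of elements per tag name
        self.chars = defaultdict(int)        # text characters per tag name

    def handle_starttag(self, tag, attrs):
        self.count[tag] += 1
        if tag in VOID:
            self.max_nesting[tag] = max(self.max_nesting[tag], 1)
            return
        self.open_tags.append(tag)
        # Per-tag-name depth: only open elements with the same name count.
        same_name = self.open_tags.count(tag)
        self.max_nesting[tag] = max(self.max_nesting[tag], same_name)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Each character is charged once per enclosing element, so text in
        # a table within a table counts twice toward "table".
        for tag in self.open_tags:
            self.chars[tag] += len(data)


def tag_rows(html):
    """Return {tag: (max nesting, count, total chars, average size)}."""
    parser = TagStats()
    parser.feed(html)
    parser.close()
    return {
        tag: (parser.max_nesting[tag], n, parser.chars[tag],
              parser.chars[tag] // n)
        for tag, n in parser.count.items()
    }
```

Because every open element on the stack is charged for each run of text, the double counting of nested tables described above falls out of the loop in handle_data without any special case.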
The argument to urhtml_score can be either a URI or a file name. If it starts with alphanumerics followed by a colon, it is treated as a URI. Otherwise it is treated as a file name. If the --html option is specified, the output is written as an HTML table.
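The URI test described above amounts to a single pattern match. A minimal Python sketch follows, assuming "alphanumerics" means one or more ASCII letters or digits; the names URI_PREFIX and classify_argument are hypothetical and are not taken from the program.

```python
import re

# One or more ASCII letters or digits followed by a colon, at the start.
URI_PREFIX = re.compile(r"[A-Za-z0-9]+:")


def classify_argument(arg):
    """Return "uri" if arg looks like scheme:rest, otherwise "file"."""
    return "uri" if URI_PREFIX.match(arg) else "file"
```

Under this reading, http://perl.org is a URI and index.html is a file name. A path with a drive-letter prefix, such as C:\page.html, would also match the URI pattern, so such a file may need to be given as a relative path.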
The complexity metric is the average depth (or nesting level), in elements, of a character, divided by the logarithm of the length of the HTML. Whitespace and comments are ignored in calculating the complexity metric. The division by the logarithm of the HTML length is based on the idea that, all else being equal, it is reasonable for the nesting to increase logarithmically as a web page grows in length.
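As a rough illustration of this definition, the Python sketch below computes an average character depth and divides it by a logarithm. It is not the program's own code, and several details are assumptions because the text does not specify them: only text characters are counted (not markup), the "length of the HTML" is taken to be the number of non-whitespace text characters, and the natural logarithm is used. The names DepthCounter and complexity are hypothetical.

```python
import math
from html.parser import HTMLParser

# Elements that never have an end tag and so never enclose characters.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DepthCounter(HTMLParser):
    """Sum the element depth of every non-whitespace text character."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # number of currently open elements
        self.char_count = 0  # non-whitespace characters seen so far
        self.depth_sum = 0   # sum of depths over those characters

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        # Simplification: assumes reasonably balanced tags.
        if tag not in VOID and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        n = sum(1 for ch in data if not ch.isspace())
        self.char_count += n
        self.depth_sum += n * self.depth

    # Comments are ignored: the inherited handle_comment does nothing.


def complexity(html):
    """Average character depth divided by log of the character count."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    if counter.char_count < 2:
        return 0.0  # log(0) is undefined and log(1) is zero
    average_depth = counter.depth_sum / counter.char_count
    return average_depth / math.log(counter.char_count)
```

For the fragment `<div><p>abcd</p></div>`, every character sits at depth 2, so this sketch returns 2 divided by log(4). Scores from this sketch should not be expected to match the program's output, since the exact length measure and logarithm base it uses are not stated here.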
Here is the first part of the output for http://perl.org.
    http://perl.org
    Complexity Score = 0.873
    Maximum Depth = 12

               Maximum    Number of    Size in       Average
    Element    Nesting    Elements     Characters    Size
    a          1          56           3533          63
    body       1          1            7615          7615
    div        5          30           24695         823
    em         1          1            13            13
    h1         1          1            37            37
    h4         1          11           559           50
With caution, the complexity metric can be used as a self-assessment of website quality. Well-designed websites often have low numbers, particularly if fast loading is an important goal. But high values of the complexity metric do not necessarily mean low quality. Everything depends on what the site's mission is, and how well complexity is being used to serve that mission.
This program is a demo of a demo. Its purpose is to show how easy it is to write applications which look at the structure of web pages using Marpa::UrHTML. And the purpose of Marpa::UrHTML is to demonstrate the power of its parse engine, Marpa. Marpa::UrHTML was written in a few days, and its logic is a straightforward, natural expression of the structure of HTML.
Jeffrey Kegler
Please report any bugs or feature requests to bug-parse-marpa at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Marpa. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Marpa
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Marpa
CPAN Ratings
http://cpanratings.perl.org/d/Marpa
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Marpa
Search CPAN
http://search.cpan.org/dist/Marpa
The starting template for this code was HTML::TokeParser, by Gisle Aas.
Copyright 2007-2009 Jeffrey Kegler, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0.