NAME

WWW::Wappalyzer - Perl port of Wappalyzer (https://wappalyzer.com)

DESCRIPTION

Uncovers the technologies used on websites: detects content management systems, web shops, web servers, JavaScript frameworks, analytics tools and many more.

Supports only the `scriptSrc`, `scripts`, `html`, `meta`, `headers`, `cookies` and `url` patterns of the Wappalyzer specification. For the sake of speed, `version`, `implies` and `excludes` are not supported.

Categories: https://github.com/wappalyzer/wappalyzer/blob/master/src/categories.json
Technologies: https://github.com/wappalyzer/wappalyzer/tree/master/src/technologies
More info on Wappalyzer: https://github.com/wappalyzer/wappalyzer

SYNOPSIS

use WWW::Wappalyzer;
use LWP::UserAgent;
use List::Util 'pairmap';

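# Fetch the page; detection uses its HTML and its response headers.
my $response = LWP::UserAgent->new->get( 'http://www.drupal.org' );
my %detected = WWW::Wappalyzer->new->detect(
    html    => $response->decoded_content,
    # Header name => array ref of all values of that header.
    headers => { pairmap { $a => [ $response->headers->header($a) ] } $response->headers->flatten },
);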

# %detected = (
#   'Font scripts'    => [ 'Google Font API' ],
#   'Caching'         => [ 'Varnish' ],
#   'CDN'             => [ 'Fastly' ],
#   'CMS'             => [ 'Drupal' ],
#   'Video players'   => [ 'YouTube' ],
#   'Tag managers'    => [ 'Google Tag Manager' ],
#   'Reverse proxies' => [ 'Nginx' ],
#   'Web servers'     => [ 'Nginx' ],
# );

EXPORT

None by default.

SUBROUTINES/METHODS

new

my $wappalyzer = WWW::Wappalyzer->new( %params )

Constructor.

Available parameters:

categories   - optional array ref of paths to additional categories files (see 'add_categories_files' below)
technologies - optional array ref of paths to additional technologies files (see 'add_technologies_files' below)

Returns a WWW::Wappalyzer instance.

detect

my %detected = $wappalyzer->detect( %params )

Tries to detect the CMS, framework, etc. from the given HTML code, HTTP headers and URL.

Available parameters:

html    - HTML code of the web page.

headers - Hash ref of HTTP headers. Each value may be a plain string or, for a
          multi-valued field, an array ref of strings.
          Cookies should be passed in the 'Set-Cookie' header.

url     - URL of the web page.

cats    - Array ref of category names to check; defaults to all categories.
          Fewer categories mean less CPU usage.
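
When the headers do not come from LWP (for example from a saved raw response), the hash ref can be assembled by hand. A minimal sketch, assuming the input is a list of "Name: value" lines; build_headers is a hypothetical helper and not part of WWW::Wappalyzer. Repeated fields such as Set-Cookie are collected into an array ref, single fields stay plain strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Wappalyzer): turns raw "Name: value"
# lines into the hash ref shape described for the 'headers' parameter.
sub build_headers {
    my @lines = @_;
    my %headers;

    for my $line (@lines) {
        # Split on the first colon only; values may contain colons themselves.
        my ( $name, $value ) = $line =~ /^([^:]+):\s*(.*?)\s*$/ or next;

        if ( !exists $headers{$name} ) {
            $headers{$name} = $value;                         # single value: plain string
        }
        elsif ( ref $headers{$name} ) {
            push @{ $headers{$name} }, $value;                # already multi-valued
        }
        else {
            $headers{$name} = [ $headers{$name}, $value ];    # promote to array ref
        }
    }

    return \%headers;
}

my $headers = build_headers(
    'Server: nginx',
    'Set-Cookie: SESSabc=1; path=/',
    'Set-Cookie: has_js=1; path=/',
);
```

The sketch treats header names as case-sensitive, so 'Set-Cookie' and 'set-cookie' would end up under separate keys; normalise the names first if the input mixes cases.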

Returns a hash of detected applications keyed by category:

(
    CMS  => [ 'Joomla' ],
    'Javascript frameworks' => [ 'jQuery', 'jQuery UI' ],
)
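
Because the result is keyed by category, a technology that belongs to several categories appears more than once. A minimal sketch of inverting the structure into technology => categories, using sample data taken from the SYNOPSIS output rather than a live detect() call; the name %categories_of is only illustrative.

```perl
use strict;
use warnings;

# Sample result, taken from the SYNOPSIS output (not a live detection).
my %detected = (
    'CMS'             => ['Drupal'],
    'Reverse proxies' => ['Nginx'],
    'Web servers'     => ['Nginx'],
);

# Invert the structure: technology name => list of its categories.
# Categories are visited in sorted order, so each list ends up sorted too.
my %categories_of;
for my $category ( sort keys %detected ) {
    push @{ $categories_of{$_} }, $category for @{ $detected{$category} };
}

# One line per technology, followed by its categories.
for my $tech ( sort keys %categories_of ) {
    printf "%-10s %s\n", $tech, join ', ', @{ $categories_of{$tech} };
}
```

With this sample, Nginx is listed once, with both of its categories.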

get_categories_names

my @cats = $wappalyzer->get_categories_names()

Returns a list of all application category names.

add_categories_files

$wappalyzer->add_categories_files( @filepaths )

Adds categories files to the list of categories files to be processed. See lib/WWW/wappalyzer_src/categories.json for a sample of the format.

add_technologies_files

$wappalyzer->add_technologies_files( @filepaths )

Adds technologies files to the list of technologies files to be processed. See lib/WWW/wappalyzer_src/technologies/a.json for a sample of the format.
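
A custom technologies file can be generated from Perl. The following is only a sketch: it assumes the file uses the upstream Wappalyzer layout (one JSON object keyed by technology name, with a `cats` list of category ids and pattern fields), and that id 1 is the CMS category in the bundled categories.json. 'Example CMS', its patterns and the file name are made up for illustration; check the bundled sample file for the authoritative format.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use JSON::PP;

# Assumption: one JSON object keyed by technology name, as in upstream
# Wappalyzer. 'Example CMS' and all of its patterns are hypothetical.
my %technologies = (
    'Example CMS' => {
        cats    => [1],                                   # assumed id of the CMS category
        html    => '<div[^>]+class="example-cms',         # regex matched against the page HTML
        headers => { 'X-Powered-By' => 'ExampleCMS' },    # header name => regex
        meta    => { generator => '^Example CMS' },       # meta tag name => regex
    },
);

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/custom_technologies.json";

# Write the definitions as pretty-printed JSON with stable key order.
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} JSON::PP->new->canonical->pretty->encode( \%technologies );
close $fh or die "Cannot close $path: $!";

# With the module loaded, the file would then be registered like this:
#   $wappalyzer->add_technologies_files($path);
```

Only the pattern types listed in the DESCRIPTION are evaluated, so fields such as `implies` would have no effect here.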

reload_files

$wappalyzer->reload_files()

Reloads data from the additional categories and technologies files, which may have changed at runtime.
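
In a long-running process it may be enough to reload only when one of the additional files has actually changed. A minimal sketch based on file modification times; files_changed is a hypothetical helper and not part of WWW::Wappalyzer, and the usage at the end assumes an existing $wappalyzer object and a list @extra_files of the registered paths.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Wappalyzer): returns true if any of
# the given files has a different modification time than on the previous
# call. The first call for a path always reports a change.
sub files_changed {
    my ( $seen, @paths ) = @_;
    my $changed = 0;

    for my $path (@paths) {
        my $mtime = ( stat $path )[9] // 0;    # 0 if the file is missing
        $changed = 1 if !defined $seen->{$path} || $seen->{$path} != $mtime;
        $seen->{$path} = $mtime;               # remember for the next call
    }

    return $changed;
}

# Intended use inside a worker loop:
#
#   my %seen;
#   files_changed( \%seen, @extra_files );    # prime the cache once
#   ...
#   $wappalyzer->reload_files() if files_changed( \%seen, @extra_files );
```

Modification times have one-second resolution on many file systems, so two writes within the same second may go unnoticed.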

AUTHOR

Alexander Nalobin, <alexander at nalobin.ru>

BUGS

Please report any bugs or feature requests to bug-www-wappalyzer at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Wappalyzer. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc WWW::Wappalyzer

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2013-2015 Alexander Nalobin.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.