The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

LaBrea::Tarpit::Get

SYNOPSIS

  use LaBrea::Tarpit::Get;

  ($rv,$host,$port,$path)=parse_http_URL($url)
  ($handle,$host,$port,$path)=open_http(*S,$url);
  $rv=parse_http_response(\$buffer,\%response);
  $rv=short_response($url,\%response,\%content,$timeout);
  $line = make_line($url,$err,\%content);
  $rv = not_hour($file);
  $rv = not_day($file);
  $rv=auto_update($url,$file,$cur_ver,$timeout);

DESCRIPTION - LaBrea::Tarpit::Get

Module connects to a web site running LaBrea::Tarpit::Report::html_report.plx and retrieves a short_report as described in LaBrea::Tarpit::Report.

Run examples/web_scan.pl from a cron job hourly or daily to update the statistics from all know sites running LaBrea::Tarpit. A report can then be generated showing the activity worldwide.

 # MIN HOUR DAY MONTH DAYOFWEEK   COMMAND
 30 * * * * ./web_scan.pl ./other_sites.txt ./tmp/site_stats

See: LaBrea::Tarpit::Report::other_sites

($handle,$host,$port,$path)= parse_http_URL($url);

Separate an http URL into its components

  input:        URL of the form
        http://www.foo.com[:8080]/file.html

  https:// service is not supported

  returns: (undef, error message)
                or
           (file_handle,hostname,port,path)
        where port and path may be empty
($handle,$host,$port,$path)=open_http(*S,$url);

Open connection to http target

  input:        *S,$url [default port = 80]
  returns:      (undef, error)  on error

                (file_handle,
                 hostname,
                 port
                 path )         on success
$rv=parse_http_response(\$buffer,\%response);

Parse an http server response into a hash of headers.

  i.e.  (representative, will vary)

  rc             => 200
  msg            => OK
  date           => Wed, 24 Apr 2002 21:46:30 GMT
  server         => Apache/1.3.22
  protocol       => HTTP/1.1
  content-type   => text/plain
  content-length => 92
  last-modified  => Wed, 24 Apr 2002 21:46:34 GMT
  expires        => Wed, 24 Apr 2002 21:47:04 GMT
  connection     => close
  content        => (complete text buffer)

  input:        \$text_in, \%response
  returns:      true on success, %response filled
                false on failure

  NOTE:         %response{rc}   (server response code)
                %response(msg}  (server messages)
                are ALWAYS filled with something.
                In the case of server failure, the 
                cause of the failure will be inserted
                into %response(msg} and undef returned.
$rv=short_response($url,\%response,\%content,$timeout);

Fetch the short report from $url and place the headers in %response, the content, parsed, in %content. Optional $timeout, default is 60 seocnds.

%response contains http headers

%content contains key => value pairs

  LaBrea        => version
  Tarpit        => version
  Report        => version
  Util          => version
  now           => seconds since epoch (local)
  tz            => time zone (i.e. -0700)
  threads       => number of threads
  total_IPs     => total IP's
  bw            => bandwidth

  input:        URL,    # complete url
           i.e. www.foo.com/html_report.plx
                \%response,
                \%content,

  returns:      false on success
                error message on failure
$line = make_line($url,$err,\%content);

Make a line of text summarizing the short report where $err is the return value from short_report

  Format:

  url threads total_IPs bw time tz version:nn:nn:nn
    or
  url error message
$rv = not_hour($file);

Check if the file has been accessed this hour;

  input:        path/to/file
  returns:      true, not current hour
                false if accessed this hour
                or non-existent or not readable
$rv = not_day($file);

Check if the file has been accessed this day;

  input:        path/to/file
  returns:      true, not accessed this day
                false if accessed this day
                or non-existent or not readable
$rv=auto_update($url,$file,$cur_ver,$timeout);

Update the 'other_sites.txt' file from $url on a daily basis only.

  input:  url,  # complete url to 'other_sites.txt'
        # http://scans.bizsystems.net/other_sites.txt

          file, # path to your 'other_sites.txt'

          cur_ver       # optional current version
        # the current file will be opened and scanned
        # if this is not supplied

          timeout       # wait for http response        
        # default 60 seconds
  returns:      false on success or no update needed
                error msg on failure

EXPORT_OK

        parse_http_URL
        open_http
        parse_http_response
        short_response
        make_line
        not_hour
        not_day
        auto_update

COPYRIGHT

Copyright 2002 - 2004, Michael Robinton & BizSystems This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

AUTHOR

Michael Robinton, michael@bizsystems.com

SEE ALSO

perl(1), LaBrea::Tarpit(3), LaBrea::Codes(3), LaBrea::Tarpit::Report(3), LaBrea::Tarpit::Util(3), LaBrea::Tarpit::DShield(3)