NAME

Apache::Logmonster - Apache log file splitter, processor, sorter, etc

AUTHOR

Matt Simerson (matt@tnpi.net)

SUBROUTINES

new

Creates a new Apache::Logmonster object. All methods in this module are object oriented and must be called through the object you create. When you create a new object, you must pass a hashref of values from logmonster.conf as the first argument. See t/Logmonster.t for a working example.

check_awstats_file

Checks to see if /etc/awstats is set up for awstats. If not, it creates it and installs a default awstats.conf. Finally, it makes sure the $domain it was passed has an awstats file configured for it. If not, it installs it.

check_config

Performs some basic sanity tests on the environment Logmonster is running in. It will complain quite loudly if it finds things not to its liking.

check_stats_dir

Each virtual host that gets stats processing is expected to have a "stats" dir. I name mine "stats" and locate it in the vhost's document root. I set its ownership to root so that the user doesn't inadvertently delete it via FTP. After splitting up the log files based on vhost, this sub first goes through the list of files in $tmpdir/doms. If the file name matches the vhost name, the contents of that log correspond to that vhost.

If the file is zero bytes, it deletes it as there is nothing to do.

Otherwise, it gathers the vhost name from the file and checks the %domains hash to see if a directory path exists for that vhost. If no hash entry is found or the entry is not a directory, then we declare the hits unmatched and discard them.

For log files with entries, we check inside the docroot for a stats directory. If no stats directory exists, then we discard those entries as well.
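
The decision sequence described above can be sketched as a small classifier. This is only an illustration, not the module's own code: the function name classify_vhost_log, the returned labels, and the "documentroot" key in the domains hash are assumptions made for the example, and the sketch reports a verdict instead of deleting or discarding anything.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: decide what should happen to one split log file.
# $domains is assumed to be a hashref of vhost => { documentroot => ... }.
sub classify_vhost_log {
    my ($file, $vhost, $domains) = @_;

    # A zero-byte file has nothing to process.
    return 'empty' if -z $file;

    # No hash entry, or the entry does not point at a directory:
    # the hits are unmatched.
    my $entry   = $domains->{$vhost};
    my $docroot = $entry ? $entry->{documentroot} : undef;
    return 'unmatched' unless defined $docroot && -d $docroot;

    # The vhost exists but has not opted in with a stats directory.
    return 'no_stats_dir' unless -d "$docroot/stats";

    return 'process';
}

# Usage with throwaway directories.
my $tmp     = tempdir(CLEANUP => 1);
my $docroot = "$tmp/www.example.com";
mkdir $docroot         or die "mkdir: $!";
mkdir "$docroot/stats" or die "mkdir: $!";

my $log = "$tmp/www.example.com.log";
open my $fh, '>', $log or die "open: $!";
print {$fh} "one log line\n";
close $fh or die "close: $!";

my %domains = ( 'www.example.com' => { documentroot => $docroot } );
print classify_vhost_log($log, 'www.example.com', \%domains), "\n";
```

Returning a label keeps the sketch free of side effects; a caller would map 'empty' to a delete and the two discard cases to whatever reporting it prefers.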

compress_log_file

Compresses a file on a remote host. You pass it a hostname and a file; it first tests that the file exists and then compresses it using gzip. It uses SSH to make the connection, so you will need to have key-based authentication set up.

consolidate_logfile

Collects compressed log files from a list of servers into a working directory for processing.

feed_the_machine

feed_the_machine takes the sorted vhost logs and feeds them into the stats processor that you chose.

fetch_log_files

Extracts a list of hosts from logmonster.conf, checks each host for log files, and then downloads them all to the staging area.

get_domains_list

Checks your vhosts setting in logmonster.conf to determine where to find your Apache vhost information, and then parses your Apache config files to retrieve a list of the virtual hosts you serve, as well as some attributes about each vhost (docroot, aliases).

If successful, it returns a hashref of elements.

get_domains_list_from_directories

Determines your list of domains and domain aliases based on the presence of directories and symlinks on the file system. See the FAQ for details.

get_vhosts_from_file

Parses a file looking for virtualhost declarations. It stores several attributes about each vhost including: servername, serveralias, and documentroot as these are needed to determine where to output logfiles and statistics to.

Returns a hashref keyed by the vhost servername. Each value in the top-level hashref is another hashref of attributes about that servername.
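
As a rough illustration of that return shape, the sketch below parses VirtualHost blocks out of a string of Apache configuration. It is not the module's implementation: the function name parse_vhosts is hypothetical, the parser handles only the three directives named above, and it ignores Include directives, line continuations, and nested containers.

```perl
use strict;
use warnings;

# Hypothetical helper: parse Apache config text into a hashref keyed by
# ServerName, with servername, serveralias and documentroot attributes.
sub parse_vhosts {
    my ($text) = @_;
    my (%vhosts, $current);

    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;                 # trim whitespace
        next if $line eq '' || $line =~ /^#/;    # skip blanks and comments

        if ($line =~ /^<VirtualHost\b/i) {
            $current = { serveralias => [] };    # start a new vhost block
        }
        elsif ($line =~ m{^</VirtualHost>}i) {
            # keep only blocks that declared a ServerName
            $vhosts{ $current->{servername} } = $current
                if $current && defined $current->{servername};
            undef $current;
        }
        elsif ($current && $line =~ /^ServerName\s+(\S+)/i) {
            $current->{servername} = lc $1;
        }
        elsif ($current && $line =~ /^ServerAlias\s+(.+)/i) {
            # one ServerAlias line may list several names
            push @{ $current->{serveralias} }, map { lc $_ } split ' ', $1;
        }
        elsif ($current && $line =~ /^DocumentRoot\s+"?([^"]+)"?/i) {
            $current->{documentroot} = $1;
        }
    }
    return \%vhosts;
}

# Usage with a made-up vhost definition.
my $conf = <<'EOT';
<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com example.net
    DocumentRoot "/var/www/example"
</VirtualHost>
EOT

my $vhosts = parse_vhosts($conf);
for my $name (sort keys %$vhosts) {
    print "$name => $vhosts->{$name}{documentroot}\n";
}
```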

get_log_dir

Determines where to fetch an interval's worth of logs from. Based upon the -i setting (hour, day, month), this sub figures out where to find the requested log files that need to be processed.

install_default_awstats_conf

Installs /etc/awstats/awstats.conf

report_hits

report_hits reads a day's log results file and reports the results to standard output. The log file contains key/value pairs like so:

    matt.simerson:4054
    www.tnpi.biz:15381
    www.nictool.com:895

This file is read by logmonster when it is called in -r (report) mode, which is expected to be invoked by an SNMP agent.
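
Reading that key/value format is simple enough to sketch. The helper name parse_hits is hypothetical and the code is not taken from the module; it only shows one way to turn "vhost:hits" lines into a hash, assuming the hit count is always the digits after the last colon.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "vhost:hits" lines into a hashref.
sub parse_hits {
    my (@lines) = @_;
    my %hits;
    for my $line (@lines) {
        chomp $line;
        # Greedy match: the count is whatever follows the last colon.
        my ($vhost, $count) = $line =~ /^\s*(.+):(\d+)\s*$/ or next;
        $hits{$vhost} = $count;
    }
    return \%hits;
}

# Usage with the sample lines shown above.
my $hits = parse_hits(
    "matt.simerson:4054\n",
    "www.tnpi.biz:15381\n",
    "www.nictool.com:895\n",
);
print "$_: $hits->{$_}\n" for sort keys %$hits;
```

Lines that do not match the pattern are skipped silently, which seems a reasonable default for a report that a monitoring agent polls.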

report_close

Accepts a filehandle, which it then closes.

report_spam_hits

Appends information about referrer spam to the logmonster -v report. An example of that report can be seen here: http://www.tnpi.net/wiki/Referral_Spam

report_open

In addition to emailing you a copy of the report, Logmonster leaves behind a copy in the log directory. This file is read when logmonster -r is run (typically by snmpd). This function simply opens the report and returns the filehandle.

sort_vhost_logs

By now we have collected the Apache logs from each web server and split them up based on vhost. Most stats processors require the logs to be sorted in chronological order. So, we open up each vhost's logs for the day, read them into a hash, sort them based on their log entry date, and then write them back out.
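
A minimal sketch of sorting log lines by their entry date follows. It assumes the standard Apache timestamp form [dd/Mon/yyyy:hh:mm:ss +zzzz] and uses a decorate-sort-undecorate pass over an array rather than the hash-based approach described above; the function names log_epoch and sort_log_lines are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: convert the bracketed Apache timestamp in a log
# line to epoch seconds (UTC). Returns undef if no timestamp is found.
sub log_epoch {
    my ($line) = @_;
    my ($d, $mon, $y, $h, $m, $s, $sign, $oh, $om) =
        $line =~ m{\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]}
        or return;
    return unless exists $MONTH{$mon};

    # The logged time is local; subtract the zone offset to reach UTC.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return timegm($s, $m, $h, $d, $MONTH{$mon}, $y) - $offset;
}

# Hypothetical helper: return the lines in chronological order.
# Lines without a parseable timestamp sort first (key 0).
sub sort_log_lines {
    my (@lines) = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ log_epoch($_) // 0, $_ ] } @lines;
}

# Usage: entries from two servers in different time zones.
my @unsorted = (
    '10.0.0.1 - - [10/Oct/2005:13:00:00 -0700] "GET /c HTTP/1.0" 200 10',
    '10.0.0.2 - - [10/Oct/2005:19:30:00 +0000] "GET /b HTTP/1.0" 200 10',
    '10.0.0.3 - - [09/Oct/2005:23:59:59 +0000] "GET /a HTTP/1.0" 200 10',
);
print "$_\n" for sort_log_lines(@unsorted);
```

Computing each sort key once keeps the timestamp parsing out of the comparison function, which matters when a day's log holds many entries.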

split_logs_to_vhosts

After collecting the log files from each server in the cluster, we need to split them up based upon the vhost they were intended for. This sub does that.

turn_domains_into_sort_key

From the info in $domains_ref, creates a hash like this:

  example.com => 'example.com',
  example.net => 'example.com',
  example.org => 'example.com',

As we parse through the log files, we do a lookup on this hash to see which logfile to write the entries out to. In theory, this is not necessary as we have appended the vhost name to the log entry, but this is absolutely required for the fallback method.
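
Building such a lookup could look like the sketch below. The exact shape of $domains_ref is an assumption here (a hashref keyed by servername, each with a "serveralias" arrayref), and the function name build_sort_key is hypothetical; the module's own code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: map every servername and alias to the servername
# whose log file should receive the entry.
sub build_sort_key {
    my ($domains_ref) = @_;
    my %key;
    for my $name (keys %$domains_ref) {
        $key{$name} = $name;                      # the name maps to itself
        $key{$_}    = $name                       # each alias maps to it too
            for @{ $domains_ref->{$name}{serveralias} || [] };
    }
    return \%key;
}

# Usage: one vhost with two aliases (assumed structure).
my $domains_ref = {
    'example.com' => { serveralias => [ 'example.net', 'example.org' ] },
};
my $sort_key = build_sort_key($domains_ref);
print "$_ => $sort_key->{$_}\n" for sort keys %$sort_key;
```

With the map in hand, routing a log entry is a single hash lookup on the hostname it was logged under.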

BUGS

None known. Report any to author.

TODO

Support for analog.

Support for individual webalizer.conf file for each domain

Delete log files older than X days/month

Do something with error logs (other than just compress)

If files to process are larger than 10MB, find a nicer way to sort them rather than reading them all into a hash. Currently I create two hashes, one with data and one with dates. I sort the date hash and, using those sorted hash keys, output the data hash to a sorted file. This is necessary as wusage and http-analyze require logs to be fed in chronological order. Take a look at awstats logresolvemerge as a possibility.

SEE ALSO

http://www.tnpi.biz/internet/www/logmonster

COPYRIGHT

Copyright (c) 2003-2006, The Network People, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the The Network People, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.