
NAME

arclog - Archive the log files monthly

SYNOPSIS

 arclog [options] logfile... [output]
 arclog [-h|-v]

DESCRIPTION

arclog archives the log files monthly. It strips off log entries that belong to previous months, and then compresses and saves them to archived files named logfile.yyyymm.gz.
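The naming scheme can be illustrated with a short sketch. It is written in Python for illustration only and is not arclog's own code; the helper names archive_name and group_by_month are hypothetical, and the records are assumed to be already parsed into (timestamp, line) pairs.

```python
from collections import defaultdict
from datetime import datetime


def archive_name(prefix: str, when: datetime, suffix: str = ".gz") -> str:
    """Build a name of the form prefix.yyyymm.gz (hypothetical helper)."""
    return f"{prefix}.{when:%Y%m}{suffix}"


def group_by_month(records):
    """Group (timestamp, line) pairs by (year, month), keeping input order."""
    months = defaultdict(list)
    for when, line in records:
        months[(when.year, when.month)].append(line)
    return dict(months)


# Made-up records, only to show the grouping.
records = [
    (datetime(2001, 4, 30, 23, 59, 59), "a"),
    (datetime(2001, 5, 1, 0, 0, 0), "b"),
    (datetime(2001, 5, 2, 8, 30, 0), "c"),
]

for (year, month), lines in group_by_month(records).items():
    print(archive_name("access_log", datetime(year, month, 1)), lines)
```

Each month ends up in its own archived file, so a log file that spans three months produces three archives.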

Currently, arclog supports the Apache access log, Syslog, NTP, Apache 1 SSL engine log and my own bracketed, modified ISO date/time log file formats, and the gzip and bzip2 compression methods. Several software projects log (or can log) in a format compatible with the Apache access log, such as CUPS, ProFTPD and Pure-FTPd, and arclog can archive their Apache-like log files, too.

Notice: Archiving takes time. To reduce the time it occupies the source log file, arclog first copies the content of the source log file to a temporary working file and restarts the source log file. Then arclog can take its time working on the temporary working file. However, please note:

1. If you have a huge log file (several hundred MB), merely copying it still takes a lot of time. In that case, you had better stop logging first, archive the log file and then restart logging, to avoid race conditions in writing. If you archive the log file periodically, it should not grow too big.

2. If arclog stops in the middle of the execution, it will leave a temporary working file. The next time arclog runs, it will stop when it sees that temporary working file. You have to process that temporary working file first. That temporary working file is merely a copy of the original log file. You can rename and archive it like an ordinary log file to solve this.
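The copy-then-restart step described above can be sketched as follows. This is a simplified Python illustration, not arclog's implementation; the function name snapshot_and_restart and the working file name used in the demo are assumptions, since this page does not say how arclog names its temporary working file.

```python
import os
import shutil
import tempfile


def snapshot_and_restart(log_path: str, work_path: str) -> str:
    """Copy log_path to work_path, then empty log_path.

    Refuses to run if a working file is already there, like the
    leftover case described in note 2.
    """
    if os.path.exists(work_path):
        raise FileExistsError(f"leftover working file: {work_path}")
    shutil.copyfile(log_path, work_path)
    # Anything the logger writes between the copy and this truncation
    # is lost; that is the race described in note 1.
    with open(log_path, "w"):
        pass
    return work_path


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    log = os.path.join(tmpdir, "access_log")
    with open(log, "w") as handle:
        handle.write("one record\n")
    work = snapshot_and_restart(log, log + ".tmp-arclog")  # hypothetical name
    print(os.path.getsize(log), os.path.getsize(work))
```

The slow work (parsing, sorting, compressing) then happens on the copy, while the logger keeps writing to the emptied source file.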

Do not sort unless you have a particular reason. Sorting has the following potential problems:

1. Sorting may consume a huge amount of memory on large log files. The amount of memory required depends on the number of records in each archived month. Modern Linux and MSWin32 protect against memory exhaustion by killing processes that eat too much memory, but that still takes minutes, and your system will hang during that time. I do not know what protection other operating systems offer. If you try, you do so at your own risk.

2. The time unit of all recognized log formats is the second. Log records that happen within the same second are sorted by log file order (if you are archiving several log files at a time) and then by log record order. I try to ensure that the sorted archived records are in the correct order of events, but I cannot guarantee it. Watch out if the order within a second is important to you.
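The tie-breaking rule in note 2 amounts to sorting on a three-part key. The Python sketch below only illustrates that rule and is not taken from arclog; sort_records is a hypothetical name, and each log file is assumed to be already parsed into (timestamp, line) pairs.

```python
from datetime import datetime, timedelta


def sort_records(files):
    """Sort records by (time, file order, record order).

    files is a list of log files in command-line order; each file is a
    list of (timestamp, line) pairs in the order they appear.
    """
    keyed = [
        (when, file_no, rec_no, line)
        for file_no, recs in enumerate(files)
        for rec_no, (when, line) in enumerate(recs)
    ]
    # Records within the same second fall back to file order, then record order.
    keyed.sort(key=lambda item: item[:3])
    return [item[3] for item in keyed]


t = datetime(2001, 6, 8, 12, 0, 0)
second = timedelta(seconds=1)
first_file = [(t, "A1"), (t + second, "A2")]
second_file = [(t, "B1"), (t - second, "B0")]
print(sort_records([first_file, second_file]))
```

Note that the whole list of keyed records is held in memory before sorting, which is the reason for the memory warning in note 1.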

Be careful with Syslog(2) and NTP log files: Syslog(2) and NTP do not record the year. arclog uses Date::Parse(3) to parse the date, which, when the year is missing, assumes that the date falls within the twelve months ending with the current month. For example, if today is 2001-06-08, it assumes that the date lies between 2000-07-01 and 2001-06-30. I think this is smart enough. However, if you do have a Syslog(2) or NTP log file that has records older than one year, do not use arclog. It will destroy your log file.
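The year assumption can be written out as a small rule. The Python sketch below restates the behaviour described in the paragraph above; it is not the code of Date::Parse(3), and guess_year is a hypothetical name.

```python
from datetime import date


def guess_year(month: int, today: date) -> int:
    """Pick the year for a record whose log line carries no year.

    Months up to and including the current month are taken to be in
    this year; later months are taken to be in the previous year.
    """
    return today.year if month <= today.month else today.year - 1


today = date(2001, 6, 8)
print(guess_year(6, today))  # a June record
print(guess_year(7, today))  # a July record
```

The rule has no way to tell July 2000 from July 1999, which is why a log file with records older than one year gets its old records filed under the wrong year.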

If you read from STDIN, please note:

1. You MUST specify the output prefix if you want to read from STDIN, since what it needs is an output pathname prefix, not an output file.

2. STDIN cannot be deleted, restarted or partially kept. If you read from STDIN, the keep mode will fall back to keep all. If you archive several source log files including STDIN, the keep mode will fall back to keep all for all source log files, to prevent disaster.

3. The answers for the ask mode are read from STDIN, too. Since you have only one STDIN, you cannot specify the ask mode while reading from STDIN. It will fall back to the fail mode in that case.
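The two fallbacks in notes 2 and 3 can be summarized in a few lines. This is a Python sketch of the rules as stated here, not arclog's code; effective_modes is a hypothetical name, and the mode strings follow the long names listed under OPTIONS.

```python
def effective_modes(sources, keep, override):
    """Apply the STDIN fallbacks to the requested keep and override modes.

    sources is the list of log file arguments, where "-" means STDIN.
    """
    if "-" in sources:
        # STDIN cannot be deleted, restarted or partially kept, and the
        # fallback applies to every source file, not only to STDIN.
        keep = "all"
        # The ask mode needs STDIN for its answers.
        if override == "ask":
            override = "fail"
    return keep, override


print(effective_modes(["access_log", "-"], "this-month", "ask"))
print(effective_modes(["access_log"], "this-month", "ask"))
```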

I suggest that you install File::MMagic(3) instead of relying on the file(1) executable. The internal magic file of File::MMagic(3) seems to work better than the file(1) executable. arclog treats everything that is not gzip(1) or bzip2(1) compressed as plain text. When a compressed log file is wrongly recognized as an image, arclog will treat it as plain text, read log records directly from it and fail. This failure does not hurt the source log files, but is still annoying.
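To show what the file type check has to decide, here is a simplified Python sketch. It is not how arclog does it: arclog relies on File::MMagic(3) or file(1), while this sketch swaps in a plain comparison against the well-known leading bytes of gzip and bzip2 streams. The name detect_compression is hypothetical.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream
BZIP2_MAGIC = b"BZh"      # first three bytes of a bzip2 stream


def detect_compression(head: bytes) -> str:
    """Classify a file from its first few bytes."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    # Everything else is treated as plain text, as described above.
    return "plain"


sample = b'127.0.0.1 - - [08/Jun/2001:12:00:00 +0800] "GET / HTTP/1.0" 200 42\n'
print(detect_compression(gzip.compress(sample)[:4]))
print(detect_compression(bz2.compress(sample)[:4]))
print(detect_compression(sample[:4]))
```

A misdetection in either direction has the consequence described above: compressed data read as plain text yields no parseable log records.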

OPTIONS

logfile

The log file to be archived. Specify - to read from STDIN. Multiple log files are supported. gzip(1) or bzip2(1) compressed files are supported, too.

output

The prefix of the output files. The output files will be named output.yyyymm, e.g. output.200101, output.200102. If not specified, the default is the same as the log file. You must specify this if you want to read from STDIN. You cannot specify - (STDIN), since this is only a name prefix, not the output file.

-c,--compress method

Specify the compression method for the archived files. Log files usually have a large number of similar lines. Compressing them saves you a lot of disk space. (And this is why we want to archive them.) Currently the following compression methods are supported:

g,gzip

Compress with gzip(1). This is the default. arclog can use Compress::Zlib(3) to compress instead of calling gzip(1). This can be safer and faster, since no foreign binary is called. If Compress::Zlib(3) is not installed, it will try to use gzip(1) instead. If gzip(1) is not available either, the program will fail.

b,bzip2

Compress with bzip2(1). arclog can use Compress::Bzip2(3) to compress instead of calling bzip2(1). This can be safer and faster, since no foreign binary is called. If Compress::Bzip2(3) is not installed, it will try to use bzip2(1) instead. If bzip2(1) is not available either, the program will fail.

n,none

No compression at all. (Why? :p)

--nocompress

Do not compress the archived files. This is equal to --compress none.

-s,--sort

Sort the records by time (and then the record order). Sorting consumes a lot of memory and CPU, so it is disabled by default. See the description above for details on sorting.

--nosort

Do not sort the records. This is the default.

-o,--override mode

Whether we should overwrite the existing archived files. Currently the following modes are supported:

o,overwrite

Overwrite existing target files. You will lose the existing records. Use with care. This is helpful if you are sure the master log file has the most complete records.

a,append

Append the records to the existing target files. You may destroy the log file completely by accidentally putting irrelevant entries together. Use with care. This is helpful if you want to merge 2 or more log files, for example, 2 log files of different periods.

i,ignore

Ignore any existing target file, and discard all the records of those months. You will lose these log records. Use with care. This is helpful if you are supplying log records for the missing months, or if you are merging the log records in a complex manner.

f,fail

Stop processing whenever a target file exists, to prevent destroying existing files by accident. This is mostly what you want when running from an automatic mechanism, like crontab(1). So, this is the default if no terminal is found at STDIN.

ask

Ask you what to do when a target file exists. This is mostly what you want if you are running arclog interactively. So, this is the default if a terminal is found at STDIN. The answers are read from STDIN. Since you have only one STDIN, you cannot specify this mode if you want to read the log file from STDIN. In that case, it will fall back to the fail mode. Also, if arclog cannot get its answer from STDIN, for example, on a closed STDIN like crontab(1), it will fall back to the fail mode.

-k,--keep mode

What to keep in the source file. Currently the following modes are supported:

a,all

Keep the source file after records are archived.

r,restart

Restart the source file after records are archived.

d,delete

Delete the source file after records are archived.

t,this-month

Archive and strip records of previous months off from the log file. Keep the records of this month in the source log file, to be archived next month. This is designed to be run from crontab(1) monthly, so this is the default.
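The split performed by the this-month keep mode can be sketched as below. This is a Python illustration under stated assumptions, not arclog's code: split_this_month is a hypothetical name, records are assumed to be parsed into (timestamp, line) pairs, and records dated later than the current month are assumed to stay in the source file, which this page does not specify.

```python
from datetime import date, datetime


def split_this_month(records, today: date):
    """Separate records to archive from records to keep in the source file.

    Returns (archive, keep): records of previous months are archived,
    everything from the current month onward stays in the source file.
    """
    archive, keep = [], []
    current = (today.year, today.month)
    for when, line in records:
        if (when.year, when.month) < current:
            archive.append(line)
        else:
            keep.append(line)
    return archive, keep


records = [
    (datetime(2001, 4, 30, 23, 59, 59), "april"),
    (datetime(2001, 5, 31, 23, 59, 59), "may"),
    (datetime(2001, 6, 1, 0, 0, 0), "june"),
]
print(split_this_month(records, date(2001, 6, 8)))
```

Run monthly, this leaves only the unfinished month in the source file, so each month is archived once it is complete.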

-d, --debug

Show the detailed debugging messages.

-q, --quiet

Shhhhh. Only yell when there are errors.

-h, --help

Display the help message and exit.

-v, --version

Output version information and exit.

COPYRIGHT

Copyright (c) 2001-2007 imacat. All rights reserved.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

AUTHOR

imacat <imacat@mail.imacat.idv.tw>. Please visit arclog's websites at http://arclog.sourceforge.net/ and http://www.imacat.idv.tw/tech/arclog.html .

BUGS

arclog has a mailing list at SourceForge: arclog-users@lists.sourceforge.net. It is for arclog's users to discuss and report problems. Its web page is at http://lists.sourceforge.net/lists/listinfo/arclog-users . If you have any problem or question on arclog, please go to this page, join the list, and send your questions on this list. Thank you.

TODO

Multi-lingual support

Support multiple languages, either with Text::Iconv(3) or perl 5.8.0's Encode(3).

SEE ALSO

gzip(1), zlib(3), Compress::Zlib(3), bzip2(1), Compress::Bzip2(3), syslog(2)