NAME

  alvisXMLsplit -- splits a big file into pieces in a directory for easier processing.

SYNOPSIS

  alvisXMLsplit [--bzip2] [--start N] <Alvis XML file> <N per file> <out-dir>

DESCRIPTION

--bzip2 Both the input file and the output files are bzip2-compressed. The split itself is unchanged: the input is divided into files of N documentRecords each in the output directory.

--start N Number the output files starting from N.xml instead of 1.xml.
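The interaction between <N per file> and --start can be illustrated with a small Python sketch. The helper name output_names is hypothetical, and it assumes that the last file simply holds whatever records remain and that files are named N.xml as described above. It ignores whatever naming --bzip2 may use.

```python
from typing import List


def output_names(num_records: int, per_file: int, start: int = 1) -> List[str]:
    """Hypothetical helper: file names a split of num_records would produce.

    Assumes the final file holds the remainder and files are named <n>.xml.
    """
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    num_files = -(-num_records // per_file)  # ceiling division
    return [f"{i}.xml" for i in range(start, start + num_files)]


print(output_names(2500, 1000, start=5))  # ['5.xml', '6.xml', '7.xml']
```

With 2500 records, 1000 per file and --start 5, three files result, and the last one holds 500 records.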

The script splits a large file into pieces in a directory for easier processing. The algorithm is simple but somewhat slow, because each document is built up in memory before it is written out, which is not efficient in Perl.
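The same kind of split can be sketched in Python. This is not the script's actual code. It assumes that every <documentRecord ...> and </documentRecord> tag in the input sits on its own line. It also assumes that output files are named 1.xml, 2.xml, ... and, under bzip2, uses a hypothetical .xml.bz2 suffix. Each file is wrapped in a hypothetical <documentCollection> root, because the real Alvis root element and namespace are not described here. Like the Perl script, it buffers each record in memory before writing it.

```python
import bz2
import os
import re

# Hypothetical wrapper: the real Alvis root element/namespace is not given here.
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<documentCollection>\n'
FOOTER = "</documentCollection>\n"

START_TAG = re.compile(r"<documentRecord[\s>]")  # opening tag, with or without attributes
END_TAG = "</documentRecord>"


def split_records(in_path, per_file, out_dir, start=1, use_bzip2=False):
    """Copy documentRecords from in_path into files of per_file records each.

    Assumes each documentRecord start/end tag sits on its own line.
    Returns the list of output paths, numbered from `start`.
    """
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    opener = bz2.open if use_bzip2 else open
    suffix = ".xml.bz2" if use_bzip2 else ".xml"  # hypothetical naming under bzip2
    os.makedirs(out_dir, exist_ok=True)
    written, batch, record = [], [], None
    in_batch = 0
    number = start

    def flush():
        # Write the buffered records as one output file and reset the batch.
        nonlocal number, in_batch
        path = os.path.join(out_dir, f"{number}{suffix}")
        with opener(path, "wt", encoding="utf-8") as out:
            out.write(HEADER)
            out.writelines(batch)
            out.write(FOOTER)
        written.append(path)
        batch.clear()
        in_batch = 0
        number += 1

    with opener(in_path, "rt", encoding="utf-8") as src:
        for line in src:
            if record is None and START_TAG.search(line):
                record = []
            if record is None:
                continue  # text outside any record (e.g. the root element) is dropped
            record.append(line)  # the whole record is buffered in memory
            if END_TAG in line:
                batch.extend(record)
                record = None
                in_batch += 1
                if in_batch == per_file:
                    flush()
    if batch:
        flush()  # the last file holds the remaining records
    return written
```

For example, split_records("big.xml", 1000, "out", start=5) would write out/5.xml, out/6.xml, and so on. Scanning line by line keeps memory use bounded by one batch of records rather than by the whole input.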

Output files are UTF-8 encoded and Perl-friendly: there is at most one <documentRecord> or </documentRecord> tag per line, so the files can be processed line by line without a full XML parser.
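Because of this line-oriented layout, a downstream tool can pull records out of an output file with plain line matching. The Python sketch below illustrates this. The function name iter_records is hypothetical, and it assumes that a .bz2 file name means bzip2 compression.

```python
import bz2
from typing import Iterator, List


def iter_records(path: str) -> Iterator[List[str]]:
    """Yield each documentRecord in an output file as a list of its lines.

    Relies on the layout described above: at most one <documentRecord> or
    </documentRecord> tag per line, so simple substring tests suffice.
    """
    opener = bz2.open if path.endswith(".bz2") else open  # assumed naming
    with opener(path, "rt", encoding="utf-8") as f:
        record = None
        for line in f:
            if record is None and "<documentRecord" in line:
                record = []
            if record is not None:
                record.append(line)
                if "</documentRecord>" in line:
                    yield record
                    record = None
```

For instance, sum(1 for _ in iter_records("out/1.xml")) counts the records in one output file.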

AUTHOR

Wray Buntine

COPYRIGHT AND LICENSE

Copyright (C) 2006 Wray Buntine

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

SEE ALSO

alvisSource.pl, alvisSink.pl, alvisXMLsplit.pl, alvisXMLjoin.pl