alvisXMLsplit -- splits a big file into pieces in a directory for easier processing.
alvisXMLsplit [--bzip2] [--start N] <Alvis XML file> <N per file> <out-dir>
--bzip2 Both the input file and the output files are bzip2-compressed.
--start N Begin output at N.xml instead of 1.xml
Splits a large Alvis XML file into pieces of N documentRecords each, written into a directory for easier processing. The algorithm is simple but somewhat slow, because each document is built up in memory before being written out, which is not efficient in Perl.
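The approach described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script's actual code: the function name split_records and its parameters are hypothetical, it assumes each <documentRecord> and </documentRecord> tag sits on its own line of input, and it writes the records without any enclosing wrapper element.

```python
from pathlib import Path


def split_records(lines, per_file, out_dir, start=1):
    """Write groups of `per_file` documentRecords to start.xml, start+1.xml, ...

    Hypothetical helper: assumes one <documentRecord> or </documentRecord>
    tag per input line, and drops anything outside a record.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []    # paths of the files created, in order
    batch = []      # complete records waiting to be written
    current = None  # lines of the record being built up in memory
    index = start

    def flush():
        nonlocal batch, index
        if not batch:
            return
        path = out / f"{index}.xml"
        path.write_text("".join(batch), encoding="utf-8")
        written.append(path)
        batch = []
        index += 1

    for line in lines:
        if current is None:
            if "<documentRecord" in line:
                current = [line]          # start a new record
        else:
            current.append(line)
        if current is not None and "</documentRecord>" in line:
            batch.append("".join(current))  # record is complete
            current = None
            if len(batch) == per_file:
                flush()
    flush()  # write any final, partially filled file
    return written
```

Passing an open file handle as lines keeps only one record plus the current batch in memory at a time; the start parameter mirrors the --start option.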
Output files are UTF-8 and Perl-friendly: there is at most one <documentRecord> or </documentRecord> tag per line, to facilitate line-by-line processing.
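Because the record tags fall on separate lines, a consumer can pick out records with plain line matching instead of a full XML parser. The following Python sketch illustrates the idea; the name iter_records is hypothetical and the code is an illustration under that one-tag-per-line assumption, not part of the Alvis tools.

```python
def iter_records(lines):
    """Yield each documentRecord as one string, using line matching only.

    Hypothetical helper: relies on at most one <documentRecord> or
    </documentRecord> tag appearing on any line.
    """
    record = None  # lines of the record currently being collected
    for line in lines:
        if record is None and "<documentRecord" in line:
            record = []               # opening tag seen: start collecting
        if record is not None:
            record.append(line)
            if "</documentRecord>" in line:
                yield "".join(record)  # closing tag seen: emit the record
                record = None
```

For a --bzip2 output file, the lines could come from Python's bz2.open(path, "rt", encoding="utf-8").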
Copyright (C) 2006 Wray Buntine
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
alvisSource.pl, alvisSink.pl, alvisXMLsplit.pl, alvisXMLjoin.pl