NAME
Text::NumericData - parsing and writing of textual numeric data files
SYNOPSIS
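use Text::NumericData;
my $c = Text::NumericData->new();
my $line = "6e-6 3.4e7 123 0\n";
# parse the line into an array reference of values
my $data = $c->line_data($line);
$data->[3] *= $data->[0]*$data->[1];
# data_line() returns a reference to the formatted line
print ${$c->data_line($data)};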
DESCRIPTION
This module (class) contains the basic parsing structure for Text::NumericData. It is intended for use with numerical data sets in text files as commonly produced by data acquisition software, where such a file is often called an "ASCII file". It is not about arbitrary tabular data: the main target is rows of numbers after a header that ideally describes what kind of data the file contains. Simple text fields are supported, but beware of any separator-looking characters in them! The provided function for checking whether a line is data or header relies on the first data item being a number. Additionally, the following general header layout is assumed:
#file title
#comments
#comments
#...
#row titles with proper separators with or without quotes
Comments are optional. If there is only a single non-empty header line, both the file title and the row titles are deduced from that same line.
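A minimal sketch of how a file with this layout might be processed, assuming that line_check() is called on every line in order and that the deduced file title and row titles end up in the title and titles members described under MEMBERS. The sample lines are made up for illustration.

```perl
use strict;
use warnings;
use Text::NumericData;

my $c = Text::NumericData->new();

# Hypothetical file content following the assumed header layout.
my @lines = (
	"#sample measurement\n",
	"#recorded with a made-up device\n",
	"#\"time\" \"voltage\"\n",
	"0.0 1.5\n",
	"0.1 1.7\n",
);

my @rows;
for my $line (@lines)
{
	# line_check() returns 1 for data lines; on header lines it scans
	# for file title, comments and row titles as a side effect.
	if($c->line_check($line))
	{
		push @rows, $c->line_data($line);
	}
}

print "file title: $c->{title}\n" if defined $c->{title};
print "row titles: @{$c->{titles}}\n" if ref $c->{titles} eq 'ARRAY';
print scalar(@rows), " data rows\n";
```

Only lines that start with a number are collected as data here; everything else is left to the header scanning of line_check().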
MEMBERS
Some members of the hash representing a Text::NumericData object:
Methods (Functions)
line_check($line[, $onlycheck]) -> 0 or 1
checks if $line appears to contain data (starts with a number followed by a separator or nothing) and scans for various items such as line ending, separator, file title and data row titles; the scanning is disabled if $onlycheck is true; returns 1 if $line supposedly contains data
line_data($line) -> \@data
input: extract the @data of $line; the return value is undefined for empty lines
data_line(\@data, \@cols_include, \@cols_exclude) -> \$line
output: have @data, produce $line
This also optionally accepts an array reference of zero-based column indices to include and/or an array reference of columns to exclude. If both are given, the exclusion refers to the columns after selection by @cols_include. If you just want to exclude, call it like this:
$c->data_line(\@data, undef, \@cols_exclude)
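As a short sketch of the two basic call forms, assuming that data_line() returns a scalar reference as its signature indicates (hence the dereferencing with ${...}); the input line is made up:

```perl
use strict;
use warnings;
use Text::NumericData;

my $c = Text::NumericData->new();
my $data = $c->line_data("1 2 3 4\n");

# Include only columns 0 and 2 (zero-based indices).
my $selected = $c->data_line($data, [0, 2]);
print ${$selected};

# Keep all columns except column 1.
my $reduced = $c->data_line($data, undef, [1]);
print ${$reduced};
```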
comment_line($comment) -> \$line
output: have bare comment, produce full $line
title_line(\@colnumbers, \@exclude_colnumbers) -> \$line
output: form a proper line with the titles matching the specified columns (starting at 0 - plain array index!) or a line for all columns if @colnumbers is unspecified.
You can also hand in a list of columns to exclude, just like with data_line():
$c->title_line(undef, \@exclude_colnumbers)
Same logic applies if you specify both.
chomp_line($line)
Just what you would expect from a chomp(), but making sure that any kind of line ending (that matches the internal regex) is chopped off; modifies the input directly
make_naked($line)
chomp + remove comment characters
Data
title
file title
comments
array ref of comment lines (w/o comment character(s) and line ending)
titles
array ref of data row titles
CONFIGURATION
You can provide a hash reference with configuration data to the constructor to tweak some aspects of the created object. The parameters have (mostly) sensible defaults and/or are normally deduced from the input data (e.g. line end, separator) if possible. You can always check the active configuration by looking at the $c->{config} hash reference.
The module takes deep copies of only the hash elements that correspond to internal parameters and does not modify the given hash in any way; so you can rest assured that handing in some program-specific config hash with lots of additional settings that you also want to work on does not pose a problem.
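A minimal sketch of this usage, assuming a program-wide settings hash in which verbose and outfile are hypothetical program-specific keys that are not parameters of this module:

```perl
use strict;
use warnings;
use Text::NumericData;

# separator and strict are module parameters; verbose and outfile are
# hypothetical settings belonging to the calling program only.
my %settings = (
	separator => "\t",
	strict    => 1,
	verbose   => 1,
	outfile   => 'result.dat',
);

my $c = Text::NumericData->new(\%settings);

# Inspect the active configuration of the object.
for my $key (sort keys %{$c->{config}})
{
	my $value = $c->{config}{$key};
	printf "%s = %s\n", $key, defined $value ? $value : '(undef)';
}

# The hash handed in is documented to remain unmodified.
print "outfile is still $settings{outfile}\n";
```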
The different parameters fall roughly into two categories:
Parsing
These influence how the data is parsed. A considerable amount of work went into the regexes; to change them properly, you should know and understand the source code of this module.
numregex
the regular expression a number has to match (I really hope that the default is reasonable!)
comregex
regex for beginning of comment line (additional to not starting with a number)
strict [0/1]
just split at every separator occurrence to get the data array (otherwise there is some more fuzzy logic that treats multiple separation characters as one separation)
text [0/1]
allow text in data field in non-strict mode (no effect in strict mode)
fill [0/1]
specify a value to fill in if non-present data is demanded (when invoking line_data())
Output
separator
The separator to use for data fields/columns; when strict is set it is also used for the actual data parsing.
lineend
line ending to use
comchar
character(s) to put in front of comment lines
numformat
listref with formats for numbers as sprintf format strings (e.g. %02d), one list entry is for one data row
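The output parameters might be combined as in the following sketch. The concrete values (tab separator, # as comment character, the two format strings) are illustrative assumptions, and the exact text produced depends on the module, so no output is shown.

```perl
use strict;
use warnings;
use Text::NumericData;

my $c = Text::NumericData->new({
	separator => "\t",              # fields separated by tabs
	comchar   => '#',               # prefix for comment lines
	numformat => ['%.3e', '%d'],    # one sprintf format per list entry
});

my $data = [6e-6, 123];

# Both methods are documented to return a reference to the line.
my $comment = $c->comment_line('scaled values');
my $line    = $c->data_line($data);

print ${$comment};
print ${$line};
```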
SEE ALSO
For computations on multi-dimensional arrays in general, there is the mighty Perl Data Language, PDL. For many day-to-day applications, the tools based on Text::NumericData are quickly applied with shell one-liners. But if the time for number crunching itself becomes important, or you need more complex operations, you can easily create PDL data structures from the parsed data arrays yourself. Also, see PDL::IO::Misc for direct handling of ASCII data with PDL.
AUTHOR
Thomas Orgis <thomas@orgis.org>
COPYRIGHT AND LICENSE
Copyright (C) 2004-2023, Thomas Orgis.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.