NAME

Text::NumericData - parsing and writing of textual numeric data files

SYNOPSIS

use Text::NumericData;
my $c = Text::NumericData->new;
my $line = "6e-6 3.4e7 123 0\n";
my $data = $c->line_data($line);

$data->[3] *= $data->[0]*$data->[1];

# data_line() returns a reference to the line, so dereference it for printing
print ${ $c->data_line($data) };

DESCRIPTION

This module (class) contains the basic parsing structure for Text::NumericData. It is intended for numerical data sets in text files as commonly produced by data acquisition software, where they are often called "ASCII files". It is not about arbitrary tabular data: the main target is rows of numbers following a header that ideally describes what kind of data the file contains. Simple text fields are supported, but beware of any separator-looking characters in there! The provided function for checking whether a line is data or header relies on the first data item being a number. Additionally, the following general header layout is assumed:

#file title
#comments
#comments
#...
#row titles with proper separators with or without quotes

Comments are optional. If there is only a single non-empty header line, both the file title and the row titles are deduced from that same line.
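
As a rough illustration of this layout, the following sketch feeds a small made-up file through line_check() and line_data(). The file content and variable names are hypothetical, and the title and titles members are only printed if the parser actually filled them in:

```perl
use strict;
use warnings;
use Text::NumericData;

my $parser = Text::NumericData->new;

# Hypothetical file content following the assumed header layout.
my @lines = (
    "#Example measurement\n",      # file title
    "#logged by a made-up tool\n", # comment
    "#time voltage\n",             # row titles
    "0 1.5\n",
    "1 1.7\n",
);

my @rows;
for my $line (@lines) {
    # line_check() returns 1 for data lines and scans the other lines for
    # header information; only data lines are handed to line_data().
    push @rows, $parser->line_data($line) if $parser->line_check($line);
}

print "title:  $parser->{title}\n"     if defined $parser->{title};
print "titles: @{$parser->{titles}}\n" if ref $parser->{titles};
print scalar(@rows), " data rows\n";
```

Each element of @rows is the array reference returned by line_data() for one data line.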

MEMBERS

Some members of the hash representing a Text::NumericData object:

Methods (Functions)

  • line_check($line[, $onlycheck]) -> 0 or 1

    checks if $line appears to contain data (starts with a number followed by a separator or nothing) and scans for various items such as line ending, separator, file title and data row titles; the scanning is disabled if $onlycheck is true; returns 1 if $line is believed to contain data

  • line_data($line) -> \@data

    input: extract the @data of $line; the return value is undefined for empty lines

  • data_line(\@data, \@cols_include, \@cols_exclude) -> \$line

    output: have @data, produce $line

    This also optionally accepts an array reference of zero-based column indices to include and/or an array reference of columns to exclude. If both are given, the exclusion refers to the columns after selection by @cols_include. If you just want to exclude, call it like this:

    $c->data_line(\@data, undef, \@cols_exclude)

  • comment_line($comment) -> \$line

    output: have bare comment, produce full $line

  • title_line(\@colnumbers, \@exclude_colnumbers) -> \$line

    output: form a proper line with the titles matching the specified columns (starting at 0 - plain array index!) or a line for all columns if @colnumbers is unspecified.

    You can also hand in a list of columns to exclude, just like with data_line():

    $c->title_line(undef, \@exclude_colnumbers)

    Same logic applies if you specify both.

  • chomp_line($line)

    Just what you would expect from a chomp(), but making sure that any kind of line ending (that matches the internal regex) is chopped off; modifies the input directly

  • make_naked($line)

    chomp + remove comment characters
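
The output methods might be combined as in the following sketch. It relies on the documented return values (references to the produced lines) and on default settings; the comment text and variable names are made up for illustration:

```perl
use strict;
use warnings;
use Text::NumericData;

my $txd  = Text::NumericData->new;
my $cols = $txd->line_data("1 2 3 4\n");

# Each call below returns a reference to the produced line.
my $all      = $txd->data_line($cols);              # every column
my $selected = $txd->data_line($cols, [0, 2]);      # only columns 0 and 2
my $reduced  = $txd->data_line($cols, undef, [1]);  # everything but column 1
my $comment  = $txd->comment_line('written by a hypothetical script');

print $$comment, $$all, $$selected, $$reduced;
```

The exact separator and line ending in the printed lines depend on the active configuration.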

Data

  • title

    file title

  • comments

    array ref of comment lines (w/o comment character(s) and line ending)

  • titles

    array ref of data row titles

CONFIGURATION

You can provide a hash reference with configuration data to the constructor to tweak some aspects of the created object. The parameters have (mostly) sensible defaults and/or are normally deduced from the input data (e.g. line end, separator) if possible. You can always check the active configuration by looking at the $c->{config} hash reference.

The module takes deep copies of only the hash elements that correspond to internal parameters and does not modify the given hash in any way; so you can rest assured that handing in some program-specific config hash with lots of additional settings that you also want to work on does not pose a problem.
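
A minimal sketch of handing a larger, program-specific hash to the constructor could look like this. The verbose key is a made-up application setting that the module is assumed not to know about; separator and strict are parameters described in this section:

```perl
use strict;
use warnings;
use Text::NumericData;

# Application-wide settings; 'verbose' is a hypothetical key used only by the program.
my %settings = (separator => ';', strict => 1, verbose => 1);

my $csv = Text::NumericData->new(\%settings);

# The active configuration lives in the object's config hash.
print "separator: $csv->{config}{separator}\n";
print "strict:    $csv->{config}{strict}\n";

# In strict mode the configured separator is also used for parsing.
my $fields = $csv->line_data("1;2;3\n");
print scalar(@$fields), " fields\n";
```

The original %settings hash stays available to the rest of the program, unchanged.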

The different parameters fall roughly into two categories:

Parsing

These influence how the data is parsed. A considerable amount of work went into the regexes; to change them properly, you should know and understand the source code of this module.

  • numregex

    the regular expression a number has to match (I really hope that the default is reasonable!)

  • comregex

    regex for beginning of comment line (additional to not starting with a number)

  • strict [0/1]

    just split at every separator occurrence to get the data array (otherwise some fuzzier logic applies that treats multiple separation characters as one separation)

  • text [0/1]

    allow text in data field in non-strict mode (no effect in strict mode)

  • fill [0/1]

    specify a value to fill in if non-present data is demanded (when invoking line_data())

Output

  • separator

    The separator to use for data fields/columns; when strict is set it is also used for the actual data parsing.

  • lineend

    line ending to use

  • comchar

    character(s) to put in front of comment lines

  • numformat

    list reference with formats for numbers as sprintf format strings (e.g. %02d); each list entry applies to one data row
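
To illustrate numformat, the sketch below assigns one sprintf format to each of two values. The input line and variable names are invented for the example, and it assumes that formats are applied in order when data_line() builds the output:

```perl
use strict;
use warnings;
use Text::NumericData;

# One sprintf format per entry: zero-padded integer, then two decimals.
my $fmt = Text::NumericData->new({ numformat => ['%03d', '%.2f'] });

my $values = $fmt->line_data("7 3.14159\n");
my $out    = $fmt->data_line($values);   # reference to the formatted line

print $$out;
```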

SEE ALSO

For computations on multi-dimensional arrays in general, there is the mighty Perl Data Language, PDL. For many day-to-day applications, the tools based on Text::NumericData are quickly applied with shell one-liners. But if the time for number crunching itself becomes important, or you need more complex operations, you can easily create PDL data structures from the parsed data arrays yourself. Also, see PDL::IO::Misc for direct handling of ASCII data with PDL.

AUTHOR

Thomas Orgis <thomas@orgis.org>

COPYRIGHT AND LICENSE

Copyright (C) 2004-2023, Thomas Orgis.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.