NAME

Text::NumericData::File - process a whole file with text data

SYNOPSIS

        use Text::NumericData::File;

        # read $filename on construction
        my $file = Text::NumericData::File->new(\%config, $filename);

        # create a fresh object without contents...
        my $file2 = Text::NumericData::File->new(\%config);
        # ...and read the file afterwards
        $file2->read_all($filename);

        print "third value of fourth data set: ", $file->{data}->[3][2], "\n";

DESCRIPTION

This is a subclass of Text::NumericData::Lines, so all of its properties are still available and only some sugar is added. It abstracts a file and holds all of its data in memory for instant access (at the cost of memory consumption ;-). Data sets can be addressed through an index built on an arbitrary column, and values between points in this index can be interpolated. A word on this index: it connects each value of the index column with the first data set it occurred in. Other sets containing the same value do not enter the index!
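
The first-occurrence rule of the index can be illustrated with a minimal sketch in plain Perl. This is not the module's implementation; the data and the variable names (@data, %index, $indexcol) are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data: rows of [x, y]; the value x = 1 occurs twice.
my @data = ([1, 10], [2, 20], [1, 30]);

# Build an index on column 0: only the first row seen for a key is stored.
my $indexcol = 0;
my %index;
for my $row (@data) {
    my $key = $row->[$indexcol];
    $index{$key} //= $row;    # later rows with the same key are ignored
}

print "y for x=1: $index{1}[1]\n";    # prints "y for x=1: 10"
```

The row [1, 30] is still part of the data, it is just not reachable through the index.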

MEMBERS

Methods

  • init

    erases data and parsed config stuff from memory; leaving only in_file and out_file untouched

  • read_head($file)

    Reads just the head part of the file (titles), either closing the file afterwards or slurping the header, including the first data line, into the buffer.

  • read_all($file)

    Read the data from the filename provided on construction or from $file if defined (where the internal value for the associated filename is set to $file). If $file is the empty string, data is read from STDIN.

    This wraps around pipe operation using Text::ASCIIPipe. The return codes are identical to those of Text::ASCIIPipe::pull_file: <0 means error, 0 means success with no more files to expect, and >0 means success with more files to expect via the pipe.

  • write_all($file,\@selection)

    writes the header and all data, or only the columns in @selection, to $file (use write_all(undef, \@selection) to write to the internally remembered output file from a previous run).

  • write_header($handle, \@selection)

    writes a header to file handle $handle, tries to provide the appropriate column titles as last line when @selection is defined. Apart from this possibly constructed last line the header here is just the raw header read from the input file - including the original column titles if they were there.

  • write_data($handle, \@selection)

    writes only the data part to file handle

  • write_new_header($handle,\@selection)

    constructs a new header according to the selected columns, preserving the title and comments from the old header but putting it in current comment style and omitting the obsolete column title line, and writes it to $handle.

  • set_of($value,$indexcolumn) -> \@dataset

    returns the data set (line) corresponding to $value in the index of $indexcolumn (0 is default if not specified), employing interpolation if configured (default is linear interpolation)

  • set_of_noint($value,$indexcolumn) -> \@dataset

    does the same as above but prevents any interpolation (so, prepare to get nothing).

  • y($xvalue,$x,$y) -> $yvalue

    returns a specific "Y" value to the "X" value $xvalue with the optional parameters $x and $y telling which columns we mean with X and Y - they default to 0 and 1; so the name of this method actually makes sense with files that contain X-Y data. This includes interpolation just like set_of().

  • y_noint($xvalue,$x,$y) -> $yvalue

    ...see set_of_noint; just the same for y()

  • neighbours($val,$x) -> \($n1,$n2)

    searches two neighbouring points in column $x suitable for interpolation for $val.

  • compute_index(\@columns)

    computes indexes (hash) with elements of the named columns as keys and the according data set (row) as value. Only the first found occurrence of the key in the data makes it into the index! This sets $file->{data_index}.

  • compute_sorted_index(\@columns)

    computes the sorted indices for @columns (and the normal indices before that if necessary)

  • get_sorted_data($column) -> \@sorted_data

    returns a reference to an array of all data sets that made it into the index for $column, in ascending order with respect to $column.

  • max($formula) -> $maxval

    computes the maximum value of the given formula over all data sets

  • min($formula) -> $minval

    computes the minimum value of the given formula over all data sets

  • max_val($col) NOT IMPLEMENTED

    This may be a faster specialization of max("[$col]")...

  • sort_data(\@cols, \@down, \&sortfunction)

    Sort the data in-place in a stable manner, according to column indices @cols, in order. The optional boolean array @down decides for each column whether to sort down or up (default). New in Text::NumericData since 1.10.0: sort_data returns the generated function used for sorting (the comparison operator); you can hand it back as the third parameter to re-use it (this avoids recompiling that code). You can also provide a custom comparison function instead.

  • calc($formula)

    Perform given simple calculation over all data sets. Example:

            $txd->calc('[2]*=10');

    will multiply the values of the second column by 10.

  • calc($formula, \@files, \@A, \@C, \%config)

    Applies the full power of Text::NumericData::FileCalc. Refer to that module for details.

  • mean($col, $xcol, $begin, $end) -> $mean_value

    calculates the arithmetic mean value of column $col over the range $begin to $end of column $xcol. Hm, this perhaps should also compute RMS.

  • columns()

    Give number of columns in data.

  • delete_rows(@list)

    Delete the rows indicated by the given index list (starting at 0) from the data set. For a single item or block, you can just use splice() yourself. This method tries to delete multiple sparsely placed rows efficiently (many whole-array splices would be rather wasteful).
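
To illustrate why delete_rows avoids repeated splice() calls, here is a minimal sketch in plain Perl that removes several scattered rows in a single pass. The helper name delete_rows_sketch and the sample data are hypothetical; the module's own code may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: remove the rows whose
# 0-based indices are listed in @$kill, rebuilding the array only once.
sub delete_rows_sketch {
    my ($data, $kill) = @_;
    my %drop = map { $_ => 1 } @$kill;
    # Collect the indices to keep, then copy the surviving rows in one go
    # instead of shifting the array tail once per deleted row.
    my @keep = grep { !$drop{$_} } 0 .. $#$data;
    my @kept = @{$data}[@keep];
    @$data = @kept;
    return scalar @$data;
}

my @data = ([0, 0], [1, 1], [2, 4], [3, 9], [4, 16]);
delete_rows_sketch(\@data, [1, 3]);
print join(' ', map { $_->[0] } @data), "\n";    # prints "0 2 4"
```

Each splice() on a large array has to move every element behind the removed row, so deleting many rows one at a time costs far more than a single rebuild.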

Data

Here are some keys for the hash reference representing a Text::NumericData::File.

  • data

    2-dimensional array reference. Stores values as $file->{data}->[$row][$col] (data row/record/set and column indices start at 0, while the indices in formulae start at 1). The transpose of this might make more sense in applications, though, and future versions could go that route. But the layout here is what naturally follows the file structure.

  • records

    Count of non-empty records (data lines).

  • data_index

    An alternative way to access the data sets; $file->{data_index}->[$indexcol]->{$value} is a reference to the data set where $value stands in column $indexcol (in fact only the first occurrence).

  • sorted_data

    @{$file->{sorted_data}->[$x]} contains the data sets that made it into the index of column $x in ascending order concerning the values of column $x.

  • buffer

    A raw string buffer holding what read_head has read from the file (or STDIN), which may be inaccessible otherwise.
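
The row-major layout of the data key, and the offset between Perl indices and formula indices, can be shown with a small plain-Perl sketch. The array reference $data below is a hypothetical stand-in for $file->{data}; no module code is involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $file->{data}: three rows, two columns.
my $data = [ [1, 10], [2, 20], [3, 30] ];

# Row and column indices start at 0 ...
my ($row, $col) = (2, 1);
print "row $row, column $col: $data->[$row][$col]\n";    # prints 30

# ... while a formula such as '[2]*=10' counts columns from 1,
# so it addresses Perl column index 1.
$_->[1] *= 10 for @$data;

# Transposed view: one array per column instead of one per row.
my @columns;
for my $r (0 .. $#$data) {
    for my $c (0 .. $#{ $data->[$r] }) {
        $columns[$c][$r] = $data->[$r][$c];
    }
}
print "second column: @{ $columns[1] }\n";    # prints 100 200 300
```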

Configuration hash keys:

  • interpolation ('linear')

    You can choose which type of interpolation is performed. Either 'linear' for simple built-in linear interpolation or 'spline' for splines provided by Math::Spline.

  • indexformat

    sprintf-style format string for formatting (implies rounding) values for index keys

  • orderedint (0)

    This is effective for the built-in linear interpolation only. Normally, interpolation is made between the nearest neighbours of the desired value, determined from the whole file via the sorted index. If this option is true, however, the data sets are searched in the true order of the file, so interpolation takes place between the first two points that appear to enclose the wanted one. This may be preferable in some situations where the data is not monotonic in the variable used as index.

  • extrapol (1)

    The linear interpolation only extrapolates if this is set. The splines always extrapolate.
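
The effect of orderedint on linear interpolation can be sketched in plain Perl as follows. The functions lin, neighbours_sorted and neighbours_ordered are hypothetical names chosen for this example, and the code only mirrors the behaviour described above under simplifying assumptions (two columns, no extrapolation); it is not taken from the module.

```perl
use strict;
use warnings;

# Linear interpolation at $x between two points [x, y].
sub lin {
    my ($x, $p1, $p2) = @_;
    return $p1->[1]
        + ($x - $p1->[0]) * ($p2->[1] - $p1->[1]) / ($p2->[0] - $p1->[0]);
}

# Default behaviour: nearest neighbours taken from the x-sorted data.
sub neighbours_sorted {
    my ($x, $data) = @_;
    my @s = sort { $a->[0] <=> $b->[0] } @$data;
    for my $i (0 .. $#s - 1) {
        return ($s[$i], $s[$i + 1])
            if $s[$i][0] <= $x and $x <= $s[$i + 1][0];
    }
    return;
}

# orderedint: first pair of consecutive rows, in file order, enclosing $x.
sub neighbours_ordered {
    my ($x, $data) = @_;
    for my $i (0 .. $#$data - 1) {
        my ($low, $high) =
            sort { $a <=> $b } ($data->[$i][0], $data->[$i + 1][0]);
        return ($data->[$i], $data->[$i + 1])
            if $low <= $x and $x <= $high;
    }
    return;
}

# Hypothetical data with a non-monotonic x column.
my @data = ([0, 0], [4, 8], [1, 3], [2, 5]);
my $x = 1.5;

print "nearest neighbours: ", lin($x, neighbours_sorted($x, \@data)), "\n";  # 4
print "file order:         ", lin($x, neighbours_ordered($x, \@data)), "\n"; # 3
```

With the sorted index, the points (1, 3) and (2, 5) enclose x = 1.5, while in file order the first enclosing pair is (0, 0) and (4, 8), so the two settings give different results on the same data.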

AUTHOR

Thomas Orgis <thomas@orgis.org>

COPYRIGHT AND LICENSE

Copyright (C) 2004-2013, Thomas Orgis.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.