NAME

Fsdb::IO - base class for Fsdb IO (FsdbReader and FsdbWriter)

EXAMPLES

There are several ways to do IO. We look at several that compute the product of x and y for this input:

    #fsdb x y product
    1 10 -
    2 20 -

The following approaches are ordered from easiest to use to hardest, and also from least efficient to most efficient. For IO-intensive work, if fastpath takes 1 unit of time, then using hashes or arrays takes approximately 2 units of time, all due to CPU overhead.

Using A Hash

    use Fsdb::IO::Reader;
    use Fsdb::IO::Writer;

    # preamble
    my $out;
    my $in = new Fsdb::IO::Reader(-file => '-', -comment_handler => \$out)
        or die "cannot open stdin as fsdb\n";
    $out = new Fsdb::IO::Writer(-file => '-', -clone => $in)
        or die "cannot open stdout as fsdb\n";

    # core starts here
    my %hrow;
    while ($in->read_row_to_href(\%hrow)) {
        $hrow{product} = $hrow{x} * $hrow{y};
        $out->write_row_from_href(\%hrow);    
    };

It can be convenient to use a hash because one can easily extract fields using hash keys, but hashes can be slow.

Arrays Instead of Hashes

We can add a bit to the end of the preamble:

    my $x_i = $in->col_to_i('x') // die "no x column.\n";
    my $y_i = $in->col_to_i('y') // die "no y column.\n";
    my $product_i = $in->col_to_i('product') // die "no product column.\n";

And then replace the core with arrays:

    my @arow;
    while ($in->read_row_to_aref(\@arow)) {
        $arow[$product_i] = $arow[$x_i] * $arow[$y_i];
        $out->write_row_from_aref(\@arow);    
    };

This code has two advantages over hrefs: First, there is explicit error checking for presence of the expected fields. Second, arrays are likely a bit faster than hashes.

Objects Instead of Arrays

Keeping the same preamble as for arrays, we can directly get internal Fsdb "row objects" with a new core:

    # core
    my $rowobj;
    while ($rowobj = $in->read_rowobj) {
        if (!ref($rowobj)) {
            # comment
            &{$in->{_comment_sub}}($rowobj);
            next;
        };
        $rowobj->[$product_i] = $rowobj->[$x_i] * $rowobj->[$y_i];
        $out->write_rowobj($rowobj);    
    };

This code is a bit faster because we just return the internal representation (a rowobj), rather than copy into an array.

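However, read_rowobj does not process comments itself: a comment comes back as a plain (non-reference) value, so the loop must pass it to the comment handler explicitly, as the code above does.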

Fastpathing

To go really fast, we can build a custom thunk (a chunk of code) that does exactly what we want. This approach is called a "fastpath".

It requires a bit more in the preamble (building on the array version):

    my $in_fastpath_sub = $in->fastpath_sub();
    my $out_fastpath_sub = $out->fastpath_sub();

And it allows a shorter core (modeled on rowobjs), since the fastpath includes comment processing:

    my $rowobj;
    while ($rowobj = &$in_fastpath_sub) {
        $rowobj->[$product_i] = $rowobj->[$x_i] * $rowobj->[$y_i];
        &$out_fastpath_sub($rowobj);
    };

This code is the fastest way to implement this block without evaling code.
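
To make the idea of a thunk concrete, here is a minimal sketch of a reader closure that hands comments to a callback and returns each data row as an array reference. It is an illustration only, not the code Fsdb generates: make_reader_thunk, its arguments, and the rule that a leading # marks a comment are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical sketch of the fastpath idea, not Fsdb's generated code.
# ASSUMPTIONS: $fs is a literal field separator, and any line that
# starts with '#' is a comment to be handed to $comment_sub.
sub make_reader_thunk {
    my ($fh, $fs, $comment_sub) = @_;
    return sub {
        while (defined(my $line = <$fh>)) {
            chomp $line;
            if ($line =~ /^#/) {
                $comment_sub->($line);    # comment handled inside the thunk
                next;
            }
            # Data row: split on the separator, keeping trailing empty fields.
            return [ split /\Q$fs\E/, $line, -1 ];
        }
        return;    # end of input
    };
}
```

Because the comment handling lives inside the closure, the caller's loop only ever sees data rows, which is why the fastpath core above can be shorter than the rowobj version.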

FUNCTIONS

new

    $fsdb = new Fsdb::IO;

Creates a new IO object. Usually you should not create an Fsdb::IO object directly, but instead create an Fsdb::IO::Reader or Fsdb::IO::Writer.

Options:

-fh FILE_HANDLE Write IO to the given file handle.
-header HEADER_LINE Force the header to the given HEADER_LINE (should be verbatim, including #h or whatever).
-fscode CODE Define just the column (or field) separator fscode part of the header. See dbfilealter for a list of valid field separators.
-rscode CODE Define just the row separator part of the header. See dbfilealter for a list of valid row separators.
-cols CODE Define just the columns of the header.
-compression CODE Define the compression mode for the file that will take effect after the header.
-clone $fsdb Copy the stream's configuration from $FSDB, another Fsdb::IO object.

_reset_cols

    $fsdb->_reset_cols

Internal: zero all the mappings in the current schema.

_find_filename_decompressor

Returns the name of the decompression program for FILE if its name ends in a compression extension.

config_one

    $fsdb->config_one($arglist_aref);

Parse the first configuration option on the list, removing it.

Options are listed in new.

config

    $fsdb->config(-arg1 => $value1, -arg2 => $value2);

Parse all options in the list.

default_binmode

    $fsdb->default_binmode();

Set the file to the correct binmode, either given by -encoding at setup, or defaulting from LC_CTYPE or LANG.

If the file is compressed, we will reset binmode after reading the header.

compare

    $result = $fsdb->compare($other_fsdb)

Compares two Fsdb::IO objects, returning the string "identical" (same field separator, columns, and column order), "compatible" (same field separator but different columns), or undef if they differ.
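
The three outcomes can be sketched on plain data. This is a hypothetical illustration of the rule as described here, not the compare method itself: compare_sketch and the hash layout (an fscode string plus an aref of column names) are assumptions, and the fscode values in any call are placeholders.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Fsdb::IO::compare: each stream is described by
# a plain hash with an fscode string and an aref of column names.
sub compare_sketch {
    my ($left, $right) = @_;
    # Different field separators: the streams differ.
    return undef if $left->{fscode} ne $right->{fscode};
    # Same separator: "identical" needs the same columns in the same order.
    my $same_cols =
        join("\0", @{ $left->{cols} }) eq join("\0", @{ $right->{cols} });
    return $same_cols ? 'identical' : 'compatible';
}
```

Since one outcome is undef, callers should test the result with defined before comparing it as a string.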

close

    $fsdb->close;

Closes the file, frees open file handle, or sends an EOF signal (and undef) down the open queue.

error

    $fsdb->error;

Returns a descriptive string if there is an error, or undef if not.

The string will never end in a newline or punctuation.

update_v1_headerrow

Internal: create the v1 header from the internal schema.

parse_v1_headerrow

Internal: interpret the v1 header.

update_headerrow

Internal: create the header from the internal schema.

parse_headerrow

Internal: interpret the v2 header. Format is:

    #fsdb [-F x] [-R x] [-Z x] columns

All options must come first, start with dashes, and have an argument. (More regular than the v1 header.)

Columns have optional :t type specifiers.
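
A minimal sketch of reading a line of this shape follows. It is not the parse_headerrow implementation: parse_v2_header_sketch is a hypothetical name, the sketch assumes tokens are separated by whitespace, and it returns option values without validating or interpreting them.

```perl
use strict;
use warnings;

# Hypothetical parser for the documented v2 header shape:
#   #fsdb [-F x] [-R x] [-Z x] columns
# Not the Fsdb::IO implementation; option values are not validated.
sub parse_v2_header_sketch {
    my ($line) = @_;
    chomp $line;
    my @tokens = split ' ', $line;
    my $magic = shift @tokens;
    return unless defined $magic && $magic eq '#fsdb';
    my %opts;
    # Options come first, start with a dash, and each takes one argument.
    while (@tokens && $tokens[0] =~ /^-([A-Za-z])$/) {
        my $flag = $1;
        shift @tokens;
        return unless @tokens;    # option without its argument
        $opts{$flag} = shift @tokens;
    }
    # Whatever remains is the list of column specifications.
    return { opts => \%opts, colspecs => [@tokens] };
}
```

The first token that does not look like an option ends option parsing, which matches the rule that all options must come before the columns.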

parse_v1_fscode

internal

parse_fscode

Parse the field separator. See dbfilealter for a list of valid values.

parse_rscode

Internal: Interpret rscodes.

See dbfilealter for a list of valid values.

parse_compression

Internal: Interpret compression.

See dbfilealter for a list of valid values.

establish_new_col_mapping

internal

col_create

    $fsdb->col_create($col_name)

Add a new column named $COL_NAME to the schema. Returns undef on failure, or 1 if successful. (Note: it does not return the index of the new column, so that "or" can be used for error checking; the column index could be zero, which is false.) Also updates the header row to reflect this column (compare to _internal_col_create).

colspec_to_name_type_spec

    ($name, $type, $type_speced) = $fsdb->colspec_to_name_type($colspec)

Split a colspec into a name, type, and the type as specified (which may be null if no type was given).
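
As an illustration of the three return values, here is a small stand-alone sketch. It is not the library routine: split_colspec is a hypothetical name, and the default type 'a' used for untyped columns is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Fsdb::IO implementation.
# Splits "name:t" into (name, effective type, type as written).
# ASSUMPTION: an untyped column falls back to $default_type, here 'a'.
sub split_colspec {
    my ($colspec, $default_type) = @_;
    $default_type //= 'a';
    my ($name, $type_speced) = split /:/, $colspec, 2;
    my $type = (defined $type_speced && $type_speced ne '')
        ? $type_speced
        : $default_type;
    return ($name, $type, $type_speced);
}
```

The third value stays undefined when no type was written, so a caller can tell an explicit type from a defaulted one.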

_internal_col_create

    $fsdb->_internal_col_create($colspec)

For internal Fsdb::IO use only. Create a new column $COL_NAME, just like col_create, but do not update the header row (as that function does).

field_contains_fs

    $boolean = $fsdb->field_contains_fs($field);

Determine if the $FIELD contains $FSDB's fscode (in which case it is malformed).

fref_contains_fs

    $boolean = $fsdb->fref_contains_fs($fref);

Determine if any field in $FREF contains $FSDB's fscode (in which case it is malformed).

correct_fref_containing_fs

    $boolean = $fsdb->correct_fref_containing_fs($fref);

Patch up any field in $FREF that contains $FSDB's fscode, as well as possible, by turning the field separator into underscores. Updates $FREF in place, and returns whether it was altered. This function loses data.
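
The repair can be sketched as an in-place substitution over the fields. This is an illustration under assumptions, not the library code: replace_fs_in_fields is a hypothetical name, and it takes the literal field separator as an argument instead of reading it from an Fsdb object.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Fsdb::IO implementation.
# ASSUMPTION: $fs is the literal field separator string (for example "\t").
sub replace_fs_in_fields {
    my ($fref, $fs) = @_;
    my $altered = 0;
    for my $field (@$fref) {    # $field aliases the array element
        next unless defined $field;
        # s///g returns the number of substitutions (false if none).
        my $n = ($field =~ s/\Q$fs\E/_/g);
        $altered = 1 if $n;
    }
    return $altered;
}
```

The data loss is visible here: after the repair, a field that originally held a separator cannot be told apart from one that held an underscore.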

fscode

    $fscode = $fsdb->fscode;

Returns the fscode of the given database. (The encoded version representing the field separator.) See also fs to get the actual field separator.

fs

    $fs = $fsdb->fs;

Returns the field separator. See fscode to get the "encoded" version.

rscode

    $rscode = $fsdb->rscode;

Returns the rscode of the given database.

ncols

    $ncols = $fsdb->ncols;

Return the number of columns.

cols

    $fields_aref = $fsdb->cols;

Returns the column names (the field names, without type specifications) of the open database as an aref.

colspecs

    $fields_aref = $fsdb->colspecs();

Returns the column specifications (the field names, with any type specifications) of the open database as an aref.

col_to_i

    $coli = $fsdb->col_to_i($column_name);

Returns the column index (0-based) of a given $COLUMN_NAME. (Names cannot have types with them.)

Note: tests for existence of columns must use defined, since the index can be 0 which would be interpreted as false.
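
The pitfall is easy to reproduce without Fsdb. In this sketch, lookup_col_i is a hypothetical stand-in for col_to_i backed by a plain hash; only the defined-versus-truth distinction is the point.

```perl
use strict;
use warnings;

# Hypothetical stand-in for col_to_i: maps a column name to its 0-based
# index, or returns undef when the column does not exist.
my %index_of = (x => 0, y => 1, product => 2);
sub lookup_col_i {
    my ($name) = @_;
    return $index_of{$name};
}

# Wrong: a truth test rejects column 'x' because its index is 0.
sub has_col_by_truth {
    my ($name) = @_;
    return lookup_col_i($name) ? 1 : 0;
}

# Right: defined() distinguishes index 0 from a missing column.
sub has_col_by_defined {
    my ($name) = @_;
    return defined(lookup_col_i($name)) ? 1 : 0;
}
```

The defined-or operator used in the array example (col_to_i('x') // die ...) applies the same defined test in a single expression.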

colspec_to_i

    $coli = $fsdb->colspec_to_i($column_specification);

Returns the column index (0-based) of a given $COLUMN_SPECIFICATION. The specification may or may not include a type.

Note: tests for existence of columns must use defined, since the index can be 0 which would be interpreted as false.

col_to_name

    @fields = $fsdb->col_to_name($column_name);

Returns the column name of a given $COLUMN_NAME_OR_INDEX.

col_to_type

    @fields = $fsdb->col_to_type($column_name, $force_type);

Returns the column type of a given $COLUMN_NAME, or undef if the type is not required and $FORCE_TYPE is not set.

col_to_colspec

    @fields = $fsdb->col_to_colspec($column_name, $force_type);

Returns the column specification (type is optional, unless $FORCE_TYPE) of a given $COLUMN_NAME.

col_type_is_numeric

    @fields = $fsdb->col_type_is_numeric($column_name);

Returns non-zero if column specification is numeric. (Actually, returns 1 for integers and 2 for floats.)

i_to_col

    @fields = $fsdb->i_to_col($column_index);

Return the name of the COLUMN_INDEX-th (0-based) column.

fastpath_cancel

    $fsdb->fastpath_cancel();

Discard any active fastpath code and allow fastpath-incompatible operations.

codify

    ($code, $has_last_refs) = $self->codify($underscored_pseudocode);

Convert db-code $UNDERSCORED_PSEUDOCODE into perl code in the context of a given Fsdb stream.

We return a string of code $CODE that references @{$fref} and @{$lfref} for the current and prior row arrays, and a flag $HAS_LAST_REFS if @{$lfref} is needed. It is the caller's job to set these up, probably by evaling the returned string in the context of those variables.

The conversion is a rename of all _foo's into database fields. For more perverse needs, _foo(N) means the Nth field after _foo. Also, as of 29-Jan-00, _last_foo gives the last row's value (_last_foo(N) is not supported). To convert we eval $codify_code.

20-Feb-07: _FROMFILE_foo opens the file called _foo and includes it in place.

NEEDSWORK: Should make some attempt to catch misspellings of column names.
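
The basic rename can be sketched as a single substitution. This is an illustration only, not the codify implementation: codify_sketch is a hypothetical name, it takes the name-to-index map as a plain hash, it handles only _foo and _last_foo (not _foo(N) or _FROMFILE_foo), and it leaves unknown names untouched.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the codify implementation.
# ASSUMPTION: $col_to_i is a plain hash from column name to 0-based index.
sub codify_sketch {
    my ($pseudocode, $col_to_i) = @_;
    my $has_last_refs = 0;
    my $rewrite = sub {
        my ($whole, $last, $name) = @_;
        my $i = $col_to_i->{$name};
        return $whole unless defined $i;    # not a column: keep as written
        if ($last) {
            $has_last_refs = 1;             # prior-row array is needed
            return "\$lfref->[$i]";
        }
        return "\$fref->[$i]";
    };
    # Match _name or _last_name only at the start of an identifier.
    (my $code = $pseudocode)
        =~ s/\b(_(last_)?([A-Za-z]\w*))/$rewrite->($1, $2, $3)/ge;
    return ($code, $has_last_refs);
}
```

For example, with columns x, y, and product at indexes 0, 1, and 2, the input '_product = _x * _y' becomes '$fref->[2] = $fref->[0] * $fref->[1]'. Because unknown names pass through unchanged, a misspelled column name is silently left in the output, which is the gap the NEEDSWORK note describes.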

clean_potential_columns

    @clean = Fsdb::IO::clean_potential_columns(@dirty);

Clean up user-provided column names.