Data::CTable - Read, write, manipulate tabular data

            ## Read some data files in various tabular formats
            use          Data::CTable;
            my $People = Data::CTable->new("people.merge.mac.txt");
            my $Stats  = Data::CTable->new("stats.tabs.unix.txt");

            ## Clean stray whitespace in fields
            $Stats ->clean_ws();

            ## Retrieve columns
            my $First = $People->col('FirstName');  
            my $Last  = $People->col('LastName' );

            ## Calculate a new column based on two others
            my $Full  = [map {"$First->[$_] $Last->[$_]"} @{$People->all()}];

            ## Add new column to the table
            $People->col(FullName => $Full);

            ## Another way to calculate a new column
            $People->calc(sub {no strict 'vars'; $Key = "$Last,$First";});

            ## "Left join" records matching Stats:PersonID to People:Key
            $Stats->join($People, PersonID => 'Key');

            ## Find certain records
            $Stats->select(Department => sub {/Sale/i  });  ## Sales depts
            $Stats->omit  (Department => sub {/Resale/i});  ## not Resales
            $Stats->select(UsageIndex => sub {$_ > 20.0});  ## high usage

            ## Sort the found records
            $Stats->sortspec('DeptNum'   , {SortType => 'Integer'});
            $Stats->sortspec('UsageIndex', {SortType => 'Number' });
            $Stats->sort([qw(DeptNum UsageIndex Last First)]);

            ## Make copy of table with only found/sorted data, in order
            my $Report = $Stats->snapshot();

            ## Write an output file
            $Report->write(_FileName => "Rept.txt", _LineEnding => "mac");

            ## Print a final progress message.

            ## Dozens more methods and parameters available...

    Data::CTable is a comprehensive utility for reading, writing,
    manipulating, cleaning and otherwise transforming tabular data. The
    distribution includes several illustrative subclasses and utility

    A Columnar Table represents a table as a hash of data columns, making it
    easy to do data cleanup, formatting, searching, calculations, joins, or
    other complex operations.

    The object's hash keys are the field names and the hash values hold the
    data columns (as array references).

    Tables also store a "selection" -- a list of selected / sorted record
    numbers, and a "field list" -- an ordered list of all or some fields to
    be operated on. Select() and sort() methods manipulate the selection
    list. Later, you can optionally rewrite the table in memory or on disk
    to reflect changes in the selection list or field list.

    Data::CTable reads and writes any tabular text file format including
    Merge, CSV, Tab-delimited, and variants. It transparently detects,
    reads, and preserves Unix, Mac, and/or DOS line endings and tab or comma
    field delimiters -- regardless of the runtime platform.

    In addition to reading data files, CTable is a good way to gather,
    store, and operate on tabular data in memory, and to export data to
    delimited text files to be read by other programs or interactive
    productivity applications.

    To achieve extremely fast data loading, CTable caches data file contents
    using the Storable module. This can be helpful in CGI environments or
    when operating on very large data files. CTable can read an entire
    cached table of about 120 megabytes into memory in about 10 seconds on
    an average mid-range computer.

    For simple data-driven applications needing to store and quickly
    retrieve simple tabular data sets, CTable provides a credible
    alternative to DBM files or SQL.

    For data hygiene applications, CTable forms the foundation for writing
    utility scripts or compilers to transfer data from external sources,
    such as FileMaker, Excel, Access, personal organizers, etc. into
    compiled or validated formats -- or even as a gateway to loading data
    into SQL databases or other destinations. You can easily write short,
    repeatable scripts in Perl to do reporting, error checking, analysis, or
    validation that would be hard to duplicate in less-flexible application

    The data representation is simple and open so you can directly access
    the data in the object if you feel like it -- or you can use accessors
    to request "clean" structures containing only the data or copies of it.
    Or you can build your own columns in memory and then when you're ready,
    turn them into a table object using the very flexible new() method.

    The highly factored interface and implementation allow fine-grained
    subclassing so you can easily create useful lightweight subclasses.
    Several subclasses are included with the distribution.

    Most defaults and parameters can be customized by subclassing,
    overridden at the instance level (avoiding the need to subclass too
    often), and further overridden via optional named-parameter arguments to
    most major method calls.

  Similar / related modules on CPAN

    The Data::Table module by Yingyao Zhou & Guangzhou Zou offers similar
    functionality, but uses a different underlying data representation
    (2-dimensional array), and has a somewhat different feature set. Check
    it out. Maybe you will prefer it for your application.


    The Data::ShowTable module renders tables in various viewable formats.
    CTable relies on ShowTable's ShowBoxTable method to implement its own
    format() and out() methods.



    The CTable documentation, source code, and examples assume familiarity
    with large nested data structures, object-oriented syntax and
    terminology, and comfort working with array and hash references and
    array and hash slice syntax.

    See the perlref man page for more on these topics.

  How to learn more

    Dozens more methods, parameters, and examples are described below.

    See the full source code in

    Or, after installing, read the man page using:

            man     Data::CTable
            perldoc Data::CTable

    See the eg/ (examples) folder in the Data::CTable distribution and the script for scripts demonstrating every CTable method.

    For latest version and other news, check the Data::CTable home page:


    Or search CPAN:


    Using CPAN module:

            perl -MCPAN -e 'install Data::CTable'

    Or manually:

            tar xzvf Data-CTable*gz
            cd Data-CTable-?.??
            perl Makefile.PL
            make test
            make install

    In addition to the module itself, there are a number of subclasses and
    simple utilities included with the Data::CTable distribution.


    The Data::CTable distribution includes these example subclasses. Each is
    installed in your Perl environment along with the main module, and so
    may be used by your scripts. Each has its own man/perldoc page
    containing more detail.

    Data::CTable::ProgressLogger is a subclass that logs all progress
    messages to a list within the object itself rather than (merely) echoing
    them to STDERR. Later, you may retrieve and examine the list.

    Data::CTable::Script is a virtual subclass that includes class and
    object methods that make it easy to write a simple interactive
    command-line program that parses options and outputs a table.

    Data::CTable::Listing is a very useful subclass of Data::CTable::Script
    that implements a souped-up Unix-like "ls" (file listing) command -- it
    first gets an optionally-recursive listing of any number of files and/or
    directories, builds a list of their full absolute or relative paths,
    then build a Data::CTable::Listing object that contains all the paths,
    plus about 25+ other pieces of useful information about each file or

    The "tls" utility, below, is simply a command-line cover for this class,
    but you could use this class in your own scripts in order to get
    detailed file listings.


    Each of these utilities is provided mainly so you can see mature
    examples of how to use Data::CTable in real-world scripts.

    But each is also genuinely useful, too, and you may enjoy adding them to
    your regular bag of tricks or using them as an easily-modifiable basis
    for scripts of your own.

    On most systems, these will be installed in an appropriate directory in
    your path when you install CTable, and hence will be executable just by
    typing their file name. (On Windows, they'll be wrapped by a .bat file
    and installed in C:\Perl\bin or equivalent. On *nix, they'll be in
    /usr/bin/ or equivalent.)

    tls is a command-line utility that wraps Data::CTable::Listing to
    implement a variant on the classic Unix "ls" command in which an
    internal CTable object is used to hold and calculate a very large amount
    of meta-data about each file and directory and then output that data in
    formatted tables, or as any kind of delimited text file, and with much
    more flexibility and control over included data, and sort and sub-sort
    order than with ls.

    tshow is a command-line utility that reads each of its arguments as a
    Data::CTable file and then calls the out() method to display its entire
    contents. (Warning: out() is slow with very large data sets.)

    getweather is a command-line utility that takes a US zip code, grabs the
    local weather report from a popular weather web site and uses a CTable
    object to store, process, and clean, and present the table of weather
    data that results in a simple text format.

            ## Create an object / read file(s) / override params

            use  Data::CTable;

            $t = Data::CTable->new()
            $t = Data::CTable->new($File)
            $t = Data::CTable->new($File1, $File2....)
            $t = Data::CTable->new($Params)
            $t = Data::CTable->new($Params, $File1)
            $t = Data::CTable->new($Params, $File1, $File2....)

            ## Internal initializer (subclassable): called for you by new()


    If the first argument to new() is a hash ref, it is the $Params hash of
    initial parameters and/or data columns which, if supplied will form the
    starting point for the object being created. Any non-hash and/or further
    arguments to new() are treated as file names to be opened.

    If supplied, data in the $Params hash will be shallowly copied -- the
    original hash object passed will not be used, but any sub-structures
    within it will now "belong" to the resulting new object which will feel
    free to manipulate them or discard them.

    Then, any parameters not supplied (usually most of them) will be
    defaulted for you because new() will call the internal method
    initialize() before the object is finished and returned.

    See the PARAMETER REFERENCE section below for the parameters you can
    choose to supply or have defaulted for you.

    Any file name arguments will be read and appended into a single object.
    If any of the files fails to be read, then new() will fail (return a
    false value) and no object will be created.

    initialize() makes sure there is a legal and consistent value for every
    internal parameter in the object. Generally, initialize() leaves alone
    any parameters you supplied to new(), and simply sets default values for
    any that were not yet supplied.

    You should never need to call initialize() directly.

    After calling initialize(), new() then calls the append_files_new()
    method to process the filename arguments. This method then calls read()
    on the first filename, and then append_file on all subsequent file
    names, appending them to the first, in sequence.

    See append() for an explanation of how the data from multiple files is
    combined into a single table.

  Advanced: Using a template object

            ## Calling new() with a template object

            $v =           $t->new()                           
            $v =           $t->new($File)                      
            $v =           $t->new($File1, $File2....)         
            $v =           $t->new($Params)                    
            $v =           $t->new($Params, $File1)                    
            $v =           $t->new($Params, $File1, $File2....)

    You can also call $t->new() to use an existing object, $t, as a template
    for the new object. $t->new() will create a new object of the same class
    or subclass as the template object. Furthermore, the template object, if
    provided, will be used as a starting point for the resulting object --
    in fact, it will initially share shallow copies of all data columns, if
    any, and all internal parameters and data structures, if any.

    This advanced shared-data technique could be used to create two separate
    table objects that share and operate on the same underlying data columns
    in memory but have different custom field lists, custom selections, sort
    behavior, etc. But don't do this unless you're sure you understand what
    you're doing, because changing data in one table would change it in the

    The parameters listed here are recognized by new(), initialize(), and by
    many functions that use the named-parameter calling convention, such as
    read(), write(), sort(), and others.

    Any parameter listed may be specified when using new().

    Most parameters should not be directly accessed or manipulated once the
    object has been created, except those that have appropriate accessor
    methods described throughout in this documentation.

    Each parameter in the lists below is listed along with its defaulting
    logic as performed by new() via initialize().

  Custom field list and custom selection (record list)

    _FieldList ||= undef;
        This is the ordered list (array reference) of columns / fields
        present in the object. It is set by read() to reflect the names and
        order of the fields encountered or actually read from in the
        incoming data file, if any. Initially this list is undefined, and if
        removed or left undefined, then the de-facto field list will be a
        list of all columns present in the table object, in alphabetical
        order (see fieldlist() and fieldlist_all()).

        Normally, all fields present in the table object would be listed in
        the field list.

        However, this parameter may be set in the object (or overridden in
        named-parameter function calls like read(), write(), etc.) to cause
        a subset of fields to be read, written or otherwise used. If a
        subset field list is specified before reading a data file, then ONLY
        fields listed will be read -- this is a way to read just certain
        fields from a very large file, but may not always be what you want.

        Specifying the field list before calling read() is required if the
        data file has no header row giving names to its fields -- the names
        you specify in the field list, in order, will be applied to the data
        file being read. See the _HeaderRow parameter, below.

        If a subset field list is used after columns are already loaded in
        memory, the columns not listed in the field list will still be
        present in the object (and can be listed by calling
        fieldlist_all()), but they will be omitted from most operations that
        iterate over fields.

    _Selection ||= undef;
        This is a list of the record numbers of "selected" records in the
        table, possibly indicating sorted order.

        If absent, then all records are considered to be selected, in
        "natural" order -- i.e. the order they occur in the file.

        You can create and set your own selection list or get and modify an
        existing one. Deleting it resets the selection (for example, by
        calling select_all()).

        Calling sort() will create a _Selection if none existed. Otherwise
        it operates by modifying the existing _Selection, which may be a
        subset of all record numbers.

  Cache behavior controls

    See sections related to Cacheing, below.

    _CacheOnRead = 1 unless exists
        Boolean: whether data files read by the read() method should be
        cached after reading. Once cached, the data will be read from the
        cache instead of the original file the NEXT TIME READ() IS CALLED,
        but only if: 1) the cache file is found, and 2) its date is later
        than the original. Otherwise, the cache file is ignored or
        re-written. Cacheing can be up to 10x faster than parsing the file,
        so it's almost always worth doing in any situation where you'll be
        reading a data file more often than writing it.

        This parameter defaults to true.

    _CacheOnWrite = 0 unless exists
        Boolean: whether tables written by the write() method should be
        cached after writing. This defaults to false on the assumption that
        the program won't need to re-read a file it just wrote. However,
        this behavior would be useful if a later step in your program or
        another program will be reading the file that was written and would
        benefit from having the cacheing already done. Cacheing a file after
        writing is quite fast since the data is already in memory and of
        course it speeds up subsequent read() operations by up to 10x.

    _CacheExtension = ".cache" unless exists
        This is the file name extension that is added to a file's name to
        determine the name of its corresponding cache file. (First, any
        existing extension, if any, is removed.) If this extension is empty,
        then the cache file will be named the same as the original assuming
        the cache file is being stored in a different directory. (See next

    _CacheSubDir = "cache" unless exists
        This is the absolute or relative path to the subdirectory that
        should be used to store the cache files. The default value is the
        relative path to a directory called "cache". Relative paths will be
        appended to the directory path containing the original file being

        Absolute cache paths (such as /tmp or c:\temp\) can also be used.

        Override _CacheExtension and _CacheSubdir in a subclass, in each
        object, or in each call to read() or write() in order to have the
        cache files stored elsewhere. But remember: unless you use the same
        cache settings next time you read the same file, the cache files
        will be orphaned.

  Progress routine / setting

    _Progress = undef unless exists
        The _Progress setting controls the routing of diagnostic messages.
        Four possible settings are recognized:

                undef (default)        The class's progress settings are used.
                subroutine reference   Your own custom progress routine.
                true                   Built-in progress_default() method used.
                0/false                No progress messages for this object.

        See the PROGRESS section for a description of the interface of
        custom progress routines and for details on how the builtin one

  File format settings

    _LineEnding ||= undef;
        _LineEnding indicates the line ending string or setting to be used
        to read a file, the setting that actually *was* used to read a file,
        and/or the line ending that will be used to write a file.

        Set this parameter to force a particular encoding to be used.

        Otherwise, leave it undef. The program will Do What You Mean.

        If _LineEnding is undef when read() is called, read() will try to
        guess the line ending type by inspecting the first file it reads.
        Then it will set this setting for you. It can detect DOS, Unix, and
        Mac line endings.

        If _LineEnding is undef when write() is called, write() will use
        ""\n"", which yields different strings depending on the current
        runtime platform: \x0A on Unix; \x0D in MacPerl, \x0D\x0A on DOS.

        Otherwise, write() uses the value defined in _LineEnding, which
        would match the value filled in by read() if this object's data
        originally had been read from a file. So if you read a file and then
        later write it out, the line endings in the written file will match
        the format of original unless you override _LineEnding specifically.

        Since Data::CTable supports reading and writing all common endings,
        base your decision on line ending format during write() on the needs
        of other programs you might be using.

        For example: FileMaker Pro and Excel crash / hang if Unix line
        endings are used, so be sure to use the ending format that matches
        the needs of the other programs you plan to use.

        As a convenience, you may specify and retrieve the _LineEnding
        setting using the mnemonic symbols "mac", "dos" and "unix." These
        special values are converted to the string values shown in this

                 symbol   string value  chars  decimal     octal     control
                  dos     "\x0D\x0A"    CR/LF   13,10    "\015\012"   ^M^J
                  mac     "\x0D"        CR      13       "\015"       ^M
                  unix    "\x0A"        LF      10       "\012"       ^J

        See the section LINE ENDINGS, below, for accessor methods and
        conversion utilities that help you get/set this parameter in either
        symbolic format or string format as you prefer.

    _FDelimiter ||= undef;
        _FDelimiter is the field delimiter between field names in the header
        row (if any) and also between fields in the body of the file. If
        undef, read() will try to guess whether it is tab ""\t"" or comma
        <",">, and set this parameter accordingly. If there is only one
        field in the file, then comma is assumed by read() and will be used
        by write().

        To guess the delimiter, the program looks for the first comma or tab
        character in the header row (if present) or in the first record.
        Whichever character is found first is assumed to be the delimiter.

        If you don't want the program to guess, or you have a data file
        format that uses a custom delimiter, specify the delimiter
        explicitly in the object or when calling read() or make a subclass
        that initializes this value differently. On write(), this will
        default to comma if it is empty or undef.

    _QuoteFields = undef unless exists
        _QuoteFields controls how field values are quoted by write() when
        writing the table to a delimited text file.

        An undef value (the default) means "auto" -- each field is checked
        individually and if it contains either the _FDelimiter character or
        a double-quote character, the field value will be surrounded by
        double-quotes as it is written to the file. This method is slower to
        write but faster to read, and may make the output easier for humans
        to read.

        A true value means always put double-quotes around every field
        value. This mode is faster to write but slower to read.

        A zero value means never to use double-quotes around field values
        and not to check for the need to use them. This method is the
        fastest to read and write. You may use it when you are certain that
        your data can't contain any special characters. However, if you're
        wrong, this mode will produce a corrupted file in the event that one
        of the fields does contain the active delimiter (such as comma or
        tab) or a quote.

    _HeaderRow = 1 unless exists
        _HeaderRow is a boolean that says whether to expect a header row in
        data files. The default is true: a header row is required. If false,
        _FieldList MUST be present before calling read() or an error will be
        generated. In this latter case, _FieldList will be assumed to give
        the correct names of the fields in the file, in order, before the
        file is read. In other words, the object expects that either a) it
        can get the field names from the file's header row or b) you will
        supply them before read() opens the file.

  Encoding of return characters within fields

    _ReturnMap = 1 unless exists
        _ReturnMap says that returns embedded in fields should be decoded on
        read() and encoded again on write(). The industry-standard encoding
        for embedded returns is ^K (ascii 11 -- but see next setting to
        change it). This defaults to true but can be turned off if you want
        data untouched by read(). This setting has no effect on data files
        where no fields contain embedded returns. However, it is vital to
        leave this option ON when writing any data file whose fields could
        contain embedded returns -- if you have such data and call write()
        with _ReturnMap turned off, the resulting file will be an invalid
        Merge/CSV file and might not be re-readable.

        When these fields are decoded on read(), encoded returns are
        converted to ""\n"" in memory, whatever its interpretation may be on
        the current platform (\x0A on Unix or DOS; \x0D on MacPerl).

        IMPORTANT NOTE: When these fields are encoded by write(), any
        occurrence of the current _LineEnding being used to write the file
        is searched and encoded FIRST, and THEN, any occurrence of "\n" is
        also searched and encoded. For example, if using mac line endings
        (^M) to write a file on a Unix machine, any ^M characters in fields
        will be encoded, and then any "\n" (^J) characters will ALSO be
        encoded. This may not be what you want, so be sure you know how your
        data is encoded in cases where your field values might contain any
        ^J and/or ^M characters.

        IMPORTANT NOTE: If you turn _ReturnMap off, fields with returns in
        them will still be double-quoted correctly. Some parsers of tab- or
        comma-delimited files are able to support reading such files.
        HOWEVER, the parser in this module's read() method DOES NOT
        currently support reading files in which a single field value
        appears to span multiple lines in the file. If you have a need to
        read such a file, you may need to write your own parser as a
        subclass of this module.

    _ReturnEncoding ||= "\x0B";
        This is the default encoding to assume when embedding return
        characters within fields. The industry standard is "\x0B" (ascii 11
        / octal \013 / ^K) so you should probably not ever change this

        When fields are encoded on write(), ""\n"" is converted to this
        value. Note that different platforms use different ascii values for
        ""\n"", which is another good reason to leave the ReturnEncoding
        feature enabled when calling write().

        To summarize: this module likes to assume, and you should too, that
        returns in data files on disk are encoded as "\x0B", but once loaded
        into memory, they are encoded as the current platform's value of

    _MacRomanMap = undef unless exists
        Data::CTable assumes by default that you want field data in memory
        to be in the ISO 8859-1 character set (the standard for Latin 1
        Roman characters on Unix and Windows in the English and Western
        European languages -- and also the default encoding for HTML Web

        _MacRomanMap controls the module's optional mapping of Roman
        characters from Mac format on disk to ISO format in memory when
        reading and writing data files. These settings are recognized:

                undef   ## Auto: Read/write Mac chars if using Mac line endings  
                1       ## On:   Assume Mac char set in all fields
                0       ## Off:  Don't do any character mapping at all

        The default setting is undef, which enables "Auto" mode: files found
        to contain Mac line endings will be assumed to contain Mac
        upper-ASCII characters and will be mapped to ISO on read(); and
        files to be written with Mac line endings will mapped back from ISO
        to Mac format on write().

        If your data uses any non-Latin-1 character sets, or binary data, or
        you really want Mac upper-ASCII characters in memory, or you just
        don't want this module messing with your encodings, set this option
        to 0 (Off) or make a subclass that always sets it to 0.

        See also the clean() methods that can help you translate just the
        columns you want after reading a file or before writing it, which
        may be faster for you if only a few fields might contain high-ASCII

    _FileName ||= undef;
        This is the name of the file that should be read from or WAS read
        from. (read() will set _FileName to the value it used to read the
        file, even if _FileName was only supplied as a named parameter.)

        This name will also be used, unless overridden, to re-write the file
        again, but with an optional extension added. (See next setting.)

    _WriteExtension = ".out" unless exists
        The _WriteExtension is provided so that CTable won't overwrite your
        input data file unless you tell it to.

        _WriteExtension will be added to the object's _FileName setting to
        create a new, related file name, before writing.... UNLESS _FileName
        is supplied as an direct or named parameter when calling write().

        In the latter case, write() uses the file name you supply and adds
        no extension, even if this would mean overwriting the original data

        To add _WriteExtension, write() places it prior to any existing
        final extension in the _FileName:

                _FileName             default file name used by write()
                People.merge.txt      People.merge.out.txt
                People                People.out

        If you want to always overwrite the original file without having to
        supply _FileName each time, simply set _WriteExtension to undef in a
        subclass or in each instance.

        If _CacheOnWrite is true, then the _WriteExtension logic is applied
        first to arrive at the actual name of the file to be written, and
        then the _CacheExtension logic is applied to that name to arrive at
        the name of the cache file to be written.

  Sorting-related parameters

    _SortOrder ||= undef;
        _SortOrder is the list of fields which should be used as primary,
        secondary, etc. sort keys when sort() is called. Like other
        parameters, it may be initialized by a subclass, stored in the
        object, or provided as a named parameter on each call to sort().

        If _SortOrder is empty or undefined, then sort() sorts the records
        by record number (i.e. they are returned to their "natural" order).

    _SortSpecs ||= {};
        _SortSpecs is a hash of specifications for the SortType and
        SortDirection of fields on which sorting may be done. For any field
        missing a sort spec or the SortType or SortDirection components of
        its sort spec, the _DefaultSortType and _DefaultSortDirection
        settings will be used. So, for example, if all fields are of type
        String and you want them to sort Ascending, then you don't need to
        worry about _SortSpecs. You only need to provide specs for fields
        that don't take the default settings.

        _SortSpecs might look like this:

                {Age      => {SortType => 'Integer'}, 
                 NameKey  => {SortType => 'Text', SortDirection => -1}}

    _SRoutines ||= {};
        _SRoutines is a hash mapping any new SortTypes invented by you to
        your custom subroutines for sorting that type of data. (See the
        section on sort routines, below, for a full discussion.)

  Sorting defaults

    _DefaultSortType ||= 'String';
        If you sort using a field with no sort spec supplied, or whose sort
        spec omits the SortType, it will get its SortType from this

        See the sections below on SORT TYPES and SORT ROUTINES.

    _DefaultSortDirection ||= 1;
        If you sort using a field with no sort spec supplied, or whose sort
        spec omits the SortDirection, it will get its SortDirection from
        this parameter.

        Legal sort directions are: 1 (Ascending) or -1 (Descending).

        See the section below on DEFAULT SORT DIRECTION.

  Miscellaneous parameters

    _ErrorMsg ||= "";
        This parameter is set by read() or write() methods that encounter an
        error (usually a parameter error or file-system error) that prevents
        them from completing. If those methods or any methods that call them
        return a false value indicating failure, then _ErrorMsg will contain
        a string explaining the problem. The message will also have been
        passed to the progress() method for possible console feedback.

        This parameter is set to 1 (true) by read() if the last call to
        read() brought in a subset of the fields available in the file; 0

        The object uses this field internally so it knows to abandon any
        cache files that might not contain all requested fields upon read().

    Most subclasses will override initialize() to set default values for the
    parameters of the parent class and then they may provide default values
    for other subclass-specific parameters. Then, the subclass's
    initialize() should call SUPER::initialize() to let the parent class(es)
    take care of the remaining ones.

    Every initialize() method should always allow for parameters to have
    already been provided by the $Params hash or template object. It should
    not overwrite any valid values that already exist.

    The following sample subclass changes the default setting of the
    _Progress parameter from undef to 1 and then overrides the
    progress_default() method to log all progress messages into a new
    "_ProgrLog" (progress log) parameter stored in the object.

            {   ## Data::CTable::ProgressLogger: store messages in the object

                package Data::CTable::ProgressLogger;
                use vars qw(@ISA); @ISA=qw(Data::CTable);

                sub initialize       ## Add a new param; change one default
                    my $this           = shift;
                    $this->{_Progress} = 1 unless exists($this->{_Progress});
                    $this->{_ProgrLog} ||= [];

                sub progress_default ## Log message to object's ProgMsgs list
                    my $this            = shift;
                    my ($msg)           = @_;
                    chomp                                       $msg;
                    push @{$this->{_ProgrLog}}, localtime() . " $msg";


                sub show_log                 ## Use Dumper to spit out the log list
                    my $this            = shift;

            ## Later...

            my $Table = Data::CTable::ProgressLogger->new("mydata.txt");
            # ... do stuff...

            ## Getting / setting the object's _FieldList

            $t->fieldlist()             ## Get _FieldList or fieldlist_all()
            $t->fieldlist_hash()        ## Get fieldlist() as keys in a hash
            $t->fieldlist_all()         ## Get all fields (ignore _FieldList)
            $t->fieldlist($MyList)      ## Set field list (_FieldList param)

            $t->fieldlist(0)            ## Remove field list (use default)

            $t->fieldlist_force($MyList)## Set list; remove non-matching cols

            $t->fieldlist_truncate()    ## Just remove nonmatching cols

            $t->fieldlist_default()     ## Default field list (alpha-sorted)

            $t->fieldlist_add($MyName)    ## Append new name to custom list.
            $t->fieldlist_delete($MyName) ## Delete name from custom list.

    A CTable object can optionally have a custom field list. The custom
    field list can store both the ORDER of the fields (which otherwise would
    be unordered since they are stored as keys in a hash), and also can be a
    subset of the fields actually in the object, allowing you to temporarily
    ignore certain effectively-hidden fields for the benefit of certain
    operations. The custom field list can be changed or removed at any time.

    The custom field list is stored in the private _FieldList parameter.

    fieldlist() always returns a list (reference). The list is either the
    same list as _FieldList, if present, or it is the result of calling
    fieldlist_default(). In CTable, fieldlist_default() in turn calls
    fieldlist_all() -- hence fieldlist() would yield an auto-generated list
    of all fields in alphabetical order.

    fieldlist_all() can be called directly to get a list of all fields
    present regardless of the presence of a _FieldList parameter. The list
    is an alphabetical case-insensitively sorted list of all hash keys whose
    names do not begin with an underscore.

    You could override this method if you want a different behavior. Or, you
    could create your own custom field list by calling fieldlist_all() and
    removing fields or ordering them differently.

    To set a custom field list (in _FieldList), call fieldlist() or
    fieldlist_set() with a list (reference). The list must be a list of
    strings (field names) that do not begin with underscore. The object owns
    the list you supply.

    To remove a custom field list (and let the default be used), call
    fieldlist(0) or fieldlist_set() with no arguments (these will return the
    fieldlist that was deleted, if any).

    fieldlist_freeze() "freezes" the fieldlist in its current state. This is
    equivalent to the following:


    ... which would force the fieldlist to $t->fieldlist_all() if and only
    if there is not already a custom _FieldList present.

    IMPORTANT NOTE ABOUT PARTIAL FIELD LISTS: When setting a field list, the
    object ensures that all fields (columns) mentioned in the list are
    present in the object -- it creates empty columns of the correct length
    as necessary. However, it does NOT delete any fields not mentioned in
    the field list. This allows you to manipulate the field list in order to
    have certain fields be temporarily ignored by all other methods, then
    alter, restore, or remove it (allow it to revert to default) and they
    will be effectively unhidden again. Some methods (such as cols(),
    write(), etc.) also allow you to specify a custom field list that will
    override any other list just during the execution of that method call
    but will not modify the object itself.

    Call fieldlist_force() to set the list AND have any non-listed fields
    also deleted at the same time (by calling fieldlist_truncate()
    internally). You can also just delete individual columns one-by-one, of
    course, using the column-manipulation methods and the custom fieldlist,
    if any, will be appropriately updated for you.

    fieldlist_truncate() deletes any fields found in the table but not
    currently present in _FieldList. A hash of the deleted columns is
    returned to the caller. If there is no _FieldList, then this method does

    fieldlist_default() just calls fieldlist_all() in this implementation,
    but could be changed in subclasses.

    fieldlist_add() is the internal method that adds a new field name to the
    custom field list (if present) and if the field name was not already on
    the list. It is called by other methods any time a new column is added
    to the table. Don't call it directly unless you know what you're doing
    because the corresponding column won't be created. (Instead, use col().)
    The field name is appended to the end of the existing custom field list.
    If there is no custom field list, nothing is done.

    fieldlist_delete() is the internal method that deletes a field name from
    the custom field list (if present). It is called by other methods when
    columns are deleted, but it does not actually delete the columns
    themselves, so use with caution: deleting a field from the custom field
    list effectively hides the field. This method has no effect, however, if
    there is no custom field list present. So don't call this method
    directly unless you know what you're doing.

            ## Getting or setting data in entire columns

            $t->{$ColName}                ## Get a column you know exists
            $t->col($ColName)             ## Get a column or make empty one.

            $t->col($ColName, $ListRef)   ## Set all of a column all at once.
            $t->col_set($ColName, $ListRef)
            $t->col_force($ColName, $ListRef) ## Add but don't check size or
                                              ##  add to custom field list

            $t->col_set($ColName, undef)  ## Delete a column completely

            $t->col_empty()            ## An empty col presized for table
            $t->col_empty(22)          ## An empty col of another length
            $t->col_empty($Col)        ## An empty col sized to match another

            $t->col_default()          ## Default if req. column not found.

            $t->col_exists($Field)     ## Check existence of column
            $t->col_active($Field)     ## Restrict check to fieldlist()

            $t->cols($ColList)         ## Get list of multiple named columns
            $t->cols_hash($ColList)    ## Get hash " " "

            $t->col_rename($Old => $New) ## Change name of columns
            $t->col_rename($Old1 => $New1, $Old2 => $New2) ## Change several

    A "column" is a field in the table and all its data. The column's field
    name is a key in the object itself, and may also optionally be listed in
    a custom field list if present. The column's data is the key's value in
    the hash and is an array ref of values presumed to be of the same data
    type (e. g. string, integer, etc.)

    Sometimes the terms "column" and "field" are used interchangeably in
    this documentation.

    If you already know that a column exists (because you got it from the
    fieldlist() method and you've not previously manipulated _FieldList
    directly but instead carefully used the method calls available for
    that), then you can safely get the column by just looking it up in the
    object itself.

    The col() method does the same thing, but forces the column to spring
    into existence if it did not already (which can also have the
    potentially unwanted side-effect of hiding coding errors in which you
    retreive mis-named columns: so beware). Columns brought into existence
    this way will automatically be pre-sized (i.e. they will will be created
    and set to whatever col_default() returns).

    The col() or col_set() methods can also be used to set a column. When
    the column is set, the list you pass is automatically sized (lengthened
    or truncated) to match the current length of the table. If this is not
    what you want, then call col_force() which will not check whether the
    new column matches the size of the others.

    No matter how you set it, the object now "owns" the list you gave it.

    As a convenience, col(), col_set() and col_force() return the column
    that was set. They silently discard any previous column.

    All three methods of column setting will append the column to the custom
    field list if one is present and the column name is not already listed
    there (by calling fieldlist_add()). They will also call the extend()
    method to ensure all columns have the same length (either others will be
    extended to match the length of the new one, or the new one will be
    extended to match the length of the others).

    col_delete() deletes a column.

    col_empty() returns an anonymous list reference that is pre-sized to the
    length of the table (by default). You could use it to get an empty
    column that you intend to fill up and then later insert into the table
    or use to hold the results of an operation on other columns. If you want
    a different length, specify it as a number or as an array ref whose
    length should be matched.

    col_default() is the internal method that implements the "springing into
    existence" of missing columns. Currently it just calls col_empty().
    Other subclasses might want to have it return undef or a string like
    "NO_SUCH_COLUMN" in order to help track programming errors where
    nonexistent columns are requested.

    cols($FieldList) returns an ordered list of the requested column names.
    If no list is given, then fieldlist() is used.

    cols_hash($FieldList) does the same as cols(), but the result is a hash
    whose keys are the field names and whose values are the columns -- much
    like the original object itself, but not blessed into the class. The
    resulting hash, however, could be used as the prototype for a new
    Data::CTable object (by calling the new() method). However, be warned
    that both objects will think they "own" the resulting shared so be
    careful what you do..... which brings us to this:

    IMPORTANT NOTE ABOUT GETTING COLUMNS: The columns you retrieve from a
    table are still "owned" by the table object as long as it lives. If you
    modify them, you are modifying the table's data. If you change their
    length, then you may be invalidating the table's own expectations that
    all its columns have the same length. So beware.

    Just make yourself a copy of the data if that isn't what you want. For
    example, instead of this:

            my $Foo =    $Table->col('Foo');   ## Reference to actual column

    Do this:

            my $Foo = [@{$Table->col('Foo')}]; ## Shallow copy of the column

            ## Performing your own cleanups or validations

            $t->clean($Sub)           ## Clean with custom subroutine
            $t->clean($Sub, $Fields)  ## Clean specified columns only

            ## Cleaning whitespace

            $t->clean_ws()        ## Clean whitespace in fieldlist() cols
            $t->clean_ws($Fields) ## Clean whitespace in specified cols

            ## Cleaning methods that map character sets



            ## Character mapping utilities (not methods)

            use Data::CTable qw(


            &ISORoman8859_1ToMacRoman(\ $Str)  ## Pass pointer to buffer
            &MacRomanToISORoman8859_1(\ $Str)  ## Pass pointer to buffer

            &ISORoman8859_1ToMacRoman_clean()  ## Operates on $_
            &MacRomanToISORoman8859_1_clean()  ## Operates on $_

    One of the most important things you can do with your data once it's
    been placed in a table in Perl is to use the power of Perl to scrub it
    like crazy.

    The built-in clean_ws() method applies a standard white-space cleanup to
    selected records in every field in the fieldlist() or other list of
    fields you optionally supply (such as fieldlist_all()).

    It does the following cleanups that are deemed correct for the majority
    of data out there:

            - Remove all leading whitespace, including returns (\n)
            - Remove all trailing whitespace, including returns (\n)
            - Convert runs of spaces to a single space
            - Convert empty string values back to undef to save space

    Of course, depending on your data, clean_ws() might just be the first
    thing you do in your cleanup pass. There might be many more cleanups
    you'd like to apply.

    clean() is like clean_ws() except you supply as the first argument your
    own cleaning subroutine (code reference). It should do its work by
    modifying $_.

    Both clean_ws() and clean() apply cleaning ONLY to selected records. If
    this isn't what you want, then select_all() before cleaning.

    Since a cleanup subroutine can do ANY modifications to a field that it
    likes, you can imagine some cleanup routines that also supply default
    values and do other validations.

    For example, a cleanup routine could convert every value in each field
    to an integer, or apply minimum or maximum numerical limits:

            sub {$_ =     int($_)      }
            sub {$_ = max(int($_), 0)  }
            sub {$_ = min(int($_), 200)}

    Or your cleanup routine could use regular expressions to do
    capitalizations or other regularizations of data:

            sub Capitalize {/\b([a-z])([a-z]+)\b)/\U$1\E$2/g}

            $t->clean(\ &Capitalize , ['FirstName', 'LastName']);
            $t->clean(\ &PhoneFormat, ['Phone', 'Fax'         ]);
            $t->clean(\ &LegalUSZip,  ['HomeZip', 'WorkZip'   ]);

    ... and so on. Cleanups are easy to write and quick and easy to apply
    with Data::CTable. Do them early! Do them often!

  Hints for writing cleanup routines

    If your cleanup routine may be used to clean up fields that could be
    empty/undef and empty/undef is a legal value, it should not touch any
    undef values (unintentionally converting them to strings).

    Finally, instead of setting any values to the empty string, it should
    set them to undef instead. This includes any values it might have left
    empty during cleanup. (Using undef instead of empty string to represent
    empty values is one way that Data::CTable likes to save memory in tables
    that may have lots of those.)

    For an example of a well-behaved cleanup routine, consider the following
    implementation of the builtin CleanWhitespace behavior:

            sub CleanWhitespace
                return unless defined;    ## Empty/undef values stay that way
                s/ \s+$//sx;              ## Remove trailing whitespace
                s/^\s+ //sx;              ## Remove leading whitespace
                s/ +/ /g;                 ## Runs of spaces to single space
                $_ = undef unless length; ## (Newly?) empty strings to undef

  Roman character set mapping

    The character set mapping cleanup routines can be used to convert
    upper-ASCII characters bidirectionally between two popular Roman
    Character sets -- Mac Roman 1 and ISO 8859-1 (also sometimes called ISO
    Latin 1) -- i.e. the Western European Roman character sets.

    By default, read() converts all incoming data fields in data files with
    Mac line endings to ISO format when reading in. Conversely, write() does
    the reverse mapping (ISO to Mac) when writing a file with Mac line

    However, you may wish to turn off these default behaviors and instead
    apply the mappings manually, possibly just to certain fields.

    For example, if a table contains fields with non-Roman character sets,
    you would definitely not want to apply these mappings, and instead might
    want to apply some different ones that you create yourself.

  Utility routines for character mapping

    This module can optionally export four utility routines for mapping
    character Latin 1 character sets. Always be sure to map the correct
    direction -- otherwise you'll end up with garbage! Be careful to only
    pass Western Roman strings -- not double-byte strings or strings encoded
    in any single-byte Eastern European Roman or non-Roman character set.

            &ISORoman8859_1ToMacRoman(\ $Str)  ## Pass pointer to buffer
            &MacRomanToISORoman8859_1(\ $Str)  ## Pass pointer to buffer

    These routines translate characters whose values are 128-255 from one
    Western Roman encoding to another. The argument is a string buffer of
    any size passed by reference.

    The functions return a count of the number of characters that were
    mapped (zero or undef if none were).

            &ISORoman8859_1ToMacRoman_clean()  ## Operates on $_
            &MacRomanToISORoman8859_1_clean()  ## Operates on $_

    These routines are variants of the above, but they're versions that are
    compatible with clean() -- they operate on $_ and will take care to
    leave undefined values undefined. They do not have return values.

  More advanced cleaning and validation

    Unfortunately, clean() only lets you operate on a single field value at
    a time -- and there's no way to know the record number or other useful
    information inside the cleaning routine.

    For really powerful cleaning and validation involving access to all
    fields of a record as well as record numbers, see the discussion of the
    calc() method and other methods for doing complex field calculations in
    the next section.

            ## Calculate a new field's values based on two others

            $t->calc($Sub)              ## Run $Sub for each row, with 
                                        ##  fields bound to local vars

            $t->calc($Sub,  $Sel)           ## Use these row nums
            $t->calc($Sub, undef, $Fields)  ## Use only these fields
            $t->calc($Sub,  $Sel, $Fields)  ## Use custom rows, fields

            my $Col = $t->calc($Sub)    ## Gather return vals in vector

            ## Example 1: Overwrite values in an existing column.

            $t->calc(sub{no strict 'vars'; $Size = (stat($Path))[7]});

            ## Example 2: Create empty column; fill fields 1 by 1

            $t->calc(sub{no strict 'vars'; $PersonID = "$Last$First"});

            ## Example 3: Calculate values; put into to table if desired

            $PersonID = $t->calc(sub{no strict 'vars'; "$Last$First"});
            $t->sel('PersonID', $PersonID);

            ## Example 4: Using fully-qualified variable names

            $t->calc(sub{$main::PersonID = "$main::Last$main::First"});

    calc() runs your custom calculation subroutine $Sub once for every row
    in the current selection() or other list of row numbers that you specify
    in the optional $Sel argument.

    This lets you apply a complex calculation to every record in a table in
    a single statement, storing the results in one or more columns, or
    retrieving them as a list.

    Your custom subroutine may refer to the value in any field in the
    current row by using a global variable with the field's name:

    For example, if the table has fields First, Last, and Age, then $Sub may
    use, modify, and set the variables $First, $Last, $Age. (Also known as
    $main::First, $main::Last, $main::Age).

    Modifying any of these specially-bound variables actually modifies the
    data in the correct record and field within the table.

    By default, the fields available to $Sub are all fields in the table.
    calc() must bind all the field names for you for each row, which can be
    time-consuming for tables with very large numbers of fields.

    You can speed up the operation of calc() by listing only the fields your
    $Sub needs in the optional parameter $Fields. Any field names you don't
    mention won't be available to $Sub. Conversely, calc() will run faster
    because it can bind only the fields you actually need.

    If you include non-existent fields in your custom $Fields list, calc()
    creates them for you before $Sub runs the first time. Then your $Sub can
    store field values into the new column, referring to it by name.

    Variables in $Sub are in the "main" package. So you should set $Sub to
    use pacakge "main" in case the rest of your code is not in "main".

    Similarly, if you "use strict", Perl will complain about global
    variables in $Sub. So you may need to assert "no strict 'vars'".

            {   package Foo; use strict;
                $t = ...;
                $t->calc(sub {package main; no strict 'vars'; 
                              $Age = int($Age)});

            ## Or this:

            {   package Foo;  use strict;

                $t = ...;
                {   package main; no strict 'vars';
                    my $Sub = sub {$Age = int($Age)};

    You may be able to get around both problems more easily by prefixing
    each variable reference in $Sub with "main::". This takes care of the
    package name issue and bypasses "use strict" at the same time, at the
    slight cost of making the calculation itself a bit harder to read.

            $t->calc(sub {$main::Age = int($main::Age)}); ## OK in any package

    In addition to the field names, the following three values are defined
    during each invocation of $Sub:

            $_r ($main::_r) -- the row number in the entire table
            $_s ($main::_s) -- the item number in selection or $Recs
            $_t ($main::_t) -- the table object itself

    You could use these values to print diagnostic information or to access
    any of the data, parameters, or methods of the table itself from within
    $Sub. Or you could even calculate field values using $_r or $_s.

    For example, after searching & sorting, you could make a field which
    preserves the resulting sort order for future reference:

            $t->col('Ranking');      ## Create the empty column first.
            $t->calc(sub{$main::Ranking = $main::_s});

    This last example is equivalent to:

            $t->sel(Ranking => [0 .. $#{$t->selection()}]); ## See sel() below

    calc() (see previous secion) is the briefer, more elegant way to do
    batch calculations on entire columns in a table, but it can be slightly
    slower than doing the calculations yourself.

    If you have extremely large tables, and you notice the processing time
    for your calculations taking more than a second, you might want to
    rewrite your calculations to use the more efficient techniques shown

    You will often need to create new calculated columns based on one or
    more existing ones, and then either insert the columns back in the
    tables or use them for further calculations or indexing.

    Examples 1a and 1b create a new field 'NameOK' containing either the
    string "OK" or undef (empty) depending on whether the field 'Name' is
    empty. Just use map() to iterate over the existing values in the other
    column, binding $_ to each value in turn.

            ### Example 1a: Calculation based on one other column

            ## Retrieve column
            my $Name    = $t->col('Name');
            ## Make new column
            my $NameOK  = [map {!!length && 'OK'} @$Name];

            ## Insert column back into table:
            $t->col(NameOK => $NameOK);

            ### Example 1b: Same calculation, in a single statement:

            $t->col(NameOK => [map {!!length && 'OK'} @{$t->col('Name')}]);

    In order to iterate over MULTIPLE columns at once, you need a list of
    the row numbers generated by $t->all() so you can index the two columns
    in tandem. Then, you use map to bind $_ to each row number, and then use
    the expression $t->col($ColName)->[$_] to retreive each value.

    Examples 2a and 2b demonstrate this method. They create a new field
    'FullName' which is a string joining the values in the 'First' and
    'Last' columns with a space between.

            ### Example 2a: Calculation based on multiple columns

            ## Retrieve columns
            my $First = $t->col('First');  
            my $Last  = $t->col('Last' );

            ## Retreive row nums
            my $Nums  = $t->all();

            ## Calculate a new column based on two others
            my $Full  = [map {"$First->[$_] $Last->[$_]"} @$Nums];

            ## Add new column to the table
            $t->col(FullName => $Full);

            ### Example 2b: Same calculation, in a single statement:

            $t->col(FullName => 
                    [map {"$t->col('First')->[$_] t->col('Last')->[$_]"} 

    In examples 1 and 2, you create entirely new columns and then add or
    replace them in the table.

    Using the approach in Examples 3a and 3b, you can assign calculated
    results directly into each value of an existing column as you go.

            ## Example 3a: Calculate by assigning directly into fields...

            my $A = $t->col->('A'); ## This column will be modified
            my $B = $t->col->('B');
            my $C = $t->col->('C');

            foreach @($t->all()) {$A->[$_] = $B->[$_] + $C->[$_];}

            ## Example 3b: Same calculation, in a single statement:

            foreach @($t->all()) {($t->col('A')->[$_] =
                                   $t->col('B')->[$_] + 

    Before writing your code, think about which calculation paradigms best
    suit your needs and your data set.

    Just as Perl Hackers know: There's More Than One Way To Do It!

            ## Make indexes of columns or just selected data in columns

            my $Index1 = $t->index_all($Key);  ## entire column
            my $Index2 = $t->index_sel($Key);  ## selected data only

            ## Make hashes of 2 columns or just selected data in columns

            my $Index1 = $t->hash_all($KeyFld, $ValFld);  ## entire column
            my $Index2 = $t->hash_sel($KeyFld, $ValFld);  ## selected data

    index_all() creates and returns a hash (reference) that maps keys found
    in the column called $Key to corresponding record numbers.

    Ideally, values in $Key would be unique (that's up to you).

    If any values in $Key are NOT unique, then later values (higher record
    numbers) will be ignored.

    index_sel() creates and returns a hash (ref) that maps keys found in the
    SELECTED RECORDS of column $Key to corresponding record numbers.

    Any keys in unselected records are ignored. Otherwise, the behavior is
    equivalent to index_all().

    hash_all() and hash_sel() are similar, except they create and return
    hashes whose keys are taken from column $KeyFld, but whose values are
    from $ValFld in the corresponding records.

    So, for example, imagine you have a tab-delimited file on disk with just
    a single tab per line (2 fields) and no header row. The entries on the
    left side of the tab on each line are keys and the right side are
    values. You could convert that file into a hash in memory like this:

            my $t = Data::CTable->new({_HeaderRow=>0, _FieldList=>[qw(F1 F2)]}, 

            my $DeptLookup = $t->hash_all(qw(F1 F2));

  Reverse indexes

    If you'd like an index mapping record number to key, just get
    $t->col($Key). That's what the data columns in Data::CTable are.

            ## Getting or setting rows / records

            $t->row($Num)             ## Get a row or make empty one.

            $t->row($Num, $HashRef)   ## Set all of a row all at once.
            $t->row_set($Num, $HashRef) 

            $t->row_set($Num, undef)  ## Delete a row completely
            $t->row_delete($Beg, $End)## Delete a range of rows      

            $t->row_move($Old, $New)   ## Move a row to before $New

            $t->row_empty()            ## An empty hash
            $t->row_exists($Num)       ## True if $Num < $t->length()

            $t->rows($RowList)         ## Get list of multiple row nums

            $t->row_list($Num)          ## Get row vals as a list
            $t->row_list($Num, $Fields) ## Get row vals: specified fields

            $t->row_list_set($Num, undef, $Vals)   ## Set row vals as a list
            $t->row_list_set($Num, $Fields, $Vals) ## Set row vals as a list
            $t->row_list_set($Num, $Fields)        ## Set vals to empty/undef

    Usually, when working with Data::CTable objects, you are operating on
    entire columns or tables at a time (after all: any transformation you do
    on one record you almost always want to do on all records or all
    selected ones).

    You should very rarely need to access data by retrieving rows or setting
    rows, moving them around individually, and so on. (It's much cleaner,
    and much more efficient to manipulate the selection() (list of selected
    row numbers) instead -- just delete a row number from the selection, for
    example, and then for most operations it's almost as if the row is gone
    from the table, except the data is really still there.)

    However, if on rare occasions you really do need direct row operations,
    you're reading the right section.

    A row is generally accessed as a hash. The hash you provide or get back
    is not saved by the object in any way. Data values are always copied in
    or out of it, so you always "own" the hash.

    Rows are specified by $Num -- the row number with in the unsorted
    columns (the raw data in the table). These numbers are just array
    indices into the data columns, and so their legal range is:

            [0 .. ($t->length() - 1)]     ##   (Zero-based row numbering.)

    The row hash (or "record") has keys that are field names and values that
    are copies of the scalar values stored in the data columns within the

    row() always copies only the fields in fieldlist(), except for
    row_list() which allows you to specify an optional $Fields parameter
    which can override the current fieldlist().

    If the fieldlist happens to be a subset of all fields, but you really
    want to get all fields in your record, then call fieldlist_set(0) first
    to permanently or temporarily delete it.

    row() and row_get() always return a hash.

    row($Num, $Hash), row_set() take a hash and set just the fields you
    specify in the hash (in the given row of course). Any non-existent field
    names in the hash are created, so be careful.

    In fact, in general with either getting or setting rows, any
    non-existent fields mentioned will be created for you (by internally
    calling col()). So you could build a whole table of 100 rows by starting
    with an empty, new, table and setting row 99 from a hash that gives the
    field names.

    Setting a row number higher than any existing row number with row(),
    row_set() or row_force() will automatically set the new length of the
    entire table to match (extending all the columns with empty rows as

    table auto-extend by one with each row you set. This is slow and gets
    rapidly slower if there's lots of data because the arrays holding the
    data columns will keep getting reallocated on every insert. Instead,
    first pre-extend the table to your highest row number by calling
    length($Len), and then set your rows. Or easier: if convenient just set
    your rows starting with the highest-numbered one first. If you don't
    know how many you'll have, guess or estimate and pre-extend to the
    estimated number and then cut back later. This will be faster than
    extending all columns by one each time.

    row_delete() removes a row or range of rows completely from the table.
    Any rows above the deleted ones will move down and the table's length()
    will decrease. If the data columns are very large, this could be a bit
    slow because a lot of data could be moved around. The low and high row
    numbers will be limited for you to 0 and length() - 1, respectively.
    Null ranges are OK and are silently ignored. The range is inclusive, so
    to delete just row 99, call row_delete(99) or row_delete(99,99).

    EFFICIENCY NOTE: Don't call row_delete() to remove lots of individual
    rows. Instead, select those row numbers by setting the selection (if not
    already selected), and then invert the selection using
    selection_invert(), so the undesired rows are deselected, and then use
    the cull() method to rewrite the entire table at once. The deselected
    rows will be omitted very efficiently this way.

    row_move() moves a row from its $Old row number to the position before
    the row currently in row $New (specify $New = length() to move the row
    to the end). Again, in shuffling data in columns, lots of data could get
    moved around by this operation, so expect it to be slow. If as with
    row_delete(), if you will be doing several moves, consider building an
    appropriate selection() first, and then using cull() instead.

    Using row_delete() and row_move() to shift records around changes the
    record numbers of the affected records and many others in the table. The
    record numbers in the custom selection, if any, are updated to reflect
    these changes, so the records that were selected before will still be
    selected after the move (except those that were deleted of course). If
    you had a private copy of the selection, your copy will likely become
    outdated after these operations. You should get it again by calling

    row_empty() returns a hash whose keys are the entries in fieldlist() and
    whose values are undef. (You could use it to fill in values before
    calling row_set()). Note: in this class, row_empty() does exactly the
    same thing as fieldlist_hash() when the latter is called with no

    row_exists() returns true if "(($Num >= 0) && ($Num < $t->length()))".

    rows() calls row() for each row num in a list and returns a list of the
    resulting hashes.

    row_list() gets row values as a list instead of a hash. They appear in
    the order specified in fieldlist() unless you supply an optional $Fields
    parameter listing the fields you want to get.

    row_list_set() sets row values as a list instead of a hash. Pass your
    own $Fields list or undef and fieldlist() will be used. $Values should
    be a list with the same number of values as fields expected; any
    shortage will result in undef/empty values being set.

            ## Getting or setting table length

            $t->length()        ## Get length

            $t->length(22)      ## Set length (truncate or pre-extend)

            $t->extend()        ## Set length of all columns to match longest

    The length* methods assume the table already has columns of equal
    length. So the length of the table is the length of any field taken at
    random. We choose the first one in the field list.

    Setting the length will truncate or pre-extend every column in the table
    to a given length as required.

    (Pre-extending means setting each column's length via $# so that it has
    the correct number of entries already allocated (and filled with undef)
    so that operations that fill up the table can be done much more quickly
    than with push().

    However, if a new column has been added directly, or a table has been
    constructed out of columns whose length may not initially match, the
    extend() method may be (should be) called to inspect all columns and
    extend them all to match the longest one. Note that extend() operates on
    all fields in the object, ignoring the custom _FieldList if any.

    The length of a table with no columns is zero.

            ## Getting or setting the custom selection list itself (_Selection)

            $t->selection()               ## Get sel if any; else all()

            $t->selection($List)          ## Set sel (list of rec nums)
            $t->selection(0)              ## Remove sel (select all)

            $t->selection_inverse()       ## Get inverse copy of selection
            $t->select_inverse()          ## Invert the selection

            $t->selection_validate()      ## Remove invalid #s from sel

            ## List of all rec nums present (regardless of selection)


            ## Getting or setting just selected fields in columns
            ## (as contrasted with col() and friends).

            $t->sel($ColName)             ## Get col but only records in sel

            $t->sel($ColName, $ListRef)   ## Set selected fields in col...
            $t->sel_set($ColName, $ListRef) ##... in selection order

            $t->sel_set($ColName)         ## Set selected fields to undef

            $t->sels($ColList)            ## Like cols, but selected fields
            $t->sels_hash($ColList)       ## " "   cols_hash()... " " " "

            ## Finding out size of selection (number of rows)

            $t->sel_len()                 ## Get number of selected rows.

    A selection is an ordered list of record numbers. The record numbers in
    the selection may be a subset of available records. Furthermore, they
    may be in non-record-number order, indicating that the records have been

    Record numbers are numeric array indices into the columns in the table.
    It is an error for any selection list to contain an index less than zero
    or greater than (length() - 1), so if you set a selection explicitly, be

    Any selection list you get or set belongs to the object. Be careful of
    modifying its contents.

    The custom selection, if any, is stored internally in the _Selection
    parameter. If this parameter is absent, the selection defaults to all()
    -- i.e. a list of all record numbers, in order: [0..($this->length() -
    1)] (which becomes [] if length() is 0).

    REMEMBER: length() is one-based, but record numbers are zero-based.

    Removing the selection (that is, removing the LIST itself of which
    records are selected), is the same as selecting all records.
    consequently, selection(0), selection_delete(), and select_all() are all

    selection_validate() removes any entries from the current _Selection
    list (if any) that are not valid record numbers -- i.e. it removes any
    record whose integer value is < 0 or greater than length() - 1. This
    routine is mainly used by other methods that might delete records, such
    as length_set().

    Getting or setting just selected data from columns

    Sometimes, you don't want to get/set entire columns, you instead want to
    get or set data in just the selected fields in a column.

    The sel(), sel_get(), sel_set(), sels() and sels_hash() methods are
    analagous to the corresponding col(), ... cols_hash() methods except in
    these two ways:

    - the 'sels' variants get or set just selected data, as determined by
    the current selection(), which gives an ordered list of the selected /
    sorted records.

    - the 'sels' variants all make COPIES of the data you request or supply
    -- the data is copied out of or into the correspnding column. So, you
    "own" any vector you pass or receive in reply.

    So, for example, imagine you have just set selection() to only list
    record numbers where the LastName field is not empty. Then you have
    called sort() to sort those record numbers by the LastName field. You
    could then call $t->sel('LastName') to get a sorted list of all
    non-empty last names.

    It might be helpful to think of "sel" as short for "selected". So
    $t->sel('LastName') would mean "get the selected field values from the
    LastName field".

            ## Modifying the table's custom selection (_Selection)

            $t->select_all()      ## Set _Selection = $t->all() or undef
            $t->select_none()     ## Set _Selection = []
            $t->select_inverse()  ## Invert the curr. sel. (and get it) 

            ## Specific searches: "the select() methods"

            $t->select($Field1=>$Sub1, ## Del nonmatching recs from sel.
                       $Field2=>$Sub2, ## i.e. narrow sel. to match

            $t->omit  ($Field1=>$Sub1, ## Del matching recs from sel.

            $t->add   ($Field1=>$Sub1, ## Add matching recs to sel.

            $t->but   ($Field1=>$Sub1, ## Add nonmatching recs to sel.

            ## Getting useful lists of record numbers...

            $t->all()                  ## Get "full" sel. (all record #s)
            $t->selection()            ## Get current selection
            $t->selection_inverse()    ## Get inverse copy of curr. sel.

            ## Example 1: Refine a selection by narrowing down...

            $t->select(Field1 => sub {$_});
            $t->select(Field2 => sub {$_});
            $t->select(Field3 => sub {$_});

            ## Example 2: Manually refine and set the selection...

            $Sel = [grep {$t->col($Field1)->[$_]} @{$t->all      ()}];
            $Sel = [grep {$t->col($Field2)->[$_]} @$Sel];
            $Sel = [grep {$t->col($Field3)->[$_]} @$Sel];
            $t->selection($Sel);    ## Set the selection when done.

            ## Example 3: Complex manual search using calculated value
            my $A = $t->col('A');
            my $B = $t->col('B');
            my $S = [grep 
                     {my $X = $A->[$_] + $B->[$_]; ($X > 100 && $X < 200);} 
                     @{$t->all()}]; ## Or could start with $t->selection().
            $t->selection($S);      ## Set the selection when done.

            ## Example 4: Refine a selection by building up...

            $t->add(Field1 => sub {$_});
            $t->add(Field2 => sub {$_});
            $t->add(Field3 => sub {$_});

            ## Example 5: Combining the select() methods to build a query...

            $t->select(Status  => sub {/prime/i   });
            $t->omit  (DueDate => sub {$_ > $Today});
            $t->add   (Force   => sub {$_         });

    select() and its friends omit(), add(), and but(), known collectively as
    "the select() methods," all work similarly: they take a series of one or
    more pairs indicating matches to be done, where each match is specified
    as (FieldName => Subroutine).

    In addition to the field names already present in the table, the
    FieldName in any Spec may also be one of these two special

        the record number of the record being compared

        the numerical position of the record being compared within the
        previous selection (only usable with select() and omit() since add()
        and but() by definition operate on non-selected records).

    For example:

            ## Match 2nd 100 rec numbers
            $t->select(_RecNum => sub {$_ >= 100 && $_ <= 199});

            ## Match 2nd 100 currently selected/sorted items
            $t->select(_SelNum => sub {$_ >= 100 && $_ <= 199});

    Be careful when using _SelNum in a search. In the above _SelNum search
    example, since the selection itself will be modified by select(), the
    items that were formerly selection items 100 - 199 will now be _SelNum 0
    - 99 in the new selection.

    The Subroutine is an anonymous grep-style predicate that operates on $_
    and should return true/false to indicate a match with an element of the
    field FieldName.

    The order of multiple matches in a single method call is significant
    only in that the searches can be faster if the field that will match the
    fewest records is listed first.

    A given FieldName may be listed in the specs more than once if it has
    multiple search criteria that you prefer to execute as multiple
    subroutines (though it would be more efficient on very large tables to
    combine their logic into one subroutine joined with "&&").

    Each field match will be applied (with an implied AND joining them) to
    determine whether the record itself matches. Then, based on whether the
    record itself matches, it will either be added or deleted from the
    selection based on which method is being called:

            method...  operates on...     action....
            select()   selected records   Keep only recs that DO     match
            omit()     selected records   Keep only recs that DO NOT match
            add()      non-selected recs  Add       recs that DO     match
            but()      non-selected recs  Add       recs that DO NOT match

    Here's how to think about what's going on:

            methods... think...
            select()   "SELECT things matching this"...
            omit()     "... then OMIT those matching this."

            select()   "SELECT things matching this"...
            add()      "... and ADD any others matching this."

            select()   "SELECT things matching this"...
            but()      "... and add any others BUT those matching this."

    select() and omit() both NARROW the selection.

    add() and but() both INCREASE the selection.

    IMPORTANT: You DO NOT need to use these select() routines to work with
    selections. It may be much easier for you to clarify your logic, or more
    efficient to express your search, using a single grep or series of grep
    operations as in Examples 2 or 3 above.

    Building the selection manually is required if you want to filter based
    on any COMPLEX RELATIONSHIPS BETWEEN FIELDS. For example, if you want to
    add two fields and match or reject the record based on the sum of the

    In Example 3 above, we add the values in fields "A" and "B" and then
    match the record only if the SUM is between 100 and 199. By grepping to
    produce a subset of @{$t->all()}, you end up with a Selection -- a list
    of record numbers you want "selected". Then you call $t->selection() to
    put the selection you built into the object.

    If you had instead wanted to narrow an existing selection in the above
    example, you would start with $t->selection() (which defaults to
    $t->all()) instead of starting with $t->all().

    Each of the select() methods returns $this->selection() as a

  The effects of modifying a sorted selection

    Generally, you should sort AFTER finding, and you should not generally
    rely on sort order after doing a find. But in case you want to know, the
    following chart explains what happens to the sort order after the
    various select() commands are called (at least in the current
    implementation, which may change without notice):

            method... effect on an existing sort order...
            select()  relative sort order is preserved (stay sorted)
            omit()    all selected recs restored to "natural" order (unsorted)
            add()     orig. recs preserved; new recs appended: "natural" order
            but()     orig. recs preserved; new recs appended: "natural" order

    In other words, you could sort() first and then call select() to narrow
    down the selection repeatedly without disrupting the sort order.
    However, any of the other methods will disrupt the sort order and you
    would need to re-sort. The preservation of order when using select(),
    and other sort order effects, are likely but not guaranteed to be
    preserved in future implementations.

  Hints about Boolean logic

    Consider the following example and the alternative below it. You might
    initially think these are equivalent, but they're not:

            ## Case 1:

            my $Sel = $t->add(Force  => sub {$_ == 1      });
            my $Sel = $t->add(Status => sub {$_ eq 'Prime'});

            ## Case 2:

            my $Sel = $t->add(Force  => sub {$_ == 1      },
                              Status => sub {$_ eq 'Prime'});

    Case 1 extends the selection by adding all records where Force == 1, and
    then extends it again by adding all additional records where Status eq

    Case 2 adds only those records where: Force == 1 AND ALSO, IN THE SAME
    RECORD, Status eq 'Prime'.

    One final note about logic. This is not SQL and these select() etc.
    routines are not meant to replace the full power of a programming

    If you want full Boolean expressions, use the power of Perl to form your
    own arbitrarily complex query using grep as in Example 3 above.

    Writing your own grep is also almost always faster than chaining the
    builtin select() methods or using multiple Field / Sub specifications,
    so keep that in mind when working with extremely large data sets.

    With tables of only a few thousand records or so, you probably won't
    notice the difference in efficiency.

            ## Sort the current table's _Selection

            $t->sort()                       ## Use existing/default params
            $t->sort([qw(Last First Phone)]) ## Specify _SortOrder (fields)
            $t->sort(                        ## Named-parameter call:
                     _SortOrder => [...],    ##  override sort-related params.
                     _Selection => [...],    ##  (See param lists above).
                     _SortSpecs => {...},
                     _SRoutines => {...},

    The sort() method modifies the _Selection (creating one with all records
    if it was missing, undef, or not supplied by caller) so that the record
    numbers listed there are sorted according to the criteria implied by
    _SortOrder, _SortSpecs, _SRoutines, etc.

    For example, before sorting, a table's "natural" order might be:

            Rec# First  Last Age State
            0    Chris  Zack 43  CA
            1    Marco  Bart 22  NV
            2    Pearl  Muth 15  HI

    ... and its selection() method would yield: [0, 1, 2] -- which is a list
    of all the records, in order.

    After calling $t->sort([Last]), selection() would yield [1, 2, 0]. So
    displaying the table in "selection" order would yield:

            Rec# First  Last Age State
            1    Marco  Bart 22  NV
            2    Pearl  Muth 15  HI
            0    Chris  Zack 43  CA

    IMPORTANT: sorting does not alter any data in the table. It merely
    alters the _Selection parameter (which you can then get and set using
    the selection() methods described above).

    If you want to permanently alter the table's data in memory so that the
    new sorted order becomes the "natural" order, you can use the cull()
    method to modify the original object, the snapshot() method to make a
    new object, or use the write() method to write the data to disk in
    selected/sorted order and then read() it back again.

  Using the Named-parameter calling convention with sort()

    You may specify any combination of the parameters listed above when
    calling sort(). Any you specify will be used IN PLACE OF the
    corresponding parameters already found in the object.

    If you specify _Selection using the named-parameter calling, the sort()
    method reserves the right to "own" the list you provide, and use it as
    the object's new _Selection, possibly discarding the previous
    _Selection, if any and modifying the one you provided. So don't make any
    assumptions about ownership of that list object after calling sort().
    Luckily, you will rarely need to provide _Selection explicitly since
    generally you'll want to be sorting the selection() already inherent in
    the object.

    sort() returns the _Selection list owned by the object (the same list
    that would be returned if you called the selection() method immediately
    after calling sort()).

    See the next sections for complete descriptions of _SortOrder and other
    sorting parameters.

            ## Getting / Setting table's default _SortOrder

            $t->sortorder()         ## Get sortorder (default is [])
            my $Order = [qw(Last First State Zip)];
            $t->sortorder($Order)   ## Set sortorder (use [] for none)

            $t->sortorder_default() ## Get the object's default sort order ([])

    The sort order is an optional list of field names on which to sort and
    sub-sort the data when sorting is requested. The field names must be the
    names of actual columns in the table. The names in the sort order do not
    necessarily need to coincide with the custom fieldlist if any.

    There is one special value that can be included: _RecNum. This sorts on
    the imaginary "record number" field. So for example, you could specify a
    sort order this way:

            [qw(Last First _RecNum)]

    (There is no point in putting _RecNum anywhere except at the end of the
    sort order because no two records will ever have the same record number
    so there will be no further need to disambiguate by referring to
    additional fields.)

    Sorting by _RecNum adds a bit of computational overhead because sort()
    first builds a record number vector for use in sorting, so for very
    large tables, don't do it unless you really need it.

    A sort order can be specified each time the object is sorted (see the
    sort() method for details).

    Or, the object's sort order can be set once, and then sort() will use
    that sort order when no other sort order is specified.

    If sorting is done when there is no sort order present in the object or
    specifed for the sort() method, the selection is sorted by record number
    (i.e. it is "unsorted" or returned to its "natural" order).

    In order words, a sortorder that is undef or [] is considered the same
    as: [qw(_RecNum)]. This is sometimes called "unsorting".

    In order to decide how values in each field should be compared, sort()
    is informed by SortSpecs (specifying SortType and SortDirection for each
    field) and by SortRoutines, each of which may similarly either be
    pre-set for the object or specified when calling sort() -- see below for
    further details.

            ## Getting / Setting table's default _SortSpecs

            $t->sortspecs()         ## Get sortspecs (default is {} -- none)

            my $Specs = {Last => {SortType       => 'String' , 
                                  SortDirection  => -1        },
                         Zip  => {SortType       => 'Integer' }};

            $t->sortspecs($Specs)   ## Set sortspecs

            $t->sortspecs_default() ## Get the object's default sort specs ({})

    The sortspecs are an optional hash mapping field names to "sort

    Each field's sort specification may specify zero or more of these

        the sort type to use (For example: String, Integer)

        the sort direction (1: ascending, -1: descending)

    Sortspecs can be specified when calling the sort() routine, or, a set of
    specs can be placed beforehand into the object itself and those will be
    used by sort() if no other specs are given.

    For any field listed in the sort order at the time of sorting, but
    lacking a sort spec or any component of the sort spec, the object's
    default sort type (see sorttype_default()) and default sort direction
    (see sortdirection_default()) will be used.

    In addition to getting/setting sort specs as a whole, they may be
    gotten/set on a per-field basis, too:

            sortspec($Field)       ## Get sortspec for $Field or default spec

            my $Spec   = {SortType => 'Integer', SortDirection => -1};
            sortspec('Zip', $Spec) ## Set sortspec

            sortspec_default()     ## Get a sortspec with all defaults filled in

    For any $Field not found in the object's sortspecs, sortspec($Field)
    returns the same thing returned by sortspec_default(), which is a
    sortspec filled in with the default sort type and sort direction (see

    For a list of available built-in SortTypes, and instructions for how to
    define your own, see SORT ROUTINES, below.

            ## Getting / Setting table's default _DefaultSortDirection

            $t->sortdirection_default()     ## Get default sort direction

            $t->sortdirection_default(-1)   ## Set default sort direction

    Each element in a sort specification can optionally specify a sort

    1 = ascending, -1 = descending

    For any sort specs that don't specify a direction, the object's default
    sort direction will be used. Use these routines to get/set the default
    sort direction.

            ## Getting / Setting table's default _DefaultSortType

            $t->sorttype_default()          ## Get default sort type

            $t->sorttype_default('Integer') ## Set default sort type

    Each element in a sort specification can optionally specify a sort type.
    The sort type is a string (like 'String' or 'Integer' or 'Date') that
    selects from one or more sort routines. (See Sort Routines, below).

    There are several sort routines built into the CTable object, and you
    can also add as many of your own routines (and hence Sort Types) as you
    like or need. This allows for very flexible sorting.

    For any sort specs that don't specify a type, the object's default sort
    type will be used. Use these routines to get/set the default sort type,
    which initially is 'String'.

            ## Getting / Setting table's custom sort routines (_SRoutines)

            $t->sortroutine($Type)       ## Get a sort routine for $Type

            $t->sortroutine($Type, $Sub) ## Set a sort routine for $Type

            $t->sortroutine($Type, 0   ) ## Remove sort routine for $Type
            $t->sortroutines()           ## Get hash of any sort routines
            $t->sortroutines_builtin()   ## Get hash of builtin routines

    Each SortType in the sortspecs should have a corresponding sort routine
    (any unrecognized type will be sorted using the 'String' sort routine).

    The sort() command looks up the appropriate sort routine for each field
    it is asked to sort, based on the SortType for that field, as given in
    the sortspecs, as described above.

    Builtin sort types, recognized and implemented natively by this module,

            String   ## Fastest case-sensitive compare (data is string)
            Text     ## Insensitive compare (lowercases, then compares)
            Number   ## Number works for floats or integers
            Integer  ## Faster than floats.  Uses "use integer"
            DateSecs ## Same as integer; assumes date in seconds
            Boolean  ## Treats item as a Perlish boolean (empty/undef = false)

    The above sort types are always recognized. Additional sort types may be
    added by subclasses (and could shadow the builtin implementations of the
    above types if desired) and/or may be added to instances (and again
    could shadow the above implementations), and/or may be specified when
    the sort() method is called, once again optionally shadowing any deeper

    A custom sort routine is called with two arguments, each of which is a
    pointer to a scalar. The sort routine should dereference each pointer
    and compare the resulting scalars, returning -1 if the first scalar is
    smaller than the second, 1 if it is larger, and 0 if they are considered

    For example, here is the built-in comparison routine for 'String':

            sub {   $ {$_[0]}  cmp    $ {$_[1]} }

    NOTE: Your custom sort routines should NOT compare $a and $b as with
    Perl's builtin sort() command.

    Examine the variable $BuiltinSortRoutines in this module's
    implementation to see some additional examples of sort routines.

    Internally, sort() calls the sortroutines() method to get a hash that
    should consist of all builtin sort routines with the per-object sort
    routines, if any, overlaid. sortroutines() in turn calls the
    sortroutines_builtin() method to get a copy of the hash of all builtin
    sort routines for the object. (So a subclass could easily add additional
    SortTypes or reimplement them by just overriding sortroutines_builtin()
    and adding its own additional routines to the resulting hash.)

    sortroutine() may be called to get or set a custom sort routine for a
    given type in the given object.

    There is no way to directly manipulate the builtin sort routines for the
    entire class. To accomplish that, you should define and use a subclass
    that extends sortroutines_builtin() to add its own routines.

    For example:

            {   ## A subclass of Data::CTable with an INetAddr SortType.
                package IATable;    use vars qw(@ISA);    @ISA = qw(Data::CTable);

                sub sortroutines_builtin
                    my $this = shift;
                    my $CustomRoutines = 
                    {INetAddr => 
                     sub {use integer; ip2int($ {$_[0]}) <=> ip2int($ {$_[1]})}};
                    my $AllRoutines = 
                    {%{$this->SUPER::sortroutines_builtin()} %$CustomRoutines};

                sub ip2int {.....}  $# Could memoize & inline for efficiency

            my $Table = IATable::new(......);

    The IATable class would then have all the same features of Data::CTable
    but would then also support the INetAddr SortType.

            ## Freeze data layout: re-order columns; omit unused fields
                    $t->cull(...params...)     ## Rebuild table in order
            my $s = $t->snapshot(...params...) ## Make copy as if rebuilt

    The cull() method re-writes all data in the table to be in the order
    indicated in _Selection (if present). This will cause any records not
    listed in _Selection to be omitted (unless selection is null in which
    case all records are retained in original order).

    In addition, if there is a custom field list present, it removes any
    fields NOT mentioned in _FieldList.

    The snapshot() method is similar, except instead of modifying the object
    itself, it makes a copy of the object that's equivalent to what cull()
    would have created, and returns that new object, leaving the original
    untouched. (All data structures are deep-copied from the old object to
    the new one, leaving the objects totally independent.)

    cull() and snapshot() both take two optional named parameters:
    _FieldList and/or _Selection to be used in place of the corresponding
    parameters found in the object.

    If only a single argument is supplied, it is assumed to be _Selection.

            ## Get current value

            $t->lineending()          ## Get actual setting: string or symbol
            $t->lineending_symbol()   ## Get setting's symbolic value if possible
            $t->lineending_string()   ## Get setting's string value if possible

            ## Set value

            $t->lineending($Ending)   ## Will be converted internally to symbol

            ## Convert a value to symbol or string form

            $t->lineending_symbol($L) ## Convert string form to symbolic form
            $t->lineending_string($L) ## Convert symbol form to string form

            ## Get internal conversion hash tables

            $t->lineending_symbols()  ## Hash ref mapping known strings to symbols
            $t->lineending_strings()  ## Hash ref mapping known symbols to strings

    Use these accessor functions to get/set the _LineEnding parameter.

    You can set the parameter in either string or symbol form as you wish.
    You can get it in its raw, as-stored, form, or, you can get it in string
    form or symbol form as desired.

    Finally, some utility conversion calls allows you to convert a string
    you have on hand to a symbolic form. For example:

            $L = "\x0D";
            print ("This file uses " . $t->lineending_symbol($L) . " endings.");

    This would print:

            This file uses mac endings.

    By default, Data::CTable makes cached versions of files it reads so it
    can read them much more quickly the next time. Optionally, it can also
    cache any file it writes for quicker re-reading later.

    On Unix systems, cache files are always created with 0666 (world-write)
    permissions for easy cleanup.

    When reading files, Data::CTable checks the _CacheOnRead parameter. If
    that parameter is true, which it is by default, the module tries to find
    an up-to-date cache file to read instead of the original. Reading a
    cache file can be 10x faster than reading and parsing the original text

    In order to look for the cache file, it must first calculate the path
    where the cache file should be located, based on the _FileName of the
    file to be read.

    The path of the cache file is calculated as follows:

    If the _CacheSubDir parameter is a RELATIVE PATH, then it is appended to
    the directory component of _FileName to arrive at the directory to use
    to store the cache file. If it is an ABSOLUTE PATH, then _CacheSubDir is
    used by itself. (The trailing path separator is optional and an
    appropriate one will be added by Data::CTable if it is missing.)

    The file name of the cache file is calculated as follows:

    If the _CacheExtension parameter is specified, it is appended to the
    base file name component from the _FileName parameter. If you want the
    cached file name to be the same as the name of the original file, you
    can set _CacheExtension to "", which is not recommended.

    Then, the cache path and cache file name are joined to arrive at the
    name of the cache file. If both _CacheSubDir and _CacheExtension were
    empty, then the cache file path will be the same as the _FileName, and
    Data::CTable will refuse to either read or write a cache file, so
    setting these fields both to empty is equivalent to setting _CacheOnRead
    to false.

    The cache file contains a highly-efficient representation of all the
    following data that would otherwise have to be determined by reading and
    parsing the entire text file:

            - All the data columns (field values)
            - _FieldList:  The list of fields, in order
            - _HeaderRow:  Whether a header row is / should be present
            - _LineEnding: The line ending setting
            - _FDelimiter: The field delimiter setting

    If found prior to a read(), AND, the date of the cache file is LATER
    than the date of the original file, the cache file is used instead. (If
    the date is EARLIER, then the cache file is ignored because it can be
    presumed that the data inside the text file is newer.)

    If cacheing is ON, then after successfully reading the text file (either
    because there was no cache file yet or the cache file was out of date or
    corrupted or otherwise unusable), read() will then try to create a cache
    file. This, of course, takes some time, but the time taken will be more
    than made up in the speedup of the next read() operation on the same

    If creating the cache file fails (for example, because file permissions
    didn't allow the cache directory to be created or the cache file to be
    written), read() generates a warning explaining why cacheing failed, but
    the read() operation itself still succeeds.

    No parameters in the object itself are set or modified to indicate the
    success or failure of writing the cache file.

    Similarly, there is no way to tell whether a successful read() operation
    read from the cache or from the original data file. If you want to be
    SURE the reading was from the data file, either turn off _CacheOnRead,
    or call the read_file() method instead of read().

    NOTE: because the name of the cache file to be used is calculated just
    before the read() is actually done, the cache file can only be found if
    the _CacheSubDir and _CacheExtension are the same as they were when the
    cache was last created. If you change these parameters after having
    previously cached a file, the older caches could be "orphaned" and just
    sit around wasting disk space.

  Cacheing on write()

    You may optionally set _CacheOnWrite (default = false) to true. If done,
    then a cache file will be saved for files written using the write()
    command. Read about write() below for more about why you might want to
    do this.

    When Data::CTable needs to write a file, (a cache file or a data file),
    it automatically tries to create any directories or subdirectories you
    specify in the _FileName or _CacheSubDir parameters.

    If it fails while writing a data file, write() will fail (and you will
    be warned). If it fails to create a directory while writing a cache
    file, a warning will be issued, but the overall read() or write()
    operation will still return a result indicating success.

    Any directories created will have the permissions 0777 (world-write) for
    easy cleanup.

    Generally, the only directory the module will have to create is a
    subdirectory to hold cache files.

    However, since other directories could be created, be sure to exercise
    caution when allowing the module to create any directories for you on
    any system where security might be an issue.

    Also, if the 0666 permissions on the cache files themselves are too
    liberal, you can either 1) turn off cacheing, or 2) call the
    prep_cache_file() method to get the name of the cache file that would
    have been written, if any, and then restrict its permissions:

            chmod (0600, $this->prep_cache_file());

            ## Replacing data in table with data read from a file

            $t->read($Path)     ## Simple calling convention

            $t->read(           ## Named-parameter convention

                 ## Params that override params in the object if supplied...

                 _FileName      => $Path, ## Full or partial path of file to read

                 _FieldList     => [...], ## Fields to read; others to be discarded

                 _HeaderRow     => 0,     ## No header row (_FieldList required!)

                 _LineEnding    => undef, ## Text line ending (undef means guess)
                 _FDelimiter    => undef, ## Field delimiter (undef means guess)

                 _ReturnMap     => 1,     ## Whether to decode internal returns
                 _ReturnEncoding=>"\x0B", ## How to decode returns.
                 _MacRomanMap   => undef, ## Whether/when to read Mac char set 

                 _CacheOnRead   => 0,     ## Enable/disable cacheing behavior
                 _CacheExtension=> ".x",  ## Extension to add to cache file name
                 _CacheSubDir   => "",    ## (Sub-)dir, if any, for cache files

                 ## Params specific to the read()/write() methods...

                 _MaxRecords    => 200,   ## Limit on how many records to read

            $t->read_file()     ## Internal: same as read(); ignores cacheing

    read() opens a Merge, CSV, or Tab-delimited file and reads in all or
    some fields, and all or some records, REPLACING ANY EXISTING DATA in the
    CTable object.

    Using the simple calling convention, just pass it a file name. All other
    parameters will come from the object (or will be defaulted if absent).
    To specify additional parameters or override any parameters in the
    object while reading, use the named-parameter calling convention.

    See the full PARAMETER LIST, above, or read on for some extra details:

    _ReturnMap controls whether return characters encoded as ASCII 11 should
    be mapped back to real newlines (""\n"") when read into memory. If
    false, they are left as ASCII 11 characters. (default is "true")

    _ReturnEncoding controls the character that returns are encoded as, if
    different from ASCII 11.

    _FieldList is an array (reference) listing the names of fields to
    import, in order (and will become the object's _FieldList upon
    successful completion of the read() operation). If not provided and not
    found in the object, or empty, then all fields found in the file are
    imported and the object's field list will be set from those found in the
    file, in the order found there. If _HeaderRow is false, then this
    parameter is required (either in the object or as a formal parameter)
    and is assumed to give the correct names for the fields as they actually
    occur in the file. If _HeaderRow is true and _FieldList is provided,
    then _FieldList specifies the (sub-)set of fields to be read from the
    file and others will be ignored.

    _HeaderRow, which defaults to true, if set to false, tells read() to not
    expect a header row showing the field names in the file. Instead, it
    assumes that the _FieldList gives those (and _FieldList must therefore
    be specified either as a parameter or an existing parameter in the

    _MaxRecords (optional) is an upper limit on the number of fields to
    import. If not specified, or zero, or undef, then there is no limit; all
    records will be imported or memory will be exhausted.

    read() returns a Boolean "success" code.

    If read() returns false, then it will also have set the _ErrorMsg
    parameter in the object. It may or may not have partially altered data
    in the object if an error is encountered.

    After a successful read:

    fieldlist() (the object's _FieldList parameter) tells which fields were
    actually read, in what order. It may omit any fields requested in
    _FieldList that were not actually found in the file for whatever reason.

    length() tells how many fields were read.

    The selection() is reset to no selection (all selected / unsorted)

    The object's _FileName parameter contains the path to the file that was
    read. If the _FileName you specified did not have a path, then _FileName
    will be prepended with a path component indicating "current directory"
    (e.g. "./" on Unix).

    _FDelimiter will contain the actual delimiter character that was used to
    read the file (either tab or comma if the delimiter was guessed, or
    whatever delimiter you specified).

    _LineEnding will contain the actual line-ending setting used to read the
    file. This will be either "mac" ("\x0D"), "unix" ("\x0D"), or "dos"
    ("\x0D\x0A") if the line endings were guessed by read(). Otherwise it
    will be whatever _LineEnding you specified.

    As mentioned, read() allows the following flexibilities in reading
    text-based tabular data files:

    You may specify the line endings (record delimiters), or it can guess
    them (mac, unix, dos are supported).

    You may specify the field delimiters, or it can guess them (tab and
    comma are supported).

    It can get field names from a header row, or, if there is no header row,
    you can tell it the field names, in order.

    You can tell it whether or not to decode embedded returns in data
    fields, and if so, which character they were encoded as.

    Beyond supporting the above flexible options, read() makes the following
    non-flexible assumptions:

    Fields must NOT contain unencoded returns -- that is: whatever character
    sequence is specified for _LineEnding will NEVER occur inside a field in
    the text file; in addition, the current platform's definition of ""\n""
    will NEVER occur; these characters if present in field data, MUST have
    been encoded to some safe character string before the file was created.

    Each field may OPTIONALLY be surrounded with double-quote marks.
    However, if the field data itself contains either a double-quote
    character (""") or the current file's field delimiter (such as tab or
    comma), then the field MUST be surrounded with double-quotes.
    (Currently, all data written by Data::CTable have all field values
    surrounded by double-quotes, but a more selective policy may be used in
    the future.)

    If a field contains a double-quote character, then each double-quote
    character in the field must be encoded as """" -- i.e. each """ in the
    original data becomes """" in the text file.

    Data files may not mix line-ending types or field delimiter types. Once
    determined, the same endings and delimiters will be used to read the
    entire file.

    The fields recognized on each line will either be determined by the
    header row or the _FieldList provided by the caller. Any extra fields on
    any given line will be ignored. Any missing fields will be treated as

    If you are having trouble reading a delimited text file, check that all
    data in the file obeys these assumptions.

            ## Writing some or all data from table into a data file

            $t->write($Path)              ## Simple calling convention

            $t->write(                    ## Named-parameter convention

                 ## Params that override params in the object if supplied...

                 _FileName      => $Path, ## "Base path"; see _WriteExtension

                 _WriteExtension=> ".out",## Insert/append extension to _FileName

                 _FieldList     => [...], ## Fields to write; others ignored
                 _Selection     => [...], ## Record (#s) to write; others ignored

                 _HeaderRow     => 0,     ## Include header row in file

                 _LineEnding    => undef, ## Record delimiter (default is "\n")
                 _FDelimiter    => undef, ## Field delimiter (default is comma)

                 _ReturnMap     => 1,     ## Whether to encode internal returns
                 _ReturnEncoding=>"\x0B", ## How to encode returns
                 _MacRomanMap   => undef, ## Whether/when to write Mac char set 

                 _CacheOnWrite  => 1,     ## Enable saving cache after write()
                 _CacheExtension=> ".x",  ## Extension to add to cache file name
                 _CacheSubDir   => "",    ## (Sub-)dir, if any, for cache files

                 ## Params specific to the read()/write() methods...

                 _MaxRecords    => 200,   ## Limit on how many records to write

            $t->write_file()    ## Internal: same as write(); ignores cacheing

    write() writes a Merge, CSV, or Tab-delimited file.

    It uses parameters as described above. Any parameters not supplied will
    be gotten from the object.

    Using the simple calling convention, just pass it a path which will
    override the _FileName parameter in the object, if any.

    All other parameters will come from the object (or will be defaulted if

    If no _FileName or path is specified, or it is the special string "-"
    (dash), then the file handle \ * STDOUT will be used by default (and you
    could redirect it to a file). You can supply any open file handle or
    IO::File object of your own for the _FileName parameter.

    If write() is writing to a file handle by default or because you
    specified one, then no write-cacheing will occur.

    To specify additional parameters or override any parameters in the
    object while reading, use the named-parameter calling convention.

    If the object's data was previously filled in using new() or read(),
    then the file format parameters from the previous read() method will
    still be in the object, so the format of the written file will
    correspond as much as possible to the file that was read().

    write() returns the path name of the file actually written, or the empty
    string if a supplied file handle or STDOUT was written to, or undef if
    there was a failure.

    If write() returns undef, then it will also have set the _ErrorMsg
    parameter in the object.

    write() never modifies any data in the object itself.

    Consequently, if you specify a _FieldList or a _Selection, only those
    fields or records will be written, but the corresponding parameters in
    the object itself will be left untouched.

  How write() calculates the Path

    The _FileName parameter is shared with the read() method. This parameter
    is set by read() and may be overridden when calling write().

    In the base implementation of Data::CTable, write() will try not to
    overwrite the same file that was read, which could possibly cause data

    To avoid this, it does not use the _FileName parameter directly.
    Instead, it starts with _FileName and inserts or appends the value of
    the _WriteExtension parameter (which defaults to ".out") into the file
    name before writing.

    If the _FileName already has an extension at the end, write() will place
    the _WriteExtension BEFORE the final extension; otherwise the
    _WriteExtension will be placed at the end of the _FileName.

    For example:

            Foobar.txt        ==>  Foobar.out.txt
            Foobar.merge.txt  ==>  Foobar.merge.out.txt
            My_Merge_Data     ==>  My_Merge_Data.out

    If you DON'T want write() to add a _WriteExtension to _FileName before
    it writes the file, then you must set _WriteExtension to empty/undef
    either in the object or when calling write(). Or, you could make a
    subclass that initializes _WriteExtension to be empty. If
    _WriteExtension is empty, then _FileName will be used exactly, which may
    result in overwriting the original data file.

    Remember: write() returns the path name it actually used to successfully
    write the file. Just as with read(), if the _FileName you specified did
    not have a path, then write() will prepend a path component indicating
    "current directory" (e.g. "./" on Unix) and this will be part of the
    return value.

  Cacheing with write()

    By default, Data::CTable only creates a cached version of a file when it
    reads that file for the first time (on the assumption that it will need
    to read the file again more often than the file's data will change.)

    But by default, it does not create a cached version of a file when
    writing it, on the assumption that the current program probably will not
    be re-reading the written file and any other program that wants to read
    it can cache it at that time.

    However, if you want write() to create a cache for its output file, it
    is much faster to create it on write() than waiting for the next read()
    because the next read() will be able to use the cache the very first

    To enable write-cacheing, set _CacheOnWrite to true. Then, after the
    write() successfully completes (and only if it does), the cached version
    will be written.

FORMATTED TABLES (Using Data::ShowTable)
            ## Get formatted data in memory

            my $StringRef = $t->format();     ## Format same data as write()
            my $StringRef = $t->format(10);   ## Limit records to 10
            my $StringRef = $t->format(...);  ## Specify arbitrary params
            print $$StringRef;

            ## Write formatted table to file or terminal

            $t->out($Dest, ....);## $Dest as follows; other params to format()
            $t->out($Dest, 10, ....) ## Limit recs to 10; params to format()

            $t->out()            ## print formatted data to STDOUT
            $t->out(\*STDERR)    ## print to STDERR (or any named handle)
            $t->out("Foo.txt")   ## print to any path (file to be overwritten)
            $t->out($FileObj)    ## print to any object with a print() method

    out() takes a first argument specifying a destination for the output,
    then passes all other arguments to format() to create a nice-looking
    table designed to be human-readable; it takes the resulting buffer and
    print()s it to the destination you specified.

    Sample output:

             | First | Last | Age | State |
             | Chris | Zack | 43  | CA    |
             | Marco | Bart | 22  | NV    |
             | Pearl | Muth | 15  | HI    |

            (Note extra space character before each line.)

    The destination may be a file handle (default if undef is \*STDOUT), a
    string (treated as a path to be overwritten), or any object that has a
    print() method, especially an object of type IO::File.

    The main purpose of out() is to give you a quick way to dump a table
    when debugging. out() calls format() to create the output, so read on...

    format() produces a human-readable version of a table, in the form of a
    reference to a string buffer (which could be very large), and returns
    the buffer to you. Dereference the resulting string reference before

    If format() is given one argument, that argument is the _MaxRecords
    parameter, which limits the length of the output.

    Otherwise, format() takes the following named-parameter arguments, which
    can optionally override the corresponding parameters, if any, in the

            _FieldList        ## Fields to include in table
            _Selection        ## Records to be included, in order

            _SortSpecs        ## SortType controls number formatting

            _MaxRecords       ## Limit number of records output

            _MaxWidth         ## Limit width of per-col. data in printout

    format() will obey _MaxRecords, if you'd like to limit the number of
    rows to be output. _MaxRecords can also be a single argument to
    format(), or a second argument to out() if no other parameters are

    format() also recognizes the _SortSpecs->{SortType} and _DefaultSortType
    parameters to help it determine the data types of the fields being
    formatted. Fields of type "Number" are output as right-justified floats;
    "Integer" or "Boolean" are output as right-justified integers, and all
    others (including the default: String) are output as left-justified

    In addition, there is one parameter uniquely supported by format() and

    _MaxWidth ||= 15;
    _MaxWidth specifies the maximum width of columns. If unspecifed, this
    will be 15; the minimum legal value is 2. Each column may actually take
    up 3 more characters than _MaxWidth due to divider characters.

    The data to be output will be examined, and only the necessary width
    will be used for each column. _MaxWidth just limits the upper bound, not
    the lower.

    Data values that are too wide to fit in _MaxWidth spaces will be
    truncated and the tilde character "~" will appear as the last character
    to indicate the truncation.

    Data values with internal returns will have the return characters mapped
    to slashes for display.

    format() and out() will NOT wrap entries onto a second line, like you
    may have seen Data::ShowTable::ShowBoxTable do in some cases. Each
    record will get exactly one line.

    format() and out() ignore the _HeaderRow parameter. A header row showing
    the field names is always printed.

    format() and out() make no attempt to map upper-ascii characters from or
    to any particular dataset. The encoding used in memory (generally ISO
    8859-1 by default) is the encoding used in the output. If you want to
    manipulate the encoding, first call format(), then change the encoding,
    then format the resulting table.

            ## Append all records from a second table

            $t->append($Other)                ## Append records from $Other
            $t->append_file($File, $Params)   ## Append from new($Params, $File)
            $t->append_files($Files, $Params) ## Call append_file for all files
            $t->append_files_new($Files, $Params) ## Internal helper routine

            ## Combine all fields from a second table

            $t->combine($Other)                ## Combine fields from $Other
            $t->combine_file($File, $Params)   ## Combine new($Params, $File)
            $t->combine_files($Files, $Params) ## combine_file on each file

            ## Left-join records from a second table (lookup field vals)

            $t->join      ($Other,          $KeyField1, [$KeyField2, $Fields]) 
            $t->join_file ($File,  $Params, $KeyField1, [$KeyField2, $Fields])
            $t->join_files($Files, $Params, $KeyField1, [$KeyField2, $Fields])

    The append() method concatenates all the records from two CTable objects
    together -- even if the two tables didn't start out with exactly the
    same fields (or even any of the same fields).

    It takes all the data records from another CTable object and appends
    them into the present table. Any columns present in the $Other table but
    not in the first table, are created (and the corresponding field values
    in the first table will all be empty/undef). Similarly, any columns
    present in $t but not present in $Other will be extended to the correct
    new length as necessary and the field values in the original columns
    will be empty/undef. Columns present in both will, of course, have all
    the data from both the original sets of data.

    All data from the second table is brought into the first one. No attempt
    whatsoever is made to eliminate any duplicate records that might result.

    The number of records (length()) after this call is the sum of the
    length() of each of the tables before the operation.

    IMPORTANT NOTE: The data from the $Other object is COPIED in memory into
    the new object. This could be hard on memory if $Other is big. Might
    want to be sure to discard $Other when you're done with it.

    $Other is left untouched by the operation.

    All columns from both tables are combined whether or not they are
    mentioned in the custom field list of either.

    The custom field lists, if present in either table object, are
    concatenated into this object's custom field list, but with duplications
    eliminated, and order retained.

    Any existing custom selections, custom sort order, sort specs, and/or
    sort routines are also combined appropriately, with settings from this
    object taking precedence over those from $Other anywhere the two have
    conflicting settings.

    append_file() takes a file name and optional $Params hash. It uses those
    to create a new() object with data read from the file. Then, the new
    table is appended to $t using append() and then the new table is

    append_files() is a convenience function that calls append_file() on
    each file in a list, using the same optional $Params for each.

    append_files_new() is the internal routine that implements the
    processing done by new() on the optional list of files to be read. It
    does the following: It calls read() on the first file in the list. Then,
    it calls append_files() to read the remaining into their own new()
    objects of the same class as $t and using the same $Params to new() (if
    any were supplied). Then each of these is append()-ed in turn to $t and
    discarded. The final result will be that $t will hold a concatenation of
    all the data in all the files mentioned. However, consistent with the
    behavior of append(), the _FileName parameter and other
    read()-controlled settings will correspond to the first file read. The
    intermediate objects are discarded.

    NOTE: As with new() and read(), if a non-empty _FieldList Param is
    specified, the read() methods called internally by the append_file*()
    methods will read only the fields mentioned and will ignore any other
    fields in the files.

  Combining tables

    combine() adds columns from a second table into the current one.

    CAUTION: You should only use combine() when you have two tables where
    all the (possibly selected) records in the second table line up
    perfectly with all the (unselected) records in the first table -- in
    other words, each table before combine() should contain a few of the
    columns of the new table -- for example, maybe one table contains a
    column of file names, and the other contains columns of corresponding
    file sizes and modification times. If you don't understand the
    consequences of combine, don't use it or you could end up with some
    records whose field values don't refer to the same object. (Maybe you
    meant to use append() or join() instead.)

    If the second table has a custom field list, only those columns are
    brought in.

    If any column in the second table has the same name as one in the
    current table, the incoming column replaces the one by the same name.

    All columns are COPIED from the second table, so the first table owns
    the new data exclusively.

    If the second table has a selection, only those records are copied, in
    selection order. (select_all() first if that's not what you want.)

    The selection in the first table, if any, is ignored during the combine.
    If this isn't what you want, then consider using cull() before

    Field list and sort order are concatenated (but retaining uniqueness:
    second mentions of a field in the combined lists are omitted).

    Custom sort routines and sort specs are combined, with those in the
    first table taking precedence over any copied in with the same name.

    The custom _Selection from the first table, if any, is retained. (It
    will initially omit any records added by extend()).

    All other parameters from the first table are retained, and from the
    second table are ignored.

    combine() calls extend() after combining to ensure that all columns have
    the same length: if either the older or newer columns were shorter, they
    will all be set to the length of the longest columns in the table --
    creating some empty field values at the end of the lengthened columns.

    combine_file() does the same as combine() except starting with a file
    name, first creating the $Other object by creating it using new($Params,
    $File), then discarding it after combining.

  Joining tables (Looking up data from another table)

    join() looks up field values from a second table, based on common values
    in key fields which may have different or the same names in each table.
    It adds columns to the current table if necessary to hold any new field
    values that must be brought in.

    join() never adds any new or non-matching records to the table: records
    where the lookup fails will simply have empty/undef values in the
    corresponding columns.

            ## Example:

            $t->join     ($People,          'FullName', 'FirstAndLast'); ## or
            $t->join_file("People.txt", {}, 'FullName', 'FirstAndLast');

    Here's how join() calculates the list of fields to bring in:

            - Legal field names from the optional $Fields list, if supplied
            - Otherwise, the fieldlist() from second table
            - ... minus any fields with same name as $KeyField1 or $KeyField2

    Join starts by adding new empty columns in the first table for any field
    to be brought in from the second but not yet present in the first.

    Here's how join() calculates the records eligible for lookup:

            - Join only modifies the selected records in the first table
            - Join only looks up values from selected records in second table

    (If you want all records to be used in both or either table, call the
    table's select_all() method before calling join().)

    Then, for every selected record in $t (using the example above), join
    examines the FullName field ($KeyField1), and looks up a corresponding
    entry (must be 'eq') in the FirstAndLast field ($KeyField2) in the
    second table.

    IMPORTANT NOTE ABOUT KEY LENGTH: To speed lookup, hash-based indices are
    made. The strings in $Key1 and $Key2 fields should not be so long that
    the hash lookups bog down or things could get ugly fast. There is no
    fixed limit to hash key length in Perl, but fewer than 128 characters in
    length is longer than customary for such things. (Many systems require
    text-based keys to be no longer than 31 characters.) So be judicious
    about the values in $Key1 and $Key2 fields.

    The first record found in the second table's selection with a matching
    value in the key field is then copied over (but only the appropriate
    fields are copied, as explained above). Any field values being brought
    over will REPLACE corresponding field values in the first table,
    possibly overwriting any previous values if the field being looked up
    was already present in the first table and contained data.

    The first table's _FieldList is updated to reflect new fields added.

    Its _Selection is untouched.

    Its _SortOrder is untouched.

    Its _SortSpecs are augmented to include any entries from the second
    table that should be brought over due to the field additions.

    Its _SRoutines are augmented to add new ones from the second table.

    All other parameters of table 1 are untouched.

    The second table is not modified. No data structures will be shared
    between the tables. Data is only copied.

    join_file() calls join() after creating a seond table from your $File.

    join_files() calls join_file() repeatedly for each file in a list, but
    it is important to note that each file in the list of files to be joined
    must have a $Key2 field -- AND, that any values looked up from the
    second file will overwrite any values of the same key found in the first
    file, and so on. You probably will not ever need join_files(). It is
    mainly here for completeness.

            ## Re-orient table's data using vals from $ColName as field names...

    Sometimes a situation gives you a table that's initially organized with
    column data in rows, and field names in one of the columns, so you need
    to flip the table in order to be able to work meaningfully with it.

    "Inverting" a table means to rewrite each row as a column. One row is
    designated to be used as the field names.

    For example, consider this table:

            F01    F02   F03   F04
            First  Chris Agnes James
            Last   Bart  Marco Nixon
            Age    22    33    44

    Calling invert() using field names from "F01"...


    ... would change the table to look like this:

            First  Last  Age
            Chris  Bart  22
            Agnes  Marco 33
            James  Nixon 44

    The field F01 which formerly contained the field names, is now gone, and
    the remaining data columns have been converted from their old row
    orientation into a column orientation.

            ## Printing a progress message....
            $t->progress($Msg)       ## Print a message per current settings

            ## Progress settings applying to this object only...

            $t->progress_get()       ## Get current progress setting

            $t->progress_set(1)      ## Use progress_default() method
            $t->progress_set($Sub)   ## Set a custom progress routine
            $t->progress_set(0)      ## Disable progress
            $t->progress_set(undef)  ## Use class's settings (default)...

            ## Class's settings (for instances with _Progress == undef)

            $t->progress_class()     ## Get current setting.

            $t->progress_class(1)    ## Use progress_default() method
            $t->progress_class($Sub) ## Set shared custom prog routine
            $t->progress_class(0)    ## Disable class-default progress
            Data::CTable->progress_class(..) ## Call without an object

            ## Call builtin default progress method regardless of settings

            $t->progress_default($Msg) ## Default prog. routine for class

            ## Generate a warning (used internally by other methods)

            $t->warn($Msg)   ## In this class, calls progress_default()

            ## Timed progress: print msg to start, then at most once/2 sec

            $t->progress_timed($Op, $Item)  ## Re-print msg every 2 sec
            $t->progress_timed($Op, $Item, $Pos, $Tot) ##... with % readout
            $t->progress_timed($Op, $Item, $Pos, $Tot, $Wait) ## Not 1st x

            $t->progress_timed_default($Msg)  ## Called by progress_timed

    Data::CTable is especially useful in creating batch-oriented
    applications for processing data. As such, routines that may perform
    time-consuming tasks will, by default, generate helpful progress
    messages. The progress mechanism is highly customizable, however, to
    suit the needs of applications that don't require this output, or that
    require the output to go somewhere other than STDERR or the console.

    The default progress routine is one that prints a message with a
    date/time stamp to STDERR if and only if STDERR is an interactive
    terminal, and otherwise is silent.

    You could write a custom progress routine that does something else or
    something in addition (e.g. logs to a file or syslog). The custom
    routine could either be implemented by overriding the progress_default()
    method in a subclass, or by calling progress_set() in any instance.

    The custom progress routine, if any, is stored in the _Progress
    parameter of the object. But use progress_set() and progress_get() to
    access it.

    The interface for your custom progress routine should be:

            sub MyProgress {my ($Obj, $Message) = @_; chomp $Message; .....}

    In other words, the routine takes a single message which may or may not
    have a trailing newline. It should always chomp the newline if present,
    and then do its business... which generally will include printing or
    logging a message (usually with a newline added).

    The default, built-in progress routine for Data::CTable is:

            sub progress_default
                my ($this, $msg) = @_; 
                chomp $msg; 
                print STDERR (localtime() . " $msg\n") if -t STDERR;

                return(1);   ## Indicate progress actually completed

    Of course, you are free to call this method directly at any time, and it
    will do its thing regardless of other progress-enabling settings. But
    the preferred way is to first set the settings and then call progress().

    The warn() method always calls progress_default() -- i.e. warnings will
    display even if progress is otherwise disabled or overridden at the
    object or class level. However, you could create a subclass that changes
    warn()'s behavior if desired. (For example, it could just call perl's
    builtin warn function, or be even more forceful, generating warnings
    even if STDERR is not a terminal, for example.)

    The progress_set() method may be used to override the progress routine
    for an individual object (set to 1/true for default behavior, or
    0/undef/false to disable progress for that object entirely).

    Call progress_class() to set similar values to control the global
    default behavior (e.g. turning on/off default progress behavior for all
    instances), but be cautious about using this approach in any environment
    where other programs might be accessing the same loaded class data,
    since the setting is stored in a class-owned global

    Manipulating the class-default settings is only recommended in batch or
    shell-script environments, not in mod_perl Web applications where the
    module stays loaded into the Perl environment across multiple
    invocations, for example.

    If you want a particular method (e.g. read() but not write()) to be
    silent, you could make a subclass and could override that method with an
    implementation that first disables progress, calls the SUPER:: method,
    and then restores the progress setting to its original setting.

  Timed progress

    Timed progress is a way of printing periodically-recurring progress
    messages about potentially time-consuming processes to the terminal.

    For example, consider the following messages which might appear every 2
    seconds during a lengthy read() operation:

            Reading... 0 (0%)
            Reading... 2000 (4%)
            Reading... 38000 (96%)
            Reading... 40000 (100%)

    The progress_timed() method is called internally by potentially
    time-consuming processes (read(), write(), and sort()), and you may want
    to call it yourself from your own scripts, to produce
    weary-programmer-soothing visual output during otherwise panic-producing
    long delays.

    Generally, progress_timed() is called with the $Wait parameter set to
    true, which delays the display of any messages until 2 seconds have
    passed, so no messages will be displayed unless the process actually
    does end up being slower than 2 seconds.

    Parameters are:

            $Op    The string that identifies the "operation" taking place
            $Item  A milestone such as a number or datum to indicate progress
            $Pos   A numeric position against the (maybe estimated) baseline
            $Tot   The baseline.  If estimated, don't re-estimate too often
            $Wait  If true, skip printing the first message for this $Op

    All parameters except $Op are optional.

    progress_timed() has a throttle that keeps it from re-triggering more
    often than every 2 seconds for any given sequence of the same $Op. The
    clock is restarted each time you call it with a different $Op or $Tot
    from the previous call (on the assumption that if the operation or the
    baseline changes then that fact should be noted).

    The messages printed will start with "$Op... ".

    If you supply $Item, which could be a number or a string, the messages
    will then show the $Item after the $Op.

    If you supply BOTH $Pos and $Tot, then a percentage will be calculated
    and added to the readout; otherwise omitted.

    If you supply $Wait, the first message (only) that uses this $Op will be
    skipped, and the next one won't appear for at least 2 seconds.

    If using $Pos and $Tot to display percentages for your user, be sure to
    call progress_timed() one final time when $Pos == $Tot so your user sees
    the satisfying 100% milestone. This "completion" call will not be
    skipped even if 2 seconds have not passed since the previous timed
    progress message was printed.

    Althought progress_timed() is designed to cut down on too much visual
    output when called often in a tight loop, remember that it still takes
    some processing time to call it and so if you call it too frequently,
    you're slowing down the very loop you wish were running faster.

    So, you might want to call it every tenth or 100th or even 1000th time
    through a tight loop, instead of every time through, using the mod (%)

            $t->progress_timed(....) if ($LoopCount % 100) == 0;

    progress_timed_default() is the method called internally by
    progress_timed() to actually print the messages it has prepared. In this
    implementation, progress_timed_default() just calls progress_default().
    That is, it ignores all other progress-inhibiting or -enhancing settings
    so delay-soothing messages will print on the terminal even if other
    messages are turned off.

    This is because the author assumes that even if you don't want all those
    other progress messages, you might still want these ones that explain
    long delays. If you REALLY don't, then just make yourself a lightweight
    subclass where progress_timed_default() is a no-op, or maybe calls
    regular progress(). For example:

            BEGIN {package Data::CTable::Silent; use vars qw(@ISA); 
                   @ISA=qw(Data::CTable); sub progress_timed_default{}}

            ## Later...
            my $t = Data::CTable::Silent->new(...);

            ## Print some debugging output...

            $t->out()             ## Pretty-print $t using Data::ShowTable

            $t->dump()            ## Dump $t using Data::Dumper
            $t->dump($x, $y)      ## Dump anything else using Data::Dumper

            ## Print some debugging output and then die.

            die $t->out()         ## Same but die afterwards.

            die $t->dump()        ## Same but die afterwards.
            die $t->dump($x, $y)  ## Same but die afterwards.

    These two methods can be very helpful in debugging your scripts.

    The out() method, which has many options, is described in complete
    detail in the section below titled "FORMATTED TABLES". In short, it
    prints a nicely-formatted diagram of $t, obeying the custom field list
    if any and custom selection if any.

    The dump() method uses the Data::Dumper module to call &Dumper() on the
    table itself (by default) and prints the result to STDERR. If you
    specify any number of other values, those will be dumped instead using a
    single call to &Dumper (rather than individually).

  Optional module dependencies

    These methods require the otherwise-optional modules shown here:

            out()       Data::ShowTable  
            dump()      Data::Dumper

    You'll get a warning at runtime if you try to call either method without
    the appropriate module installed on your system.

    The following utilities are methods of the Data::CTable object. They may
    be called directly by clients, subclassed, or used by subclass
    implementations as needed.

            ## Get cache file path (all args optional: defaulted from $t)

            $f  = $t->prep_cache_file($FileName, $CacheExtension, $CacheSubDir)

            ## Verify all directories in a path, creating any needed ones.

            $ok = $t->verify_or_create_path($DirPath, $Sep)

            ## Testing readability / writeability of a proposed file

            $ok = $t->try_file_read ($Path); ## Opens for read; then closes
            $ok = $t->try_file_write($Path); ## Opens for write; then deletes

            ## Getting parameters from object with optional overrides

            $param = $t->getparam($Params, $Param)

    prep_cache_file() is the internal method used by both read() and write()
    to calculate the name of a cache file to be used for a given $FileName.

    It calculates the path to the cache file that corresponds to the given
    $FileName (which may be a bare file name, a relative path, or a partial
    path, as long as it obeys the current platform's path format rules). All
    arguments are optional and if absent (undef), will be defaulted from the
    corresponding parameters in $t.

    In addition to calculating the path and file name, it also prepends the
    "current directory" path if there was no path. Then it checks that all
    directories mentioned in the path actually exist. If not, it fails.
    Then, it checks that EITHER the file exists and is readable, OR it does
    not exist but would be writeable in that directory. If any of these
    directory creations or file checks fails, then undef is returned (and
    there would be no cache file).

    You may call it with no arguments on a file that has been read() to find
    the path to the cache file that may have been used and/or created, if

    You may call it with a file name that was written to, to see what the
    corresponding written cache file would be.

    For example:

            ## Get name of cache file used or created by read and delete it.

            $RCache = $t->prep_cache_file() and unlink($RCache);

            ## Cache on write() and get name of file and delete it.

            $Written = $t->write(_CacheOnWrite=>1, _FileName=>"Foo.txt");
            $WCache  = $t->prep_cache_file($Written) and unlink($WCache);

    verify_or_create_path() is the internal routine used by read(), write(),
    and the cache-writing logic, that makes sure a requested file path
    exists (by creating it if necessary and possible) before any file is
    written by this module.

    (If you don't like this module's tendency to try to create directories,
    make yourself a subclass in which this routine simply checks -d on its
    $Path argument and returns the result.)

    It must be called with a full or partial path TO A DIRECTORY, NOT A
    FILE. You may supply $Sep, a platform-appropriate separator character
    (which defaults correctly for the runtime platform if you don't).

    Returns true if the path verification and/or creation ultimately
    succeeded, false otherwise (meaning that, after this call, there is no
    such directory on the system and so you should not try to write a file

    try_file_read() and try_file_write() are the internal methods called by
    prep_cache_file() as well as by read() and write() to preflight proposed
    file reading and writing locations.

    try_file_read() opens a file for read and closes it again; returns true
    if the open was possible.

    try_file_write() opens a file for write and closes it again, deleting it
    if successful. Returns true if the open for write and the delete were
    successful. (Be aware that this call will actually delete any existing
    file by this name.)

    The reason that failure to delete causes try_file_write() to fail is
    that successful cacheing depends on the ability to delete cache files as
    well as create them or write to them. A file in a location that couldn't
    be deleted will not be used for cacheing.

    getparam() looks up a named parameter in a params hash if it exists
    there, otherwise looks it up in the object, thereby allowing $Params to
    shadow any parameters in $this.

    This internal routine is used by any methods that allow overriding of
    parameters in the object when using a named-parameter calling interface.
    It should be used by any subclasses that also wish to use a
    named-parameter calling convention. For example:

            my $this        = shift;
            my $Params      = (@_ == 1 ? {_FieldList => $_[0]} : {@_});

            my($FieldList, $Selection) = map {$this->getparam($Params, $_)} 
            qw(_FieldList  _Selection);

    These general-purpose utility routines are defined in the Data::CTable
    module but are not method calls. You may optionally import them or call
    them by their fully-qualified name.

            use Data::CTable qw(

            ## File-format guessing

            my $E = &guess_endings($IOHandle) ## Guess txt file line endings
            my $D = &guess_delimiter($String) ## Tab if found, else comma

            ## Cross-platform file path analysis

            my $Info = path_info();   ## Hash: 3 of platform's path values:
            my ($Sep,                 ## ... path separator (  / on Unix)
                $Up,                  ## ... "up" component (../ on Unix)
                $Cur) =               ## ... curr. dir path ( ./ on Unix)
            @$Info{qw(sep up cur)};

            my $Abs = path_is_absolute($Path)  ## Check path type

            ## Our old favorites min and max

            $x = max($x, 0);          ## Should have been part of Perl...
            $x = min($x, 100);

    guess_endings() tries to figure out whether an open IO::File handle has
    DOS, Mac, or Unix file endings. It reads successively large blocks of
    the file until it either finds evidence of at least two separate line
    endings (of any type, but presumably they are the same), or until it
    reaches the end of the file. Then, it takes the resulting block and
    searches for the first qualifying line ending sequence it finds, if any.
    This sequence is then returned to the caller. If it returns undef, it
    was not able to find any evidence of line endings in the file.

    guess_delimiter() takes a string buffer and returns a "," unless it
    finds a tab character before the first comma in the $String, if any, in
    which case a tab is returned.

    path_info() returns a hash of three helpful strings for building and
    parsing paths on the current platform. Knows about Mac, Dos/Win, and
    otherwise defaults to Unix.

    path_is_absolute($Path) returns true if it thinks the given path string
    is an absolute path on the current platform.

    Column (field) names must not start with underscore
        This object is implemented as a blessed hash reference. By
        convention, keys that do not start with underscore are data columns
        and the key is the field name. Keys that do start with underscore
        refer to parameters or other data structures stored in the object.

        Consequently, no field names may start with underscore. When a file
        is read from disk, any field names that DO start with underscores
        will have the leading underscores stripped off. Strange things could
        then occur if the field names are then no longer unique. For
        example, field "A" and "_A" in the data file would be treated as the
        single field "A" after the file was read.

    Field values are always read as strings
        Field values when written to a file are necessarily converted to
        strings. When read back in, they are read as strings, regardless of
        original format. The sole exception is the empty string which is
        read back in as undef for efficiency.

        An exception is when the _CacheOnWrite feature is used: field values
        stored internally as integers or other scalar types may be saved and
        later restored as such. However, you should not rely on this

    Undef vs. empty
        Empty field values are stored as "undef" for efficiency. This means
        that programs should generally not rely on any differences between
        "" and undef in field values. However, when working with large but
        sparse tables, programs should take care not to convert undef values
        to empty strings unnecessarily since the separate string objects
        consume considerably more memory than undef.

    Corrections, bug reports, bug fixes, or feature additions are
    encouraged. Please send additions or patches with a clear explanation of
    their purpose. Consider making additions in the form of a subclass if

    I'm committed to bundling useful subclasses contributed by myself or
    others with this main distribution.

    So, if you've got a subclass of Data::CTable (which should have a name
    like Data::CTable::YourClassName) and you would like it included in the
    main distribution, please send it along with a test script and I'll
    review the code and add it (at my discretion).

    If you've got a module that uses, augments, or complements this one, let
    me know that, too, and I'll make appropriate mention of it.

    The Data::CTable home page:


    The implementation in

    The script, other subclasses, and examples.

    The Data::ShowTable module.

    The Data::Table module by Yingyao Zhou & Guangzhou Zou.

    The perlref manual page.

    Chris Thorman <>

    Copyright (c) 1995-2002 Chris Thorman. All rights reserved.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.