The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Text::AutoCSV - helper module to automate the use of Text::CSV

VERSION

version 1.1.9

SYNOPSIS

By default, Text::AutoCSV will detect the following characteristics of the input:

- The separator, among ",", ";" and "\t" (tab)

- The escape character, among '"' (double-quote) and '\\' (backslash)

- Try UTF-8 and if it fails, fall back on latin1

- Read the header line and compute field names

- If asked to (see "fields_dates_auto"), detect any field that contains a DateTime value, trying 20 date formats, possibly followed by a time (6 time formats tested)

- If asked to (see "fields_dates"), detect DateTime format of certain fields, croak if no DateTime format can be worked out

- Fields identified as containing a DateTime value ("fields_dates_auto" or "fields_dates") are stored as DateTime objects by default

Text::AutoCSV also provides methods to search on fields (using cached hash tables) and it can populate the value of "remote" fields, made from joining 2 CSV files with a key-value search

General

        use Text::AutoCSV;

        Text::AutoCSV->new()->write();    # Read CSV data from std input, write to std output

        Text::AutoCSV->new(in_file => 'f.csv')->write(); # Read CSV data from f.csv, write to std output

                # Read CSV data from f.csv, write to g.csv
        Text::AutoCSV->new(in_file => 'f.csv', out_file => 'g.csv')->write();

                # "Rewrite" CSV file by printing out records as a list (separated by line breaks) of field
                # name followed by its value.
        my $csv = Text::AutoCSV->new(in_file => 'in.csv', walker_hr => \&walk);
        my @cols = $csv->get_fields_names();
        $csv->read();
        sub walk {
                my %rec = %{$_[0]};
                for (@cols) {
                        next if $_ eq '';
                        print("$_ => ", $rec{$_}, "\n");
                }
                print("\n");
        }

OBJ-ish functions

                # Identify column internal names with more flexibility as the default mechanism
        my $csv = Text::AutoCSV->new(in_file => 'zips.csv',
                fields_hr => {'CITY' => '^(city|town)', 'ZIPCODE' => '^zip(code)?$'});
                # Get zipcode of Claix
        my $z = $csv->vlookup('CITY', ' claix    ', 'ZIPCODE');

        my $csv = Text::AutoCSV->new(in_file => 'zips.csv');
                # Get zipcode of Claix
        my $z = $csv->vlookup('CITY', ' claix    ', 'ZIPCODE');
                # Same as above, but vlookup is strict for case and spaces around
        my $csv = Text::AutoCSV->new(in_file => 'zips.csv', search_case => 1, search_trim => 0);
        my $z = $csv->vlookup('CITY', 'Claix', 'ZIPCODE');

                # Create field 'MYCITY' made by taking pers.csv' ZIP column value, looking it up in the
                # ZIPCODE columns of zips.csv, taking CITY colmun value and naming it 'MYCITY'. Output is
                # written in std output.
                # If a zipcode is ambiguous, say it.
        Text::AutoCSV->new(in_file => 'pers.csv')
                ->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv',
                        { ignore_ambiguous => 0, value_if_ambiguous => '<duplicate zipcode found!>' })->write();

                # Note the above can also be written using Text::AutoCSV level attributes:
        Text::AutoCSV->new(in_file => 'pers.csv',
                search_ignore_ambiguous => 0, search_value_if_ambiguous => '<duplicate zipcode found!>')
                ->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv')->write();

                # Create 'MYCITY' field as above, then display some statistics
        my $nom_compose = 0;
        my $zip_not_found = 0;
        Text::AutoCSV->new(in_file => 'pers.csv', walker_hr => \&walk)
                ->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv')->read();
        sub walk {
                my $hr = shift;
                $nom_compose++ if $hr->{'NAME'} =~ m/[- ]/;
                $zip_not_found++ unless defined($hr->{'MYCITY'});
        }
        print("Number of persons with a multi-part name: $nom_compose\n");
        print("Number of persons with unknown zipcode: $zip_not_found\n");

Updating

        Text::AutoCSV->new(in_file => 'names.csv', out_file => 'ucnames.csv',
                read_post_update_hr => \&updt)->write();
        sub updt { $_[0]->{'LASTNAME'} =~ s/^.*$/\U&/; }

        Text::AutoCSV->new(in_file => 'squares.csv', out_file => 'checkedsquares.csv',
                out_filter => \&wf)->write();
        sub wf { return ($_[0]->{'X'} ** 2 == $_[0]->{'SQUAREOFX'}); }

                # Add a field for the full name, made of the concatenation of the
                # first name and the last name.
                # Also display stats about empty full names.
        Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'dirwithfn.csv', verbose => 1)
                ->field_add_computed('FULLNAME', \&calc_fn)->write();
        sub calc_fn {
                my ($field, $hr, $stats) = @_;
                my $fn = $hr->{'FIRSTNAME'} . ' ' . uc($hr->{'LASTNAME'});
                $stats->{'empty full name'}++ if $fn eq ' ';
                return $fn;
        }

                # Read a file with a lot of columns and keep only 2 columns in output
        Text::AutoCSV->new(in_file => 'big.csv', out_file => 'addr.csv',
                out_fields => ['NAME', 'ADDRESS'])
                ->out_header('ADDRESS', 'Postal Address')
                ->write();

Datetime management

                # Detect any field containing a DateTime value and convert it to yyyy-mm-dd whatever the
                # input format is.
        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', fields_dates_auto => 1,
                out_dates_format => '%F')->write();

                # Detect any field containing a DateTime value and convert it to a US DateTime whatever the
                # input format is.
        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', fields_dates_auto => 1,
                out_dates_format => '%b %d, %Y, %I:%M:%S %p', out_dates_locale => 'en')->write();

                # Find dates of specific formats and convert it into yyyy-mm-dd
        Text::AutoCSV->new(in_file => 'raw.csv', out_file => 'cooked.csv',
                dates_formats_to_try => ['%d_%m_%Y', '%m_%d_%Y', '%Y_%m_%d'],
                out_dates_format => '%F')->write();

                # Take the dates on columns 'LASTLOGIN' and 'CREATIONDATE' and convert it into French dates
                # (day/month/year).
                # Text::AutoCSV will croak if LASTLOGIN or CREATIONDATE do not contain a DateTime format.
                # By default, Text::AutoCSV will try 20 different formats.
        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv',
                fields_dates => ['LASTLOGIN', 'CREATIONDATE'], out_dates_format => '%d/%m/%Y')->write();

                # Convert 2 DateTime fields into unix standard epoch
                # Write -1 if DateTime is empty.
        sub toepoch { return $_->epoch() if $_; -1; }
        Text::AutoCSV->new(in_file => 'stats.csv', out_file => 'stats-epoch.csv',
                fields_dates => ['ATIME', 'MTIME'])
                ->in_map('ATIME', \&toepoch)
                ->in_map('MTIME', \&toepoch)
                ->write();

                # Do the other way round from above: convert 2 fields containing unix standard epoch into a
                # string displaying a human-readable DateTime.
        my $formatter = DateTime::Format::Strptime->new(pattern => 'DATE=%F, TIME=%T');
        sub fromepoch {
                return $formatter->format_datetime(DateTime->from_epoch(epoch => $_)) if $_ >= 0;
                '';
        }
        $csv = Text::AutoCSV->new(in_file => 'stats-epoch.csv', out_file => 'stats2.csv')
                ->in_map('ATIME', \&fromepoch)
                ->in_map('MTIME', \&fromepoch)
                ->write();

Miscellaneous

        use Text::AutoCSV 'remove_accents';
                # Output 'Francais: etre elementaire, Tcheque: sluzba dum' followed by a new line.
        print remove_accents("Français: être élémentaire, Tchèque: služba dům"), "\n";

NAME

Text::AutoCSV - helper module to automate the use of Text::CSV

METHODS

new

        my $csv = Text::AutoCSV->new(%attr);

(Class method) Returns a new instance of Text::AutoCSV. The object attributes are described by the hash %attr (can be empty).

Currently the following attributes are available:

Preliminary note about "fields_hr", "fields_ar" and "fields_column_names" attributes

By default, Text::AutoCSV assumes the input has a header and will use the field values of this first line (the header) to work out the column internal names. These internal names are used everywhere in Text::AutoCSV to designate columns.

The values are transformed as follows:

- All accents are removed using the exportable function "remove_accents".

- Any non-alphanumeric character is removed (except underscore) and all letters are switched to upper case. The regex to do this is

        s/[^[:alnum:]_]//gi; s/^.*$/\U$&/;

Thus a header line of

        'Office Number 1,Office_2,Personal Number'

will produce the internal column names

        'OFFICENUMBER1' (first column)

        'OFFICE_2' (second column)

        'PERSONALNUMBER' (third column).

The attribute "fields_hr", "fields_ar" or "fields_column_names" (only one of the three is useful at a time) allows to change this behavior.

NOTE

The removal of accents is *not* a conversion to us-ascii, see "remove_accents" for details.

Preliminary note about fields reading

Functions that are given a field name ("get_cell", "vlookup", "field_add_copy", ...) raise an error if the field requested does not exist.

SO WILL THE HASHREFS GIVEN BY Text::AutoCSV: when a function returns a hashref ("search_1hr", "get_row_hr", ...), the hash is locked with the lock_keys function of Hash::Util. Any attempt to read a non-existing key from the hash causes a croak. This feature is de-activated if you specified croak_if_error => 0 when creating Text::AutoCSV object.

in_file

The name of the file to read CSV data from.

If not specified or empty, read standard input.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv');
inh

File handle to read CSV data from. Normally you don't want to specify this attribute.

inh is useful if you don't like the way Text::AutoCSV opens the input file for you.

Example:

        open my $inh, "producecsv.sh|";
        my $csv = Text::AutoCSV->new(inh => $inh);
encoding

Comma-separated list of encodings to try to read input.

Note that finding the correct encoding of any given input is overkill. This script just tries encodings one after the other, and selects the first one that does not trigger a warning during reading of input. If all produce warnings, select the first one.

The encoding chosen is used in output, unless attribute "out_encoding" is specified.

Value by default: 'UTF-8,latin1'

IMPORTANT

If one tries something like encoding => 'latin1,UTF-8', it'll almost never detect UTF-8 because latin1 rarely triggers warnings during reading. It tends to be also true with encodings like UTF-16 that can remain happy with various inputs (sometimes resulting in Western languages turned into Chinese text).

Ultimately this attribute should be used with a unique value. The result when using more than one value can produce weird results and should be considered experimental.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'w.csv', encoding => 'UTF-16');
via

Adds a via to the file opening instruction performed by Text::AutoCSV. You don't want to use it under normal circumstances.

The value should start with a ':' character (Text::AutoCSV won't add one for you).

Value by default: none

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', via => ':raw:perlio:UTF-32:crlf');
dont_mess_with_encoding

If true, just ignore completely encoding and don't try to alter I/O operations with encoding considerations (using binmode instruction). Note that if inh attribute is specified, then Text::AutoCSV will consider the caller manages encoding for himself and dont_mess_with_encoding will be automatically set, too.

IMPORTANT

This attribute does not mean perl will totally ignore encoding and would consider character strings as bytes for example. The meaning of "dont_mess_with_encoding" is that Text::AutoCSV itself will totally ignore encoding matters, and leave it entirely to Perl' default.

Value by default:

        0 if inh attribute is not set
        1 if inh attribute is set

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', dont_mess_with_encoding => 1);
sep_char

Specify the CSV separator character. Turns off separator auto-detection. This attribute is passed as is to Text::CSV->new().

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', sep_char => ';');
quote_char

Specify the field quote character. This attribute is passed as is to Text::CSV->new().

Value by default: double quote ('"')

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', quote_char => '\'');
escape_char

Specify the escape character. Turns off escape character auto-detection. This attribute is passed as is to Text::CSV->new().

Value by default: backslash ('\\')

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', escape_char => '"');
in_csvobj

Text::CSV object to use. Normally you don't want to specify this attribute.

By default, Text::AutoCSV will manage creating such an object and will work hard to detect the parameters it requires.

Defining in_csvobj attribute turns off separator character and escape character auto-detection.

Using this attribute workarounds Text::AutoCSV philosophy a bit, but you may need it in case Text::AutoCSV behavior is not suitable for Text::CSV creation.

Example:

        my $tcsv = Text::CSV->new();
        my $acsv = Text::AutoCSV->new(in_file => 'in.csv', in_csvobj => $tcsv);
has_headers

If true, Text::AutoCSV assumes the input has a header line.

Value by default: 1

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', has_headers => 0);
fields_hr

(Only if input has a header line) Hash ref that contains column internal names along with a regular expression to find it in the header line. For example if you have:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv',
                fields_hr => {'PHONE OFFICE' => '^office phone nu',
                              'PHONE PERSONAL' => '^personal phone nu'});

And the header line is

        'Personal Phone Number,Office Phone Number'

the column name 'PHONE OFFICE' will designate the second column and the column name 'PHONE PERSONAL' will designate the first column.

You can choose column names like 'Phone Office' and 'Phone Personal' as well.

The regex search is case insensitive.

fields_ar

(Only if input has a header line) Array ref that contains column internal names. The array is used to create a hash ref of the same kind as "fields_hr", by wrapping the column name in a regex. The names are surrounded by a leading '^' and a trailing '$', meaning, the name must match the entire field name.

For example

        fields_ar => ['OFFICENUMBER', 'PERSONALNUMBER']

is strictly equivalent to

        fields_hr => {'OFFICENUMBER' => '^officenumber$', 'PERSONALNUMBER' = '^personalnumber$'}

The regex search is case insensitive.

fields_ar is useful if the internal names are identical to the file column names. It avoids repeating the names over and over as would happen if using "fields_hr" attribute.

NOTE

You might wonder why using fields_ar as opposed to Text::AutoCSV default' mechanism. There are two reasons for that:

1- Text::AutoCSV removes spaces from column names, and some people may want another behavior. A header name of 'Phone Number' will get an internal column name of 'PHONENUMBER' (default behavior, if none of fields_hr, fields_ar and fields_column_names attributes is specified), and one may prefer 'PHONE NUMBER' or 'phone number' or whatsoever.

2- By specifying a list of columns using either of fields_hr or fields_ar, you not only map column names as found in the header line to internal column names: you also request these columns to be available. If one of the requested columns cannot be found, Text::AutoCSV will croak (default) or print an error and return an undef object (if created with croak_if_error => 0).

fields_column_names

Array ref of column internal names, in the order of columns in file. This attribute works like the column_names attribute of Text::CSV. It'll just assign names to columns one by one, regardless of what the header line contains. It'll work also if the file has no header line.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv',
                fields_column_names => ['My COL1', '', 'My COL3']);
out_file

Output file when executing the "write" method.

If not specified or empty, write to standard output.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv');
outh

File handle to write CSV data to when executing the "write" method. Normally you don't want to specify this attribute.

outh is useful if you don't like the way Text::AutoCSV opens the output file for you.

Example:

        my $outh = open "myin.csv', ">>";
        my $csv = Text::AutoCSV->new(in_file => 'in.csv', has_headers => 0, outh => $outh);
out_encoding

Enforce the encoding of output.

Value by default: input encoding

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv',
                out_encoding => 'UTF-16');
out_utf8_bom

Enforce BOM (Byte-Order-Mark) on output, when it is UTF8. If output encoding is not UTF-8, this attribute is ignored.

NOTE

UTF-8 needs no BOM (there is no Byte-Order in UTF-8), and in practice, UTF8-encoded files rarely have a BOM.

Using this attribute is not recommended. It is provided for the sake of completeness, and also to produce Unicode files Microsoft EXCEL will be happy to read.

At first sight it would seem more logical to make EXCEL happy with something like this:

        out_encoding => 'UTF-16'

But... While EXCEL will identify UTF-16 and read it as such, it will not take into account the BOM found at the beginning. In the end the first cell will have 2 useless characters prepended. The only solution the author knows to workaround this issue if to use UTF-8 as output encoding, and enforce a BOM. That is, use:

        ..., out_encoding => 'UTF-8', out_utf8_bom => 1, ...
out_sep_char

Enforce the output CSV separator character.

Value by default: input separator

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_sep_char => ',');
out_quote_char

Enforce the output CSV quote character.

Value by default: input quote character

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_quote_char => '"');
out_escape_char

Enforce the output CSV escape character.

Value by default: input escape character

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv',
                out_escape_char_char => '\\');
out_always_quote

If true, quote all fields of output (set always_quote of Text::CSV).

If false, don't quote all fields of output (don't set always_quote of Text::CSV).

Value by default: same as what is found in input

While reading input, Text::AutoCSV works out whether or not all fields were quoted. If yes, then the output Text::CSV object has the always_quote attribute set, if no, then the output Text::CSV object does not have this attribute set.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_always_quote => 1);
out_has_headers

If true, when writing output, write a header line on first line.

If false, when writing output, don't write a header line on first line.

Value by default: same as has_headers attribute

Example 1

Read standard input and write to standard output, removing the header line.

        Text::AutoCSV->new(out_has_headers => 0)->write();

Example 2

Read standard input and write to standard output, adding a header line.

        Text::AutoCSV->new(fields_column_names => ['MYCOL1', 'MYCOL2'], out_has_headers => 1)->write();
no_undef

If true, non-existent column values are set to an empty string instead of undef. It is also done on extra fields that happen to have an undef value (for example when the target of a linked field is not found).

Note this attribute does not work on callback functions output set with "in_map": for example empty DateTime values (on fields identified as containing a date/time, see dates_* attributes below) are set to undef, even while no_undef is set. Indeed setting it to an empty string while non-empty values would contain a Datetime object would not be clean. An empty value in a placeholder containing an object must be undef.

Since version 1.1.5 of Text::AutoCSV, no_undef is examined when sending parameter ($_) to "in_map" callback: an undef value is now passed as is (as undef), unless no_undef is set. If no_undef is set, and field value is undef, then $_ is set to the empty string ('') when calling callback defined by "in_map". This new behavior was put in place to be consistent with what is being done with DateTime values.

Value by default: 0

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', no_undef => 1);
read_post_update_hr

To be set to a ref sub. Each time a record is read from input, call read_post_update_hr to update the hash ref of the record. The sub is called with 2 arguments: the hash ref to the record value and the hash ref to stats.

The stats allow to count events and are printed in the end of reading in case Text::AutoCSV is called in verbose mode (verbose => 1).

For example, the read_post_update_hr below will turn column 'CITY' values in upper case and count occurences of empty cities in stat display:

        Text::AutoCSV->new(in_file => 'addresses.csv', read_post_update_hr => \&updt, verbose => 1)
                ->write();
        sub updt {
                my ($hr, $stats) = @_;
                $hr->{'CITY'} =~ s/^.*$/\U$&/;
                $stats->{'empty city encountered'}++ if $hr->{'CITY'} eq '';
        }

IMPORTANT

You cannot create a field this way. To create a field, you have to use the member functions "field_add_link", "field_add_copy" or "field_add_computed".

NOTE

If you wish to manage some updates at field level, consider registering update functions with "in_map" and "out_map" member functions. These functions register callbacks that work at field level and with $_ variable (thus the callback function invoked is AutoCSV-agnostic).

"in_map" updates a field after read, "out_map" updates the field content before writing it.

walker_hr

To set to a sub ref that'll be executed each time a record is read from input. It is executed after "read_post_update_hr". The sub is called with 2 arguments: the hash ref to the record value and the hash ref to stats.

Note "read_post_update_hr" is meant for updating record fields just after reading, whereas "walker_hr" is read-only.

The stats allow to count events and are printed in the end of reading in case Text::AutoCSV is called in verbose mode (verbose => 1). If the "verbose" attribute is not set, the stats are not displayed, however you can get stats by calling the get_stats function.

The example below will count in the stats the number of records where the 'CITY' field is empty. Thanks to verbose => 1 attribute, at the end of reading the stats are displayed.

        my $csv = Text::AutoCSV->new(in_file => 'addresses.csv', walker_hr => \&walk1,
                verbose => 1)->read();
        sub walk1 {
                my ($hr, $stats) = @_;
                $stats->{'empty city'}++ if $hr->{'CITY'} eq '';
        }
walker_ar

To set to a sub ref that'll be executed each time a record is read from input. It is executed after "read_post_update_hr". The sub is called with 2 arguments: the array ref to the record value and the hash ref to stats.

Note "read_post_update_hr" is meant for updating record fields just after reading, whereas walker_hr is read-only.

The stats allow to count events and are printed in the end of reading in case Text::AutoCSV is called in verbose mode (verbose => 1). If the "verbose" attribute is not set, the stats are lost.

The array ref contains values in their natural order in the CSV. To be used with the column names, you have to use "get_fields_names" member function.

The example below will count in the stats the number of records where the 'CITY' field is empty. Thanks to verbose => 1 attribute, at the end of reading the stats are displayed. It produces the exact same result as the example in walker_hr attribute, but it uses walker_ar.

        use List::MoreUtils qw(first_index);
        my $csv = Text::AutoCSV->new(in_file => 'addresses.csv', walker_ar => \&walk2, verbose => 1);
        my @cols = $csv->get_fields_names();
        my $idxCITY = first_index { /^city$/i } @cols;
        die "No city field!??" if $idxCITY < 0;
        $csv->read();
        sub walk2 {
                my ($ar, $stats) = @_;
                $stats->{'empty city'}++ if $ar->[$idxCITY] eq '';
        }
write_filter_hr

Alias of "out_filter".

out_filter

To be set to a ref sub. Before writing a record to output, out_filter is called and the record gets writen only if out_filter return value is true. The sub is called with 1 argument: the hash ref to the record value.

For example, if you want to output only records where the 'CITY' column value is Grenoble:

        Text::AutoCSV->new(in_file => 'addresses.csv', out_file => 'grenoble.csv',
                out_filter => \&filt)->write();
        sub filt {
                my $hr = shift;
                return 1 if $hr->{'CITY'} =~ /^grenoble$/i;
                return 0;
        }
write_fields

Alias of "out_fields".

out_fields

Set to an array ref. List fields to write to output.

Fields are written in their order in the array ref, the first CSV column being the first element in the array, and so on. Fields not listed in out_fields are not written in output.

You can use empty field names to have empty columns in output.

Value by default: none, meaning, all fields are output in their natural order. What is natural order? It is the input order for fields that were read from input, and the order in which they got created for created fields.

Example:

        Text::AutoCSV->new(in_file => 'allinfos.csv', out_file => 'only-addresses.csv',
                out_fields => [ 'NAME', 'ADDRESS' ] )->write();
out_orderby

Array reference to a list of fields to sort output with.

At the moment this feature is a bit of a hack (no option to make sort descending or ascending, numeric or text, and it is not part of test plan).

Example:

        Text::AutoCSV->new(in_file => 'names.csv', out_file => 'sortednames.csv',
                out_orderby => [ 'LASTNAME', 'FIRSTNAME']);
search_case

If true, searches are case sensitive by default. Searches are done by the member functions "search", "search_1hr", "vlookup", and linked fields ("field_add_link").

The search functions can also be called with the option "case", that takes precedence over the object-level search_case attribute value. See "vlookup" help.

Value by default: 0 (by default searches are case insensitive)

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_case => 1);
search_trim

If true, searches ignore the presence of leading or trailing spaces in values.

The search functions can also be called with the option "trim", that takes precedence over the object-level search_trim attribute value. See "vlookup" help.

Value by default: 1 (by default searches ignore leading and trailing spaces)

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_trim => 0);
search_ignore_empty

If true, empty fields are not included in the search indexes.

The search functions can also be called with the option "ignore_empty", that takes precedence over the object-level search_ignore_empty attribute value. See "vlookup" help.

Value by default: 1 (by default, search of the value '' will find nothing)

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_ignore_empty => 0);
search_ignore_accents

If true, accents are ignored by search indexes.

The search functions can also be called with the option "ignore_accents", that takes precedence over the object-level search_ignore_accents attribute value. See "vlookup" help.

Value by default: 1 (by default, accents are ignored by search functions)

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_ignore_accents => 0);
search_value_if_not_found

When a search is done with a unique value to return (field_add_link member function behavior or return value of vlookup), default value of option "value_if_not_found". See "vlookup".

search_value_if_found

When a search is done with a unique value to return (field_add_link member function behavior or return value of vlookup), default value of option "value_if_found". See "vlookup".

IMPORTANT

This attribute is extremly unusual. Once you've provided it, all vlookups and the target field value of fields created with field_add_link will all be populated with the value provided with this option.

Don't use it unless you know what you are doing.

search_ignore_ambiguous

When a search is done with a unique value to return (field_add_link member function behavior or return value of search_1hr and vlookup), default value of option "ignore_ambiguous". See "vlookup".

search_value_if_ambiguous

When a search is done with a unique value to return (field_add_link member function behavior or return value of vlookup), default value of option "value_if_ambiguous". See "vlookup".

fields_dates

Array ref of field names that contain a date.

Once the formats of these fields is known (auto-detection by default), each of these fields will get a specific "in_map" sub that converts the text in a DateTime object and a "out_map" sub that converts back from DateTime to text.

NOTE

The "out_map" given to a DateTime field is "defensive code": normally, "in_map" converts text into a DateTime object and "out_map" does the opposite, it takes a DateTime object and converts it to text. If ever "out_map" encounters a value that is not a DateTime object, it'll just stringify it (evaluation in a string context), without calling its DateTime formatter.

If the format cannot be detected for a given field, output an error message and as always when an error occurs, croak (unless "croak_if_error" got set to 0).

Value by default: none

Example:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv',
                fields_dates => ['LASTLOGIN', 'CREATIONDATE']);
fields_dates_auto

Boolean value. If set to 1, will detect dates formats on all fields. Fields in which a DateTime format got detected are then managed as if they had been being listed in "fields_dates" attribute: they get an appropriate "in_map" sub and a "out_map" sub to convert to and from DateTime (see "fields_dates" attribute above).

fields_dates_auto looks for DateTime on all fields, but it expects nothing: it won't raise an error if no field is found that contains DateTime.

Value by default: 0

Example:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv', fields_dates_auto => 1);
fields_dates_auto_optimize

Relevant only if "fields_dates_auto" is set.

Normally when "fields_dates_auto" is set, the input is read completely to make sure auto-detection produces a reliable result. If fields_dates_auto_optimize is set, this reading pass will stop as soon as there is no ambiguity left. That is, for every fields in input, the date format (or the fact that no date format is suitable) is known.

Using this option is a bit risky because it could trigger a date format detection that later in the input, would turn out to be wrong. Should that be the case, strange errors will occur, that are not easy to understand. Use it at your own risk.

Value by default: 0

Example:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv', fields_dates_auto => 1,
                fields_dates_auto_optimize => 1);
dates_formats_to_try

Array ref of string formats.

Text::AutoCSV has a default built-in list of 20 date formats to try and 6 time formats (also it'll combine any date format with any time format).

dates_formats_to_try will replace Text::AutoCSV default format-list will the one you specify, in case the default would not produce the results you expect.

The formats are written in Strptime format.

Value by default (see below about the role of the pseudo-format ''):

        [ '',
        '%Y-%m-%d',
        '%Y.%m.%d',
        '%Y/%m/%d',
        '%m.%d.%y',
        '%m-%d-%Y',
        '%m.%d.%Y',
        '%m/%d/%Y',
        '%d-%m-%Y',
        '%d.%m.%Y',
        '%d/%m/%Y',
        '%m-%d-%y',
        '%m/%d/%y',
        '%d-%m-%y',
        '%d.%m.%y',
        '%d/%m/%y',
        '%Y%m%d%H%M%S',
        '%b %d, %Y',
        '%b %d %Y',
        '%b %d %T %Z %Y',
        '%d %b %Y',
        '%d %b, %Y' ]

IMPORTANT

The empty format (empty string) has a special meaning: when specified, Text::AutoCSV will be able to identify fields that contain only a time (not preceeded by a date).

Note

Format identification is over only when there is no more ambiguity. So the usual pitfall of US versus French dates (month-day versus day-month) gets resolved only when a date is encountered that disambiguates it (a date of 13th of the month or later).

Example with a weird format that uses underscores to separate elements, using either US (month, day, year), French (day, month, year), or international (year, month, day) order:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv',
                dates_formats_to_try => ['%d_%m_%Y', '%m_%d_%Y', '%Y_%m_%d']);
dates_formats_to_try_supp

Same as "dates_formats_to_try" but instead of replacing the default list of formats used during detection, it is added to this default list.

You want to use this attribute if you need a specific DateTime format while continuing to benefit from the default list.

IMPORTANT

Text::AutoCSV will identify a given Datetime format only when there is no ambiguity, meaning, one unique Datetime format matches (all other failed). Adding a format that already exists in the default list will prevent the format from being identified, as it'll always be ambiguous. See "dates_formats_to_try" for the default list of formats.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv',
                dates_formats_to_try_supp => ['%d_%m_%Y', '%m_%d_%Y', '%Y_%m_%d']);
dates_ignore_trailing_chars

If set to 1, DateTime auto-detection will ignore trailing text that may follow detected DateTime-like text.

Value by default: 1 (do ignore trailing chars)

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_ignore_trailing_chars => 0);
dates_search_time

If set to 1, look for times when detecting DateTime format. That is, whenever a date format candidate is found, a longer candidate that also contains a time (after the date) is tested.

Value by default: 1 (do look for times when auto-detecting DateTime formats)

Example:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_search_time => 0);
dates_locales

Comma-separated string of locales to test when detecting DateTime formats. Ultimately, Text::AutoCSV will try all combinations of date formats, times and locales.

Value by default: none (use perl default locale)

Example:

        my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_locales => 'fr,de,en');
dates_zeros_ok

Boolean. If true, a date made only of 0s is regarded as being empty.

For example if dates_zeros_ok is False, then a date like 0000-00-00 will be always incorrect (as the day and month are out of bounds), therefore a format like '%Y-%m-%d' will never match for the field.

Conversely if dates_zeros_ok is true, then a date like 0000-00-00 will be processed as if being the empty string, thus the detection of format will work and when parsed, this "full of zeros" dates will be processed the same way as the empty string (= value will be undef).

IMPORTANT

"0s dates" are evaluated to undef when parsed, thus when converted back to text (out_map), they are set to an empty string, not to the original value.

Value by default: 1

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', dates_zeros_ok => 0);
out_dates_format

Enforce the format of dates in output, for all fields that contain a DateTime value.

The format is written in Strptime format.

Value by default: none (by default, use format detected on input)

Example:

                # Detect any field containing a DateTime value and convert it to yyyy-mm-dd whatever the
                # input format is.
        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', fields_dates_auto => 1,
                out_dates_format => '%F')->write();
out_dates_locale

Taken into account only if "out_dates_format" is used.

Sets the locale to apply on "out_dates_format".

Value by default: none (by default, use the locale detected on input)

Example:

                # Detect any field containing a DateTime value and convert it to a US DateTime whatever the
                # input format is.
        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', fields_dates_auto => 1,
                out_dates_format => '%b %d, %Y, %I:%M:%S %p', out_dates_locale => 'en')->write();
croak_if_error

If true, stops the program execution in case of error.

IMPORTANT

Value by default: 1

If set to zero (croak_if_error => 0), errors are displayed as warnings. This printing can then be affected by setting the "quiet" attribute.

verbose

If true, get Text::AutoCSV to be a bit talkative instead of speaking only when warnings and errors occur. Verbose output is printed to STDERR by default, this can be tuned with the "infoh" attribute.

Value by default: 0

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', verbose => 1);
infoh

File handle to display program's verbose output. Has effect *mainly* with attribute verbose => 1.

Note infoh is used to display extra information in case of error (if a field does not exist, Text::AutoCSV will display the list of existing fields). If you don't want such output, you can set infoh to undef.

Value by default: \*STDERR

Example:

        open my $infoh, ">", "log.txt";
        my $csv = Text::AutoCSV->new(in_file => 'in.csv', infoh => $infoh);
quiet

If true, don't display warnings and errors, unless croaking.

If "croak_if_error" attribute is set (as per default), still, Text::AutoCSV will produce output (on STDERR) when croaking miserably.

When using croak_if_error => 0, errors are processed as warnings and if "quiet" is set (in addition to "croak_if_error" being set to 0), there'll be no output. Note this way of working is not recommended, as things can go wrong without any notice to the caller.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', quiet => 1);
one_pass

If true, Text::AutoCSV will perform one reading of the input. If other readings are triggered, it'll raise an error and no reading will be done. Should that be the case (you ask Text::AutoCSV to do something that'll trigger more than one reading of input), Text::AutoCSV will croak as is always the case if an error occurs.

Normally Text::AutoCSV will do multiple reads of input to work out certain characteristics of the CSV: guess of encoding and guess of escape character.

Also if member functions like "field_add_link", "field_add_copy", "field_add_computed", "read" or "write" are called after input has already been read, it'll trigger further reads as needed.

If one wishes a unique read of the input to occur, one_pass attribute is to be set.

When true, encoding will be assumed to be the first one in the provided list ("encoding" attribute), if no encoding attribute is provided, it'll be the first one in the default list, to date, it is UTF-8.

When true, and if attribute "escape_char" is not set, escape_char will be assumed to be '\\' (backslash).

By default, one_pass is set if inh attribute is set (caller provides the input file handle of input) or if input file is stdin (in_file attribute not set or set to an empty string).

Value by default:

        0 if inh attribute is not set and in_file attribute is set to a non empty string
        1 if inh attribute is set or in_file is not set or set to an empty string

Example:

        my $csv = Text::AutoCSV->new(in_file => 'in.csv', one_pass => 1);

read

        $csv->read();

Read input entirely.

Return value

Returns the object itself in case of success. Returns undef if error.

Callback functions (when defined) are invoked, in the following order:

"read_post_update_hr", intended to do updates on fields values after each record read

"walker_ar", called after each record read, with an array ref of fields values

"walker_hr", called after each record read, with a hash ref of fields values

Example:

                # Do nothing - just check CSV can be read successfully
        Text::AutoCSV->new(in_file => 'in.csv')->read();

read_all_in_mem

        $csv->read_all_in_mem();

Created in version 1.1.5. Before, existed only as _read_all_in_mem, meaning, was private.

Read input entirely, as with "read" function, but enforcing content to be kept in-memory.

Having the content kept in-memory is implied by search functions ("vlookup" for example). With read_all_in_mem you can enforce this behavior without doing a fake search.

reset_next_record_hr

        $csv->reset_next_record_hr();

Reset the internal status to start from the beginning with "get_next_record_hr". Used in conjunction with "get_next_record_hr".

get_next_record_hr

        my $hr = $csv->get_next_record_hr(\$opt_key);

Get the next record content as a hash ref. $hr is undef when the end of records has been reached.

When specified, $opt_key is set to the current (returned) record key.

NOTE

You do not need to call "reset_next_record_hr" once before using get_next_record_hr.

Therefore "reset_next_record_hr" is useful only if you wish to restart from the beginning before you've reached the end of the records.

NOTE bis

"walker_hr" allows to execute some code each time a record is read, and it better fits with Text::AutoCSV philosophy. Using a loop with get_next_record_hr is primarily meant for Text::AutoCSV internal usage. Also when using this mechanism, you get very close to original Text::CSV logic, that makes Text::AutoCSV less useful.

Return value

A hashref of the record, or undef once there's no more record to return.

Example:

        while (my $hr = $csv->get_next_record_hr()) {
                say Dumper($hr);
        }

write

        $csv->write();

Write input into output.

Return value

Returns the object itself in case of success. Returns undef if error.

- If the content is not in-memory at the time write() is called:

Each record is read (with call of "read_post_update_hr", "walker_ar" and "walker_hr") and then written. The read-and-write is done in sequence, each record is written to output before the next record is read from input.

- If the content is in-memory at the time write() is called:

No "read" operation is performed, instead, records are directly written to output.

If defined, "out_filter" is called for each record. If the return value of "out_filter" is false, the record is not written.

Example:

                # Copy input to output.
                # As CSV is parsed in-between, this copy also checks a number of characteristics about the
                # input, as opposed to a plain file copy operation.
        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv')->write();

out_header

        $csv->out_header($field, $header);

Set the header text of $field to $header.

By default, the input header value is rewritten as is to output. out_header allows you to change it.

Return value

Returns the object itself.

Example:

        Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv')
                ->out_header('LOGIN', 'Login')
                ->out_header('FULLNAME', 'Full Name')
                ->write();
        $csv->print_id();

Print out a description of input. Write to \*STDERR by default or to "infoh" attribute if set.

The description consists in a list of a few characteristics (CSV separator and the like) followed by the list of columns with the details of each.

Example of output:

If you go to the utils directory of this module and execute the following:

        ./csvcopy.pl -i f1.csv -l "1:,A->B,f2.csv" --id

You will get this output:

        -- f1.csv:
        sep_char:         ,
        escape_char:      \
        in_encoding:      UTF-8
        is_always_quoted: no

        # FIELD     HEADER    EXT DATA                            DATETIME FORMAT DATETIME LOCALE
        - -----     ------    --------                            --------------- ---------------
        0 TIMESTAMP timestamp                                     %Y%m%d%H%M%S
        1 A         a
        2 B         b
        3 C         c
        4 D         d                                             %d/%m/%Y
        5 1:SITE    1:SITE    link: f2.csv, chain: A->B->* (SITE)
        6 1:B       1:B       link: f2.csv, chain: A->B->* (B)

field_add_computed

        $csv->field_add_computed($new_field, $subref);

$new_field is the name of the created field.

$subref is a reference to a sub that'll calculate the new field value.

Return value

Returns the object itself in case of success. Returns undef if error.

Add a field calculated from other fields values. The subref runs like this:

        sub func {
                        # $new_field is the name of the field (allows to use one subref for more than one field
                        # calculation).
                        # $hr is a hash ref of fields values.
                        # $stats is a hash ref that gets printed (if Text::AutoCSV is created with verbose => 1)
                        # in the end.
                my ($new_field, $hr, $stats) = @_;

                my $field_value;
                # ... compute $field_value

                return $field_value;
        }

Example:

                # Add a field for the full name, made of the concatenation of the
                # first name and the last name.
        Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'dirwithfn.csv', verbose => 1)
                ->field_add_computed('FULLNAME', \&calc_fn)->write();
        sub calc_fn {
                my ($new_field, $hr, $stats) = @_;
                die "Man, you are in serious trouble!" unless $new_field eq 'FULLNAME';
                my $fn = $hr->{'FIRSTNAME'} . ' ' . uc($hr->{'LASTNAME'});
                $stats->{'empty full name'}++ if $fn eq ' ';
                return $fn;
        }

field_add_copy

        $csv->field_add_copy($new_field, $src_field, $opt_subref);

$new_field if the name of the new field.

$src_field is the name of the field being copied.

$opt_subref is optional. It is a reference to a sub that takes one string (the value of $src_field) and returns a string (the value assigned to $new_field).

Return value

Returns the object itself in case of success. Returns undef if error.

field_add_copy is a special case of "field_add_computed". The advantage of field_add_copy is that it relies on a sub that is Text::AutoCSV "unaware", just taking one string as input and returning another string as output.

IMPORTANT

The current field value is passed to field_add_copy in $_.

A call to

        $csv->field_add_copy($new_field, $src_field, $subref);

is equivalent to

        $csv->field_add_computed($new_field, \&subref2);
        sub subref2 {
                my (undef, $hr) = @_;
                local $_ = $hr->{$src_field};
                return $subref->();
        }

Example of a field copy + pass copied field in upper case and surround content with <<>>:

        my $csv = Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'd2.csv');
        $csv->field_add_copy('UCLAST', 'LASTNAME', \&myfunc);
        $csv->write();
        sub myfunc { s/^.*$/<<\U$&>>/; $_; }

Note that the calls can be chained as most member functions return the object itself upon success. The example above is equivalent to:

        Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'd2.csv')
                ->field_add_copy('UCLAST', 'LASTNAME', \&myfunc)
                ->write();
        sub myfunc { s/^.*$/<<\U$&>>/; $_; }
        $csv->field_add_link($new_field, $chain, $linked_file, \%opts);

$new_field is the name of the new field.

$chain is the CHAIN of the link, that is: 'LOCAL->REMOTE->PICK' where:

LOCAL is the field name to read the value from.

REMOTE is the linked field to find the value in. This field belongs to $linked_file.

PICK is the field from which to read the value of, in the record found by the search. This field belongs to $linked_file.

If $new_field is undef, the new field name is the name of the third field of $chain (PICK).

$linked_file is the name of the linked file, that gets read in a Text::AutoCSV object created on-the-fly to do the search on. $linked_file can also be a Text::AutoCSV object that you created yourself, allowing for more flexibility. Example:

        my $lcsv = Text::AutoCSV->new(in_file => 'logins.csv', case => 1);
        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', $lcsv);

\%opts is a hash ref of optional attributes. The same values can be provided as with vlookup.

trim

If set to 1, searches will ignore leading and trailing spaces. That is, a LOCAL value of ' x ' will match with a REMOTE value of 'x'.

If option is not present, use "search_value_if_not_found" attribute of object (default value: 1).

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { trim => 0 });
case

If set to 1, searches will take the case into account. That is, a LOCAL value of 'X' will not match with a REMOTE value of 'x'.

If option is not present, use "search_case" attribute of object (default value: 0).

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { case => 1 });
ignore_empty

If set to 1, empty values won't match. That is, a LOCAL value of '' will not match with a REMOTE value of ''.

If option is not present, use "search_ignore_empty" attribute of object (default value: 1).

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { ignore_empty => 0 });
value_if_not_found

If the searched value is not found, the value of the field is undef, that produces an empty string at write time. Instead, you can specify the value.

If option is not present, use "search_value_if_not_found" attribute of object (default value: undef).

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { value_if_not_found => '<not found!>' });
value_if_found

If the searched value is found, you can specify the value to return.

If option is not present, use "search_value_if_found" attribute of object (default value: none).

NOTE

Although the PICK field is ignored when using this option, you must specify it any way.

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { value_if_not_found => '0', value_if_found => '1' });
value_if_ambiguous

If the searched value is found in more than one record, the value of the field is undef, that produces an empty string at write time. Instead, you can specify the value.

If option is not present, use "search_value_if_ambiguous" attribute of object (default value: undef).

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { value_if_ambiguous => '<ambiguous!>' });
ignore_ambiguous

Boolean value. If ignore_ambiguous is true and the searched value is found in more than one record, then, silently fall back on returning the value of the first record. Obviously if ignore_ambiguous is true, then the value of "value_if_ambiguous" is ignored.

If option is not present, use "search_ignore_ambiguous" attribute of object (default value: 1).

Example:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { ignore_ambiguous => 1 });

Example with multiple options:

        $csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv',
                { value_if_not_found => '?', ignore_ambiguous => 1 });

Return value

Returns the object itself in case of success. Returns undef if error.

Example of field_add_link usage:

        my $nom_compose = 0;
        my $zip_not_found = 0;
        Text::AutoCSV->new(in_file => 'pers.csv', walker_hr => \&walk)
                ->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv')->read();
        sub walk {
                my $hr = shift;
                $nom_compose++ if $hr->{'NAME'} =~ m/[- ]/;
                $zip_not_found++ unless defined($hr->{'MYCITY'});
        }
        print("Number of persons with a multi-part name: $nom_compose\n");
        print("Number of persons with unknown zipcode: $zip_not_found\n");
        $csv->links($prefix, $chain, $linked_file, \%opts);

$prefix is the name to add to joined fields

$chain is the JOINCHAIN of the link, that is: 'LOCAL->REMOTE' where:

LOCAL is the field name to read the value from.

REMOTE is the linked field to find the value in. This field belongs to $linked_file.

As opposed to "field_add_link", there is no PICK part, as all fields of target are read.

As opposed to Text::AutoCSV habits of croaking whenever a field name is duplicate, here, the duplicates are resolved by appending _2 to the joined field name if it already exists. If _2 already exists, too, then _3 is appended instead, and so on, until a non-duplicate is found. This mechanism is executed given the difficulty to control all field names when joining CSVs.

$linked_file and \%opts work exactly the same way as for "field_add_link", see "field_add_link" for help.

Return value

Returns the object itself in case of success. Returns undef if error.

NOTE

This function used to be called join but got renamed to avoid clash with perl' builtin join.

Example:

        Text::AutoCSV->new(in_file => 'pers.csv', out_file => 'pers_with_city.csv')
                ->links('Read from zips.csv:', 'ZIP->ZIPCODE', 'zips.csv')->write();

get_in_encoding

        my $enc = $csv->get_in_encoding();

Return the string of input encoding, for example 'latin2' or 'UTF-8', etc.

get_in_file_disp

        my $f = $csv->get_in_file_disp();

Return the printable name of in_file.

get_sep_char

        my $s = $csv->get_sep_char();

Return the string of the input CSV separator character, for example ',' or ';'.

get_escape_char

        my $e = $csv->get_escape_char();

Return the string of the input escape character, for example '"' or '\\'.

get_is_always_quoted

        my $a = $csv->get_is_always_quoted();

Return 1 if all fields of input are always quoted, 0 otherwise.

get_coldata

        my @cd = get_coldata();

Return an array that describes each column, from the first one (column 0) to the last.

Each element of the array is itself an array ref that contains 5 elements:

        0: Name of the field (as accessed in *_hr functions)
        1: Content of the field in the header line (if input has a header line)
        2: Column content type, shows some meta-data of fields created with field_add_* functions
        3: Datetime format detected, if ever, in the format Strptime
        4: Locale of DateTime format detected, if ever
        5: Multiline field: '1' if not, 'm' if newlines encountered in the field

get_pass_count

        my $n = $csv->get_pass_count();

Return the number of input readings done. Useful only if you're interested in Text::AutoCSV internals.

get_in_mem_record_count

        my $m = $csv->get_in_mem_record_count();

Return the number of records currently stored in-memory. Useful only if you're interested in Text::AutoCSV internals.

get_max_in_mem_record_count

        my $mm = $csv->get_max_in_mem_record_count();

Return the maximum number of records ever stored in-memory. Indeed this number can decrease: certain functions like field_add* member-functions discard in-memory content. Useful only if you're interested in Text::AutoCSV internals.

get_fields_names

        my @f = $csv->get_fields_names();

Return an array of the internal names of the columns.

get_field_name

        my $name = $csv->get_field_name($n);

Return the $n-th column name, the first column being number 0.

Example:

                # Get the field name of the third column
        my $col = $csv->get_field_name(2);

get_stats

        my %stats = $csv->get_stats();

Certain callback functions provide a parameter to record event count: "field_add_computed", "read_post_update_hr", "walker_ar" and "walker_hr". By default, these stats are displayed if Text::AutoCSV got created with attribute verbose => 1. get_stats() returns the statistics hash of the object.

IMPORTANT

As opposed to most functions that trigger input reading automatically (search functions and other get_* functions), get_stats just returns you the stats as it is, regardless of whether some execution already occured.

get_nb_rows

        my $nb_rows = $csv->get_nb_rows();

Gives the number of rows of the input. Does not trigger any reading - just provides the number of rows as known at the moment of the call. If unknown, return undef. Typically, the number of rows is known after doing the initial detection of CSV options (escape character, etc.), or, after doing one complete reading.

The header line counts for one row.

IMPORTANT

As some fields can contain new lines, this number is not necessarily identical to the number of lines.

set_walker_ar

        $csv->set_walker_ar($subref);

Normally one wants to define it at object creation time using "walker_ar" attribute. set_walker_ar allows to assign the attribute walker_ar after object creation.

See attribute "walker_ar" for help about the way $subref should work.

Return value

Returns the object itself in case of success. Returns undef if error.

Example:

                # Calculate the total of the two first columns, the first column being money in and the
                # second one being money out.
        my ($actif, $passif) = (0, 0);
        $csv->set_walker_ar(sub { my $ar = $_[0]; $actif += $ar->[0]; $passif += $ar->[1]; })->read();
        print("Actif =  $actif\n");
        print("Passif = $passif\n");

set_walker_hr

        $csv->set_walker_hr($subref);

Normally one wants to define it at object creation time using "walker_hr" attribute. set_walker_hr allows to assign the attribute "walker_hr" after object creation.

See attribute "walker_hr" for help about the way $subref should work.

Return value

Returns the object itself in case of success. Returns undef if error.

Example:

        my $csv = Text::AutoCSV->new(in_file => 'directory.csv', verbose => 1);

        # ...

        $csv->set_walker_hr(
                sub {
                        my ($hr, $stat) = @_;
                        $stat{'not capital name'}++, return if $hr->{'NAME'} ne uc($hr->{'NAME'});
                        $stat{'name is capital letters'}++;
                }
        )->read();

set_out_file

        $csv->set_out_file($out_file);

Normally one wants to define it at object creation time using "out_file" attribute. set_out_file allows to assign the attribute "out_file" after object creation. It is set to $out_file value.

Return value

Returns the object itself in case of success. Returns undef if error.

Example:

        $csv->set_out_file('mycopy.csv')->write();

get_keys

        my @allkeys = $csv->get_keys();

Returns an array of all the record keys of input. A record key is a unique identifier that designates the record.

At the moment it is just an integer being the record number, the first one (that comes after the header line) being of number 0. For example if $csv input is made of one header line and 3 records (that is, a 4-line file typically, if no record contains a line break), $csv->get_keys() returns

        (0, 1, 2)

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

get_hr_all

        my @allin = $csv->get_hr_all();

Returns an array of all record contents of the input, each record being a hash ref.

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

get_row_ar

        my $row_ar = $csv->get_row_ar($record_key);

Returns an array ref of the record designated by $record_key.

Example:

                # Get content (as array ref) of last record
        my @allkeys = $csv->get_keys();
        my $lastk = $allkeys[-1];
        my $lastrec_ar = $csv->get_row_ar($lastk);

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

get_row_hr

        my $row_hr = $csv->get_row_hr($record_key);

Returns a hash ref of the record designated by $record_key.

Example:

                # Get content (as hash ref) of first record
        my @allkeys = $csv->get_keys();
        my $firstk = $allkeys[0];
        my $firstrec_hr = $csv->get_row_hr($firstk);

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

get_cell

        my $val = $csv->get_cell($record_key, $field_name);

Return the value of the cell designated by its record key ($record_key) and field name ($field_name).

Example:

        my @allkeys = $csv->get_keys();
        my $midk = $allkeys[int($#allkeys / 2)];
        my $midname = $csv->get_cell($midk, 'NAME');

Note the above example is equivalent to:

        my @allkeys = $csv->get_keys();
        my $midk = $allkeys[int($#allkeys / 2)];
        my $midrec_hr = $csv->get_row_hr($midk);
        my $midname = $midrec_hr->{'NAME'};

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

get_values

        my @vals = $csv->get_values($field_name, $opt_filter_subref);

Return an array made of the values of the given field name ($field_name), for every records, in the order of the records.

$opt_filter_subref is an optional subref. If defined, it is called with every values in turn (one call per value) and only values for which $opt_filter_subref returned True are included in the returned array. Call to $opt_filter_subref is done with $_ to pass the value.

Example:

        my @logins = $csv->get_values('LOGIN");

This is equivalent to:

        my @allkeys = $csv->get_keys();
        my @logins;
        push @logins, $csv->get_cell($_, 'LOGIN') for (@allkeys);

Example bis

                # @badlogins is the list of logins that contain non alphanumeric characters
        my @badlogins = Text::AutoCSV->new(in_file => 'logins.csv')
                ->get_values('LOGIN', sub { m/[^a-z0-9]/ });

This is equivalent to:

                # @badlogins is the list of logins that contain non alphanumeric characters
                # This method leads to carrying all values of a given field across function calls...
        my @badlogins = grep { m/[^a-z0-9]/ } (
                Text::AutoCSV->new(in_file => 'logins.csv')->get_values('LOGIN')
        );

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

get_recnum

        my $r = $csv->get_recnum();

Returns the current record identifier, if a reading is in progress. If no read is in progress, return undef.

in_map

read_update_after

read_update_after is an alias of in_map.

        $csv->in_map($field, $subref);

After reading a record from input, update $field by calling $subref. The value is put in $_. Then the field value is set to the return value of $subref.

This feature is originally meant to manage DateTime fields: the input and output CSVs carry text content, and in-between, the values dealt with are DateTime objects.

See "out_map" for an example.

out_map

write_update_before

write_update_before is an alias of out_map.

        $csv->out_map($field, $subref);

Before writing $field field content into the output file, pass it through out_map. The value is put in $_. Then the return value of $subref is written in the output.

Example:

Suppose you have a CSV file with the convention that a number surrounded by parenthesis is negative. You can register corresponding "in_map" and "out_map" functions. During the processing of data, the field content will be just a number (positive or negative), while in input and in output, it'll follow the "negative under parenthesis" convention.

In the below example, we rely on convention above and add a new field converted from the original one, that follows the same convention.

        sub in_updt {
                return 0 if !defined($_) or $_ eq '';
                my $i;
                return -$i if ($i) = $_ =~ m/^\((.*)\)$/;
                $_;
        }
        sub out_updt {
                return '' unless defined($_);
                return '(' . (-$_) . ')' if $_ < 0;
                $_;
        }
        sub convert {
                return ;
        }
        Text::AutoCSV->new(in_file => 'trans-euros.csv', out_file => 'trans-devises.csv')
                ->in_map('EUROS', \&in_updt)
                ->out_map('EUROS', \&out_updt)
                ->out_map('DEVISE', \&out_updt)
                ->field_add_copy('DEVISE', 'EUROS', sub { sprintf("%.2f", $_ * 1.141593); } )
                ->write();
        my $found_ar = $csv->search($field_name, $value, \%opts);

Returns an array ref of all records keys where the field $field_name has the value $value.

\%opts is an optional hash ref of options for the search. See help of "vlookup" about options.

IMPORTANT

An unsuccessful search returns an empty array ref, that is, [ ]. Thus you cannot check for definedness of search return value to know whether or not the search found something.

On the other hand, you can always examine the value search(...)->[0], as search is always an array ref. If the search found nothing, then, search(...)->[0] is not defined.

IMPORTANT bis

If not yet done, this function causes the input to be read entirely and stored in-memory.

Example:

        my $linux_os_keys_ar = $csv->search('OS', 'linux');

search_1hr

        my $found_hr = $csv->search_1hr($field_name, $value, \%opts);

Returns a hash ref of the first record where the field $field_name has the value $value.

\%opts is an optional hash ref of options for the search. See help of "vlookup" about options.

Note the options "value_if_not_found" and "value_if_ambiguous" are ignored. If not found, return undef. If the result is ambiguous (more than one record found) and ignore_ambiguous is set to a false value, return undef.

The other options are taken into account as for any search: "ignore_ambiguous", "trim", "case", "ignore_empty".

IMPORTANT

As opposed to "search", an unsuccessful search_1hr will return undef.

IMPORTANT bis

If not yet done, this function causes the input to be read entirely and stored in-memory.

Example:

        my $hr = $csv->search_1hr('LOGIN', $login);
        my $full_name = $hr->{'FIRSTNAME'} . ' ' . $hr->{'LASTNAME'};

vlookup

        my $val = $csv->vlookup($searched_field, $value, $target_field, \%opts);

Find the first record where $searched_field contains $value and out of this record, returns the value of $target_field.

\%opts is optional. It is a hash of options for vlookup:

trim

If true, ignore spaces before and after the values to search.

If option is not present, use "search_trim" attribute of object (default value: 1).

case

If true, do case sensitive searches.

If option is not present, use "search_case" attribute of object (default value: 0).

ignore_empty

If true, ignore empty values in the search. The consequence is that you won't be able to find empty values by searching it.

If option is not present, use "search_ignore_empty" attribute of object (default value: 1).

ignore_accents

If true, ignore accents in searches. For exampe, if ignore_accents is set, a string like "élémentaire" will match "elementaire".

If option is not present, use "search_ignore_accents" attribute of object (default value: 1).

NOTE

This option uses the function "remove_accents" to build its internal hash tables. See "remove_accents" help for more details.

value_if_not_found

Return value if vlookup finds nothing.

If option is not present, use "search_value_if_not_found" attribute of object (default value: undef).

value_if_found

Return value if vlookup finds something.

If option is not present, use "search_value_if_found" attribute of object (default value: none).

This option is to just check whether a value exists, regardless of the target value found.

NOTE

Although the $target_field is ignored when using this option, you must specify it any way.

value_if_ambiguous

Return value if vlookup find more than one result. Tune it only if ignore_ambiguous is unset.

If option is not present, use "search_value_if_ambiguous" attribute of object (default value: undef).

ignore_ambiguous

If true, then if more than one result is found, silently return the first one.

If option is not present, use "search_ignore_ambiguous" attribute of object (default value: 1).

IMPORTANT

If not yet done, this function causes the input to be read entirely and stored in-memory.

Example:

        my $name = $csv->vlookup('LOGIN', $id, 'NAME', { value_if_not_found => '<login not found>' });

remove_accents

        my $t = $csv->remove_accents($s);

Take the string $s as argument and return the string without accents. Uses a Unicode decomposition followed by removal of every characters that have the Unicode property Nonspacing_Mark.

NOTE

Only accents are removed. It is not a whatever-encoding -> us-ascii conversion. For example, the French œ character (o followed by e) or the German ß (eszett) are kept as is.

NOTE bis

Tested with some latin1 and latin2 characters.

NOTE ter

There is no language-level transformation during accents removal. For example Jürgen is returned as Jurgen, not Juergen.

This function is not exported by default.

Example:

        use Text::AutoCSV qw(remove_accents);
        my $s = remove_accents("Français: être élémentaire, Tchèque: služba dům");
        die "This script will never die" if $s ne 'Francais: etre elementaire, Tcheque: sluzba dum';

AUTHOR

Sébastien Millet <milletseb@laposte.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2016, 2017 by Sébastien Millet.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.