Data::Validate::CSV - read and validate CSV
CSV Schema (JSON):
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [
      { "name": "country", "datatype": { "base": "string", "length": 2 } },
      { "name": "country group", "datatype": "string" },
      { "name": "name (en)", "datatype": "string" },
      { "name": "name (fr)", "datatype": "string" },
      { "name": "name (de)", "datatype": "string" },
      { "name": "latitude", "datatype": { "base": "number", "maximum": 90, "minimum": -90 } },
      { "name": "longitude", "datatype": { "base": "number", "maximum": 180, "minimum": -180 } }
    ]
  }
}
CSV Data:
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"
Perl:
use Path::Tiny qw(path);
use Data::Validate::CSV;

my $table = Data::Validate::CSV::Table->new(
  schema     => path('countries.csv-metadata.json'),
  input      => path('countries.csv'),
  has_header => !!0,
);

while (my $row = $table->get_row) {
  for my $e (@{$row->errors}) {
    warn $e;
  }
  printf(
    "%s is at latitude %f, longitude %f.\n",
    $row->get("name (en)")->value,
    $row->get("latitude")->value,
    $row->get("longitude")->value,
  );
}
There's not really a lot of documentation right now.
There are three main interfaces you need to know about: tables, rows, and cells. (There are also columns, schemas, and notes, but for most day-to-day usage, those can be considered internal implementation details.)
The table is constructed with the following attributes:
schema
A schema for the table. Can be a hashref, a JSON string, a scalar ref to a JSON string, or a Path::Tiny path to a file containing the schema.
input
The CSV data for the table. Can be a filehandle, a scalar ref to a string of data, or a Path::Tiny path to a file.
has_header
A boolean indicating whether the CSV contains a header row. If it does, the header row is used to supply any column names missing from the schema, and it is not returned by get_row.
reader
A coderef which, if given a filehandle, will return a parsed line of CSV. The default is basically something like:
sub { Text::CSV_XS->new->getline($_[0]) }
That's probably sufficient for most cases, but you may need to supply your own reader for handling tab-delimited files.
skip_rows
An integer: the number of additional rows to skip before the header. Some CSV files begin with a title or credit line. Defaults to 0.
skip_rows_after_header
An integer: the number of additional rows to skip after the header. Defaults to 0.
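The attributes above can be combined without touching the filesystem. The following is a minimal sketch, assuming the schema hashref mirrors the JSON structure shown earlier and that Text::CSV_XS is installed. The sample data, the shortened column list, and the variable names are made up for illustration. It passes the schema as a hashref, the input as a scalar ref, and a custom reader for tab-delimited data.

```perl
use strict;
use warnings;
use Text::CSV_XS;
use Data::Validate::CSV;

# Schema as a plain hashref (assumed to mirror the JSON layout of the synopsis).
my $schema = {
    tableSchema => {
        columns => [
            { name => 'country',   datatype => { base => 'string', length => 2 } },
            { name => 'name',      datatype => 'string' },
            { name => 'latitude',  datatype => { base => 'number', minimum => -90,  maximum => 90 } },
            { name => 'longitude', datatype => { base => 'number', minimum => -180, maximum => 180 } },
        ],
    },
};

# Hypothetical tab-delimited sample data held in memory.
my $tsv = join '', map { join("\t", @$_) . "\n" } (
    [ 'at', 'Austria', '47.6965545', '13.34598005' ],
    [ 'be', 'Belgium', '50.501045',  '4.47667405'  ],
);

# One parser object, reused by the reader coderef for every line.
my $parser = Text::CSV_XS->new({ sep_char => "\t", binary => 1 });

my $table = Data::Validate::CSV::Table->new(
    schema     => $schema,
    input      => \$tsv,    # scalar ref to a string of data
    has_header => !!0,
    reader     => sub { $parser->getline($_[0]) },
);

my @names;
while (my $row = $table->get_row) {
    push @names, $row->get('name')->value;
}
print join(', ', @names), "\n";
```

Closing over a single parser avoids building a new Text::CSV_XS object for each line; any coderef that takes a filehandle and returns one parsed line should fit the same slot.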
The table provides the following methods:
get_row
Returns a row object for the next row of the table.
all_rows
Gets all the rows as a list.
row_count
The number of non-skipped, non-header lines read so far.
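As a rough illustration of these methods together with has_header and skip_rows, the sketch below reads an in-memory CSV that starts with a title line followed by a header row. The sample data and column names are hypothetical, and the counts it prints follow from the descriptions above rather than from any guarantee about the module's internals.

```perl
use strict;
use warnings;
use Data::Validate::CSV;

# Hypothetical sample data: one title line, one header row, two data rows.
my $csv = <<'CSV';
"Country list"
"country","name"
"at","Austria"
"be","Belgium"
CSV

my $table = Data::Validate::CSV::Table->new(
    schema => {
        tableSchema => {
            columns => [
                { name => 'country', datatype => { base => 'string', length => 2 } },
                { name => 'name',    datatype => 'string' },
            ],
        },
    },
    input      => \$csv,
    has_header => !!1,   # the line after the title holds column names
    skip_rows  => 1,     # skip the title line before the header
);

my @rows  = $table->all_rows;    # list of row objects
my $count = $table->row_count;   # data rows read so far

printf "%d rows read, first is %s\n", $count, $rows[0]->get('name')->value;
```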
The rows returned by get_row and all_rows are blessed objects. They provide the following methods:
raw_values
The values returned by Text::CSV_XS without any further processing.
values
The values returned by Text::CSV_XS, processed by datatype. Date and time datatypes will be reformatted from any CLDR-based format to ISO 8601. Booleans using non-standard representations will be changed to "1" and "0". Fields that have a separator defined will be split into an arrayref. Numbers given as percentages will be divided by 100. And so forth.
cells
Returns the same values as values but wrapped in cell objects. The following are equivalent:
$row->values->[0];
$row->cells->[0]->value;
$row->[0];   # $row overloads @{}
Why fetch a cell instead of directly fetching the value? The cell object offers a few other useful methods.
get($name)
Gets a single cell from the row by its name. Names are defined in the schema, or the header row if missing from the schema.
$row->get("country")->value;
row_number
The row number for this row in the table. Rows are numbered starting at 1. Headers and skipped rows are not counted.
key_string
For tables that have a primary key, this returns a string formed by joining together the primary key columns. It ought to be a unique identifier for this row within the table, and if it is not, this will be raised as an error.
errors
An arrayref of strings of errors associated with this row. This includes data validation problems.
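One way to use row_number and errors together is to collect validation problems per row rather than warning as you go. This is a sketch under assumptions: the data is made up, the second row deliberately carries a latitude outside the schema's range, and the exact wording of any error strings is left to the module.

```perl
use strict;
use warnings;
use Data::Validate::CSV;

# Hypothetical data: the second row has a latitude above the schema maximum.
my $csv = <<'CSV';
"at","47.6965545"
"xx","123.4"
CSV

my $table = Data::Validate::CSV::Table->new(
    schema => {
        tableSchema => {
            columns => [
                { name => 'country',  datatype => { base => 'string', length => 2 } },
                { name => 'latitude', datatype => { base => 'number', minimum => -90, maximum => 90 } },
            ],
        },
    },
    input      => \$csv,
    has_header => !!0,
);

# One entry per row: [ row number, copy of that row's error strings ].
my @report;
while (my $row = $table->get_row) {
    push @report, [ $row->row_number, [ @{ $row->errors } ] ];
}

for my $entry (@report) {
    my ($n, $errors) = @$entry;
    print "row $n: ", (@$errors ? join('; ', @$errors) : 'ok'), "\n";
}
```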
It is possible to bypass using the cell interface and access cell values directly from the rows, but if accessing cells, these are the methods they provide:
raw_value
The value returned by Text::CSV_XS without any further processing.
value
The value returned by Text::CSV_XS, processed by datatype.
inflated_value
Like value but inflates some values to blessed objects. Date and time related datatypes will be returned as DateTime, DateTime::Incomplete, or DateTime::Duration objects. Booleans will be returned as JSON::PP::Boolean objects.
row_number
The row number for the cell's parent row in the table. Rows are numbered starting at 1. Headers and skipped rows are not counted.
col_number
The column number of this cell within the parent row. Columns are numbered starting at 1.
datatype
The datatype for this cell as a hashref.
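To show the cell methods side by side, the sketch below walks the cells of a single row and records what each accessor returns. The data and the %-style summary format are hypothetical. The keys inside the datatype hashref are not documented here, so the example only lists whatever keys are present.

```perl
use strict;
use warnings;
use Data::Validate::CSV;

# Hypothetical single-row input.
my $csv = qq{"at","Austria","47.6965545"\n};

my $table = Data::Validate::CSV::Table->new(
    schema => { tableSchema => { columns => [
        { name => 'country',  datatype => { base => 'string', length => 2 } },
        { name => 'name',     datatype => 'string' },
        { name => 'latitude', datatype => { base => 'number', minimum => -90, maximum => 90 } },
    ] } },
    input      => \$csv,
    has_header => !!0,
);

my $row = $table->get_row;

# Collect position, raw and processed values, and datatype keys for each cell.
my @summary;
for my $cell (@{ $row->cells }) {
    push @summary, {
        row   => $cell->row_number,
        col   => $cell->col_number,
        raw   => $cell->raw_value,
        value => $cell->value,
        keys  => [ sort keys %{ $cell->datatype } ],
    };
}

printf "r%d c%d raw=%s value=%s datatype keys=%s\n",
    $_->{row}, $_->{col}, $_->{raw}, $_->{value}, join(',', @{ $_->{keys} })
    for @summary;
```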
Please report any bugs to http://rt.cpan.org/Dist/Display.html?Queue=Data-Validate-CSV.
See also the W3C note "CSV on the Web: A Primer": https://www.w3.org/TR/2016/NOTE-tabular-data-primer-20160225/.
Toby Inkster <tobyink@cpan.org>.
This software is copyright (c) 2019 by Toby Inkster.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
To install Data::Validate::CSV, copy and paste the appropriate command into your terminal.
cpanm
cpanm Data::Validate::CSV
CPAN shell
perl -MCPAN -e shell
install Data::Validate::CSV