NAME
PDL::IO::CSV - Load/save PDL from/to CSV file (optimized for speed and large data)
SYNOPSIS
use PDL;
use PDL::IO::CSV ':all';
my $pdl = rcsv2D('input.csv');
$pdl *= 2;
wcsv2D($pdl, 'double.csv');
my ($pdl1, $pdl2, $pdl3) = rcsv1D('input.csv', [0, 1, 6]);
wcsv1D($pdl1, 'col2.csv');
#or
$pdl2->wcsv1D('col2.csv');
$pdl2->wcsv1D('col2_tabs.csv', {sep_char=>"\t"});
DESCRIPTION
The traditional way of creating PDL piddle from CSV data is via rcols function.
my $pdl = rcols("data.csv", [1..4], { DEFTYPE=>double, COLSEP=>"," });
This module provides alternative implementation based on Text::CSV_XS which should be significantly faster than traditional approach.
PDL::IO::CSV supports reading CSV data and creating PDL piddle(s) as well as saving PDL data to CSV file.
FUNCTIONS
By default, PDL::IO::CSV doesn't import any function. You can import individual functions like this:
use PDL::IO::CSV qw(rcsv2D wcsv2D);
Or import all available functions:
use PDL::IO::CSV ':all';
rcsv1D
Loads data from CSV file into 1D piddles (separate for each column).
my ($pdl1, $pdl2, $pdl3) = rcsv1D($csv_filename_or_filehandle);
#or
my ($pdl1, $pdl2, $pdl3) = rcsv1D($csv_filename_or_filehandle, \@column_ids);
#or
my ($pdl1, $pdl2, $pdl3) = rcsv1D($csv_filename_or_filehandle, \%options);
#or
my ($pdl1, $pdl2, $pdl3) = rcsv1D($csv_filename_or_filehandle, \@column_ids, \%options);
Parameters:
- csv_filename_or_filehandle
-
Path to CSV file to be loaded or a filehandle open for reading.
- column_ids
-
Optional column indices (0-based) defining which columns to load from CSV file. Default is
undef
which means to load all columns.
Items supported in options hash:
- type
-
Defines the type of output piddles:
double
,float
,longlong
,long
,short
,byte
. Default value isdouble
. BEWARE: type `longlong` can be used only on perls with 64bitint support.You can set one type for all columns/piddles:
my ($a, $b, $c) = rcsv1D($csv, {type => double});
or separately for each column/piddle:
my ($a, $b, $c) = rcsv1D($csv, {type => [long, double, double]});
- fetch_chunk
-
We do not try to load all CSV data into memory at once; we load them in chunks defined by this parameter. Default value is
40000
(CSV rows). - reshape_inc
-
As we do not try to load the whole CSV file into memory at once, we also do not know at the beginning how many rows there will be. Therefore we do not know how big piddle to allocate, we have to incrementally (re)allocated the piddle by increments defined by this parameter. Default value is
80000
.If you know how many rows there will be you can improve performance by setting this parameter to expected row count.
- empty2bad
-
Values
0
(default) or1
- convert empty cells to BAD values (there is a performance cost when turned on). If not enabled the BAD values are silently converted into0
. - text2bad
-
Values
0
(default) or1
- convert values that don't pass looks_like_number check to BAD values (there is a significant performance cost when turned on). If not enabled these non-numerical values are silently converted into0
. - header
-
Values
0
(default) or1
- consider the first line as column headers (it is ignored but must have the same count of columns as the rest of the CSV file). - decimal_comma
-
Values
0
(default) or1
- accept,
(comma) as a decimal separator (there is a performance cost when turned on). - encoding
-
Optional enconding e.g.
:utf8
(defaultundef
) that will be applied on input filehandle. - debug
-
Values
0
(default) or1
- turn on/off debug messages - sep_char
-
Value separator, default value
,
(comma). - and all other options valid for new method of Text::CSV_XS
rcsv2D
Loads data from CSV file into 2D piddle.
my $pdl = rcsv2D($csv_filename_or_filehandle);
#or
my $pdl = rcsv2D($csv_filename_or_filehandle, \@column_ids);
#or
my $pdl = rcsv2D($csv_filename_or_filehandle, \%options);
#or
my $pdl = rcsv2D($csv_filename_or_filehandle, \@column_ids, \%options);
Parameters and items supported in options
hash are the same as by "rcsv1D".
wcsv1D
Saves data from one or more 1D piddles to CSV file.
wcsv1D($pdl1, $pdl2, $pdl3, $csv_filename_or_filehandle, \%options);
#or
wcsv1D($pdl1, $pdl2, $pdl3, $csv_filename_or_filehandle);
#or
wcsv1D($pdl1, $pdl2, \%options); #prints to STDOUT
#or
wcsv1D($pdl1, $pdl2);
# but also as a piddle method
$pdl1D->wcsv1D("file.csv");
Parameters:
- piddles
-
One or more 1D piddles. All has to be 1D but may have different count of elements.
- csv_filename_or_filehandle
-
Path to CSV file to write to or a filehandle open for writing. Default is STDOUT.
Items supported in options hash:
- header
-
Arrayref with values that will be printed as the first CSV line.
- bad2empty
-
Values
0
or1
(default) - convert BAD values into empty strings (there is a performance cost when turned on). - encoding
-
Optional enconding e.g.
:utf8
(defaultundef
) that will be applied on output filehandle. - debug
-
Values
0
(default) or1
- turn on/off debug messages - sep_char
-
Value separator, default value
,
(comma). - eol
-
New line separator, default value
\n
(UNIX newline). - and all other options valid for new method of Text::CSV_XS
wcsv2D
Saves data from one 2D piddle to CSV file.
wcsv2D($pdl, $csv_filename_or_filehandle, \%options);
#or
wcsv2D($pdl, $csv_filename_or_filehandle);
#or
wcsv2D($pdl, \%options); #prints to STDOUT
#or
wcsv2D($pdl);
# but also as a piddle method
$pdl->wcsv2D("file.csv");
Parameters and items supported in options
hash are the same as by "wcsv1D".
SEE ALSO
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
COPYRIGHT
2014+ KMX <kmx@cpan.org>