DBIx::AutoUpgrade::NativeStrings - automatically upgrade Perl native strings to utf8 before sending them to the database
    use utf8;
    use DBI;
    use DBIx::AutoUpgrade::NativeStrings;
    use Encode;

    my $injector = DBIx::AutoUpgrade::NativeStrings->new(native => 'cp1252');
    my $dbh = DBI->connect(@dbi_connection_params);
    $injector->inject_callbacks($dbh);

    # these strings are semantically equal, but have different internal representations
    my $str_utf8   = "il était une bergère, elle vendait ses œufs en ¥, ça paie 5¾ ‰ de mieux qu’en €";
    my $str_native = encode('cp1252', $str_utf8, Encode::LEAVE_SRC);

    # Oracle example : check if strings passed to the database are equal
    my $sql = "SELECT CASE WHEN ?=? THEN 'EQ' ELSE 'NE' END FROM DUAL";
    my ($result) = $dbh->selectrow_array($sql, {}, $str_native, $str_utf8); # returns 'EQ'
This module intercepts calls to DBI methods in order to automatically convert Perl native strings to utf8 strings before they reach the DBD driver.
There are two situations where it is useful :
Some DBD drivers do not comply with this requirement of the DBI specification:
Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.
For example, with DBD::Oracle v1.83 and a client charset set to AL32UTF8, native strings containing characters in the range 128 .. 255 are not converted to utf8 strings; characters in that range therefore become Unicode code points in the C1 control codes block, which have no graphical display and do not carry the intended meaning.
Drivers that do attempt to comply with the DBI specification, such as DBD::SQLite or DBD::Pg, perform an automatic upgrade of native strings ... assuming that the native character set is iso-8859-1 (Latin-1). However, some platforms have different native character sets; in particular, the default "codepage" on Windows machines is Windows-1252, where code points in the range 128-159 are mapped to various graphical characters. So if your native strings assume Windows-1252 encoding, such characters will not be stored correctly within the database server.
With the present module, clients explicitly specify the native encoding at initialization time. Based on that setting, the module automatically converts native strings to their proper Unicode counterparts before sending them to the database.
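The difference between the two assumptions can be seen on a single byte. The following is a minimal sketch using only core modules; it does not call the present module, and the variable names are chosen for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte 0x80 is the euro sign in Windows-1252, but a C1 control code in Latin-1.
my $native = "\x80";

# Default Perl upgrade: the byte is interpreted as Latin-1.
my $as_latin1 = $native;
utf8::upgrade($as_latin1);

# Explicit decoding: the byte is interpreted as Windows-1252.
my $as_cp1252 = decode('cp1252', $native);

printf "latin-1 assumption: U+%04X\n", ord $as_latin1;   # U+0080
printf "cp1252 assumption:  U+%04X\n", ord $as_cp1252;   # U+20AC
```

Only the second result is the character that a Windows-1252 client intended to store.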
Of course this only makes sense when the connection to the database is in Unicode mode. Each DBD driver has its own specific way of setting the character set used for the connection; so be sure to properly tune your DBD driver when using the present module.
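As an illustration, here are connection attributes commonly used to switch a few drivers to Unicode mode. This is a hedged sketch, not an exhaustive or authoritative list: the hash name %unicode_attr_for is hypothetical, and attribute names can change between driver versions, so check the documentation of your DBD driver.

```perl
use strict;
use warnings;

# Driver-specific attributes for the 4th argument of DBI->connect.
# Assumption: attribute names as documented by each driver at the time of writing.
my %unicode_attr_for = (
    SQLite => { sqlite_unicode       => 1 },
    Pg     => { pg_enable_utf8       => 1 },
    mysql  => { mysql_enable_utf8mb4 => 1 },
);

# DBD::Oracle takes its client character set from the environment instead,
# for example NLS_LANG=AMERICAN_AMERICA.AL32UTF8.

# Hypothetical usage:
#   my $dbh = DBI->connect($dsn, $user, $password, $unicode_attr_for{Pg});
```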
my $injector = DBIx::AutoUpgrade::NativeStrings->new(%options);
Constructor for a callback injector object. Options are :
The name of the native encoding, passed as native => $encoding. This should be one of:
a valid Perl encoding name, as listed in Encode::Encodings. Strings will be converted through "decode" in Encode;
the string 'locale', which will invoke Encode::Locale to guess the native encoding automatically;
the string 'default', which will use the default Perl upgrading mechanism through "utf8::upgrade" in utf8. This is the default value. It works well for latin-1 (iso-8859-1), but not for other native encodings.
A bitmask passed as the third argument to "decode" in Encode (see "List of CHECK values" in Encode). Default is undef.
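The three possible kinds of native encoding, together with the CHECK bitmask, could be turned into a conversion routine along the following lines. This is only a sketch under stated assumptions: make_upgrader is a hypothetical helper, not part of this module's API, and the actual implementation may differ.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: build an in-place converter for a given native setting.
sub make_upgrader {
    my ($native, $check) = @_;
    $native //= 'default';

    # 'default': Perl's own upgrade, which assumes Latin-1.
    if ($native eq 'default') {
        return sub { utf8::upgrade($_[0]) if defined $_[0]; return };
    }

    # 'locale': Encode::Locale registers an encoding alias named 'locale'.
    require Encode::Locale if $native eq 'locale';

    # Any other value is handed to Encode as an encoding name.
    return sub {
        $_[0] = decode($native, $_[0], $check)
            if defined $_[0] && !utf8::is_utf8($_[0]);
        return;
    };
}

my $upgrade = make_upgrader('cp1252');
my $str     = "\x9C";              # 'œ' in Windows-1252
$upgrade->($str);
printf "U+%04X\n", ord $str;       # U+0153
```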
An optional coderef that will be called as $debug->($message). Default is undef. A simple debug coderef could be :
my $injector = DBIx::AutoUpgrade::NativeStrings->new(debug => sub {warn @_, "\n"});
An optional arrayref containing the list of $dbh method names that will receive a callback. The default list is :
do prepare selectrow_array selectrow_arrayref selectrow_hashref selectall_arrayref selectall_array selectall_hashref selectcol_arrayref
An optional arrayref containing the list of $sth method names that will receive a callback. The default list is :
bind_param bind_param_array execute execute_array
An optional coderef that decides what to do with calls to the ternary form of "bind_param" in DBI, i.e.
$sth->bind_param($position, $value, $bind_type);
If $coderef->($bind_type) returns true, the $value is treated as a string and will be upgraded if needed, like arguments to other method calls; if the coderef returns false, the $value is left intact.
The default coderef returns true when the $bind_type is one of the DBI constants SQL_CHAR, SQL_VARCHAR, SQL_LONGVARCHAR, SQL_WLONGVARCHAR, SQL_WVARCHAR, SQL_WCHAR or SQL_CLOB.
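A predicate equivalent to the default behaviour described above might look as follows. This is a sketch, assuming the DBI module is installed; the names %is_string_type and $is_string are hypothetical.

```perl
use strict;
use warnings;
use DBI qw(:sql_types);

# Bind types that are treated as character data
my %is_string_type = map { $_ => 1 } (
    SQL_CHAR,  SQL_VARCHAR,  SQL_LONGVARCHAR,
    SQL_WCHAR, SQL_WVARCHAR, SQL_WLONGVARCHAR,
    SQL_CLOB,
);

# Returns 1 if a value bound with this type should be upgraded, 0 otherwise
my $is_string = sub {
    my ($bind_type) = @_;
    return defined $bind_type && $is_string_type{$bind_type} ? 1 : 0;
};

print $is_string->(SQL_VARCHAR), "\n";   # string type: value gets upgraded
print $is_string->(SQL_INTEGER), "\n";   # numeric type: value is left intact
```

A custom coderef of the same shape can be supplied when other bind types, for instance driver-specific ones, should also be treated as strings.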
$injector->inject_callbacks($dbh);
Injects callbacks into the given database handle. If that handle already has callbacks for the same methods, the system will arrange for those other callbacks to be called after all string arguments have been upgraded to utf8.
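The chaining of callbacks can be pictured as follows. This sketch does not use a real database handle: chain_callbacks and the two stand-in coderefs are hypothetical names, and the module's internals may be organised differently.

```perl
use strict;
use warnings;

# Hypothetical helper: run the upgrade first, then the pre-existing callback.
sub chain_callbacks {
    my ($upgrade, $existing) = @_;
    return $upgrade unless $existing;
    return sub {
        $upgrade->(@_);            # @_ aliases the caller's arguments
        return $existing->(@_);    # the earlier callback sees upgraded strings
    };
}

# Stand-in for an upgrade callback: skips the handle in $_[0]
my $upgrade = sub {
    for my $arg (@_[1 .. $#_]) {
        utf8::upgrade($arg) if defined $arg && !ref $arg;
    }
    return;
};

# Stand-in for a callback that was installed before the injection
my @seen;
my $existing = sub {
    push @seen, utf8::is_utf8($_[1]) ? 'upgraded' : 'native';
    return;
};

my $combined = chain_callbacks($upgrade, $existing);
my $sql      = "caf\xE9";
$combined->('fake_dbh', $sql);
print "@seen\n";                   # upgraded
```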
Although I'm a big fan of Moose and its variants, the present module is implemented as a POPO (Plain Old Perl Object): since the object model is extremely simple, there was no reason to use a sophisticated object system.
String arguments to DBI methods are modified in-place. It is unlikely that this would affect your client program, but if it does, you need to make your own string copies before passing them to the DBI methods.
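The effect of in-place modification, and the defensive copy that avoids it, can be sketched like this. The sub fake_dbi_method is a hypothetical stand-in for a DBI method with an injected callback; no database is involved.

```perl
use strict;
use warnings;

# Stand-in for a DBI method whose callback upgrades its arguments in place
sub fake_dbi_method { utf8::upgrade($_[1]); return }

my $original = "caf\xE9";      # native string, utf8 flag off
my $copy     = $original;      # defensive copy
fake_dbi_method('fake_dbh', $copy);

printf "original flagged: %d, copy flagged: %d\n",
    (utf8::is_utf8($original) ? 1 : 0),
    (utf8::is_utf8($copy)     ? 1 : 0);
```

Only the copy has changed its internal representation; both strings still compare equal with eq.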
DBI does not precisely document which of its public methods call each other. For example, one would think that execute() internally calls bind_param(), but this does not seem to be the case. So, to be on the safe side, callbacks installed here make no assumptions about string transformations performed by other callbacks. There might be some redundancies, but it does no harm since strings are never upgraded twice.
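One way to make redundant callbacks harmless is to convert only those strings that are still native. The sketch below assumes a guard based on utf8::is_utf8; upgrade_once is a hypothetical name, and the module may implement this check differently.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical guard: decode only strings that do not carry the utf8 flag yet
sub upgrade_once {
    $_[0] = decode('cp1252', $_[0])
        if defined $_[0] && !utf8::is_utf8($_[0]);
    return;
}

my $str = "\x80";              # euro sign in Windows-1252
upgrade_once($str);            # becomes U+20AC
upgrade_once($str);            # no-op: the utf8 flag is already set
printf "U+%04X\n", ord $str;   # U+20AC
```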
The bind_param_inout() method is not covered -- the client program must do the proper updates if that method is used to send strings to the database.
Laurent Dami, <dami at cpan.org>
Copyright 2023 by Laurent Dami.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.