NAME
DBIx::Oracle::UpgradeUtf8 - automatically upgrade Perl strings to utf8 before sending them to DBD::Oracle
SYNOPSIS
use DBI;
use DBIx::Oracle::UpgradeUtf8;
my $injector = DBIx::Oracle::UpgradeUtf8->new;
my $dbh = DBI->connect(@oracle_connection_params); # see DBD::Oracle for details
$injector->inject_callbacks($dbh);
# these strings are semantically equal, but have different internal representations
my $str = "il était une bergère";
my $str_native = $str; utf8::downgrade($str_native);
my $str_utf8 = $str; utf8::upgrade($str_utf8);
# Check if strings passed to Oracle are equal
my $sql = "SELECT CASE WHEN ?=? THEN 'EQ' ELSE 'NE' END FROM DUAL";
my ($result) = $dbh->selectrow_array($sql, {}, $str_native, $str_utf8); # returns 'EQ'
DESCRIPTION
This module is a workaround for a deficiency in DBD::Oracle. As of v1.83, the driver does not comply with this specification in the DBI documentation:
Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.
DBD drivers like DBD::SQLite and DBD::Pg comply with the specification: non-Unicode strings in Perl programs are correctly encoded into utf8 before being passed to the database. By contrast, DBD::Oracle behaves as follows when the client character set is Unicode (as set through the NLS_LANG environment variable):
strings coming from the database are properly flagged as utf8 for Perl;
Perl Unicode strings are properly sent to the database;
Perl non-Unicode strings (i.e. without the utf8 flag) are not encoded into utf8 before being sent to the database. As a result, characters in the range 128-255 in native strings are not handled properly on the server side.
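The third point is easier to see by looking at the bytes Perl stores internally. The following is a minimal sketch using only core functions; the helper name internal_bytes is hypothetical and not part of this module. It shows that the same character occupies one byte in a native string and two bytes in an upgraded one, and that the native byte on its own is not well-formed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): hex dump of the bytes
# that Perl stores internally for a string.
sub internal_bytes {
    my ($str) = @_;
    my $buf = $str;
    utf8::encode($buf) if utf8::is_utf8($str);   # expose the UTF-8 buffer as bytes
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buf;
}

my $native = "\xE9";            # e-acute (U+00E9) as a native string
utf8::downgrade($native);       # make sure the utf8 flag is off
my $upgraded = $native;
utf8::upgrade($upgraded);       # same character, utf8 flag on

print "same for Perl:  ", ($native eq $upgraded ? "yes" : "no"), "\n";
print "native bytes:   ", internal_bytes($native), "\n";
print "upgraded bytes: ", internal_bytes($upgraded), "\n";

# The single native byte is not well-formed UTF-8 on its own.
my $probe = $native;
my $is_valid_utf8 = utf8::decode($probe);
print "native bytes valid as UTF-8: ", ($is_valid_utf8 ? "yes" : "no"), "\n";
```

A receiver that expects UTF-8 and is handed the native buffer unchanged therefore cannot interpret it as the intended character.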
This problem has been reported in a GitHub issue and in a StackOverflow question. It is not clear when (if ever) it will be fixed.
The present module implements a workaround based on the callbacks facility in DBI's architecture: callbacks intercept method calls at the DBI level and force all string arguments to be in utf8 before passing them to DBD::Oracle.
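The upgrade step performed by such a callback can be sketched as follows. This is an illustration under assumptions, not this module's actual code: the function name upgrade_string_args, the stand-in handle and the table name in the SQL are all hypothetical. It relies only on the fact that a DBI callback receives the same arguments as the intercepted method, with the handle first.

```perl
use strict;
use warnings;

# Hypothetical callback body (an assumption, not this module's actual code).
sub upgrade_string_args {
    for my $arg (@_) {                       # elements of @_ alias the caller's variables
        next if !defined $arg or ref $arg;   # leave undef, handles and attribute hashes alone
        utf8::upgrade($arg);                 # in-place; no-op if already upgraded
    }
    return;
}

# Simulated call: a stand-in for the handle, then SQL, attributes and a bind value.
my $fake_dbh = {};
my $sql      = "SELECT * FROM songs WHERE title = ?";   # hypothetical table
my %attrs;
my $title    = "berg\xE8re";
utf8::downgrade($title);                     # start from a native string

upgrade_string_args($fake_dbh, $sql, \%attrs, $title);
print "title flagged as utf8: ", (utf8::is_utf8($title) ? "yes" : "no"), "\n";
```

Because the loop works on aliases, the method that runs after the callback sees the upgraded strings without any copying.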
Actually this module could also be used with other DBD drivers; in spite of the module's name, there is nothing in the code that is specifically bound to Oracle. I do not know if other Perl DBD drivers suffer from the same deficiency.
METHODS
new
my $injector = DBIx::Oracle::UpgradeUtf8->new(%options);
Constructor for a callback injector object. Options are:
- debug
An optional coderef that will be called as $debug->($message). Default is undef. A simple debug coderef could be:
my $injector = DBIx::Oracle::UpgradeUtf8->new(debug => sub {warn @_, "\n"});
- dbh_methods
An optional arrayref containing the list of $dbh method names that will receive a callback. The default list is:
do prepare selectrow_array selectrow_arrayref selectrow_hashref selectall_arrayref selectall_array selectall_hashref selectcol_arrayref
- sth_methods
An optional arrayref containing the list of $sth method names that will receive a callback. The default list is:
bind_param bind_param_array execute execute_array
inject_callbacks
$injector->inject_callbacks($dbh);
Injects callbacks into the given database handle. If that handle already has callbacks for the same methods, the system will arrange for those other callbacks to be called after all string arguments have been upgraded to utf8.
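The ordering described here (upgrade first, then hand over to a callback that was already installed) can be sketched with a closure. This is only an illustration of the idea; the helper name chain_after_upgrade is hypothetical and the module's real implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: build a callback that upgrades its string arguments,
# then calls a previously installed callback, if there is one.
sub chain_after_upgrade {
    my ($previous_cb) = @_;
    return sub {
        for my $arg (@_) {
            utf8::upgrade($arg) if defined $arg and !ref $arg;
        }
        return $previous_cb->(@_) if $previous_cb;
        return;
    };
}

# A pre-existing callback that records how it sees its first method argument.
my @seen;
my $existing = sub {
    push @seen, utf8::is_utf8($_[1]) ? 'utf8' : 'native';
    return;
};

my $combined = chain_after_upgrade($existing);
my $value    = "berg\xE8re";
utf8::downgrade($value);          # start from a native string
$combined->({}, $value);          # {} stands in for the database handle

print "existing callback saw: @seen\n";
```

With this arrangement the older callback always works on upgraded strings, which matches the behaviour described above.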
ARCHITECTURAL NOTES
Object-orientedness
Although I'm a big fan of Moose and its variants, the present module is implemented in POPO (Plain Old Perl Object): since the object model is extremely simple, there was no reason to use a sophisticated object system.
Strings are modified in-place
String arguments to DBI methods are modified through utf8::upgrade(), which modifies strings in-place. It is very unlikely that this would affect your client program, but if it does, you need to make your own string copies before passing them to the DBI methods.
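For client code that does depend on the internal representation (for instance because it inspects the utf8 flag), a copy keeps the original variable untouched. A minimal sketch, with hypothetical variable names and the callback's effect simulated by a direct call to utf8::upgrade():

```perl
use strict;
use warnings;

my $mine = "\xE9t\xE9";        # a native string with two non-ASCII characters
utf8::downgrade($mine);        # make sure the utf8 flag is off

my $for_dbi = $mine;           # independent copy to hand over to DBI
utf8::upgrade($for_dbi);       # what a callback would do to the argument

print "copy flagged:     ", (utf8::is_utf8($for_dbi) ? "yes" : "no"), "\n";
print "original flagged: ", (utf8::is_utf8($mine)    ? "yes" : "no"), "\n";
print "still equal:      ", ($mine eq $for_dbi       ? "yes" : "no"), "\n";
```

Only the copy changes its internal representation; both variables still compare equal as strings.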
Possible redundancies
DBI does not precisely document which of its public methods call each other. For example, one would think that execute() internally calls bind_param(), but this does not seem to be the case. So, to be on the safe side, callbacks installed here make no assumptions about string transformations performed by other callbacks. There might be some redundancies, but it does no harm since strings are never upgraded twice.
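The claim that redundant calls are harmless follows from utf8::upgrade() being idempotent. The short sketch below (variable names are illustrative) calls it twice on the same string and compares the return values, which the utf8 documentation describes as the number of octets needed to represent the string as UTF-8.

```perl
use strict;
use warnings;

my $str = "berg\xE8re";
utf8::downgrade($str);                     # start from a native string

my $octets_first  = utf8::upgrade($str);   # converts the internal buffer
my $octets_second = utf8::upgrade($str);   # already upgraded: nothing to do

print "octets after first call:  $octets_first\n";
print "octets after second call: $octets_second\n";
print "content unchanged: ", ($str eq "berg\xE8re" ? "yes" : "no"), "\n";
```

This differs from an explicit encoding step, which would alter the string again if applied a second time.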
Caveats
The bind_param_inout() method is not covered: if that method is used to send strings to the database, the client program must upgrade those strings itself.
AUTHOR
Laurent Dami, <dami at cpan.org>
COPYRIGHT AND LICENSE
Copyright 2023 by Laurent Dami.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.