
NAME

DBIx::Oracle::UpgradeUtf8 - automatically upgrade Perl strings to utf8 before sending them to DBD::Oracle

SYNOPSIS

  use DBI;
  use DBIx::Oracle::UpgradeUtf8;
  
  my $injector = DBIx::Oracle::UpgradeUtf8->new;
  my $dbh = DBI->connect(@oracle_connection_params); # see L<DBD::Oracle> for details
  $injector->inject_callbacks($dbh);
  
  # these strings are semantically equal, but have different internal representations
  my $str        = "il était une bergère";
  my $str_native = $str; utf8::downgrade($str_native);
  my $str_utf8   = $str; utf8::upgrade($str_utf8);
  
  # Check if strings passed to Oracle are equal
  my $sql = "SELECT CASE WHEN ?=? THEN 'EQ' ELSE 'NE' END FROM DUAL";
  my ($result) = $dbh->selectrow_array($sql, {}, $str_native, $str_utf8); # returns 'EQ'

DESCRIPTION

This module is a workaround for a deficiency in DBD::Oracle. As of v1.83, the driver does not comply with the following specification from the DBI documentation:

    Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.

Drivers like DBD::SQLite and DBD::Pg comply with the specification: non-Unicode strings in Perl programs are correctly encoded into utf8 before being passed to the database. By contrast, DBD::Oracle behaves as follows when the client character set is Unicode (as set through the NLS_LANG environment variable):

  • strings coming from the database are properly flagged as utf8 for Perl;

  • Perl Unicode strings are properly sent to the database;

  • Perl non-Unicode strings (i.e. without the utf8 flag) are not encoded into utf8 before being sent to the database. As a result, characters in the range 128-255 in native strings are not interpreted correctly on the server side.
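
The two kinds of strings described above can be observed in plain Perl, without any database. The following is a minimal sketch using only core functions; the variable names are illustrative and are not taken from the module.

```perl
use strict;
use warnings;

# "été" written with escapes, so the example does not depend on the source encoding
my $native = "\xE9t\xE9";
my $utf8   = $native;

utf8::downgrade($native);   # internal representation: one byte per character
utf8::upgrade($utf8);       # internal representation: UTF-8, utf8 flag turned on

# As Perl strings (sequences of characters), both are equal
my $same_chars  = $native eq $utf8 ? 1 : 0;
my $flag_native = utf8::is_utf8($native) ? 1 : 0;
my $flag_utf8   = utf8::is_utf8($utf8)   ? 1 : 0;

# Number of bytes in each internal buffer
my ($bytes_native, $bytes_utf8) = do {
    use bytes;   # lexically scoped: length() counts bytes inside this block only
    (length($native), length($utf8));
};

print "same characters: $same_chars\n";
print "utf8 flag: native=$flag_native, upgraded=$flag_utf8\n";
print "internal bytes: native=$bytes_native, upgraded=$bytes_utf8\n";
```

A driver that looks only at the internal buffer and ignores the flag sees two different byte sequences for what Perl considers the same string.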

This problem has been reported in a GitHub issue and in a StackOverflow question. It is not clear when (if ever) it will be fixed.

The present module implements a workaround that relies on the callbacks facility in DBI's architecture: callbacks intercept method calls at the DBI level and force all string arguments to be in utf8 before they are passed to DBD::Oracle.
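
To illustrate the idea, here is a minimal sketch of what such a callback could look like. It is not the module's actual code: the name $upgrade_args and the stand-in handle $fake_dbh are hypothetical, and the callback is invoked by hand so that the example runs without a database.

```perl
use strict;
use warnings;

# Hypothetical callback body: upgrade every plain string argument in place.
# DBI invokes a callback with the same @_ as the intercepted method, so
# $_[0] is the handle and the other elements alias the caller's arguments.
my $upgrade_args = sub {
    for my $i (1 .. $#_) {
        next if !defined $_[$i] || ref $_[$i];   # skip undef, attribute hashrefs, etc.
        utf8::upgrade($_[$i]);
    }
    return;   # return nothing; DBI then goes on to call the real method
};

# Simulated call: a stand-in handle, an SQL string, no attributes, one bind value
my $fake_dbh = {};
my $sql      = "SELECT 1 FROM DUAL WHERE ? = ?";
my $param    = "berg\xE8re";
utf8::downgrade($param);

$upgrade_args->($fake_dbh, $sql, undef, $param);

my $param_flag = utf8::is_utf8($param) ? 1 : 0;
print "parameter upgraded: $param_flag\n";
```

With a real handle, DBI's Callbacks attribute maps method names to coderefs, so a coderef like this one could be registered as $dbh->{Callbacks}{selectrow_array} = $upgrade_args.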

This module could also be used with other DBD drivers: in spite of its name, nothing in the code is specifically bound to Oracle. I do not know whether other Perl DBD drivers suffer from the same deficiency.

METHODS

new

  my $injector = DBIx::Oracle::UpgradeUtf8->new(%options);

Constructor for a callback injector object. Options are:

debug

An optional coderef that will be called as $debug->($message). Default is undef. A simple debug coderef could be:

  my $injector = DBIx::Oracle::UpgradeUtf8->new(debug => sub {warn @_, "\n"});

dbh_methods

An optional arrayref containing the list of $dbh method names that will receive a callback. The default list is:

  do
  prepare
  selectrow_array
  selectrow_arrayref
  selectrow_hashref
  selectall_arrayref
  selectall_array
  selectall_hashref
  selectcol_arrayref

sth_methods

An optional arrayref containing the list of $sth method names that will receive a callback. The default list is:

  bind_param
  bind_param_array
  execute
  execute_array

inject_callbacks

  $injector->inject_callbacks($dbh);

Injects callbacks into the given database handle. If that handle already has callbacks for the same methods, the injector arranges for those other callbacks to be called after all string arguments have been upgraded to utf8.
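
The chaining behaviour described above can be sketched as follows. The helper chain_callbacks and the two coderefs are hypothetical names chosen for this example; the module's real implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a pre-existing callback so that it runs
# only after the string arguments have been upgraded.
sub chain_callbacks {
    my ($upgrade, $previous) = @_;
    return $upgrade if !$previous;
    return sub {
        $upgrade->(@_);           # @_ is passed through, so aliases are preserved
        return $previous->(@_);   # the earlier callback sees upgraded strings
    };
}

# Upgrade step: every defined, non-reference argument after the handle
my $upgrade = sub {
    for my $arg (@_[1 .. $#_]) {
        utf8::upgrade($arg) if defined $arg && !ref $arg;
    }
    return;
};

# A callback that was already on the handle; it records the flag it observes
my @seen;
my $previous = sub { push @seen, utf8::is_utf8($_[1]) ? 1 : 0; return };

my $str = "\xE9";
utf8::downgrade($str);

my $combined = chain_callbacks($upgrade, $previous);
$combined->({}, $str);   # {} stands in for a database handle

print "previous callback saw utf8 flag: $seen[0]\n";
```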

ARCHITECTURAL NOTES

Object-orientedness

Although I'm a big fan of Moose and its variants, the present module is implemented as a POPO (Plain Old Perl Object): since the object model is extremely simple, there was no reason to use a sophisticated object system.

Strings are modified in-place

String arguments to DBI methods are modified through utf8::upgrade(), which modifies strings in-place. It is very unlikely that this would affect your client program, but if it does, you need to make your own string copies before passing them to the DBI methods.
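
If the original representation must be preserved, copying the string before the call is enough. The sketch below uses a hypothetical fake_dbi_call() as a stand-in for a DBI method whose callback upgrades its arguments in place.

```perl
use strict;
use warnings;

my $original = "\xE9t\xE9";
utf8::downgrade($original);

# Stand-in for a DBI call whose callback upgrades its arguments in place
sub fake_dbi_call { utf8::upgrade($_) for @_; return }

my $copy = $original;    # a plain assignment makes an independent copy
fake_dbi_call($copy);    # only the copy is upgraded

my $original_flag = utf8::is_utf8($original) ? 1 : 0;
my $copy_flag     = utf8::is_utf8($copy)     ? 1 : 0;
print "original flag=$original_flag, copy flag=$copy_flag\n";
```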

Possible redundancies

DBI does not precisely document which of its public methods call each other. For example, one would think that execute() internally calls bind_param(), but this does not seem to be the case. So, to be on the safe side, callbacks installed here make no assumptions about string transformations performed by other callbacks. Some calls to utf8::upgrade() may therefore be redundant, but this does no harm: upgrading a string that is already in utf8 leaves it unchanged.
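
That a redundant upgrade is harmless can be checked with a short core-Perl sketch; the variable names are illustrative.

```perl
use strict;
use warnings;
use bytes ();   # loaded only for bytes::length(); byte semantics are not enabled

my $str = "\xE9t\xE9";
utf8::downgrade($str);

utf8::upgrade($str);
my $after_first = bytes::length($str);    # size of the internal UTF-8 buffer

utf8::upgrade($str);                      # no-op: the string is already utf8
my $after_second = bytes::length($str);

print "bytes after first upgrade: $after_first, after second: $after_second\n";
```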

Caveats

The bind_param_inout() method is not covered -- if that method is used to send strings to the database, the client program must upgrade those strings to utf8 itself.

AUTHOR

Laurent Dami, <dami at cpan.org>

COPYRIGHT AND LICENSE

Copyright 2023 by Laurent Dami.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.