The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Net::Wire10 - Pure Perl driver for MySQL, Sphinx and Drizzle.

DESCRIPTION

Net::Wire10 is a Pure Perl connector that talks to MySQL, Sphinx and Drizzle servers.

Net::Wire10 implements the low-level network protocol, alias the MySQL wire protocol version 10, necessary for talking to one of the aforementioned servers without using an external client library such as for example libmysqlclient or libdrizzle.

SYNOPSIS

  use Net::Wire10;

  my $wire = Net::Wire10->new(
      host     => 'localhost',
      user     => 'test',
      password => 'test',
      database => 'test'
  );

  $wire->connect;

  # CREATE TABLE example
  $wire->query(
      "CREATE TABLE foo (id INT, message TEXT)"
  );

  # INSERT example
  $wire->query(
      "INSERT INTO foo (id, message) VALUES (1, 'Hello World')"
  );
  printf "Affected rows: %d\n", $wire->get_no_of_affected_rows;

  # SELECT example
  my $results = $wire->query("SELECT * FROM foo");
  while (my $row = $results->next_hash) {
    printf
      "Id: %s, Message: %s\n",
      $row->{id},
      $row->{message}
        ;
  }

  $wire->disconnect;

INSTALLATION

Net::Wire10 is installed like any other CPAN perl module:

  $ perl -MCPANPLUS -eshell
  CPAN Terminal> install Net::Wire10

For Perl installations where the CPAN module (used above) is missing, you can also just download the .tar.gz from this site and drop the Net folder in the same folder as the Perl file you want to use the driver from.

Some (particularly commercial) Perl distributions may have their own package management systems. Refer to the documentation that comes with your particular Perl distribution for details.

USAGE

From Perl you can begin using the driver like this:

  use Net::Wire10;

After that you can connect to a server and start interacting.

Take a look at SYNOPSIS for a complete code example, or individual examples below.

Example: connect

Connection parameters are specified in the constructor to Net::Wire10. After constructing an instance using new(), call connect() to begin talking to the server.

  $wire = Net::Wire10->new(
    host     => $host,
    user     => $user,
    password => $password,
  );

  $wire->connect();

The most frequently used parameters are shown above. For additional parameters, refer to new.

Example: create table

  $wire->query(
    "CREATE TABLE foo (id INT, message TEXT)"
  );

Example: insert data

  $wire->query(
    "INSERT INTO foo (id, message) VALUES (1, 'Hello World')"
  );

Example: retrieve data

  my $results = $wire->query(
    "SELECT id, message FROM foo"
  );

Data is traversed in a row-by-row fashion. To retrieve each row of data, call next_hash() or next_array():

  my $row = $results->next_hash;

Do so in a loop to process all rows:

  while (my $row = $results->next_hash) {
    printf
      "Id: %d, Message: %s",
      $row->{id},
      $row->{message};
  }

Example: disconnect

  $wire->disconnect;

FEATURES

Features in the Net::Wire10 driver

new

Creates a new driver instance using the specified parameters.

  use Net::Wire10;
  use strict;
  use warnings;

  # Connect to Sphinx server on localhost
  my $wire = Net::Wire10->new(
      host       => 'localhost',
      port       => 9306,
      user       => 'test',
      password   => 'test',
    );

The argument hash can contain the following parameters:

host

Host name or IP address of the server to which a connection is desired.

port

TCP port where the server daemon listens. The port differs depending on the server type, so that you can have more than one type of server installed on the same machine.

  MySQL uses port 3306,
  Sphinx uses port 9306,
  Drizzle uses port 4427.

The default is 3306 (MySQL), which is the most commonly used at the moment.

Sphinx needs to be at least version 0.9.9-rc2 for SphinxQL to work. There's more information about SphinxQL here: http://sphinxsearch.com/docs/current.html#sphinxql.

database

Name of the initial default database, eg the database used for a query when that query does not specifically mention a database. The server may deny login if a database is given and the user does not have the necessary privileges to access that database.

user

Username for identifying the service or user to the database server. This will show up in the process list, which is useful if you need to see what or who is hogging resources on the server.

password

Password for authenticating the service or user to the database server.

connect_timeout

How long to wait before a connection attempt fails.

command_timeout

How long to wait before a query is aborted.

debug

Various informational messages will be printed to the console if a value of 1 is given. The exchanged network protocol messages ("mackets") will be printed to the console if a value of 2 is given. The exchanged TCP packets will be printed to the console if a value of 4 is given.

connect

Establishes or re-establishes a connection to the server.

ping

Pings the daemon on the connected server over the wire protocol.

After connecting, a ping() is useful to ensure that the connection is still alive. The current status of the connection can be looked up using is_connected().

query

The query() method transmits the specified SQL string to the server and obtains the response, including the full set of results.

A result set is returned. The obtained result set will be disconnected from the driver. Being disconnected means that after retrieving the result set, you can fire more queries or close the connection.

Use query() when a small amount of records is needed for use at an arbitrary later point in time. If you want to stream data to a live result set, including large amounts of data, see stream().

stream

The stream() method transmits the specified SQL string to the server, and obtains initial information about the response, but does not begin downloading data.

A result set is returned. The obtained result set will be live with the driver. After retrieving the result set, you must traverse all of its rows before you can fire another query.

Use stream() for large result sets, as it has a smaller memory footprint compared to query(). If you want to download data to a disconnected result set, use query().

Note that stream() will lock the driver until the whole result set has been retrieved. To fetch another set of results while streaming to a live result set, create another driver object.

Also note that if you are using MySQL with the default storage engine, MyISAM, the entire table on the server will be locked for the duration of the live result set, that is until all rows have been retrieved.

prepare

Given a SQL string, returns a prepared statement.

Prepared statements are useful for:

Inserting binary data in the database.

For each parameter, an indication of whether the parameter should be considered text or binary data can be given.

Avoiding bugs that could lead to SQL injections.

Separating query logic and parameters can be beneficial in securing against SQL injections. Because query parameters do not have to be manually quoted, there's less risk of forgetting to quote or using an insecure quoting mechanism.

Reusing query logic with multiple sets of parameters.

Prepared statements can be re-executed with a new set of parameters.

Database and database object names cannot be parameterized, because schema identifiers need to be escaped in a different way than literal strings, and there is currently no additional marker implemented for this purpose. Identifiers can be manually quoted using Net::Wire10::Util::quote_identifier().

cancel (thread safe)

Cancels the running query.

Safe to call asynchronously from a signal handler. Example:

  # Press CTRL-C to cancel this query.
  {
    local $SIG{INT} = sub { $wire->cancel; };
    $wire->query("SELECT SLEEP(10)");
  }

Safe to call asynchronously from another thread (thread safe). Example:

  use threads;
  use threads::shared;

  # we've chosen to use the threads::shared library, therefore
  # we need to share the driver object with this library.
  share $wire->{cancelling};

  # abort on some condition (sleep() for demonstration purposes)
  threads->create(sub {
    $wire->cancel if sleep 2;
  })->detach;

  # run query for 10 seconds (will be interrupted by above thread after 2 sec)
  $wire->query("SELECT SLEEP(10)");

Run the example above to see how cancel() works to interrupt the SLEEP(10) statement before it finishes, after only 2 seconds instead of the full 10.

Notice that the entire $wire object is not shared above. This is because threads::shared fails to share some of the driver's internal structures. Luckily, cancel() only needs the $wire-{cancelling}> variable to be shared between threads for it to work.

Another common way to kill a running query is to create another connection to the server, if possible, and use KILL QUERY (or KILL CONNECTION for older servers). Using the KILL commands require a number of things, for example that the server has available connection slots, server privileges, etc.

Cancel is normally used for interactive applications, where the user can press a key such as ESC to cancel a running query. For automated tasks, there is query_timeout that can be used instead. query_timeout can also be disabled, for example for interactive applications, by setting it to 0.

is_connected

Returns true after connect() has been called, for as long as no fatal errors has occurred.

After a fatal error, is_connected returns false.

A successful call to connect() causes is_connected to return true once again.

get_error_info

Returns a Net::Wire10::Error object if an error code and message is available from the server or the driver.

get_server_version

After connecting, this returns the server version.

get_connection_id

After connecting, this returns a unique identifier for the thread on the server that handles the current connection.

The connection identifier is useful for logging and debugging, killing queries or connections, for naming pseudo-temporary tables or other objects, and more.

disconnect

Transmits a goodbye message to the server to indicate intentions, then closes the underlying socket.

Features in Net::Wire10::PreparedStatement

set_parameter

Sets the parameter at a given index to a given value. The index corresponds to the relative position of a "?" marker in the prepared statement, with the first "?" marker being 1.

  $ps->set_parameter(2, 'Hello World');

There is no need to quote the parameter value, this is done automatically.

Binary data such as for example JPEG image files can also be added:

  $ps->set_parameter(3, $bindata, DATA_BINARY);

If the last parameter is specified as DATA_BINARY, the value given is taken as a binary string. Otherwise, the value is taken as either an iso8859-1 string or a Unicode string, depending on what Perl thinks about the string.

clear_parameter

Clears one or more parameters, or all parameters if nothing was specified.

  $ps->clear_parameter(1, 3);

get_token_count

Returns the number of "?" tokens found in the SQL initially used to create the prepared statement.

query

Execute the prepared statement using the parameters previously set with set_parameter(). All results are spooled before the call returns.

stream

Execute the prepared statement using the parameters previously set with set_parameter(). As soon as the initial metadata arrives from the database server, the call returns, and the results can be traversed in a streaming fashion.

Features in the Net::Wire10::Results iterator

A Net::Wire10::Results object is returned when calling query() or stream(). Depending on which was used to execute the SQL, either a disconnected result set or a live (streaming) result set is returned.

next_array

The next_array() method returns a whole row, with individual field values packed into an array. undef is returned once all rows has been extracted.

  while (my $row = $results->next_array) {
    printf
      "Value 1: %s Value 2: %s Value 3: %s\n",
      $row->[0],
      $row->[1],
      $row->[2];
  }

When the retrieved columns has been specifically named in the SELECT statement (rather than using the SELECT * wildcard), the position of each individual field in result set rows are known, and next_array() can be used to access field values based on their position.

Column name and order for a SELECT * query can be retrieved using get_column_info("name").

After calling next_array(), the row has been consumed. There is currently no way to rewind and re-read a row, even for a disconnected result set.

next_hash

The next_hash() method returns a whole row, with individual field values packed into a hash. The key of the hash is the name of the column that each field belongs to. undef is returned once all rows has been extracted.

  while (my $row = $results->next_hash) {
    printf
      "Id: %s Name: %s Data: %s\n",
      $row->{id},
      $row->{name},
      $row->{data};
  }

Using next_hash() instead of next_array() usually makes the code a bit more readable, especially in cases where a SELECT with column names is not nearby.

After calling next_hash(), the row has been consumed. There is currently no way to rewind and re-read a row, even for a disconnected result set.

spool

Reads the remaining rows of the result set, in effect turning a streaming result set into a disconnected result set.

flush

Reads the remaining rows of the result set and discards them. When done on a live result set, this frees the driver for use.

get_column_info

Return the names and other information of the result set's columns as an array.

If you want all information, call get_column_info with no parameters:

  my $column_info = $wire->get_column_info;
  my $first_col_name = $column_info[0]->{name};

If you just need the count of columns, you can ask Perl to reduce the array with "scalar":

  my $nr_cols = scalar $wire->get_column_info;

If you need just the name of each column, specify the wanted metadata item:

  my $column_names = $wire->get_column_info("name");
  my $first_col_name = $column_names[0];

The available metadata is: * name (column name or alias) * object (database object containing column, typically a TABLE or VIEW name) * orig_database (source database where data originally came from, if available) * orig_table (source table where data originally came from, if available) * orig_column (source column where data originally came from, if available) * collation (either binary, or utf8 with a given sort order and case sensitivity) * data_type (SQL data type) * decimal_scale (number of digits after the decimal point) * flags (SQL data type specification and other items)

has_results

Returns true if the query returned a set of results.

get_no_of_affected_rows

Returns the number of rows influenced by the UPDATE, DELETE or similar query.

  my $affected = $wire->get_no_of_affected_rows;

get_no_of_selected_rows

Returns the number of rows in the result set of a SELECT or similar query.

  my $selected = $wire->get_no_of_selected_rows;

After consuming a row, the number of available rows will decrease, and get_no_of_selected_rows() will return one less.

Supported only for disconnected result sets, live/streaming result sets are unaware of the total number of records. A streaming result set can be turned into a disconnected result set using spool(), after which the number of available rows becomes known.

get_insert_id

MySQL and Drizzle has the ability to choose unique key values automatically, by enabling auto_increment for a column. When this happens, the newly assigned id value for the last inserted row is stored in this attribute.

get_warning_count

After a query, this returns the number of warnings generated on the server. If the query is streaming to a live result set, an additional warning count is available after the last row of data has been read.

Features in the Net::Wire10::Error object

After an error has occurred, call $wire->get_error_info() to retrieve an Error object containing details of the problem.

get_error_code

Returns the server-reported (1xxx) error code, or a client (2xxx) error code.

get_error_state

Returns the SQL state code sent by the server when an error occurs, or HY000 for errors that does not have a standardized designation, or an empty string for client errors and some server errors.

get_error_message

Returns an error string sent by the server. Usually contains a mix of text corresponding to the error code (see above), in whatever language the server is started with, and some factual information from the query that caused the error or similar context.

ERROR HANDLING

There are two kind of errors in Net::Wire10, fatal and non-fatal errors. Fatal errors causes the connection to close, while non-fatal errors do not.

Non-fatal errors cause the active query to abort.

A query may also cause warnings and informational messages, these are per default only reported summarily via get_warning_count(). The actual messages can be retrieved out-of-band from the server with the SHOW WARNINGS command.

Example: catching errors

All errors can be caught with an eval {} construct. To differentiate between a fatal and a non-fatal error, use is_connected().

  # Create driver and connect
  $wire = Net::Wire10->new(host=>'localhost', user=>'test', password=>'test');
  $wire->connect;

  # Execute nonsensical query
  eval { $wire->query('Argle-bargle, glyp-glof?!'); };
  warn $@ if $@;
  print ($wire->is_connected ? "is" : "is not") . " connected.";

Example: recognizing fatal errors

Here's a query that causes a fatal error:

  # Execute query that kills the current connection
  eval { $wire->query('KILL CONNECTION CONNECTION_ID()'); };
  warn $@ if $@;
  print ($wire->is_connected ? "is" : "is not") . " connected.";

After running the above code, it is necessary to reconnect the driver before doing additional work.

Example: reconnecting

Once a fatal error has happened, it is trivial to reestablish a connection to the server.

  # Reconnect if necessary
  $wire->connect unless $wire->is_connected;

Notice that the connection id changes after this.

Example: multiple jobs

If stability is sought, always wrap your code with guard blocks in reasonable spots. For example:

  foreach $job (@processing_items) {
    # Process job
    eval {
      $wire->connect unless $wire->is_connected;
      $wire->query("SELECT ... interesting data ...");
      $wire->query("SELECT ... more interesting data ...");
      ... more processing ...
      $wire->query("UPDATE ... whatever ...");
    };
    # Handle errors
    warn "Failed to process item, continuing with next item: $@" if $@;
  }

In the above, if any of the jobs in processing_items fail, a warning is printed, and the program reconnects (if necessary) and continues with the next job.

Example: long term connections

If you have long-running jobs, always do a ping to check the connection after periods of inactivity. For example:

  # Process some job, then sleep and repeat
  while (1) {
    eval { $wire->ping if $wire->is_connected; };
    eval {
      # If the ping (above) failed, reconnect
      $wire->connect unless $wire->is_connected;
      # Actual process code
      $wire->query("SELECT ... interesting data ...");
      $wire->query("SELECT ... more interesting data ...");
      ... more processing ...
      $wire->query("UPDATE ... whatever ...");
    };
    # Handle errors
    warn "Something went wrong: $@" if $@;
    # Wait a while before going again
    sleep 300;
  }

In the above, some job is run once every 5 minutes. Before the job is run, the connection is checked with ping(). Should the connection have failed silently, the ping will provoke a fatal error and soon thereafter, connect() will reestablish the connection before processing starts.

These are much superior approaches to automatically reconnecting before firing each query (which some other drivers do). Automatic reconnect schemes hides error messages, looses connection state, drops transactions, can cause half a transaction to complete, can cause table locks to be silently dropped, and similar malfunctions.

If you really want, you can amend the driver object with an extra version of query() (give it a different name) that automatically reconnects, or you can wrap the entire driver in your own object to the same effect, but I much recommend against it. Proper guard blocks, with a connection check at the beginning (as above), is a much healthier practice.

Example: retrieving warnings

Here's an example of how to retrieve warnings generated by the server.

  # Random query that emits a warning
  my $sql = "SELECT CAST('abc' AS CHAR(1))";

  # Run query on an existing connection
  my $res = $wire->query($sql);

  # Check for warnings
  if ($res->get_warning_count > 0) {
    my $warnings = $wire->query("SHOW WARNINGS");
    while (my $msg = $warnings->next_hash) {
      printf
        "Code: %d, Message: %s",
        $msg->{Code},
        $msg->{Message};
    }
  }

TROUBLESHOOTING

Supported operating system and Perl versions

Over at CPAN Testers, there's a vast number of testers that do a very good job of figuring out which versions work together:

http://static.cpantesters.org/distro/N/Net-Wire10.html

Unsupported features

The following features are not supported.

Protocols other than version 10

Only protocol version 10 is supported.

Protocol revisions older than 4.1

The documentation, at the time of this writing, only covers the protocol as implemented in MySQL Server version 4.1. It is recommended to use the newest GA server release however, because version 4.1 has a number of Unicode bugs, particularly when using built-in SQL functions for text handling.

Transports other than TCP/IP

Shared memory is not supported. Might be easy to add, my guess would be that it's just a ring buffer and one or two locks to synchronize readers and writers.

Named pipe is not supported.

Unix socket is not supported.

Character sets other than UTF-8

Result data is binary strings for binary data and UTF-8 strings for textual data. Testing for a text versus a binary result value is accomplished using utf8::is_utf8().

SQL queries are given as strings, interpreted as either UTF-8 or iso8859-1 depending on what Perl thinks about the string. When using prepared statements, binary data is also supported for parameters. See set_parameter() for details.

It is possible to convert binary data to ASCII hex notation for inclusion directly in the query text:

  # Helper that converts binary to ASCII hex notation.
  sub raw_to_hex { return unpack("H*", shift); }

  # Convert to hex notation.
  $hex = raw_to_hex($binary_data);

  # Run a query containing hex data.
  my $res = $wire->query("SELECT $hex");

A more economical encoding than hex notation is BASE64. If the server does not support BASE64 natively, a decoder can be added using for example this stored procedure: http://wi-fizzle.com/downloads/base64.sql.

  # Contains helper that converts binary to BASE64 notation.
  use MIME::Base64 qw(encode_base64);

  # Convert to BASE64 notation.
  my $base64 = encode_base64($binary_data);

  # Run a query containing BASE64 data.
  my $res = $wire->query("SELECT BASE64_DECODE('$base64')");

An even more economical method is to use a prepared statement to send the binary data in its raw form.

Protocol compression via zlib

There is not much documentation regarding protocol compression, therefore it has not been implemented.

It is possible to add compression at the network level instead using a tunnel rather than the protocol's own support, similarly to stunnel described below, if network bandwidth is at a premium.

Another option is to selectively compress data with zlib before handing it off to the driver, and using the function UNCOMPRESS() to expand the data again once it reaches the server:

  # Contains helper that does zlib compression.
  use Compress::Zlib;

  # Compress some data.
  my $compressed = compress('Hello World!');

  # Create a prepared statement and bind the compressed data.
  my $ps = $wire->prepare('SELECT UNCOMPRESS(?)');
  $ps->set_parameter(1, $compressed, DATA_BINARY);

  # Run the query to decompress data on the server side.
  $res = $ps->query();

Verification of authenticity using SSL certificates

Verifying the server requires SSL support which is not currently implemented.

Verifying the client is not supported by the wire protocol.

SSL/TLS encrypted communication

There is not much documentation regarding protocol support for SSL/TLS, therefore it has not been implemented.

It is possible to add encryption at the network level by using a SSL/TLS wrapper such as "stunnel".

Stunnel provides a richer set of features than the current MySQL protocol supports, such as certificate-based authentication of the client in addition to the server.

Integrated support would be desirable because of simpler error handling and the possibility of performing key creation and certificate signing tasks via SQL.

Protocol-level prepared statements

The protocol design includes protocol-level support for server-side prepared statements. When implemented at a protocol level, these are faster than regular statements. Prepared statements execute faster because query parameters such as text does not need to be quoted before being sent to the server, and are not unquoted when arriving at the server.

Another performance benefit of prepared queries is that it is possible to perform multiple executions of the same query. The query logic is sent and parsed only once, saving a bit of network bandwidth and CPU cycles.

The current protocol design does not allow you to clear a single parameter at a time, but forces you to clear all parameters at once. When reusing a prepared statement for multiple executions, parameters that do not change from one execution to the next can therefore not be kept and has to be resent to the server for each execution of the query. This inefficiency in the design of the protocol can in some cases be alleviated somewhat with a workaround involving user variables.

Some drivers do not pipeline the prepare, parameter, execute and deallocate steps, instead waiting for a response from the server after each step. Such drivers may actually run some queries slower when using prepared statements.

A detail in the protocol specification is that results from non-prepared statements are pushed to the client, whereas results from prepared statements are pulled by the client. The pull mechanism makes it possible to stop in the middle of a result set and cancel receiving any more rows.

Another, related detail in the protocol specification is that prepared statements are assigned a number each, whereas non-prepared statements are not. Multiple prepared statements can therefore be active, running over the same connection at the same time, while only one non-prepared statement can be running at any one time.

In short, protocol-level prepared statements would be very nice to have, but are currently not implemented. Protocol details are described in the manual, but with gaps in the documentation.

There is also a purely SQL-driven interface to prepared statements, using the PREPARE, EXECUTE and DEALLOCATE PREPARE commands. The SQL-driven interface does not have the same performance benefits as the protocol-level one does. It can however be useful for assembling SQL statements on the fly inside stored procedures and for checking SQL queries for correct syntax and identifiers without actually running the query.

There is also a client-side prepared statement interface in the driver, which it is highly recommended to use.

High-granularity streaming

Streaming data along the way as it is consumed or delivered by the client application can lead to dramatical decreases in memory usage.

Streaming outgoing data can be accomplished with server-side prepared statements, because the wire protocol allows prepared statement parameters to be sent one at a time, and even in chunks. Chunking data presumably allows you to interleave data to different fields when sending a row to the server. (See also "Protocol-level prepared statements".)

The driver API currently allows client applications to stream incoming data one row at a time. The highest supported granularity for streaming is one whole row at a time. Streaming at a higher granularity and interleaving chunks of incoming data is not part of the current protocol design.

Streaming incoming data in a row-by-row fashion is also known in some drivers as "use_result mode".

Server-side cursors

Cursors provide a way of keeping open, navigating and optionally modifying a result set on the server, rather than having to transfer the entire result set to the client.

MySQL Server supports only a particular class of cursors known as read-only, forward-only cursors. As the name implies, this kind of cursor cannot be navigated (also called "scrolled") or modified.

The semantics are exactly the same as when streaming incoming data with the stream() call. With cursors however, the server automatically creates a temporary table behind the scenes from which the data is read, thereby always using extra resources for a temporary table, but consequently also releasing locks on original tables as soon as possible.

There is also a purely SQL-driven interface to cursors, which can be useful inside stored procedures.

Server-side cursors are part of the prepared statement protocol features (see above), and are therefore currently not supported.

Multiple statements per query

Currently unsupported. Multiple statements can often cause more trouble than gain by making SQL injection (a security risk) much easier and in return providing diminutive performance gains when compared to other approaches.

If you want to run multiple queries, one method is to create two separate connections. This can also give a performance boost, because one query does not wait for the other to finish. Another related advantage is that multi-core servers can actually perform both queries simultaneously.

Two or more queries can be started simultaneously without resorting to multiple threads on the client. This is done by starting the queries using the stream() call, which does not wait for the server to return data.

See also notes on connection pooling, which can be a useful technique to avoid initialization overhead when creating multiple connections.

Multiple result sets per query

Currently unsupported. Multiple result sets per query is documented to be incompatible with prepared statements (but the reason why is not).

Various non-query-related protocol commands that may be useful are not currently supported, for example COM_SHUTDOWN to initiate a server shutdown and COM_DEBUG to trigger the server into dumping a bunch of debug information.

Automatic reconnect

It is often desirable to be able to restart a database server without having all of its clients drop the ball.

Ensuring that the connection works before starting processing in the client often fixes a majority of the problem. Client code can do a ping() and if necessary a connect() when it starts. For web applications, this is one of the things that Apache::DBI does automatically.

The protocol design does not include a feature to notify clients when the server is going away. They will usually only find out when performing a query or a ping. In cases where the method mentioned above is not good enough, other tactics are available.

An additional measure of reliability can be added by enclosing groups of database calls that are restartable in a guard block that catches database errors. If a lost connection occurs, the entire group of calls can be restarted.

Queries that are not normally restartable include those that write data to the database with INSERT, UPDATE or similar. A simple solution may be to wrap these in a transaction so that changes are automatically cancelled, thus allowing these queries to also be restartable.

Some drivers have an automatic reconnect feature. This is problematic because only the simplest of SQL queries are repeatable without causing unintended side effects. Queries have to be repeated, because the protocol does not offer a mechanism for checking whether queries issued in the past succeeded or not after a reconnect.

In addition, the protocol does not normally carry information about transactions, and doing an automatic reconnect will silently terminate any running transactions.

If you can live with these problems, and want to avoid writing restartable client code or using transactions, there is the possibility of writing a wrapper around the database class which simply catches all calls to query() and wraps them in something that reconnects and reruns the query whenever the connection is lost.

The driver does not have such a feature built-in.

Out of scope features

Connection pooling

Connection pooling can improve performance by removing the need for a TCP handshake and SSL/TLS handshake (for encrypted sessions) when opening a connection, and can also alleviate TIME_WAIT problems in environments with many short-lived connections.

There is no particular advantage to putting pooling code in the driver core itself, but a standard implementation that comes along with the driver would be nice. Currently no such thing exists for this driver.

If you need connection pooling for a web site, the Apache::DBI module will provide this when used together with DBD::Wire10. It actually pools entire processes, including database connections which become persistent by virtue of Apache::DBI hooking the connect() and disconnect() DBI calls.

The LOAD DATA LOCAL INFILE client statement

Poorly documented and therefore unsupported.

A simple workaround is to parse the CSV data on the client using Text::CSV (or even DBD::CSV), and upload it to the server in reasonably sized large chunks with the "extended insert" syntax.

If necessary, you can also emulate the LOAD DATA LOCAL INFILE client statement using two other queries in unison, namely SELECT INTO DUMPFILE and LOAD DATA INFILE.

  use Net::Wire10;

  # Connect to database server
  $wire = Net::Wire10->new(host=>'localhost', user=>'test', password=>'test');
  $wire->connect;

  # Go!
  upload_file('mydata.csv', '`test`.`sometable`');

  sub upload_file {
    my $filename = shift;
    my $table = shift;

    # Load file
    open(FILE, $filename) or die "$!";
    binmode FILE;
    $rawdata = join('', <FILE>);

    # Upload file to server
    my $ps = $wire->prepare("SELECT ? INTO DUMPFILE '/tmp/upload.csv'");
    $ps->set_parameter(1, $rawdata, DATA_BINARY);
    $ps->query;

    # Load data - notice that this is server-side, there is no LOCAL keyword.
    $wire->query("LOAD DATA INFILE '/tmp/upload.csv' INTO TABLE $table");

    # Reclaim disk space used by temporary file.
    # (Prohibited by some servers for security reasons; use an alternative method.)
    $wire->query("SELECT '' INTO DUMPFILE '/tmp/upload.csv'");
  }

When a similar solution was benchmarked, the performance was identical to a client-side version using the LOCAL feature (tested using the C API).

Be careful to escape data in the CSV file. MySQL Server does not use the same CSV format as the de facto standard format use by Office applications. One difference is that backslashes in CSV data are considered escape characters by MySQL Server.

For optimal network performance, compress the data using zlib before sending it to the driver. See also notes on compression further above.

When bulk loading data, it can be useful to temporarily disable keys with ALTER TABLE ... DISABLE KEYS, and/or to wrap the entire operation inside a transaction.

The DELIMITER client statement

Mostly useful for CLIs, and requires a lot of client-side SQL parsing. Can be implemented in a CLI if necessary, there is no advantage to putting this in the driver.

Miscellaneous other limitations

Due to the way the Perl interpreter works, the following limitations may also apply.

  • Result sets limited to ~2 billion results on 32-bit Perl or ~9 quintillion results on 64-bit Perl.

  • Field values limited to under 2 GiB on 32-bit Perl or under 8 EiB on 64-bit Perl.

  • get_insert_id() values limited to ~2 billion on 32-bit Perl or ~9 quintillion on 64-bit Perl.

If you need BIGINT UNSIGNED for AUTO_INCREMENT columns on 64-bit Perl (or INT UNSIGNED on 32-bit Perl), use "SELECT LAST_INSERT_ID()" rather than get_insert_id().

LAST_INSERT_ID() returns the sequence number as a string instead of as an integer, avoiding any overflow issues with Perl's built-in types.

Known bugs

There is a bug tracker at CPAN where you can report new bugs: http://rt.cpan.org/NoAuth/Bugs.html?Dist=Net-Wire10

Bugs in the design of the over-the-wire protocol or the server may affect how the driver works. For example:

  • No character set info is sent with metadata such as column names, forcing the driver to assume that the client and result character sets are constant, when in fact they are not.

  • Selecting data larger than max_packet_size server variable causes silent truncation on the server side (inserting does not).

  • Using tokens for LIMITs in prepared statements fails because the server chokes on quoted integers after LIMIT.

  • Server sometimes fail to reap dead connections.

As a workaround for the last issue, this driver scans all values given as parameters to prepared statements and skips quoting integer values.

Refer to the MySQL issue tracker for more server and protocol issues: http://bugs.mysql.com/

Bugs and design issues in Perl itself may also affect the driver. For example:

  • Perl integer scalars are always signed, unlike MySQL integers.

  • Perl select() interferes with active alarm().

A workaround is in place for the last issue, see the source code for details.

Refer to the Perl issue tracker for any core Perl issues: http://rt.perl.org/rt3/Public/

Bugs in dependencies, for example IO::Socket or IO::Select, may also affect the driver. Refer to the CPAN issue tracker for each package.

SEE ALSO

DBD::Wire10

AUTHOR

Driver core hacked together by Kim Christensen and David Dindorp at Dubex A/S. Password hashing routines by Hiroyuki OYAMA E, Japan.

COPYRIGHT AND LICENCE

Copyright (C) 2002 and (C) 2009 as described in AUTHORS.

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

WARRANTY

Because this software is licensed free of charge, there is absolutely no warranty of any kind, expressed or implied.