NAME

Neo4j::Driver::Net - Explains the design of the networking modules

VERSION

version 0.23

OVERVIEW

Each Neo4j::Driver::Session has exactly one networking helper instance that is used by the session and all of its transactions to communicate with the Neo4j server. This document discusses the features and known limitations of the networking helpers.

Unless you're planning to develop custom networking modules for the driver, you probably don't need to read this document.

The helpers don't communicate with the server directly. Instead, they control another module that has responsibility for the actual network transmissions. This module can be customised by using the driver's net_module config option. The API that custom networking modules need to implement is described in "EXTENSIONS" below.

Network responses received from the server will be parsed for Neo4j statement results by the appropriate result handler for the response format used by the server. A custom networking module can also provide custom response parsers, for example implemented in XS code.

Please note that the division of labour between sessions or transactions on the one hand and networking helpers on the other hand is an internal implementation detail of the driver and as such is subject to unannounced change. While some of those details are explained in this document, this is done only to help contributors and users of public APIs better understand the driver's design. See "USE OF INTERNAL APIS" for more information on this topic.

SYNOPSIS

 $helper = Neo4j::Driver::Net::HTTP->new({
   net_module => 'Local::Neo4jUserAgentHTTP',
 });
 
 $helper->_set_database($db_name);
 $helper->{active_tx} = ...;
 @results = $helper->_run($tx, @statements);
 
 # Parsing a JSON result
 die unless $helper->{http_agent}->http_header->{success};
 $json_coder = $helper->{http_agent}->json_coder;
 $json_coder->decode( $helper->{http_agent}->fetch_all );

WARNING: Some of these calls are private APIs. See "USE OF INTERNAL APIS".

WARNING: EXPERIMENTAL

The design of the networking helper APIs is not entirely finalised. While no further major changes are expected, please let me know if you are already creating networking extensions, so that I can try to accommodate your use case and give you advance notice of changes.

The driver's net_module config option is experimental as well.

FEATURES

The networking helpers primarily deal with the following tasks:

  • Establish a database connection.

  • Provide Neo4j::Driver::ServerInfo.

  • Handle certain generic protocol requirements (such as HTTP content negotiation).

  • Sync state between server transactions and driver transaction objects.

  • Control the translation of Cypher statements to network transmissions, and of network transmissions to statement results.

HTTP connections use proactive content negotiation (RFC 7231) to obtain a suitable response from the Neo4j server. The driver supports both Jolt and JSON as result formats. There is also a fallback result handler, which is used to parse error messages out of text/* responses. The HTTP result handlers are individually queried for the media types they support. This information is cached by the networking helper.
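The negotiation described above can be pictured roughly as follows. This is only a minimal sketch, not the driver's actual implementation: the handler module names, the preference values, and the functions accept_header() and handler_for() are all hypothetical, and the Jolt media type shown is an assumption about what the server offers.

```perl
use strict;
use warnings;

# Hypothetical handler table: [ module name, media type, preference ].
# None of these names are part of the driver's documented API.
my @HANDLERS = (
    [ 'Local::Result::Jolt', 'application/vnd.neo4j.jolt+json-seq', 1   ],
    [ 'Local::Result::JSON', 'application/json',                    0.9 ],
    [ 'Local::Result::Text', 'text/*',                              0.1 ],
);

# Build the Accept header value; a helper could cache this string.
sub accept_header {
    return join ', ', map {
        my (undef, $type, $q) = @$_;
        $q < 1 ? "$type;q=$q" : $type;
    } @HANDLERS;
}

# Pick the handler for the Content-Type the server actually sent.
sub handler_for {
    my ($content_type) = @_;
    my ($type) = lc($content_type) =~ m{^\s*([^;\s]+)};
    return undef unless defined $type;
    for my $handler (@HANDLERS) {
        my ($module, $pattern) = @$handler;
        return $module if $pattern eq $type;
        # Wildcard subtypes such as text/* match by prefix.
        return $module if $pattern =~ m{^(.+/)\*$} && index($type, $1) == 0;
    }
    return undef;
}
```

With this table, a response of type text/html would fall through to the text fallback handler, while an unsupported type such as image/png would match no handler at all.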

All result handlers inherit a common interface from Neo4j::Driver::Result. They provide methods to initialise and bless result data records as Neo4j::Driver::Record objects. Result handlers are also responsible for ensuring that all values returned from Neo4j are provided to users in the format that is documented for "get" in Neo4j::Driver::Record. For backwards compatibility, a lot of the internal data structures currently match the format of Neo4j JSON responses.

The first HTTP connection to a Neo4j server is always made to the Discovery API, which is used to obtain Neo4j::Driver::ServerInfo and the transaction endpoint URI template. These are the only GET requests made by the driver. Because of a known issue with Neo4j, the Accept request header field needs to be varied by HTTP request method (#12644).

With HTTP being a stateless protocol, Neo4j supports multiple concurrent transactions by using a different URL for each one in a REST-like fashion. For requests made to such explicit transaction endpoints, the Neo4j Transactional HTTP API always provides transaction status information in the response. Transactions that remain open include an expiration time. The networking helper parses and stores this timestamp and uses it to track which transactions are still open and which have timed out. The origination Date field is used to synchronise the clocks of the driver and the Neo4j server (RFC 7231).
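The clock synchronisation described above might look roughly like the following sketch. It is not the driver's actual code: the function names are hypothetical, and it assumes that both the Date header and the expiration time are RFC 1123 style timestamps in UTC (ending in "GMT" or "+0000").

```perl
use strict;
use warnings;
use Time::Piece ();

# Convert an HTTP-style timestamp to epoch seconds.
# Assumption: the timestamp is in UTC.
sub http_date_to_epoch {
    my ($date) = @_;
    $date =~ s/\s+(?:GMT|UTC|\+0000)\z//
        or die "Unsupported time zone in '$date'";
    return Time::Piece->strptime($date, '%a, %d %b %Y %H:%M:%S')->epoch;
}

# Translate the server's expiration time into local clock time.
# $received_at is the local epoch time at which the response arrived.
sub local_expiry {
    my ($date_header, $expires, $received_at) = @_;
    my $clock_offset = $received_at - http_date_to_epoch($date_header);
    return http_date_to_epoch($expires) + $clock_offset;
}

# A transaction counts as open until its local expiry time is reached.
sub is_open {
    my ($expiry, $now) = @_;
    return $now < $expiry;
}
```

Because the offset is derived from the time the response was received, any network delay ends up inside the offset. This is the source of the latency limitation discussed in "BUGS AND LIMITATIONS".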

Bolt, on the other hand, currently only supports a single open transaction per connection. While a Bolt connection can be viewed as a simple state machine in the backend Bolt library (see Bolt Protocol Server State Spec), Neo4j::Bolt currently doesn't allow users to directly observe state changes, so it is currently somewhat difficult to determine the Bolt connection state. The driver attempts to infer it based on the behaviour of Neo4j::Bolt, and mostly gets it right, but there may be some as-yet-unknown issues. Bug reports are welcome.

One key difference between HTTP and Bolt is the handling of transaction state in case of Neo4j errors. According to Neo4j Status Codes, the effect of errors is always a transaction rollback. On HTTP, these rollbacks take place immediately. On Bolt, however, the transaction is typically only marked as uncommittable on the Neo4j server, but the Bolt connection is not actually put into the FAILED state immediately. To try and work around this difference between HTTP and Bolt, this driver's Bolt networking helper always attempts an explicit transaction rollback if faced with any error condition. Again, this approach mostly gets it right, but there may be some remaining issues, particularly when network errors and server errors happen simultaneously.
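The "roll back on any error" strategy can be illustrated with a small wrapper. This is a sketch of the general technique only, not the driver's internal code; run_or_rollback() is a hypothetical name, and the only assumption about $tx is that it offers a rollback() method.

```perl
use strict;
use warnings;

# Run $code with the transaction. On any error, attempt an explicit
# rollback, then re-throw the original error.
sub run_or_rollback {
    my ($tx, $code) = @_;
    my @results;
    my $ok = eval { @results = $code->($tx); 1 };
    return @results if $ok;

    my $error = $@ || 'Unknown error';
    # The rollback itself may fail (e. g. on a broken connection);
    # that failure must not mask the original error.
    eval { $tx->rollback; 1 } or warn "Rollback failed: $@";
    die $error;
}
```

Keeping the original error and merely warning about a failed rollback reflects the situation mentioned above in which network errors and server errors coincide.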

COMPATIBILITY

Neo4j::Driver version 0.21 is compatible with Neo4j::Bolt 0.01 or later.

When using HTTP, Neo4j::Driver 0.21 supports determining the version of any Neo4j server via "server" in Neo4j::Driver::Session. This even works on Neo4j 1.x, but running statements on Neo4j 1.x will fail, because it lacks the transactional API.

Neo4j::Driver 0.21 is compatible with Neo4j versions 2.x, 3.x, and 4.x. It supports HTTP responses in the formats JSON and Jolt (both strict mode and sparse mode).

For Bolt as well as HTTP, future versions of the driver will tend to implement new requirements in order to stay compatible with newer versions of Neo4j. Support for old Neo4j or library versions is likely to only be dropped with major updates to the driver (such as 0.x to 1.x).

BUGS AND LIMITATIONS

As described above, there may be cases in which the state of a Bolt connection is not determined correctly, leading to unexpected failures. If such bugs do exist, they are expected to mostly happen in rare edge cases, but please be sure to report any problems.

Clock synchronisation using the HTTP Date header does not take into account network delays. In case of high network latency, the driver may treat transactions as open even though they have already expired on the server. To address this, you could either increase the transaction idle timeout in neo4j.conf or manipulate the return value of date_header() in a custom networking module.

The metadata in HTTP JSON responses is often insufficient to fully describe the response data. In particular:

  • Path metadata doesn't include node labels or relationship type (#12613).

  • Records with fields that are maps or lists have unparsable metadata (#12306).

  • Byte arrays are coded as lists of integers in JSON results.

As of Neo4j 4.2, the Jolt documentation for byte arrays doesn't match the implementation (#12660). Future Neo4j versions might fix the implementation to match the docs.

Neo4j spatial and temporal types are not currently implemented in all response format parsers.

EXTENSIONS

Custom Bolt networking modules

By default, Bolt networking uses the amazing XS module Neo4j::Bolt by Mark A. Jensen (MAJENSEN), which in turn uses the C library libneo4j-client to actually connect to the Neo4j server. Updates and improvements are quite possibly best made directly in those libraries, so that not only Neo4j::Driver, but also other users benefit from them.

If the driver's net_module config option is used with a Bolt connection, the module name provided will be used in place of Neo4j::Bolt and will have to match its API exactly. It is possible to provide a factory object instead.

Results will be handled by Neo4j::Driver::Result::Bolt, unless a custom net_module provides a method named result_handlers(). If it does, it's expected to return a list containing a single module name, which will be used as a result handler instead. See "Custom result handlers" below.

Custom HTTP networking modules

Neo4j::Driver includes a single HTTP networking module that will be used if the net_module config option is set to "" or undef (the default). If another module name is given as net_module, that module will be used instead of the included module. Make sure your code always loads a custom networking module with use. If you extend the included module through inheritance, you must also use parent.

The included module may change in future. As of version 0.21, the default module uses LWP directly. Earlier versions used REST::Client.

It is possible to set a factory object as net_module instead of providing a module name. The factory object must have a new() method returning an object that implements the interface described in the following section.
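A factory object might be sketched as follows. Both class names and the create() constructor are hypothetical, and the agent class is only a stub; a real one would have to implement the full interface.

```perl
use strict;
use warnings;

# Hypothetical stub agent class.
package Local::Agent {
    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }
}

# Hypothetical factory. Note that new() is called on the factory
# *object*, so it can pass its own settings on to each agent.
package Local::AgentFactory {
    sub create {
        my ($class, %settings) = @_;
        return bless {%settings}, $class;
    }
    sub new {
        my ($factory, $driver) = @_;
        return Local::Agent->new(
            driver  => $driver,
            timeout => $factory->{timeout},
        );
    }
}

# The resulting object would then be set as the net_module config option.
my $factory = Local::AgentFactory->create( timeout => 30 );
```

A factory is useful when the agent needs settings that are not available through the driver's own config options.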

If you look at the source of existing networking modules for inspiration, please note that they may use internal APIs. Please make sure you read "USE OF INTERNAL APIS" before you start copying existing code.

API of an HTTP networking module

The driver primarily uses HTTP networking modules by first calling the request() method, which initiates a request on the network, and then calling other methods to obtain information about the response.

 $net_module = $driver->config('net_module');
 $agent = $net_module->new($driver);
 
 $agent->request('GET', '/', undef, 'application/json');
 $status  = $agent->http_header->{status};
 $type    = $agent->http_header->{content_type};
 $content = $agent->fetch_all;

HTTP networking modules must implement the following methods.

The driver will make all method calls using the arrow operator (->). The method descriptions below use a syntax similar to that of use feature 'signatures'; however, the first argument ($class or $self) is omitted from the signatures for clarity.

date_header
 sub date_header () { $date }

Return the HTTP Date: header from the last response as a string. If the server doesn't have a clock, the header will be missing; in this case, the value returned must be either the empty string or (optionally) the current time in non-obsolete RFC5322:3.3 format. May block until the response headers have been fully received.

fetch_all
 sub fetch_all () { $response_content }

Block until the response to the last network request has been fully received, then return the entire content of the response buffer.

This method must generally be idempotent, but the behaviour of this method if called after fetch_event() has already been called for the same request is undefined.

fetch_event
 sub fetch_event () { $next_event }

Return the next Jolt event from the response to the last network request as a string. When there are no further Jolt events, this method returns an undefined value. If the response hasn't been fully received at the time this method is called and the internal response buffer does not contain at least one event, this method will block until at least one event is available.

The behaviour of this method is undefined for responses that are not in Jolt format. The behaviour is also undefined if fetch_all() has already been called for the same request.

http_header
 sub http_header () { \%headers }

Return a hashref with the following entries, representing headers and status of the last response.

  • content_type – e. g. "application/json"

  • location – URI reference

  • status – status code, e. g. "404"

  • success – truthy for 2xx status codes

All of these entries must exist and be defined scalars. Unavailable values must use the empty string. Blocks until the response headers have been fully received.

http_reason
 sub http_reason () { $reason_phrase }

Return the HTTP reason phrase (e. g. "Not Found" for status 404). If unavailable, "" is returned instead. May block until the response headers have been fully received.

json_coder
 sub json_coder () { $json_coder }

Return a JSON::XS-compatible coder object (for result parsers). It must offer a method decode() that can handle the return values of fetch_event() and fetch_all() (which may be expected to be a byte sequence that is valid UTF-8) and should produce $JSON::PP::true and $JSON::PP::false for booleans.

The default module included with the driver returns an instance of JSON::MaybeXS.
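As an illustration of these requirements, the following sketch uses JSON::PP, which ships with Perl. Using JSON::PP here is merely an assumption for the example; any coder with compatible behaviour should do.

```perl
use strict;
use warnings;
use JSON::PP ();

# The utf8 option makes decode() expect a UTF-8 encoded byte sequence.
my $coder = JSON::PP->new->utf8;

# Bytes as they might be returned by fetch_all(): "Zoë" takes four bytes.
my $bytes = qq({"name":"Zo\xc3\xab","active":true,"deleted":false});
my $data  = $coder->decode($bytes);

# $data->{name} is now a character string of length 3, and the
# booleans are the objects $JSON::PP::true and $JSON::PP::false.
```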

new
 sub new ($driver) { $self }

Initialises the object. May or may not establish a network connection. May access $driver config options using the method "config" in Neo4j::Driver only.

As of version 0.21, not all aspects of the configuration of Neo4j::Driver instances can be queried using config(). This issue will be addressed soon.

protocol
 sub protocol () { $http_version }

Return the HTTP version (e. g. "HTTP/1.1") from the last response, or just "HTTP" if the version can't be determined. May block until the response headers have been fully received.

request
 sub request ($method, $url, $json, $accept) { }

Start an HTTP request on the network. The following positional parameters are given:

  • $method – HTTP method, e. g. "POST"

  • $url – string with request URL

  • $json – reference to hash of JSON object

  • $accept – string with value for the Accept: header

The request $url is to be interpreted relative to the server base URL given in the driver config.

The $json hashref must be serialised before transmission. It may include booleans encoded as the values \1 and \0. For requests to be made without request content, the value of $json will be undef.
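The serialisation step could look like the sketch below. The function name serialise_content() is hypothetical, and the structure of the example hashref is only an illustration of what the driver might pass in.

```perl
use strict;
use warnings;
use JSON::PP ();

# canonical() sorts the keys, which keeps the output predictable.
my $coder = JSON::PP->new->utf8->canonical;

# Return the request content as bytes, or undef for requests that
# are to be made without request content.
sub serialise_content {
    my ($json) = @_;
    return undef unless defined $json;
    return $coder->encode($json);
}

my $body = serialise_content({
    statements => [{
        statement    => 'RETURN $n AS n',
        parameters   => { n => 42 },
        includeStats => \1,    # encoded as JSON true
    }],
});
```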

$accept will have different values depending on $method; this is a workaround for a known issue in the Neo4j server (#12644).

The request() method may or may not block until the response has been received.

result_handlers
 sub result_handlers () { @module_names }

Return a list of result handler modules to be used to parse Neo4j statement results delivered through this module. The module names returned will be used in preference to the result handlers built into the driver.

See "Custom result handlers" below.

uri
 sub uri () { $uri }

Return the server base URL as a string or URI object (for Neo4j::Driver::ServerInfo). At least scheme, host, and port must be included.
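Taken together, the methods described above could be implemented along the lines of the following sketch, which uses HTTP::Tiny and JSON::PP from the Perl core. It is an untested outline under several assumptions, not a replacement for the included module: the module name is hypothetical, the base URL is assumed to be available as config('uri'), Jolt events are assumed to be delimited by RS and/or LF characters, and an empty list from result_handlers() is assumed to select the built-in handlers.

```perl
use strict;
use warnings;

package Local::Neo4jAgent {    # hypothetical module name
    use HTTP::Tiny ();
    use JSON::PP ();

    sub new {
        my ($class, $driver) = @_;
        # Assumption: config('uri') yields the server base URL.
        my $base = '' . $driver->config('uri');
        $base =~ s{/+\z}{};
        return bless {
            base   => $base,
            http   => HTTP::Tiny->new,
            coder  => JSON::PP->new->utf8,
            res    => { headers => {} },    # no response yet
            events => undef,
        }, $class;
    }

    sub request {
        my ($self, $method, $url, $json, $accept) = @_;
        my %options = ( headers => { 'Accept' => $accept } );
        if (defined $json) {
            $options{content} = $self->{coder}->encode($json);
            $options{headers}{'Content-Type'} = 'application/json';
        }
        $self->{events} = undef;
        # HTTP::Tiny blocks here until the response is complete.
        $self->{res} = $self->{http}->request(
            $method, $self->{base} . $url, \%options );
        return;
    }

    sub http_header {
        my ($self) = @_;
        my $res = $self->{res};
        return {
            content_type => $res->{headers}{'content-type'} // '',
            location     => $res->{headers}{'location'}     // '',
            status       => $res->{status}                  // '',
            success      => $res->{success} ? 1 : 0,
        };
    }

    sub date_header     { return $_[0]{res}{headers}{date} // '' }
    sub http_reason     { return $_[0]{res}{reason}        // '' }
    sub protocol        { return $_[0]{res}{protocol}      // 'HTTP' }
    sub fetch_all       { return $_[0]{res}{content}       // '' }
    sub json_coder      { return $_[0]{coder} }
    sub uri             { return $_[0]{base} }
    sub result_handlers { return () }

    sub fetch_event {
        my ($self) = @_;
        # Assumption: events are delimited by RS and/or LF characters.
        $self->{events} //= [
            grep { length } split /[\x1e\n]+/, $self->fetch_all
        ];
        return shift @{ $self->{events} };
    }
}
```

Since HTTP::Tiny only returns once the entire response has been received, none of the accessor methods in this sketch ever need to block. A streaming implementation would have to do more work in fetch_event().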

Custom result handlers

The result handler API is currently not formally specified. It is an internal API that is still evolving and may be subject to unannounced change.

Even so, it's fully possible to implement a custom result handler. You should probably drop me a line when you begin work on one; see "USE OF INTERNAL APIS".

USE OF INTERNAL APIS

Public APIs generally include everything that is documented in POD. However, this document may contain some mentions of private APIs (where it does, it tries to be explicit about it). The section "Custom HTTP networking modules" describes a public API.

Private internals, on the other hand, include all package-global variables (our ...), all methods with names that begin with an underscore (_) and all cases of accessing the data structures of blessed objects directly (e. g. $session->{net}). Additionally, the new() methods of packages without POD documentation of their own are to be considered private internals.

You are of course free to use any driver internals in your own code, but if you do so, you also bear the sole responsibility for keeping it working after updates to the driver. Changes to internals are usually not announced in the Changes list, so you should consider watching GitHub commits. This approach is discouraged if your code is used in production.

If you have difficulties achieving your goals without the use of driver internals or private APIs, you are most welcome to file a GitHub issue about that (or write to my CPAN email address with your concerns; make sure you mention Neo4j in the subject to beat the spam filters).

I can't promise that I'll be able to accommodate your use case, but I am going to try.

AUTHOR

Arne Johannessen <ajnn@cpan.org>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2016-2021 by Arne Johannessen.

This is free software, licensed under:

  The Artistic License 2.0 (GPL Compatible)