The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

TODO

Address open issues on GitHub

See https://github.com/johannessen/neo4j-driver-perl/issues.

Functionality and API

  • Implement spatial and temporal types.

  • Add timers to Neo4j::Driver::ResultSummary (see Neo4j::Bolt). Method names should be result_available_after and result_consumed_after, both returning milliseconds with a default of -1 when unavailable (HTTP).

  • croak() error objects (e. g. Exception::Class) instead of strings. (However, this may be difficult to do, see "BUGS" in Carp. Also, p5p has plans to add real error objects to a future version of Perl.) It seems there are about four types that would need to be distinguished: illegal usage errors, internal driver errors, Network errors, and Neo4j server errors. See also #7.

Experimental features

Tests, code quality, documentation

  • Test roundtrip of special numeric values (very large integers, -0.0, ±Inf, ±NaN).

  • Improve test coverage:

    • Many yet uncovered code paths are obviously fine, but difficult or impossible to cover. In some of these cases, it may be possible to refactor the code, such as by banking on autovivification (i.e. don't defend against undefined $a in expressions like $a->{b}; see "Using References" in perlref).

    • The deep_bless subs contain a lot of assertions and other checks that are not normally necessary. Since this logic seems to work fine, it should be simplified. (This may also make it easier to unroll the recursion later.)

    • Documenting each attribute in Neo4j::Driver::SummaryCounters as individual methods might be a quick way to bring up the pod coverage stats a bit.

  • Write new unit tests for all modules.

  • Optimise the simulator for $hash = 0. Use of << causes statements to end with \n, which the simulator could filter out. The internals test "transaction: REST 404 error handling" should run a distinct statement.

  • Consider migrating the LWP tests away from Mock::Quick to reduce the dependency chain (Mock::MonkeyPatch and Mock::Sub look extremely light-weight, alternatives might perhaps be Test::MockObject::Extends or Test::LWP::UserAgent).

  • Verify that "get" in Neo4j::Driver::Type::Node and "get" in Neo4j::Driver::Type::Relationship really do return undef (as documented), even when called in list context.

  • Check that after a server error, the next statement will succeed (there might be issues with perlbolt; see majensen/libneo4j-client#8).

  • List possible croak output in "DIAGNOSTICS" in Neo4j::Driver, allowing for indexing by search engines.

  • We need complete working code examples, such as that Neo4j movies app. Some of the deprecated functionality could also be implemented as a custom module named e. g. Neo4j::Driver::Net::HTTP::Extra. Such a module might be directly useful to some users, but most importantly, it would serve to demonstrate some of the net_module functionality and how that can be used.

Other ideas for specific modules

Neo4j::Driver

  • The neo4j scheme should work for a community edition server out of the box. For starters, neo4j should check if Neo4j::Bolt is loaded into %INC and if the specified server actually supports Bolt. If so, it should behave identically to bolt, otherwise if should behave identically to either http or https, depending on the encryption config. This behaviour is different than that of the official drivers; see also https://community.neo4j.com/t/different-between-neo4j-and-bolt/18498. It should be sufficient to document that the behaviour is subject to change (clusters are enterprise-only and therefore currently out of scope, and the conditions under which Bolt is chosen may also change in future).

  • Change the default URI scheme from http to neo4j. Because a use Neo4j::Bolt; statement has so far never been necessary for the driver and the driver's behaviour won't change it it is not present, and also because HTTP-only features have so far been experimental features only, this should not be a breaking change.

Neo4j::Driver::Session

  • Add transaction functions. These should probably consist of subrefs passed to methods called write_transaction and read_transaction. These access modes are only an optimisation for Enterprise features. We don't target those at present, but read_transaction could then eventually be routed to a high-performance read-only server (for Bolt connections) once clusters are supported.

Neo4j::Driver::Transaction

  • Consider supporting re-using Record objects for query parameters in run. The Java and C# drivers do this.

  • Run statements lazily: Just like with the official drivers, statements passed to run should be gathered until their results are actually accessed. Then, and only then, all statements gathered so far should be sent to the server using a single request. Challenges of this approach include that notifications are not associated with a single statement, so there must be an option to disable this behaviour; indeed, disabled should probably be the default when stats are requested. Additionally, there are some bugs with multiple statements (see tests non-arrayref individual statement and include empty statement). Since stats are now requested by default, this item might mean investing time in developing an optimisation feature that is almost never used. Since the server is often run on localhost anyway where latency is very close to zero, this item should not have high priority.

Neo4j::Driver::Record

Neo4j::Driver::Result

  • Improve documentation, esp. wrt. side-effects of e. g. has_next()

  • Perhaps fetch() should always buffer two records instead of just one. With the current implementation, the bolt connection might remain attached longer than desirable in cases where the client knows in advance how many records there will be and calls fetch() exactly that number of times. (In theory, such a change might even slightly improve performance if the driver uses Perl threads to fill the buffer in the background.)

  • Consider unrolling deep_bless recursion. Based on initial profiling, this may save up to about 5% CPU time (for a specific HTTP test query cached in RAM, performance went from about 2700/s to 2850/s when skipping the call to deep_bless entirely). However, when accessing the database, the bottleneck is typically I/O (querying Neo4j itself instead of the RAM-cached response let the performance for the very same query drop down to 650/s when executed over HTTP). So this optimisation may not be worth it (OTOH, Bolt performance was something like 7000/s, so optimising deep_bless may be more useful there).

Neo4j::Driver::ResultColumns

  • The entire package can probably be removed now.

Neo4j::Driver::ResultSummary

  • Profile the server-side performance penalty of requesting stats for various kinds of queries. If the penalty turns out to be high, stats should perhaps have to be requested explicitly by clients (rather than being obtained by default, as with 0.13 and higher). However, using Bolt always provides stats, and different APIs for HTTP and Bolt seem like a bad idea.

  • Recent Neo4j versions have added database.

Neo4j::Driver::SummaryCounters

  • use Class::Accessor::Fast 0.34;

  • It seems Neo4j 4 added new counters for system updates. These are contains_system_updates and system_updates, the latter returning the number of system updates. Neither is available in Neo4j::Bolt 0.4201 according to the docs; need to check source.

Neo4j::Driver::Net

  • The lingo for the different components is not very clear. Options to be considered:

    Should "net helper" be renamed to "net controller"?
    Should "net module" be renamed to "net agent"?

    Problem is, "agent" is already an overloaded term.

    Should "net module" be renamed to "net plugin"?

    The primary problem would seem to be that usually zero or multiple plugins are supported, but we require exactly one plugin here. Also, currently we include a net module with the distribution, so that's not really a plugin; however, we could probably just describe that as the "default plugin".

  • As a micro-optimisation for the HTTP net_module API, it could be guaranteed for the http_header() method that the returned hashref will not be used by the driver/controller after the next call to request(), so that the underlying hash may be reused by the net module.

Neo4j::Driver::Net::Bolt

Neo4j::Driver::Net::HTTP

  • If a 201 is received without a Location header, it is currently simply ignored by _parse_tx_status(). (The simulator requires this.) According to RFC 7231, such a response means the location hasn't changed, i. e. the resource has been created at the default transaction endpoint. That should never happen; in fact, it should only ever happen for a PUT request, but we don't use those here. So ignoring this is probably the right choice. But it may still be useful to revisit this logic later on.

Neo4j::Driver::Net::HTTP::LWP

  • Consider externalsing this as a distribution of its own now rather than later. Eventually we might get rid of this new dependency by including an all-new adapter for HTTP::Tiny. Even before then, this new structure will make more clear that the driver in principle supports other backends besides just LWP.

Neo4j::Driver::Type::*

  • The eq and ne operators should be overloaded to allow for ID comparison on nodes and relationships.

  • Consider whether to use use overload '%{}' to allow direct access to e. g. properties using the pre-0.13 hashref syntax. See https://metacpan.org/pod/overload#Two-face-References for an example. perltie might perhaps also work. Note that overload might be a memory hog; see "BUGS" in Types::Serialiser.

  • Try to refactor Neo4j::Driver::Type::Path's internal representation to allow either elements or nodes+rels. Have one autogenerate from the other, then cache the results. May not actually have advantages for deep_bless though.

  • Add property() as alias for get() in Neo4j::Driver::Type::Node and Neo4j::Driver::Type::Relationship, enabling clients to avoid the possibly confusing $record->get->get pattern. Another way to avoid this pattern would be $record->get->properties->{}. Aside from being longer, this is also 50 % slower because of the defensive copy that needs to be created. We should probably cache that one at least. In the long run, the internal representation of these types will change; the properties() hash ref will then actually be the fastest way to get a single property (about 8% faster than get()).