
Bif Design

This document attempts to describe why and how the Bif project management tool does what it does.

Version

0.1.0_20 (2014-05-08)

Background

The development of Distributed Version Control System (DVCS) software enabled developers to discover the joys of working with a fully-featured, always available local repository. A number of benefits related to productivity (working when not connected) and efficiency (fast executing commands) appeared. Shortly afterwards developers re-discovered the pain of still having to use centralised, web-based bug/issue tracking systems, which muted the joy somewhat.

The first open attempts at a Distributed Project Management System (DPMS) such as Bugs Everywhere, ScmBug, DisTract, DITrack, ticgit and ditz, were implemented on top of a DVCS. With a bit of hindsight, one can theorize that their failure to gain real traction was in part due to not understanding that DVCS and DPMS systems do not have the same information models. With time, most projects have also realized that there are users other than developers who need to interact with such a tool.

Later, DPMS systems were built on different models, but offered a non-UNIXy implementation (Fossil) or suffered from documentation and implementation issues (Simple Defects, SD). As of late 2011 the Debian BTS (debbugs), SD and Launchpad (Canonical) appear to be the only systems that provide some kind of interproject cooperation (one issue with a different status in each project), but neither debbugs nor Launchpad is really distributed.

It was in this context that Bif was started, with the vague aim of doing something better. While I can say I was trying to learn from earlier efforts, I had nothing like the clarity of mind at the beginning that the above paragraphs imply. Like several others, I actually started out building Bif on top of Git. This is no surprise: like everyone else, I found the easy replication and plumbing toolset attractive. Experience eventually taught me too that the Git model was the wrong one for this application, so after briefly playing with SQLite + Git I settled on SQLite alone, with a purpose-built schema and a completely new synchronisation protocol.

Design Goals

The user manual says that Bif aims to be:

    "... a distributed communication system that carries both
    conversations and structured meta-data."

To put it another way, the goal of Bif is to provide users with a fully functional local issue tracker that interacts with remote instances as needed.

Requirements/Constraints

Note that some of the distributed requirements for a *project* tracking system are quite different from the distributed requirements for a *software* tracking system. DVCS tools focus on managing a multi-tentacled, pick-from-anywhere, many-versions-at-a-time set of changes to files. A DPMS focuses on tracking items of work at the organisational or personal level.

Command-line Interface (At Least)

Context switching from the shell to the browser is costly. In any case, good engineering means the CLI is a thin layer over the database, so adding other interfaces later is relatively easy.

The CLI should be consistent and broadly similar to other CLI programs. Oh yeah, I almost forgot: the CLI should be responsive enough to feel almost instantaneous. There will be no Moose in this CLI tool. Even though I recently upgraded my laptop with an SSD, I still can't believe how much Moose affects the startup time of things like App::TimeTracker. If I release something like that, the first thought of a (non-Perl-aware) user will be "Do we need to rewrite this in Go?"

Powerful Querying

Querying is needed for management-style reports, for custom queries, and for dealing with the whole interproject cooperation requirement. Users need to quickly see summaries of the current status as well as the change history.

Distributed/Offline Operation

As far as possible, the tool should work everywhere that you can. In effect, that means data replication.

Fast Delta Synchronisation

A sequential scan checking for matching rows in both databases should not be needed each time a user wants to synchronise.

A RESTful object API doesn't seem suitable either for working with large collections of objects such as bugs or projects, and how would one keep the benefits associated with database transactions?

A project history is not a hierarchical tree à la Git. Updates can be merged without needing to reparent anything.

Universally Unique Identifiers

Necessary for exchanging updates between systems that have their own requirements for locally unique identifiers.

Locally Unique Integers

We cannot subject users to identifiers that look like abb382f3c.
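A minimal sketch of how both identifier requirements can live in one table, using Python's standard sqlite3 module. The `topics` table, its columns, and the `open_repo` and `import_topic` helpers are hypothetical illustrations, not Bif's actual schema: the integer key is what users see, while the UUID is what travels between repositories.

```python
import sqlite3


def open_repo():
    """Create an in-memory repository with a hypothetical topics table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE topics (
               id   INTEGER PRIMARY KEY,  -- short, locally unique, shown to users
               uuid TEXT NOT NULL UNIQUE  -- globally unique, exchanged when syncing
           )"""
    )
    return db


def import_topic(db, uuid):
    """Store a topic received from elsewhere (if new) and return its local id."""
    db.execute("INSERT OR IGNORE INTO topics (uuid) VALUES (?)", (uuid,))
    (local_id,) = db.execute(
        "SELECT id FROM topics WHERE uuid = ?", (uuid,)
    ).fetchone()
    return local_id
```

Because each repository assigns integers in the order it receives topics, the same topic can be 1 in one repository and 2 in another; only the UUID is shared.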

Interproject Cooperation

If we are going to distribute things, we should do it properly. That means an issue can be tracked by multiple projects, and that each project could consider it to have a different status, and that each project can see the status in the other projects.

No Universal Status

There is no reason that every project in the whole Bif ecosystem must use the same status types.
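The interproject cooperation and per-project status requirements above could be modelled relationally roughly as follows. This is a sketch under assumptions: all table names, columns, and sample data below are hypothetical, and Bif's real schema may differ. Each project owns its own list of statuses, and a link table records the status a given project assigns to an issue.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each project defines its own status vocabulary (no universal status).
CREATE TABLE project_status (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    status     TEXT NOT NULL,
    UNIQUE (project_id, status)
);
CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One issue may be tracked by many projects, each with its own status.
-- A fuller schema would also check that status_id belongs to project_id.
CREATE TABLE issue_status (
    issue_id   INTEGER NOT NULL REFERENCES issues(id),
    project_id INTEGER NOT NULL REFERENCES projects(id),
    status_id  INTEGER NOT NULL REFERENCES project_status(id),
    PRIMARY KEY (issue_id, project_id)
);
"""


def status_by_project(db, issue_id):
    """Return (project, status) pairs for one issue across every project tracking it."""
    return db.execute(
        """SELECT p.name, s.status
             FROM issue_status i
             JOIN projects p       ON p.id = i.project_id
             JOIN project_status s ON s.id = i.status_id
            WHERE i.issue_id = ?
            ORDER BY p.name""",
        (issue_id,),
    ).fetchall()


# Hypothetical sample data: one issue tracked by two projects.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executescript("""
INSERT INTO projects VALUES (1, 'upstream'), (2, 'debian');
INSERT INTO project_status VALUES
    (1, 1, 'open'), (2, 1, 'fixed'), (3, 2, 'new'), (4, 2, 'pending');
INSERT INTO issues VALUES (1, 'crash on startup');
INSERT INTO issue_status VALUES (1, 1, 2), (1, 2, 4);
""")
```

Here the same issue is "fixed" upstream while still "pending" downstream, and each project can see the other's view.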

Extreme Documentation

Aside from functionality, for this to be successful it has to be usable, approachable, and understandable. In many ways this comes down to the quality of the documentation.

Data Model

Tables for the current state of topics, tables for updates to topics, and tables to track metadata (Merkle trees).

Bif is not implementing a distributed database, or at least not in the classical sense where all nodes need to agree on what the "current" or "latest" values for objects are, based on some kind of consensus achieved in real time. Bif simply distributes *updates*. The state of a particular node is the result of the updates it has, and it doesn't care what the other nodes are doing, or when it will get missing updates. In other words, there is no consensus. This works because users do not need a real-time global view of projects, in the same way they don't need real-time email.
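The idea that a node's state is simply the result of the updates it holds can be illustrated with a small replay function in Python. The `Update` fields and the ordering rule (timestamp, then UUID as a tie-breaker) are assumptions made for this sketch; the point is only that any two nodes holding the same set of updates derive the same state without consulting each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Hypothetical update record; real Bif updates carry more fields."""
    uuid: str   # content hash identifying the update
    mtime: int  # author's timestamp in seconds
    topic: str  # which topic the update touches
    field: str
    value: str


def replay(updates):
    """Derive a node's state purely from the set of updates it holds.

    Assumption: updates are applied in (mtime, uuid) order, so the result
    does not depend on the order in which they arrived.
    """
    state = {}
    for u in sorted(set(updates), key=lambda u: (u.mtime, u.uuid)):
        state.setdefault(u.topic, {})[u.field] = u.value
    return state
```

A node still missing an update derives an older but internally consistent state, and catches up once the update arrives.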

Updates, or Changesets

A Bif update can actually be composed of many operations in the database, but everything relates to a single row in the updates table. The updates table has an integer primary key, which is used for local operations and as a foreign key target. It also has a 40-character Universally Unique ID (UUID).

The UUIDs of updates (and likewise of topics) are SHA1 hashes calculated from the content of the update (or topic). This provides a built-in checksum mechanism that is useful during synchronisation to confirm a full and accurate transfer, and potentially simplifies signing updates in the future. The main purpose of the UUID, however, is looking up local IDs when inserting updates with foreign key requirements.
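A minimal sketch of content-derived identifiers using Python's hashlib. How Bif actually serialises an update before hashing is not described here, so the canonical form below (compact JSON with sorted keys) and the `content_uuid` and `verify` helpers are assumptions; what matters is that every node produces the same bytes for the same content, which yields a 40-character hex digest.

```python
import hashlib
import json


def content_uuid(record):
    """Return a 40-character SHA1 hex digest identifying `record` by its content.

    Assumption: the record is serialised as compact JSON with sorted keys,
    so every node computes the same bytes for the same content.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def verify(record, uuid):
    """After a transfer, recompute the hash to confirm the content arrived intact."""
    return content_uuid(record) == uuid
```

Key order does not matter, but any change to the content, however small, produces a different UUID and fails verification.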

Operations happen like this:

  • create a row in the updates table that identifies the author, time, timezone, message

  • add the changes in the *_updates tables for each topic

  • insert a row into func_merge_updates that calculates the hashes of everything.
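The three steps above can be sketched with Python's sqlite3 module. Apart from the `updates` and `func_merge_updates` names taken from the text, everything here is hypothetical: the columns, the `project_updates` table, the `flag` column, the trigger body, the `new_project` helper, and the field layout that gets hashed. SQLite has no built-in SHA1, so the sketch registers one from Python with `create_function`, and the trigger cancels its own insert with `RAISE(IGNORE)` so no row is ever stored in `func_merge_updates`.

```python
import hashlib
import sqlite3

# Hypothetical schema; only the updates/func_merge_updates names come from the text.
SCHEMA = """
CREATE TABLE updates (
    id      INTEGER PRIMARY KEY,
    uuid    TEXT UNIQUE,          -- filled in by func_merge_updates
    author  TEXT NOT NULL,
    mtime   INTEGER NOT NULL,
    mtimetz INTEGER NOT NULL,
    message TEXT NOT NULL
);
CREATE TABLE project_updates (
    update_id INTEGER NOT NULL REFERENCES updates(id),
    title     TEXT NOT NULL
);
CREATE TABLE func_merge_updates (flag INTEGER);

-- Inserting into func_merge_updates hashes every update that lacks a UUID.
CREATE TRIGGER func_merge_updates_bi BEFORE INSERT ON func_merge_updates
BEGIN
    UPDATE updates
       SET uuid = sha1(author || ':' || mtime || ':' || mtimetz || ':' || message
                       || ':' || COALESCE((SELECT group_concat(title, ',')
                                             FROM project_updates
                                            WHERE update_id = updates.id), ''))
     WHERE uuid IS NULL;
    SELECT RAISE(IGNORE);  -- keep the UPDATE above, discard the triggering row
END;
"""


def open_db():
    db = sqlite3.connect(":memory:")
    # SQLite has no SHA1 built in, so expose Python's implementation to SQL.
    db.create_function("sha1", 1, lambda text: hashlib.sha1(text.encode("utf-8")).hexdigest())
    db.executescript(SCHEMA)
    return db


def new_project(db, author, mtime, tz, message, title):
    with db:  # one transaction: all three steps succeed or none do
        cur = db.execute(
            "INSERT INTO updates (author, mtime, mtimetz, message) VALUES (?, ?, ?, ?)",
            (author, mtime, tz, message),
        )
        update_id = cur.lastrowid
        db.execute(
            "INSERT INTO project_updates (update_id, title) VALUES (?, ?)",
            (update_id, title),
        )
        db.execute("INSERT INTO func_merge_updates (flag) VALUES (1)")
    return db.execute("SELECT uuid FROM updates WHERE id = ?", (update_id,)).fetchone()[0]
```

Wrapping the steps in a single transaction means a failure part-way through leaves no half-written update behind.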

Updates are immutable and, for the moment at least, they can't easily be deleted from everywhere. Updates to an update are a possibility under consideration.

Network operations

Protocol

Bif communicates using JSON sentences terminated by a double-newline (\n\n). Communication is bi-directional, often asynchronous, and occurs on a single channel.
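A sketch of the framing in Python, operating on already-decoded text. The `encode` function and `Decoder` class are hypothetical helpers, not part of Bif. Compact JSON never contains a raw newline (newlines inside strings are escaped), so a blank line can safely mark the end of each sentence, and a reader can buffer partial input until a terminator arrives.

```python
import json

TERMINATOR = "\n\n"


def encode(message):
    """Serialise one message as a JSON sentence followed by a blank line."""
    # json.dumps without `indent` escapes newlines, so TERMINATOR cannot
    # appear inside the serialised message.
    return json.dumps(message, separators=(",", ":")) + TERMINATOR


class Decoder:
    """Accumulate text from a stream and return complete messages as they arrive."""

    def __init__(self):
        self.buffer = ""

    def feed(self, chunk):
        self.buffer += chunk
        messages = []
        while TERMINATOR in self.buffer:
            sentence, self.buffer = self.buffer.split(TERMINATOR, 1)
            messages.append(json.loads(sentence))
        return messages
```

Because the decoder keeps any incomplete tail in its buffer, several messages may arrive in one read, or one message across several reads, without confusing the receiver.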

Commands or instructions are sent as array references with the first element being an UPPERCASE string, and the remaining elements vary depending on the instruction.

    ["SYNC","project","f6f4f48ef6846421a5d","82e38655"]

Status replies are array references, generally containing just a single CamelCase string element.

    ["ProjectMatch"]

Sometimes instead of a status reply a counter-instruction is sent, and sometimes multiple instructions are sent without waiting for a reply:

    ["NEW","update",{"mtime":14153122121,"message":"Hello"}]
    ["NEW","project",{"mtime":14153122121,"title":"todo"}]
    ["NEW","project_status",{"mtime":14153122121,"title":"run"}]
    ["UPDATE","project",{"status_uuid":"11eb3ba88ae0f"}]

The server will generally keep the connection open, answering commands until the client sends a QUIT message:

    ["QUIT"]
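The message conventions above (UPPERCASE instructions, CamelCase status replies, and QUIT to end the session) can be sketched as a simple dispatch loop in Python. The handler table, the `is_instruction`, `is_status` and `serve` helpers, and the `UnknownInstruction` and `InvalidMessage` replies are hypothetical placeholders; a real Bif session is also asynchronous and may answer with counter-instructions rather than a status.

```python
def is_instruction(message):
    """Instructions start with an UPPERCASE word, e.g. ["SYNC", ...]."""
    return bool(message) and isinstance(message[0], str) and message[0].isupper()


def is_status(message):
    """Status replies are a single CamelCase word, e.g. ["ProjectMatch"]."""
    return (len(message) == 1 and isinstance(message[0], str)
            and message[0][:1].isupper() and not message[0].isupper())


def serve(messages, handlers):
    """Answer instructions until the client sends ["QUIT"].

    `handlers` maps an instruction name to a function returning a reply;
    these handlers are hypothetical, not Bif's real ones.
    """
    replies = []
    for message in messages:
        if not is_instruction(message):
            replies.append(["InvalidMessage"])  # hypothetical status name
            continue
        name, args = message[0], message[1:]
        if name == "QUIT":
            break  # client is done; stop reading
        handler = handlers.get(name)
        replies.append(handler(*args) if handler else ["UnknownInstruction"])  # hypothetical
    return replies
```

Anything sent after QUIT is never read, which matches the server closing the session at that point.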

Export/Import

Export/import simply copies everything relating to a project from one repository to another.

Merkle tree synchronisation

There is a Merkle tree associated with every project, representing all of the changes contained therein.

A sync operation compares the tree from two repositories top-down, saving the updates missing from each one. The updates are then replayed in the correct order.

Application Architecture

Bif is a Perl wrapper around an SQLite database, structured as follows:

App::bif::Context

A utility module responsible for finding the .bif repository and setting up debug, pagers, formatting tables, etc.

App::bif::*

A module for the implementation of each bif command.

App::bifsync

The implementation of the bifsync synchronization command.

Bif::DB, Bif::DBW

Database access.

Bif::Role::Sync[::Repo/Project]

Roles that implement the core of the bif network protocol.

Bif::Client, Bif::Server

Classes that provide client and server interfaces for the App::bif::* commands.
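Locating the repository, as App::bif::Context does, might look something like the following Python sketch. The upward directory search is an assumption (it mirrors how many version control tools find their metadata) and the `find_repo` helper is hypothetical; `isdir` is a parameter only so the function can be exercised without touching the filesystem.

```python
import os


def find_repo(start, name=".bif", isdir=os.path.isdir):
    """Return the nearest `name` directory at or above `start`, or None.

    Assumption: the repository is found by walking up the directory tree.
    """
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, name)
        if isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent
```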

Commands are dispatched to Perl modules under the App::bif::* namespace by OptArgs. Execution happens like this:

  • The shell runs the bif file, which due to the #! hashbang line results in perl being executed on that file.

  • The bif script loads the Perl module OptArgs and calls the OptArgs::dispatch function against the App::bif namespace.

  • App::bif defines all of the subcommands, their arguments and options, which the dispatch function uses to dispatch to the appropriate App::bif::* module.

  • The run method of the App::bif::sub::command module is called.

  • Sub-command classes use functions from App::bif::Context to discover the location of the repository, the user configuration, access the database, render output, generate errors and so on.

  • The program ends.
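The dispatch step can be approximated as longest-prefix matching of command-line words against the known sub-commands. This Python sketch does not reproduce OptArgs; the `resolve` function and the sample command set are hypothetical, and it only computes the module name that would then be loaded and have its run method called.

```python
def resolve(argv, commands, namespace="App::bif"):
    """Map command-line words to an implementation module name.

    `commands` is a hypothetical set of known sub-command paths, e.g.
    {("new", "project")}. The longest matching prefix of argv wins; the
    remaining words are left for argument and option parsing.
    """
    for length in range(len(argv), 0, -1):
        path = tuple(argv[:length])
        if path in commands:
            return "::".join((namespace,) + path), argv[length:]
    raise SystemExit("unknown command: " + " ".join(argv[:1]))
```

Trying the longest prefix first lets a two-word command such as "new project" take precedence over any shorter command it starts with.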

Database access

Each command uses either Bif::DB or Bif::DBW (based on DBIx::ThinSQL, DBI, and DBD::SQLite) to access the database.

As much as possible is done inside the database, with the goal of ensuring data consistency regardless of which application enters the data. This prepares for tools other than Bif making modifications.

Fake SQLite function calls

SQLite does not have a built-in procedural programming language with a function calling interface. We cheat by defining BEFORE INSERT triggers on normal tables that do their required work and then cancel the insert with a SELECT RAISE(IGNORE) statement.
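A self-contained sketch of the trick using Python's sqlite3 module. The `projects` and `func_new_project` tables are hypothetical examples: inserting a row into the function table acts as a call, the trigger body does the work using the `NEW` values as arguments, and `RAISE(IGNORE)` discards the triggering row while keeping the changes the trigger already made.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE projects (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
);

-- Hypothetical "function" table: rows are never stored; an insert is a call.
CREATE TABLE func_new_project (
    name  TEXT NOT NULL,
    title TEXT NOT NULL
);

CREATE TRIGGER func_new_project_bi BEFORE INSERT ON func_new_project
BEGIN
    -- The function body: do the real work using the NEW.* "arguments".
    INSERT INTO projects (name, title) VALUES (NEW.name, NEW.title);
    -- Cancel the triggering insert; changes made above are kept.
    SELECT RAISE(IGNORE);
END;
""")

# "Call" the function with two arguments.
db.execute("INSERT INTO func_new_project (name, title) VALUES (?, ?)", ("todo", "Things to do"))
db.commit()
```

Because the logic lives in the trigger, any client that can issue an INSERT gets the same behaviour, which fits the aim of keeping data consistent regardless of which application writes it.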

Transport & Synchronisation

Bif uses ssh to run the bifsync program on remote hosts when exchanging updates with a hub, or else calls bifsync directly when exchanging updates with a local repository. Regardless, it ends up being Bif::Client that is talking to Bif::Server, although most of the functionality is in the parent Bif::Role::Sync class.

Client/Server is a bit of a misnomer, as the protocol is actually about exchanging updates between equals, not about a user requesting a resource as HTTP verbs imply.

Author

Mark Lawrence <nomad@null.net>

Copyright & License

Copyright 2013-2014 Mark Lawrence <nomad@null.net>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.