Datahub::Factory is an application that extracts data from Collection Management Systems, converts it to LIDO and submits it to a Datahub for Museums.
It is written in Perl and uses Catmandu as the underlying framework.
Datahub::Factory can extract data from data dumps (usually in XML) or directly from the API of the Collection Management System (CMS for short). It does this by using specific Importer plugins, based around Catmandu modules.
At the moment, it includes support for:
- The Museum System: Datahub::Factory::Importer::TMS
- Adlib (API and dump): Datahub::Factory::Importer::Adlib
- Collective Access (API): Datahub::Factory::Importer::CollectiveAccess
By default, it converts data to LIDO and attempts to submit it to a Datahub. This behaviour can be changed by selecting a different Exporter plugin:
- Datahub for Museums (the default)
- LIDO (an XML dump)
- YAML
To convert between data formats, we use the Catmandu Fix language, so it is in principle possible to convert between any number of formats.
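The import, fix and export stages described above can be pictured with a small conceptual sketch. It is written in Python purely for illustration and is not how Datahub::Factory or Catmandu are implemented. Every name in it (`import_records`, `move_field`, `apply_fixes`, `export_json`, `run_pipeline`) and the sample record are hypothetical, and JSON merely stands in for a real export format such as LIDO or YAML.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

# A record is modelled as a plain dictionary (an assumption for this sketch).
Record = Dict[str, object]
Fix = Callable[[Record], Record]


def import_records(dump: List[Record]) -> Iterator[Record]:
    """Stand-in for an Importer plugin: yield raw records one by one."""
    for record in dump:
        yield record


def move_field(old: str, new: str) -> Fix:
    """Build a fix that renames a top-level field and leaves the rest alone."""
    def fix(record: Record) -> Record:
        out = dict(record)  # copy, so the source record is not modified
        if old in out:
            out[new] = out.pop(old)
        return out
    return fix


def apply_fixes(records: Iterable[Record], fixes: List[Fix]) -> Iterator[Record]:
    """Stand-in for the Fix step: run every fix, in order, on every record."""
    for record in records:
        for fix in fixes:
            record = fix(record)
        yield record


def export_json(records: Iterable[Record]) -> str:
    """Stand-in for an Exporter plugin: serialise all records to one string."""
    return json.dumps(list(records), sort_keys=True)


def run_pipeline(dump: List[Record], fixes: List[Fix],
                 exporter: Callable[[Iterable[Record]], str]) -> str:
    """Chain the three stages: import, then fix, then export."""
    return exporter(apply_fixes(import_records(dump), fixes))


if __name__ == "__main__":
    dump = [{"object_number": "2000-A-1", "title": "Landscape"}]
    fixes = [move_field("object_number", "recordID")]
    print(run_pipeline(dump, fixes, export_json))
```

The point of the sketch is the separation of concerns: the importer and exporter can be swapped independently, while the list of fixes carries all of the format-specific mapping.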
Usage
The application (script/dhconveyor) supports several commands, which are passed as the first argument. At the moment, however, only the transport command (which transports data from source to sink) is fully supported.
To use the application, you need to define an Importer plugin and configure it. While Datahub::Factory supports conversion between data formats, it won't do the conversion by itself: you have to provide a Fix file that does the actual conversion. Example files can be found here. By default, the application will attempt to push to a Datahub. You can, however, export to LIDO-XML or to YAML instead.
It is possible to extend the program by adding more plugins; see this guide.
All configuration (which plugins to use for importing and exporting, the location of the Fix file and any plugin-specific options) is set in a Pipeline file that is provided to the application via the --pipeline switch. For more information, consult the pipeline documentation.
Puppet
A Puppet module exists for this application and can be used to create and manage pipeline configuration files.
Under the hood
Datahub::Factory is built on Catmandu and uses its Fix language and plugin architecture to support its operation.
A more technical (and complete) explanation can be found here.