Web::DataService::Introduction - introduction to Web::DataService and how to use it
This document provides a basic introduction to Web::DataService, a framework for implementing data services for the World Wide Web.
The purpose of Web::DataService is to provide a comprehensive framework on which to implement web data services. By this, we mean web services that are primarily oriented toward fetching and storing data, including what are usually called "APIs". Such a service can provide controlled access to a backend data system via HTTP requests and responses.
In order to implement a capable web data service (or API), a server needs to be able to handle the following tasks:
Parse HTTP requests
Talk to the backend system (whether it be a data store, instrument, control system, etc.)
Assemble representations of data
Serialize those representations in formats such as JSON, XML, etc.
Set HTTP response headers
Generate appropriate error messages
Provide documentation about itself
Most people who implement data services in Perl base them on web application frameworks such as Dancer, Catalyst, etc. These frameworks provide a good start, but the authors must then implement most or all of the functionality listed above themselves. The basic idea behind this module is that every task listed above except "Talk to the backend system" can be handled by a common code base, configured according to the requirements of a particular data service by a common set of directives. This leaves "Talk to the backend system" as the only part that must be implemented using code directly written by the data service author.
The remainder of this document describes the various concepts behind Web::DataService. The framework is complicated because the problem it attempts to solve is complicated. In order to make sense of these concepts, you may wish to examine the example data service discussed in Web::DataService::Tutorial. This example application demonstrates the full power of Web::DataService, and can be used as a basis for your own projects.
The Web::DataService framework is based on the following fundamental concepts:
It is designed to allow you to create a data service that will provide exactly the functionality you desire. The behavior of the data service is extremely configurable, and also includes a number of hooks that you can use if necessary to further modify the behavior. For more information, see Web::DataService::Configuration.
In a similar vein, the service is designed to be extensible, with plug-in modules that can be added to provide additional output formats, backend systems, etc.
One basic principle of this framework is that data output is organized around the abstract idea of a "record" defined by a set of data fields. A request is satisfied by generating one or more records, each of which is a hash of field names and values. These records are then passed to a separate module, which serializes them in the selected output format. This de-coupling of data generation from data serialization simplifies the data generation code immensely, and allows a user of the service to ask for their output in any of the available formats. It also makes it easy for the data service developer to add new output formats as needed.
Documentation is an extremely important part of the functionality provided by this framework. To a great extent, the documentation pages for a web data service can be auto-generated. While defining the various elements of a data service (see below) you have the ability to include documentation strings that will be used as the basis for the generated documentation sections. This makes it easy to keep the documentation up-to-date as the configuration of the data service changes over time.
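The separation of record generation from serialization described above can be illustrated with a small, language-neutral sketch. It is written in Python purely for illustration and does not use the Web::DataService API; the function names, the record fields, and the "records" wrapper key are all assumptions made up for this example.

```python
import csv
import io
import json

def fetch_records():
    """Stand-in for the backend step: return records as plain dicts (hypothetical data)."""
    return [
        {"id": 1, "name": "Alpha", "size": 10},
        {"id": 2, "name": "Beta", "size": 25},
    ]

def to_json(records):
    """Serialize a list of records as a JSON document."""
    return json.dumps({"records": records})

def to_text(records, delimiter="\t"):
    """Serialize a list of records as delimited text with a header row."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]),
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Adding a new output format means adding one entry here;
# fetch_records() never needs to know which format was requested.
SERIALIZERS = {
    "json": to_json,
    "tsv": to_text,
    "csv": lambda records: to_text(records, ","),
}

def respond(fmt):
    """Generate the records once, then hand them to the selected serializer."""
    return SERIALIZERS[fmt](fetch_records())
```

Because the record-producing code only ever returns field/value hashes, the same operation can be requested as `respond("json")`, `respond("tsv")`, or `respond("csv")` without any change to the backend code.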
A data service implemented under the Web::DataService framework is composed of the following elements. See Web::DataService::Tutorial for an example of how they fit together in an actual application.
A Web::DataService application is built on top of a foundation framework, which provides the basic functionality for a web service such as receiving and assembling HTTP messages. Currently, the only such framework that can be used with Web::DataService is Dancer. We hope to expand this set in the future. (Please let us know if you are interested in creating a plugin module to work with one of the other available frameworks.)
Each data service is represented by an instance of the class Web::DataService. The basic attributes of the data service are either provided at the time of instantiation, or are read from the application configuration file provided by the foundation framework. These attributes are documented in Web::DataService::Configuration.
A data service application starts by creating a new data service instance and then calling its methods to define the other elements discussed below. Once this is done, control is turned over to the foundation framework until a data service request arrives and is recognized. For more information about this process, see Web::DataService::Tutorial.
Each distinct data service operation or documentation page is associated with a data service node, generated at startup time by the define_node method of the data service instance. Each node is keyed by a unique path, which in a typical data service will correspond to one of the request URL paths accepted by the service.
The space of nodes is hierarchical, in the same sense that the set of paths is. If your application creates the nodes "a", "a/b", and "a/c", then any attribute values you define for "a" will be inherited by "a/b" and "a/c" unless specifically overridden. Any attribute values assigned to the root node "/" will be inherited by all other nodes except where specifically overridden.
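The inheritance rule described above can be sketched as a lookup that walks from a node up through its ancestors to the root. This is a conceptual illustration in Python, not the framework's implementation; the attribute names `default_format` and `default_limit` are hypothetical and chosen only for the example.

```python
def parent_paths(path):
    """Yield the path itself, then each ancestor, ending with the root "/"."""
    parts = [p for p in path.split("/") if p]
    while parts:
        yield "/".join(parts)
        parts.pop()
    yield "/"

def node_attr(nodes, path, attr):
    """Return the nearest definition of attr, searching the node and then its ancestors."""
    for candidate in parent_paths(path):
        attrs = nodes.get(candidate, {})
        if attr in attrs:
            return attrs[attr]
    return None

# Hypothetical node definitions, keyed by path.
nodes = {
    "/":   {"default_format": "json"},
    "a":   {"default_limit": 500},
    "a/b": {},
    "a/c": {"default_limit": 100},   # overrides the value inherited from "a"
}
```

Here `node_attr(nodes, "a/b", "default_limit")` finds the value set on "a", `node_attr(nodes, "a/c", "default_limit")` finds the overriding value, and both nodes pick up `default_format` from the root.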
Each node in a data service definition will correspond to one of the following:
A data service operation and its associated documentation page.
A standalone documentation page.
A file or directory of files that can be retrieved upon request (e.g. a stylesheet for the documentation pages).
The data records returned by a Web::DataService application are built from a set of output blocks. These are defined at startup time by the define_block method of the data service application. Each output block consists of a list of field definitions, processing steps, and other auxiliary declarations. Each data service node that represents a data-producing operation must select one or more of these output blocks, which are then used to generate the output records whenever this operation is requested.
A data service application must also define one or more output formats, using the define_format method of the data service instance. Each of these format definitions configures one of the available serialization modules so that it can transform sets of data records into HTTP response bodies.
The Web::DataService installation includes two built-in serialization modules: JSON, which serializes responses using the JSON format, and Text, which can generate either tab-separated or comma-separated text responses. If you wish your data service to generate output in other formats, you can easily implement your own plug-in modules (see Web::DataService::Plugins).
If your data service provides multiple formats, clients can then choose which format best meets their needs and can vary the format from request to request as they choose.
A data service application may also define one or more vocabularies in which to express the output data. These are created by using the define_vocab method of the data service instance. The output field definitions mentioned above can each include multiple field names, one for each relevant vocabulary. An output block can also include processing steps as necessary to transform the data values into the proper range for each vocabulary.
In this way, you can arrange for a single result to be expressed according to different data interchange standards. Each output format can be assigned a default vocabulary, and the users of the data service can override this if they wish by means of a special request parameter.
If no vocabularies are defined, a "null" vocabulary consisting of the field names and values provided by the backend system will be used.
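As a rough illustration of the vocabulary idea, the following Python sketch gives each output field one name per vocabulary and falls back to the backend's own names when no vocabulary is selected. It is not the Web::DataService API; the vocabulary names `long` and `short`, the field names, and the `express` helper are assumptions invented for this example.

```python
# Hypothetical field definitions: one backend field, several vocabulary names.
FIELDS = [
    {"source": "taxon_name", "names": {"long": "scientific_name", "short": "nam"}},
    {"source": "taxon_rank", "names": {"long": "rank", "short": "rnk"}},
]

def express(record, vocab=None):
    """Rename a backend record's fields according to the chosen vocabulary.

    With vocab=None the record is passed through unchanged, which mirrors
    the "null" vocabulary of backend field names and values.
    """
    if vocab is None:
        return dict(record)
    return {
        field["names"][vocab]: record[field["source"]]
        for field in FIELDS
        if vocab in field["names"] and field["source"] in record
    }

record = {"taxon_name": "Felis catus", "taxon_rank": "species"}
```

The same backend record can then be expressed as `express(record, "long")` or `express(record, "short")`, so one result can serve clients that expect different field naming conventions.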
A data service application may also define one or more rulesets for use in validating request parameters. These are created using the define_ruleset method of the data service instance, which in turn calls the identically named method from HTTP::Validate. See the documentation of the latter module for more information, along with Web::DataService::Configuration.
A data service application may also define named sets of values using the define_set method of the data service instance. These have a number of different uses. A set can be used in a ruleset definition, to specify the acceptable range of values of some parameter. A set can include a mapping of each value to some other value, and can thus be used to translate data values from one output vocabulary to another. Sets are also used to indicate optional output blocks that can be added to the basic output of a data service operation according to the value of a special request parameter.
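The three uses of a set listed above can be sketched with ordinary dictionaries. This Python example is only a conceptual model, not the framework's define_set interface; the set contents, the parameter names `rank` and `show`, and all helper functions are hypothetical.

```python
# A set whose values also carry a mapping (hypothetical rank codes).
RANK_SET = {"species": 3, "genus": 5, "family": 9}

# A set whose values name optional groups of output fields (hypothetical).
OPTIONAL_BLOCKS = {"attr": ["author", "year"], "geo": ["lat", "lng"]}

def validate(param, value, allowed):
    """Use 1: restrict a parameter to the values contained in a set."""
    if value not in allowed:
        choices = ", ".join(sorted(allowed))
        raise ValueError(f"bad value '{value}' for '{param}': must be one of {choices}")
    return value

def translate(value, mapping):
    """Use 2: map a value to its counterpart, leaving unknown values unchanged."""
    return mapping.get(value, value)

def output_fields(basic, show):
    """Use 3: extend the basic output with optional blocks named in a parameter."""
    fields = list(basic)
    for key in filter(None, show.split(",")):
        fields += OPTIONAL_BLOCKS[validate("show", key, OPTIONAL_BLOCKS)]
    return fields
```

For instance, `output_fields(["id", "name"], "geo")` appends the fields of the "geo" block, while an unrecognized key raises an error that lists the accepted values.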
When defining each of the elements listed above, you may follow each definition with one or more documentation strings. These strings are then used to auto-generate documentation pages for the operations provided by your data service. By documenting each data service element right where it is defined, you will be able to make sure that the documentation of each element reflects its actual definition, and you can easily adjust the documentation whenever you change the definition. The author of Web::DataService has not found any other strategy that works better for keeping the documentation of a data service up-to-date with what the data service actually accepts and produces.
This documentation is always generated in POD format, and is then translated into HTML by the module Web::DataService::PodParser. The documentation strings that you provide may contain POD markup and command paragraphs, and each command paragraph that you provide will be treated properly (i.e. preceded and followed by a blank line) in the generated documentation. The documentation engine will also auto-close any open lists, and does some other cleanup as well to make the documentation process as easy as possible.
You may want to arrange for your application to provide multiple data services through a single server. Reasons for doing this include:
Over time, you may wish to introduce a new protocol version (i.e. a new specification for parameter values and result fields) while still keeping the old version active so that older client software will not break.
You may wish to provide both a "production" and a "development" data service.
In either case, you can simply create multiple instances of Web::DataService, instantiate the necessary data service elements for each, and select between them using whatever criteria make the most sense for your application. Ways of doing this include:
Different URL path prefixes, e.g. "/data1.0/my/operation.json" vs. "/data2.0/my/operation.json"
A version parameter, e.g. "/my/operation.json?v=1.0" vs. "/my/operation.json?v=2.0"
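The two selection strategies listed above might be sketched as follows. This is a conceptual Python illustration rather than Web::DataService code: the service objects are represented by placeholder strings, and the lookup tables, function names, and the default version are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Placeholder strings stand in for separate data service instances.
SERVICES_BY_PREFIX = {"data1.0": "service v1.0", "data2.0": "service v2.0"}
SERVICES_BY_VERSION = {"1.0": "service v1.0", "2.0": "service v2.0"}

def select_by_prefix(url):
    """Pick a service from the first path component; return it with the remaining path."""
    path = urlsplit(url).path.lstrip("/")
    prefix, _, rest = path.partition("/")
    return SERVICES_BY_PREFIX.get(prefix), rest

def select_by_param(url, default="2.0"):
    """Pick a service from the 'v' query parameter, falling back to a default version."""
    parts = urlsplit(url)
    version = parse_qs(parts.query).get("v", [default])[0]
    return SERVICES_BY_VERSION.get(version), parts.path.lstrip("/")
```

Both functions return `None` for the service when the prefix or version is not recognized, which leaves the caller free to decide how such requests should be rejected.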
mmcclenn "at" cpan.org
Please report any bugs or feature requests to bug-web-dataservice at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Web-DataService. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
Copyright 2014 Michael McClennen, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.