NAME

Web::DataService::Tutorial - how to build an application using Web::DataService

SYNOPSIS

This file explains the process of creating a Web::DataService application, using the example code provided as part of this distribution. The sections below will include instructions for setting up an example Web::DataService application, along with a description of what each part does. You can then use this example code as a basis for your own application.

GETTING STARTED

In order to build an application using Web::DataService, you must start with a foundation framework. The only one you can use at the present time is Dancer, though we hope to soon release plugins that will enable Web::DataService applications to be written on top of other frameworks such as Catalyst, Mojolicious, Dancer2, etc.

  1. The first step, therefore, is to make sure that Dancer is installed on your system. You will also want to install Template-Toolkit, which is used for displaying documentation pages. And, of course, you must also install Web::DataService.

  2. Next, create a new Dancer project using the program dancer (which is included with that module):

        dancer -a dstest

    This will create a new directory called 'dstest' and install into it all of the necessary files for a Dancer application.

  3. Next, use the program wdsinstallfiles, which is included with the current module, to add the necessary files for the example data service. You must first change to the project directory:

        cd dstest
        wdsinstallfiles

    This invocation will install the data service application as lib/Example.pm. The stub program used to invoke the data service is bin/dataservice.pl.

  4. Run the application, and verify that it executes without error. It will listen for requests on port 3000, unless you override this using the "Port" directive in the configuration file.

        bin/dataservice.pl

    Once it is running, you can open a web browser and view the documentation for this data service under the following URL:

        http://localhost:3000/data1.0/

    You can test the data service by sending some requests such as:

        http://localhost:3000/data1.0/single.json?state=NY
        http://localhost:3000/data1.0/list.txt?region=NE,MW&show=hist,total

    Note: if you are getting errors when running this example application, try editing bin/dataservice.pl and removing the -T flag from the first line.

  5. Now try editing Example.pm and/or PopulationData.pm, and then re-running the application to see how the behavior changes. Here are some things to try:

    • Change the path prefix on line 60 of lib/Example.pm.

    • Disable the feature 'format_suffix' and add the special parameter 'format' on lines 58-59 of lib/Example.pm:

          features => 'standard, no_format_suffix',
          special_params => 'standard, format',

    • Disable the feature 'documentation', on line 58 of lib/Example.pm:

          features => 'standard, no_documentation',

    • Change the title, docstring, and/or usage of one or more of the nodes defined in lines 103-141 of lib/Example.pm. The docstrings are the strings that appear on lines 112, 121, etc., each one documenting the node whose definition it follows.

    • Change the names and/or docstrings of one or more of the output fields specified in lines 59-88 of lib/PopulationData.pm. You can change the name of any field by adding the key name to its definition hash:

          { output => 'pop2010', name => 'current' },

    • Change the names and/or docstrings of one or more of the parameters specified in lines 140-164 of lib/PopulationData.pm.

    • Add a new value for the parameter "order" that sorts the states by their population in 1900 instead of by their population in 2010. This will require modifying the define_set call at line 116 of lib/PopulationData.pm, and also adding some code to lines 281-299.

  6. You can now use this example project as a base for creating your own data service. The various pieces of the code are described below, along with the function of each and some ideas for how you might want to modify them.

  7. For a production data service, you can run your Web::DataService application under any PSGI web server such as Starman or Starlet. A good way to set this up is to run it on (e.g.) port 3000, and then set up nginx or Apache as a frontend server. The frontend should have a proxy pass directive to route data service requests to the backend server on port 3000. Many other deployment arrangements are also possible.

OVERVIEW

Under the Web::DataService framework, a data service application consists of the following components. In order to create your own data service using this framework, you will have to write each of the following:

Main application

The main application, written using a foundation framework such as Dancer, is responsible for initializing all of the necessary elements to define a data service and for providing the basic response loop. It must start by creating a new instance of Web::DataService and then define a variety of data service elements (see Web::DataService::Configuration) so as to configure each of the different operations that make up your data service.

The main application must respond appropriately to each incoming request, which in the case of Dancer means defining one or more route handlers that specify the response to each possible URL path. In the simplest case, as shown in the example application below, a single route handler simply accepts all requests and passes them off to the handle_request method of Web::DataService.

In addition, the main application is responsible for error handling. The example application hands off all errors to the Web::DataService code, rather than using the native Dancer error response.

Configuration file

The foundation framework includes a configuration file ("config.yml" in the case of Dancer) with which you can specify many of the attributes of your data service. Specifying them here rather than putting them in the code will allow others to easily find and change these attributes as necessary during the process of developing and maintaining your data service.

Data service roles and methods

The core of your data service consists of subroutines that talk to the backend data store to fetch and/or store the necessary data. You will need to write one of these for each different operation provided by your data service. These are called operation methods and, as the name suggests, are called as methods of a Web::DataService::Request object.

These operation methods must be organized into one or more modules that can function as Moo roles. These modules will then be automatically composed into appropriate subclasses of Web::DataService::Request. Your role files may include any other code that you wish, and your operation methods may also call any of the methods provided by Web::DataService::Request. For more information, see lib/PopulationData.pm.

Each of these role modules will typically include a method called initialize, which is called automatically at application startup. This method can then define data service elements relevant to this particular role, such as parameter rules, value sets, and output fields. A typical data service application will contain several role modules, one for each different kind of data to be returned.

Documentation templates

Each feature of your data service should be documented by an appropriate web page. To expedite this, Web::DataService provides facilities for auto-generating most of the necessary documentation based on the data service element definitions.

The documentation files take the form of templates, which can include a variety of predefined elements. See Web::DataService::Documentation for more information. At the time of template evaluation these are replaced by various sections of auto-generated documentation. The only templating system currently supported is Template-Toolkit. We plan to include others in the near future.

FILES

This section covers the files from the example data service one at a time, highlighting the important features of each. You can use these files, and this example application in general, as a basis for your own projects. When going through this section, you may wish to open each file in turn so as to have the contents visible while reading the description.

config.yml

This is the main configuration file for the data service application. As you can see from the comments, some of the settings are used by Dancer and others by Web::DataService. In general, most of the data service attributes can be set in this file, and many of the node attributes can be given default values as well. The example demonstrates this with settings such as data_source and default_limit.

For a description of the configuration settings read by Dancer, see Dancer::Config.

bin/dataservice.pl

This is just a stub program; the actual application code is in lib/Example.pm, with the exception of the last line of this file which activates the main event loop.

This program can run standalone, in which case it listens on port 3000 (by default) and is able to respond to one request at a time. In order to deploy it as a full-scale web application, you have a number of different deployment options (see Dancer::Deployment).

Note that this program is run in taint mode, which is a good idea for any public server. If you are having trouble running this program because it cannot find required modules, this is probably because taint mode ignores the environment variables PERL5LIB and PERLLIB. In this case, you can either remove the -T flag, or add a second use lib line to add the missing directory to @INC.
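As a minimal sketch of the second option, the extra use lib line might look like the following. The directory shown is purely hypothetical; it stands for whichever location holds the modules that Perl failed to find.

```perl
use strict;
use warnings;

# Hypothetical install location; substitute the directory that actually
# holds the modules Perl could not find under taint mode.
use lib '/opt/dstest/perl5/lib/perl5';

# Listing @INC is a quick way to confirm that the directory was added.
print "$_\n" for @INC;
```

Because use lib takes effect at compile time, the line should appear before any use statement that loads a module from that directory.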

lib/Example.pm

This file contains the main application code for the example data service. It starts out by declaring the main package for this application ("Example") and then loading the necessary modules. Dancer and Template are required before Web::DataService, so that the latter can configure itself to make use of them. Next, we require the module lib/PopulationData.pm. This defines the methods that will be used to execute the various data service operations for this application.

In lines 42-50, we specify what to do if this application is executed with the command-line argument 'GET'. In this case the second argument should be a URL path, and the optional third one (which should be single-quoted if given) should be a URL-encoded parameter string (without the '?'). The application will proceed to execute this single request and print the result to standard output. This functionality is useful primarily for debugging; you can run this under perl -d, and you can put $DB::single = 1; in your code wherever you wish to have a predefined breakpoint. For example:

    perl -d bin/dataservice.pl GET /data1.0/list.json 'region=MW&show=total'

In lines 55-60, we generate a new instance of Web::DataService. This instance will use the standard set of features and special parameters, and will use "data1.0/" as its path prefix. The path prefix attribute is not mandatory, but if specified it will be removed from all incoming requests before they are matched against the set of data service nodes. You may want to do this in order to distinguish the data service URLs from other URLs that will reside on the same website, or to provide for multiple data service versions.

Lines 67-79 specify the response formats that will be available from this data service. In the case of this example, these are: JSON, comma-delimited text, and tab-delimited text.

Lines 87-141 define a series of data service nodes. Each of these nodes corresponds to one of the following:

  1. A data service operation, and its accompanying documentation page

  2. A standalone documentation page

  3. A file or files that are available for retrieval (e.g. the stylesheet for the documentation pages).

As explained in Web::DataService::Configuration, nodes inherit their attributes hierarchically according to their "path" attributes. The node with path "/" provides default values for all of the other nodes, and also gives the attributes for the main documentation page. Note that all of the operation nodes use methods from the module PopulationData, which is described below.

Lines 153-156 define the route handler for all URL requests. This may be all that your application needs, but you are free to add additional routes as needed.

Lines 163-171 are boilerplate code that causes errors to be handled by Web::DataService rather than by Dancer.

This file is designed to be included from bin/dataservice.pl. Once it has been fully processed, the last line of that file initiates the Dancer main loop. This loop waits for incoming requests, and calls the matching route handler for each one.

lib/PopulationData.pm

This file defines the elements that make up the data service operations provided by the example application. These definitions include rules for validating request parameters, formats for the resulting output, and code that uses the request parameters to retrieve data from the backend. The calls to define_node in the main application combine these various elements into data service operation nodes. A full-scale data service will often have more than one file like this, one for each different type of data that it handles.

Line 18 defines the package name, which will be used with the node attribute role.

Lines 20-21 define the modules used by this one. Line 20 provides access to the predefined validator functions of HTTP::Validate, in case you wish to use them in rulesets.

Line 23 makes this module into a Moo role, which allows the methods defined here to be composed into an automatically generated subclass of Web::DataService::Request.

Lines 41-169 define an initialization method which will be called automatically at application startup time. It is passed two arguments, the first being the class name and the second being the instance of Web::DataService that is being initialized. The purpose of this method is two-fold:

  1. Execute any setup tasks that need to be done in order to access the backend data.

  2. Define all of the elements that will be referenced by data service node definitions which use this role.

Lines 50-54 accomplish the first of these tasks. In this simple example, we just read the data out of a file and store it in lexically scoped variables that can be accessed by the data service operation methods defined below. A more complex application might obtain a database connection using the get_connection method and then use it to read and cache important data.
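The general pattern of loading records into file-scoped lexical variables can be sketched as follows. This is not the code from lib/PopulationData.pm: the tab-separated layout, the field names, the sample values, and the names load_records, %record_for and @state_list are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Made-up sample in an assumed tab-separated layout:
# code, name, region, 1900 population, 2010 population.
my $sample = join "\n",
    "AA\tState A\tNE\t300\t500",
    "BB\tState B\tMW\t400\t450",
    "";

# File-scoped lexicals: visible to every sub defined later in this file,
# but not to code in other files.
my (%record_for, @state_list);

# Read one record per line from $source and store it in both structures.
sub load_records {
    my ($source) = @_;

    open my $fh, '<', $source or die "cannot open data source: $!";

    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;

        my ($code, $name, $region, $pop1900, $pop2010) = split /\t/, $line;
        my $record = { code => $code, name => $name, region => $region,
                       pop1900 => $pop1900, pop2010 => $pop2010 };

        $record_for{$code} = $record;    # lookup by state code
        push @state_list, $record;       # original file order
    }

    close $fh;
    return scalar @state_list;
}

# Passing a reference to a string opens an in-memory file, so this sketch
# needs no file on disk. A real role would pass a file name instead.
my $count = load_records(\$sample);
print "loaded $count records\n";
```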

Lines 59-88 define a set of output blocks that select and describe the various data fields that will be returned by data service operations. The names of these blocks will be used with the node attribute output. Note that the response to each request is generated automatically: the serialization module for the requested format combines the output block(s) selected by the request attributes and by the matching data service node with the data records produced by that node's operation method.

Lines 93-97 specify a set of output blocks that can be optionally added to any request by the use of the special parameter show. The name of this output map (set) will be used with the node attribute optional_output.

Lines 102-124 define two more sets. The first gives the acceptable values for the parameter region, and the second for the parameter order.

Lines 128-132 define a validator function which will be used for the parameter state. This function makes use of a hash of state names that was generated from the data file at line 54 above.
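A validator of this kind might be sketched as shown below. The hash contents and the names %STATE_NAME and valid_state are hypothetical, and the return convention (a hash reference carrying either a cleaned value or an error message) is an assumption about what HTTP::Validate expects, so check that module's documentation before relying on it.

```perl
use strict;
use warnings;

# Hypothetical lookup table; the example application builds its hash
# from the data file instead of hard-coding it.
my %STATE_NAME = ( AA => 'State A', BB => 'State B' );

# Assumed convention: return { value => ... } with the cleaned value when
# the input is acceptable, or { error => ... } with a message when not.
sub valid_state {
    my ($value) = @_;

    my $code = uc( $value // '' );    # accept lower-case codes as well

    return { value => $code } if exists $STATE_NAME{$code};
    return { error => "unknown state code '" . ( $value // '' ) . "'" };
}

my $result = valid_state('aa');
print "cleaned value: $result->{value}\n";
```

Normalizing the case inside the validator means that operation methods only ever see one spelling of each state code.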

Lines 136-168 define a series of rulesets for validating request parameters. The first of these validates the special parameters made available by Web::DataService. Note that in the main application file (see above) the data service is instantiated with the "standard" set of special parameters. If you modify the instantiation to add or remove some of these parameters, the ruleset will automatically reflect this.

The remaining rulesets validate the individual request parameters for each available data service operation. Their names correspond to the node paths defined in lib/Example.pm. If you wish to use different ruleset names, you would override this using the node attribute ruleset.

Lines 221-319 define the operation methods that implement the various data service operations. These are invoked in response to data service requests, and are called as methods of a request object that has been blessed into a subclass of Web::DataService::Request. As such, they are free to call any of the methods from that class in order to:

  • access attributes of the request such as request_url, node_path, result_limit, etc.

  • get a connection to the backend data store using get_connection

  • get the parameter values that were provided with this request using clean_param, etc.

  • specify the result of the operation using single_result, list_result, etc.

The first of these operation methods returns information about a single state, and is called by the node 'single' (see lib/Example.pm). This simple operation proceeds by retrieving the cleaned value of the 'state' parameter and retrieving the corresponding data record. It finishes by calling the method single_result, setting the result of the operation to the single record that was retrieved.

The next operation method returns information about multiple states, and is called by the node 'list'. This is a much more complex task, involving a number of different possible parameters. This method retrieves the relevant parameter values, selects the matching records, orders them as requested, and adds an additional "total" record if that was requested. It ends by calling the list_result method, setting the result of the operation to this list of records.
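The select, order, and total steps can be illustrated in isolation with plain Perl. The sketch below is not the method from lib/PopulationData.pm and makes no Web::DataService calls: the records, the field names, the order values, and the name select_records are assumptions. In the real method the arguments would come from the request parameters and the list would be handed to list_result.

```perl
use strict;
use warnings;

# Made-up records standing in for the data loaded at startup.
my @records = (
    { code => 'AA', region => 'NE', pop1900 => 300, pop2010 => 500 },
    { code => 'BB', region => 'MW', pop1900 => 400, pop2010 => 450 },
    { code => 'CC', region => 'NE', pop1900 => 100, pop2010 => 900 },
);

# One comparison routine per accepted value of a hypothetical "order"
# argument. Populations sort largest first, codes alphabetically.
my %sorter = (
    code    => sub { $_[0]{code} cmp $_[1]{code} },
    pop2010 => sub { $_[1]{pop2010} <=> $_[0]{pop2010} },
    pop1900 => sub { $_[1]{pop1900} <=> $_[0]{pop1900} },
);

sub select_records {
    my (%arg) = @_;

    # Keep every record when no region filter was given.
    my %want  = map { $_ => 1 } @{ $arg{regions} || [] };
    my @found = grep { !%want || $want{ $_->{region} } } @records;

    my $cmp = $sorter{ $arg{order} || 'code' } or die "unknown order value";
    @found = sort { $cmp->($a, $b) } @found;

    # Append a summary record only when the caller asked for it.
    if ( $arg{total} ) {
        my %total = ( code => 'TOTAL', region => '', pop1900 => 0, pop2010 => 0 );
        for my $r (@found) {
            $total{$_} += $r->{$_} for qw(pop1900 pop2010);
        }
        push @found, \%total;
    }

    return @found;
}

my @list = select_records( regions => ['NE'], order => 'pop1900', total => 1 );
print join( ',', map { $_->{code} } @list ), "\n";
```

Keeping the comparison routines in a hash means that supporting a new order value, such as sorting by 1900 population, only requires one more entry.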

The final operation method returns the set of region codes, and is called by the node 'regions'. This is included so that a client application can retrieve this information and, for example, use it in generating a web form to be used in making queries on this data service.

By using the power of Web::DataService, this small amount of code can generate a quite complex data service application along with all of the necessary documentation.

doc/doc_defs.tt

This file contains all of the definitions necessary for auto-generating the documentation pages. Before you mess with it, make sure you have a good understanding of the Template-Toolkit syntax.

doc/doc_strings.tt

This file contains the text strings used in the process of auto-generating the documentation pages. You can edit this file in whatever way you choose, but be careful with the syntax. Template-Toolkit does not provide very good error messages.

doc/doc_header.tt

This file defines a common header for the data service documentation pages. You can edit it in whatever way you choose, or use the node attribute doc_header to select a different file. You can even select different files for different pages, or use node attribute inheritance to select different files for hierarchical groups of pages.

See Web::DataService::Documentation for a list of the predefined template elements that you can use in this file and the ones described below.

doc/doc_footer.tt

This file defines a common footer for the data service documentation pages. You can edit it in whatever way you choose, or use the node attribute doc_footer to select a different file.

doc/operation.tt

This file provides a template for documentation pages for the operation nodes. You can edit it in whatever way you choose, or use the node attribute doc_default_op_template to select a different file. You can also create a specific documentation page for any node by creating a template file having the same path relative to the doc directory as the node path.

doc/doc_not_found.tt

This file provides a template for documentation pages for the non-operation nodes. You can edit it in whatever way you choose, or use the node attribute doc_default_template to select a different file. You can also create a specific documentation page for any node by creating a template file having the same path relative to the doc directory as the node path.

doc/index.tt

This file serves as the main documentation page for the data service. You can edit it in whatever way you choose.

doc/special_doc.tt

This file documents the special parameters. Note that it will automatically display just the parameters that are selected when the data service is instantiated.

doc/formats/index.tt

This file provides an overview of the available output formats.

doc/formats/json_doc.tt

This file documents the JSON output format in detail. If you are using this example as a basis for your own project, you will want to edit this file and the one listed next so that the example URLs are ones that actually work under the new data service definition.

doc/formats/text_doc.tt

This file documents the plain text output formats.

public/css/dsdoc.css

This file provides a stylesheet for the documentation pages. You can edit it in whatever way you choose.

data/population_data.txt

This file provides the data for the example application.

VERSIONING

Before you start developing a new application using Web::DataService, it is a good idea to come up with a plan for versioning. After you have released the initial version of your application, you may still want to keep adding new features, output fields, parameters, etc. At some point, you may find yourself in a position where you want to make changes that are not backward-compatible, but do not wish to break existing data service clients. Your best option at that point is to define a second version of your service, keeping the first one available for the old clients (and so that old URLs posted elsewhere on the Web will continue to work properly).

The Web::DataService framework supports two different ways of doing this: by using path prefixes, and by using a version-selector parameter. You can choose either one, but we recommend that you make use of one of them from the very beginning of your data service. This way, you will be prepared should you want to add incompatible changes to the interface at any point in the future.

In order to create a second version of the data service, you will have to alter your application as follows:

  1. Move the route handler(s) to a separate file. Put them in the same package as your main application.

  2. Duplicate your main application file (without the route handlers), and change the name. Don't change the package name. The name attribute in the data service instantiation call must differ between the two files. You must also set up some means of selecting between the two data service instances, typically either differing path prefixes or version selectors (see below).

  3. Duplicate your role files, and change the names. Do change the package names, so that the new ones differ from the old ones.

  4. Change your stub program to include all of these files.

  5. You can now start making changes to the "new" versions of your application files, and the "old" version will remain active and will respond according to the old definitions.

Of course, you can repeat this pattern whenever a new version is needed.

Path prefixes

The example application is configured to use a path prefix of data1.0/. All URLs generated for this application will thus include the prefix, and can easily be distinguished from any future versions of the service. If you wish to use path prefixes to distinguish between versions, just give the "new" data service instance a prefix such as data1.1/ or data2.0/. The handle_request method will automatically select the proper data service instance based on the prefix of the URL.
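The idea behind prefix-based selection can be pictured with the standalone sketch below. It is not how handle_request is implemented: the registry, the labels standing in for data service instances, and the name select_by_prefix are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical registry: each path prefix maps to a label that stands in
# for a data service instance.
my %service_for = (
    'data1.0/' => 'version 1.0',
    'data2.0/' => 'version 2.0',
);

# Return the matching entry and the remainder of the path, or an empty
# list when no prefix matches.
sub select_by_prefix {
    my ($path) = @_;

    $path =~ s{^/+}{};    # ignore any leading slashes

    # Try longer prefixes first so that overlapping prefixes resolve
    # predictably.
    for my $prefix ( sort { length($b) <=> length($a) } keys %service_for ) {
        if ( index( $path, $prefix ) == 0 ) {
            return ( $service_for{$prefix}, substr( $path, length $prefix ) );
        }
    }

    return;
}

my ($service, $rest) = select_by_prefix('/data1.0/list.json');
print "$service handles '$rest'\n";
```

The remainder of the path, with the prefix removed, is what would then be matched against the node paths of the selected version.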

Version selector parameters

If you wish to use a version selector parameter instead, you can enable the special parameter selector. Note that you must do this for every data service instance. You must then define a unique selector key for each instance, using the attribute key. The default name for this parameter is v; if the keys you define are, say, "1.0" and "2.0", then the handle_request method will automatically select the proper data service instance for each request based upon whether it contains the parameter v=1.0 or v=2.0. You may at your option define a common path prefix for all of the data service instances, or none at all. Note that a request with an empty path will always return the main documentation page of the first data service instance defined, so this should always be the most recent one.

Other mechanisms

You are free to define any other mechanism that you wish for selecting between data service instances. In this case, you will probably need to write your own route handler(s). The easiest way to do things in this case is to put each data service instance into a separate global variable (e.g. $ds1, $ds2, $ds3, ... ) and write your own code to select which one should be used. You can then call the handle_request method on the proper instance directly. For example:

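    # Default to an empty string so that 'eq' never sees an undefined
    # value when the parameter is absent.
    my $mode = param('foo') // '';

    if ( $mode eq 'debug' )
    {
        $ds2->handle_request(request);
    }

    else
    {
        $ds1->handle_request(request);
    }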