NAME

Web::DataService::Configuration - configuration attributes and how to use them

SYNOPSIS

This document lists the various attributes available for you to use in configuring a data service with Web::DataService.

SYNTAX

The various configuration methods provided by Web::DataService all use a consistent syntax. With the possible exception of an initial name parameter, all remaining parameters must be either hashrefs or strings. Each hashref configures some object, and each string documents the object that it follows. We refer to this mix of attribute hashrefs and documentation strings as a definition list.

    $ds->define_format(
        { name => 'json', content_type => 'application/json',
          doc_node => 'formats/json', title => 'JSON',
          default_vocab => 'com' },
            "The JSON format is intended primarily to support client applications,",
            "including the PBDB Navigator.  Response fields are named using compact",
            "3-character field names.",
        { name => 'xml', disabled => 1, content_type => 'text/xml', title => 'XML',
          doc_node => 'formats/xml',
          default_vocab => 'dwc' },
            "The XML format is intended primarily to support data interchange with",
            "other databases, using the Darwin Core element set.");

For example, the above call defines two response formats: one named 'json' and the other named 'xml'. Each of these formats is defined by the set of attributes contained in a hashref. All of the documentation strings are automatically collected (joined by newlines) as the attribute doc_string of the object whose definition they immediately follow.

Attribute value syntax

In general, whenever an attribute can take a list of values, you specify those values as a string with the items separated by commas and arbitrary whitespace. For example, the following are identical:

   output => 'basic , extra'
   output => 'basic,extra'
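
As a rough illustration of why these two forms are equivalent, the following sketch splits a list value on commas while ignoring the surrounding whitespace. The helper split_list is hypothetical and is not part of Web::DataService; the module's own parsing may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Web::DataService): split a list-valued
# attribute on commas, ignoring any whitespace around the items.
sub split_list {
    my ($value) = @_;
    return () unless defined $value;
    $value =~ s/^\s+|\s+$//g;                        # trim both ends
    return grep { length } split /\s*,\s*/, $value;  # drop empty items
}

my @spaced  = split_list('basic , extra');
my @compact = split_list('basic,extra');

print join('|', @spaced),  "\n";
print join('|', @compact), "\n";
```

Both calls yield the same two items, so either spelling can be used in an attribute value.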

CONFIGURATION PROCESS

In order to fully define a data service using this framework, your code must carry out the following steps (see the example data service application for more about this):

1)

Load one or more modules ("operation modules") that can serve as Moo roles. The subroutines that implement your data service operations must be placed in these modules.

2)

Generate a new instance of Web::DataService.

3)

Define one or more output vocabularies using define_vocab. This step is optional, and a default vocabulary consisting of the field names and values obtained from the backend will be automatically used if you do not specify any.

4)

Define one or more response formats using define_format. This must follow any vocabulary definitions, and must precede the node definitions.

5)

Define some data service nodes using define_node.

6)

Define one or more output blocks using define_block. These may occur in any order with respect to the node definitions.

7)

Define value sets using define_set (or define_output_map). This step is optional, but you will need to do this if you wish to provide variable output. These definitions must occur before any output blocks that depend on them.

8)

Define one or more parameter validation rulesets using define_ruleset. These may occur in any order with respect to the other definitions.

If some or all of your operation modules define a subroutine called initialize, this will be called once for each module as soon as the module name is encountered as the value of a role attribute in a node definition. You can also trigger this explicitly by calling initialize_role. The routine will be called as a class method, so the module name will be the first argument. The data service instance will be the second, so you can use that to make further definitions.

You may find it convenient to put some or all of the definitions from steps 5-8 (define_node, define_block, define_set, define_output_map, define_ruleset) in these initialization routines. That will serve to locate these definitions together with the operations to which they apply.

You may instead find it convenient to put all of the node definitions together, either in the main application file or in some subsidiary module, so that the hierarchical relationships will be apparent. Exactly how you structure your application is up to you.
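
The calling convention for initialize described above can be sketched as follows. This is only an illustration: StubService is a stand-in that records what it is given so that the example runs without Web::DataService installed, and MyOperations, my/operation and my_operation are hypothetical names. A real operation module would be a Moo role, and the real data service instance would be passed in by the framework.

```perl
use strict;
use warnings;

# Stand-in for a data service instance (an assumption for this sketch).
# It only records the definition list items it receives.
package StubService;
sub new         { my ($class) = @_; return bless { nodes => [] }, $class }
sub define_node { my ($self, @defs) = @_; push @{ $self->{nodes} }, @defs; return }

# Hypothetical operation module. A real one would be a Moo role.
package MyOperations;

sub initialize {
    my ($class, $ds) = @_;    # module name first, data service instance second
    # Keep the node definition next to the operation it describes.
    $ds->define_node(
        { path => 'my/operation', role => $class, method => 'my_operation' },
            "Documentation string for this node.");
    return;
}

sub my_operation { my ($request) = @_; return }

package main;

my $ds = StubService->new;
MyOperations->initialize($ds);    # called as a class method
printf "%d definition items recorded\n", scalar @{ $ds->{nodes} };
```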

CONFIGURATION DETAILS

The attributes that you can use in defining these different types of elements are listed in the following sections.

Data service instantiation

A new data service is instantiated by calling the new method of Web::DataService, as follows:

    my $ds = Web::DataService->new({ name => 'data1.0', ... });

The "..." in the above example represents some set of attributes chosen from the list below. With a few exceptions noted below, any attributes that you do not specify in the call to new will be looked up in the configuration file provided by the foundation framework (config.yml in the case of Dancer). Any not specified there will be given default values, as indicated in the documentation for the individual attributes. For most attributes, it is up to you whether to specify them in the instantiation call or in the configuration file.

When a new data service is instantiated, attributes that are not explicitly specified in the instantiation call are looked up in the configuration file under the value provided for the required attribute name. If not found, they are then looked up as direct attributes. For example, if the configuration file has the contents listed below, the above call will produce a data service with a default_limit of 1000 and a default_header of 1. This allows you to configure several different data services that share some attribute values but not others.

    default_limit: 500
    default_header: 1
    
    data1.0:
        default_limit: 1000
    
    data2.0:
        default_limit: 1200
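
The lookup order just described can be sketched in plain Perl. The helper lookup_attr and its argument layout are hypothetical, and the configuration file is represented here as a hash; this is not how the module itself is implemented.

```perl
use strict;
use warnings;

# The configuration file shown above, written as a Perl hash.
my %config = (
    default_limit  => 500,
    default_header => 1,
    'data1.0'      => { default_limit => 1000 },
    'data2.0'      => { default_limit => 1200 },
);

# Hypothetical helper: instantiation arguments win, then the section named
# after the data service, then the top level, then the default.
sub lookup_attr {
    my ($attr, $args, $config, $default) = @_;
    return $args->{$attr} if exists $args->{$attr};
    my $section = $config->{ $args->{name} };
    return $section->{$attr}
        if ref $section eq 'HASH' && exists $section->{$attr};
    return $config->{$attr} if exists $config->{$attr};
    return $default;
}

my $limit  = lookup_attr('default_limit',  { name => 'data1.0' }, \%config);
my $header = lookup_attr('default_header', { name => 'data1.0' }, \%config);
print "default_limit=$limit default_header=$header\n";
```

For the service named 'data1.0' this gives a default_limit of 1000 (from its own section) and a default_header of 1 (from the top level), matching the description above.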

Data service attributes

In the list below, entries indicated by [req] are required attributes. Those indicated by [inst] must be specified in the call to new rather than in the configuration file. Those indicated by [mod] have default values according to which modules have been loaded at the time the data service is instantiated.

All of the data service attributes have identically-named accessor methods. These are all read-only; the attributes may only be set at the time of instantiation.

name [req] [inst]

Specifies a unique identifier for this data service. You must specify this in the instantiation call, because it is used to find attribute values in the configuration file.

features [req] [inst]

Specifies the set of built-in features to be enabled for this data service. The value of this attribute must be a comma-separated list of feature names from the list given below. You can turn a feature off by prefixing its name with no_, and you can use 'standard' to enable all of the available features. So the following will enable all of the features except "doc_paths":

    features => 'standard, no_doc_paths'

while the following will enable just 'format_suffix' and 'documentation':

    features => 'format_suffix, documentation'
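
To make the semantics of 'standard' and the no_ prefix concrete, here is a minimal sketch of how such a value might be resolved. The helper parse_features is hypothetical, the feature names are those documented in this section, and the sketch assumes that a no_ entry wins regardless of where it appears in the list.

```perl
use strict;
use warnings;

# The features documented in this section.
my @ALL_FEATURES = qw(format_suffix documentation doc_paths
                      send_files strict_params stream_output);

# Hypothetical helper: resolve a features value into a sorted list of names.
sub parse_features {
    my ($value) = @_;
    my (%on, %off);
    for my $item (split /\s*,\s*/, $value) {
        $item =~ s/^\s+|\s+$//g;
        next unless length $item;
        if    ($item eq 'standard')   { $on{$_} = 1 for @ALL_FEATURES }
        elsif ($item =~ /^no_(\w+)$/) { $off{$1} = 1 }
        else                          { $on{$item} = 1 }
    }
    delete @on{ keys %off };    # assumption: no_ entries always win
    return [ sort keys %on ];
}

my $most = parse_features('standard, no_doc_paths');
my $two  = parse_features('format_suffix, documentation');
print join(', ', @$most), "\n";
print join(', ', @$two),  "\n";
```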

The individual features are as follows:

format_suffix

This feature causes the response format of any request to be set from the suffix on the URL path. If enabled, a request with the URL path "/my/operation.json" will select the operation corresponding to the data service node "my/operation" and will render the output using the "json" format.

documentation

This feature will auto-generate documentation pages for the various data service operations. If enabled, the URL path "/" will always generate a main documentation page, and a URL without any suffix will generate a documentation page corresponding to the selected data service node. You are also able to create additional documentation nodes and templates at will. In order to make use of this feature, you must also ensure that a templating plugin is loaded.

doc_paths

This feature will enable additional URL paths for accessing documentation. If enabled, a request with the URL path "/my/operation_doc" or (if format_suffix is also enabled) "/my/operation_doc.html" will produce the documentation page for the data service node "my/operation". The URL path "/my/operation", on the other hand, will execute the operation and return the result.

You can change the documentation suffix from "_doc" to something else by setting the attribute doc_suffix.

send_files

This feature will enable you to define data service nodes that respond with the contents of files from disk. Its primary purpose is to provide access to the CSS files that accompany the documentation pages. You can use it to provide access to other files as well. If you disable this feature but enable the 'documentation' feature, you will need to arrange for the documentation CSS file to be provided separately.

strict_params

If this feature is enabled, then any parameter names that are not recognized by the ruleset corresponding to the selected data service node will cause a request to be rejected with a result code of 400 (bad request). If disabled, then bad parameter names will generate warnings instead.

stream_output

If this feature is enabled, then any response body larger than the value of stream_threshold will be streamed to the client instead of being sent in a single chunk. This feature should be enabled for any service which can produce large responses, because otherwise the process of marshalling such responses will take up large amounts of server memory and CPU time, and may cause excessive paging.

special_params [req] [inst]

The Web::DataService module can process certain request parameters in special ways. Each of these special parameters has an internal name for use in the data service application code, and an external name which you can set to any string you choose. It is this external name which is recognized in actual requests to the data service.

The value of special_params must be a list of special parameter internal names. You can turn off any of these by prefixing the name with no_, and you can change the external name (i.e. the name actually used in requests) by adding =name. The name standard enables the following set of parameters:

    show, limit, offset, header, datainfo, count, vocab, linebreak, save

So the following attribute value would enable the parameters listed above except for 'datainfo', and would set the external name of the 'header' parameter to 'head'.

    special_params => 'standard, no_datainfo, header=head'
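
The following sketch shows one way such a value could be resolved into a map from internal names to external names. The helper parse_special_params is hypothetical. The default external names v (for selector) and lb (for linebreak) are taken from the parameter descriptions in this section; all other parameters are assumed to default to their internal name.

```perl
use strict;
use warnings;

# The parameters enabled by 'standard', as listed above.
my @STANDARD = qw(show limit offset header datainfo count vocab linebreak save);

# Default external names that differ from the internal name.
my %DEFAULT_EXTERNAL = (selector => 'v', linebreak => 'lb');

# Hypothetical helper: returns { internal_name => external_name }.
sub parse_special_params {
    my ($value) = @_;
    my (%external, %off);
    for my $item (split /\s*,\s*/, $value) {
        $item =~ s/^\s+|\s+$//g;
        next unless length $item;
        if ($item eq 'standard') {
            $external{$_} //= $DEFAULT_EXTERNAL{$_} // $_ for @STANDARD;
        }
        elsif ($item =~ /^no_(\w+)$/)     { $off{$1} = 1 }
        elsif ($item =~ /^(\w+)=(\w+)$/)  { $external{$1} = $2 }
        else { $external{$item} //= $DEFAULT_EXTERNAL{$item} // $item }
    }
    delete @external{ keys %off };
    return \%external;
}

my $params = parse_special_params('standard, no_datainfo, header=head');
print "$_ => $params->{$_}\n" for sort keys %$params;
```

With the value shown above, a client would write head=no rather than header=no, and the datainfo parameter would not be recognized at all.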

Once a set of special parameters is chosen, clients of the data service may include any of them (or none) in any request. The special parameters are as follows:

selector

If enabled, this special parameter is used to select which version of the data service should respond to the request. Its external name defaults to v unless overridden. If you enable this parameter, then you should give each data service a different value for the attribute key.

If you are running multiple versions of your data service from a single application, or if you think you may want to create a second version at some point, then you should either enable this parameter from the very beginning or use a different value of path_prefix for each of your data services. One or the other mechanism will ensure that the proper version of your service is selected to respond to each request.

format

If enabled, this special parameter is used to select the response format for the request. It is forbidden to enable this parameter and the feature "format_suffix" for the same data service. You can enable it if you prefer your clients to select the response format by means of a parameter rather than through a suffix on the URL path.

show

If enabled, this special parameter is used to select optional output blocks in addition to the default output for a particular request. In this way, clients can tailor the output of each request to provide just the information they need and leave out information they do not need.

limit

If enabled, this special parameter is used to limit the number of result records returned by a request. The data service attribute "default_limit" can be used to provide a default limit for any request that does not specify this parameter. The value of this parameter can be any positive integer, 0, or the string all. By using the latter value, a client can ensure that the entire result set is provided.

This parameter, in combination with default_limit, can be useful for data services that are able to generate large result sets. This combination prevents clients from accidentally sending in request URLs that generate enormous responses, while allowing the ability to acquire the full results when necessary. A client can also use this parameter in conjunction with "offset" to obtain a large result set using a series of requests, each of which returns a portion of the desired result.

offset

If enabled, this parameter indicates that the response should start at the indicated position in the result set rather than at the beginning. See also "limit".

count

If enabled, a true value for this parameter indicates that the response should include not only the result of the data service operation but also a count of the number of records found, the number returned, and the elapsed time taken in executing the operation. A false value indicates that this information should not be included. The attribute "default_count" specifies whether or not that information will be included when this parameter is not specified. This is a flag parameter (see below).

datainfo

If enabled, a true value for this parameter indicates that the response should include not only the result of the data service operation but also a set of descriptive information about the data. The attribute "default_datainfo" specifies whether or not that information will be included when this parameter is not specified. This is a flag parameter (see below).

header

If enabled, a true value for this parameter indicates that the response should include header material, the contents of which vary according to the output format and the values of the count and datainfo parameters (if enabled). If false, no header material should be included. This parameter is ignored by the JSON output module. With a text format response (tsv or csv), if this parameter is provided with a false value then all header material is suppressed and only the data records (one per line) are returned. This is a flag parameter (see below).

linebreak

If enabled, this parameter can be used to select the linebreak sequence used with text format responses. The accepted values are cr for a carriage return, lf for a linefeed, and crlf for a carriage return/linefeed combination. The default external name for this parameter is lb.

save

If enabled, this parameter can be used to indicate that the response should be saved to disk rather than displayed in a browser window. The server will provide the appropriate headers, but it is up to the web browser or other client software to decide how to handle them. If this parameter is provided with a value other than yes, no, on, off, 1, 0, true, or false, then this value will be used as the default filename with the selected response format appended as a suffix. You can also use the attribute "default_filename" to provide a default in case no filename was specified by the client.

vocab

If enabled, this parameter can be used by the client to specify which vocabulary to use in expressing the result of a data service operation. The client can use this to override the default vocabulary for the selected output format, or to select a vocabulary if the format does not specify a default. Obviously, this special parameter should only be enabled if you have defined additional output vocabularies.

foundation_plugin [req] [inst] [mod]

This attribute is not required if one of the known foundation frameworks (currently only Dancer) is already loaded. If you put use Dancer in your main application file before the call to instantiate your data service, then the plugin Web::DataService::Plugin::Dancer will be loaded automatically.

The purpose of this plugin module is to interact with the foundation framework, to carry out tasks such as: receiving HTTP requests, producing HTTP responses, and reading application configuration information. The only reason you might need to specify this attribute explicitly is if you wish to load a different plugin and override the default choice. If you do so, and the named module is not already loaded, it will be automatically loaded. See Web::DataService::Plugins for more about plugins.

templating_plugin [mod]

This attribute may be specified either at instantiation or in the configuration file. It must be the name of a Perl module, and will be loaded at instantiation time if it has not already been loaded. The purpose of this plugin module is to interface with a templating engine for the purpose of producing documentation pages and/or result pages [note: result pages are not yet implemented].

If this attribute is not specified, and if the module Template has already been loaded, then the plugin Web::DataService::Plugin::TemplateToolkit will be loaded automatically. If no templating plugin is loaded, then documentation pages cannot be produced. In that case, the features 'documentation' and 'doc_paths' will be disabled.

backend_plugin [mod]

This attribute may be specified either at instantiation or in the configuration file. It must be the name of a Perl module, and will be loaded at instantiation time if it has not already been loaded. The purpose of this plugin module is to acquire a connection to a backend database or other system for the purpose of reading or modifying data in response to data service requests.

If this attribute is not specified, and if Dancer/Plugin/Database.pm has already been loaded, then the plugin Web::DataService::Plugin::Dancer will be used in this role.

Unlike the other two plugin attributes, this one is not essential. Your own code for implementing the data service operations may simply acquire a backend database connection in whatever manner is appropriate.

title [req]

Provides a title by which this data service can be referred to in documentation pages, etc. This attribute is required, but may be specified either at instantiation or in the configuration file.

version

If specified, the value of this attribute is included in the standard documentation template as part of the page header. You can increment this whenever you make a change to the interface. The value can be any string, e.g. "23" or "1.2b5".

path_prefix

If specified, the value of this attribute must be a string. That string will be removed from the front of each request URL path before the path is matched to a data service node, and will be prepended to each URL path that is generated as part of the documentation.

If you are running more than one data service at a time (e.g. multiple versions) then one good way to arrange them is by setting a different path prefix for each one.

key

If specified, the value of this attribute must be a string. If you are running multiple data services and do not wish to use different path prefixes to differentiate them, you can instead enable the special parameter 'selector' and set a different value of this attribute for each service.

You must write code as part of your main application to select the appropriate service using the value of the 'selector' parameter (you can use the class method match_key). Generated URLs will include the value of this attribute as the value of the 'selector' parameter automatically.

ruleset_prefix

If specified, the value of this attribute must be a string. It will be prepended to any auto-generated ruleset names.

doc_suffix

If specified, the value of this attribute must be a string or quoted regex. It is only relevant if the feature doc_paths is enabled. In that case, any URL path ending in this string will have the string removed, and if the resulting path matches a data service node then the response will be a documentation page for that node. If no node is matched, a 404 error will result.

If not specified, the default value is '_doc'.

doc_index

If specified, the value of this attribute must be a string or quoted regex. It is only relevant if the feature doc_paths is enabled. In that case, any URL path ending in '/' followed by this string will have that last part removed, and if the resulting path matches a data service node then the response will be a documentation page for that node. If no node is matched, a 404 error will result.

If not specified, the default value is 'index'.
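
The matching rules for doc_suffix and doc_index can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the helper match_doc_path and the node paths are hypothetical, only plain string values are handled (not quoted regexes), and a trailing format suffix such as ".html" is not considered.

```perl
use strict;
use warnings;

# Hypothetical set of defined node paths.
my %NODES = map { $_ => 1 } ('my', 'my/operation');

# Hypothetical helper: decide how a URL path would be treated when the
# feature doc_paths is enabled.
sub match_doc_path {
    my ($url_path, $doc_suffix, $doc_index) = @_;
    $doc_suffix //= '_doc';     # documented default
    $doc_index  //= 'index';    # documented default
    (my $path = $url_path) =~ s{^/}{};

    my $node;
    if    ($path =~ m{^(.*)/\Q$doc_index\E$}) { $node = $1 }
    elsif ($path =~ m{^(.*)\Q$doc_suffix\E$}) { $node = $1 }
    else  { return 'not a documentation path' }

    return exists $NODES{$node} ? "documentation for $node" : '404';
}

print match_doc_path('/my/operation_doc'), "\n";
print match_doc_path('/my/index'),         "\n";
print match_doc_path('/my/missing_doc'),   "\n";
print match_doc_path('/my/operation'),     "\n";
```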

doc_template_dir

If specified, the value of this attribute must be a directory path relative to the application root directory. Documentation template paths will be looked up relative to this directory.

If not specified, the default value is 'doc' (relative to the application root directory).

doc_output_dir

If specified, the value of this attribute must be a directory path relative to the application root directory. Output template paths will be looked up relative to this directory.

If not specified, then templated output will not be available. [Note: templated output is not yet implemented].

data_source

If specified, the value of this attribute must be a string. It will be reported, if requested by the 'datainfo' parameter, in the header of the response. Its purpose is to indicate the project, database, etc. from which the returned data has been drawn.

data_provider

If specified, the value of this attribute must be a string. It will be reported, if requested by the 'datainfo' parameter, in the header of the response. Its purpose is to indicate the organization which is providing this data.

data_license

If specified, the value of this attribute must be a string. It will be reported, if requested by the 'datainfo' parameter, in the header of the response. Its purpose is to indicate the name of the license under which this data is being made available.

license_url

If specified, the value of this attribute must be a valid URL. It will be reported, if requested by the 'datainfo' parameter, in the header of the response. Its purpose is to provide a link by which more information about the license terms may be found.

admin_name

If specified, the value of this attribute will be reported in the standard documentation footer as the "contact person" to whom bug reports, feedback, or other queries about this service should be addressed.

admin_email

If specified, the value of this attribute will be reported in the standard documentation footer as the "contact address" to which bug reports, feedback, or other queries about this service should be addressed.

Node definitions

Data service nodes are the fundamental organizing elements of a data service definition under this framework. You define them by calling the method define_node on a data service instance. These nodes correspond to the various resources provided by the data service. Any node for which the attributes role and method are both defined will correspond to a data service operation (we call these "operation nodes") while other nodes may correspond to documentation pages or files.

The only required attribute for a node is path, which provides a unique key. Most attributes of data service nodes are inherited path-wise. That is, a node with path "foo/bar" will inherit from the node with path "foo" any attributes whose values are not explicitly specified in its definition. The node with path "/" functions as the root, and its attribute values provide defaults for all of the other nodes. Any node attribute can be explicitly disabled for a particular node by specifying its value as the empty string. The few attributes that are not inherited will be noted below.
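
Path-wise inheritance can be pictured with the following sketch. The helper inherited_attr and the node table are hypothetical, and the sketch ignores both the configuration file and the few attributes that are not inherited; it only illustrates walking from a node up to the root and treating an empty string as "disabled".

```perl
use strict;
use warnings;

# Hypothetical node definitions, keyed by path.
my %NODES = (
    '/'       => { default_limit => 500, public_access => 1 },
    'foo'     => { default_limit => 100 },
    'foo/bar' => { public_access => '' },    # empty string disables
);

# Hypothetical helper: look for the attribute on the node itself, then on
# each ancestor in turn, ending at the root node "/".
sub inherited_attr {
    my ($path, $attr) = @_;
    while (1) {
        my $node = $NODES{$path};
        if ($node && exists $node->{$attr}) {
            my $value = $node->{$attr};
            return (defined $value && $value ne '') ? $value : undef;
        }
        last if $path eq '/';
        $path = $path =~ m{^(.*)/[^/]+$} ? $1 : '/';    # step up one level
    }
    return undef;
}

my $limit  = inherited_attr('foo/bar', 'default_limit');    # from "foo"
my $access = inherited_attr('foo/bar', 'public_access');    # disabled here
my $root   = inherited_attr('foo', 'public_access');        # from "/"
printf "limit=%s access=%s root=%s\n",
    $limit, defined $access ? $access : '(disabled)', $root;
```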

Most of the attributes listed here may also be specified in the application configuration file, in the same manner as data service attributes. In that case, they will provide default values for the nodes. Attributes for which you wish to give the same value to all (or most) nodes can be conveniently specified in this way, or you can specify them when defining the root node "/". You can always override them at a lower level of the node hierarchy, if you wish.

The node attributes do not have accessor methods. Rather, you can retrieve the value of any node attribute by using the node_attr method of the data service instance or of a request instance:

    # either of the following:
    
    $attribute_value = $ds->node_attr($path, $attr_name);
    $attribute_value = $request->node_attr($attr_name);

For example, each operation subroutine is called as a method of a request object. This request object itself has many attributes (with accessors) that are derived from the data service node that matches the request. However, you can if necessary query for arbitrary node attributes:

    sub my_operation {
        my ($request) = @_;
        my $default_limit = $request->node_attr('default_limit');
        ...
    }

Node attributes

With the exception of path, each of these attributes is optional.

path

This attribute is required for every node, and must be unique among the nodes defined for this data service. For each incoming request, the URL path and parameters are processed in various ways (depending upon which data service features and special parameters have been enabled) to extract a path which is compared to the set of defined nodes. If one matches, then the attributes of that node will be used to generate the appropriate response. Otherwise, a 404 error ("not found") will be returned to the client.

disabled

If this attribute is given a true value, then the node will never match any request. Any request corresponding to this node's path (whether it is an operation request or a documentation request) will return a 404 error. The attribute is inherited, so any children of this node will be likewise disabled. You can use this to define placeholder nodes for functionality to be added later, or to remove existing functions from your application and leave the code in place to be re-activated later.

undocumented

If this attribute is given a true value, then any request asking for documentation about this node will return a 404 error. The node will still be active for requests that ask for its operation. This allows you to provide undocumented data service operations, while explicitly noting that fact in your code.

title

The value of this attribute must be a string, which will be used as the title for any documentation generated about the node. If not specified, the node path will be used instead. This attribute is not inherited. Best practice is to define a specific and informative title for each node.

doc_string

You can set this attribute either directly or by including one or more documentation strings after the node attribute hash in the call to define_node. The default documentation templates use this value as the main description on each documentation page. If you wish to use a longer description than can be easily conveyed in a call to define_node, then create a specific documentation template for this node.

usage

The value of this attribute must be either a hashref or an array of hashrefs. Its purpose is to generate a list of example URLs as part of the node documentation. Each hash should have at least one of the following keys:

format

This specifies the response format that the example URL will request. If not specified, the default format (if any) will be used.

params

This should either be an array of URL parameters or a string. For example, [ 'state=ak', 'count' ] or 'state=ak&count'.

fragment

This should specify a URL fragment (without the #) to be added on the end.

Any invalid items will be simply ignored. If no valid items are found, the section will be omitted completely.
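
As an illustration, a usage value built from the keys above might look like the following. The parameter values come from the example given for params; the fragment name results is hypothetical. The small helper query_string is also hypothetical and only shows that the array form and the string form of params describe the same query.

```perl
use strict;
use warnings;

# A possible value for the usage attribute: an array of hashrefs.
my @usage = (
    { format => 'json', params => [ 'state=ak', 'count' ] },
    { format => 'json', params => 'state=ak&count' },
    { params => 'state=ak', fragment => 'results' },   # hypothetical fragment
);

# Hypothetical helper: reduce either form of params to a query string.
sub query_string {
    my ($params) = @_;
    return '' unless defined $params;
    return ref $params eq 'ARRAY' ? join('&', @$params) : $params;
}

print query_string($_->{params}), "\n" for @usage;
```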

method

A node that has both this attribute and the attribute "role" is considered to be an "operation node". The attribute value must be the name of a subroutine (not a code reference) in the package specified by role. An "operation request" that matches this node will result in the creation of a "request object" whose class contains the appropriate role, followed by a method call to the specified subroutine.

role

The value of this attribute must be the name of a Moo Role defined by an already-loaded package. Any operation methods defined for this node and/or its children must occur in that package. All operation nodes must either have this attribute specified explicitly or inherit its value from a parent node.

arg

This attribute is only relevant for operation nodes. If specified, its value will be provided as an argument when the specified method is called to carry out the operation. By means of this attribute, you can arrange for more than one node to call the same method, and have that method behave differently depending upon which argument it receives.

ruleset

This attribute is only relevant for operation nodes. Its value must be the name of a ruleset defined for this data service instance. This ruleset will be automatically used to validate the URL parameters for any request that matches this node, and will also be used in the process of generating documentation about the node.

If the attribute is not specified, a ruleset name will be automatically generated by taking the node path, changing any slashes into colons, and adding the ruleset prefix (if any has been defined for this data service). If the resulting name corresponds to a defined ruleset, that ruleset will be used. You will probably find it convenient to use these auto-generated ruleset names in most cases, and will rarely need to specify this attribute.
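
The naming rule can be sketched in a few lines. The helper auto_ruleset_name is hypothetical, as is the prefix value '1.0:' used in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a ruleset name from a node path by changing
# slashes into colons and prepending the ruleset prefix, if any.
sub auto_ruleset_name {
    my ($path, $ruleset_prefix) = @_;
    (my $name = $path) =~ tr{/}{:};
    return ($ruleset_prefix // '') . $name;
}

my $plain    = auto_ruleset_name('my/operation');
my $prefixed = auto_ruleset_name('my/operation', '1.0:');
print "$plain\n$prefixed\n";
```

Under these assumptions, defining a ruleset named 'my:operation' (or '1.0:my:operation' with that prefix) would be enough for the node 'my/operation' to pick it up without an explicit ruleset attribute.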

output

This attribute is only relevant for operation nodes. Its value must be the name of an output block defined for this data service, or the names of more than one output block separated by commas and optional whitespace. For example:

    output => 'block1, block2'

This block or blocks will make up the fixed output of this node's operation.

output_label

The value of this attribute will be used to label the fixed output blocks in the generated documentation for this node. Its value must be a string. If not specified, it defaults to basic.

optional_output

If specified, the value of this attribute must be the name of a single output map (in other words, a set) defined for this data service. This will be used, in conjunction with the value of the special parameter show, to select additional output blocks to be included in a response. This attribute is useless unless the special parameter show is enabled, and will only be used when responding to requests that include a value for that parameter.

file_path

If this attribute is specified, then this node will be a "file node". The value must be a filename relative to the "public file" directory established by the foundation framework (for Dancer, this is the directory "public" under the application root). A request that exactly matches this node will return the contents of this file, a 404 error if the file does not exist, or a 500 error if it exists but is not readable. It is an error to specify both file_path and file_dir for a single node, or to specify either of them along with method.

file_dir

If this attribute is specified, then this node will be a "file node". The value must be a directory path relative to the "public file" directory (see "file_path"). A request whose path exactly matches this node will result in a 404 error, but one whose path has this node's path as a prefix will look up the remainder of the path in this directory. If the indicated file exists and is readable, its contents will be returned. If it exists but is not readable, a 500 error will be returned. Otherwise, a 404 error results.

public_access

If this attribute is given a true value, then all response messages generated in association with this node will have the CORS header "Access-Control-Allow-Origin" set to "*". Until we provide better means of controlling the CORS header in a later version of this framework, we suggest that you always set this to true for the root node.

default_format

The value of this attribute must be the name of one of the formats defined for this data service. If no response format can be determined from the request URL and/or parameters, then the specified format will be used for any operation request matching this node. If the data service will only be returning data in a single format, then you should set the value of this attribute in the root node to the name of that format.
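
For a service that returns only JSON, a root node along the following lines would be one way to apply the two suggestions above. This is a sketch only: it assumes that the root node is the one defined with the path '/', that a format named 'json' has already been defined, and the title is made up.

    $ds->define_node(
        { path => '/',
          title => 'Main documentation',    # hypothetical title
          public_access => 1,               # send the CORS header with "*"
          default_format => 'json' });      # used when no format is requested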

default_limit

The value of this attribute will put a limit on the size of the result set for all operation requests matching this node, unless overridden by the special parameter "limit". The purpose of this attribute is to prevent badly-composed requests from accidentally generating an enormous result set. A client can always include limit=all in the request parameters to retrieve the full result set. However, provided that clients leave that parameter off unless needed, this attribute provides a backstop. The value of this attribute must be a positive integer. Unless you want a hard limit that clients cannot override, you should make sure that the special parameter limit is enabled if you use this attribute (it is included in the standard set).

default_header

By default, text format output includes a header unless the client explicitly turns it off by including header=no in the request parameters. If this attribute is set to a false value, then no header will be provided for text format responses matching this node unless explicitly requested by the client using header=yes. Unless you want to disable headers entirely, you should make sure that the special parameter header is enabled if you use this attribute (it is included in the standard set).

default_datainfo

By default, information about the dataset is included in a response only if the client requests it by including datainfo=yes in the request parameters. If this attribute is set to a true value, then this information will be included by default for all operation requests matching this node unless the client specifies datainfo=no (assuming that the special parameter datainfo is active). If you want clients to have control over whether or not this information is provided, you should make sure that the special parameter datainfo is enabled (it is included in the standard set).

default_count

By default, a count of the number of records found and returned is included in a response only if the client requests it by including count=yes in the request parameters. If this attribute is set to a true value, then this information will be included by default for all operation requests matching this node unless the client specifies count=no. If you want clients to have control over whether or not this information is provided, you should make sure that the special parameter count is enabled (it is included in the standard set).

default_linebreak

The value of this attribute must be either 'crlf', 'cr', or 'lf'. If not specified, it defaults to 'crlf'. The specified character sequence will be used to separate the lines of any text format output from requests that match this node, unless overridden by the special parameter linebreak.

default_save_filename

The value of this attribute will be used for the 'content-disposition' header of the response message for requests matching this node, if the special parameter save is given with a basic 'true' value and not a filename. The name of the requested response format will automatically be appended as a suffix, so no suffix should be included in the attribute value. For requests given through a web browser, most browsers will offer to save the file under this name.
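
The default_* attributes described above can be combined on a single node. The fragment below is an illustrative sketch with a hypothetical node path, method name, and filename; the values are arbitrary.

    $ds->define_node(
        { path => 'states/list',
          method => 'list_states',              # hypothetical operation method
          default_limit => 500,                 # unless the client gives "limit"
          default_header => 0,                  # no text header unless header=yes
          default_datainfo => 1,                # dataset info unless datainfo=no
          default_count => 1,                   # record counts unless count=no
          default_linebreak => 'lf',            # instead of the default 'crlf'
          default_save_filename => 'state_list' });   # no suffix: it is appended

Each of these defaults can be overridden by a client only if the corresponding special parameter is enabled.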

stream_threshold

The value of this attribute must be a positive integer. It is only relevant if the feature stream_data is enabled for this data service. Any response whose length exceeds the value of this attribute will be streamed to the client instead of sent as a single message. It is a good idea to enable this feature for any service that can produce responses of more than a few hundred kilobytes. If the feature is enabled but this attribute is not specified, it defaults to 100Kb.

allow_method

This is a set-valued attribute. The individual values must be HTTP method types (e.g. GET, POST), specifying which HTTP methods are valid for requests matching this node. If GET is allowed, then HEAD is allowed automatically as well. If not specified, then the methods GET and HEAD are allowed.

allow_format

This is a set-valued attribute. The individual values must be the names of response formats defined for this data service, specifying which ones are valid for requests matching this node. If not specified, then all defined formats are allowed.

allow_vocab

This is a set-valued attribute. The individual values must be the names of vocabularies defined for this data service, specifying which ones are valid for requests matching this node. If not specified, then all defined vocabularies are allowed.
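
Like other list-valued attributes, the three allow_* attributes take a string of names separated by commas and optional whitespace. The following sketch uses a hypothetical node path and method name, and assumes that the format 'json' and the vocabulary 'dwc' have been defined.

    $ds->define_node(
        { path => 'states/update',
          method => 'update_states',        # hypothetical operation method
          allow_method => 'GET, POST',      # HEAD is implied by GET
          allow_format => 'json',           # all other formats are rejected
          allow_vocab => 'default, dwc' });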

doc_template

The value of this attribute must be a file pathname relative to the documentation template directory. The specified template file will be used to respond to any documentation requests matching this node. If not specified, then an automatic path will be constructed by starting with the node path and adding "_doc" followed by the filename suffix specified by the templating plugin. If no file is found under that name, then the node path followed by "/index" and the same suffix is tried. You will probably find it easiest to name your documentation files according to one of these two patterns, so that you will rarely if ever need to specify a value for this attribute.

doc_default_op_template

When a request for documentation matches this node, if the template specified by the "doc_template" attribute is not found, and if the automatic paths are not found either, then the value of this attribute is tried next if this is an operation node. If specified, the attribute value must be a file pathname relative to the documentation template directory. The contents of the template should be a generic "operation documentation" template that can be filled in from the node attributes such as "doc_string". In most cases, you will want to specify this attribute at the root node so that its value will be inherited by all of the other nodes.

doc_default_template

When a request for documentation matches this node, if none of the other template paths correspond to an actual template file on disk, then the value of this attribute will be tried as a final default. The contents of this template might say something like "no documentation can be found". If not specified, the default value is 'doc_not_found' followed by the appropriate suffix for the selected templating engine (e.g. '.tt' for Template Toolkit).
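
To illustrate the order in which templates are tried, consider the sketch below. The template filenames are hypothetical, and the '.tt' suffix assumes that Template Toolkit is the selected templating engine.

    $ds->define_node(
        # Fallbacks set once at the root, to be inherited by the other nodes.
        { path => '/',
          doc_default_op_template => 'operation.tt',
          doc_default_template => 'doc_not_found.tt' },
        # An explicit template, needed only because the filename does not
        # follow either of the automatic patterns "formats/json_doc.tt" or
        # "formats/json/index.tt".
        { path => 'formats/json',
          doc_template => 'formats/json_format.tt' });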

doc_defs

If specified, the value of this attribute must be a file path relative to the documentation template directory. This file will be evaluated before each documentation template is rendered. Its purpose is to define standard elements for use by the documentation template, the header, and the footer. You may set this to the empty string if you do not wish a definition file to be used.

If not specified, the default value is 'doc_defs' followed by the appropriate suffix for the selected templating engine (e.g. '.tt' for Template Toolkit). In most cases, you will want to either use the default or specify this attribute at the root node.

doc_header

If specified, the value of this attribute must be a file path relative to the documentation template directory. This file will be evaluated before each documentation template is rendered, but after the file specified by "doc_defs". Its purpose is to generate a header for the documentation pages. You may set this to the empty string if you do not wish a header to be applied to the documentation pages.

If not specified, the default value is 'doc_header' followed by the appropriate suffix for the selected templating engine (e.g. '.tt' for Template Toolkit). In most cases, you will want to either use the default or specify this attribute at the root node.

doc_footer

If specified, the value of this attribute must be a file path relative to the documentation template directory. This file will be evaluated after each documentation template is rendered. Its purpose is to generate a footer for the documentation pages. You may set this to the empty string if you do not wish a footer to be applied to the documentation pages.

If not specified, the default value is 'doc_footer' followed by the appropriate suffix for the selected templating engine (e.g. '.tt' for Template Toolkit). In most cases, you will want to either use the default or specify this attribute at the root node.

doc_stylesheet

If specified, the value of this attribute must be an absolute or relative URL (not a file path) which should refer to a stylesheet file to go with the documentation pages. If you wish the data service to provide this file, you will need to define a data service node with one of the attributes file_dir or file_path (and enable the send_files feature). Typically, this node should have the path "css" or "css/dsdoc.css", and its file_dir or file_path attribute should point to a similarly-named subdirectory or file in the public file directory set up by the foundation framework ("public" in the case of Dancer).

If not specified, the default value is a URL generated using the appropriate pattern for this data service for the node path "/css/dsdoc.css". The default installation of this framework includes an appropriate CSS file under that name, which you can edit however you choose.

Response format definitions

Each data service must define one or more response formats. These are defined using the define_format method of the data service object. This must be done at data service startup time, before any nodes are defined (so that the 'default_format' and 'allow_format' node attributes can be interpreted properly). Each format definition configures one of the available serialization modules.

Predefined formats

The format names listed below are predefined by Web::DataService and can be activated simply by specifying a hashref with the appropriate name attribute. For example, the following call will enable the formats 'json' and 'txt' with all of their default attribute values and documentation strings.

    $ds->define_format(
        { name => 'json' },
        { name => 'txt' });

You can override any of these attribute values and documentation strings by specifying them explicitly. The example code directory contains documentation files for each of these formats, which you can copy into your own application directory and modify as you see fit. However, you must explicitly define the corresponding data service nodes yourself or else the documentation pages will not be available. The corresponding call for this example would be:

    $ds->define_node(
        { path => 'formats',
          title => 'Response formats' },
        { path => 'formats/json',
          title => 'JSON format' },
        { path => 'formats/txt',
          title => 'Plain text format' });

See the next section for descriptions of the various attributes.

json

This format serializes operation results using JSON (JavaScript Object Notation). It sets the content type of the response to "application/json", and its default documentation node is "formats/json". The module used is Web::DataService::Plugins::JSON.

txt

This format serializes operation results as lines of comma-separated values separated by the specified linebreak sequence. Its content type is "text/plain", and its default documentation node is "formats/txt". The module used is Web::DataService::Plugins::Text.

csv

This format is identical to "txt", except that it sets the content type of the response to "text/csv".

tsv

This format serializes operation results as lines of tab-separated values separated by the specified linebreak sequence. Its content type is "text/tab-separated-values", and its default documentation node is "formats/txt". The module used is Web::DataService::Plugins::Text.

xml

This format serializes operation results as XML. Its content type is "text/xml", and its default documentation node is "formats/xml". The module used is Web::DataService::Plugins::XML.

Response format attributes

With the exception of name, each of these attributes is optional for predefined formats. Those which are required for custom (i.e. not predefined) formats are noted below.

name

Each format defined for a given data service must have a unique name. This name can be used as the value of the node attributes allow_format and default_format, and is matched either to the URL path suffix or to the value of the special parameter "format" depending upon which data service features are enabled. This attribute is required for all format definitions.

title

The value of this attribute is used as the format's title in all generated documentation. It defaults to the name.

content_type

The value of this attribute specifies the HTTP content type that will be reported in the response message. It is required for all custom format definitions.

module

The value of this attribute must be the name of a Perl module implementing this format. This module will be automatically loaded via require. You must specify this attribute when defining a custom format, and then include a module with the corresponding name in the appropriate library directory.

doc_node

The value of this attribute specifies the path of a data service node which will provide documentation about this format. You must define the node with a separate call to define_node.

doc_string

You can set this attribute either directly or by including one or more documentation strings after each format definition hash in the call to define_format. This value will be used in any auto-generated format lists in the documentation pages.

default_vocab

The value of this attribute must be the name of an already-defined vocabulary. This vocabulary will be used when rendering responses in this format, unless overridden by the special parameter vocab. It defaults to 'default', which simply uses the underlying field names from the backend data store.

uses_header

The special parameter header will only be enabled if at least one output format uses it. If you are defining a custom format that includes an optional header, you should give this attribute a true value. All of the built-in Text formats set this by default.

disabled

If this attribute is given a true value, the format definition will be ignored (except that no other format may be defined with the same name).

undocumented

If this attribute is given a true value, the format will be available to be selected in the usual way, but it will never appear in any auto-generated documentation list.
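
Putting these attributes together, a custom format definition might look like the following sketch. The format name, module name, and documentation node are hypothetical; you would need to supply a module under that name yourself.

    $ds->define_format(
        { name => 'yaml',
          title => 'YAML',
          content_type => 'application/yaml',   # required for custom formats
          module => 'MyService::YAML',          # hypothetical; loaded via require
          doc_node => 'formats/yaml',           # define separately with define_node
          default_vocab => 'default' },
            "The YAML format is intended for people who prefer to read",
            "results directly rather than through a client application.");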

Vocabulary definitions

Each data service may define one or more output vocabularies. These are defined using the define_vocab method of the data service object. This must be done at data service startup time, before any formats are defined (so that the 'default_vocab' format attribute can be interpreted properly).

A vocabulary named 'default' is always available. You do not need to call define_vocab to make it active, unlike with the predefined formats discussed above. This vocabulary expresses results using the underlying field names used by the backend data store. If you do not wish this vocabulary to be available for selection, you can explicitly disable it as follows:

    $ds->define_vocab({ name => 'default', disabled => 1 });

You make use of these vocabularies when defining output blocks using the define_block method. For example, if you have defined a vocabulary named "foo", then any output fields you subsequently define may contain the attribute "foo_name" whose value will be used as the field name in any output rendered with vocabulary "foo". If no such attribute is specified, then that field will be omitted. You can also include processing rules with an "if_vocab" value of "foo", which will be activated only when this vocabulary is selected. In this way, you can transform both the field names and values as appropriate for this vocabulary, and skip output values that cannot be expressed in that vocabulary.

Vocabulary attributes

With the exception of name, each of these attributes is optional.

name

Each vocabulary defined for a given data service must have a unique name. This name can be used as the value of the format attribute default_vocab, and the processing rule attributes if_vocab and not_vocab.

In addition, suppose you have defined a vocabulary named "foo". You can then include the attribute foo_name in any of your field definitions.

title

The value of this attribute is used as the vocabulary's title in documentation pages. It defaults to the name.

doc_node

The value of this attribute specifies the path of a data service node which will provide documentation about this vocabulary. You must define the node with a separate call to define_node.

doc_string

You can set this attribute either directly or by including one or more documentation strings after the vocabulary attribute hash in the call to define_vocab. This value will be used in any auto-generated vocabulary lists in the documentation pages.

use_field_names

If this attribute is given a true value, then the underlying field names used by the backend data store will be used by this vocabulary. It is automatically set to true for the predefined vocabulary 'default'.

disabled

If this attribute is given a true value, the vocabulary definition will be ignored (except that no other vocabulary may be defined with the same name).

undocumented

If this attribute is given a true value, the vocabulary will be available to be selected in the usual way, but it will never appear in any auto-generated documentation list.
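
As an illustration, the two vocabularies referred to by the format definitions at the top of this document might be declared roughly as follows. The titles, documentation node paths, and documentation strings here are assumptions made for the sake of the example.

    $ds->define_vocab(
        { name => 'com', title => 'Compact',
          doc_node => 'vocab/com' },            # hypothetical node path
            "Compact 3-character field names, intended for client applications.",
        { name => 'dwc', title => 'Darwin Core',
          doc_node => 'vocab/dwc' },            # hypothetical node path
            "Field names taken from the Darwin Core element set.");

Since formats refer to vocabularies by name, a call like this one would come before the call to define_format.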

Set definitions

Each data service may define one or more sets of elements, which can be used to specify parameter values or output field values, or establish a mapping from one set of values to another. These are defined using the define_set method of the data service object, or its alias define_output_map.

The first argument to define_set (or define_output_map) must be a string that provides the name of the set. This must be unique among all of the sets defined for this data service. The remaining arguments must be either hashrefs or strings: the hashrefs define the elements of the set, and the strings provide documentation. For example:

    $ds->define_set('size_values',
        { value => 'small' }, "Selects only small items",
        { value => 'medium' }, "Selects only medium items",
        { value => 'large' }, "Selects only large items");

Set element attributes

Each element in a set definition must be a hashref specifying one or more of the following attributes. With the exception of value, each of these is optional.

value

This attribute is required for each element. Its value must be a string, and must be unique within the set. This string will be included in the list of values that make up the set.

maps_to

The value of this attribute must be a string. This attribute is used when defining output maps. You can also use it in order to establish a mapping from one set of values to another, for example to convert output field values from one vocabulary to another.

disabled

If this attribute is given a true value, then the element definition in which it occurs is completely ignored.

undocumented

If this attribute is given a true value, then the element is accepted as a valid value for the set in which it is defined. However, it will not appear in any auto-generated documentation about the set.
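
The following sketch shows all four element attributes in one set. The set name and values are hypothetical; here maps_to is used to translate abbreviated codes into longer names.

    $ds->define_set('region_codes',
        { value => 'NE', maps_to => 'Northeast' },
            "States in the northeastern region",
        { value => 'MW', maps_to => 'Midwest' },
            "States in the midwestern region",
        # Accepted as a valid value, but left out of the documentation.
        { value => 'XX', maps_to => 'Unknown', undocumented => 1 },
        # Ignored entirely, as if it had not been listed.
        { value => 'OLD', disabled => 1 });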

Output block definitions

Each data service may define one or more groups of output elements, called "output blocks". These are defined by calling the define_block method of the data service object. These output elements specify which data fields should be included in the result, how they should be labeled, and how they should be processed.

The first argument to define_block must be a string that provides the name of the output block. This must be unique among all of the output blocks defined for this data service. The remaining elements must be either hashrefs or strings: the hashrefs define the individual elements of the block, and the strings provide documentation. For example:

    $ds->define_block( 'basic' =>
        { output => 'name' },
            "The name of the state",
        { output => 'abbrev' },
            "The standard abbreviation for the state",
        { output => 'region' },
            "The region of the country in which the state is located",
        { output => 'pop2010' },
            "The population of the state in 2010");

This call defines an output block called 'basic', with four elements. Each of these elements represents an output field.

When a data service request is handled, the data service operation method is expected to construct and execute the appropriate query and then pass back either a list of output records (as a listref whose elements are hashes) or a DBI statement handle from which the output records can be retrieved. Each of the output records will be included in the data service result according to the list of output blocks that have been selected for this request, as interpreted by the serialization routine corresponding to the selected output format.

There are four categories of output elements, listed below. Each category is defined by the presence of a hash key corresponding to the element type. Each element must contain exactly one of these keys, or else an error will be thrown at startup time.

output

An "output" element specifies a single data field to be included in a data service result. The value of the key output gives the internal name of this field, which is generally the name by which the field is known to the backend data store. Other keys may be used to specify the name under which this field will be included in the result, and yet other keys can be used to specify conditions under which it will or will not be included in the result. This is the only kind of element that is required in order to produce data service output; the others are there for the convenience of the application programmer.

set

A "process" element indicates a processing step to be carried out on the data before it is included in the result. The value of the key set specifies which field's value is to be altered.

select

A "select" element specifies a list of strings that can be retrieved later by the various data service operation methods and used to construct queries on the backend data store. Use of this element is optional. The value of the key select must be an arrayref whose elements are strings that contain field specifications, e.g. for an SQL SELECT statement. These should include all of the fields that are necessary in order to generate the output of this block. A data service operation method can then call one of the methods select_list, select_hash or select_string on the request object in order to retrieve the entire set of fields (with duplicates removed) so as to satisfy all of the output blocks that are being used by this particular request. Other keys can be used to specify auxiliary information such as SQL table names.

include

An "include" element can be used to include the definition of one block inside another. The value of the key include must be the name of another output block defined for this data service; the "include" element will be replaced by a list of all of the elements from the named block.

It is important to note that two lists of elements are generated for each request: a list of process ("set") elements, and a list of output elements. These are taken from the fixed output block(s) first, and then from any optional blocks in the order they were specified (not in the order they were defined). All of the process elements are applied first, and then the output list is used to determine the serialized output for the record.
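
A block that mixes several element types might look like the sketch below. It assumes that the block 'basic' shown earlier has been defined; the block name 'extended' and the column and field names are hypothetical.

    $ds->define_block( 'extended' =>
        # Field specifications that an operation method can retrieve later,
        # for example by calling select_string on the request object.
        { select => ['name', 'abbrev', 'region', 'pop2010', 'area'] },
        # Replaced by the four output elements of the block 'basic'.
        { include => 'basic' },
        { output => 'area' },
            "The area of the state");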

Output block attributes

The attributes that can be used to configure output are listed in the following sections, one section for each element type.

output elements

An output element is indicated by the presence of the key output. For example:

    { output => 'foo', dedup => 'bar', long_name => 'foodlerizer' }

This particular element declares that each output record will include the data field 'foo', but only if its value differs from the value of the field 'bar'. If the vocabulary 'long' has been selected for this request, then the field will be labeled 'foodlerizer' in the generated output. Otherwise, the label will default to the field name ('foo').

You may use any of the following attributes in specifying output elements. All of the attributes except for 'output' are optional. In the following subsections, "this element" refers to the output element currently being specified.

output

This attribute is required, and we recommend that you always specify it first in order to make clear the element type. The data for this element will be derived from each output record by using the value of this attribute as a hash key.

name

The value of this attribute must be a string. This value will be used as the label for this element in the generated result, unless a vocabulary-specific name is selected. If this attribute is not specified, then the label will default to the value of output.

<vocab>_name

An attribute of this form specifies the label which will be used for this element in the generated result if the corresponding vocabulary is selected. For example, an attribute called dwc_name would be used whenever the vocabulary dwc is selected for a request.

value

The value of this attribute must be a string. This value will be output as the value of this element in every record, regardless of any value retrieved from the backend data store. The purpose of this attribute is to generate constant-valued fields such as record type indicators.

<vocab>_value

An attribute of this form specifies the value to be used for this element if the corresponding vocabulary is selected. The purpose of such attributes is to generate constant-valued fields whose value is appropriate to the specified vocabulary. See "<vocab>_name".
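
The naming and value attributes can be combined as in the following sketch, which assumes that vocabularies named 'com' and 'dwc' have been defined. The block name, field names, and labels are hypothetical.

    $ds->define_block( 'labeled' =>
        # A constant-valued field: every record gets the value 'state',
        # or 'Location' when the vocabulary 'dwc' is selected.
        { output => 'record_type', com_name => 'typ',
          value => 'state', dwc_value => 'Location' },
            "The type of this record",
        # Labeled 'state_name' by default, 'nam' under 'com', and
        # 'stateProvince' under 'dwc'.
        { output => 'name', name => 'state_name',
          com_name => 'nam', dwc_name => 'stateProvince' },
            "The name of the state");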

dedup

The value of this attribute must be the name of another data field, which need not correspond to any output element. If the value of the data field named by output is identical to the value of the field named by dedup, then this output element will be ignored. You can use this if you wish to prevent two different fields with the same value from appearing in a single output record. This is evaluated independently for each record that is output.

sub_record

The value of this attribute must be the name of another output block defined for this data service. This attribute is only used if the data value is itself a hashref, and if the selected output format can express hierarchical data (e.g. JSON). In that case, the hashref will be interpreted as a sub-record according to the specified block.

always

If this attribute is given a true value, then this element will always be included in the output even if its value is undefined. By default, the JSON format omits from each record any fields whose values are undefined.

if_field

The value of this attribute must be the name of another data field, which need not correspond to any output element. If the named field has an undefined value, then this output element will be ignored for this record. You can use this to output field B only in records where field A has a value. This attribute is evaluated independently for each record that is output.

not_field

This attribute is the opposite of "if_field". If the named field has a defined value, then this output element will be ignored. You can use this to output field B only for those records in which field A does not have a value.

if_vocab

The value of this attribute must be a string containing the names of one or more vocabularies (separated by commas and optional whitespace) that have been defined for this data service. This output element will only be included in the result if one of the specified vocabularies was selected for the request. In contrast to if_field, this attribute is evaluated once for each request at the beginning of processing.

not_vocab

This attribute is the opposite of "if_vocab". This element will only be included in the result if the selected vocabulary is not one of those specified.

if_format

The value of this attribute must be a string containing the names of one or more output formats (separated by commas and optional whitespace) that have been defined for this data service. This element will only be included in the result if the selected output format is one of these. This attribute is evaluated once for each request at the beginning of processing.

not_format

This attribute is the opposite of "if_format". This element will not be included in the result if the selected output format is one of these.

if_block

The value of this attribute must be a string containing the names and/or keys of one or more output blocks (separated by commas and optional whitespace) that have been defined for this data service. This element will only be included in the result if at least one of those blocks is included. This attribute is evaluated once for each request at the beginning of processing.

not_block

This attribute is the opposite of "if_block". This element will not be included in the result if any of the named blocks is included.
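
The conditional attributes described above might be used as in this sketch. All of the block, field, format, and vocabulary names are hypothetical and are assumed to have been defined elsewhere.

    $ds->define_block( 'details' =>
        # Evaluated per record: skipped when 'capital' is undefined.
        { output => 'capital_pop', if_field => 'capital' },
            "The population of the state capital",
        # Evaluated once per request: skipped when 'dwc' is selected.
        { output => 'abbrev', not_vocab => 'dwc' },
            "The standard abbreviation for the state",
        # Included only in JSON output.
        { output => 'counties', if_format => 'json' },
            "A list of the counties in the state",
        # Included only if the block 'geography' is part of this response.
        { output => 'region_name', if_block => 'geography' },
            "The full name of the region");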

text_join

This attribute is only used when the selected output format is a text-based one such as CSV. Its value must be a string. When generating the output for any record where the value of this element's data field is an array, the values will be joined together using the specified string. If this attribute is not specified, it defaults to ", ".

xml_join

This attribute is similar to "text_join", and is used when the selected output format is XML.

show_as_list

This attribute is only used when the selected output format is JSON. If it is given a true value, then this output element will be represented as an array even if the data field contains a single value.
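
For a field that may hold several values, the three format-specific attributes above can be set together. The block and field names in this sketch are hypothetical.

    $ds->define_block( 'cities' =>
        { output => 'cities',
          text_join => '; ',        # used by text formats such as CSV
          xml_join => '; ',         # used by the XML format
          show_as_list => 1 },      # JSON: always an array, even for one value
            "The major cities in the state");

Under these settings, a record whose 'cities' field holds an array of two names would be expected to appear in text output as a single value with the names separated by "; ".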

doc_string

You can set this attribute either directly or by including one or more documentation strings after the element-definition hash in the call to define_block. This value will be used to auto-generate documentation for the output of the various data service operations.

undocumented

If this attribute is given a true value, then this element will be left out of any auto-generated documentation. It will still appear in data operation results.

process element attributes

A process element is indicated by the presence of the key set. For example:

    { set => 'foo', from => 'bar', code => 'translate' }

This particular element causes the following action to happen before each record is output: the method translate of the request object is called and is passed the value of the data field bar. The result is stored in the data field foo, which need not have had any value until then.

You may use any of the following attributes in specifying process elements. The attribute set specifies the target of the operation, while one of the attributes from or from_each specifies the source. If neither of these attributes is specified, then the target field is processed in place. The source and/or target may be specified as '*', meaning the entire record.

All attributes except for 'set' are optional. A single process element may have at most one of the attributes code, lookup, split and join.

set

This attribute is required, and we recommend that you always specify it first in order to make clear the element type. If the value is any non-empty string other than '*', the data field named by this string will be used as the target of this processing step. If the value is '*', then no target is set. This special value is useful mainly in conjunction with the attribute code, causing the specified subroutine to be passed a reference to the record as a whole. It can then modify the record arbitrarily.

from

The value of this attribute must be a non-empty string. The value of the data field named by this string will be used as the "source value" for this processing step. If the value is '*', then the entire record will be passed as the "source value".

from_each

The value of this attribute must be a non-empty string. All values stored in the field named by this string will be used as source values for this processing step: if the value is an array, the step will be carried out on each value in turn. If the value is a scalar, it will be carried out on that value. If a single value results, the target field will be set to that value. If more than one value results, the target field will be set to an arrayref whose contents are the result values. If no values result, the target field will be set to undef. This attribute is not valid if the target is '*'.
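
The way results are packed back into the target field can be sketched in plain Perl. The helper process_each below is hypothetical and exists only to illustrate the three cases described above (one result, several results, no results); it is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code on every source value, then pack the
# results as a scalar, an arrayref, or undef depending on how many there are.
sub process_each {
    my ($source, $code) = @_;
    my @values  = ref($source) eq 'ARRAY' ? @$source : ($source);
    my @results = map { $code->($_) } @values;
    return undef       if @results == 0;
    return $results[0] if @results == 1;
    return \@results;
}

my $double = sub { my ($v) = @_; return $v * 2 };

my $many = process_each([1, 2, 3], $double);    # arrayref holding 2, 4, 6
my $one  = process_each(5, $double);            # plain scalar 10
my $none = process_each([], $double);           # undef
```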

code

The value of this attribute must either be the name of a request method (almost always one which you have written as part of a data service operation role) or a code reference. It will be called with the request object as the first argument, and the source value as the second. The source value will be a reference to the entire record if set => '*' or from => '*' is also specified. The result of this subroutine call will be stored in the target field, unless the target is '*'.

You can use this powerful functionality to arbitrarily alter the data records before they are output.

lookup

The value of this attribute must be a hashref. The source value will be looked up in this hashref, and the resulting value stored in the target field. If the source value does not occur as a hash key, and the attribute "default" was also specified, its value will be used instead. This attribute is not valid if either the source or the target is '*'.

default

The value of this attribute will be used as the result of this processing step if the source value does not appear in the hashref specified by "lookup".
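
The interaction between lookup and default can be sketched as follows. The helper lookup_value and the field names rank and rank_name are hypothetical. Returning undef when the key is missing and no default is given is an assumption of this sketch, since the text above does not say what happens in that case.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the lookup/default description above.
sub lookup_value {
    my ($source, $step) = @_;
    return $step->{lookup}{$source} if exists $step->{lookup}{$source};
    return $step->{default};    # assumption: undef if no default was specified
}

my $step = {
    set     => 'rank_name',
    from    => 'rank',
    lookup  => { 3 => 'species', 5 => 'genus' },
    default => 'unknown',
};

print lookup_value(5, $step), "\n";    # prints "genus"
print lookup_value(9, $step), "\n";    # prints "unknown"
```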

split

The source value will be split according to the value of this attribute, and the target will be set to the resulting list of values. You can use this with either from or from_each; in the latter case all of the resulting lists are concatenated together. This attribute is not valid if either the source or the target is '*'.

join

The source value(s) will be joined together using the value of this attribute, and the target will be set to the resulting string. This attribute is only valid in conjunction with from, and is not valid if either the source or the target is '*'.
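
In effect, these two attributes correspond to Perl's built-in split and join. The stand-alone sketch below uses those built-ins directly on a sample record; the field names are invented, and whether the split attribute accepts a regexp as well as a string is not stated here, so the qr pattern is an assumption.

```perl
use strict;
use warnings;

my $record = {
    authors => 'Smith; Jones; Lee',
    tags    => ['a', 'b', 'c'],
};

# Roughly what { set => 'author_list', from => 'authors', split => ... } describes.
$record->{author_list} = [ split qr{;\s*}, $record->{authors} ];

# Roughly what { set => 'tag_string', from => 'tags', join => ', ' } describes.
$record->{tag_string} = join ', ', @{ $record->{tags} };

print scalar(@{ $record->{author_list} }), "\n";    # prints "3"
print "$record->{tag_string}\n";                    # prints "a, b, c"
```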

always

If this attribute is given a true value, then the processing step will be carried out whether or not the source value is defined. By default, this step is skipped if the source value is not defined.

if_field

This step will only be carried out if the field named by this attribute has a defined value. This attribute only makes sense if it specifies a field other than the source field, because by default a processing step is skipped if its source field is undefined. This attribute is evaluated once for each record.

not_field

This step will only be carried out if the field named by this attribute does not have a defined value. This is the opposite to if_field, and is also evaluated once for each record.

if_vocab

The value of this attribute must be a string containing the names of one or more vocabularies (separated by commas and optional whitespace) that have been defined for this data service. This processing step will only be carried out if one of the specified vocabularies was selected for the request. In contrast to if_field, this attribute is evaluated once for each request at the beginning of processing.

not_vocab

This attribute is the opposite of "if_vocab". This processing step will only be carried out if the selected vocabulary is not one of those specified.

if_format

The value of this attribute must be a string containing the names of one or more output formats (separated by commas and optional whitespace) that have been defined for this data service. This processing step will only be carried out if one of the specified formats was selected for the request. This attribute is evaluated once for each request at the beginning of processing.

not_format

This attribute is the opposite of "if_format". This processing step will only be carried out if the selected format is not one of those specified.

if_block

The value of this attribute must be a string containing the names of one or more output blocks (separated by commas and optional whitespace) that have been defined for this data service. This processing step will only be carried out if one of the specified blocks is included in the request. This attribute is evaluated once for each request at the beginning of processing.

not_block

This attribute is the opposite of "if_block". This processing step will only be carried out if none of the named blocks is included in the request.

select element attributes

A select element is indicated by the presence of the key select. For example:

    { select => 'a.foo, b.bar', tables => 'a, b' }

This element adds the values 'a.foo' and 'b.bar' to the "select list" and 'a' and 'b' to the "tables list". The data service operation methods that you write can then query the request object to obtain either a list or a hash of the unique select values and a hash of the unique table values.

This element was designed with SQL in mind, but you can use it in any way that makes sense in constructing queries for the backend data system regardless of whether or not it is based on SQL. The idea is that your operation methods can use this mechanism to get a list of the fields and tables (or equivalent constructs) necessary for satisfying all of the output blocks that have been selected for this particular query. In this way, a single operation method can satisfy a wide variety of requests.

You can use any of the following attributes in defining a select element:

select

This attribute is required, and we recommend that you always specify it first in order to make clear the element type. The value can be either a string or an array of strings. In the first case, it will be split on the pattern q{\s*,\s*}.

From your data service operation subroutines, you can call any of the relevant methods of the request object (select_list, select_string, select_hash) to retrieve the list of all the select values from all of the output blocks selected for this request, with duplicates removed.

tables

This attribute is optional. The value can either be a string or an array of strings, and is treated exactly like the value of select except that you retrieve the values by calling tables_hash. In most cases, it will make sense to list all of the unique tables (or equivalent constructs, depending upon the backend data system you are using) used by the elements listed in the value of the attribute select.
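
To make the accumulation concrete, here is a stand-alone sketch of how select and tables values from several elements could be merged with duplicates removed. It does not call the request object's methods; the helper to_list and the sample elements are hypothetical.

```perl
use strict;
use warnings;

# Two hypothetical select elements, as might come from two output blocks.
my @elements = (
    { select => 'a.foo, b.bar',     tables => 'a, b' },
    { select => ['a.foo', 'c.baz'], tables => 'a, c' },
);

# Accept either an arrayref or a comma-separated string.
sub to_list {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? @$value : split(/\s*,\s*/, $value);
}

my (@select_list, %seen, %tables);
for my $element (@elements) {
    push @select_list, grep { !$seen{$_}++ } to_list($element->{select});
    $tables{$_} = 1 for to_list($element->{tables});
}

print join(', ', @select_list), "\n";         # prints "a.foo, b.bar, c.baz"
print join(', ', sort keys %tables), "\n";    # prints "a, b, c"
```

An operation method could then build its query from the merged select list and consult the tables hash to decide which joins are needed.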

include element attributes

An include element is indicated by the presence of the attribute include, which must be the only attribute in this element definition. For example:

    { include => 'other_block' }

This definition specifies that all of the elements defined for 'other_block' should be included in the block currently being defined. The value of this attribute must be either a block name or else a value from the output map specified by the attribute "optional_output" from the data service node corresponding to this request. In other words, you can specify which block to include either by its internal name or by the name that clients use to refer to it.

If the name does not correspond to any defined block, then this element is ignored and a warning is generated in the error log.

Ruleset definitions

Each data service may define one or more groups of rules for validating request parameters, called "rulesets". These are defined by calling the define_ruleset method of the data service object.

The first argument to define_ruleset must be a string that provides the name of the ruleset. This must be unique among all of the rulesets defined for this data service. The remaining elements must be either hashrefs or strings: the hashrefs define the individual rules, and the strings provide documentation. For example:

    $ds->define_ruleset( 'filters' =>
        { param => 'lat', valid => DECI_VALUE('-90.0','90.0') },
            "Return all datasets associated with the given latitude.",
        { param => 'lng', valid => DECI_VALUE('-180.0','180.0') },
            "Return all datasets associated with the given longitude.",
        { together => ['lat', 'lng'], errmsg => "you must specify 'lng' and 'lat' together" },
            "If either 'lat' or 'lng' is given, the other must be as well.",
        { param => 'id', valid => POS_VALUE },
            "Return the dataset with the given identifier",
        { param => 'name', valid => STR_VALUE },
            "Return all datasets with the given name");

These rulesets are selected using the node attribute ruleset. When a data service request is handled, its parameters are automatically validated against the ruleset (if any) specified by the corresponding data service node. If the validation fails, then an HTTP 400 error ("bad request") is returned to the client along with one or more error messages indicating how the parameter values should be adjusted. If the validation succeeds but warnings are generated, those warnings are included as part of the response message. If the validation succeeds, the cleaned parameter values are made available through methods of the request object (clean_param, clean_param_list).

Ruleset validation is handled by the module HTTP::Validate. The Web::DataService method alters the ruleset definitions as specified below, and then hands off the resulting rulesets to the latter module. See that module's documentation for more information about how this process works.

Ruleset attributes

Each rule in a ruleset has a rule type. The rule type is indicated by the presence of a hash key corresponding to the type; each rule must have exactly one such key. The rule type keys are as follows:

parameter rules

The following three types of rules define the recognized parameter names.

param

    { param => <parameter_name>, valid => <validator> ... }

If the specified parameter is present, then its value must pass one of the specified validators. If it passes any of them, the rest are ignored. If it does not pass any of them, then an appropriate error message will be generated. If no validators are specified, then the value will be accepted no matter what it is.

optional

    { optional => <parameter_name>, valid => <validator> ... }

An optional rule is identical to a param rule, except that the presence or absence of the parameter will have no effect on whether or not the containing ruleset is fulfilled. A ruleset in which all of the parameter rules are optional will always be fulfilled. This kind of rule is useful in validating URL parameters, especially for GET requests.

A special syntax is available which automatically generates validation rules for any special parameters that have been enabled for this data service. You can provide any of the following as the parameter name in a rule of type optional:

SPECIAL(parameter_name)

This generates a rule for the specified special parameter, with the attributes defaulting to appropriate values. You are free to override these by specifying any of the attributes explicitly. Standard documentation is provided by default, unless you specifically provide your own.

For example, the following definition will change the set of acceptable values for the special parameter linebreak.

    { param => 'SPECIAL(linebreak)', valid => ENUM_VALUE('cr', 'crlf', 'foo') }

You will often use this form with the special parameter show, which is not included in the lists shown below because the set of values it can take may vary. Unless your data service is very simple, you will probably need to define multiple rulesets that specify this parameter, selecting the appropriate validation set for each one. For example, the following definition specifies that the valid values for the special parameter show will be the values defined for the set extra. See the tutorial for an example of how these are used.

    { param => 'SPECIAL(show)', valid => 'extra' }

SPECIAL(all)

This generates a list of rules, one for each enabled special parameter except show and possibly vocab, and except for any special parameters that have been already defined in this ruleset. The show parameter is never included, and vocab is only included if any vocabulary other than default has been defined for this data service. Each of these generated rules includes standard documentation strings. If you wish to override these for certain parameters, define them explicitly using the form discussed above.

SPECIAL(single)

This works similarly to SPECIAL(all), but only includes those parameters that are relevant to single-record results. The parameters limit, offset, and count are skipped.

mandatory

    { mandatory => <parameter_name>, valid => <validator> ... }

A mandatory rule is identical to a param rule, except that the parameter is required to be present with a non-empty value. If it is not, then an error message will be generated. This kind of rule can be useful when validating HTML form parameters.

parameter constraint rules

The following rule types can be used to specify additional constraints on the presence or absence of parameter names.

together

    { together => [ <parameter_name> ... ] }

If one of the listed parameters is present, then all of them must be. This can be used with parameters such as 'longitude' and 'latitude', where neither one makes sense without the other.

at_most_one

    { at_most_one => [ <parameter_name> ... ] }

At most one of the listed parameters may be present. This can be used along with a series of param rules to require that exactly one of a particular set of parameters is provided.
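
The two constraints above come down to counting how many of the listed parameters are present. The following sketch is illustrative only: the helpers check_together and check_at_most_one are hypothetical and are not part of HTTP::Validate.

```perl
use strict;
use warnings;

# 'together': either none of the listed parameters is present, or all are.
sub check_together {
    my ($params, @names) = @_;
    my $present = grep { exists $params->{$_} } @names;
    return $present == 0 || $present == @names;
}

# 'at_most_one': zero or one of the listed parameters is present.
sub check_at_most_one {
    my ($params, @names) = @_;
    my $present = grep { exists $params->{$_} } @names;
    return $present <= 1;
}

my %request = ( lat => '45.0', id => 17 );

print check_together(\%request, 'lat', 'lng')   ? "ok\n" : "rejected\n";   # prints "rejected"
print check_at_most_one(\%request, 'id', 'name') ? "ok\n" : "rejected\n";  # prints "ok"
```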

ignore

    { ignore => [ <parameter_name> ... ] }

The specified parameter or parameters will be ignored if present, and will not be included in the set of reported parameter values. This rule can be used to prevent requests from being rejected with "unrecognized parameter" errors in cases where spurious parameters may be present. If you are specifying only one parameter name, it need not be in a listref.

inclusion rules

The following rule types can be used to include one ruleset inside of another. This allows you, for example, to define rulesets for validating different groups of parameters and then combine them into specific rulesets for use with different URL paths.

It is okay for an included ruleset to itself include other rulesets. However, any given ruleset is checked only once per validation no matter how many times it is included.

allow

    { allow => <ruleset_name> }

A rule of this type is essentially an 'include' statement. If this rule is encountered during a validation, it causes the named ruleset to be checked immediately. It must pass, but does not have to be fulfilled.

require

    { require => <ruleset_name> }

This is a variant of allow, with an additional constraint. The validation will fail unless the named ruleset not only passes but is also fulfilled by the parameters. You could use this, for example, with a query-type URL in order to require that the query not be empty but instead contain at least one significant criterion. The parameters that count as "significant" would be declared by param rules, the others by optional rules.

inclusion constraint rules

The following rule types can be used to specify additional constraints on the inclusion of rulesets.

require_one

    { require_one => [ <ruleset_name> ... ] }

You can use a rule of this type to place an additional constraint on a list of rulesets already included with allow rules. Exactly one of the named rulesets must be fulfilled, or else the request is rejected. You can use this, for example, to ensure that a request includes either a parameter from group A or one from group B, but not both.

require_any

    { require_any => [ <ruleset_name> ... ] }

This is a variant of require_one. At least one of the named rulesets must be fulfilled, or else the request will be rejected.

allow_one

    { allow_one => [ <ruleset_name> ... ] }

Another variant of require_one. The request will be rejected if more than one of the listed rulesets is fulfilled, but will pass if either none of them or just one of them is fulfilled. This can be used to allow optional parameters from either group A or group B, but not from both groups.
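
The difference between the three inclusion constraints depends only on how many of the listed rulesets are fulfilled. The sketch below tabulates this; the helper constraint_ok is hypothetical and merely restates the descriptions above.

```perl
use strict;
use warnings;

# Hypothetical helper: given the number of listed rulesets that were
# fulfilled, report whether the named constraint is satisfied.
sub constraint_ok {
    my ($type, $fulfilled) = @_;
    return $fulfilled == 1 if $type eq 'require_one';
    return $fulfilled >= 1 if $type eq 'require_any';
    return $fulfilled <= 1 if $type eq 'allow_one';
    die "unknown constraint type '$type'";
}

for my $count (0 .. 2) {
    my @verdicts;
    for my $type (qw(require_one require_any allow_one)) {
        push @verdicts, sprintf('%s=%s', $type,
                                constraint_ok($type, $count) ? 'pass' : 'fail');
    }
    print "$count fulfilled: @verdicts\n";
}
```

With no rulesets fulfilled only allow_one passes, with exactly one all three pass, and with two only require_any passes.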

Other attributes

Any rule definition may also include one or more of the following keys:

errmsg

This key specifies the error message to be returned if the rule fails, overriding the default message. For example:

    $ds->define_ruleset( 'specifier' => 
        { param => 'name', valid => STRING_VALUE },
        { param => 'id', valid => POS_VALUE });
    
    $ds->define_ruleset( 'my_operation' =>
        { require => 'specifier', 
          errmsg => "you must specify either of the parameters 'name' or 'id'" });

Error messages may include any of the following placeholders: {param}, {value}. When included with a parameter rule these are replaced by the parameter name and original parameter value(s), single-quoted. When used with other rules, {param} is replaced by the full list of relevant parameters or ruleset names, quoted and separated by commas. This feature allows you to define common messages once and use them with multiple rules.
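
The placeholder substitution for a parameter rule can be pictured with the following stand-alone sketch. The helper fill_message is hypothetical; it only demonstrates the single-quoting behavior described above and is not the code used by HTTP::Validate.

```perl
use strict;
use warnings;

# Hypothetical helper: replace {param} and {value} with the single-quoted
# parameter name and original value.
sub fill_message {
    my ($template, $param, $value) = @_;
    my %subst = ( param => "'$param'", value => "'$value'" );
    (my $message = $template) =~ s/\{(param|value)\}/$subst{$1}/g;
    return $message;
}

my $template = "the value of {param} must be a positive integer (was {value})";
print fill_message($template, 'id', 'abc'), "\n";
# prints: the value of 'id' must be a positive integer (was 'abc')
```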

warn

This key causes a warning to be generated rather than an error if the rule fails. Unlike errors, warnings do not cause a request to be rejected. Instead, they will automatically be returned as part of the data service response.

If the value of this key is 1, then what would otherwise be the error message will be used as the warning message. Otherwise, the specified string will be used as the warning message.

key

The key 'key' specifies the name under which any information generated by the rule will be saved. For a parameter rule, the cleaned value will be saved under this name. For all rules, any generated warnings or errors will be stored under the specified name instead of the parameter name or rule number. This allows you to easily determine after a validation which warnings or errors were generated.

The following keys can be used only with rules of type param, optional or mandatory:

valid

This key specifies the domain of acceptable values for a parameter. The value must be either a single string, a single code reference, or a list of code references. If you provide a string, it must be the name of a set previously defined for this data service. Otherwise, you can either select from the list of built-in validator functions or provide your own.

If the parameter named by this rule is present, it must pass at least one of the specified validators or else an error message will be generated. If multiple validators are given, then the error message returned will be the one generated by the last validator in the list. This can be overridden by using the "errmsg" key.

multiple

This key specifies that the parameter may appear multiple times in the request. Without this directive, multiple values for the same parameter will generate an error. For example:

    $ds->define_ruleset( 'identifiers' => 
        { param => 'id', valid => POS_VALUE, multiple => 1 });

If this directive is used, then the cleaned value of the parameter will be a list if at least one valid value was found and undef otherwise. If you wish a request to be considered valid even if some of the values fail the validator, then either use the "list" key instead or include a "warn" key as well.

split

This directive has the same effect as "multiple", and in addition causes each parameter value string to be split ("split" in perlfunc) as indicated by the value of the directive. If this value is a string, then it will be compiled into a regexp preceded and followed by \s*. So in the following example:

    $ds->define_ruleset( 'identifiers' =>
        { param => 'id', valid => POS_VALUE, split => ',' });

The value string will be considered to be valid if it contains one or more positive integers separated by commas and optional whitespace. Empty strings between separators are ignored.

    123,456             # returns [123, 456]
    123 , ,456          # returns [123, 456]
    , 456               # returns [456]
    123 456             # not valid
    123:456             # not valid

If you wish more precise control over the separator expression, you can pass a regexp quoted with qr instead.
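
The results listed above can be reproduced with a few lines of plain Perl. This is a sketch under stated assumptions: is_positive_integer is a stand-in for POS_VALUE (assumed here to accept only positive integers), and clean_list is a hypothetical helper, not the routine used by HTTP::Validate.

```perl
use strict;
use warnings;

# A string separator such as ',' is wrapped in optional whitespace.
my $separator = ',';
my $pattern   = qr{\s*$separator\s*};

# Stand-in for POS_VALUE, assumed to mean "a positive integer".
sub is_positive_integer {
    my ($value) = @_;
    return $value =~ /^\d+$/ && $value > 0;
}

# Returns an arrayref of values, or undef if any value is invalid.
sub clean_list {
    my ($string) = @_;
    my @values = grep { $_ ne '' } split $pattern, $string;   # drop empty strings
    return undef if grep { !is_positive_integer($_) } @values;
    return \@values;
}

for my $input ('123,456', '123 , ,456', ', 456', '123 456', '123:456') {
    my $result = clean_list($input);
    printf "%-14s %s\n", "'$input'", $result ? "[@$result]" : 'not valid';
}
```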

list

This directive has the same effect as "split", but generates warnings instead of error messages when invalid values are encountered (as if warn => 1 was also specified). The resulting cleaned value will be a list containing any values which pass the validator, or undef if no valid values were found. See also "warn".

alias

This directive specifies one or more aliases for the parameter name (use a listref for multiple aliases). These names may be used interchangeably in requests, but any request that contains more than one of them will be rejected with an appropriate error message unless "multiple" is also specified.

default

This directive specifies a default value for the parameter, which will be reported if no value is specified in the request. If the rule also includes a validator, the specified default value will be passed to it and the resulting cleaned value, if any, will be used. An exception will be thrown at the time of rule definition if the default value does not pass the validator.

AUTHOR

mmcclenn "at" cpan.org

BUGS

Please report any bugs or feature requests to bug-web-dataservice at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Web-DataService. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright 2014 Michael McClennen, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.