NAME

Web::DataService::Configuration::Output - how to configure output blocks

SYNOPSIS

This page describes the role that output blocks play in the Web::DataService framework, and how to configure them. It includes a list of the attributes that you can use to define them.

OUTPUT BLOCK DEFINITIONS

Each data service may define one or more groups of output elements, called "output blocks". These are defined by calling the define_block method of a data service object. These output elements specify which data fields should be included in the result, how they should be labeled, and how they should be processed.

The first argument to define_block must be a string that provides the name of the output block. This must be unique among all of the output blocks defined for this data service. The remaining elements must be either hashrefs or strings: the hashrefs define the individual elements of the block, and the strings provide documentation. For example:

    $ds->define_block( 'basic' =>
        { output => 'name' },
            "The name of the state",
        { output => 'abbrev' },
            "The standard abbreviation for the state",
        { output => 'region' },
            "The region of the country in which the state is located",
        { output => 'pop2010' },
            "The population of the state in 2010");

This call defines an output block called 'basic', with four elements. Each of these elements represents an output field.

When a data service request is handled, the data service operation method is expected to construct and execute the appropriate query and then pass back either a list of output records (as a listref whose elements are hashrefs) or a DBI statement handle from which the output records can be retrieved. Each of the output records will be processed and included in the data service result according to the list of output blocks that have been selected for this request, as interpreted by the serialization routine corresponding to the selected output format.
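To make the shape of the data concrete, here is a minimal sketch in plain Perl. It does not use Web::DataService itself: the helper apply_block, the field list, and the sample records (including the extra row_id field and all values) are hypothetical, and only illustrate how a listref of hashrefs might be reduced to the fields named by a block.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 'basic' block: its field names, in order.
my @basic_fields = qw(name abbrev region pop2010);

# Records in the shape an operation method passes back: a listref of hashrefs.
# The values are illustrative sample data.
my $records = [
    { name => 'Wisconsin', abbrev => 'WI', region => 'MW', pop2010 => 5686986, row_id => 17 },
    { name => 'Vermont',   abbrev => 'VT', region => 'NE', pop2010 => 625741,  row_id => 45 },
];

# Keep only the fields named by the block, as [field, value] pairs in block order.
sub apply_block {
    my ($fields, $record) = @_;
    return [ map { [ $_, $record->{$_} ] } @$fields ];
}

my @result = map { apply_block( \@basic_fields, $_ ) } @$records;
```

Fields that no selected block mentions, such as row_id here, never reach the result.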

There are four categories of output elements, listed below. Each category is defined by the presence of a hash key corresponding to the element type. Each element must contain exactly one of these keys, or else an error will be thrown at startup time.

output

An "output" element specifies a single data field to be included in a data service result. The value of the key output gives the internal name of this field, generally the name by which the field is known to the backend data store. Other keys may be used to specify the name under which this field will be included in the result, and yet other keys can be used to specify conditions under which it will or will not be included in the result. This is the only kind of element that is required in order to produce data service output; the others are there for the convenience of the application programmer.

set

A "set" element, also called a "process" element, indicates a processing step to be carried out on the data before it is included in the result. The value of the key set specifies which field's value is to be altered.

select

A "select" element specifies a list of strings that can be retrieved by the various data service operation methods and used to construct queries on the backend data store. Use of this element is optional. The value of the key select must be either a comma-separated string or an arrayref of strings, each of which contains a field specification, e.g. for an SQL SELECT statement. The idea is that these should include all of the fields that are necessary in order to generate the output of this block. A data service operation method can then call one of the methods select_list, select_hash or select_string on the request object in order to retrieve the entire set of fields (with duplicates removed) that will satisfy all of the output blocks that have been selected for this particular request. Other keys (see below) can be used to specify auxiliary information such as SQL table names.

include

An "include" element can be used to include the definition of one block inside another. The value of the key include must be the name of another output block defined for this data service; the "include" element will be replaced by a list of all of the elements from the named block.

It is important to note that two lists of elements are generated for each request: a list of process ("set") elements, and a list of output elements. These are taken from the fixed output block(s) first, and then from any optional blocks in the order they were specified (not in the order they were defined!). All of the process elements are applied first, and then the output list is used to determine the serialized output for the record.
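The two-phase order can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the framework's code: the lists @process_list and @output_list and the helper process_record are hypothetical, and the only process attribute handled is lookup.

```perl
use strict;
use warnings;

# Hypothetical element lists, as if collected from the selected blocks.
my @process_list = (
    { set => 'region_name', from => 'region',
      lookup => { NE => 'Northeast', MW => 'Midwest' } },
);
my @output_list = (
    { output => 'name' },
    { output => 'region_name' },
);

sub process_record {
    my ($record) = @_;

    # Phase 1: apply every process element to the record.
    for my $step (@process_list) {
        my $source = $record->{ $step->{from} };
        next unless defined $source;
        $record->{ $step->{set} } = $step->{lookup}{$source};
    }

    # Phase 2: read the output fields from the (now modified) record.
    return [ map { [ $_->{output}, $record->{ $_->{output} } ] } @output_list ];
}

my $out = process_record( { name => 'Vermont', region => 'NE' } );
```

Because every process step runs before any output element is read, an output element can refer to a field (region_name here) that exists only as the result of a process step.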

OUTPUT BLOCK ATTRIBUTES

The attributes that can be used to configure output are listed in the following sections, one section for each element type.

Output elements

An output element is indicated by the presence of the key output. For example:

    { output => 'foo', dedup => 'bar', long_name => 'foodlerizer' }

This particular element declares that each output record will include the data field 'foo', but only if its value differs from the value of the field 'bar'. If the vocabulary 'long' has been selected for this request, then the field will be labeled 'foodlerizer' in the generated output. Otherwise, the label will default to the field name ('foo').

You may use any of the following attributes in specifying output elements. All of the attributes except for 'output' are optional.

output

This attribute is required, and we recommend that you always specify it first in order to make clear the element type. The attribute value will be used as a hash key to look up the value for this field in each output record. Thus, it should always correspond to one of the field names used by the backend data store.

name

The value of this attribute must be a string. This value will be used as the label for this field in the generated result, unless a vocabulary-specific name is selected. If this attribute is not specified, then the label will default to the value of output.

<vocab_name>

An attribute of this form specifies the label which will be used for this element in the generated result if the corresponding vocabulary is selected. For example:

    { output => 'occurrence_no', dwc_name => 'occurrenceID', com_name => 'oid' },

If the vocabulary dwc is selected for a request that includes this output field, then the field will be labeled as occurrenceID. On the other hand, if the vocabulary com is selected, then the field will be labeled as oid. If neither of these vocabularies is selected, then the field will be labeled occurrence_no. The manner in which this label is expressed depends upon the output format.

If the selected vocabulary does not include the attribute use_field_names, and if the element has no corresponding <vocab>_name attribute, then the field will be left out of the output. This provides for the case in which some vocabularies may not have any way of expressing some of the data fields.
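The labeling rules above can be summarized in a short sketch. The helper label_for is hypothetical, as are the vocabulary names 'plain' and 'other'; the assumption that name is consulted before output when use_field_names is set is ours, not something the framework documents here.

```perl
use strict;
use warnings;

my $element = { output => 'occurrence_no', dwc_name => 'occurrenceID', com_name => 'oid' };

# Return the label for an element under the selected vocabulary, or undef if
# the element should be left out. $use_field_names mirrors the vocabulary
# attribute of the same name.
sub label_for {
    my ($el, $vocab, $use_field_names) = @_;
    my $specific = $el->{ $vocab . '_name' };
    return $specific if defined $specific;
    return $use_field_names ? ( $el->{name} // $el->{output} ) : undef;
}

my $dwc_label     = label_for( $element, 'dwc',   0 );   # vocabulary-specific label
my $default_label = label_for( $element, 'plain', 1 );   # falls back to the field name
my $omitted       = label_for( $element, 'other', 0 );   # no label, so the field is omitted
```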

value

The value of this attribute must be a string. If specified, then this value will be output as the value of this field in every record, regardless of any value retrieved from the backend data store. The purpose of this attribute is to generate constant-valued fields such as record type indicators.

<vocab_value>

An attribute of this form specifies the value to be used for this element if the corresponding vocabulary is selected. The purpose of such attributes is to generate constant-valued fields whose value is appropriate to the selected vocabulary. See "<vocab_name>".

dedup

The value of this attribute must be the name of another data field, which need not correspond to any output element. If the value of the data field named by output is identical to the value of the field named by dedup, then this output element will be ignored. You can use this if you wish to prevent two different fields with the same value from appearing in a single output record. This condition is evaluated independently for each record that is output.

sub_record

The value of this attribute must be the name of another output block defined for this data service. This attribute is only used if the data value is itself a hashref, and if the selected output format can express hierarchical data (e.g. JSON). In that case, the hashref will be interpreted as a sub-record according to the specified block.

always

If this attribute is given a true value, then this element will always be included in the output even if its value is undefined. By default, the JSON format omits from each record any fields whose values are undefined. Custom output formats may do this as well, depending upon their implementation.

if_field

The value of this attribute must be the name of another data field, which need not correspond to any output element. If the named field has a defined value, then this output element will be included in the current output record. Otherwise, it will be omitted. You can use this to output field B only in records where field A has a value. This attribute is evaluated independently for each record that is output.

not_field

This attribute is the inverse of "if_field". If the named field has a defined value, then this output element will be ignored. You can use this to output field B only for those records in which field A does not have a value.
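The three per-record conditions (dedup, if_field and not_field) can be pictured as a single predicate that is evaluated for every record. The sketch below is plain Perl with a hypothetical helper element_applies and made-up field names; it assumes that "identical" means string equality of two defined values.

```perl
use strict;
use warnings;

# Decide whether one output element applies to one record.
sub element_applies {
    my ($el, $record) = @_;

    # if_field: the named field must have a defined value
    return 0 if defined $el->{if_field} && !defined $record->{ $el->{if_field} };

    # not_field: the named field must NOT have a defined value
    return 0 if defined $el->{not_field} && defined $record->{ $el->{not_field} };

    # dedup: skip the element when its value equals the value of the dedup field
    if ( defined $el->{dedup} ) {
        my ($x, $y) = @{$record}{ $el->{output}, $el->{dedup} };
        return 0 if defined $x && defined $y && $x eq $y;
    }
    return 1;
}

my $record = { accepted_name => 'Felis', original_name => 'Felis', rank => 'genus' };

my $dup_shown  = element_applies( { output => 'original_name', dedup => 'accepted_name' }, $record );
my $rank_shown = element_applies( { output => 'rank', if_field => 'accepted_name' }, $record );
```

Here $dup_shown is false because both name fields hold the same value, while $rank_shown is true because accepted_name is defined.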

if_vocab

The value of this attribute must be a string containing the names of one or more vocabularies (separated by commas and optional whitespace) that have been defined for this data service. This output element will only be included in the result if one of the specified vocabularies was selected for the request. In contrast to if_field, this attribute is evaluated once for each request at the beginning of processing.

not_vocab

This attribute is the inverse of "if_vocab". This element will only be included in the result if the selected vocabulary is not one of those specified.

if_format

The value of this attribute must be a string containing the names of one or more output formats (separated by commas and optional whitespace) that have been defined for this data service. This element will only be included in the result if the selected output format is one of these. This attribute is evaluated once for each request at the beginning of processing.

not_format

This attribute is the inverse of "if_format". This element will not be included in the result if the selected output format is one of these.

if_block

The value of this attribute must be a string containing the names and/or keys of one or more output blocks (separated by commas and optional whitespace) that have been defined for this data service. This element will only be included in the result if at least one of those blocks is included. This attribute is evaluated once for each request at the beginning of processing.

not_block

This attribute is the inverse of "if_block". This element will not be included in the result if any of the named blocks is.
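Unlike the per-record conditions, the if_vocab, if_format and if_block family depends only on the request, so it can be evaluated once up front. The following plain-Perl sketch shows one way to do that; the helpers name_set and element_selected and the shape of the $request hash are hypothetical.

```perl
use strict;
use warnings;

# Turn a condition string such as 'dwc, com' into a lookup hash.
sub name_set {
    my ($string) = @_;
    my %set = map { ( $_ => 1 ) } grep { length $_ } split /\s*,\s*/, $string;
    return \%set;
}

# Evaluate the if_* / not_* attributes of one element against a request.
sub element_selected {
    my ($el, $req) = @_;

    # vocab and format: the request has exactly one selected value of each
    for my $kind (qw(vocab format)) {
        my $chosen = $req->{$kind};
        return 0 if defined $el->{"if_$kind"}  && !name_set( $el->{"if_$kind"} )->{$chosen};
        return 0 if defined $el->{"not_$kind"} &&  name_set( $el->{"not_$kind"} )->{$chosen};
    }

    # block: the request may include several blocks
    if ( defined $el->{if_block} ) {
        my $set = name_set( $el->{if_block} );
        return 0 unless grep { $set->{$_} } @{ $req->{blocks} };
    }
    if ( defined $el->{not_block} ) {
        my $set = name_set( $el->{not_block} );
        return 0 if grep { $set->{$_} } @{ $req->{blocks} };
    }
    return 1;
}

my $request = { vocab => 'dwc', format => 'json', blocks => [ 'basic', 'extra' ] };
my $kept    = element_selected( { output => 'x', if_vocab => 'dwc, com' }, $request );
```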

text_join

This attribute is only used when the selected output format is a text-based one such as CSV. Its value must be a string. When generating the output for any record where the value of this element's data field is an array, the values will be joined together using the specified string. If this attribute is not specified, it defaults to ", ".

xml_join

This attribute is similar to "text_join", and is used when the selected output format is XML.

show_as_list

This attribute is only used when the selected output format is JSON. If it is given a true value, then this output element will be represented as an array even if the data field contains a single value.
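The effect of text_join and show_as_list on a field value can be sketched as two small functions. Both helpers (text_value and json_value) are hypothetical; a real serializer does considerably more.

```perl
use strict;
use warnings;

# Text formats: flatten an array value into one string using text_join,
# which defaults to ", ".
sub text_value {
    my ($el, $value) = @_;
    return $value unless ref($value) eq 'ARRAY';
    my $separator = $el->{text_join} // ', ';
    return join $separator, @$value;
}

# JSON with show_as_list: wrap a single value in an array.
sub json_value {
    my ($el, $value) = @_;
    return [$value] if $el->{show_as_list} && ref($value) ne 'ARRAY';
    return $value;
}

my $joined  = text_value( { output => 'tags', text_join => '; ' }, [ 'a', 'b' ] );
my $wrapped = json_value( { output => 'tags', show_as_list => 1 }, 'a' );
```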

doc_string

You can set this attribute either directly or by including one or more documentation strings after the element-definition hash in the call to define_block. This value will be used to auto-generate documentation describing the output of the various data service operations whose output can include this block.

undocumented

If this attribute is given a true value, then this element will be left out of any auto-generated documentation. It will still appear in data operation results.

Process elements

A process element is indicated by the presence of the key set. For example:

    { set => 'foo', from => 'bar', code => 'translate' }

This particular element causes the following action to happen before each record is output: the method translate of the request object is called and is passed the value of the data field bar. The result is stored in the data field foo, which need not have had any value until then.
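As a rough picture of what such a step does, consider the following sketch. The class My::Request, its translate method (which merely uppercases its argument), and the helper run_code_step are all hypothetical stand-ins; in a real data service the request object and the dispatch are supplied by the framework.

```perl
use strict;
use warnings;

# Hypothetical request class with a 'translate' method.
package My::Request;
sub new { return bless {}, shift }
sub translate {
    my ($request, $value) = @_;
    return uc $value;                 # placeholder transformation
}

package main;

my $element = { set => 'foo', from => 'bar', code => 'translate' };

sub run_code_step {
    my ($request, $el, $record) = @_;
    my $source = $record->{ $el->{from} // $el->{set} };
    return unless defined $source;    # skipped by default when the source is undefined

    # A string names a request method; a code reference is called directly,
    # with the request object as its first argument.
    my $code = $el->{code};
    $record->{ $el->{set} } = ref($code) eq 'CODE'
        ? $code->( $request, $source )
        : $request->$code($source);
    return;
}

my $record = { bar => 'abc' };
run_code_step( My::Request->new, $element, $record );
```

After the call, the record holds the translated value in foo, and bar is unchanged.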

You may use any of the attributes listed below in specifying process elements. The attribute set specifies the target of the operation, while one of the attributes from or from_each specifies the source. If neither of these attributes is specified, then the target field is processed in place (i.e. the source and target will be the same). The source and/or target may be specified as '*', meaning the entire record.

All attributes except for 'set' are optional. A single process element may have at most one of the attributes code, lookup, split and join.

set

This attribute is required, and we recommend that you always specify it first in order to make clear the element type. If the value is any non-empty string other than '*', the data field named by this string will be used as the target of this processing step. If the value is '*', then no target is set. This special value is useful mainly in conjunction with the attribute code, causing the specified subroutine to be passed a reference to the record as a whole. It can then modify the record arbitrarily.

from

The value of this attribute must be a non-empty string. The value of the data field named by this string will be used as the "source value" for this processing step. If the value is '*', then a reference to the entire record will be passed as the "source value".

from_each

The value of this attribute must be a non-empty string. All values stored in the field named by this string will be used as source values for this processing step: if the value is an array, the step will be carried out on each value in turn. If the value is a scalar, it will be carried out on that value. If a single value results, the target field will be set to that value. If more than one value results, the target field will be set to an arrayref whose contents are the result values. If no values result, the target field will be set to undef. This attribute is not valid if the target is '*'.
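The rules for storing the results of from_each (one result becomes a scalar, several become an arrayref, none becomes undef) can be sketched as below. The helper apply_from_each is hypothetical, and dropping undefined results before counting them is an assumption on our part.

```perl
use strict;
use warnings;

# Apply $step to every source value and store the result in the target field.
sub apply_from_each {
    my ($record, $target, $source_field, $step) = @_;
    my $source = $record->{$source_field};

    # An array contributes each of its values; a defined scalar contributes itself.
    my @inputs = ref($source) eq 'ARRAY' ? @$source
               : defined($source)        ? ($source)
               :                           ();

    my @results = grep { defined $_ } map { $step->($_) } @inputs;

    # Several results: arrayref. One result: that value. No results: undef.
    $record->{$target} = @results > 1 ? \@results : $results[0];
    return;
}

my $record = { ids => [ 1, 2, 3 ], one => 5 };
my $double = sub { $_[0] * 2 };

apply_from_each( $record, 'doubled_ids', 'ids',     $double );
apply_from_each( $record, 'doubled_one', 'one',     $double );
apply_from_each( $record, 'nothing',     'missing', $double );
```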

code

The value of this attribute must either be the name of a request method (almost always one which you have written as part of a data service operation role) or a code reference. It will be called with the request object as the first argument, and the source value as the second. The source value will be the value of the source field, if one is specified, or a reference to the entire record if set => '*' or from => '*' is also specified. The result of this subroutine call will be stored in the target field, unless the target is '*'.

You can use this powerful functionality to arbitrarily alter the data records before they are output.

lookup

The value of this attribute must be a hashref. The source value will be looked up in this hashref, and the resulting value stored in the target field. If the source value does not occur as a hash key, and the attribute "default" was also specified, its value will be used instead. This attribute is not valid if either the source or the target is '*'.

default

The value of this attribute will be used as the result of this processing step if the source value does not appear in the hashref specified by "lookup".

split

The source value will be split according to the value of this attribute, and the target will be set to the resulting list of values. You can use this with either from or from_each; in the latter case all of the resulting lists are concatenated together. This attribute is not valid if either the source or the target is '*'.

join

The source value(s) will be joined together using the value of this attribute, and the target will be set to the resulting string. This attribute is only valid in conjunction with from, and is not valid if either the source or the target is '*'.
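In plain Perl terms, split and join steps behave roughly like the two helpers below. Both helpers are hypothetical, and passing the split value as a compiled regular expression is an assumption; this page does not say whether the attribute is treated as a pattern or as a literal string.

```perl
use strict;
use warnings;

# split: split every source value and concatenate the resulting lists
# (more than one source value corresponds to the from_each case).
sub split_step {
    my ($pattern, @sources) = @_;
    return [ map { split $pattern, $_ } @sources ];
}

# join: join an array value into a single string; a scalar passes through.
sub join_step {
    my ($separator, $source) = @_;
    return ref($source) eq 'ARRAY' ? join( $separator, @$source ) : $source;
}

my $parts  = split_step( qr/\s*;\s*/, 'a; b', 'c' );
my $joined = join_step( '|', [ 'x', 'y' ] );
```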

always

If this attribute is given a true value, then the processing step will be carried out whether or not the source value is defined. By default, this step is skipped if the source value is not defined.

if_field

This step will only be carried out if the field named by this attribute has a defined value. This attribute only makes sense if it specifies a field other than the source field, because by default a processing step is skipped if its source field is undefined. This attribute is evaluated once for each record.

not_field

This step will only be carried out if the field named by this attribute does not have a defined value. This is the inverse of if_field, and is also evaluated once for each record.

if_vocab

The value of this attribute must be a string containing the names of one or more vocabularies (separated by commas and optional whitespace) that have been defined for this data service. This processing step will only be carried out if one of the specified vocabularies was selected for the request. In contrast to if_field, this attribute is evaluated once for each request at the beginning of processing.

not_vocab

This attribute is the inverse of "if_vocab". This processing step will only be carried out if the selected vocabulary is not one of those specified.

if_format

The value of this attribute must be a string containing the names of one or more output formats (separated by commas and optional whitespace) that have been defined for this data service. This processing step will only be carried out if one of the specified formats was selected for the request. This attribute is evaluated once for each request at the beginning of processing.

not_format

This attribute is the inverse of "if_format". This processing step will only be carried out if the selected format is not one of those specified.

if_block

The value of this attribute must be a string containing the names of one or more output blocks (separated by commas and optional whitespace) that have been defined for this data service. This processing step will only be carried out if one of the specified blocks is included in the request. This attribute is evaluated once for each request at the beginning of processing.

not_block

This attribute is the inverse of "if_block". This processing step will only be carried out if none of the named blocks is included in the request.

Select elements

A select element is indicated by the presence of the key select. For example:

    { select => 'a.foo, b.bar', tables => 'a, b' }

This element adds the values 'a.foo' and 'b.bar' to the "select list" and 'a' and 'b' to the "tables list". The data service operation methods that you write can then query the request object to obtain either a list or a hash of the unique select values and a hash of the unique table values.

This element was designed with SQL in mind, but you can use it in any way that makes sense in constructing queries for the backend data system regardless of whether or not it is based on SQL. The idea is that your operation methods can use this mechanism to get a list of the fields and tables (or equivalent constructs) necessary for satisfying all of the output blocks that have been selected for this particular query. In this way, a single operation method can satisfy a wide variety of requests.

You can use any of the following attributes in defining a select element:

select

This attribute is required, and we recommend that you always specify it first in order to make clear the element type. The value can be either a string or an array of strings. In the first case, it will be split on the pattern q{\s*,\s*}.

From your data service operation subroutines, you can call any of the relevant methods of the request object (select_list, select_string, select_hash) to retrieve the list of all the select values from all of the output blocks selected for this request, with duplicates removed.

tables

This attribute is optional. The value can either be a string or an array of strings, and is treated exactly like the value of select except that you retrieve the values by calling tables_hash. In most cases, it will make sense to list all of the unique tables (or equivalent constructs, depending upon the backend data system you are using) used by the elements listed in the value of the attribute select.
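The following sketch imitates, in plain Perl, how the select and tables values from several blocks might be merged into a duplicate-free list and a hash. It does not call the request methods named above; the helper values_of and the sample elements are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical select elements gathered from two selected blocks.
my @select_elements = (
    { select => 'a.foo, b.bar',        tables => 'a, b' },
    { select => [ 'a.foo', 'c.baz' ],  tables => 'a, c' },
);

# Normalize a comma-separated string or an arrayref into a list of values.
sub values_of {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? @$value : split( /\s*,\s*/, $value );
}

my ( @select_list, %seen, %tables );
for my $el (@select_elements) {
    # Keep the first occurrence of each select value, in order.
    push @select_list, grep { !$seen{$_}++ } values_of( $el->{select} );
    $tables{$_} = 1 for values_of( $el->{tables} // [] );
}

my $select_string = join ', ', @select_list;
```

An operation method could then interpolate a string like $select_string into its query and use the keys of %tables to decide which joins are needed.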

Include elements

An include element is indicated by the presence of the attribute include, which must be the only attribute in this element definition. For example:

    { include => 'other_block' }

This definition specifies that all of the elements defined for 'other_block' should be included in the block currently being defined. The value of this attribute must be either a block name or else a value from an output map defined for this data service. In other words, you can specify which block to include either by its internal name or by the name that clients use to refer to it.

If the name does not correspond to any defined block, then this element is ignored and a warning is generated in the error log.
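Include expansion amounts to replacing each include element with the elements of the named block. The sketch below uses a hypothetical %blocks registry and expand_block helper, and collects warnings in an array instead of writing to an error log; the guard against circular includes is our own addition and is not described on this page.

```perl
use strict;
use warnings;

# Hypothetical registry of block definitions, keyed by block name.
my %blocks = (
    basic => [ { output => 'name' }, { output => 'abbrev' } ],
    full  => [ { include => 'basic' }, { output => 'pop2010' }, { include => 'no_such_block' } ],
);

my @warnings;

# Return the elements of a block with every include element expanded.
sub expand_block {
    my ($name, %active) = @_;
    return () if $active{$name};          # guard against circular includes
    $active{$name} = 1;

    my @elements;
    for my $el ( @{ $blocks{$name} } ) {
        if ( !exists $el->{include} ) {
            push @elements, $el;
        }
        elsif ( exists $blocks{ $el->{include} } ) {
            push @elements, expand_block( $el->{include}, %active );
        }
        else {
            # Unknown block: ignore the element and record a warning.
            push @warnings, "unknown block '$el->{include}' ignored";
        }
    }
    return @elements;
}

my @full = expand_block('full');
```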

AUTHOR

mmcclenn "at" cpan.org

BUGS

Please report any bugs or feature requests to bug-web-dataservice at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Web-DataService. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright 2014 Michael McClennen, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.