NAME

ZOOM::IRSpy::WebService - Accessing the IRSpy database as a Web Service

INTRODUCTION

Because IRSpy keeps its information about targets as ZeeRex records in a Zebra database, that information is available via the SRU and SRW web services. These two services are very closely related: the former is REST-like, based on HTTP GET URLs, and the latter is SOAP-based. Both use the same query language (CQL) and the same XML-based result formats.

(In addition, Zebra provides ANSI/NISO Z39.50 services, but these are not further discussed here.)

EXAMPLE

Here is an example SRU URL that accesses the IRSpy database of the live system (although it will not be accessible to most clients due to firewall issues). It is broken across lines for clarity:

        http://irspy.indexdata.com:8018/IR-Explain---1?
                version=1.1&
                operation=searchRetrieve&
                query=net.port=3950&
                maximumRecords=10&
                recordSchema=zeerex

It is beyond the scope of this document to provide a full SRU tutorial, but briefly, the URL above consists of the following parts:

http://irspy.indexdata.com:8018

The base-URL of the SRU server.

IR-Explain---1

The name of the SRU database.

version=1.1, operation=searchRetrieve, etc.

SRU parameters specifying the operation requested.

The parameters are as follows:

version=1.1

Mandatory - SRU requests must contain an explicit version identifier, and Zebra supports only version 1.1.

operation=searchRetrieve

Mandatory - SRU requests must contain an operation. Zebra supports several, as discussed below.

query=net.port=3950

When the operation is searchRetrieve, a query must be specified. The query is always expressed in CQL (Common Query Language), which Zebra's IRSpy database supports as described below.

maximumRecords=10

Optional. Specifies how many records to include in a search response. When omitted, defaults to zero: the response includes a hit-count but no records.

recordSchema=zeerex

Optional. Specifies what format the included XML records, if any, should be in. If omitted, defaults to "dc" (Dublin Core). Zebra's IRSpy database supports several schemas as described below.
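The URL described above can also be assembled programmatically. The following Python sketch is illustrative only: the helper name build_sru_url is hypothetical and not part of IRSpy, and it assumes nothing beyond the base URL, database name and parameters shown in the example. It uses the standard library to percent-encode the CQL query.

```python
from urllib.parse import urlencode

# Base URL and database name taken from the example above.
BASE_URL = "http://irspy.indexdata.com:8018"
DATABASE = "IR-Explain---1"


def build_sru_url(query, maximum_records=10, record_schema="zeerex"):
    """Return a searchRetrieve URL (hypothetical helper, not part of IRSpy)."""
    params = {
        "version": "1.1",               # mandatory; Zebra supports only 1.1
        "operation": "searchRetrieve",  # mandatory
        "query": query,                 # CQL; urlencode percent-encodes it
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    return f"{BASE_URL}/{DATABASE}?{urlencode(params)}"


url = build_sru_url("net.port=3950")
print(url)
```

Note that the "=" inside the CQL query is encoded as %3D, so that it cannot be confused with the "=" separating an SRU parameter name from its value.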

SUPPORT

SUPPORTED OPERATIONS

Zebra supports the following SRU operations:

explain

This operation requires no further parameters, and returns a ZeeRex record describing the IRSpy database itself.

searchRetrieve

This is the principal operation of SRU, combining searching of the database and retrieval of the records that are found. Its behaviour is specified primarily by the query parameter, support for which is described below, but also by startRecord, maximumRecords and recordSchema.

scan

This operation scans an index of the database and returns a list of candidate search terms for that index, including hit-counts. Its behaviour is specified primarily by the scanClause parameter, but also by maximumTerms and responsePosition.

Here is an example SRU Scan URL:

        http://irspy.indexdata.com:8018/IR-Explain---1?
                version=1.1&
                operation=scan&
                scanClause=dc.title=fish

This lists all words occurring in titles, in alphabetical order, beginning with "fish" or, if that word does not occur in any title, the word that immediately follows it alphabetically.

The scanClause parameter is a tiny query, consisting only of an index name, a relation (usually "=") and a term. The supported index names are the same as those listed below.
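As a rough illustration of how a scan request is put together, the Python sketch below builds a scan URL from an index name, a relation and a term. The helper name build_scan_url is hypothetical; the parameter names (scanClause, maximumTerms, responsePosition) are those mentioned above. It assumes a single-word term that needs no CQL quoting.

```python
from urllib.parse import urlencode

# Base URL and database name taken from the example above.
BASE_URL = "http://irspy.indexdata.com:8018"
DATABASE = "IR-Explain---1"


def build_scan_url(index, term, relation="=",
                   maximum_terms=None, response_position=None):
    """Return a scan URL (hypothetical helper, not part of IRSpy)."""
    params = {
        "version": "1.1",
        "operation": "scan",
        # The scan clause is just index, relation and term joined together.
        "scanClause": f"{index}{relation}{term}",
    }
    # The two optional parameters are sent only when explicitly requested.
    if maximum_terms is not None:
        params["maximumTerms"] = maximum_terms
    if response_position is not None:
        params["responsePosition"] = response_position
    return f"{BASE_URL}/{DATABASE}?{urlencode(params)}"


scan_url = build_scan_url("dc.title", "fish")
print(scan_url)
```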

CQL SUPPORT

The following CQL context sets are supported, and are recognised in queries by the specified prefixes:

cql

The CQL context set. http://www.loc.gov/standards/sru/cql/cql-context-set.html

rec

The Record Metadata context set. http://srw.cheshire3.org/contextSets/rec/1.1/

net

The Network context set. http://srw.cheshire3.org/contextSets/net/

dc

The Dublin Core context set. http://www.loc.gov/standards/sru/cql/dc-context-set.html

zeerex

The ZeeRex context set. http://srw.cheshire3.org/contextSets/ZeeRex/

Within those sets, the following indexes are supported:

cql.anywhere
cql.allRecords
rec.id
net.protocol
net.version
net.method
net.host
net.port
net.path
dc.title
dc.creator
zeerex.numberOfRecords
zeerex.set
zeerex.index
zeerex.attributeType
zeerex.attributeValue
zeerex.schema
zeerex.recordSyntax
zeerex.supports_relation
zeerex.supports_relationModifier
zeerex.supports_maskingCharacter
zeerex.default_contextSet
zeerex.default_index

These indexes may in general be used with all the relations <, <=, =, >=, >, <> and exact, although of course not all combinations of index and relation make sense. The masking characters * and ? may be used in all appropriate circumstances, as may the word-anchoring character ^.
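To make the masking characters concrete, here is a small Python sketch that mimics their effect on a single word. It assumes the usual CQL meaning (* stands for zero or more characters, ? for exactly one) and is not a description of how Zebra evaluates them. The function name mask_matches is hypothetical, and the sketch does not model the ^ anchor, backslash escapes or case handling.

```python
import re


def mask_matches(pattern, word):
    """Return True if word matches a CQL-style masked pattern.

    Illustration only: * becomes '.*', ? becomes '.', and every other
    character is matched literally.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, word) is not None


print(mask_matches("*.edu", "www.example.edu"))  # host ending in .edu
print(mask_matches("39?0", "3950"))              # exactly one masked character
print(mask_matches("39?0", "39500"))             # too long for a single ?
```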

Finally, sorting criteria may be specified within the query itself. Since YAZ's CQL parser does not yet implement the recently approved CQL 1.2 sorting extension described at http://zing.z3950.org/cql/sorting.html, a different scheme is used, involving three special relation modifiers: sort, sort-desc and numeric.

When a search-term that carries either the sort or sort-desc relation-modifier is or'd with a query, the results of that query are sorted according to the value associated with the specified index - for example, sorted by title if the query is or'd with dc.title=/sort 0. In such sort-specification query terms, the term itself (0 in this example) is the precedence of the sort-key, with zero being highest. Further less significant sort keys may also be specified, using higher-valued terms. By default, sorting is lexicographical (alphabetical); however, if the additional relation modifier numeric is also specified, then numeric sorting is used.

For example, the query:

 net.host = *.edu and dc.title=^a* or net.port=/sort/numeric 0

finds records describing services hosted in the .edu domain whose titles' first words begin with the letter a, and sorts the results in numeric order of the port number that they run on. And the query:

 net.host = *.edu or net.port=/sort/numeric 0 or net.path=/sort-desc 1

sorts all the .edu-hosted services numerically by port, and further sorts each equivalence class of services running on the same port alphabetically, descending, by database name.
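The sort-specification scheme described above is mechanical enough to generate. The following Python sketch appends sort terms to an existing CQL query, numbering them from zero in decreasing order of significance. The helper name add_sort_keys and its tuple layout are assumptions made for illustration and are not part of IRSpy or YAZ.

```python
def add_sort_keys(query, sort_keys):
    """Append sort-specification terms to a CQL query.

    sort_keys is a list of (index, descending, numeric) tuples, most
    significant key first. Hypothetical helper, for illustration only.
    """
    clauses = [query]
    for precedence, (index, descending, numeric) in enumerate(sort_keys):
        modifiers = "/sort-desc" if descending else "/sort"
        if numeric:
            modifiers += "/numeric"  # numeric rather than lexicographical
        # The term is the precedence of the sort key, zero being highest.
        clauses.append(f"{index}={modifiers} {precedence}")
    return " or ".join(clauses)


sorted_query = add_sort_keys(
    "net.host = *.edu",
    [("net.port", False, True), ("net.path", True, False)],
)
print(sorted_query)
```

With these arguments the result is the same string as the second example query above.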

RECORD SCHEMAS

The IRSpy Zebra database supports record retrieval using the following schemas:

dc

Dublin Core records (title, creator, description, etc.)

zeerex

ZeeRex records, the definitive version of the information that drives the database. These records use an extended version of the ZeeRex 2.0 schema that also includes an <irspy:status> element at the end of the record.

index

An XML format that prescribes how the record is indexed for searching. This is useful for debugging, but not likely to be very exciting for casual passers-by.

SEE ALSO

ZOOM::IRSpy

The specifications for SRU (REST-like Web Service) at http://www.loc.gov/sru

The specifications for SRW (SOAP-based Web Service) at http://www.loc.gov/srw

The Z39.50 specifications at http://lcweb.loc.gov/z3950/agency/

The ZeeRex specifications at http://explain.z3950.org/

The Zebra database at http://indexdata.com/zebra

AUTHOR

Mike Taylor, <mike@indexdata.com>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Index Data ApS.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.