NAME

TM - Topic Maps, Base Class

SYNOPSIS

    use TM;
    my $tm = TM->new;   # empty map
    ... more to come

ABSTRACT

This (monster) class provides read/write access to so-called materialized maps, i.e. maps which can reside completely in memory. Implementations for non-materialized maps can be derived from it.

DESCRIPTION

As it stands, this package directly implements so-called materialized maps, i.e. those maps which reside completely in memory. Non-materialized and non-materializable maps can be implemented by deriving from this class and overriding one or all of the sub-interfaces. If this is done cleverly, then any application, even a TMQL query processor, can operate on non-materialized (virtual) maps in the same way as on materialized ones.

The data manipulation interface is very low-level and directly exposes internal data structures. As long as you do not mess with the information you get and you follow the API rules, this can provide a convenient, fast, albeit not overly comfortable interface.

Consistency

An application using a map may expect that a map is consolidated, i.e. that the following consistency conditions are met:

A1 (fixed on)

Every topic appearing in some association as type, role or player is also registered as topic.

A2 (fixed on)

Every association in the map is also a registered topic.

Indicator_based_Merging (default: on)

Two (or more) topics sharing the same subject identifier are treated as one topic.

Subject_based_Merging (default: on)

Two (or more) topics sharing the same subject locator are treated as one topic.

TNC_based_Merging (default: off)

Two (or more) topics sharing the same name in the same scope are treated as one topic.

While the first two (A1, A2) relate to the internal consistency of the data structure, the others are a choice the application can make. See method consistency.

This consistency is not automatically provided when a map is modified by the application. It is the application's responsibility to trigger the process to consolidate the map.

When an IO driver is consuming a map from a resource (say, loading an XTM file), then that driver will ensure that the map is consolidated according to the current settings before it is handed to the application. The application is then in full control of the map, as it can change, add and delete topics and associations. This implies that the map can become unconsolidated in this process. The method consolidate reinstates consistency.

You can change these defaults by (a) providing an additional option to the constructor

   new TM (....,
           consistency => [ TM->Subject_based_Merging,
                            TM->Indicator_based_Merging ]);

or (b) by using the accessor consistency (see below).

INFRASTRUCTURE INTERFACE

Constructor

The constructor will create an empty map, or, to be more exact, it will fill the map with the taxonomy from TM::PSI, which covers basic concepts such as topics or associations.

The constructor understands a number of key/value pair parameters:

baseuri (default: tm://nirvana/)

Every item in the map has a unique local identifier (e.g. shoesize). The baseuri parameter controls how an absolute URI is built from this identifier.

consistency (default: [ Subject_based_Merging, Indicator_based_Merging ])

psis

If you need to roll your own taxonomy to bootstrap with, you can pass in a structure which has exactly the same structure as that in TM::PSI.
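To make the role of the baseuri parameter more concrete, here is a minimal, self-contained sketch of how a local identifier such as shoesize could be turned into an absolute URI by prefixing. It does not call TM; the helper name absolutize_id and the test for "already absolute" are assumptions made for this illustration only.

```perl
use strict;
use warnings;

my $baseuri = 'tm://nirvana/';      # the documented default

# Hypothetical helper, not part of TM.
# Assumption: anything starting with a URI scheme counts as absolute.
sub absolutize_id {
    my ($base, $id) = @_;
    return $id if $id =~ m{^[A-Za-z][A-Za-z0-9+.\-]*:};   # leave absolute URIs alone
    return $base . $id;                                    # prefix local identifiers
}

my $abs = absolutize_id ($baseuri, 'shoesize');
```

With the default base URI, the local identifier shoesize becomes tm://nirvana/shoesize, while an identifier that already carries a scheme is passed through unchanged.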

Methods

baseuri

$bu = $tm->baseuri

This method retrieves the base URI component of the map. It is read-only; the base URI is always defined.

consistency

@merging_constraints = $tm->consistency

$tm->consistency (@list_of_constants)

This method provides read/write access to the consistency settings.

If no parameters are provided, then the current list of consistency settings is returned. If parameters are provided, that list must consist of the constants defined above (see "Consistency").

NOTE: Changing the consistency does NOT automatically trigger consolidate.

consolidate

$tm->consolidate

This method consolidates a map by performing the following actions:

  • perform merging based on subject address (see TMDM section 5.3.2)

  • perform merging based on subject indicators (see TMDM section 5.3.2)

  • remove all superfluous toplets (those which do not take part in any association)

    NOTE: Not implemented yet!

The optional parameter is a list of constants, all of which are defined in TM. If the list is empty, then the consistency settings of the map will be used; otherwise the settings given in this list take precedence.

NOTE: In all cases the map will be modified.

NOTE: After merging, some of the lids might no longer reliably point to a topic.
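The effect of merging on identifiers can be illustrated with a small stand-alone sketch. It merges toplets which share a subject locator and reports which identifiers disappeared. The hash layout, the helper name merge_by_locator and the rule for picking the surviving identifier are assumptions for this example; the actual consolidate method may proceed differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a map: id => [ subject locator, [ subject identifiers ] ]
my %topics = (
    'tm://nirvana/vienna' => [ 'http://example.org/vienna', [ 'http://example.org/psi/vienna' ] ],
    'tm://nirvana/wien'   => [ 'http://example.org/vienna', [ 'http://example.org/psi/wien' ] ],
    'tm://nirvana/graz'   => [ undef,                       [] ],
);

# Merge all topics sharing a subject locator into the one with the
# (string-wise) smallest id. Returns a mapping: vanished id => surviving id.
sub merge_by_locator {
    my ($topics) = @_;
    my (%survivor, %renamed);
    for my $id (sort keys %$topics) {
        my $address = $topics->{$id}[0];
        next unless defined $address;                  # no locator, nothing to merge on
        if (my $keep = $survivor{$address}) {
            push @{ $topics->{$keep}[1] }, @{ $topics->{$id}[1] };   # collect indicators
            delete $topics->{$id};
            $renamed{$id} = $keep;
        } else {
            $survivor{$address} = $id;
        }
    }
    return \%renamed;
}

my $renamed = merge_by_locator (\%topics);
```

After the call, tm://nirvana/wien no longer exists as a separate entry, which is the situation the note above warns about: an application still holding that identifier has to cope with it having been merged away.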

add

$tm->add ($tm2, ...)

This method accepts a list of TM objects and adds all content (associations and topics) from these maps.

NOTE: There is NO merging done. Use the method consolidate explicitly for that.

melt

$tm->melt ($tm2)

This (probably more auxiliary) function copies the relevant aspects of a second map into the object.

MANIPULATION INTERFACE

This package provides a low-level implementation of a memory-based assertion store. The assertions are stored together with some hash information to speed up particular access patterns. It is designed to hold a significant amount of information in pure-Perl representation in memory. It is also a prime candidate to be implemented in C later. All changes to the store are immediate; there is no transaction concept at this level.

The whole map consists of two components, assertions and minimalistic topics. An assertion holds association information, occurrence attachments to topics and name attachments to topics. Subject identifiers and one (!) subject locator are kept in a minimalistic topic. Every assertion is ALSO a topic.

On this level you can modify each component individually, giving you much freedom and direct access to the map structure. Needless to say, you can also shoot yourself in the knee.

Identifiers

All identifiers which are passed into methods here MUST be absolute URIs. This interface makes no attempt to absolutize identifiers. The URIs are kept as strings, not URI objects.

Assertions

One assertion is a record containing its own identifier, the scope, the type of the association, a (redundant) field stating whether this is an association, an occurrence or a name, and then all roles and all players, in separate lists.

These lists always have the same length, so that every player corresponds to exactly one role. If one role is played by several players, the role appears multiple times.

These lists are also canonicalized, i.e. ordered in such a way, that assertions can be compared. To flag that an assertion is canonicalized there is another field in the assertion record.
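A minimal sketch of such a canonicalization, assuming a plain hash as assertion record and a simple ordering (by role, then by player). The ordering actually used by TM is not specified here, and the helper name canonicalize_pairs is hypothetical. The point is only that roles and players must be reordered together, so that each player stays with its role.

```perl
use strict;
use warnings;

# Hypothetical assertion record; only the fields needed here.
my %assertion = (
    ROLES   => [ 'tm://nirvana/whole', 'tm://nirvana/part',  'tm://nirvana/part'   ],
    PLAYERS => [ 'tm://nirvana/car',   'tm://nirvana/wheel', 'tm://nirvana/engine' ],
    CANON   => undef,
);

# Sort role/player pairs by role, then by player, keeping pairs together.
sub canonicalize_pairs {
    my ($as) = @_;
    return $as if $as->{CANON};                        # already done, nothing to do
    my ($roles, $players) = @{$as}{qw(ROLES PLAYERS)};
    my @order = sort { $roles->[$a]   cmp $roles->[$b]
                    or $players->[$a] cmp $players->[$b] } 0 .. $#$roles;
    $as->{ROLES}   = [ @{$roles}[@order]   ];          # apply the same permutation
    $as->{PLAYERS} = [ @{$players}[@order] ];          # to both lists
    $as->{CANON}   = 1;
    return $as;
}

canonicalize_pairs (\%assertion);
```

Two assertions with the same role/player pairs in different order end up with identical lists after this step, which is what makes them comparable.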

Assertions consist of the following components:

LID:

Every assertion is also a thing in the map, so it has an identifier. For toplet-related information this is the absolute topic ID, for maplets this is a unique identifier generated from a canonicalized form of the assertion itself.

SCOPE:

Yes, the scope of the assertion.

KIND (redundant information):

For technical reasons (read: it is faster) we distinguish between full associations (ASSOC), and characteristics (NAME, OCC).

TYPE:

The topic ID of the type of this assertion.

ROLES:

A list reference which holds a list of topic IDs for the roles.

PLAYERS:

A list reference which holds a list of topic IDs for the players.

CANON:

Either 1 or undef to signal whether this assertion has been (already) canonicalized (see "Canonicalization").

Assertion Construction Functions

These lowest-level functions deal with housekeeping tasks for assertions.

Constructor

$assertion = Assertion->new (...)

Any of the above fields can be defined.

absolutize

$assertion = absolutize ($tm, $assertion)

This method takes one assertion and makes sure that all identifiers in it (for the type, the scope and all the roles and players) are made absolute for the context map. It returns this very assertion.

canonicalize

$assertion = canonicalize ($tm, $assertion)

This method takes an assertion and reorders the roles (together with their respective players) in a consistent way. It also makes sure that the KIND is defined (defaults to ASSOC), that the type is defined (defaults to THING) and that all references are made absolute LIDs. Finally, the field CANON is set to 1 to indicate that the assertion is canonicalized.

The function will not do anything if the assertion is already canonicalized.

Conveniently, the function returns the same assertion, albeit a possibly modified one.

hash

$hash = hash ($assertion);

For internal optimization all characteristics have an additional HASH component which can be used to maintain indices. This function takes an assertion, computes an MD5 hash and sets the HASH component if that is not yet defined.

Such a hash only makes sense if the assertion is canonicalized, otherwise an exception is raised.

Example:

    my $as = Assertion->new (lid => 'urn:x-rho:important');
    canonicalize ($tm, $as);   # hash raises an exception for non-canonicalized assertions
    print "this uniquely (well) identifies the assertion ". hash ($as);
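How such a digest could be computed and cached is sketched below with the core module Digest::MD5. Which fields go into the digest and how they are serialized is an assumption of this sketch, as is the helper name assertion_digest; only the overall behaviour (MD5, cached in HASH, exception for non-canonicalized assertions) follows the description above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper, not part of TM.
# Assumption: the digest covers scope, type and the role/player lists.
sub assertion_digest {
    my ($as) = @_;
    die "assertion is not canonicalized\n" unless $as->{CANON};
    # compute only once, then reuse the cached HASH component
    return $as->{HASH} //= md5_hex (join '|',
        $as->{SCOPE}, $as->{TYPE},
        join (',', @{ $as->{ROLES}   }),
        join (',', @{ $as->{PLAYERS} }));
}

my %as = (
    SCOPE   => 'tm://nirvana/us',
    TYPE    => 'tm://nirvana/is-part-of',
    ROLES   => [ 'tm://nirvana/part',  'tm://nirvana/whole' ],
    PLAYERS => [ 'tm://nirvana/wheel', 'tm://nirvana/car'   ],
    CANON   => 1,
);
my $digest = assertion_digest (\%as);
```

Because the role and player lists are in canonical order, the same assertion always serializes to the same string and therefore to the same digest.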

Assertion Role Retrieval

is_player, is_x_player

$bool = is_player ($tm, $assertion, $player_id, [ $role_id ])

$bool = is_x_player ($tm, $assertion, $player_id, [ $role_id ])

This function returns 1 if the identifier specified by the player_id parameter plays any role in the assertion provided as assertion parameter.

If the role_id is provided as an additional parameter, then it must be exactly this role (or any subclass thereof) that is played. The 'x'-version uses equality instead of 'subclassing' ('x' for "exact").

get_players, get_x_players

@player_ids = get_players ($tm, $assertion, $role_id)

@player_ids = get_x_players ($tm, $assertion, $role_id)

This function returns the player(s) for the given role. The "x" version does not honor subclassing.
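Because roles and players are kept in two parallel lists, the exact ("x") lookup boils down to an index scan. The following sketch shows that idea on a plain hash record; the helper name x_players is hypothetical and the code is independent of TM.

```perl
use strict;
use warnings;

# Hypothetical assertion record with parallel role/player lists.
my %as = (
    ROLES   => [ 'tm://nirvana/whole', 'tm://nirvana/part',  'tm://nirvana/part'   ],
    PLAYERS => [ 'tm://nirvana/car',   'tm://nirvana/wheel', 'tm://nirvana/engine' ],
);

# Return all players whose role is exactly $role (no subclassing involved).
sub x_players {
    my ($as, $role) = @_;
    my ($roles, $players) = @{$as}{qw(ROLES PLAYERS)};
    return map  { $players->[$_] }
           grep { $roles->[$_] eq $role } 0 .. $#$roles;
}

my @parts = x_players (\%as, 'tm://nirvana/part');
```

A variant honoring subclassing would replace the string comparison with a subclass test against the map.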

is_role, is_x_role

$bool = is_role ($tm, $assertion, $role_id)

$bool = is_x_role ($tm, $assertion, $role_id)

This function returns 1 if the role_id is a role in the assertion provided. The "x" version of this function does not honor subclassing.

get_roles

@role_ids = @{ get_roles ($tm, $assertion) }

This function extracts a reference to the list of role identifiers.

Assertion Map Methods

assert

$tm->assert (@list-of-assertions)

This method takes a list of assertions, canonicalizes them and then injects them into the map. If one of the newly added assertions already existed in the map, it will be ignored.

In this process, all assertions will be completed (if fields are missing) and will be canonicalized (unless they already were). This implies that non-canonicalized assertions will be modified, in that the role/player lists change.

If an assertion does not have a type, it will default to $TM::PSI::THING. If an assertion does not have a scope, it defaults to $TM::PSI::US. Any assertion not having an LID will get one.

Examples:

  my $a = Assertion->new (type => 'rumsti');
  $tm->assert ($a);

The method returns a list of all asserted assertions (sic).

retrieve

$assertion = $tm->retrieve ($some_assertion_id)

@assertions = $tm->retrieve ($some_assertion_id, ...)

This method takes a list of assertion IDs and returns the assertion(s) with the given (subject) ID(s). If the assertion is not identifiable, undef will be returned in its place. Called in list context, it will return a list of assertion references.

is_asserted

$bool = $tm->is_asserted ($a)

This method will return 1 if the passed-in assertion exists in the store. The assertion will be canonicalized before checking, but no defaults will be added if parts are missing.

retract

$tm->retract (@list_of_assertion_ids)

This method expects a list of assertion IDs and will remove the assertions from the map. If an ID is bogus, it will be ignored.

Only these particular assertions will be deleted. Any topics in these assertions will remain. Use consolidate to remove unnecessary topics.

match

@list = $tm->match (FORALL or EXISTS [ , search-spec, ... ]);

This method takes a search specification and returns all assertions matching.

If the constant FORALL is used as first parameter, this method returns a list of assertions in the store following the search specification. If the constant EXISTS is used, the method will return a non-empty value if at least one can be found. The result list contains references to the assertions themselves, not to copies. You can change the assertions themselves at your own risk (read: better not do it).

NOTE: EXISTS is not yet implemented.

The search specification is a hash with the same fields as for the constructor of an assertion:

Example:

   $tm->match (FORALL, type    => '...',
                       scope   => '...',
                       roles   => [ ...., ....],
                       players => [.... ]);

Any combination of assertion components can be used, all are optional, with the only constraint that the number of roles must match that for the players. All involved IDs will be absolutized before matching.

NOTE: Some combinations will be very fast, while others quite slow. The latter is the case when there is no special-purpose matcher implemented and the general-purpose one has to be used as a fallback.
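A general-purpose fallback matcher can be pictured as a linear scan which compares every field given in the search specification. The sketch below works on a plain list of hash records and is not the implementation used by TM; the helper names and the rule that list fields must match element by element are assumptions.

```perl
use strict;
use warnings;

# Hypothetical store: assertion records with keys mirroring the search specification.
my @store = (
    { type => 'tm://nirvana/isa', scope => 'tm://nirvana/us',
      roles => [ 'tm://nirvana/class', 'tm://nirvana/instance' ],
      players => [ 'tm://nirvana/city', 'tm://nirvana/vienna' ] },
    { type => 'tm://nirvana/isa', scope => 'tm://nirvana/us',
      roles => [ 'tm://nirvana/class', 'tm://nirvana/instance' ],
      players => [ 'tm://nirvana/city', 'tm://nirvana/graz' ] },
    { type => 'tm://nirvana/is-subclass-of', scope => 'tm://nirvana/us',
      roles => [ 'tm://nirvana/subclass', 'tm://nirvana/superclass' ],
      players => [ 'tm://nirvana/city', 'tm://nirvana/place' ] },
);

# Compare one wanted field value against the stored one (scalar or list).
sub field_matches {
    my ($want, $have) = @_;
    return defined $have && $want eq $have unless ref $want;
    $have ||= [];
    return @$want == @$have
        && !grep { $want->[$_] ne $have->[$_] } 0 .. $#$want;
}

# Linear scan: keep every assertion for which no specified field mismatches.
sub match_all {
    my ($store, %spec) = @_;
    return grep {
        my $as = $_;
        !grep { !field_matches ($spec{$_}, $as->{$_}) } keys %spec;
    } @$store;
}

my @isas = match_all (\@store, type => 'tm://nirvana/isa');
```

The scan touches every assertion, which is why such a fallback is slow compared with a matcher that can use an index for a particular combination of fields. As in the real interface, the results are references to the stored records, not copies.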

Midlets

Midlets are light-weight topics in that their information is quite minimal. One midlet is represented by an array with two fields:

ADDRESS

It contains the subject locator URI, if known, otherwise undef.

INDICATORS

This is a reference to a list containing subject identifiers. The list can be empty; no duplicate removal is attempted.

Midlet Methods

internalize

$iid = $tm->internalize ($some_id)

$iid = $tm->internalize ($some_id => $some_id)

@iids = $tm->internalize ($some_id => $some_id, ...)

This method does some trickery when a new topic should be added to the map, depending on how parameters are passed into it. The general scheme is that pairs of identifiers are passed in. The first is usually the internal identifier, the second the subject identifier or the subject locator. The convention is that subject identifier URIs are passed in as string reference, whereas subject locator URIs are passed in as strings.

The following cases are covered:

ID => undef

If the ID is already an absolute URI and contains the baseuri of the map as prefix, then this URI is used. If the ID is some other URI, then a topic with that URI as subject locator is searched for in the map. If such a topic already exists, then nothing special needs to happen. If no such topic exists, a new URI, based on the baseuri and a random number, will be created.

ID => URI

Like above, only that the URI is used as subject locator.

ID => \ URI (reference to string)

Like above, only that the URI is used as another subject identifier.

undef => URI

Like above, only that the internal identifier has to be (maybe) created.

undef => undef

A topic with a generated ID will be inserted. Not sure what this is good for.

In any case, the internal identifier(s) of all inserted (or existing) topics are returned.

mids

$mid = $tm->mids ($some_id)

@mids = $tm->mids ($some_id, ...)

This function tries to build absolute versions of the identifiers passed in. undef will be returned if no such topic can be found. Can be used in scalar and list context.

If the passed-in identifier is a relative URI, then it is made absolute by prefixing it with the map baseuri, and then we look for a topic with that internal identifier.

If the passed in identifier is an absolute URI, where the baseuri is a prefix, then that URI will be used as internal identifier to look for a topic.

If the passed in identifier is an absolute URI, where the baseuri is NOT a prefix, then that URI will be used as subject locator and such a topic will be looked for.

If the passed in identifier is a reference to an absolute URI, then that URI will be used as subject identifier and such a topic will be looked for.
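The four cases can be summarized in a small stand-alone sketch. It operates on a hypothetical hash of midlets rather than on a TM object, and the helper name lookup_mid as well as the regular expression used to recognize absolute URIs are assumptions of this example.

```perl
use strict;
use warnings;

my $baseuri = 'tm://nirvana/';

# Hypothetical map content: internal id => [ subject locator, [ subject identifiers ] ]
my %midlets = (
    'tm://nirvana/vienna' => [ 'http://example.org/vienna', [ 'http://example.org/psi/vienna' ] ],
);

sub lookup_mid {
    my ($id) = @_;
    if (ref $id) {                                      # reference: subject identifier
        for my $mid (sort keys %midlets) {
            return $mid if grep { $_ eq $$id } @{ $midlets{$mid}[1] };
        }
        return undef;
    }
    if ($id =~ m{^[A-Za-z][A-Za-z0-9+.\-]*:}) {         # absolute URI
        if (index ($id, $baseuri) == 0) {               # baseuri is prefix: internal id
            return exists $midlets{$id} ? $id : undef;
        }
        for my $mid (sort keys %midlets) {              # otherwise: subject locator
            my $address = $midlets{$mid}[0];
            return $mid if defined $address && $address eq $id;
        }
        return undef;
    }
    my $abs = $baseuri . $id;                           # relative: prefix with baseuri
    return exists $midlets{$abs} ? $abs : undef;
}
```

All four ways of addressing the topic resolve to the same internal identifier:

```perl
my @found = map { lookup_mid ($_) }
            ('vienna', 'tm://nirvana/vienna',
             'http://example.org/vienna', \ 'http://example.org/psi/vienna');
```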

externalize

$tm->externalize ($some_id, ...)

This function simply deletes the topic entry for a given internal identifier(s). See mids to find these. The function returns all deleted topic entries.

NOTE: Assertions in which this topic is involved will not be removed. Use consolidate to clean up all assertions which still reference non-existing topics.

midlets

@mids = $tm->midlets

This function returns all the things (actually their ids) known in the map.

midlet

$t = $tm->midlet ($mid)

@ts = $tm->midlet ($mid, ....)

This function returns a reference to a topic structure. That includes a subject address, if available and a list (reference) for the optional subject indicators.

Can be used in scalar and list context.

Taxonomics and Subsumption

The following methods provide useful basic, ontological functionality around subclassing (also transitive) between classes and instance/type relationships.

Deriving classes may want to consider overriding these methods with versions better suited to their representation of the map. That said, the methods below are not optimized for speed.

NOTE: There are NO subclasses of the thing. But everything is an instance of thing.

is_subclass

$bool = $tm->is_subclass ($superclass_id, $subclass_id)

This function returns 1 if the first parameter is a (transitive) superclass of the second, i.e. there is an assertion of type is-subclass-of in the context map. It also returns 1 if the superclass is a $TM::PSI::THING or if subclass and superclass are the same (reflexive).

TODO: memoize

subclasses, subclassesT

@lids = $tm->subclasses ($lid)

@lids = $tm->subclassesT ($lid)

subclasses returns all direct subclasses of the thing identified by $lid. If the thing does not exist, the list will be empty. subclassesT is a variant which honors the transitive subclassing (so if A is a subclass of B and B is a subclass of C, then A is also a subclass of C).
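The difference between the direct and the transitive variant can be shown on a toy hierarchy. The sketch below keeps the is-subclass-of facts in a plain hash and computes the transitive closure with a work queue; the data layout and helper names are assumptions, and whether the real subclassesT also reports the class itself is not covered here (this sketch does not).

```perl
use strict;
use warnings;

# Hypothetical is-subclass-of facts: subclass => [ direct superclasses ]
my %superclass_of = (
    'tm://nirvana/capital' => [ 'tm://nirvana/city'  ],
    'tm://nirvana/city'    => [ 'tm://nirvana/place' ],
    'tm://nirvana/village' => [ 'tm://nirvana/place' ],
);

# All classes which name $class as a direct superclass.
sub direct_subclasses {
    my ($class) = @_;
    my @subs;
    for my $sub (sort keys %superclass_of) {
        push @subs, $sub if grep { $_ eq $class } @{ $superclass_of{$sub} };
    }
    return @subs;
}

# Transitive closure: follow subclass links until nothing new turns up.
sub transitive_subclasses {
    my ($class) = @_;
    my %seen;
    my @queue = direct_subclasses ($class);
    while (defined (my $sub = shift @queue)) {
        next if $seen{$sub}++;                  # guards against cycles, too
        push @queue, direct_subclasses ($sub);
    }
    my @all = sort keys %seen;
    return @all;
}

my @direct = direct_subclasses     ('tm://nirvana/place');
my @all    = transitive_subclasses ('tm://nirvana/place');
```

Here the direct subclasses of place are city and village, while the transitive variant additionally finds capital via city.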

superclasses, superclassesT

@lids = $tm->superclasses ($lid)

@lids = $tm->superclassesT ($lid)

The method superclasses returns all direct superclasses of the thing identified by $lid. If the thing does not exist, the list will be empty. superclassesT is a variant which honors transitive subclassing.

types, typesT

@lids = $tm->types ($lid)

@lids = $tm->typesT ($lid)

The method types returns all direct classes of the thing identified by $lid. If the thing does not exist, the list will be empty. typesT is a variant which honors transitive subclassing (so if a is an instance of type A and A is a subclass of B, then a is also an instance of B).

instances, instancesT

@lids = $tm->instances ($lid)

@lids = $tm->instancesT ($lid)

These methods return the direct (instances) and also indirect (instancesT) instances of the thing identified by $lid.

is_a

$tm->is_a ($something_lid, $class_lid)

This method returns 1 if the thing referenced by the first parameter is an instance of the class referenced by the second. The method honors transitive subclassing.

NOTE: Everything is an instance of a thing.

Filters

Quite often one needs to walk through a list of things to determine whether they are instances (or types, subtypes or supertypes) of some concept. This list of functions lets you do that: you pass in a list and the function behaves as a filter.

are_instances

@ids = $tm->are_instances ($class_id, @list_of_ids)

Returns all those ids where the topic is an instance of the class provided.

are_types

@ids = $tm->are_types ($instance_id, @list_of_ids)

Returns all those ids where the topic is a type of the instance provided.

are_supertypes

@ids = $tm->are_supertypes ($class_id, @list_of_ids)

Returns all those ids where the topic is a supertype of the class provided.

are_subtypes

@ids = $tm->are_subtypes ($class_id, @list_of_ids)

Returns all those ids where the topic is a subtype of the class provided.
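All four filters follow the same pattern: a test is applied to every element of the list and only those passing it are returned, in their original order. A minimal sketch of that pattern for the instance case, using a hypothetical hash of direct types instead of a TM object (so transitive subclassing is deliberately left out):

```perl
use strict;
use warnings;

# Hypothetical instance/type facts: instance => [ direct types ]
my %types_of = (
    'tm://nirvana/vienna' => [ 'tm://nirvana/city'  ],
    'tm://nirvana/graz'   => [ 'tm://nirvana/city'  ],
    'tm://nirvana/danube' => [ 'tm://nirvana/river' ],
);

# Keep only those ids which are (direct) instances of $class.
sub filter_instances {
    my ($class, @ids) = @_;
    return grep {
        my $id = $_;
        grep { $_ eq $class } @{ $types_of{$id} || [] };
    } @ids;
}

my @cities = filter_instances ('tm://nirvana/city',
                               'tm://nirvana/vienna', 'tm://nirvana/danube',
                               'tm://nirvana/graz',   'tm://nirvana/unknown');
```

Unknown identifiers simply drop out of the result, just like identifiers of the wrong type.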

Reification

reified_by (experimental)

Provided with an identifier, this method returns the subject locator. It returns undef if there is no such topic or no locator.

TODO: list context

TODO: name sucks

WARNING: this function may go away

reifies (experimental)

WARNING: this function may go away

Variants (aka "The Warts")

No comment.

variants

$tm->variants ($id, $variant)

$tm->variants ($id)

With this method you can get/set a variant tree for any topic. According to the standard only basenames (aka topic names) can have variants, but, hey, this is such an ugly beast (I am digressing). According to this data model you can have variants for all toplets/maplets. You only need their id.

The structure is like this:

  $VAR1 = {
    'tm:param1' => {
      'variants' => {
        'tm:param3' => {
          'variants' => undef,
          'value' => 'name for param3'
        }
      },
      'value' => 'name for param1'
    },
    'tm:param2' => {
      'variants' => undef,
      'value' => 'name for param2'
    }
  };

The parameters are the keys (there can only be one, which is a useful, cough, restriction of the standard) and the data is the value. Obviously, one key value (i.e. parameter) can only exist once.

Caveat: this is not very well tested.
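Since the variant tree is an ordinary nested hash, it can be walked recursively. The sketch below flattens the structure shown above into pairs of parameter path and value; the helper name flatten_variants and the ' / ' path separator are choices made for this illustration.

```perl
use strict;
use warnings;

# The example structure from above.
my $variants = {
    'tm:param1' => {
        'variants' => {
            'tm:param3' => { 'variants' => undef, 'value' => 'name for param3' },
        },
        'value' => 'name for param1',
    },
    'tm:param2' => { 'variants' => undef, 'value' => 'name for param2' },
};

# Depth-first walk collecting [ parameter path, value ] pairs.
sub flatten_variants {
    my ($tree, @path) = @_;
    my @out;
    for my $param (sort keys %{ $tree || {} }) {       # undef means: no sub-variants
        my $node = $tree->{$param};
        push @out, [ join (' / ', @path, $param), $node->{value} ];
        push @out, flatten_variants ($node->{variants}, @path, $param);
    }
    return @out;
}

my @flat = flatten_variants ($variants);
printf "%-25s %s\n", @$_ for @flat;
```

Each nested variant appears with the full chain of parameters leading to it, so the variant under tm:param3 is listed with the path tm:param1 / tm:param3.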

SEE ALSO

TM::Tau

COPYRIGHT AND LICENSE

Copyright 200[1-6] by Robert Barta, <drrho@cpan.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.