NAME
Data::TagDB::Tutorial::Conventions - Work with Tag databases
VERSION
version v0.08
Overview
This documentation outlines basic conventions that have been establish when working with universal tag data. Most of this document applies to the creation of tags.
It must be noted that the conventions outlined in this documentation are one of many ways. However they are selected as they give a very good compatibility with other existing data structures, patterns, and conventions.
Meaning of a tag
Each tag forms an inseparably pair with exactly one subject. The represents exactly this one subject. And each subject is represented by exactly one tag.
As tag and subject form this tight atomic bond they are often used synonymous.
Abstract and specific
A subject can be one of abstract, a concept, or specific, an instances of a concept or neither nor.
An abstract subject is always untouchable, a kind, or genre of things. Examples include the concept of books, or the concept of parentship. In software development they are often called types.
A specific subject is an instance of one or more abstract subjects. Examples include any specific book one can touch, or the specific relation between a specific parent and child.
Note: The concept of touchableness is to be understood figuratively here. For example "air" (our atmosphere) is often considered untouchable, while the atmosphere is still specific (touchable).
Tag definition and merging
Any two tags that represent the same subject are in fact the same tag. If they occur in software they can be (and should) be merged.
However, which subject a tag belongs to is given by it's definition. Definitions by different parties (or even by the same party over time, e.g. different versions of a standard) will differ. In those cases tags must not be merged.
This also implies that if for example a standard updates the definition of a tag it must create a new tag (most often denoted by a new identifier for the new tag).
Tag identification
Tag identifiers
Each tag is identified by a set of zero or more identifiers. Identifiers can be of different types. Each type has it's own specific properties. Each tag should have at least one globally unique identifier, often an UUID.
See also "also-shares-identifier" in Data::TagDB::Tutorial::WellKnown.
ISE identifiers
It is recommended that each tag should have an UUID, an OID, or an URI (in that order of preference) that identifies the tag. Those types have several properties that make them good identifiers:
Each of those types can be serialised using ISE (Identifier String Encoding)
All UUIDs can be converted to both OIDs and URIs.
All OIDs can be converted to URIs. OIDs that have been UUIDs before can be converted back into their normal form as UUID.
URIs that have been UUIDs or OIDs can be converted back into their normal form as UUID, or OID.
All three types support globally unique identifiers.
Each of the types has it's own special properties complementing each other.
Note: URIs here are understood as identifiers for tags. This is not to be confused with URIs (most often URLs) used to locate specific documents (for fetching them). It is generally expected that fetching any URI used as an identifier will result in error or a document being fetched that is different from the tag's subject.
See also "uuid" in Data::TagDB::Tutorial::WellKnown, "oid" in Data::TagDB::Tutorial::WellKnown, "uri" in Data::TagDB::Tutorial::WellKnown.
Tag names
Tag names are specific identifiers of the type tagname (see "tagname" in Data::TagDB::Tutorial::WellKnown). They provide a basic way to name a tag. However they are not to be confused with proper names, or document titles or technical designators. However they often mirror those.
Tag names are not unique in any way. Each tag can have zero or more tag names.
The following guidelines should be followed for tag names:
Tag names are for reading by a person, not a computer. They should be useful to the user. Including any meaning for machines should be avoided. Other relations should be used for that.
If a subject has a proper name, that name should be included as tag name as-is. For examples Richard George Adams' tag should include the tag name
Richard George Adams
. However if there are valid variations they should also be included. Such asRichard Adams
as his middle name is commonly not used in his publications.Note: Tag names can contain any character. It is therefore possible to include non-latin scripts. Spaces are valid.
Note: If a proper name is commonly translated (such as city names) more languages might be included. See below.
If the subject does not have a proper name it is common to use the English (that is British) name. If the subject is specific to a social context (such as a city that is located within some nation) than it is common to include the name of the subject that is native to that context.
It is possible to include names in more languages if needed.
It is also possible to use context in the link denoting the name to provide cultural, language, or other context for the name.
Subjects without a proper name and (often) abstract subjects may sometimes require more than one word to build a useful tag name. While spaces are valid in tag names it is often harder to enter them in computer systems. For example lists of keywords are often separating individual keywords by space. In that case the use of dash (
-
) is native. Underscore (_
) or CamelCase or similar is dissuaded.Tag names must not include any non-name part (such symbols of sex, membership of a social context or movement). there are other relations for this.
Tag names for relations that allow or encourage multiple values are commonly prefixed with
also-
. For examplealso-has-role
andalso-shares-identifier
.Tag names for subjects that are super types of other subjects are commonly prefixed with
proto-
. Examples includeproto-file
andproto-entity
.See also Data::TagDB::Tutorial::WellKnown for more examples.
Types and roles
Each subject has zero or one type and zero or more roles. The type of a subject is one of it's roles.
Each role defines a set of properties or operations that can be done with the subject. For example one might own an apple (specific) that has the type of apples (abstract) that has also the role of a fruit.
In software development this is often implemented as types and inherence. See for example "package" in perlfunc, "ise" in UNIVERSAL, and "DOES" in UNIVERSAL for Perl's implementation.
Note: It is often not clear when creating a tag what it's type actually is. In this case it is wise to not set the type at all but set those roles that it implements.
Note: While each subject has only one type, in a real world database a tag might have several has-type
relations. However this is only allowed if those types are actually in inherence. As the data might not be complete this might be hard to check. Therefore, it might be wide to allow multiple has-type
alike also-has-role
however print a warning.
See also "also-has-role" in Data::TagDB::Tutorial::WellKnown, "has-type" in Data::TagDB::Tutorial::WellKnown.
Type inherence
To implement inherence in types the relations specialises
and generalises
are used.
It is very common to have multi level type trees. For newly defined types it is also wise to specialise them from common well known types. This allows software that is not aware of them to perform basic operations (such as to correctly display them to the user).
See also "specialises" in Data::TagDB::Tutorial::WellKnown, "generalises" in Data::TagDB::Tutorial::WellKnown.
Common types
The following types are very common. Many other complex types specialise them.
See also Data::TagDB::Tutorial::WellKnown.
Types and roles
Types and roles inherit from subject-type
.
See also "subject-type" in Data::TagDB::Tutorial::WellKnown.
Entities, Persons, accounts
All accounts (like e-mail, bank, or user accounts) should have roles including account
(b72508ba-7fb9-42ae-b4cf-b850b53a16c2
). Accounts are owned by one or more entities. They commonly include some amount of profile data (such as names and contact information) and may represent the entity they are owned by.
In contrast an entity (natural or legal) should have roles including entity
(09ade47e-b049-436b-bf10-8357f4b6bc05
). Entities also include some amount of profile data. However this should be limited to such data that is directly linked to the entity and is valid indefinitely (e.g. names, cultural background, locations, and important events). An entity never contain any technical data such as login credentials. Such data is a clear sign that it is in fact an account.
Legal entities are entities that are created by legal means. Most commonly companies. In order for them to be legal entities registration and legal documentation is needed. Such entities should include the role legal-entity
(f57f5e00-1d08-4731-b49b-c8316e23f06a
). If it is unclear if a group qualifies as a legal entity (e.g. people doing something together vs. a registered association) it is wise to only mark it as entity
.
Natural entities are living beings. Those include alive, dead, real, fictional, human and non-human beings. Such subjects should have the role natural-entity
AKA person
(f6249973-59a9-47e2-8314-f7cf9a5f77bf
) included in their list of roles.
In addition to the natural-entity
the universal tag model includes the type body
(5501e545-f39a-4d62-9f65-792af6b0ccba
) used to record all what is related to the body of a person, such as birthday, locations, and species. And character
(a331f2c5-20e5-4aa2-b277-8e63fd03438d
) used to record anything about the character of a person, such as identity, world view, and interests.
Note: When in doubt it is wise to use the role proto-entity
(7be4d8c7-6a75-44cc-94f7-c87433307b26
). It is the super-role for all other entity and account roles and provides many common properties.
Files
There are four different things people commonly refer to as files: hardlinks, inodes, bit exact copies, and creative works.
We'll conver each in their own section. Here is a basic overview to find out what section is the correct one:
- Hardlinks
-
If you consider two files the same if they have the same filename (that is on the same machine and the full path matches) even if the content, timestamps, and file attributes change you most likely mean hardlinks.
- Inodes
-
If you consider two files the same if refer to the same physical storage (the same blocks on the disk), indepdenent on the filename (and path), and permit for the content, timestamps, and file attributes to change but consider each copy (that does not share blocks and therefore can change indepndently) distict (this also implies that any copy on a different disk is a distinct) you likely mean inodes.
- Bit exact copies
-
If you consider any copy that has exactly (bit-wise) the same content, independent on it's storage the same then you most likely mean bit exact copies (sometimes also called states).
This also implies that if two files have the same hash or checksum they are likely the same, but if they have different hashes or checksums they are always distinct.
- Creative works
-
If you consider all copies the same that have the same human readable content, independent on storage location and file format, encoding, or any other property you likely mean a creative work.
Note: While creative works are one of the fundamental types they are strictly speaking not files by the definition of a file. Therefore they are covered in their own section under "Creative works".
Hardlinks
Hardlinks are very volatile as they often change over time. Therefore it is hard to include them in a non-volatile data model. It is best to consider if the data one wants to store is better bound to another propery than the filename (alone).
If hardlinks are however used, they have the type/role hardlink
(61fba55f-1ba3-460d-85a7-9262557f41c9
. It also seems wise to include the basename (that is the filename without drive and directory) as a tagname. A also-on-filesystem
(cd5bfb11-620b-4cce-92bd-85b7d010f070
) relation also seems to be helpful to locate the device. A also-list-contains-also may be used on the directory the link is part of.
Inodes
Inodes are a fundamental filesystem object. Flat files, directories, symlinks, and other data is stored as an inode on the filesystem. Hardlinks give names to inodes. There can be any number (even zero) of hardlinks to an inode. An inode is known by it's inode-number to the filesystem. For any point time knowing the filesystem a file is on and it's inode number uniquely identifies the file. However once a file is deleted the inode number may be reused by the filesystem.
Therefore for software that depends on the use of inode numbers as keys they must ensure the inode is not deleted. At runtime this can be achieved by keeping an handle to the file open. Between runtimes this is harder. If the inode can be set to immutable that is often the best option. However that may require support by the system and corresponding rights. Having a shadow hardlink or setting the file to read only may or may not provide similar semantics depending on the usecase, operating system, and filesystem.
The type/role to be used with inodes is inode
(59a5691a-6a19-4051-bc26-8db82c019df3
. They should include a relation also-on-filesystem
(cd5bfb11-620b-4cce-92bd-85b7d010f070
), and inode-number
(d2526d8b-25fa-4584-806b-67277c01c0db
).
It is recommended that they include final-*
relations, such as final-file-hash, or has-final-state
(54d30193-2000-4d8a-8c28-3fa5af4cad6b
).
Any intermediate state may be recorded using also-has-state
(4c426c3c-900e-4350-8443-e2149869fbc9
).
Bit exact copies
Bit exact copies are the way to look at files that will not change, or files that will transit between well defined states. This is the case for example for documents that need to be archived, static assets, files under version or revision control (such as files in git), disk images, most types of publications and many more. It is one of the preferred ways to look at files. However it is often combined with one of the other types. For example one might expect a never changing document under a given filename (hardlink plus bit exact copy).
Tags for bit exact copies include the role proto-file. They are also often used as the related (target) of a has-final-state or also-has-state relation. In that case they are also known as a state the file can be in.
Such tags commonly also contain a number of final-*
relations. Very common are final-file-hash, final-file-size, and final-file-encoding.
Creative works
A creative work is any work that was created by an author. Common types include texts, pictures, songs, videos, theatre plays, and curated collections.
Each of those types has it's very own set of roles and relations that are specific to the type. However all of them should have the role creative-work
(84402088-2d3b-49c2-af16-f7715b54f051
).
Common relations for all/most kinds of work include also-has-artist
(94a4ce85-dda0-433e-8b57-949a20606d7f
), also-has-topic
(38314410-0eeb-442f-8215-fcded3d35738
), also-has-title
(f7fd59e6-6727-4128-a0a7-cbc702dc09b8
), and also-depicts
(a6ff7196-bc1a-4bee-9ee4-de99767cf236
).
AUTHOR
Löwenfelsen UG (haftungsbeschränkt) <support@loewenfelsen.net>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2024-2025 by Löwenfelsen UG (haftungsbeschränkt) <support@loewenfelsen.net>.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)