Author: Sullivan Beck


Set::Files - routines to work with files, each defining a single set


  use Set::Files;
  $Version = $Set::Files::VERSION;

  $obj     = new Set::Files(OPT => VAL, OPT => VAL, ...);

  @set     = $obj->list_sets( [TYPE] );

  @uid     = $obj->owner;
  $uid     = $obj->owner(SET);

  @set     = $obj->owned_by(UID [,TYPE]);

  @ele     = $obj->members(SET);

  $flag    = $obj->is_member(SET, ELE);

  @type    = $obj->list_types( [SET] );

  @dir     = $obj->dir;
  $dir     = $obj->dir(SET);

  %opts    = $obj->opts(SET);
  $val     = $obj->opts(SET,VAR);


  $num     = $obj->add   (SET, FORCE, COMMIT, ELE1,ELE2,...);
  $num     = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);




This is a module for working with simple sets of elements where each set is defined in a separate file (one file for each set to be defined).

The advantages of putting each set in a separate file are:

Set management can be delegated

If all sets are defined in a single file, management of all sets must be done by a single user, or by using a suid program. By putting each set in a separate file, different files can be owned by different users so management of different sets can be delegated.

Set files are a simple format

Because a file consists of a single set only, there is no need to have a complex file format which has to be parsed to get information about the set. As a result, set files can easily be autogenerated or edited with any simple text editor, and errors are less likely to be introduced into the file.

The disadvantages are:

Permissions problems

Some applications may need to read all of the data, but since the different set files may be owned by different people, permissions may get set such that not all set files are readable.

Applications which actually gather all of the data will need to be run as root in order to be reliable. Alternatively, some means of enforcing the appropriate permissions must be in place.

No central data location

Usually, when you want to define sets, the data ultimately needs to be stored in one central location (which might be a single file or database).

To get around this, a wrapper must be written using this module to copy the data to the central location.

Simple elements only

Many types of sets have elements which have attributes (for example, a ranking within the set or some other attribute). When you start adding attributes, you need a more complex file structure in order to store this information, so that type of set is not addressed with this module. The only attribute that an element has is membership in the set.

Slow data access

Because the data is spread out over several files, each of which must be parsed, and any error checking done, accessing the data can be significantly slower than if the data were stored in a central location.

Features of this module include:

Data caching

This module provides routines for caching the information from all the set files. This can be used to avoid the permissions problems (allowing user-run applications access to all cached data) and to decrease access time (no parsing is needed, and error checking can be done before the information is cached).

This still requires that a privileged user or suid script be used to update the cache.

Multiple types of sets

Often, it is convenient to define different types of sets using a single set of files, as there may be considerable overlap between sets of different types.

For example, it might be useful to create files containing sets of users who belong to different committees in a department. Also, there might be sets of users who belong to various departmental mailing lists. One solution is to have two different directories, one with set files with lists of users on the various committees; one with set files with lists of users on each mailing list. Since there might be overlap between these groups, it might be nice to have the two sets of files overlap. For example, some committees may want to have a mailing list associated with the group, others don't want a mailing list, and there may be mailing lists not associated with a committee.

This allows you to have a single file for each set of users, but some sets will have mailing lists, some will be committees, and some will be both.

Set ownership

Since the different files may be owned by different people, operations based on set ownership can be done.


The following methods are available:

  use Set::Files;
  $Version = $Set::Files::VERSION;

Load the module and check its version.

  $obj = new Set::Files(OPT => VAL, OPT => VAL, ...);

This creates a new Set::Files object which reads the appropriate set files (or a cache of the information in set files). The initialization options available are described below.

  @set     = $obj->list_sets( [TYPE] );

Returns a list of all defined sets or the sets of the specified type.

  @uid     = $obj->owner;
  $uid     = $obj->owner(SET);

Returns a list of all UIDs that own at least one set, or the UID of the owner of the specified set.

  @set     = $obj->owned_by(UID [,TYPE]);

Returns a list of all sets owned by the specified UID, optionally restricted to sets of the given TYPE.

  @ele     = $obj->members(SET);

Lists all elements in the specified set.

  $flag    = $obj->is_member(SET, ELE);

Returns 1 if ELE is a member of SET.

  @type    = $obj->list_types( [SET] );

Returns a list of all defined types, or the types that the specified set belongs to.

  @dir     = $obj->dir;
  $dir     = $obj->dir(SET);

All directories containing set files, or the directory containing the file of the specified set.

  %opts    = $obj->opts(SET);
  $val     = $obj->opts(SET,VAR);

Returns a hash of all options set for a set, or the value of a specific option. If the specific option is not set, 0 is returned.


This removes the specified set file. By default, it renames the set file to .set_files.$set (which are ignored when reading in set data). If the optional second argument is passed in, no backup is made (i.e. the set file is deleted completely).

This method is only available to those who have write access to the directory containing the set file.


cache

This dumps the current set information to a cache file. This method is only valid if the data was read in from files. If it was read in from the cache, this method will fail.

add, remove
  $num = $obj->add   (SET, FORCE, COMMIT, ELE1,ELE2,...);
  $num = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);

These methods add/remove the specified elements to/from the set.

When adding elements to a set, each element is first checked to see whether it is already in the set and, if so, whether it is explicitly included in the set file or comes from some other set file via an INCLUDE tag.

If the element is not in the set, it is added. If it is already in the set, but only via an INCLUDE tag, it is added to the set file explicitly only when the FORCE flag is true. In either case, any OMIT tag which removes this element is removed from the file.

When removing elements from a set, a similar set of tests is done. If the element is in the set, it is removed from the file (if it appears in the file) AND an OMIT tag is added. If the element does NOT appear in the set, the file is unmodified unless the FORCE flag is true, in which case an OMIT tag is added.

The COMMIT flag is used to determine whether the file should be written out over the existing file. The file can only be written out if data was read from the files. If it was read in from the cache, this will fail.

The return value is the number of changes made to the set.
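The add/remove rules above can be modelled on an in-memory view of a single set file. The following is a minimal Perl sketch, not the module's implementation. The hash layout (explicit, omit, included) and the helper names in_set, add_elements and remove_elements are hypothetical, and the COMMIT step is left out. The sketch also assumes that "number of changes" means one change per element whose entry in the file changed.

```perl
use strict;
use warnings;

# Hypothetical in-memory view of one set file:
#   explicit => { ELE => 1 }  elements listed directly in the file
#   omit     => { ELE => 1 }  elements named by OMIT tags
#   included => { ELE => 1 }  members contributed by INCLUDE (after EXCLUDE)
sub in_set {
    my ($s, $ele) = @_;
    return 0 if $s->{omit}{$ele};
    return ($s->{explicit}{$ele} || $s->{included}{$ele}) ? 1 : 0;
}

# Returns the number of elements whose entry in the file changed.
sub add_elements {
    my ($s, $force, @eles) = @_;
    my $changes = 0;
    foreach my $ele (@eles) {
        my $changed = 0;
        # Not in the set (or only there via INCLUDE, with FORCE):
        # list the element explicitly in the file.
        if ((!in_set($s, $ele) || $force) && !$s->{explicit}{$ele}) {
            $s->{explicit}{$ele} = 1;
            $changed = 1;
        }
        # Any OMIT tag for this element is dropped.
        $changed = 1 if delete $s->{omit}{$ele};
        $changes++ if $changed;
    }
    return $changes;
}

sub remove_elements {
    my ($s, $force, @eles) = @_;
    my $changes = 0;
    foreach my $ele (@eles) {
        # Elements not in the set are left alone unless FORCE is true.
        next unless in_set($s, $ele) || $force;
        my $changed = delete $s->{explicit}{$ele} ? 1 : 0;
        $changed = 1 unless $s->{omit}{$ele};
        $s->{omit}{$ele} = 1;          # always record an OMIT tag
        $changes++ if $changed;
    }
    return $changes;
}
```

For example, with E1 present only via INCLUDE, add_elements($s, 0, 'E1') makes no change, but add_elements($s, 1, 'E1') lists it explicitly.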


commit

Any changes that have been made with the add and remove methods can be written out to the set file(s) with this method. This method is only valid if the data was read in from files. If it was read in from the cache, this method will fail.


The following options can be passed in to the new method:

  path => DIR1:DIR2:...
  path => [ DIR1, DIR2, ... ]

The set files may be stored in one or more different directories. By default, set files are assumed to be in the current directory, but using this option, the directory (or directories) can be explicitly set.

Note that if multiple directories are used, and a file of the same name exists in more than one of the directories, the first one found (in the order that the directories are included in the list) is used. A warning will be issued for files of the same name in other directories, but they will be ignored.

Warnings will be issued for unreadable directories, or unreadable files within a directory.
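A minimal sketch of how the path option could be interpreted. Both accepted forms are normalized to a list, and each set name is mapped to the first directory that contains it. Later duplicates and unreadable entries produce warnings. The helper names normalize_path and locate_set_files are hypothetical. Skipping dot files (such as renamed backups) is an assumption of this sketch.

```perl
use strict;
use warnings;
use File::Spec;

# Accept "DIR1:DIR2:..." or [ DIR1, DIR2, ... ]; default to the current directory.
sub normalize_path {
    my ($path) = @_;
    return ('.') unless defined $path;
    return ref $path eq 'ARRAY' ? @$path : split(/:/, $path);
}

# Map each set name to the first directory (in path order) that holds it.
sub locate_set_files {
    my (@dirs) = @_;
    my (%dir_of, @warnings);
    foreach my $dir (@dirs) {
        my $dh;
        unless (opendir $dh, $dir) {
            push @warnings, "unreadable directory: $dir";
            next;
        }
        my @entries = readdir $dh;
        closedir $dh;
        foreach my $file (sort @entries) {
            next if $file =~ /^\./;       # assumption: skip dot files (incl. backups)
            my $full = File::Spec->catfile($dir, $file);
            next unless -f $full;
            unless (-r $full) {
                push @warnings, "unreadable file: $full";
                next;
            }
            if (exists $dir_of{$file}) {  # same name seen in an earlier directory
                push @warnings, "ignoring $full (already found in $dir_of{$file})";
                next;
            }
            $dir_of{$file} = $dir;
        }
    }
    return (\%dir_of, \@warnings);
}
```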

  valid_file => REGEXP
  valid_file => !REGEXP
  valid_file => \&FUNCTION

By default, all files in the directories are used. With this option, filenames are tested and only those that pass will be used. Others will be ignored (with a warning, unless the invalid_quiet option is set).

REGEXP is a regular expression. Only filenames which match the REGEXP will pass (or if !REGEXP is used, only filenames which do NOT match REGEXP will pass).

If a reference to a function is passed in, the function &FUNCTION(dir,file) will be evaluated for each file. If it returns 0, the file will be ignored. Otherwise it will be used.
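The three forms of valid_file (and of valid_ele, described below) can be dispatched on the type of the value. This hypothetical passes_test helper is only a sketch. In particular, it assumes the !REGEXP form is passed as a string whose first character is "!".

```perl
use strict;
use warnings;

# Hypothetical helper for valid_file / valid_ele style tests.
#   $test : qr// or string REGEXP, "!REGEXP" string, or code reference
#   $name : the file name or element being tested
#   @args : arguments passed to a code reference, e.g. (dir, file)
sub passes_test {
    my ($test, $name, @args) = @_;
    return 1 unless defined $test;              # no test: everything passes
    if (ref $test eq 'CODE') {
        return $test->(@args) ? 1 : 0;          # a false return rejects the name
    }
    if (!ref $test && $test =~ /^!(.*)\z/s) {   # "!REGEXP": must NOT match
        my $re = $1;
        return $name =~ /$re/ ? 0 : 1;
    }
    return $name =~ /$test/ ? 1 : 0;            # REGEXP: must match
}
```

For example, passes_test('!^\.', $file) rejects dot files, and passes_test(\&check, $file, $dir, $file) defers to a (hypothetical) check function.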

  invalid_quiet => 1

By default, when a file is ignored due to failing a valid_file test, or when an element is ignored due to failing a valid_ele test, a warning is issued. With this option, no warning is issued.

  cache => DIR

Data from the set files may be cached in order to speed up data access. If this option is used, you must specify the directory where the data will be cached. The directory may be the same as one of the directories containing the set files.

The cache directory defaults to the first directory given in the path option (or the current directory if no path option is given).

  read => "cache"
  read => "files"
  read => "file"

When an application wants to use data from the set files, it can either read the data from the set files or from the cache.

If the cache option was used, the default is to read from the cache if it exists, read from the files otherwise. If no cache option was used, the default is to read from the files. When data is read in from the cache, the commit and cache methods are disabled.

If the file option is used, it reads a single set from a single file along with all dependency sets (i.e. sets that are included or excluded via the appropriate tags). This allows someone to make changes to a single set file that they own even if permissions are set so that they cannot read other set files. The commit method is available, but the cache method is disabled. The file option requires that the set option also be present.

With the files option, all set files are read. Both the commit and cache methods are enabled.

  set => SET

This defines which set to read when the read => "file" option is used. This option is required when read => "file" is used, and ignored for any other value of read.
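The read-mode rules above can be summarized as a small decision function. This is a sketch under stated assumptions. resolve_read_mode is hypothetical, and the cache_exists flag stands in for whatever check the module actually performs for an existing cache file.

```perl
use strict;
use warnings;

sub resolve_read_mode {
    my (%opt) = @_;
    my $mode = $opt{read};
    if (!defined $mode) {
        # Default: the cache if one is configured and exists, else the files.
        $mode = (defined $opt{cache} && $opt{cache_exists}) ? 'cache' : 'files';
    }
    die "read => \"file\" requires the set option\n"
        if $mode eq 'file' && !defined $opt{set};

    # Which of the commit and cache methods remain enabled in each mode.
    my %enabled = (
        cache => { commit => 0, cache => 0 },
        files => { commit => 1, cache => 1 },
        file  => { commit => 1, cache => 0 },
    );
    die "unknown read mode: $mode\n" unless $enabled{$mode};
    return ($mode, $enabled{$mode});
}
```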

  types => TYPE
  types => [ TYPE1, TYPE2, ... ]

Sets can be of one or more types (or they can belong to no type and be used solely in building other sets using the INCLUDE or EXCLUDE tags described in the FILE FORMAT section below).

This option can be used to specify the names of the different types of sets defined by these files.

If this option is not given, then there is only one type and by default, all sets belong to it.

  default_types => [ TYPEa, TYPEb, ... ]
  default_types => "all"
  default_types => "none"
  default_types => TYPE

Some types of sets may be more common than others, and you may or may not want to have to explicitly define which types a set belongs to.

If a list of types is passed in, every type must be defined in the types option (warnings will be issued for any that aren't). If a value of "all" is passed in, sets belong to all types by default. If a value of "none" is passed in, sets don't belong to any type by default.

By default, sets belong to all types available.

  comment => REGEXP

This defines a regular expression used to recognize (and strip out) comments from a set file. The default expression is "#.*" which means that all characters from a pound sign to the end of the line are removed.

If REGEXP is passed in as an empty string, there are no comments. All lines are either empty or contain an element.

  tagchars => STRING

This defines a character (or a string) which marks a line of the set file as containing a tag. The default value is "@".

  valid_ele => REGEXP
  valid_ele => !REGEXP
  valid_ele => \&FUNCTION

By default, every non-blank line (after comments have been stripped out) is treated as an element. If this option is used, elements are tested, and only those that pass the test are treated as valid. Others are invalid and produce a warning.

If a reference to a function is passed in, the function &FUNCTION(set,ele) will be evaluated for each element. If it returns 0, the element is treated as invalid and ignored. Otherwise it will be included in the set.

  scratch => DIR

When automatically updating a set file, the directory where the files live may or may not be writable by a user who owns a set file.

If the directory is writable by the user, there is no problem. In this case, when a new set file is written, the old one is backed up and the new one written in its place.

If the directory is NOT writable by the user, the old copy is backed up to the scratch directory. This directory must be writable by the user. It defaults to /tmp.


A set file has a very simple format. It consists of blank lines, tags, and elements. Comments may be included as whole lines or part of one of the above lines.

Each line is checked for comments and they are removed before any other processing is done. A comment is anything that matches a regular expression which can be set using the comment Init option. The default regular expression is "#.*" which means that comments start with a pound sign anywhere on the line and go to the end of the line.

Tags are lines which begin with a special string (which can be set with the tagchars Init option; the default string is "@"). Tag lines have the format:

  @TAG VAL1,VAL2,...

All other lines are elements. Elements are any string (one per line).

Leading/trailing spaces are ignored in all cases.
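As a rough illustration of the format, the following sketch parses the lines of a set file using the default comment expression and tag string. It is not the module's parser, and the returned structure is an assumption. It borrows two rules stated further below: tags are case insensitive, and OMIT takes a single element per tag.

```perl
use strict;
use warnings;

# Sketch: parse set file lines into elements and tag values.
sub parse_set_file {
    my ($lines, %opt) = @_;
    my $comment  = exists $opt{comment}  ? $opt{comment}  : '#.*';
    my $tagchars = exists $opt{tagchars} ? $opt{tagchars} : '@';
    my (@elements, %tags);
    foreach my $raw (@$lines) {
        my $line = $raw;                              # don't modify the caller's data
        $line =~ s/$comment// if length $comment;     # comments are removed first
        $line =~ s/^\s+|\s+$//g;                      # leading/trailing space ignored
        next if $line eq '';                          # blank line
        if (index($line, $tagchars) == 0) {           # tag line: TAG VAL1,VAL2,...
            my ($tag, $vals) = substr($line, length $tagchars) =~ /^(\S+)\s*(.*)$/;
            next unless defined $tag;
            $tag = uc $tag;                           # tags are case insensitive
            # OMIT names one element, which may itself contain commas.
            my @vals = $tag eq 'OMIT' ? ($vals) : split(/\s*,\s*/, $vals);
            push @{ $tags{$tag} }, grep { length } @vals;
        }
        else {
            push @elements, $line;                    # anything else is an element
        }
    }
    return { elements => \@elements, tags => \%tags };
}
```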

The set name is the name of the set file.

The following TAGs are known:


  @INCLUDE SET1,SET2,...

This includes all members of one or more other sets in the current set.


  @EXCLUDE SET1,SET2,...

This excludes all members of one or more other sets from the current set. This overrides any members included from other sets, but does NOT exclude members explicitly included in the set file.


  @OMIT ELE

This excludes a specific element from the current set. This overrides any elements included via an INCLUDE tag, or any elements explicitly included in the set file.

Each element must be specified separately, since elements may contain commas.


  @TYPE TYPE1,TYPE2,...

The default types that this set belongs to are determined by the types and default_types Init options.

This tag explicitly puts the set in the specified types, even if it is not in those types by default.


Similar to the TYPE tag, but this tag explicitly removes the set from the specified types, even if it is in them by default.
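The type rules can be sketched as set operations. In this hypothetical set_types helper, $add holds the types named by TYPE tags and $remove holds the types named by the tag that takes a set out of types. Two choices are assumptions: applying removals after additions, and the placeholder name "default" used when no types option is given.

```perl
use strict;
use warnings;

# Sketch: which types does a set belong to?
#   $types   : the types option (TYPE or [TYPE1, ...]); undef means one type
#   $default : default_types ("all", "none", TYPE, or [TYPEa, ...]); undef = "all"
#   $add     : types named by TYPE tags in the set file
#   $remove  : types named by the tag that removes the set from types
sub set_types {
    my ($types, $default, $add, $remove) = @_;
    my @all = !defined $types ? ('default')    # hypothetical name for the single type
            : ref $types       ? @$types
            :                    ($types);
    my %known = map { $_ => 1 } @all;
    $default = 'all' unless defined $default;

    my @start = ref $default        ? @$default
              : $default eq 'all'   ? @all
              : $default eq 'none'  ? ()
              :                       ($default);

    my %in;
    foreach my $type (@start, @{ $add || [] }) {
        unless ($known{$type}) {                # must be listed in the types option
            warn "unknown type: $type\n";
            next;
        }
        $in{$type} = 1;
    }
    delete @in{ @{ $remove || [] } };           # assumption: removals win over additions
    return grep { $in{$_} } @all;               # keep the order of the types option
}
```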


Although there is no support for element specific attributes, there IS support for attributes which apply to the entire set (and which can be made available to applications using these sets).

Each set may have an associated hash of key/value pairs (if no value is included, it defaults to 1). These attributes are available using the opts method.

All tag lines can be repeated any number of times, so:

  @INCLUDE foo,bar

is equivalent to

  @INCLUDE foo
  @INCLUDE bar

All tags are case insensitive.

When determining the members of a set which includes and excludes other sets, or omits specific elements from the set, all inclusions are evaluated first, followed by all exclusions (i.e. all exclusions override all inclusions). If there is a cyclic dependency (i.e. A depends on B depends on A, where a dependency can either be an INCLUDE or EXCLUDE), an error is reported and the cyclic dependency is ignored.

A few examples illustrate the use of INCLUDE, EXCLUDE, and OMIT tags. In the examples, the set file A contains the elements: E1, E2, E3. The set file B contains the elements: E3, E4, E5. A set file containing the following lines:

  @INCLUDE A
  @EXCLUDE B
  E5
  E6

defines a set containing the elements: E1, E2, E5, E6. The first line includes E1, E2, E3. The second line excludes E3. It does NOT exclude E5 since the EXCLUDE tag does not override elements explicitly included in the set file. Finally, the E5 and E6 elements are added.

A set file containing the following lines:

  @INCLUDE A
  @EXCLUDE B
  E5
  E6
  @OMIT    E2
  @OMIT    E6

defines a set containing the elements: E1, E5. This is similar to the above example, except that the OMIT tags override elements included via the INCLUDE tag AND elements explicitly included in the set file.
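The evaluation order described above can be sketched recursively. Inclusions are applied first, then exclusions (which spare explicit elements), then OMIT tags; cycles are reported and ignored. The data layout and the members_of name are hypothetical. Only the rules come from this section.

```perl
use strict;
use warnings;

# $sets maps a set name to { elements => [...], include => [...],
#                            exclude => [...], omit => [...] }
sub members_of {
    my ($sets, $name, $visiting) = @_;
    $visiting ||= {};
    my $set = $sets->{$name} or die "unknown set: $name\n";
    local $visiting->{$name} = 1;          # mark as "being evaluated"

    # Union of the members of the named sets, skipping cyclic dependencies.
    my $deps = sub {
        my %m;
        foreach my $dep (@_) {
            if ($visiting->{$dep}) {
                warn "cyclic dependency: $name -> $dep (ignored)\n";
                next;
            }
            $m{$_} = 1 for members_of($sets, $dep, $visiting);
        }
        return \%m;
    };
    my $included = $deps->(@{ $set->{include} || [] });
    my $excluded = $deps->(@{ $set->{exclude} || [] });

    my %member = %$included;
    delete $member{$_} for keys %$excluded;           # exclusions override inclusions
    $member{$_} = 1 for @{ $set->{elements} || [] };  # ...but not explicit elements
    delete $member{$_} for @{ $set->{omit} || [] };   # OMIT overrides everything
    return sort keys %member;
}
```

With A and B as in the examples, and a set C defined as { include => ['A'], exclude => ['B'], elements => [qw(E5 E6)] }, members_of(\%sets, 'C') returns E1, E2, E5, E6. Adding omit => [qw(E2 E6)] leaves E1, E5.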


Several files are used by the Set::Files module. They all live in the directory set by the cache Init Option except for set-specific files, which live in the same directory as the set file. Files are:


A backup of the given set. When a set file is updated, the original file is stored in this file. The file is stored either in the same directory as the set file (if it is writable) or in the directory specified by the scratch Init Option.

A temporary file where a new set file (or the update to an old one) is written. Once completed, this file is moved into place as the new set file. This file lives in the same directory as the set file or in the scratch directory.
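The backup and temporary-file steps can be sketched as follows. The file names ".bak.NAME" and ".tmp.NAME" are placeholders, since the module's own names are not given here. When the set directory is not writable, this sketch copies over the set file in place rather than renaming it, which assumes the set file itself is writable.

```perl
use strict;
use warnings;
use File::Basename qw(dirname basename);
use File::Copy qw(copy);
use File::Spec;

# Sketch: replace a set file's contents, keeping a backup of the original.
sub update_set_file {
    my ($path, $content, $scratch) = @_;
    $scratch = '/tmp' unless defined $scratch;
    my $dir  = dirname($path);
    my $name = basename($path);
    # Work in the set's own directory if writable, else in the scratch directory.
    my $work = -w $dir ? $dir : $scratch;
    my $bak  = File::Spec->catfile($work, ".bak.$name");   # placeholder name
    my $tmp  = File::Spec->catfile($work, ".tmp.$name");   # placeholder name

    if (-e $path) {
        copy($path, $bak) or die "cannot back up $path: $!";
    }

    open my $fh, '>', $tmp or die "cannot write $tmp: $!";
    print {$fh} $content;
    close $fh or die "cannot close $tmp: $!";

    if ($work eq $dir) {
        # Same directory: rename moves the new file into place in one step.
        rename $tmp, $path or die "cannot rename $tmp: $!";
    }
    else {
        # Directory not writable: overwrite the (writable) set file in place.
        copy($tmp, $path) or die "cannot update $path: $!";
        unlink $tmp;
    }
    return $bak;
}
```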


The file containing the cache. This is created using the cache method.


When creating a new set file (or updating an existing one), this file is used (if it exists) as a starting point and then all the data is appended to it. This is a good place to store comments describing how to edit the set files, etc., that set file maintainers can read for help.


None at this point.


This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Sullivan Beck (