
NAME

Table::Readable - Simple-to-edit tables of data

SYNOPSIS

    use FindBin '$Bin';
    use Table::Readable qw/read_table/;
    my @list = read_table ("$Bin/file.txt");
    for my $entry (@list) {
        for my $k (keys %$entry) {
            print "$k $entry->{$k}\n";
        }
    }
    

produces output

    en Residual Current Device
    ja 配線用遮断器
    de Fehlerstrom-Schutzschalter

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents Table::Readable version 0.05 corresponding to git commit c5bf0b4cfb273eaa66a9d14c56a59527a86c3e11 released on Sun Feb 7 09:47:05 2021 +0900.

DESCRIPTION

Table::Readable provides a format for human-editable tables of information which a computer can read. By design, the format does not support any kind of nesting, and can only contain text in the UTF-8 encoding.

FUNCTIONS

read_table

    my @table = read_table ("list_file.txt");

Read one table of information from the specified file. Each row of the table is returned as a reference to an anonymous hash, and the return value is a list of these references. It dies if not called in list context.

Each row of the table consists of key/value pairs. The key/value pairs are given in the form

    key: value

If the key has spaces

    key with spaces: value

then it is turned into key_with_spaces in the anonymous hash.

Rows are separated by a blank line.

So, for example

    row: first
    data: some information

    row: second
    data: more information
    gubbins: guff here

defines two rows. The first row becomes a hash reference with the entries row and data, and the second a hash reference with the entries row, data, and gubbins. Each entry contains the text to the right of the colon.

If the key begins with two percent characters,

    %%key:

then it marks the beginning of a multiline value, which continues until the next line beginning with two percent characters. Thus

    %%key:

    this is the value

    %%

assigns "this is the value" to "key".

If the key contains spaces, these are replaced by underscores. For example,

    this key: value

becomes this_key in the output. Whitespace before the colon is also converted, so

    this key : value

becomes this_key_ in the output, with an underscore at the end.
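A minimal sketch of this key conversion follows. normalize_key is a hypothetical helper, not part of Table::Readable, and it assumes that each whitespace character becomes exactly one underscore; how the module treats a run of several spaces is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the raw text before the colon into a hash key.
# Assumption: each whitespace character becomes one underscore.
sub normalize_key {
    my ($raw) = @_;
    (my $key = $raw) =~ s/\s/_/g;
    return $key;
}

for my $line ('key with spaces: value', 'this key : value') {
    # The raw key is everything before the first colon.
    my ($raw) = $line =~ /^([^:]+):/;
    print normalize_key ($raw), "\n";
}
```

This prints key_with_spaces and then this_key_, matching the two cases described above.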

Comments can be added to the table using lines with # as the first character.

The file is assumed to be in the UTF-8 encoding.

Read from a scalar

    my @table = read_table ($stuff, scalar => 1);

This reads the table from the string in $stuff instead of from a file.

read_table_hash

    my $hash = read_table_hash ('table.txt', 'id');

    my ($hash, $order) = read_table_hash ('table.txt', 'id');
    for (@$order) {
        print $hash->{$_}{value}, "\n";
    }

This reads the table specified in the first argument, then creates a hash reference using the key specified as the second argument. If some entries of the table do not have the specified key, or if some entries have the same value for the key, warnings are printed.
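The hash-building step can be sketched in pure Perl. index_rows is a hypothetical helper, not the module's implementation, and keeping the first of several rows that share a key value is an assumption. Like read_table_hash, it returns only the hash reference in scalar context, and the hash and order references in list context.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: index a list of rows
# (hash references) by the value of $key, remembering the original order.
sub index_rows {
    my ($rows, $key) = @_;
    my %hash;
    my @order;
    for my $row (@$rows) {
        my $id = $row->{$key};
        if (! defined $id) {
            warn "Row without '$key'\n";
            next;
        }
        if (exists $hash{$id}) {
            # Assumption: the first row with a given value is kept.
            warn "Duplicate value '$id' for '$key'\n";
            next;
        }
        $hash{$id} = $row;
        push @order, $id;
    }
    return wantarray ? (\%hash, \@order) : \%hash;
}

my @rows = (
    { id => 'b', value => 'second letter' },
    { id => 'a', value => 'first letter' },
    { value => 'no id here' },    # triggers a warning
);
my ($hash, $order) = index_rows (\@rows, 'id');
for (@$order) {
    print $hash->{$_}{value}, "\n";
}
```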

write_table

    write_table (\@table, 'file.txt');

Write the table in @table to file.txt. The first argument must be an array reference containing hash references, each of which has simple scalars as values.

This does not convert underscores in the keys into spaces.

If the name of the file is omitted, it prints to STDOUT.

    write_table (\@table);

If the caller asks for a return value, it returns the table as a string rather than printing it.

    my $table = write_table (\@table);

If an output line would be longer than a maximum length, the entry is written in the multiline format instead. This maximum length is available as the global variable $Table::Readable::maxlen, whose default value is 75.
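The following sketch shows the choice between the single-line and multiline forms. format_entry and the package variable $maxlen are hypothetical stand-ins; the exact multiline layout, and treating any value that contains a newline as multiline, are assumptions based on the examples in this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the global $Table::Readable::maxlen.
our $maxlen = 75;

# Sketch of writing one key/value pair. Assumption: a value containing a
# newline, or a "key: value" line longer than $maxlen, uses the %% form.
sub format_entry {
    my ($key, $value) = @_;
    my $line = "$key: $value";
    if ($value =~ /\n/ || length ($line) > $maxlen) {
        return "%%$key:\n\n$value\n\n%%\n";
    }
    return "$line\n";
}

print format_entry ('short', 'fits on one line');
print format_entry ('long', 'x' x 80);
```

With the module itself, the documented way to change the limit is to set the global variable, for example with local $Table::Readable::maxlen = 120; before calling write_table.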

TABLE FORMAT

This section gives exact details of the format of the tables.

The table takes the format

    key1: value
    key2: value

    key1: another value
    key2: yet more values

where rows of the table are separated by a blank line, and the columns of each row are defined by giving the name of the column, followed by a colon, followed by the value.

Blank lines

A blank line may contain whitespace (anything matching \s).

Comments

Lines beginning with a hash character '#' are ignored. However, lines beginning with '#' inside multiline entries are part of the entry, not comments. A hash character anywhere other than the start of a line does not start a comment, and is kept as part of the text.

Comments are not considered blank lines for the purpose of separating table rows.

    use Table::Readable 'read_table';
    my $table = <<EOF;
    row: 1
    # comment
    some: thing
    
    row: 2
    EOF
    my @rows = read_table ($table, scalar => 1);
    print scalar (@rows), "\n";
    

produces output

    2

(This example is included as comment-not-row.pl in the distribution.)

Encoding

The file must be encoded in the UTF-8 encoding.

Unparseable lines

Lines which are not part of a multiline value, are not comments, and do not contain a key are discarded, and a warning is printed.
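To make the single-line rules above concrete, here is a minimal pure-Perl sketch. parse_simple is a hypothetical name and this is not the module's implementation: it ignores %% multiline values and backslash escapes, and its warning and error texts are invented.

```perl
use strict;
use warnings;

# Minimal sketch of the single-line rules: blank-line row separators,
# '#' comments, "key: value" pairs, and warnings for other lines.
# No %% multiline values or backslash escapes. Not the module's code.
sub parse_simple {
    my ($text) = @_;
    my @rows;
    my $row = {};
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*$/) {
            # A blank line (whitespace only) ends the current row.
            push @rows, $row if %$row;
            $row = {};
            next;
        }
        # '#' is a comment only as the first character of a line.
        next if $line =~ /^#/;
        if ($line =~ /^([^:]+):\s*(.*?)\s*$/) {
            # Copy the captures before the substitution below resets them.
            my ($key, $value) = ($1, $2);
            $key =~ s/\s/_/g;
            die "Duplicate key '$key'\n" if exists $row->{$key};
            $row->{$key} = $value;
            next;
        }
        warn "Discarding unparseable line '$line'\n";
    }
    push @rows, $row if %$row;
    return @rows;
}

my @rows = parse_simple ("row: 1\n# comment\nsome: thing\n\nrow: 2\n");
print scalar (@rows), "\n";
```

It prints 2, as in the comment-not-row.pl example, because the comment line does not separate rows.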

Values

Multiline values

    %%key1:

    value goes here.

    %%

Multiline values begin and end with two percent characters at the start of a line. There may be any number of blank lines between the opening and closing lines. Whitespace (anything matching \s) is stripped from the beginning and end of the value.

There is no way to have double percent characters at the beginning of a line within a multiline value, so if you need double percents, you must use a different syntax and then post-process the entry to convert your syntax to double percent characters.
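Collecting a multiline value can be sketched as below. read_multiline is a hypothetical helper, not the module's code; it assumes the opening %%key: line has already been recognised, and in this sketch an unterminated value simply runs to the end of the input.

```perl
use strict;
use warnings;

# Sketch: collect a %% multiline value. $lines is an array reference of
# lines and $i the index of the line after "%%key:". Returns the value and
# the index of the closing "%%" line. Not the module's implementation.
sub read_multiline {
    my ($lines, $i) = @_;
    my @body;
    while ($i < @$lines && $lines->[$i] !~ /^%%/) {
        push @body, $lines->[$i];
        $i++;
    }
    my $value = join "\n", @body;
    # Strip leading and trailing whitespace (anything matching \s).
    $value =~ s/^\s+|\s+$//g;
    return ($value, $i);
}

my @lines = ('%%key1:', '', 'value goes here.', '', '%%');
my ($key) = $lines[0] =~ /^%%([^:]+):/;
my ($value, $end) = read_multiline (\@lines, 1);
print "$key = '$value' (closing line $end)\n";
```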

Whitespace

Whitespace (anything matching \s) is stripped from the beginning and end of the value. Leading whitespace can be preserved by putting a backslash before it, and trailing whitespace by putting a backslash after it:

    use utf8;
    use Table::Readable 'read_table';
    my $table =<<'EOF';
    a: \  b     
    %%c:
    \
    
    d
    
    %%
    %%e:
    
    f
    
    !
    
    
    \
    %%
    EOF
    my @entries = read_table ($table, scalar => 1);
    for my $k (keys %{$entries[0]}) {
        my $v = $entries[0]{$k};
        $v =~ s/!$//;
        print "'$k' = '$v'\n";
    }
    

produces output

    'a' = '  b'
    'e' = 'f
    
    !
    
    
    '
    'c' = '
    
    d'

(This example is included as slash.pl in the distribution.)

If you actually need a backslash at the start or end of your string, use a double backslash, \\. In parts of the string other than the first or the last position, double backslashes are not treated specially.

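Alternatively, you can use a marker of your own and remove it after reading. The following example ends the value of e with ! to protect its trailing blank line, then strips the !.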

    use Table::Readable 'read_table';
    my $table =<<EOF;
    a: b     
    %%c:
    
    d
    
    %%
    %%e:
    
    f
    
    !
    %%
    EOF
    my @entries = read_table ($table, scalar => 1);
    for my $k (keys %{$entries[0]}) {
        my $v = $entries[0]{$k};
        $v =~ s/!$//;
        print "'$k' = '$v'\n";
    }
    

produces output

    'e' = 'f
    
    '
    'c' = 'd'
    'a' = 'b'

(This example is included as whitespace.pl in the distribution.)
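The backslash rules described in this section can be approximated as follows. unescape_ends is a hypothetical helper, not the module's code; it assumes the value is stripped first, and that one backslash, or one double backslash, is then consumed at each end. On the values a, c, and e from slash.pl it reproduces the output shown earlier.

```perl
use strict;
use warnings;

# Sketch of the backslash rule, not the module's code. Assumptions: the
# value is stripped first, then one backslash at each end is consumed,
# and "\\" at an end stands for a single literal backslash.
sub unescape_ends {
    my ($v) = @_;
    # Strip leading and trailing whitespace (anything matching \s).
    $v =~ s/^\s+|\s+$//g;
    # Leading "\" is dropped; leading "\\" becomes "\".
    $v =~ s/^\\(\\?)/$1/;
    # Trailing "\" is dropped; trailing "\\" becomes "\".
    $v =~ s/(\\?)\\$/$1/;
    return $v;
}

# The text after "a:" in slash.pl; prints '  b'
print "'", unescape_ends (' \\  b     '), "'\n";
```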

Empty values

Keys without values, like

    key:

are permitted within the table. A key with no value results in the value for that key being an empty string, rather than the undefined value.

Keys

Key syntax

A key is a sequence of one or more characters of any kind except the colon. In regular expression terms, a key is what $2 captures in the following:

    ^(%%)?([^:]+)

Keys cannot contain colons, so if you need to have colons in your keys, invent your own escape sequence, such as substituting semicolons or @ marks for colons.
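The key pattern can be tried directly. In this sketch split_line is a hypothetical helper, and the :\s*(.*)$ tail that captures the value is an assumption added for illustration. It shows that, because a key stops at the first colon, the value itself may contain colons.

```perl
use strict;
use warnings;

# Split one line using the key pattern ^(%%)?([^:]+) from the text above.
# The ":\s*(.*)$" tail that captures the value is an assumption.
sub split_line {
    my ($line) = @_;
    return unless $line =~ /^(%%)?([^:]+):\s*(.*)$/;
    # Returns (is_multiline, key, value).
    return (defined $1 ? 1 : 0, $2, $3);
}

for my $line ('url: https://www.lemoda.net/', '%%notes:') {
    my ($multi, $key, $value) = split_line ($line);
    print "key '$key', value '$value', multiline $multi\n";
}
```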

Consistency of keys

There is no requirement that the keys in one row of the table have to be the same as the keys in the subsequent row. Each row of the table may have completely inconsistent keys. If you need consistent keys, add a post-processor of your own.
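Such a post-processor might look like the following sketch. check_consistent_keys is a hypothetical helper that takes rows like those returned by read_table and warns about any row whose set of keys differs from that of the first row.

```perl
use strict;
use warnings;

# Hypothetical post-processor: warn about rows whose set of keys differs
# from the first row's. Rows are hash references, as from read_table.
sub check_consistent_keys {
    my (@rows) = @_;
    return 1 unless @rows;
    my $expected = join "\0", sort keys %{ $rows[0] };
    my $ok = 1;
    for my $i (1 .. $#rows) {
        my @got = sort keys %{ $rows[$i] };
        next if join ("\0", @got) eq $expected;
        warn "Row $i has keys: @got\n";
        $ok = 0;
    }
    return $ok;
}

# The second row has an extra key, so this warns once.
check_consistent_keys ({ row => 'first', data => 'x' },
                       { row => 'second', data => 'y', gubbins => 'z' });
```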

Uniqueness of keys

Keys within a single row must be unique. A duplicate key within a row causes a fatal error.

Design and motivation

This module and the associated format were born out of exasperation with various complicated file formats, and the associated complicated parser software. In particular I originally made this module and format as an alternative to using the TMX format for translation memory files, and also out of frustration with the AppConfig module. I currently use this to store translations, such as http://kanji.sljfaq.org/translations.txt, and files of tabular information, such as https://www.lemoda.net/unix/troff-dictionary/dictionary.txt.

This format is designed to reduce the amount of mental effort necessary to type in a machine-readable table of information. By design, it gives special meaning to as few characters as possible. There are only five significant characters: the newline, the colon, the hash character #, the percent character %, and the backslash \. The hash character and the percent character are only significant when they come immediately after a newline or are the first byte in the file, and the backslash is only significant in conjunction with leading or trailing whitespace. The multiline escape sequence is two percents at the beginning of a line, a sequence which rarely occurs in normal text.

The minimalism of this module is intentional; I will never, ever, add new syntax, extra escape characters, comments anywhere other than the start of a line, nested tables, or multiple tables in one file to this format, and I would gladly remove anything from it, if there was anything that could possibly be removed. The reason is that every new facility adds yet another meaning to some sequence of characters, which not only has to be remembered, but also has to be programmed around by adding yet another escape. Let's say that I added comments like this:

     key: value # this is a comment

then I would have to add yet another escape for the case where I actually wanted to put a hash character inside a value, yet another annoying bit of syntax to remember like

     key: value \# not a comment

The more of these meaningful characters one adds, the more complexity, bugs, workarounds, and fixes there are, the more things there are to remember, and the more headaches. No thanks!

OTHER

In addition to the CPAN Perl distribution, there is a Go implementation and an Emacs mode for this format. All of them are in the same GitHub repository, so you can report issues or make pull requests for any of them in the same place.

Emacs mode

There is an Emacs mode for the format called table-readable-mode.el in the top directory of the CPAN distribution or in the github repository.

This includes highlighting of comments and makes it easier to format paragraphs of multiline text.

At the moment it only handles lower-case alphabetical keys, although the format itself has no such restriction.

Go mode

There is a reader and writer of the format in Go, including tests, in the GitHub repository. I'm not making much use of this code at the moment, since string manipulation in Go is a nuisance compared to Perl, and I've found it's usually easier to convert the tables to JSON in Perl and then read the JSON into a map[string]string in Go.
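A minimal sketch of the Perl side of that conversion, using the core module JSON::PP (included with Perl since 5.14). A literal array stands in for the rows that read_table would return; in real use you would pass its result instead. The canonical option sorts the keys so that the output is stable from run to run.

```perl
use strict;
use warnings;
use JSON::PP;

# Rows shaped like read_table's result: flat hashes of strings.
# A literal stands in for a real read_table call here.
my @rows = (
    { en => 'Residual Current Device', de => 'Fehlerstrom-Schutzschalter' },
);
# utf8: produce UTF-8 encoded bytes; canonical: sort keys; pretty: indent.
my $json = JSON::PP->new->utf8->canonical->pretty->encode (\@rows);
print $json;
```

On the Go side, an array of such objects can be decoded with encoding/json into a []map[string]string.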

DEPENDENCIES

Carp

Carp is used for printing error messages.

EXPORTS

Nothing is exported by default. All functions can be exported on request. A tag ":all" exports all the functions:

    use Table::Readable ':all';

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2010-2021 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.