
NAME

Catmandu::Fix - a Catmandu class used for data transformations

SYNOPSIS

# From the command line

$ catmandu convert JSON --fix 'add(foo,bar)' < data.json
$ catmandu convert YAML --fix 'upcase(job) remove(test)' < data.yml
$ catmandu convert CSV  --fix 'sort(tags)' < data.csv
$ catmandu run /tmp/myfixes.txt
$ catmandu convert OAI --url http://biblio.ugent.be/oai --fix /tmp/myfixes.txt

# With preprocessing
$ catmandu convert JSON --var field=foo --fix 'add({{field}},bar)' < data.json

# From Perl

use Catmandu;

my $fixer = Catmandu->fixer('upcase(job)','remove(test)');
my $fixer = Catmandu->fixer('/tmp/myfixes.txt');

# Convert data
my $arr      = $fixer->fix([ ... ]);
my $hash     = $fixer->fix({ ... });
my $importer = Catmandu->importer('YAML', file => 'data.yml');
my $fixed_importer = $fixer->fix($importer);

# With preprocessing
my $fixer = Catmandu::Fix->new(
    variables => {x => 'foo', y => 'bar'},
    fixes => ['add({{x}},{{y}})'],
);

# Inline fixes
use Catmandu::Fix::upcase as => 'my_upcase';
use Catmandu::Fix::remove as => 'my_remove';

my $hash = { 'job' => 'librarian' , deep => { nested => '1'} };

my_upcase($hash,'job');
my_remove($hash,'deep.nested');

DESCRIPTION

A Catmandu::Fix is a Perl package that can transform data. These packages are designed to make data manipulation easy, even for non-programmers. Fixes are mainly intended to be used on the command line or in Fix scripts: a small domain-specific language (DSL) is available to execute many Fix commands on a stream of data.

When a fix argument is given to a Catmandu::Importer, Catmandu::Exporter or Catmandu::Store, the transformations are executed on every item in the stream.

FIX LANGUAGE

A Fix script is a collection of one or more Fix commands. The fixes are executed on every record in the dataset. If this command is executed on the command line:

$ catmandu convert JSON --fix 'upcase(title); add(deep.nested.field,1)' < data.json

then all the title fields will be upcased and a new deeply nested field will be added:

{ "title":"foo" }
{ "title":"bar" }

becomes:

{ "title":"FOO" , "deep":{"nested":{"field":1}} }
{ "title":"BAR" , "deep":{"nested":{"field":1}} }

On the command line, Fix commands are typically separated by a semicolon (;). The same commands can also be written into a Fix script, where semicolons are not required:

$ catmandu convert JSON --fix script.fix < data.json

where script.fix contains:

upcase(title)
add(deep.nested.field,1)

Conditionals control when fixes are executed:

if exists(error)
    set(valid, 0)
end

if exists(error)
    set(is_valid, 0)
elsif exists(warning)
    set(is_valid, 1)
    log(...)
else
    set(is_valid, 1)
end

unless all_match(title, "PERL")
    add(is_perl, "noooo")
end

exists(error) and set(is_valid, 0)
exists(error) && set(is_valid, 0)

exists(title) or log('title missing')
exists(title) || log('title missing')
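For readers who know Perl, the conditionals above behave like ordinary Perl control flow applied to each record. The following plain-Perl analogy (no Catmandu needed) mirrors the if/elsif/else example and the or shorthand. apply_validity is a hypothetical name, warn stands in for the log Fix, and this is not how Catmandu actually compiles Fix scripts.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl analogy of the Fix conditionals above.
# Each record is a hash reference, like a record in a Fix stream.
sub apply_validity {
    my ($rec) = @_;

    # if exists(error) ... elsif exists(warning) ... else ... end
    # (exists(error) and set(is_valid, 0) is shorthand for the first branch)
    if (exists $rec->{error}) {
        $rec->{is_valid} = 0;
    }
    elsif (exists $rec->{warning}) {
        $rec->{is_valid} = 1;    # the Fix example also calls log(...) here
    }
    else {
        $rec->{is_valid} = 1;
    }

    # exists(title) or log('title missing'); warn stands in for log
    exists $rec->{title} or warn "title missing\n";

    return $rec;
}

my @records = map { apply_validity($_) } (
    { title => 'a', error   => 'oops' },
    { title => 'b', warning => 'hmm'  },
    { title => 'c' },
);
```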

Binds change the context in which Fixes are executed, for example to execute a fix on every item in a list:

# 'demo' is an array of hashes
bind list(path:demo)
   add_field(foo,bar)
end
# do is an alias for bind
do list(path:demo)
   add_field(foo,bar)
end
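As a rough plain-Perl analogy, the effect of the bind list(path:demo) example on a single record is a loop over the array, with each item acting as the context for add_field. The record below is made-up sample data, and the loop is an illustration rather than the code Catmandu runs.

```perl
use strict;
use warnings;

# Made-up sample record: 'demo' is an array of hashes.
my $record = { demo => [ { id => 1 }, { id => 2 } ] };

# bind list(path:demo) add_field(foo,bar) end
# runs add_field(foo,bar) once per item, with that item as the context.
for my $item (@{ $record->{demo} }) {
    $item->{foo} = 'bar';
}
```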

To delete records from a stream of data, use the reject Fix:

reject()           # Reject all records in the stream

if exists(foo)
    reject()       # Reject records that contain a 'foo' field
end

reject exists(foo) # Reject records that contain a 'foo' field

The opposite of reject is select:

select()           # Keep all records in the stream

select exists(foo) # Keep only the records that contain a 'foo' field
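The filtering Fixes behave much like Perl's grep over a list of records. The sketch below is a plain-Perl analogy with made-up records, not Catmandu's implementation; note that a key holding a false value such as 0 still exists.

```perl
use strict;
use warnings;

# Made-up stream of records.
my @stream = ( { foo => 1, id => 'a' }, { id => 'b' }, { foo => 0, id => 'c' } );

# reject exists(foo): drop every record that has a 'foo' key
my @after_reject = grep { !exists $_->{foo} } @stream;

# select exists(foo): keep only the records that have a 'foo' key
my @after_select = grep { exists $_->{foo} } @stream;
```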

Comments in Fix scripts are all lines (or parts of a line) that start with a hash (#):

# This is ignored
add(test,123)      # This is also a comment

You can load fixes from another namespace with the use statement:

# this will look for fixes in the Foo::Bar namespace and make them
# available prefixed by fb
use(foo.bar, as: fb)
fb.baz()

# this will look for Foo::Bar::Condition::is_baz
if fb.is_baz()
   ...
   fix()
   ...
end

FIX COMMANDS, ARGUMENTS AND OPTIONS

Fix commands manipulate data or, in some cases, perform side effects. Fix commands have zero or more arguments and zero or more options. Fix command arguments are separated by commas ",". Fix options are name/value pairs separated by a colon ":".

# A command with zero arguments
my_command()

# A command with multiple arguments
my_other_command(foo,bar,test)

# A command with arguments and options
my_special_command(foo,bar,color:blue,size:12)

All command arguments are treated as strings. These strings can be Fix paths pointing to values, or string literals. When arguments don't contain any of the special characters comma ",", equals "=", greater than ">" or colon ":", they can be written as-is. Otherwise, the arguments need to be quoted with single or double quotes:

# Both commands below have the same effect
my_other_command(foo,bar,test)
my_other_command("foo","bar","test")

# Illegal syntax
my_special_command(foo,http://test.org,color:blue,size:12) # <- syntax error

# Correct syntax
my_special_command(foo,"http://test.org",color:blue,size:12)

# Or, alternative
my_special_command("foo","http://test.org",color:"blue",size:12)
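When Fix commands are generated from Perl, the quoting rule above can be applied mechanically. The helper below is a hypothetical sketch, not part of Catmandu: it quotes any argument that is not made up only of word characters, ".", "-", "$" or "*", which is stricter than the rule above, and it refuses values containing a double quote or backslash because it does not attempt to escape them.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Fix command string from Perl values.
# Options are emitted in sorted key order so the output is stable.
sub fix_call {
    my ($name, $args, $opts) = @_;
    my $quote = sub {
        my ($v) = @_;
        die "cannot quote: $v\n" if $v =~ /["\\]/;
        return $v =~ /^[\w.\-\$*]+$/ ? $v : qq{"$v"};
    };
    my @parts = map { $quote->($_) } @{ $args || [] };
    push @parts, map { "$_:" . $quote->($opts->{$_}) } sort keys %{ $opts || {} };
    return $name . '(' . join(',', @parts) . ')';
}

my $cmd = fix_call('my_special_command', ['foo', 'http://test.org'],
                   { color => 'blue', size => 12 });
```

Here $cmd holds the same string as the "Correct syntax" example above, and it can be passed to Catmandu->fixer like any other fix string.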

FIX PATHS

Most Fix commands use paths to point to values in a data record. For example, 'foo.2.bar' points to the key 'bar' in the third item (index 2) of the array stored under the key 'foo'. Likewise, "foo.''" points to the empty-string key inside the hash stored under the key 'foo'.

A special case is when you want to point to all items in an array. In this case the wildcard '*' can be used. E.g. 'foo.*' points to all the items in the 'foo' array.
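To make the path semantics concrete, here is a plain-Perl sketch of resolving a dotted path against nested data. get_path is a hypothetical helper written for illustration, not Catmandu's path implementation; it only handles hash keys, numeric (including negative) array indices and the '*' wildcard.

```perl
use strict;
use warnings;

# Hypothetical helper (not Catmandu's implementation): resolve a dotted
# path such as 'foo.2.bar' or 'foo.*.bar' against nested data and return
# every matching value. Simplified: no quoting, no $first/$last/$append,
# and keys that contain dots are not supported.
sub get_path {
    my ($data, $path) = @_;
    my @current = ($data);
    for my $part (split /\./, $path) {
        my @next;
        for my $node (@current) {
            if (ref $node eq 'HASH') {
                # hash step: follow the key if it exists
                push @next, $node->{$part} if exists $node->{$part};
            }
            elsif (ref $node eq 'ARRAY') {
                my $n = @$node;
                if ($part eq '*') {
                    push @next, @$node;            # wildcard: every item
                }
                elsif ($part =~ /^-?\d+$/ && $part < $n && $part >= -$n) {
                    push @next, $node->[$part];    # single item by index
                }
            }
        }
        @current = @next;
    }
    return @current;
}

my $record = { foo => [ { bar => 'x' }, { bar => 'y' }, { bar => 'z' } ] };
my ($third) = get_path($record, 'foo.2.bar');   # 'z'
my @all     = get_path($record, 'foo.*.bar');   # ('x', 'y', 'z')
```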

For array values there are special wildcards available:

* $append   - Add a new item at the end of an array
* $prepend  - Add a new item at the start of an array
* $first    - Syntactic sugar for index '0' (the head of the array)
* $last     - Syntactic sugar for index '-1' (the tail of the array)

E.g.

# Create { mods => { titleInfo => [ { 'title' => 'a title' }] } };
add('mods.titleInfo.$append.title', 'a title');

# Create { mods => { titleInfo => [ { 'title' => 'a title' } , { 'title' => 'another title' }] } };
add('mods.titleInfo.$append.title', 'another title');

# Create { mods => { titleInfo => [ { 'title' => 'foo' } , { 'title' => 'another title' }] } };
add('mods.titleInfo.$first.title', 'foo');

# Create { mods => { titleInfo => [ { 'title' => 'foo' } , { 'title' => 'bar' }] } };
add('mods.titleInfo.$last.title', 'bar');
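In plain Perl terms, the four add calls above roughly correspond to the following array operations on a single record. This is hand-written Perl to illustrate the meaning of $append, $first and $last, not the code Catmandu generates internally.

```perl
use strict;
use warnings;

my $rec = {};

# add('mods.titleInfo.$append.title', 'a title'):
# creates the intermediate hash and array, then pushes a new item
push @{ $rec->{mods}{titleInfo} }, { title => 'a title' };

# add('mods.titleInfo.$append.title', 'another title')
push @{ $rec->{mods}{titleInfo} }, { title => 'another title' };

# add('mods.titleInfo.$first.title', 'foo'): $first is index 0
$rec->{mods}{titleInfo}[0]{title} = 'foo';

# add('mods.titleInfo.$last.title', 'bar'): $last is index -1
$rec->{mods}{titleInfo}[-1]{title} = 'bar';

# $prepend would correspond to unshift, e.g.
# unshift @{ $rec->{mods}{titleInfo} }, { title => 'new first title' };
```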

Some Fix commands implement an alternative path syntax to point to values. See for example Catmandu::MARC and Catmandu::PICA:

# Copy the MARC 245a field to the my.title field
marc_map(245a,my.title)

OPTIONS

fixes

An array of fixes. Catmandu::Fix will execute every fix in consecutive order. A fix can be a Fix command such as 'upcase(job)' (implemented by a Catmandu::Fix::* routine), or the path to a plain-text file containing the fixes to be executed. Required.

preprocess

If set to 1, fix files or inline fixes will first be preprocessed as a Mustache template. See variables below for an example. Default is 0, no preprocessing.

variables

An optional hashref of variables that are used to preprocess the fix files or inline fixes as a Mustache template. Setting the variables option also sets preprocess to 1.

my $fixer = Catmandu::Fix->new(
    variables => {x => 'foo', y => 'bar'},
    fixes => ['add({{x}},{{y}})'],
);
my $data = {};
$fixer->fix($data);
# $data is now {foo => 'bar'}
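Conceptually, preprocessing replaces each {{name}} placeholder with the matching entry from variables before the Fix script is parsed. The sketch below approximates this with a single regex for plain {{name}} tags only; a full Mustache template engine supports more tag types, so treat it as an illustration. render_fix is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical approximation of the preprocessing step: replace each
# {{name}} tag with the value of that variable. Unknown names become ''.
# No sections, partials or escaping are handled.
sub render_fix {
    my ($template, $vars) = @_;
    (my $out = $template) =~ s/\{\{\s*(\w+)\s*\}\}/exists $vars->{$1} ? $vars->{$1} : ''/ge;
    return $out;
}

my $fix = render_fix('add({{x}},{{y}})', { x => 'foo', y => 'bar' });
# $fix is now 'add(foo,bar)', the Fix command that is then executed
```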

METHODS

fix(HASH)

Execute all the fixes on a HASH. Returns the fixed HASH.

fix(ARRAY)

Execute all the fixes on every element in the ARRAY. Returns an ARRAY of fixed elements.

fix(Catmandu::Iterator)

Execute all the fixes on every item in a Catmandu::Iterator. Returns a (lazy) iterator over the fixed items.

fix(sub {})

Executes all the fixes on a generator function. Returns a new generator with fixed data.
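In Catmandu, a generator is a code reference that returns the next item each time it is called and undef once the stream is exhausted. The plain-Perl sketch below mimics the idea behind fix(sub {}): it wraps one generator in another that applies a fix to every item, lazily. make_generator and fixed_generator are hypothetical helpers and $upcase_job stands in for a compiled upcase(job) Fix; this is not Catmandu's own code.

```perl
use strict;
use warnings;

# Hypothetical: turn a list of records into a generator (a code ref that
# returns the next record on each call and undef when exhausted).
sub make_generator {
    my @items = @_;
    return sub { return shift @items };
}

# Hypothetical: wrap a generator so that $fix is applied to every item.
# Nothing is fixed until the returned generator is called.
sub fixed_generator {
    my ($gen, $fix) = @_;
    return sub {
        my $item = $gen->();
        return undef unless defined $item;    # propagate end of stream
        return $fix->($item);
    };
}

# Stand-in for a compiled upcase(job) fix.
my $upcase_job = sub { my $r = shift; $r->{job} = uc $r->{job}; $r };

my $gen = fixed_generator(make_generator({ job => 'librarian' }, { job => 'coder' }),
                          $upcase_job);
my @jobs;
while (defined(my $rec = $gen->())) {
    push @jobs, $rec->{job};
}
```

With Catmandu itself, the equivalent is passing the generator to $fixer->fix, as described under fix(sub {}).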

log

Returns the current logger. See Catmandu for how to activate the logger in your main code.

CODING

You can extend the Fix language by writing your own fixes. There are three ways to create a new fix function:

* Simplest: create a class that implements a fix method.
* For most use cases: create a class that consumes the Catmandu::Fix::Builder role and use Catmandu::Path to build your fixer.
* Hardest: create a class that emits Perl code that will be evaluated by the Fix module.

The simplest method is explained below.

Quick and easy

A Fix function is a Perl class in the Catmandu::Fix namespace that implements a fix method. The fix method accepts a Perl hash reference as input and returns the (fixed) hash reference as output. As an example, the code below implements the meow Fix, which inserts a 'meow' field with the value 'purrrrr'.

package Catmandu::Fix::meow;

use Moo;

sub fix {
    my ($self,$data) = @_;
    $data->{meow} = 'purrrrr';
    $data;
}

1;

Given this Perl class, the following fix statement can be used in your application:

# Will add 'meow' = 'purrrrr' to the data
meow()

Use the quick and easy method when your fixes do not depend on reading or writing data at a Fix path. Your Perl classes need to implement their own logic to read or write data in the given Perl hash.

Fix arguments are passed as arguments to the new method (the constructor) of the Perl class, as in:

# In the fix file...
meow('test123', count: 4)

# ...will be translated into this pseudo code
my $fix = Catmandu::Fix::meow->new('test123', count: 4);

With Moo, these arguments can be captured using the Catmandu::Fix::Has package:

package Catmandu::Fix::meow;

use Catmandu::Sane;
use Moo;
use Catmandu::Fix::Has;

has msg   => (fix_arg => 1); # required parameter 1
has count => (fix_opt => 1, default => sub { 4 }); # optional parameter 'count' with default value 4

sub fix {
    my ($self,$data) = @_;
    $data->{meow} = $self->msg x $self->count;
    $data;
}

1;

Using this code the fix statement can be used like:

# Will add 'meow' = 'purrpurrpurrpurr'
meow('purr', count: 4)

To allow the fix to be used as an inline function in Perl code, consume the Catmandu::Fix::Inlineable role:

with 'Catmandu::Fix::Inlineable';

SEE ALSO

Catmandu::Fixable, Catmandu::Importer, Catmandu::Exporter, Catmandu::Store, Catmandu::Bag