Playing games with outthentic dsl

Outthentic is a language for parsing unstructured text. It grew out of swat, a web application testing tool. Web applications often return text in an unstructured and unordered way; even though JSON and XML exist, there are many applications whose output is neither.

Later a generic test tool, also named outthentic, was created as a solution for any text parsing or testing task. This tool is based on the outthentic DSL as well.

Creating new consumers of the outthentic language is easy: the API is exposed and explained in the outthentic documentation.

What I try to do in this short post is highlight a few randomly picked features, to give readers a sense of the outthentic way of analyzing and verifying text output, which of course can be used widely in daily testing tasks.

If, after reading this post, you would like to know more, the official outthentic documentation is here and a less formal introduction is here.

Ranges

Sometimes you are given repetitive lines bounded by some conditions.

A classic example is a table.

Imagine a table with two columns, the letters of the alphabet and their position numbers:

Letter  Number
A       1
B       2
C       3
...
Z       26
End of table

Let's write some DSL code to verify that.

First, let's verify the basic structure:

between: Letter\s+Number End\s+of\s+table
    regexp: ([A-Z]+)\s+(\d+)
end:

With this we ask the outthentic DSL parser to check that there are letters and numbers inside the range bounded by the table header and the table footer. Quite easy so far.
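To make the idea of a range check more concrete, here is a minimal sketch in plain Perl of what such a check amounts to. It is not how outthentic is implemented; check_range() is a hypothetical helper written for this post, and the input is a shortened copy of the table above.

```perl
use strict;
use warnings;

# A shortened copy of the table from the post.
my @lines = (
    'Letter  Number',
    'A       1',
    'B       2',
    'C       3',
    'End of table',
);

# check_range() is a hypothetical helper, not part of outthentic.
# It returns the lines that sit between a line matching $start and
# a line matching $end, and that also match $re.
sub check_range {
    my ($start, $end, $re, @input) = @_;
    my $inside = 0;
    my @matched;
    for my $line (@input) {
        if (!$inside) {
            # Wait for the header line that opens the range.
            $inside = 1 if $line =~ $start;
            next;
        }
        if ($line =~ $end) {
            # The footer line closes the range.
            $inside = 0;
            next;
        }
        push @matched, $line if $line =~ $re;
    }
    return \@matched;
}

my $matched = check_range(
    qr/Letter\s+Number/,
    qr/End\s+of\s+table/,
    qr/([A-Z]+)\s+(\d+)/,
    @lines,
);

print scalar(@$matched), " rows matched\n";
```

The DSL version says the same thing in three lines, which is the whole point of having a declarative language for this.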

Next, let's count the table rows.

To do this we need to add some imperative constructs to this otherwise declarative code:

between: Letter\s+Number End\s+of\s+table
    regexp: ([A-Z]+)\s+(\d+)
    code: our $total_rows++ for @{match_lines()};
    validator: [  our $total_rows == 26, 'valid rows number ']
end:

A few comments:

code: defines Perl code that is executed during the parsing process.

match_lines() returns an array reference of successfully matched lines.

validator: defines Perl code that is executed; its return value is passed as arguments to the Test::More::ok function:

# $r is the array reference returned by the validator: code

ok($r->[0],$r->[1])
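The flow from code: to validator: to ok can be tried out in a standalone Perl script. This is only a sketch of the idea: the @matched array is a stand-in I made up for what match_lines() would return, and the script does not use outthentic at all.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Stand-in for the lines match_lines() would return (an assumption
# for this sketch): 26 rows, 'A  1' through 'Z  26'.
my @matched = map { "$_  " . (ord($_) - ord('A') + 1) } 'A' .. 'Z';

# What the code: step does: count the matched rows.
our $total_rows = 0;
$total_rows++ for @matched;

# What the validator: step returns: [ condition, message ].
my $r = [ $total_rows == 26, 'valid rows number' ];

# The pair is then handed over to Test::More::ok.
ok($r->[0], $r->[1]);
```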

Fine control with captures

The captures() function gives you finer control over the data being checked. It returns all the chunks captured by the latest regular expression check.

between: Letter\s+Number End\s+of\s+table
    regexp: ([A-Z]+)\s+(\d+)
    code:                                   \
    for my $c (@{captures()}){              \
        print $c->[0],'/',$c->[1], "\n";    \
    }
end:

The code above will print:

A/1
B/2
C/3
...
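The data structure printed above, a list of [letter, number] pairs, can be reproduced in plain Perl. This is a sketch of the shape of the data only, not of how captures() is implemented; the @rows sample is made up for the illustration.

```perl
use strict;
use warnings;

# Sample rows, made up for this sketch.
my @rows = ('A       1', 'B       2', 'C       3');

# Build one [letter, number] pair per matching row.
my @captures;
for my $row (@rows) {
    if (my @groups = $row =~ /([A-Z]+)\s+(\d+)/) {
        push @captures, \@groups;
    }
}

# Same loop body as in the DSL example above.
for my $c (@captures) {
    print $c->[0], '/', $c->[1], "\n";
}
```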

Sequences and generators

Sequences of consecutive lines are often what needs testing when dealing with unstructured text.

Let's rewrite the latest code example using text block expressions:

begin:
    A 1
    B 2
    C 3
    # and so on till Z 26
end:

This simple code snippet is an example of a continuous sequence check, where you need to verify that one line is followed by another, and so on.

Quite easy, but we need to hardcode all 26 rows, which is not good. Let's rewrite this simple test again using a generator expression:

begin:
    generator: my $i = 0; [ map { $i++; "$_ $i" } 'A' .. 'Z' ]
end:

Generators, like code: and validator: expressions, are just pieces of Perl code that get executed.

The return value of the generator code (it should be an array reference) defines new outthentic entities, which are then parsed by the outthentic parser.
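To see what a generator like this hands back to the parser, the expression can be run in plain Perl. This is a standalone sketch: the counter is declared outside the list so that the array reference holds only the 26 generated entries.

```perl
use strict;
use warnings;

# The counter lives outside the list, so only the generated
# strings end up in the array reference.
my $i = 0;
my $entries = [ map { $i++; "$_ $i" } 'A' .. 'Z' ];

print scalar(@$entries), " entries\n";
print "first: $entries->[0]\n";
print "last:  $entries->[-1]\n";
```

Each entry is then treated as if it had been written by hand inside the begin: ... end: block.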

There is no limit on what a generator can create.

Streams

And finally, a new killer feature of the outthentic DSL: streams.

The stream() function, like match_lines(), returns lines successfully matched during the verification process.

But streams improve on match_lines(): they keep the group context of the matched lines.

Let's look at some trivial text output that we need to verify:

<letters>
    A
    B
    C
</letters>

<letters>
    D
    E
    F
    G
</letters>

<letters>
    H
    I
</letters>

The DSL code:

between: <letters> <\/letters>
regexp: [A-Z]

So far so good. Let's add some debugging lines:

between: <letters> <\/letters>
regexp: [A-Z]
code: for my $l (@{match_lines()}) { print "$l\n" }

We get this, as expected:

A
B
C
D
E
F
G
H
I

What do we see? We lost the group context: all the letters now sit in one heap, with no knowledge of the original groups.

Now with the stream() function:

between: <letters> <\/letters>
regexp: [A-Z]
code:                           \
my $i;                          \
for my $s (@{stream()}) {       \
    $i++;                       \
    print "stream # $i\n";      \
    for my $l (@{$s}){          \
        print "$l\n"            \
    }                           \
}

Output:

stream # 1
A
B
C
stream # 2
D
E
F
G
stream # 3
H
I
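The difference between one flat list and a list of groups is easy to show in plain Perl. The sketch below is not how stream() is implemented; it just builds the same array-of-arrays shape by hand from the sample input, opening a new group at every <letters> line.

```perl
use strict;
use warnings;

# The sample output from above, flattened into a list of lines.
my @lines = (
    '<letters>', 'A', 'B', 'C', '</letters>',
    '<letters>', 'D', 'E', 'F', 'G', '</letters>',
    '<letters>', 'H', 'I', '</letters>',
);

my @streams;    # one array reference per <letters> group
my $current;    # the group being filled, undef outside a range

for my $line (@lines) {
    if ($line =~ /<letters>/) {
        $current = [];              # a new range starts a new group
        next;
    }
    if ($line =~ /<\/letters>/) {
        push @streams, $current if $current;
        undef $current;             # the range is closed
        next;
    }
    push @$current, $line if $current && $line =~ /[A-Z]/;
}

my $i = 0;
for my $s (@streams) {
    $i++;
    print "stream # $i\n";
    print "$_\n" for @$s;
}
```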

-- Regards

Alexey Melezhik