NAME

Tstregex - A diagnostic tool that quickly shows the longest regular expression match within a string, highlighting the rejected part.

The terminal command tstregex '/^[a-z]*\d{3}$/' 'abc12a' shows: abc12a (^[a-z]*\d{3}$)

# Above, the normal-weight text is the longest matching substring, while the bold text highlights the rejected substring
# (likewise for the regexp, whose lexical groups are shown between parentheses)
# A hybrid regex diagnostic tool (single-file library module with an API and a command-line tool)

SYNOPSIS

$ tstregex 'regex' string1 string2 ... stringN

OPTIONS (CLI)

-h --help

Shows this help.

-v --verbose

Shows key information about what matched and what did not.

-d --diag

Triggers the Enriched Diagnostic View. It displays:
- The string with the failing part highlighted.
- The exact token in the regex that caused the break.
- A visual pointer (^--- HERE) aligned with the regex syntax.
- Execution time (useful for spotting ReDoS/Exponential backtracking).
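The last two items of that view can be illustrated with a minimal sketch. It is not Tstregex's own code: `pointer_lines` and `timed_match` are hypothetical helpers, and the failing token's column is simply looked up with `index`. It shows one way to align a `^--- HERE` pointer under a token and to time a match with the core Time::HiRes module.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper (not part of Tstregex): build the two display lines.
# $offset is the 0-based column of the failing token within $regex.
sub pointer_lines {
    my ($regex, $offset) = @_;
    return ($regex, (' ' x $offset) . '^--- HERE');
}

# Hypothetical helper: run one match and report how long it took (seconds).
sub timed_match {
    my ($string, $re) = @_;
    my $t0      = Time::HiRes::time();
    my $matched = ($string =~ $re) ? 1 : 0;
    return ($matched, Time::HiRes::time() - $t0);
}

my $regex = '^[a-z]*\d{3}$';

# Assumption for the demo: the failing token is already known to be \d{3}.
my ($line1, $line2) = pointer_lines($regex, index($regex, '\d{3}'));
print "$line1\n$line2\n";

my ($ok, $elapsed) = timed_match('abc12a', qr/$regex/);
printf "matched=%d in %.6f s\n", $ok, $elapsed;
```

A match time that grows sharply as the input gets slightly longer is the usual symptom of exponential backtracking.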

-a --assert

Misc: runs a large self-test suite, an extensive collection of regexp tests exercised through Tstregex.

Perl Module SYNOPSIS

use Tstregex;

my $ctx = tstregex_init_desc('/^\d{3}/');
tstregex($ctx, '12a');    # runs the diagnostic and updates $ctx

# The accessors are assumed to read the updated context.
if (!tstregex_is_full_match($ctx)) {
    my $token = tstregex_get_fail_token($ctx);
    my $pos   = tstregex_get_match_len($ctx);
    print "Failure on token '$token' at column $pos\n";
}

Note: if you only want to display the result as the command would, you can call the main routine directly
with the appropriate (argc, argv)-style parameters, which avoids spawning another perl process.
See the tstregex file stub for details.

API

tstregex_init_desc($raw_re)

Pre-parses the regex, handles delimiters (m!!, //, etc.), extracts modifiers (i, s, m, x), and prepares the nibbling steps. Returns a context hash.
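As an illustration of the delimiter and modifier handling described above, here is a minimal sketch. It is not the module's implementation: `split_raw_regex`, `compile_desc`, and the hash keys `pattern`, `modifiers`, and `offset` are hypothetical names chosen for this example, and only the `/.../` and `m<delim>...<delim>` forms are recognized.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Tstregex): split a raw regex literal
# such as '/^\d{3}/i' or 'm!^a/b!x' into pattern, modifiers and offset.
sub split_raw_regex {
    my ($raw) = @_;
    my %closer = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');

    my ($open, $start);
    if    ($raw =~ m{\Am([^\w\s])}) { ($open, $start) = ($1, 2) }
    elsif ($raw =~ m{\A/})          { ($open, $start) = ('/', 1) }

    if (defined $open) {
        my $close = exists $closer{$open} ? $closer{$open} : $open;
        my $end   = rindex($raw, $close);      # last closing delimiter
        if ($end >= $start) {
            my $mods = substr($raw, $end + 1);
            if ($mods =~ /\A[ismx]*\z/) {
                return {
                    pattern   => substr($raw, $start, $end - $start),
                    modifiers => $mods,
                    offset    => $start,       # where the pattern begins in $raw
                };
            }
        }
    }
    # No recognizable delimiters: treat the whole string as the pattern.
    return { pattern => $raw, modifiers => '', offset => 0 };
}

# Hypothetical helper: compile the description, applying the modifiers inline.
sub compile_desc {
    my ($d) = @_;
    my $src = "(?$d->{modifiers}:$d->{pattern})";
    return qr/$src/;
}

my $d = split_raw_regex('m!^a/b!i');
print "pattern=$d->{pattern} modifiers=$d->{modifiers} offset=$d->{offset}\n";
print "A/B matches\n" if 'A/B' =~ compile_desc($d);
```

Recording the offset of the pattern inside the raw string is what allows a later display step to point back at a column of the text the user actually typed.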

tstregex($ctx, $string)

Executes the diagnostic. Updates the context.

tstregex_is_full_match

Returns the match status of the input string as a boolean (0 or 1).

tstregex_get_match_portion

Returns the matching portion in case of a full match (it may be shorter than the input string, depending on the anchors).

tstregex_get_match_len

Returns the matching substring length
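To see why the matched portion can be shorter than the input when the regex is not anchored, consider this minimal sketch in plain Perl. It does not call Tstregex: `match_portion` is a hypothetical helper built on Perl's `@-` and `@+` match-offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Tstregex): return the matched portion
# and its length, or an empty list when the regex does not match at all.
sub match_portion {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my $portion = substr($string, $-[0], $+[0] - $-[0]);
    return ($portion, length $portion);
}

# Unanchored regex: only the three digits are part of the match.
my ($portion, $len) = match_portion('ab123cd', qr/\d{3}/);
print "portion='$portion' length=$len\n";    # portion='123' length=3
```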

tstregex_get_fail_token

Returns the failing token in the regexp

tstregex_get_re_clean

Returns the matching regexp subpart

tstregex_get_re_raw

Returns the internal representation of the regexp

tstregex_get_prefix_offset

Returns the offset of the original regexp in the raw regexp

DESCRIPTION

tstregex is designed to solve the "Black Box" problem of Regular Expressions.
When a complex regex fails, Perl simply reports that there is no match, with no hint about where it gave up.
This tool identifies exactly where and why it failed by finding the longest possible partial match.

EXAMPLE

$ perl lib/Tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'
abc123
abc12a (^[a-z]*\d{3}$)

The tool highlights the part of the string where the match failed.

The "Nibbling" Engine

The diagnostic logic uses a "Nibbling" (grignotage) strategy:

1. Decomposition

The engine breaks down your regex into a hierarchy of valid sub-patterns (lexical groups, atoms, and quantifiers) from longest to shortest.

2. Iterative Testing

It iteratively tests these sub-patterns against the input string.

The question is not merely whether the start of the string matches, but what is the longest sequence of regex instructions the engine could follow before hitting a wall.

3. Failure Point Identification

Once the longest matching sub-pattern is found, the tool identifies the very next token in your regex syntax. This is your "Point of Failure".
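The three steps above can be sketched in a few lines of Perl. This is a simplified illustration under stated assumptions, not the engine shipped in Tstregex: `tokenize` and `nibble` are hypothetical names, the regex is cut only at top-level token boundaries (quantifiers are not decomposed further), and top-level alternation and POSIX classes such as `[[:alpha:]]` are not handled.

```perl
use strict;
use warnings;

# Step 1 (sketch): split the regex source into top-level tokens, each one
# an atom (escape, character class, balanced group or single character)
# followed by its optional quantifier.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    while ($src =~ /\G(
            (?: \\.                                   # escape such as \d
              | \[ \^? \]? (?: \\. | [^\]\\] )* \]    # character class
              | ( \( (?: \\. | [^()\\] | (?2) )* \) ) # balanced group
              | .                                     # any other character
            )
            (?: (?: [*+?] | \{ \d+ (?: , \d* )? \} ) [?+]? )?  # quantifier
        )/gcxs) {
        push @tokens, $1;
    }
    return @tokens;
}

# Steps 2 and 3 (sketch): test token prefixes from longest to shortest,
# then report the token that follows the longest matching prefix.
sub nibble {
    my ($src, $string) = @_;
    my @tokens = tokenize($src);
    for (my $n = @tokens; $n > 0; $n--) {
        my $prefix = join '', @tokens[0 .. $n - 1];
        my $re = eval { qr/$prefix/ } or next;    # skip invalid sub-patterns
        if ($string =~ $re) {
            return {
                full_match => ($n == @tokens ? 1 : 0),
                sub_regex  => $prefix,
                matched    => substr($string, $-[0], $+[0] - $-[0]),
                fail_token => ($n < @tokens ? $tokens[$n] : undef),
            };
        }
    }
    # Not even the first token matched.
    return { full_match => 0, sub_regex => '', matched => '',
             fail_token => $tokens[0] };
}

my $r = nibble('^[a-z]*\d{3}$', 'abc12a');
print "longest matching sub-pattern: $r->{sub_regex}\n";   # ^[a-z]*
print "matched text: $r->{matched}\n";                     # abc
print "point of failure: $r->{fail_token}\n";              # \d{3}
```

Because this sketch treats `\d{3}` as a single token, it reports `abc` as the matched text; a finer decomposition that also splits quantifiers could credit the two digits that did match.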

AUTHOR

Olivier Delouya - 2026

LICENSE

Artistic License 2.0