NAME

Chorus::Collection::Filter - Pattern matching on ordered sequences of Chorus::Frame objects

VERSION

This module is part of Chorus::Engine 1.05.

SYNOPSIS

use Chorus::Frame;
use Chorus::Collection::Filter qw($FILTER @_VFILTER);

# Build a token sequence (e.g. from Chorus::Collection::List)
# Each $token is a Chorus::Frame with a 'categorie' slot.

my $f = Chorus::Frame->new(_ISA => $FILTER);

# Tell the filter how to extract a comparable value from a Frame
$f->set_node_test(sub {
    my ($frame) = @_;
    return $frame->categorie;
});

# Compile a pattern
$f->set_filter('^DET NOM (ADJ+) !PONCT*$');

# Test a sequence
if ($f->check(@tokens)) {
    my ($adjectives) = @_VFILTER;   # captured group (ADJ+)
}

DESCRIPTION

Chorus::Collection::Filter provides $FILTER, a Chorus::Frame prototype for testing whether an ordered sequence of Frames matches a pattern.

The pattern language is inspired by regular expressions but operates on sequences of discrete tokens rather than characters. Each position in the pattern is matched against the result of a user-supplied node test function (set with "set_node_test") that extracts a comparable value from each Frame.

Captured groups (enclosed in parentheses) are collected in the exported array @_VFILTER after a successful "check" call, in the same way $1, $2, etc., work with Perl regular expressions.

EXPORTS

Nothing is exported by default. The following symbols are available on request:

use Chorus::Collection::Filter qw($FILTER @_VFILTER);

$FILTER

The Frame prototype. Use _ISA => $FILTER to create filter instances. $FILTER itself inherits from "$LIST" in Chorus::Collection::List, because a compiled pattern is stored internally as a linked list of node Frames.

@_VFILTER

Global array of captured groups. Each element is an arrayref containing the Frames matched by the corresponding capture group in the last successful "check" call.

# pattern: '^DET NOM (ADJ+) !PONCT*$'
if ($f->check(@tokens)) {
    my ($adjs) = @_VFILTER;   # arrayref of ADJ Frames
}

Note: @_VFILTER is reset at the start of every "check" call. Copy its contents immediately after the call if you need them to survive later check invocations.
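The reset behaviour described in the note can be illustrated without the module installed. The sketch below is only a stand-in: fake_check is a hypothetical function that mimics the documented reset, and the local @_VFILTER replaces the exported array. It assumes that a plain (shallow) array copy is enough, i.e. that the arrayrefs handed out by one call are not modified by later calls.

```perl
use strict;
use warnings;

our @_VFILTER;                       # stand-in for the array exported by the module

# Hypothetical stand-in for $f->check: clears the global, then "captures"
# every ADJ string as a single group.
sub fake_check {
    my @tokens = @_;
    @_VFILTER = ();                  # documented behaviour: reset on every call
    my @adjs = grep { $_ eq 'ADJ' } @tokens;
    return unless @adjs;             # failure: nothing captured
    @_VFILTER = (\@adjs);
    return 1;
}

my @kept;
@kept = @_VFILTER if fake_check(qw(DET NOM ADJ ADJ));   # copy right after the call

fake_check(qw(DET NOM));                                # fails and clears @_VFILTER

# prints "0 live group(s), 2 kept ADJ token(s)"
print scalar(@_VFILTER), " live group(s), ", scalar(@{ $kept[0] }), " kept ADJ token(s)\n";
```

With the real module the same idiom would be a copy such as my @kept = @_VFILTER; placed directly inside the if ($f->check(...)) block.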

CONSTANTS

Match-mode constants

ANYTHING    # matches any token (equivalent to . in regexp)
IS          # token must be in the node's token set  (default)
IS_NOT      # token must NOT be in the node's token set (prefix !)

Count-mode constants

EXACTLY_ONE   # exactly one occurrence  (default, no quantifier)
ZERO_OR_MORE  # zero or more occurrences  (quantifier *)
ONE_OR_MORE   # one or more occurrences   (quantifier +)
INTERVALLE    # between min and max occurrences  (quantifier {m,n})

COUNT_MAX_LIMIT

Maximum number of occurrences considered for * and + quantifiers. Currently set to 100.

METHODS

All methods are slots on the $FILTER prototype and can be called on any Frame that inherits from $FILTER.

set_node_test

$f->set_node_test( \&sub )

Installs the function used to extract a comparable token value from a Frame during pattern matching. The function receives a single Frame argument and should return a scalar (string, number) or an arrayref of strings for multi-valued tokens.

$f->set_node_test(sub {
    my ($frame) = @_;
    return $frame->categorie;        # e.g. 'NOM', 'ADJ', 'VRB'
});

The default node test is the identity function (returns the Frame itself). Always call set_node_test before "check" unless you intentionally compare Frame references.
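To make the two return shapes (scalar or arrayref) concrete, the following stand-alone sketch shows one way a node test result could be compared against a node's token set under the three match modes listed in CONSTANTS. It is not the module's code: value_matches is a hypothetical helper, the modes are plain strings standing in for the constants, and the rule that a multi-valued token counts as "in the set" when any one of its values is in the set is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chorus::Collection::Filter.
#   $mode   : 'ANYTHING', 'IS' or 'IS_NOT' (strings standing in for the constants)
#   $value  : what a node test returned (scalar, or arrayref of strings)
#   $tokens : arrayref holding the node's token set, e.g. ['NOM', 'ADJ']
sub value_matches {
    my ($mode, $value, $tokens) = @_;
    return 1 if $mode eq 'ANYTHING';

    my %in_set = map { $_ => 1 } @$tokens;
    my @values = ref($value) eq 'ARRAY' ? @$value : ($value);

    # Assumption: a multi-valued token is "in" the set if any value is.
    my $hit = grep { $in_set{$_} } @values;

    return $mode eq 'IS' ? ($hit ? 1 : 0) : ($hit ? 0 : 1);
}

print value_matches('IS',       'NOM',          ['NOM', 'ADJ']), "\n";  # prints 1
print value_matches('IS_NOT',   'PONCT',        ['PONCT']),      "\n";  # prints 0
print value_matches('IS',       ['NOM', 'VRB'], ['VRB']),        "\n";  # prints 1
print value_matches('ANYTHING', 'whatever',     []),             "\n";  # prints 1
```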

set_filter

$f->set_filter( $pattern_string )

Compiles $pattern_string into an internal linked list of node Frames and stores it as the current pattern. Resets any previously compiled pattern.

$f->set_filter('^DET NOM (ADJ+) !PONCT*$');

See "PATTERN SYNTAX" for a description of the pattern language.

check

$f->check( @frames )

Tests the sequence @frames against the compiled pattern. Returns true (1) on success, or undef on failure.

On success, @_VFILTER is populated with one arrayref per capture group (same order as the parentheses in the pattern).

if ($f->check(@tokens)) {
    my ($group1, $group2) = @_VFILTER;
}

Note: @_VFILTER is reset to () on every call, including failed ones.

PATTERN SYNTAX

A pattern is a space-separated string of node descriptors, optionally bounded by anchors.

Anchors

^    Start-of-sequence anchor.  The pattern must match from the first token.
$    End-of-sequence anchor.  The pattern must match through the last token.

Token descriptors

X         Matches exactly the token X  (IS mode).
!X        Matches any token that is NOT X  (IS_NOT mode).
.         Matches any single token  (ANYTHING mode).
[A B C]   Matches any token that is A, B, or C  (OR group).

Quantifiers

Quantifiers follow a token descriptor immediately (no space):

X+        One or more occurrences of X.
X*        Zero or more occurrences of X  (greedy).
X?        Zero or one occurrence of X  (lazy / short match).
X{m,n}    Between m and n occurrences of X.
X{n}      Exactly n occurrences of X.
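The descriptor and quantifier tables above can be read as a mapping from one descriptor string to a match mode, a token set and a pair of occurrence bounds. The sketch below spells that mapping out in plain Perl. It is an illustration under assumptions, not the module's compiler: parse_descriptor and the hash keys are hypothetical names, token names are assumed to consist of word characters, the local COUNT_MAX_LIMIT simply copies the value 100 quoted in CONSTANTS, and only the bounds are recorded (the greedy or lazy behaviour noted in the table is not modelled).

```perl
use strict;
use warnings;

use constant COUNT_MAX_LIMIT => 100;    # local stand-in for the documented limit

# Hypothetical parser for ONE node descriptor, e.g. '!PONCT*' or '[NOM ADJ]{1,3}'.
# Returns a hashref (mode, tokens, min, max), or nothing if the text is not recognised.
sub parse_descriptor {
    my ($text) = @_;
    my ($neg, $body, $quant) =
        $text =~ /^(!?)(\.|\[[^\]]+\]|\w+)([+*?]|\{\d+(?:,\d+)?\})?$/
        or return;

    my @tokens = $body eq '.'          ? ()
               : $body =~ /^\[(.*)\]$/ ? split(' ', $1)
               :                         ($body);

    my ($min, $max) = (1, 1);           # no quantifier: exactly one occurrence
    $quant //= '';
    if    ($quant eq '+') { ($min, $max) = (1, COUNT_MAX_LIMIT) }
    elsif ($quant eq '*') { ($min, $max) = (0, COUNT_MAX_LIMIT) }
    elsif ($quant eq '?') { ($min, $max) = (0, 1) }
    elsif ($quant =~ /^\{(\d+)(?:,(\d+))?\}$/) {
        ($min, $max) = ($1, defined($2) ? $2 : $1);   # {n} means {n,n}
    }

    my %node = (
        mode   => $body eq '.' ? 'ANYTHING' : $neg ? 'IS_NOT' : 'IS',
        tokens => \@tokens,
        min    => $min,
        max    => $max,
    );
    return \%node;
}

for my $d ('NOM', '!PONCT*', 'ADJ+', 'PREP{0,1}', '[NOM ADJ]{2}', '.') {
    my $n = parse_descriptor($d);
    printf "%-14s mode=%-8s tokens=[%s] min=%d max=%d\n",
        $d, $n->{mode}, join(' ', @{ $n->{tokens} }), $n->{min}, $n->{max};
}
```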

Capture groups

Parentheses delimit a capture group. The Frames matched by the group are collected as an arrayref in @_VFILTER:

(ADJ+)         captures one or more ADJ Frames → $_VFILTER[0]
(PREP{0,1})    captures zero or one PREP Frame  → $_VFILTER[1]

Examples

'NOM ADJ'                    # NOM followed by ADJ anywhere in the sequence
'^DET NOM$'                  # exactly DET then NOM, full sequence
'^NOM (ADJ+) !PONCT*$'       # NOM, one-or-more ADJ (captured), optional non-PONCT tail
'[NOM ADJ]+ VRB'             # one or more NOM-or-ADJ, then VRB
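Because the pattern language borrows its notation from regular expressions, the examples above can be explored with an ordinary Perl regex run over a string of category names. The sketch below is an analogy only and does not use or reproduce the module: pattern_to_regex is a hypothetical helper, token names are assumed to contain only word characters, every token in the encoded string is followed by exactly one space, and all quantifiers are passed through with Perl's default greedy behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a token pattern into a Perl regex that runs
# over a string in which every token is followed by one space ("DET NOM ADJ ").
sub pattern_to_regex {
    my ($pattern) = @_;
    my $re = '(?:^|(?<= ))';          # a match may only start on a token boundary

    # Scan anchors, parentheses and node descriptors from left to right.
    while ($pattern =~ /\G\s*(\^|\$|\(|\)|(!?)(\.|\[[^\]]+\]|\w+)([+*?]|\{\d+(?:,\d+)?\})?)/gc) {
        my ($piece, $neg, $body, $quant) = ($1, $2, $3, $4);

        if (!defined $body) {         # anchor or parenthesis: same meaning in a regex
            $re .= $piece;
            next;
        }

        my $atom = $body eq '.'          ? '\S+ '
                 : $body =~ /^\[(.*)\]$/ ? '(?:' . join('|', map { quotemeta($_) } split(' ', $1)) . ') '
                 :                         quotemeta($body) . ' ';
        $atom = "(?!$atom)\\S+ " if $neg;          # !X: any one token that is not X
        $re  .= "(?:$atom)" . ($quant // '');
    }
    die "unparsed pattern text at offset " . (pos($pattern) // 0) . "\n"
        unless $pattern =~ /\G\s*\z/gc;

    return qr/$re/;
}

my @cats = qw(DET NOM ADJ ADJ);               # what a node test might return per Frame
my $seq  = join '', map { "$_ " } @cats;      # "DET NOM ADJ ADJ "

my $re = pattern_to_regex('^DET NOM (ADJ+) !PONCT*$');
if ($seq =~ $re) {
    my @adjs = split ' ', $1;
    print "captured: @adjs\n";                # prints "captured: ADJ ADJ"
}
```

The real filter returns arrayrefs of Frames in @_VFILTER rather than substrings, so this sketch is useful for experimenting with pattern shapes, not as a replacement for check.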

SEE ALSO

Chorus::Frame, Chorus::Collection::List, Chorus::Engine

AUTHOR

Christophe Ivorra

BUGS

Please report bugs via http://rt.cpan.org/NoAuth/Bugs.html?Dist=Chorus.

LICENSE AND COPYRIGHT

Copyright (C) 2013-2026 Christophe Ivorra.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.