NAME
Chorus::Collection::Filter - Pattern matching on ordered sequences of Chorus::Frame objects
VERSION
This module is part of Chorus::Engine 1.05.
SYNOPSIS
use Chorus::Frame;
use Chorus::Collection::Filter qw($FILTER @_VFILTER);
# Build a token sequence (e.g. from Chorus::Collection::List)
# Each $token is a Chorus::Frame with a 'categorie' slot.
my $f = Chorus::Frame->new(_ISA => $FILTER);
# Tell the filter how to extract a comparable value from a Frame
$f->set_node_test(sub {
my ($frame) = @_;
return $frame->categorie;
});
# Compile a pattern
$f->set_filter('^DET NOM (ADJ+) !PONCT*$');
# Test a sequence
if ($f->check(@tokens)) {
my ($adjectives) = @_VFILTER; # captured group (ADJ+)
}
DESCRIPTION
Chorus::Collection::Filter provides $FILTER, a Chorus::Frame prototype for testing whether an ordered sequence of Frames matches a pattern.
The pattern language is inspired by regular expressions but operates on sequences of discrete tokens rather than characters. Each position in the pattern is matched against the result of a user-supplied node test function (set with "set_node_test") that extracts a comparable value from each Frame.
Captured groups (enclosed in parentheses) are collected in the exported array @_VFILTER after a successful "check" call, in the same way $1, $2, etc., work with Perl regular expressions.
EXPORTS
Nothing is exported by default. The following symbols are available on request:
use Chorus::Collection::Filter qw($FILTER @_VFILTER);
$FILTER
The Frame prototype. Use _ISA => $FILTER to create filter instances. $FILTER itself inherits from "$LIST" in Chorus::Collection::List: a compiled pattern is stored internally as a linked list of node Frames.
@_VFILTER
Global array of captured groups. Each element is an arrayref containing the Frames matched by the corresponding capture group in the last successful "check" call.
# pattern: '^DET NOM (ADJ+) !PONCT*$'
if ($f->check(@tokens)) {
my ($adjs) = @_VFILTER; # arrayref of ADJ Frames
}
Note: @_VFILTER is reset at the start of every "check" call. Capture the value immediately after the call if you need to keep it across further check invocations.
CONSTANTS
Match-mode constants
ANYTHING # matches any token (equivalent to . in regexp)
IS # token must be in the node's token set (default)
IS_NOT # token must NOT be in the node's token set (prefix !)
Count-mode constants
EXACTLY_ONE # exactly one occurrence (default, no quantifier)
ZERO_OR_MORE # zero or more occurrences (quantifier *)
ONE_OR_MORE # one or more occurrences (quantifier +)
INTERVALLE # between min and max occurrences (quantifier {m,n})
COUNT_MAX_LIMIT
Maximum number of occurrences considered for * and + quantifiers. Currently set to 100.
METHODS
All methods are slots on the $FILTER prototype and are called on any Frame that inherits from $FILTER.
set_node_test
$f->set_node_test( \&sub )
Installs the function used to extract a comparable token value from a Frame during pattern matching. The function receives a single Frame argument and should return a scalar (string, number) or an arrayref of strings for multi-valued tokens.
$f->set_node_test(sub {
my ($frame) = @_;
return $frame->categorie; # e.g. 'NOM', 'ADJ', 'VRB'
});
The default node test is the identity function (returns the Frame itself). Always call set_node_test before "check" unless you intentionally compare Frame references.
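The two return shapes described above (a scalar, or an arrayref for multi-valued tokens) can be sketched as follows. This is a minimal illustration, not part of the module: My::Token is a hypothetical stand-in for a Chorus::Frame with a 'categorie' slot, used only so the snippet runs without Chorus installed. Either sub could then be passed to set_node_test.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Chorus::Frame exposing a 'categorie' slot.
package My::Token {
    sub new       { my ($class, %slots) = @_; return bless {%slots}, $class }
    sub categorie { return $_[0]{categorie} }
}

# Single-valued node test: returns the category as stored.
my $by_category = sub {
    my ($frame) = @_;
    return $frame->categorie;
};

# Multi-valued node test: always returns an arrayref of candidate categories,
# wrapping a plain scalar when the token is unambiguous.
my $by_categories = sub {
    my ($frame) = @_;
    my $cat = $frame->categorie;
    return ref($cat) eq 'ARRAY' ? $cat : [$cat];
};

my $noun      = My::Token->new(categorie => 'NOM');
my $ambiguous = My::Token->new(categorie => ['NOM', 'ADJ']);

print $by_category->($noun), "\n";                          # NOM
print join('|', @{ $by_categories->($ambiguous) }), "\n";   # NOM|ADJ
```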
set_filter
$f->set_filter( $pattern_string )
Compiles $pattern_string into an internal linked list of node Frames and stores it as the current pattern. Resets any previously compiled pattern.
$f->set_filter('^DET NOM (ADJ+) !PONCT*$');
See "PATTERN SYNTAX" for a description of the pattern language.
check
$f->check( @frames )
Tests the sequence @frames against the compiled pattern. Returns true (1) on success, or undef on failure.
On success, @_VFILTER is populated with one arrayref per capture group (same order as the parentheses in the pattern).
if ($f->check(@tokens)) {
my ($group1, $group2) = @_VFILTER;
}
Note: @_VFILTER is reset to () on every call, including failed ones.
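Because the capture array is shared and cleared on every call, results that must outlive the next call need to be copied first. The sketch below shows one way to do that. It does not use Chorus: the local @_VFILTER and fake_check() are stand-ins that mimic only the reset-then-fill behaviour documented above, so that the snippet is runnable on its own.

```perl
use strict;
use warnings;

# Stand-ins (assumptions for illustration): a package array playing the role
# of the exported @_VFILTER, and a function imitating check()'s reset rule.
our @_VFILTER;

sub fake_check {
    my (@groups) = @_;
    @_VFILTER = ();            # reset on every call, as documented
    return unless @groups;     # simulate a failed match
    @_VFILTER = @groups;
    return 1;
}

my @first;
if (fake_check([qw(ADJ1 ADJ2)])) {
    # Copy each group into a fresh arrayref before any further call.
    @first = map { [@$_] } @_VFILTER;
}

fake_check();                  # a later, failing call empties @_VFILTER

print scalar(@_VFILTER), "\n"; # 0
print "@{ $first[0] }\n";      # ADJ1 ADJ2
```

Copying each inner arrayref, rather than only the outer list, is the cautious choice here, since the documentation does not say whether the group arrays are reused between calls.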
PATTERN SYNTAX
A pattern is a space-separated string of node descriptors, optionally bounded by anchors.
Anchors
^ Start-of-sequence anchor. The pattern must match from the first token.
$ End-of-sequence anchor. The pattern must match through the last token.
Token descriptors
X Matches exactly the token X (IS mode).
!X Matches any token that is NOT X (IS_NOT mode).
. Matches any single token (ANYTHING mode).
[A B C] Matches any token that is A, B, or C (OR group).
Quantifiers
Quantifiers follow a token descriptor immediately (no space):
X+ One or more occurrences of X.
X* Zero or more occurrences of X (greedy).
X? Zero or one occurrence of X (lazy / short match).
X{m,n} Between m and n occurrences of X.
X{n} Exactly n occurrences of X.
Capture groups
Parentheses delimit a capture group. The Frames matched by the group are collected as an arrayref in @_VFILTER:
(ADJ+) captures one or more ADJ Frames → $_VFILTER[0]
(PREP{0,1}) captures zero or one PREP Frame → $_VFILTER[1]
Examples
'NOM ADJ' # NOM followed by ADJ anywhere in the sequence
'^DET NOM$' # exactly DET then NOM, full sequence
'^NOM (ADJ+) !PONCT*$' # NOM, one or more ADJ (captured), then an optional non-PONCT tail
'[NOM ADJ]+ VRB' # one or more NOM-or-ADJ, then VRB
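For readers who think in ordinary regular expressions, the pattern '^DET NOM (ADJ+) !PONCT*$' can be approximated by joining the category names into a string and matching it with a plain Perl regex. This is only an analogy for the semantics of anchors, negation, quantifiers and captures. It is not how Chorus::Collection::Filter is implemented, and it yields category strings rather than Frames.

```perl
use strict;
use warnings;

# Category values as a node test might return them, joined with spaces.
my @categories = qw(DET NOM ADJ ADJ VRB);
my $sequence   = join ' ', @categories;

# Rough regex counterpart of '^DET NOM (ADJ+) !PONCT*$':
#   (?: ADJ)+             one or more ADJ tokens, captured as a group
#   (?: (?!PONCT\b)\S+)*  zero or more tokens that are not PONCT
my $re = qr/^DET NOM((?: ADJ)+)(?: (?!PONCT\b)\S+)*$/;

my @adjectives;
if ($sequence =~ $re) {
    @adjectives = split ' ', $1;   # the captured ADJ run, one item per token
}

# A trailing PONCT token violates the !PONCT* tail, so this one is rejected.
my $rejected = ('DET NOM ADJ PONCT' =~ $re) ? 0 : 1;

print scalar(@adjectives), "\n";   # 2
print "$rejected\n";               # 1
```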
SEE ALSO
Chorus::Frame, Chorus::Collection::List, Chorus::Engine
AUTHOR
Christophe Ivorra
BUGS
Please report bugs via http://rt.cpan.org/NoAuth/Bugs.html?Dist=Chorus.
LICENSE AND COPYRIGHT
Copyright (C) 2013-2026 Christophe Ivorra.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.