NAME

Regexp::Common::Apache2 - Apache2 Expressions

SYNOPSIS

    use Regexp::Common qw( Apache2 );
    use Regexp::Common::Apache2 qw( $ap_true $ap_false );

    while( <> )
    {
        my $pos = pos( $_ );
        /\G$RE{Apache2}{Word}/gmc      and  print "Found a word expression at pos $pos\n";
        /\G$RE{Apache2}{Variable}/gmc  and  print "Found a variable $+{varname} at pos $pos\n";
    }

    # Override Apache2 expressions by the legacy ones
    $RE{Apache2}{-legacy => 1};
    # or use it with the Legacy prefix:
    if( $str =~ /^$RE{Apache2}{LegacyVariable}$/ )
    {
        print( "Found variable $+{variable} with name $+{varname}\n" );
    }

VERSION

    v0.1.1

DESCRIPTION

This is the Perl port of Apache2 expressions.

The regular expressions are based on the Apache2 Backus-Naur Form (BNF) definition, as described below in "APACHE2 EXPRESSION".

You can also use the extended pattern by calling Regexp::Common::Apache2 like:

    $RE{Apache2}{-legacy => 1}

All of the regular expressions use named capture. See "%+" in perlvar for more information on named capture.
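
As a minimal illustration of how named captures are read back through %+, the sketch below uses a deliberately simplified, hand-written pattern as a stand-in for $RE{Apache2}{Variable}. Only the capture names variable and varname come from this document. The pattern itself and the helper parse_variable are assumptions made for the example, not the module's own regular expression or API.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $RE{Apache2}{Variable}:
# it only recognises the "%{" varname "}" form.
my $variable_re = qr/(?<variable>%\{(?<varname>[A-Za-z_]\w*)\})/;

# Hypothetical helper: returns the named captures as a hash reference,
# or undef (in scalar context) when the string does not match.
sub parse_variable
{
    my $str = shift;
    return unless $str =~ /\A$variable_re\z/;
    # %+ is overwritten by the next successful match, so copy it now
    return +{ %+ };
}

my $caps = parse_variable( '%{REQUEST_URI}' );
print "Found variable $caps->{variable} with name $caps->{varname}\n" if $caps;
```

Copying %+ into a plain hash right after the match keeps the values safe from later pattern matches in the same scope.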

APACHE2 EXPRESSION

comp

BNF:

    stringcomp
    | integercomp
    | unaryop word
    | word binaryop word
    | word "in" listfunc
    | word "=~" regex
    | word "!~" regex
    | word "in" "{" words "}"

    $RE{Apache2}{Comp}

For example:

    "Jack" != "John"
    123 -ne 456
    # etc

This uses other expressions, namely "stringcomp", "integercomp", "word", "listfunc", "regex" and "words".

The capture names are:

comp

Contains the entire capture block

comp_binary

Matches the expression that uses a binary operator, such as:

    ==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_binaryop

The binary operator used if the expression is a binary comparison. It is one of:

    ==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_integercomp

When the comparison is for an integer comparison as opposed to a string comparison.

comp_list

Contains the list used to check a word against, such as:

    "Jack" in {"John", "Peter", "Jack"}

comp_listfunc

Contains the listfunc when the expression contains a word checked against a list function, such as:

    "Jack" in listMe("some arguments")

comp_regexp

The regular expression used when a word is compared to a regular expression, such as:

    "Jack" =~ /\w+/

Here, comp_regexp would contain /\w+/

comp_regexp_op

The regular expression operator used when a word is compared to a regular expression, such as:

    "Jack" =~ /\w+/

Here, comp_regexp_op would contain =~

comp_stringcomp

When the comparison is for a string comparison as opposed to an integer comparison.

comp_unary

Matches the expression that uses a unary operator, such as:

    -d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R

comp_word

Contains the word that is the object of the comparison.

comp_word_in_list

Contains the expression of a word checked against a list, such as:

    "Jack" in {"John", "Peter", "Jack"}

comp_word_in_listfunc

Contains the word when it is being compared to a listfunc, such as:

    "Jack" in listMe("some arguments")

comp_word_in_regexp

Contains the expression of a word checked against a regular expression, such as:

    "Jack" =~ /\w+/

Here the word Jack (without the parenthesis) would be captured in comp_word

comp_worda

Contains the first word in the comparison expression

comp_wordb

Contains the second word in the comparison expression

cond

BNF:

    "true"
    | "false"
    | "!" cond
    | cond "&&" cond
    | cond "||" cond
    | comp
    | "(" cond ")"

    $RE{Apache2}{Cond}

For example:

    use Regexp::Common::Apache2 qw( $ap_true $ap_false );
    ($ap_false && $ap_true)

The capture names are:

cond

Contains the entire capture block

cond_and

Contains the expression like:

    ($ap_true && $ap_true)

cond_false

Contains the false expression like:

    ($ap_false)

cond_neg

Contains the expression if it is preceded by an exclamation mark, such as:

    !$ap_true

cond_or

Contains the expression like:

    ($ap_true || $ap_true)

cond_true

Contains the true expression like:

    ($ap_true)

expr

BNF: cond | string

    $RE{Apache2}{Expr}

The capture names are:

expr

Contains the entire capture block

expr_cond

Contains the expression of the condition

expr_string

Contains the expression of a string

function

BNF: funcname "(" words ")"

    $RE{Apache2}{Function}

For example:

    base64("Some string")

The capture names are:

function

Contains the entire capture block

function_args

Contains the list of arguments. In the example above, this would be Some string

function_name

The name of the function. In the example above, this would be base64

integercomp

BNF:

    word "-eq" word | word "eq" word
    | word "-ne" word | word "ne" word
    | word "-lt" word | word "lt" word
    | word "-le" word | word "le" word
    | word "-gt" word | word "gt" word
    | word "-ge" word | word "ge" word

    $RE{Apache2}{IntegerComp}

For example:

    123 -ne 456
    789 gt 234
    # etc

The hyphen before the operator is optional, so you can say eq instead of -eq

The capture names are:

integercomp

Contains the entire capture block

integercomp_op

Contains the comparison operator

integercomp_worda

Contains the first word in the integer comparison

integercomp_wordb

Contains the second word in the integer comparison
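
The optional hyphen in front of the operator can be sketched with a small hand-written pattern. This is only a simplified stand-in for $RE{Apache2}{IntegerComp}: it restricts each word to plain digits, and the helper parse_integercomp is hypothetical. Only the capture names are taken from the list above.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $RE{Apache2}{IntegerComp}:
# here a "word" is restricted to plain digits.
my $integercomp_re = qr/
    (?<integercomp>
        (?<integercomp_worda>\d+)
        \s+
        (?<integercomp_op>-?(?:eq|ne|lt|le|gt|ge))
        \s+
        (?<integercomp_wordb>\d+)
    )
/x;

# Hypothetical helper: returns (worda, op, wordb) on success,
# or an empty list when the string is not an integer comparison.
sub parse_integercomp
{
    my $str = shift;
    return unless $str =~ /\A$integercomp_re\z/;
    return( $+{integercomp_worda}, $+{integercomp_op}, $+{integercomp_wordb} );
}

for my $expr ( '123 -ne 456', '789 gt 234' )
{
    my( $left, $op, $right ) = parse_integercomp( $expr ) or next;
    print "left=$left op=$op right=$right\n";
}
```

In this sketch the hyphen, when present, is kept as part of the operator capture. Whether the module does the same is not stated in this document.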

listfunc

BNF: listfuncname "(" words ")"

    $RE{Apache2}{Function}

For example:

    base64("Some string")

This is quite similar to the "function" regular expression

The capture names are:

listfunc

Contains the entire capture block

listfunc_args

Contains the list of arguments. In the example above, this would be Some string

listfunc_name

The name of the function. In the example above, this would be base64

regex

BNF:

    "/" regpattern "/" [regflags]
    | "m" regsep regpattern regsep [regflags]

    $RE{Apache2}{Regex}

For example:

    /\w+/i
    # or
    m,\w+,i

The capture names are:

regex

Contains the entire capture block

regflags

The regular expression modifiers. See perlre

This can be any combination of:

    i, s, m, g

regpattern

Contains the regular expression. See perlre for examples and an explanation of how to use regular expressions. Apache2 uses PCRE, i.e. Perl Compatible Regular Expressions.

regsep

Contains the regular expression separator, which can be any of:

    /, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
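
To make the two forms of the BNF concrete, here is a minimal sketch that accepts either "/" regpattern "/" or "m" followed by a separator, and that requires the closing separator to be the same character as the opening one. The pattern is a hand-written approximation, not the module's $RE{Apache2}{Regex}, and parse_regex is a hypothetical helper. The capture names and the separator list follow this section.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $RE{Apache2}{Regex}.
# The character class holds the separators listed above, and the
# \k<regsep> backreference forces the closing separator to match.
my $regex_re = qr{
    (?<regex>
        (?:
            /(?<regpattern>[^/]+)/
            |
            m(?<regsep>[/\#\$%^|?!'",;:._-])(?<regpattern>(?:(?!\k<regsep>).)+)\k<regsep>
        )
        (?<regflags>[ismg]+)?
    )
}x;

# Hypothetical helper: returns the named captures as a hash reference,
# or undef (in scalar context) when the string does not match.
sub parse_regex
{
    my $str = shift;
    return unless $str =~ /\A$regex_re\z/;
    return +{ %+ };
}

for my $src ( '/\w+/i', 'm,\w+,i' )
{
    my $caps = parse_regex( $src ) or next;
    printf "pattern=%s flags=%s\n", $caps->{regpattern}, $caps->{regflags} // '';
}
```

This sketch does not handle an escaped separator inside the pattern, which a complete implementation would need to consider.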

string

BNF: substring | string substring

    $RE{Apache2}{String}

For example:

    URI accessed is: %{REQUEST_URI}

The capture names are:

string

Contains the entire capture block

stringcomp

BNF:

    word "==" word
    | word "!=" word
    | word "<"  word
    | word "<=" word
    | word ">"  word
    | word ">=" word

    $RE{Apache2}{StringComp}

For example:

    "John" == "Jack"
    sub(s/\w+/Jack/i, "John") != "Jack"
    # etc

The capture names are:

stringcomp

Contains the entire capture block

stringcomp_op

Contains the comparison operator

stringcomp_worda

Contains the first word in the string comparison

stringcomp_wordb

Contains the second word in the string comparison

substring

BNF: cstring | variable

    $RE{Apache2}{Substring}

For example:

    Jack
    # or
    %{REQUEST_URI}

See "variable" and "word" regular expression for more on those.

The capture names are:

substring

Contains the entire capture block

variable

BNF:

    "%{" varname "}"
    | "%{" funcname ":" funcargs "}"

    $RE{Apache2}{Variable}
    # or
    $RE{Apache2}{LegacyVariable}

For example:

    %{REQUEST_URI}
    # or
    %{md5:"some string"}

See "word" and "cond" regular expression for more on those.

The capture names are:

variable

Contains the entire capture block

var_cond

If this is a condition inside a variable, such as:

    %{:$ap_true == $ap_false}

var_func_args

Contains the function arguments.

var_func_name

Contains the function name.

var_word

A variable containing a word. See "word" for more information about word expressions.

varname

Contains the variable name without the percent sign or dollar sign (if the legacy regular expression is enabled) and without any surrounding curly braces

word

BNF:

    digits
    | "'" string "'"
    | '"' string '"'
    | word "." word
    | variable
    | function
    | "(" word ")"
    | rebackref

    $RE{Apache2}{Word}

This is the most complex regular expression used, since it uses all the others and can recurse deeply

For example:

    12
    # or
    "John"
    # or
    'Jack'
    # or
    %{REQUEST_URI}
    # or
    %{HTTP_HOST}.%{HTTP_PORT}
    # or
    md5("some string")
    # or any word surrounded by parenthesis, such as:
    ("John")

See "string", "word", "variable", "sub", "join", "function" regular expression for more on those.
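
The recursion mentioned above can be illustrated with Perl's named recursion, (?&name). The sketch below is a much reduced, hand-written grammar, not the module's $RE{Apache2}{Word}: it covers only digits, quoted strings, the simple "%{" varname "}" form, parenthesised words and dot concatenation, and is_word is a hypothetical helper. The left-recursive rule word "." word is rewritten as an optional trailing part so the pattern cannot loop forever.

```perl
use strict;
use warnings;

# Hypothetical, reduced stand-in for $RE{Apache2}{Word}.
my $word_re = qr/
    (?<word>
        (?:
            \d+                       # digits
            | "[^"]*"                 # double-quoted string
            | '[^']*'                 # single-quoted string
            | %\{\w+\}                # simple variable form only
            | \( \s* (?&word) \s* \)  # parenthesised word, recursing into word
        )
        (?: \. (?&word) )?            # optional dot followed by another word
    )
/x;

# Hypothetical helper: true when the whole string is a word
# according to the reduced grammar above.
sub is_word
{
    my $str = shift;
    return $str =~ /\A$word_re\z/ ? 1 : 0;
}

for my $candidate ( '12', '"John"', '%{HTTP_HOST}.%{HTTP_PORT}', '(("John"))', '(("John")' )
{
    printf "%-28s %s\n", $candidate, is_word( $candidate ) ? 'word' : 'not a word';
}
```

Because the parenthesised branch calls the word rule again, nesting of any depth is accepted as long as the parentheses balance.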

The capture names are:

word

Contains the entire capture block

word_digits

If the word is actually digits, this contains those digits.

word_dot_word

This contains the text when two words are separated by a dot.

word_enclosed

Contains the value of the word enclosed by single or double quotes or by surrounding parenthesis.

word_function

Contains the word containing a "function"

word_quote

If the word is enclosed by single or double quote, this contains the single or double quote character

word_variable

Contains the word containing a "variable"

words

BNF:

    word
    | word "," word

    $RE{Apache2}{Words}

For example:

    "Jack"
    # or
    "John", "Peter", "Paul"

See "word" and "list" regular expression for more on those.

The capture names are:

words

Contains the entire capture block

words_word

Contains the word

words_list

Contains the list

ADVANCED APACHE2 EXPRESSION

comp

BNF:

    stringcomp
    | integercomp
    | unaryop word
    | word binaryop word
    | word "in" listfunc
    | word "=~" regex
    | word "!~" regex
    | word "in" "{" list "}"

    $RE{Apache2}{TrunkComp}

For example:

    "Jack" != "John"
    123 -ne 456
    # etc

This uses other expressions, namely "stringcomp", "integercomp", "word", "listfunc", "regex" and "list".

This is similar to the regular "comp" in "APACHE2 EXPRESSION", except it uses "list" instead of "words"

The capture names are:

comp

Contains the entire capture block

comp_binary

Matches the expression that uses a binary operator, such as:

    ==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_binaryop

The binary operator used if the expression is a binary comparison. It is one of:

    ==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_integercomp

When the comparison is for an integer comparison as opposed to a string comparison.

comp_list

Contains the list used to check a word against, such as:

    "Jack" in {"John", "Peter", "Jack"}

comp_listfunc

Contains the listfunc when the expression contains a word checked against a list function, such as:

    "Jack" in listMe("some arguments")

comp_regexp

The regular expression used when a word is compared to a regular expression, such as:

    "Jack" =~ /\w+/

Here, comp_regexp would contain /\w+/

comp_regexp_op

The regular expression operator used when a word is compared to a regular expression, such as:

    "Jack" =~ /\w+/

Here, comp_regexp_op would contain =~

comp_stringcomp

When the comparison is for a string comparison as opposed to an integer comparison.

comp_unary

Matches the expression that uses a unary operator, such as:

    -d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R

comp_word

Contains the word that is the object of the comparison.

comp_word_in_list

Contains the expression of a word checked against a list, such as:

    "Jack" in {"John", "Peter", "Jack"}

comp_word_in_listfunc

Contains the word when it is being compared to a listfunc, such as:

    "Jack" in listMe("some arguments")

comp_word_in_regexp

Contains the expression of a word checked against a regular expression, such as:

    "Jack" =~ /\w+/

Here the word Jack (without the parenthesis) would be captured in comp_word

comp_worda

Contains the first word in the comparison expression

comp_wordb

Contains the second word in the comparison expression

cond

BNF:

    "true"
    | "false"
    | "!" cond
    | cond "&&" cond
    | cond "||" cond
    | comp
    | "(" cond ")"

    $RE{Apache2}{TrunkCond}

Same as "cond" in "APACHE2 EXPRESSION"

expr

BNF: cond | string

    $RE{Apache2}{TrunkExpr}

Same as "expr" in "APACHE2 EXPRESSION"

function

BNF: funcname "(" words ")"

    $RE{Apache2}{TrunkFunction}

Same as "function" in "APACHE2 EXPRESSION"

integercomp

BNF:

    word "-eq" word | word "eq" word
    | word "-ne" word | word "ne" word
    | word "-lt" word | word "lt" word
    | word "-le" word | word "le" word
    | word "-gt" word | word "gt" word
    | word "-ge" word | word "ge" word

    $RE{Apache2}{TrunkIntegerComp}

Same as "integercomp" in "APACHE2 EXPRESSION"

join

BNF:

    "join" ["("] list [")"]
    | "join" ["("] list "," word [")"]

    $RE{Apache2}{TrunkJoin}

For example:

    join({"word1" "word2"})
    # or
    join({"word1" "word2"}, ', ')

This uses "list" and "word"

The capture names are:

join

Contains the entire capture block

join_list

Contains the value of the list

join_word

Contains the value for word used to join the list

list

BNF:

    split
    | listfunc
    | "{" words "}"
    | "(" list ")"

    $RE{Apache2}{TrunkList}

For example:

    split( /\w+/, "Some string" )
    # or
    {"some", "words"}
    # or
    (split( /\w+/, "Some string" ))
    # or
    ( {"some", "words"} )

This uses "split", "listfunc", "words" and "list"

The capture names are:

list

Contains the entire capture block

list_func

Contains the value if a "listfunc" is used

list_list

Contains the value if this is a list embedded within parenthesis

list_split

Contains the value if the list is based on a split

list_words

Contains the value for a list of words.

listfunc

BNF: listfuncname "(" words ")"

    $RE{Apache2}{TrunkFunction}

Same as "listfunc" in "APACHE2 EXPRESSION"

regany

BNF: regex | regsub

    $RE{Apache2}{TrunkRegany}

For example:

    /\w+/i
    # or
    m,\w+,i

This regular expression includes "regex" and "regsub"

The capture names are:

regany

Contains the entire capture block

regany_regex

Contains the regular expression. See "regex"

regany_regsub

Contains the substitution regular expression. See "regsub"

regex

BNF:

    "/" regpattern "/" [regflags]
    | "m" regsep regpattern regsep [regflags]

    $RE{Apache2}{TrunkRegex}

Same as "regex" in "APACHE2 EXPRESSION"

regsub

BNF: "s" regsep regpattern regsep string regsep [regflags]

    $RE{Apache2}{TrunkRegsub}

For example:

    s/\w+/John/gi

The capture names are:

regflags

The modifiers used which can be any combination of:

    i, s, m, g

See perlre for an explanation of their usage and meaning

regstring

The string replacing the text found by the regular expression

regsub

Contains the entire capture block

regpattern

Contains the regular expression, which is Perl compatible since Apache2 uses PCRE.

regsep

Contains the regular expression separator, which can be any of:

    /, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -

split

BNF:

    "split" ["("] regany "," list [")"]
    | "split" ["("] regany "," word [")"]

    $RE{Apache2}{TrunkSplit}

For example:

    split( /\w+/, "Some string" )

This uses "regany", "list" and "word"

The capture names are:

split

Contains the entire capture block

split_regex

Contains the regular expression used for the split

split_list

The list being split. It can also be a word. See below

split_word

The word being split. It can also be a list. See above

string

BNF: substring | string substring

    $RE{Apache2}{TrunkString}

Same as "string" in "APACHE2 EXPRESSION"

stringcomp

BNF:

    word "==" word
    | word "!=" word
    | word "<"  word
    | word "<=" word
    | word ">"  word
    | word ">=" word

    $RE{Apache2}{TrunkStringComp}

Same as "stringcomp" in "APACHE2 EXPRESSION"

sub

BNF: "sub" ["("] regsub "," word [")"]

    $RE{Apache2}{TrunkSub}

For example:

    sub(s/\w/John/gi,"Peter")

The capture names are:

sub

Contains the entire capture block

sub_regsub

Contains the substitution expression, i.e. in the example above, this would be:

    s/\w/John/gi

sub_word

The target for the substitution. In the example above, this would be "Peter"

substring

BNF: cstring | variable

    $RE{Apache2}{TrunkSubstring}

For example:

    Jack
    # or
    %{REQUEST_URI}
    # or
    %{:sub(s/\b\w+\b/Peter/, "John"):}

See "variable" and "word" regular expression for more on those.

This is different from "substring" in "APACHE2 EXPRESSION" in that it does not include regular expression back reference, i.e. $1, $2, etc.

The capture names are:

substring

Contains the entire capture block

variable

BNF:

    "%{" varname "}"
    | "%{" funcname ":" funcargs "}"
    | "%{:" word ":}"
    | "%{:" cond ":}"
    | rebackref

    $RE{Apache2}{TrunkVariable}

For example:

    %{REQUEST_URI}
    # or
    %{md5:"some string"}
    # or
    %{:sub(s/\b\w+\b/Peter/, "John"):}
    # or a reference to previous regular expression capture groups
    $1, $2, etc..

See "word" and "cond" regular expression for more on those.

The capture names are:

variable

Contains the entire capture block

var_cond

If this is a condition inside a variable, such as:

    %{:$ap_true == $ap_false}

var_func_args

Contains the function arguments.

var_func_name

Contains the function name.

var_word

A variable containing a word. See "word" for more information about word expressions.

varname

Contains the variable name without the percent sign or dollar sign (if the legacy regular expression is enabled) and without any surrounding curly braces

word

BNF:

    digits
    | "'" string "'"
    | '"' string '"'
    | word "." word
    | variable
    | sub
    | join
    | function
    | "(" word ")"

    $RE{Apache2}{TrunkWord}

This is the most complex regular expression used, since it uses all the others and can recurse deeply

For example:

    12
    # or
    "John"
    # or
    'Jack'
    # or
    %{REQUEST_URI}
    # or
    %{HTTP_HOST}.%{HTTP_PORT}
    # or
    %{:sub(s/\b\w+\b/Peter/, "John"):}
    # or
    sub(s,\w+,Paul,gi, "John")
    # or
    join({"Paul", "Peter"}, ', ')
    # or
    md5("some string")
    # or any word surrounded by parenthesis, such as:
    ("John")

See "string", "word", "variable", "sub", "join", "function" regular expression for more on those.

The capture names are:

word

Contains the entire capture block

word_digits

If the word is actually digits, this contains those digits.

word_dot_word

This contains the text when two words are separated by a dot.

word_enclosed

Contains the value of the word enclosed by single or double quotes or by surrounding parenthesis.

word_function

Contains the word containing a "function"

word_join

Contains the word containing a "join"

word_quote

If the word is enclosed by single or double quote, this contains the single or double quote character

word_sub

If the word is a substitution, this contains the substitution

word_variable

Contains the word containing a "variable"

words

BNF:

    word
    | word "," list

    $RE{Apache2}{TrunkWords}

For example:

    "Jack"
    # or
    "John", {"Peter", "Paul"}
    # or
    sub(s/\b\w+\b/Peter/, "John"), {"Peter", "Paul"}

See "word" and "list" regular expression for more on those.

It is different from "words" in "APACHE2 EXPRESSION" in that it uses "list" instead of "word"

The capture names are:

words

Contains the entire capture block

words_word

Contains the word

words_list

Contains the list

LEGACY

There are 2 expressions that can be used as legacy:

comp

See "comp"

variable

See "variable"

CHANGES & CONTRIBUTIONS

Feel free to reach out to the author for possible corrections, improvements, or suggestions.

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

https://httpd.apache.org/docs/current/expr.html and https://httpd.apache.org/docs/trunk/en/expr.html

COPYRIGHT & LICENSE

Copyright (c) 2020 DEGUEST Pte. Ltd.

You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.