- NAME
- SYNOPSIS
- VERSION
- DESCRIPTION
- APACHE2 EXPRESSION
- ADVANCED APACHE2 EXPRESSION
- LEGACY
- CHANGES & CONTRIBUTIONS
- AUTHOR
- SEE ALSO
- COPYRIGHT & LICENSE
NAME
Regexp::Common::Apache2 - Apache2 Expressions
SYNOPSIS
use Regexp::Common qw( Apache2 );
use Regexp::Common::Apache2 qw( $ap_true $ap_false );
while( <> )
{
my $pos = pos( $_ );
/\G$RE{Apache2}{Word}/gmc and print "Found a word expression at pos $pos\n";
/\G$RE{Apache2}{Variable}/gmc and print "Found a variable $+{varname} at pos $pos\n";
}
# Override Apache2 expressions by the legacy ones
$RE{Apache2}{-legacy => 1}
# or use it with the Legacy prefix:
if( $str =~ /^$RE{Apache2}{LegacyVariable}$/ )
{
print( "Found variable $+{variable} with name $+{varname}\n" );
}
VERSION
v0.1.1
DESCRIPTION
This is the Perl port of Apache2 expressions.
The regular expressions are based on the Apache2 Backus-Naur Form (BNF) definitions described below in "APACHE2 EXPRESSION".
You can also use the extended pattern by calling Regexp::Common::Apache2 like:
$RE{Apache2}{-legacy => 1}
All of the regular expressions use named capture. See "%+" in perlvar for more information on named capture.
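As a reminder of how named captures are read back, here is a minimal, self-contained sketch. The pattern is a hypothetical stand-in written for illustration and is not one of the module's patterns; only the %+ mechanism is the point.

```perl
use strict;
use warnings;

# Hypothetical stand-in pattern with two named groups. The module's own
# patterns expose their captures the same way, under the names listed
# in the sections below.
my $re = qr/(?<key>\w+)=(?<value>\w+)/;

my %seen;
if( 'colour=blue' =~ $re )
{
    # %+ maps each group name to the text it captured in the last successful match
    %seen = %+;
    print( "$_ => $seen{$_}\n" ) for( sort( keys( %seen ) ) );
}
```

Copying %+ into a regular hash right after the match keeps the values safe from being overwritten by a later successful match.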
APACHE2 EXPRESSION
comp
BNF:
stringcomp
| integercomp
| unaryop word
| word binaryop word
| word "in" listfunc
| word "=~" regex
| word "!~" regex
| word "in" "{" words "}"
$RE{Apache2}{Comp}
For example:
"Jack" != "John"
123 -ne 456
# etc
This uses other expressions namely "stringcomp", "integercomp", "word", "listfunc", "regex", "words"
The capture names are:
- comp
-
Contains the entire capture block
- comp_binary
-
Matches the expression that uses a binary operator, such as:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_binaryop
-
The binary op used if the expression is a binary comparison. Binary operator is:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_integercomp
-
When the comparison is for an integer comparison as opposed to a string comparison.
- comp_list
-
Contains the list used to check a word against, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_listfunc
-
This contains the listfunc when the expression contains a word checked against a list function, such as:
"Jack" in listMe("some arguments")
- comp_regexp
-
The regular expression used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp would contain
/\w+/
- comp_regexp_op
-
The regular expression operator used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp_op would contain
=~
- comp_stringcomp
-
When the comparison is for a string comparison as opposed to an integer comparison.
- comp_unary
-
Matches the expression that uses a unary operator, such as:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
- comp_word
-
Contains the word that is the object of the comparison.
- comp_word_in_list
-
Contains the expression of a word checked against a list, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_word_in_listfunc
-
Contains the word when it is being compared to a listfunc, such as:
"Jack" in listMe("some arguments")
- comp_word_in_regexp
-
Contains the expression of a word checked against a regular expression, such as:
"Jack" =~ /\w+/
Here the word
Jack
(without the parenthesis) would be captured in comp_word
- comp_worda
-
Contains the first word in comparison expression
- comp_wordb
-
Contains the second word in comparison expression
cond
BNF:
"true"
| "false"
| "!" cond
| cond "&&" cond
| cond "||" cond
| comp
| "(" cond ")"
$RE{Apache2}{Cond}
For example:
use Regexp::Common::Apache2 qw( $ap_true $ap_false );
($ap_false && $ap_true)
The capture names are:
- cond
-
Contains the entire capture block
- cond_and
-
Contains the expression like:
($ap_true && $ap_true)
- cond_false
-
Contains the false expression like:
($ap_false)
- cond_neg
-
Contains the expression if it is preceded by an exclamation mark, such as:
!$ap_true
- cond_or
-
Contains the expression like:
($ap_true || $ap_true)
- cond_true
-
Contains the true expression like:
($ap_true)
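The "cond" rule is recursive, which a plain Perl pattern can express with named subpattern calls. The sketch below is a simplified stand-in, not the module's pattern: it assumes only the literals true and false as operands (the "comp" alternative is left out), and the left-recursive BNF is rewritten as "atom (operator atom)*" so the regex engine does not loop.

```perl
use strict;
use warnings;

# Simplified stand-in for the "cond" rule, for illustration only.
my $cond = qr/
    (?<cond>
        (?&atom) (?: \s* (?: && | \|\| ) \s* (?&atom) )*
    )
    (?(DEFINE)
        (?<atom>
            ! \s* (?&atom)                 # "!" cond
          | \( \s* (?&cond) \s* \)         # "(" cond ")"
          | \b (?: true | false ) \b       # "true" or "false"
        )
    )
/x;

my %ok;
for my $src ( '(false && true)', '!true || (false && !false)', '(true' )
{
    # Anchor at both ends so that a partial match does not count
    $ok{ $src } = ( $src =~ /\A$cond\z/ ) ? 1 : 0;
    print( "$src => ", ( $ok{ $src } ? "matches\n" : "does not match\n" ) );
}
```

The last input is deliberately unbalanced to show that the recursion rejects a missing closing parenthesis.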
expr
BNF: cond | string
$RE{Apache2}{Expr}
The capture names are:
- expr
-
Contains the entire capture block
- expr_cond
-
Contains the expression of the condition
- expr_string
-
Contains the expression of a string
function
BNF: funcname "(" words ")"
$RE{Apache2}{Function}
For example:
base64("Some string")
The capture names are:
- function
-
Contains the entire capture block
- function_args
-
Contains the list of arguments. In the example above, this would be
Some string
- function_name
-
The name of the function. In the example above, this would be
base64
integercomp
BNF:
word "-eq" word | word "eq" word
| word "-ne" word | word "ne" word
| word "-lt" word | word "lt" word
| word "-le" word | word "le" word
| word "-gt" word | word "gt" word
| word "-ge" word | word "ge" word
$RE{Apache2}{IntegerComp}
For example:
123 -ne 456
789 gt 234
# etc
The hyphen before the operator is optional, so you can say eq instead of -eq
The capture names are:
- integercomp
-
Contains the entire capture block
- integercomp_op
-
Contains the comparison operator
- integercomp_worda
-
Contains the first word in the integer comparison
- integercomp_wordb
-
Contains the second word in the integer comparison
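To make the optional hyphen and the operator and operand captures concrete, here is a simplified stand-in pattern. It is an assumption-laden sketch rather than the module's own regular expression: operands are restricted to digits, whereas the real "word" rule accepts much more.

```perl
use strict;
use warnings;

# Simplified stand-in for the "integercomp" rule, for illustration only.
my $intcomp = qr/
    (?<integercomp_worda> \d+ )
    \s+
    (?<integercomp_op> -? (?: eq | ne | lt | le | gt | ge ) )   # hyphen is optional
    \s+
    (?<integercomp_wordb> \d+ )
/x;

my @ops;
for my $expr ( '123 -ne 456', '789 gt 234' )
{
    if( $expr =~ /\A$intcomp\z/ )
    {
        push( @ops, $+{integercomp_op} );
        print( "$+{integercomp_worda} [$+{integercomp_op}] $+{integercomp_wordb}\n" );
    }
}
```

Note that the captured operator keeps the hyphen when one was written, so callers that compare operators may want to strip it first.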
listfunc
BNF: listfuncname "(" words ")"
$RE{Apache2}{Function}
For example:
base64("Some string")
This is quite similar to the "function" regular expression
The capture names are:
- listfunc
-
Contains the entire capture block
- listfunc_args
-
Contains the list of arguments. In the example above, this would be
Some string
- listfunc_name
-
The name of the function. In the example above, this would be
base64
regex
BNF:
"/" regpattern "/" [regflags]
| "m" regsep regpattern regsep [regflags]
$RE{Apache2}{Regex}
For example:
/\w+/i
# or
m,\w+,i
The capture names are:
- regex
-
Contains the entire capture block
- regflags
-
The regular expression modifiers. See perlre
This can be any combination of:
i, s, m, g
- regpattern
-
Contains the regular expression. See perlre for examples and an explanation of how to use regular expressions. Apache2 uses PCRE, i.e. Perl Compatible Regular Expressions.
- regsep
-
Contains the regular expression separator, which can be any of:
/, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
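The two forms of the rule differ only in how the separator is introduced, and the closing separator must repeat the opening one. A simplified stand-in can show this with a backreference; it is a sketch under assumptions, not the module's pattern: only a handful of separators are accepted and escaped separators inside the pattern are not handled.

```perl
use strict;
use warnings;

# Simplified stand-in for the "regex" rule, for illustration only.
my $regex = qr{
    (?<regex>
        (?: m | (?=/) )              # the "m" prefix may only be omitted before a slash
        (?<regsep> [/\#,;:|!] )      # opening separator
        (?<regpattern> .+? )
        \k<regsep>                   # closing separator must equal the opening one
        (?<regflags> [ismg]* )
    )
}x;

my %parts;
for my $src ( '/\w+/i', 'm,\w+,i' )
{
    if( $src =~ /\A$regex\z/ )
    {
        # Keep a copy of the named captures for each input
        $parts{ $src } = { %+ };
        print( "sep=$+{regsep} pattern=$+{regpattern} flags=$+{regflags}\n" );
    }
}
```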
string
BNF: substring | string substring
$RE{Apache2}{String}
For example:
URI accessed is: %{REQUEST_URI}
The capture names are:
- string
-
Contains the entire capture block
stringcomp
BNF:
word "==" word
| word "!=" word
| word "<" word
| word "<=" word
| word ">" word
| word ">=" word
$RE{Apache2}{StringComp}
For example:
"John" == "Jack"
sub(s/\w+/Jack/i, "John") != "Jack"
# etc
The capture names are:
- stringcomp
-
Contains the entire capture block
- stringcomp_op
-
Contains the comparison operator
- stringcomp_worda
-
Contains the first word in the string comparison
- stringcomp_wordb
-
Contains the second word in the string comparison
substring
BNF: cstring | variable
$RE{Apache2}{Substring}
For example:
Jack
# or
%{REQUEST_URI}
See "variable" and "word" regular expression for more on those.
The capture names are:
- substring
-
Contains the entire capture block
variable
BNF:
"%{" varname "}"
| "%{" funcname ":" funcargs "}"
$RE{Apache2}{Variable}
# or
$RE{Apache2}{LegacyVariable}
For example:
%{REQUEST_URI}
# or
%{md5:"some string"}
See "word" and "cond" regular expression for more on those.
The capture names are:
- variable
-
Contains the entire capture block
- var_cond
-
If this is a condition inside a variable, such as:
%{:$ap_true == $ap_false:}
- var_func_args
-
Contains the function arguments.
- var_func_name
-
Contains the function name.
- var_word
-
A variable containing a word. See "word" for more information about word expressions.
- varname
-
Contains the variable name without the percent sign or dollar sign (if legacy regular expression is enabled) or the possible surrounding curly braces
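The two alternatives of the BNF map onto different captures, which the following simplified stand-in illustrates. It is a sketch and not the module's pattern: names are assumed to be plain \w+ and function arguments are taken as anything up to the closing curly brace.

```perl
use strict;
use warnings;

# Simplified stand-in for the "variable" rule, for illustration only.
my $variable = qr/
    (?<variable>
        %\{
        (?:
            (?<var_func_name> \w+ ) : (?<var_func_args> [^}]+ )   # funcname ":" funcargs
          |
            (?<varname> \w+ )                                     # plain variable name
        )
        \}
    )
/x;

my @found;
for my $src ( '%{REQUEST_URI}', '%{md5:"some string"}' )
{
    next unless( $src =~ /\A$variable\z/ );
    if( defined( $+{varname} ) )
    {
        push( @found, "name=$+{varname}" );
    }
    else
    {
        push( @found, "func=$+{var_func_name} args=$+{var_func_args}" );
    }
}
print( "$_\n" ) for( @found );
```

Testing which capture is defined is how a caller can tell the two forms apart after a match.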
word
BNF:
digits
| "'" string "'"
| '"' string '"'
| word "." word
| variable
| function
| "(" word ")"
| rebackref
$RE{Apache2}{Word}
This is the most complex regular expression used, since it uses all the others and can recurse deeply
For example:
12
# or
"John"
# or
'Jack'
# or
%{REQUEST_URI}
# or
%{HTTP_HOST}.%{HTTP_PORT}
# or
md5("some string")
# or any word surrounded by parenthesis, such as:
("John")
See "string", "word", "variable", "sub", "join", "function" regular expression for more on those.
The capture names are:
- word
-
Contains the entire capture block
- word_digits
-
If the word is actually digits, this contains those digits.
- word_dot_word
-
This contains the text when two words are separated by a dot.
- word_enclosed
-
Contains the value of the word enclosed by single or double quotes or by surrounding parenthesis.
- word_function
-
Contains the word containing a "function"
- word_quote
-
If the word is enclosed by single or double quote, this contains the single or double quote character
- word_variable
-
Contains the word containing a "variable"
words
BNF:
word
| word "," word
$RE{Apache2}{Words}
For example:
"Jack"
# or
"John", "Peter", "Paul"
See "word" and "list" regular expression for more on those.
The capture names are:
- words
-
Contains the entire capture block
- words_word
-
Contains the word
- words_list
-
Contains the list
ADVANCED APACHE2 EXPRESSION
comp
BNF:
stringcomp
| integercomp
| unaryop word
| word binaryop word
| word "in" listfunc
| word "=~" regex
| word "!~" regex
| word "in" "{" list "}"
$RE{Apache2}{TrunkComp}
For example:
"Jack" != "John"
123 -ne 456
# etc
This uses other expressions namely "stringcomp", "integercomp", "word", "listfunc", "regex", "list"
This is similar to the regular "comp" in "APACHE2 EXPRESSION", except it uses "list" instead of "words"
The capture names are:
- comp
-
Contains the entire capture block
- comp_binary
-
Matches the expression that uses a binary operator, such as:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_binaryop
-
The binary op used if the expression is a binary comparison. Binary operator is:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_integercomp
-
When the comparison is for an integer comparison as opposed to a string comparison.
- comp_list
-
Contains the list used to check a word against, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_listfunc
-
This contains the listfunc when the expression contains a word checked against a list function, such as:
"Jack" in listMe("some arguments")
- comp_regexp
-
The regular expression used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp would contain
/\w+/
- comp_regexp_op
-
The regular expression operator used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp_op would contain
=~
- comp_stringcomp
-
When the comparison is for a string comparison as opposed to an integer comparison.
- comp_unary
-
Matches the expression that uses a unary operator, such as:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
- comp_word
-
Contains the word that is the object of the comparison.
- comp_word_in_list
-
Contains the expression of a word checked against a list, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_word_in_listfunc
-
Contains the word when it is being compared to a listfunc, such as:
"Jack" in listMe("some arguments")
- comp_word_in_regexp
-
Contains the expression of a word checked against a regular expression, such as:
"Jack" =~ /\w+/
Here the word
Jack
(without the parenthesis) would be captured in comp_word
- comp_worda
-
Contains the first word in comparison expression
- comp_wordb
-
Contains the second word in comparison expression
cond
BNF:
"true"
| "false"
| "!" cond
| cond "&&" cond
| cond "||" cond
| comp
| "(" cond ")"
$RE{Apache2}{TrunkCond}
Same as "cond" in "APACHE2 EXPRESSION"
expr
BNF: cond | string
$RE{Apache2}{TrunkExpr}
Same as "expr" in "APACHE2 EXPRESSION"
function
BNF: funcname "(" words ")"
$RE{Apache2}{TrunkFunction}
Same as "function" in "APACHE2 EXPRESSION"
integercomp
BNF:
word "-eq" word | word "eq" word
| word "-ne" word | word "ne" word
| word "-lt" word | word "lt" word
| word "-le" word | word "le" word
| word "-gt" word | word "gt" word
| word "-ge" word | word "ge" word
$RE{Apache2}{TrunkIntegerComp}
Same as "integercomp" in "APACHE2 EXPRESSION"
join
BNF:
"join" ["("] list [")"]
| "join" ["("] list "," word [")"]
$RE{Apache2}{TrunkJoin}
For example:
join({"word1", "word2"})
# or
join({"word1", "word2"}, ', ')
The capture names are:
- join
-
Contains the entire capture block
- join_list
-
Contains the value of the list
- join_word
-
Contains the value for word used to join the list
list
BNF:
split
| listfunc
| "{" words "}"
| "(" list ")"
$RE{Apache2}{TrunkList}
For example:
split( /\w+/, "Some string" )
# or
{"some", "words"}
# or
(split( /\w+/, "Some string" ))
# or
( {"some", "words"} )
This uses "split", "listfunc", "words" and "list"
The capture names are:
- list
-
Contains the entire capture block
- list_func
-
Contains the value if a "listfunc" is used
- list_list
-
Contains the value if this is a list embedded within parenthesis
- list_split
-
Contains the value if the list is based on a split
- list_words
-
Contains the value for a list of words.
listfunc
BNF: listfuncname "(" words ")"
$RE{Apache2}{TrunkFunction}
Same as "listfunc" in "APACHE2 EXPRESSION"
regany
BNF: regex | regsub
$RE{Apache2}{TrunkRegany}
For example:
/\w+/i
# or
m,\w+,i
This regular expression includes "regex" and "regsub"
The capture names are:
- regany
-
Contains the entire capture block
- regany_regex
-
Contains the regular expression. See "regex"
- regany_regsub
-
Contains the substitution regular expression. See "regsub"
regex
BNF:
"/" regpattern "/" [regflags]
| "m" regsep regpattern regsep [regflags]
$RE{Apache2}{TrunkRegex}
Same as "regex" in "APACHE2 EXPRESSION"
regsub
BNF: "s" regsep regpattern regsep string regsep [regflags]
$RE{Apache2}{TrunkRegsub}
For example:
s/\w+/John/gi
The capture names are:
- regflags
-
The modifiers used which can be any combination of:
i, s, m, g
See perlre for an explanation of their usage and meaning
- regstring
-
The string replacing the text found by the regular expression
- regsub
-
Contains the entire capture block
- regpattern
-
Contains the regular expression, which is Perl compatible since Apache2 uses PCRE.
- regsep
-
Contains the regular expression separator, which can be any of:
/, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
split
BNF:
"split" ["("] regany "," list [")"]
| "split" ["("] regany "," word [")"]
$RE{Apache2}{TrunkSplit}
For example:
split( /\w+/, "Some string" )
This uses "regany", "list" and "word"
The capture names are:
- split
-
Contains the entire capture block
- split_regex
-
Contains the regular expression used for the split
- split_list
-
The list being split. It can also be a word. See below
- split_word
-
The word being split. It can also be a list. See above
string
BNF: substring | string substring
$RE{Apache2}{TrunkString}
Same as "string" in "APACHE2 EXPRESSION"
stringcomp
BNF:
word "==" word
| word "!=" word
| word "<" word
| word "<=" word
| word ">" word
| word ">=" word
$RE{Apache2}{TrunkStringComp}
Same as "stringcomp" in "APACHE2 EXPRESSION"
sub
BNF: "sub" ["("] regsub "," word [")"]
$RE{Apache2}{TrunkSub}
For example:
sub(s/\w/John/gi,"Peter")
The capture names are:
- sub
-
Contains the entire capture block
- sub_regsub
-
Contains the substitution expression, i.e. in the example above, this would be:
s/\w/John/gi
- sub_word
-
The target for the substitution. In the example above, this would be "Peter"
substring
BNF: cstring | variable
$RE{Apache2}{TrunkSubstring}
For example:
Jack
# or
%{REQUEST_URI}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
See "variable" and "word" regular expression for more on those.
This is different from "substring" in "APACHE2 EXPRESSION" in that it does not include regular expression back references, i.e. $1, $2, etc.
The capture names are:
- substring
-
Contains the entire capture block
variable
BNF:
"%{" varname "}"
| "%{" funcname ":" funcargs "}"
| "%{:" word ":}"
| "%{:" cond ":}"
| rebackref
$RE{Apache2}{TrunkVariable}
For example:
%{REQUEST_URI}
# or
%{md5:"some string"}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
# or a reference to previous regular expression capture groups
$1, $2, etc.
See "word" and "cond" regular expression for more on those.
The capture names are:
- variable
-
Contains the entire capture block
- var_cond
-
If this is a condition inside a variable, such as:
%{:$ap_true == $ap_false:}
- var_func_args
-
Contains the function arguments.
- var_func_name
-
Contains the function name.
- var_word
-
A variable containing a word. See "word" for more information about word expressions.
- varname
-
Contains the variable name without the percent sign or dollar sign (if legacy regular expression is enabled) or the possible surrounding curly braces
word
BNF:
digits
| "'" string "'"
| '"' string '"'
| word "." word
| variable
| sub
| join
| function
| "(" word ")"
$RE{Apache2}{TrunkWord}
This is the most complex regular expression used, since it uses all the others and can recurse deeply
For example:
12
# or
"John"
# or
'Jack'
# or
%{REQUEST_URI}
# or
%{HTTP_HOST}.%{HTTP_PORT}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
# or
sub(s,\w+,Paul,gi, "John")
# or
join({"Paul", "Peter"}, ', ')
# or
md5("some string")
# or any word surrounded by parenthesis, such as:
("John")
See "string", "word", "variable", "sub", "join", "function" regular expression for more on those.
The capture names are:
- word
-
Contains the entire capture block
- word_digits
-
If the word is actually digits, this contains those digits.
- word_dot_word
-
This contains the text when two words are separated by a dot.
- word_enclosed
-
Contains the value of the word enclosed by single or double quotes or by surrounding parenthesis.
- word_function
-
Contains the word containing a "function"
- word_join
-
Contains the word containing a "join"
- word_quote
-
If the word is enclosed by single or double quote, this contains the single or double quote character
- word_sub
-
If the word is a substitution, this contains the substitution
- word_variable
-
Contains the word containing a "variable"
words
BNF:
word
| word "," list
$RE{Apache2}{TrunkWords}
For example:
"Jack"
# or
"John", {"Peter", "Paul"}
# or
sub(s/\b\w+\b/Peter/, "John"), {"Peter", "Paul"}
See "word" and "list" regular expression for more on those.
It is different from "words" in "APACHE2 EXPRESSION" in that it uses "list" instead of "word"
The capture names are:
- words
-
Contains the entire capture block
- words_word
-
Contains the word
- words_list
-
Contains the list
LEGACY
There are 2 expressions that can be used as legacy:
- comp
-
See "comp"
- variable
-
See "variable"
CHANGES & CONTRIBUTIONS
Feel free to reach out to the author for possible corrections, improvements, or suggestions.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
https://httpd.apache.org/docs/current/expr.html and https://httpd.apache.org/docs/trunk/en/expr.html
COPYRIGHT & LICENSE
Copyright (c) 2020 DEGUEST Pte. Ltd.
You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.