
NAME

Apache2::Expression - Apache2 Expressions

SYNOPSIS

    use Apache2::Expression;
    my $exp = Apache2::Expression->new( legacy => 1 );
    my $hash = $exp->parse( '$HTTP_COOKIE = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/' );

VERSION

    v0.1.0

DESCRIPTION

Apache2::Expression is used to parse Apache2 expressions, such as those found in SSI (Server Side Includes).

METHODS

parse

This method takes a string representing an Apache2 expression as argument, and returns a hash reference containing the details of the elements that make up the expression.

It takes an optional hash of parameters, as follows :

legacy

When this is provided with a positive value, this will enable Apache2 legacy regular expressions. See Regexp::Common::Apache2 for more information on what this means.

trunk

When this is provided with a positive value, this will enable Apache2 experimental and advanced expressions. See Regexp::Common::Apache2 for more information on what this means.

For example :

    $HTTP_COOKIE = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/

would return :

    {
      elements => [
        {
          elements => [
            {
              elements => [
                {
                  elements => [],
                  name => "HTTP_COOKIE",
                  raw => "\$HTTP_COOKIE",
                  re => { variable => "\$HTTP_COOKIE", varname => "HTTP_COOKIE" },
                  subtype => "variable",
                  type => "variable",
                },
                {
                  elements => [],
                  flags => undef,
                  pattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
                  raw => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                  re => {
                    regex => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                    regpattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
                    regsep => "/",
                  },
                  sep => "/",
                  type => "regex",
                },
              ],
              op => "=",
              raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
              re => {
                comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                comp_in_regexp_legacy => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                comp_regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                comp_regexp_op => "=",
                comp_word => "\$HTTP_COOKIE",
              },
              regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
              subtype => "regexp",
              type => "comp",
              word => "\$HTTP_COOKIE",
            },
          ],
          raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
          re => {
            cond => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
            cond_comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
          },
          subtype => "comp",
          type => "cond",
        },
      ],
      raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
    }

The properties returned in the hash are:

elements

An array reference of the sub elements contained in this element, which provides a more granular definition.

Whatever the elements array reference contains is defined in one of the types below.

name

The name of the element. For example, if this is a function, this would be the function name, or if this is a variable, this would be the variable name without its leading dollar or percent sign and without any surrounding curly braces.

raw

The raw string, or chunk of string that was processed.

re

This contains the hash of capture groups as provided by Regexp::Common::Apache2. It is made available to enable finer, more granular control.

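regexp

The regular expression, including its separators, when the element is a comparison to a regular expression, as in the example above.

subtype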

A sub type that provides more information about the type of expression processed.

This can be any of the types mentioned below plus the following ones : binary (for comparison), list (for word to list comparison), negative, parenthesis, rebackref, regexp, unary (for comparison)

See below for possible combinations.

type

The main type matching the Apache2 expression. This can be comp, cond, digits, function, integercomp, quote (for quoted words), regex, stringcomp, listfunc, variable, word

See below for possible combinations.

word

If this is a word, this contains the word. In the example above, $HTTP_COOKIE would be the word used in the regular expression comparison.
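
Because every element carries its own elements array reference, the returned structure can be walked recursively. The following is only a sketch: the data is a hand-trimmed copy of the result shown above (shortened expression, most keys removed), and outline is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hand-trimmed copy of the structure shown above; only the keys used here are kept.
my $tree = {
    raw      => '$HTTP_COOKIE = /lang/',
    elements => [
        {
            type     => 'cond',
            subtype  => 'comp',
            elements => [
                {
                    type     => 'comp',
                    subtype  => 'regexp',
                    op       => '=',
                    elements => [
                        { type => 'variable', subtype => 'variable', name => 'HTTP_COOKIE', elements => [] },
                        { type => 'regex', pattern => 'lang', sep => '/', elements => [] },
                    ],
                },
            ],
        },
    ],
};

# Hypothetical helper: return one indented "type/subtype" line per element.
sub outline
{
    my( $node, $depth ) = @_;
    $depth //= 0;
    my @lines;
    foreach my $elem ( @{ $node->{elements} || [] } )
    {
        my $label = $elem->{type} // 'unknown';
        $label .= '/' . $elem->{subtype} if( defined( $elem->{subtype} ) );
        push( @lines, ( '  ' x $depth ) . $label );
        # Recurse into the sub elements, one level deeper
        push( @lines, outline( $elem, $depth + 1 ) );
    }
    return( @lines );
}

my @outline = outline( $tree );
print( "$_\n" ) for( @outline );
```

Elements without a subtype, such as the regex one here, are listed by their type alone.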

parse_args

Given a string that typically represents the arguments of a function, this method uses PPI to parse it and returns the parameters as an array of strings.

Parsing function arguments is non-trivial, as they can contain function calls nested within function calls.
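
To illustrate why this is non-trivial, here is a minimal sketch comparing a naive split on commas with a splitter that tracks parentheses and quotes. This is not how parse_args works (it relies on PPI); split_args and the function name inner are hypothetical, and the sketch ignores escaped quotes and regular expressions containing commas.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's PPI-based parse_args: split on commas
# that are outside parentheses and outside quotes.
sub split_args
{
    my $str = shift;
    my( @args, $quote );
    my $buf   = '';
    my $depth = 0;
    foreach my $ch ( split( //, $str ) )
    {
        if( defined( $quote ) )
        {
            # Inside a quoted string: only the matching quote ends it
            undef( $quote ) if( $ch eq $quote );
        }
        elsif( $ch eq '"' || $ch eq "'" )
        {
            $quote = $ch;
        }
        elsif( $ch eq '(' )
        {
            $depth++;
        }
        elsif( $ch eq ')' )
        {
            $depth--;
        }
        elsif( $ch eq ',' && $depth == 0 )
        {
            # Top-level comma: the current argument is complete
            push( @args, $buf );
            $buf = '';
            next;
        }
        $buf .= $ch;
    }
    push( @args, $buf ) if( length( $buf ) );
    # Trim surrounding spaces from each argument
    s/^\s+|\s+$//g for( @args );
    return( @args );
}

my $input = q{inner($a, $b), 'x,y'};
my @naive = split( /\s*,\s*/, $input );
my @args  = split_args( $input );
printf( "naive: %d pieces, depth-aware: %d arguments\n", scalar( @naive ), scalar( @args ) );
```

The naive split cuts through both the nested call and the quoted string, whereas the depth-aware version keeps each argument whole.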

COMBINATIONS

comp

Type: comp

Possible sub types:

binary

When a binary operator is used, such as :

    ==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

Example :

    192.168.2.10 -ipmatch 192.168.2/24

192.168.2.10 would be captured in property worda, ipmatch (without leading dash) would be captured in property op and 192.168.2/24 would be captured in property wordb.

The array reference in property elements will contain more information on worda and wordb

Also the details of elements for worda can be accessed with property worda_def as an array reference and likewise for wordb with wordb_def.

function

This contains the function name and arguments when the lefthand side word is compared to a list function.

For example :

    192.168.1.10 in split( /\,/, $ip_list )

In this example, 192.168.1.10 would be captured in word and split( /\,/, $ip_list ) would be captured in function with the array reference elements containing more information about the word and the function.

Also the details of elements for word can be accessed with property word_def as an array reference and likewise for function with function_def.

list

Is true when the comparison is of a word on the lefthand side to a list of words, such as :

    %{SOME_VALUE} in {"John", "Peter", "Paul"}

In this example, %{SOME_VALUE} would be captured in property word and "John", "Peter", "Paul" (without the enclosing curly braces or any spaces before and after them) would be captured in property list.

The array reference elements will possibly contain more information on word and each element in list

Also the details of elements for word can be accessed with property word_def as an array reference and likewise for list with list_def.

regexp

When the lefthand side word is being compared to a regular expression.

For example :

    %{HTTP_COOKIE} =~ /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/

In this example, %{HTTP_COOKIE} would be captured in property word and /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/ would be captured in property regexp and =~ would be captured in property op

Check the array reference in property elements for more details about the word and the regular expression in regexp.

Also the details of elements for word can be accessed with property word_def as an array reference and likewise for regexp with regexp_def.

unary

When the following operator is used against a word :

    -d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R

For example:

    -A /some/uri.html # (same as -U)
    -d /some/folder # file is a directory
    -e /some/folder/file.txt # file exists
    -f /some/folder/file.txt # file is a regular file
    -F /some/folder/file.txt # file is a regular file and is accessible to all (Apache2 does a sub query to check)
    -h /some/folder/link.txt # true if file is a symbolic link
    -n %{QUERY_STRING} # true if string is not empty (opposite of -z)
    -s /some/folder/file.txt # true if file is not empty
    -L /some/folder/link.txt # true if file is a symbolic link (same as -h)
    -R 192.168.1.1/24 # true if the remote ip matches this ip block; same as %{REMOTE_ADDR} -ipmatch 192.168.1.1/24
    -T %{HTTPS} # false if string is empty, "0", "off", "false", or "no" (case insensitive). True otherwise.
    -U /some/uri.html # check if the uri is accessible to all (Apache2 does a sub query to check)
    -z %{QUERY_STRING} # true if string is empty (opposite of -n)

In this example -e /some/folder/file.txt, e (without leading dash) would be captured in op and /some/folder/file.txt would be captured in word

Check the array reference in property elements for more information about the word in word

Also the details of elements for word can be accessed with property word_def as an array reference.

See here for more information: Regexp::Common::Apache2::comp
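
As a sketch of how the op and word properties of a unary comparison might be used, the lookup table below restates the operator meanings listed above, keyed by the operator without its leading dash. The describe_unary helper and the element passed to it are hypothetical and written by hand.

```perl
use strict;
use warnings;

# Meanings taken from the list above; keys are op values without the leading dash.
my %UNARY = (
    d => 'file is a directory',
    e => 'file exists',
    f => 'file is a regular file',
    s => 'file is not empty',
    L => 'file is a symbolic link',
    h => 'file is a symbolic link',
    F => 'file is a regular file and is accessible to all',
    U => 'uri is accessible to all',
    A => 'uri is accessible to all',
    n => 'string is not empty',
    z => 'string is empty',
    T => 'string is not empty, "0", "off", "false" or "no"',
    R => 'remote ip matches the ip block',
);

# Hypothetical helper: describe a comp element whose subtype is "unary".
sub describe_unary
{
    my $node = shift;
    my $what = $UNARY{ $node->{op} } // 'unknown operator';
    return( sprintf( '-%s %s: true if %s', $node->{op}, $node->{word}, $what ) );
}

# Hand-written element mirroring the "-e /some/folder/file.txt" example
my $desc = describe_unary({
    type    => 'comp',
    subtype => 'unary',
    op      => 'e',
    word    => '/some/folder/file.txt',
});
print( "$desc\n" );
```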

Available properties:

op

Contains the operator used. See Regexp::Common::Apache2::comp, "stringcomp" in Regexp::Common::Apache2 and "integercomp" in Regexp::Common::Apache2

This may be for unary operators :

    -d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R

For binary operators :

    ==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

For integer comparison :

    -eq, -ne, -lt, -le, -gt, -ge

For string comparison :

    ==, !=, <, <=, >, >=

In all the possible operators above, op contains the value, but without the leading dash, if any.

word

The word being compared.

worda

The first word being compared, and on the left of the operator. For example :

    12 -ne 10

wordb

The second word, being compared to, and on the right of the operator.

See "comp" in Regexp::Common::Apache2 for more information.
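
The following sketch shows one way to turn a comp element back into a readable string by dispatching on its subtype. It only relies on the properties documented above; summarise_comp is a hypothetical helper and the elements are written by hand rather than produced by parse.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a readable summary from a comp element.
sub summarise_comp
{
    my $c  = shift;
    my $st = $c->{subtype} // '';
    # op is stored without its leading dash, so word-like operators get it back
    my $op = $c->{op} // '';
    $op = "-$op" if( $op =~ /^[A-Za-z]/ );
    return( "$op $c->{word}" )              if( $st eq 'unary' );
    return( "$c->{worda} $op $c->{wordb}" ) if( $st eq 'binary' );
    return( "$c->{word} $op $c->{regexp}" ) if( $st eq 'regexp' );
    # Other sub types (list, function): fall back to the raw string
    return( $c->{raw} // '' );
}

my @summaries = map( summarise_comp( $_ ),
    { type => 'comp', subtype => 'binary', op => 'ipmatch', worda => '192.168.2.10', wordb => '192.168.2/24' },
    { type => 'comp', subtype => 'unary', op => 'e', word => '/some/folder/file.txt' },
    { type => 'comp', subtype => 'regexp', op => '=~', word => '%{HTTP_COOKIE}', regexp => '/lang/' },
);
print( "$_\n" ) for( @summaries );
```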

cond

Type: cond

Possible sub types:

and

When the condition is an ANDed expression such as :

    $ap_true && $ap_false

In this case, $ap_true would be captured in property expr1 and $ap_false would be captured in property expr2

Also, the details of elements for the whole expression can be accessed with property and_def as an array reference, and those for each side with and_expr1_def and and_expr2_def.

comp

Contains the expression when the condition is actually a comparison.

This will recurse and you can see more information in the array reference in the property elements. For more information on what it will contain, check the comp type.

cond

Default sub type

negative

When the condition is negative, i.e. prefixed by an exclamation mark.

For example :

    !-z /some/folder/file.txt

You need to check for the details in the array reference contained in property elements.

Also the details of elements for the variable can be accessed with property negative_def as an array reference.

or

When the condition is an ORed expression such as :

    $ap_true || $ap_false

In this case, $ap_true would be captured in property expr1 and $ap_false would be captured in property expr2

Also the details of elements for the variable can be accessed with property and_def as an array reference and and_expr1_def and and_expr2_def

parenthesis

When the condition is embedded within parentheses.

You need to check the array reference in property elements for information about the embedded condition.

Also the details of elements for the variable can be accessed with property parenthesis_def as an array reference.

variable

Contains the expression when the condition is based on a variable, such as :

    %{REQUEST_URI}

Check the array reference in property elements for more details about the variable, especially the property name which would contain the name of the variable; in this case : REQUEST_URI

Also the details of elements for the variable can be accessed with property variable_def as an array reference.

Available properties:

args

Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.

is_negative

If the condition is negative, this value is true

name

Function name

See "cond" in Regexp::Common::Apache2 for more information.
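
Since conditions recurse through the elements property, a common need is to find which variables a condition depends on. The sketch below gathers the name property of every element whose type is variable. The variable_names helper is hypothetical, and the nesting of the hand-written structure is only illustrative; the actual output of parse may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the names of all variables found under an element.
sub variable_names
{
    my $node = shift;
    my %seen;
    my @todo = ( $node );
    while( my $elem = shift( @todo ) )
    {
        if( ( $elem->{type} // '' ) eq 'variable' && defined( $elem->{name} ) )
        {
            $seen{ $elem->{name} }++;
        }
        # Queue the sub elements so the whole tree is visited
        push( @todo, @{ $elem->{elements} || [] } );
    }
    my @found = sort { $a cmp $b } keys( %seen );
    return( @found );
}

# Hand-written structure; the exact nesting is an assumption made for this sketch.
my $cond = {
    type     => 'cond',
    subtype  => 'and',
    elements => [
        {
            type     => 'cond',
            subtype  => 'variable',
            elements => [ { type => 'variable', subtype => 'variable', name => 'REQUEST_URI', elements => [] } ],
        },
        {
            type     => 'comp',
            subtype  => 'unary',
            op       => 'n',
            word     => '%{QUERY_STRING}',
            elements => [ { type => 'variable', subtype => 'variable', name => 'QUERY_STRING', elements => [] } ],
        },
    ],
};
my @names = variable_names( $cond );
print( join( ', ', @names ), "\n" );
```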

function

Type: function

Possible sub types: none

Available properties:

args

Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.

Also the details of elements for those args can be accessed with property args_def as an array reference.

name

Function name

See "function" in Regexp::Common::Apache2 for more information.

integercomp

Type: integercomp

Possible sub types: none

Available properties:

op

Contains the operator used. See "integercomp" in Regexp::Common::Apache2

worda

The first word being compared, and on the left of the operator. For example :

    12 -ne 10

Also the details of elements for worda can be accessed with property worda_def as an array reference.

wordb

The second word, being compared to, and on the right of the operator.

Also the details of elements for wordb can be accessed with property wordb_def as an array reference.

See "integercomp" in Regexp::Common::Apache2 for more information.

join

Type: join

Possible sub types: none

Available properties:

list

The list of strings to be joined. See the content of the elements array reference for more breakdown on the arguments provided.

Also the details of elements for those args can be accessed with property list_def as an array reference.

word

The word used to join the list. This parameter is optional.

Details for the word parameter, if any, can be found in the elements array reference or can be accessed with the word_def property.

For example :

    join({"John Paul Doe"}, ', ')
    # or
    join({"John", "Paul", "Doe"}, ', ')
    # or just
    join({"John", "Paul", "Doe"})

See "join" in Regexp::Common::Apache2 for more information.

listfunc

Type: listfunc

Possible sub types: none

Available properties:

args

Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.

Also the details of elements for those args can be accessed with property args_def as an array reference.

name

Function name

See "listfunc" in Regexp::Common::Apache2 for more information.

regex

Type: regex

Possible sub types: none

Available properties:

flags

Example: mgis

pattern

Regular expression pattern, excluding enclosing separators.

sep

Type of separators used. It can be: /, #, $, %, ^, |, ?, !, ', ", ",", ";", ":", ".", _, and -

See "regex" in Regexp::Common::Apache2 for more information.
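
As a sketch, the pattern and flags properties can be combined into a compiled Perl regular expression. This assumes the pattern is also valid Perl syntax (Apache2 uses PCRE), and it drops flags such as g that have no meaning for qr//. The compile_regex helper is hypothetical and the element is written by hand from the example at the top of this document.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a regex element into a Perl regular expression.
sub compile_regex
{
    my $node = shift;
    # Keep only the modifiers Perl accepts inline; "g" is dropped
    my $mods = join( '', grep( /^[imsx]$/, split( //, $node->{flags} // '' ) ) );
    my $pat  = $node->{pattern};
    return( length( $mods ) ? qr/(?$mods)$pat/ : qr/$pat/ );
}

my $re = compile_regex({
    type    => 'regex',
    flags   => undef,
    sep     => '/',
    pattern => 'lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?',
});

# A URL-encoded cookie fragment such as the pattern is meant to match
my( $lang ) = 'lang%22%3A%22en-GB%22%7D' =~ $re;
print( "$lang\n" ) if( defined( $lang ) );
```

Because the pattern is interpolated from a variable, the separator stored in sep does not need to be escaped inside it.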

stringcomp

Type: stringcomp

Possible sub types: none

Available properties:

op

Contains the operator used. See "stringcomp" in Regexp::Common::Apache2

worda

The first word being compared, and on the left of the operator. For example :

    12 -ne 10

Also the details of elements for worda can be accessed with property worda_def as an array reference.

wordb

The second word, being compared to, and on the right of the operator.

Also the details of elements for wordb can be accessed with property wordb_def as an array reference.

See "stringcomp" in Regexp::Common::Apache2 for more information.

variable

Type: variable

Possible sub types:

function

    %{md5:"some arguments"}

rebackref

This is a regular expression back reference, such as $1, $2, etc., up to $9.

variable

    %{REQUEST_URI}
    # or by enabling the legacy expressions
    ${REQUEST_URI}

Available properties:

args

Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.

name

Function name, or variable name.

value

The regular expression back reference value, such as 1, 2, etc

See "variable" in Regexp::Common::Apache2 for more information.

word

Type: word

Possible sub types:

digits

When the word contains one or more digits.

dotted

When the word contains words separated by dots, such as 192.168.1.10

function

When the word is a function.

parens

When the word is surrounded by parentheses

quote

When the word is surrounded by single or double quotes

rebackref

When the word is a regular expression back reference such as $1, $2, etc., up to $9.

regex

This is an extension I added to make functions such as split( /\w+/, $ip_list ) work.

Without it, the regular expression would not be recognised, given the Apache BNF as it stands.

variable

When the word is a variable. For example : %{REQUEST_URI}, and it can also be a variable like ${REQUEST_URI} if the legacy mode is enabled.

Available properties:

flags

The regular expression flags used, such as mgis

parens

Contains an array reference of the open and close parenthesis, such as:

    ["(", ")"]

pattern

The regular expression pattern

quote

Contains the type of quote used if the sub type is quote

regex

Contains the regular expression

sep

The separator used in the regular expression, such as /

value

The value of the digits if the sub type is digits or rebackref

word

The word enclosed in quotes

See "variable" in Regexp::Common::Apache2 for more information.

CAVEAT

This module supports Apache2 expressions well. However, some expressions are difficult to process. For example:

Expressions with functions not using enclosing parenthesis:

    %{REMOTE_ADDR} -in split s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName')

Instead, use:

    %{REMOTE_ADDR} -in split(s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName'))

There is no mechanism yet to prevent infinite recursion. This needs to be implemented.

CHANGES & CONTRIBUTIONS

Feel free to reach out to the author for possible corrections, improvements, or suggestions.

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Apache2::SSI, Regexp::Common::Apache2, https://httpd.apache.org/docs/current/expr.html

COPYRIGHT & LICENSE

Copyright (c) 2020 DEGUEST Pte. Ltd.

You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.