Regexp::Optimizer - optimizes regular expressions


  use Regexp::Optimizer;
  my $o  = Regexp::Optimizer->new;
  my $re = $o->optimize(qr/foobar|fooxar|foozap/);
  # $re is now qr/foo(?:[bx]ar|zap)/


This module does, ahem, attempts to, optimize regular expressions.


To install this module type the following:

   perl Makefile.PL
   make test
   make install


Here is a quote from perltodo.

    Factoring out common suffices/prefices in regexps (trie optimization)

    Currently, the user has to optimize "foo|far" and "foo|goo" into "f(?:oo|ar)" and "[fg]oo" by hand; this could be done automatically.

This module implements just that.


Since this is an OO module there is no symbol exported.


This module is implemented as a subclass of Regexp::List. For methods not listed here, see Regexp::List.

$o = Regexp::Optimizer->new;
$o->set(key => value, ...)

Just the same us Regexp::List except for the attribute below;


When set to one, $o->optimize() tries to $o->expand before actually starting the operation.

  # cases you need to set expand => 1
  $o->set(expand => 1)->optimize(qr/
$re = $o->optimize(regexp);

Does the job. Note that unlike ->list2re() in Regexp::List, the argument is the regular expression itself. What it basically does is to find groups will alterations and replace it with the result of $o->list2re.

$re = $o->list2re(list of words ...)

Same as list2re() in Regexp::List in terms of functionality but how it tokenize "atoms" is different since the arguments can be regular expressions, not just strings. Here is a brief example.

  my @expr = qw/foobar fooba+/;
  Regexp::List->new->list2re(@expr) eq qr/fooba[\+r]/;
  Regexp::Optimizer->new->list2re(@expr) eq qr/foob(?:a+ar)/;


This module is still experimental. Do not assume that the result is the same as the unoptimized version.

  • When you just want a regular expression which matches normal words with not metacharacters, use <Regexp::List>. It's more robus and much faster.

  • When you have a list of regular expessions which you want to aggregate, use list2re of THIS MODULE.

  • Use ->optimize() when and only when you already have a big regular expression with alterations therein.

    ->optimize() does support nested groups but its parser is not tested very well.


  • Regex parser in this module (which itself is implemented by regular expression) is not as thoroughly tested as Regexp::List

  • May still fall into deep recursion when you attempt to optimize deeply nested regexp. See "PRACTICALITY".

  • Does not grok (?{expression}) and (?(cond)yes|no) constructs yet

  • You need to escape characters in character classes.

      $o->optimize(qr/[a-z()]|[A-Z]/);              # wrong
      $o->optimize(qr/[a-z\(\)]|[A-Z]/);            # right
      $o->optimize(qr/[0-9A-Za-z]|[\Q-_.!~*"'()\E]/ # right, too. 
  • When character(?: class(?:es)?)? are aggregated, duplicate ranges are left as is. Though functionally OK, it is cosmetically ugly.

      # simply turns into [0-5][5-9]0123456789] not [0-9]

    I left it that way because marking-rearranging approach can result a humongous result when unicode characters are concerned (and \p{Properties}).


Though this module is still experimental, It is still good enough even for such deeply nested regexes as the followng.

  # See 3.2.2 of
  # BNF faithfully turned into a regex

  # and optimized

By carefully examine both you can find that character classes are properly aggregated.


Regexp::List -- upon which this module is based

eg/ directory in this package contains example scripts.

Perl standard documents
 L<perltodo>, L<perlre>
CPAN Modules

Regexp::Presuf, Text::Trie


Mastering Regular Expressions


Dan Kogai <>


Copyright 2003 by Dan Kogai

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.