Yuki Kimoto
and 1 contributor

NAME

SPVM::Regex - Regular expression

SYNOPSIS

  use Regex;
  
  # Pattern match
  {
    my $re = Regex->new("ab*c");
    my $target = "zabcz";
    my $match = $re->match($target, 0);
  }

  # Pattern match - UTF-8
  {
    my $re = Regex->new("あ+");
    my $target = "いあああい";
    my $match = $re->match($target, 0);
  }

  # Pattern match - Character class and the negation
  {
    my $re = Regex->new("[A-Z]+[^A-Z]+");
    my $target = "ABCzab";
    my $match = $re->match($target, 0);
  }

  # Pattern match with captures
  {
    my $re = Regex->new("^(\w+) (\w+) (\w+)$");
    my $target = "abc1 abc2 abc3";
    my $match = $re->match($target, 0);
    
    if ($match) {
      my $cap1 = $re->captures->[0];
      my $cap2 = $re->captures->[1];
      my $cap3 = $re->captures->[2];
    }
  }
  
  # Replace
  {
    my $re = Regex->new("abc");
    my $target = "ppzabcz";
    
    # "ppzABCz"
    my $result = $re->replace($target, 0, "ABC");
    
    my $replace_count = $re->replace_count;
  }

  # Replace with a callback and capture
  {
    my $re = Regex->new("a(bc)");
    my $target = "ppzabcz";
    
    # "ppzABbcCz"
    my $result = $re->replace_cb($target, 0, method : string ($re : Regex) {
      return "AB" . $re->captures->[0] . "C";
    });
  }

  # Replace all
  {
    my $re = Regex->new("abc");
    my $target = "ppzabczabcz";
    
    # "ppzABCzABCz"
    my $result = $re->replace_all($target, 0, "ABC");
  }

  # Replace all with a callback and capture
  {
    my $re = Regex->new("a(bc)");
    my $target = "ppzabczabcz";
    
    # "ppzABCbcPQRSzABCbcPQRSz"
    my $result = $re->replace_all_cb($target, 0, method : string ($re : Regex) {
      return "ABC" . $re->captures->[0] . "PQRS";
    });
  }

  # . - single line mode
  {
    my $re = Regex->new("(.+)", "s");
    my $target = "abc\ndef";
    
    my $match = $re->match($target, 0);
    
    unless ($match) {
      return 0;
    }
    
    unless ($re->captures->[0] eq "abc\ndef") {
      return 0;
    }
  }

DESCRIPTION

Regex provides regular expression functions.

This module is much less stable than other modules, so expect many changes.

REGULAR EXPRESSION SYNTAX

Regex provides a subset of Perl regular expression syntax. The target string and the regex string are interpreted as UTF-8 strings.

  # Quantifier
  +     1 or more repetitions
  *     0 or more repetitions
  ?     0 or 1 repetition
  {m,n} between m and n repetitions
  
  # Regular expression character
  ^    start of the string
  $    end of the string
  .    any character except "\n"
  
  #    Default mode     ASCII mode
  \d   Not supported    [0-9]
  \D   Not supported    not \d
  \s   Not supported    " ", "\t", "\f", "\r", "\n"
  \S   Not supported    not \s
  \w   Not supported    [a-zA-Z0-9_]
  \W   Not supported    not \w
  
  # Character class and the negation
  [a-z0-9]
  [^a-z0-9]
  
  # Capture
  (foo)

Regex Options:

  s    single line mode
  a    ascii mode

Regex options are passed to the "new" or "new_with_options" method.

  my $re = Regex->new("^ab+c", "sa");
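
As a minimal sketch of how the "a" option relates to the table above, the following uses \d, which the table lists as available only in ASCII mode. The class name MyRegexOptions is hypothetical, and the example assumes \d behaves as the table describes.

```spvm
class MyRegexOptions {
  use Regex;

  static method main : void () {
    # "a" turns on ASCII mode, where \d stands for [0-9].
    my $re = Regex->new_with_options("(\d+)", "a");
    my $target = "abc123";

    if ($re->match($target, 0)) {
      # The first capture holds the run of digits.
      print $re->captures->[0] . "\n";
    }
    else {
      print "no match\n";
    }
  }
}
```

Several option characters can be combined in one string, as in "sa" above.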

Limitations:

Regex does not support a quantified element that is immediately followed by the same set of characters.

  # An exception occurs
  Regex->new("a*a");
  Regex->new("a?a");
  Regex->new("a+a");
  Regex->new("a{1,3}a");

A pattern is also invalid if an element that can match zero times sits between a quantified element and the same set of characters.

  # An exception occurs
  Regex->new("\d+\D*\d+");
  Regex->new("\d+\D?\d+");
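
Because these patterns fail when the regex is compiled, a caller may want to guard the constructor call. The following is a sketch only: it assumes the exception can be caught with SPVM's eval block and read from $@, and the class name MyRegexLimitation is hypothetical.

```spvm
class MyRegexLimitation {
  use Regex;

  static method main : void () {
    # "a*a" is one of the patterns listed above as unsupported.
    eval {
      Regex->new("a*a");
    };

    # $@ is assumed to hold the exception message when compilation failed.
    if ($@) {
      print "Regex compile failed: " . $@ . "\n";
    }
  }
}
```

A pattern such as "a+" avoids the problem for the third case, since it already covers one or more "a" characters.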

CLASS METHODS

new

  static method new : Regex ($re_str_and_options : string[]...)

Create a new Regex object and compile the regex.

  my $re = Regex->new("^ab+c");
  my $re = Regex->new("^ab+c", "s");

new_with_options

  static method new_with_options : Regex ($re_str : string, $option_chars : string)

Create a new Regex object and compile the regex with the options.

  my $re = Regex->new_with_options("^ab+c", "s");

INSTANCE METHODS

captures

  method captures : string[] ()

Gets the strings captured by the "match" method.

match_start

  method match_start : int ()

Gets the start byte offset of the string matched by the "match" method.

match_length

  method match_length : int ()

Gets the byte length of the string matched by the "match" method.

replace_count

  method replace_count : int ()

Gets the number of replacements made by the "replace" or "replace_all" method.

match

  method match : int ($target : string, $target_offset : int)

Matches the pattern against the target string, starting at the given byte offset.

If the pattern match succeeds, 1 is returned, otherwise 0 is returned.

After a successful match, the "captures" method returns the captured strings, the "match_start" method returns the byte offset of the whole match, and the "match_length" method returns its byte length.
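
The following sketch shows how these accessors might be used together after a match. The class name MyRegexMatch is hypothetical; only methods documented on this page are called.

```spvm
class MyRegexMatch {
  use Regex;

  static method main : void () {
    my $re = Regex->new("ab*c");
    my $target = "zabcz";

    # Start searching at byte offset 0.
    if ($re->match($target, 0)) {
      # Both values are documented as byte-based, not character-based.
      my $start = $re->match_start;
      my $length = $re->match_length;
      print "start=" . $start . " length=" . $length . "\n";
    }
    else {
      print "no match\n";
    }
  }
}
```

Since the offsets are in bytes, a multi-byte UTF-8 character before the match moves the start offset by more than one.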

replace

  method replace : string ($target : string, $target_offset : int, $replace : string)

Replaces the first match in the target string, searching from the given byte offset, with the replacement string.

replace_cb

  method replace_cb : string ($target : string, $target_offset : int, $replace_cb : Regex::Replacer)

Replaces the first match in the target string, searching from the given byte offset, with the string returned by the callback. The callback must implement the Regex::Replacer interface.

replace_all

  method replace_all : string ($target : string, $target_offset : int, $replace : string)

Replaces every match in the target string, searching from the given byte offset, with the replacement string.

replace_all_cb

  method replace_all_cb : string ($target : string, $target_offset : int, $replace_cb : Regex::Replacer)

Replaces every match in the target string, searching from the given byte offset, with the string returned by the callback. The callback must implement the Regex::Replacer interface.

cap1

  method cap1 : string ()

An alias for $re->captures->[0].

cap2

  method cap2 : string ()

An alias for $re->captures->[1].

cap3

  method cap3 : string ()

An alias for $re->captures->[2].

cap4

  method cap4 : string ()

An alias for $re->captures->[3].

cap5

  method cap5 : string ()

An alias for $re->captures->[4].

cap6

  method cap6 : string ()

An alias for $re->captures->[5].

cap7

  method cap7 : string ()

An alias for $re->captures->[6].

cap8

  method cap8 : string ()

An alias for $re->captures->[7].

cap9

  method cap9 : string ()

An alias for $re->captures->[8].

cap10

  method cap10 : string ()

An alias for $re->captures->[9].
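
As a brief sketch of the cap1 to cap10 aliases, the following reads two captures without indexing into the captures array. The class name MyRegexCaptures is hypothetical, and the "a" option is passed on the assumption that \w needs ASCII mode, as the syntax table states.

```spvm
class MyRegexCaptures {
  use Regex;

  static method main : void () {
    my $re = Regex->new("(\w+) (\w+)", "a");
    my $target = "abc1 abc2";

    if ($re->match($target, 0)) {
      # cap1 and cap2 are documented as aliases for captures->[0] and captures->[1].
      my $first = $re->cap1;
      my $second = $re->cap2;
      print $first . "," . $second . "\n";
    }
  }
}
```

Note that the alias numbers are 1-based while the captures array is 0-based.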
