PyFrame Guide to wxPython

wxStyledTextCtrl - Regular Expressions

Summary:

Regular Expressions may optionally be used when searching. See Searching and Text Operations for more information about searches.

The details below on how to use regular expressions are taken from the Scintilla documentation. Two versions follow: a detailed one first, then a shorter one.

 Regular Expressions:
  
        [1]     char    matches itself, unless it is a special
                        character (metachar): . \ [ ] * + ^ $
  
        [2]     .       matches any character.
  
        [3]     \       matches the character following it, except
                        when followed by a left or right round bracket,
                        a digit 1 to 9 or a left or right angle bracket. 
                        (see [7], [8] and [9])
                        It is used as an escape character for all 
                        other meta-characters, and itself. When used
                        in a set ([4]), it is treated as an ordinary
                        character.
  
        [4]     [set]   matches one of the characters in the set.
                        If the first character in the set is "^",
                        it matches a character NOT in the set, i.e. 
                        complements the set. A shorthand S-E is 
                        used to specify a set of characters S up to 
                        E, inclusive. The special characters "]" and 
                        "-" have no special meaning if they appear 
                        as the first chars in the set.
                        examples:        match:
  
                                [a-z]    any lowercase alpha
  
                                [^]-]    any char except ] and -
  
                                [^A-Z]   any char except uppercase
                                         alpha
  
                                [a-zA-Z] any alpha
  
        [5]     *       any regular expression form [1] to [4], followed by
                        closure char (*) matches zero or more matches of
                        that form.
  
        [6]     +       same as [5], except it matches one or more.
  
        [7]             a regular expression in the form [1] to [10], enclosed
                        as \(form\) matches what form matches. The enclosure
                        creates a set of tags, used for [8] and for
                        pattern substitution. The tagged forms are numbered
                        starting from 1.
  
        [8]             a \ followed by a digit 1 to 9 matches whatever a
                        previously tagged regular expression ([7]) matched.
  
        [9]  \<         a regular expression starting with a \< construct
             \>         and/or ending with a \> construct, restricts the
                        pattern matching to the beginning of a word, and/or
                        the end of a word. A word is defined to be a character
                        string beginning and/or ending with the characters
                        A-Z a-z 0-9 and _. It must also be preceded and/or
                        followed by any character outside those mentioned.
  
        [10]            a composite regular expression xy where x and y
                        are in the form [1] to [10] matches the longest
                        match of x followed by a match for y.
  
        [11] ^          a regular expression starting with a ^ character
             $          and/or ending with a $ character, restricts the
                        pattern matching to the beginning of the line,
                        or the end of line. [anchors] Elsewhere in the
                        pattern, ^ and $ are treated as ordinary characters.
 
 
    
 Examples:
 
 pattern:       foo*.*
 compile:       CHR f CHR o CLO CHR o END CLO ANY END END
 matches:       fo foo fooo foobar fobar foxx ...
 
 pattern:       fo[ob]a[rz]     
 compile:       CHR f CHR o CCL bitset CHR a CCL bitset END
 matches:       fobar fooar fobaz fooaz
 
 pattern:       foo\\+
 compile:       CHR f CHR o CHR o CHR \ CLO CHR \ END END
 matches:       foo\ foo\\ foo\\\  ...
 
 pattern:       \(foo\)[1-3]\1  (same as foo[1-3]foo)
 compile:       BOT 1 CHR f CHR o CHR o EOT 1 CCL bitset REF 1 END
 matches:       foo1foo foo2foo foo3foo
 
 pattern:       \(fo.*\)-\1
 compile:       BOT 1 CHR f CHR o CLO ANY END EOT 1 CHR - REF 1 END
 matches:       foo-foo fo-fo fob-fob foobar-foobar ...
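
The rules and examples above can be tried outside the editor with a small translation step. The following is a minimal sketch, not part of Scintilla or wxPython: `to_python_regex` is a hypothetical helper that rewrites a pattern in the syntax above into the spelling used by Python's standard `re` module. Python's `re` is a different engine, so this only approximates the behaviour described here, and it covers only the constructs listed in rules [1] to [11].

```python
import re

# Characters that are special in Python's `re` but ordinary in the syntax above.
PYTHON_ONLY_META = set("(){}|?")

def to_python_regex(pattern):
    """Hypothetical helper: rewrite a pattern in the syntax above for Python's `re`."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in "()":
                out.append(nxt)              # \( and \) become a plain group
            elif nxt == "<":
                out.append(r"\b(?=\w)")      # start of a word
            elif nxt == ">":
                out.append(r"\b(?<=\w)")     # end of a word
            elif nxt in "123456789":
                out.append("\\" + nxt)       # back-reference keeps its spelling
            else:
                out.append(re.escape(nxt))   # escaped literal character
            i += 2
        elif c == "\\":
            out.append("\\\\")               # lone trailing backslash: literal
            i += 1
        elif c == "[":
            # Copy the set through its closing bracket; a "]" placed right
            # after "[" or "[^" is a literal member of the set.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            body = pattern[i:j + 1]
            out.append(body.replace("\\", "\\\\"))  # "\" is ordinary in a set
            i = j + 1
        elif c in PYTHON_ONLY_META:
            out.append("\\" + c)             # ordinary here, special in Python
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

# Show the translation for three of the example patterns above.
for pat in ("foo*.*", "fo[ob]a[rz]", r"\(foo\)[1-3]\1"):
    print(pat, "->", to_python_regex(pat))
```

The translated pattern can then be passed to `re.fullmatch` to check the "matches" lines above. The compiled opcode listings (CHR, CLO, BOT and so on) are internal to Scintilla's matcher and have no counterpart in this sketch.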
 
  

The online Scintilla documentation gives a more concise description:

.

Matches any character

\(

This marks the start of a region for tagging a match.

\)

This marks the end of a tagged region.

\n

Where n is 1 through 9, this refers to the first through ninth tagged region when replacing. For example, if the search string was Fred\([1-9]\)XXX and the replace string was Sam\1YYY, applying them to Fred2XXX would generate Sam2YYY.
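
The Fred/Sam replacement can be reproduced with Python's standard `re` module as a rough illustration. This is a sketch under one assumption: Python spells a tagged region with plain parentheses rather than \( and \), so the search string is adapted accordingly. The replace string is unchanged.

```python
import re

# Scintilla spelling of the search string: Fred\([1-9]\)XXX
search = r"Fred([1-9])XXX"
replace = r"Sam\1YYY"        # \1 inserts the text of the first tagged region

result = re.sub(search, replace, "Fred2XXX")
print(result)  # Sam2YYY
```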

\<

This matches the start of a word using Scintilla's definition of words.

\>

This matches the end of a word using Scintilla's definition of words.
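
As an approximation, Python's `re` module offers a single word-boundary assertion, \b, which can stand in for both \< and \>. The sketch below assumes that Python's notion of a word character (letters, digits and underscore, including non-ASCII letters) is close enough to Scintilla's for the sample text; the two definitions are not guaranteed to agree in general.

```python
import re

text = "cat concat cats bobcat cat_nap"

# Roughly \<cat : words that begin with "cat"
starts = re.findall(r"\bcat\w*", text)

# Roughly cat\> : words that end with "cat"
ends = re.findall(r"\w*cat\b", text)

# Roughly \<cat\> : the whole word "cat" only
whole = re.findall(r"\bcat\b", text)

print(starts)  # ['cat', 'cats', 'cat_nap']
print(ends)    # ['cat', 'concat', 'bobcat']
print(whole)   # ['cat']
```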

\x

This allows you to use a character x that would otherwise have a special meaning. For example, \[ would be interpreted as [ and not as the start of a character set.
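
Escaping behaves the same way in Python's `re` module for these characters, so it can be used for a quick check. This is an illustration with the standard library, not a call into Scintilla.

```python
import re

# An escaped bracket is an ordinary character.
brackets = re.findall(r"\[", "a[i] = b[j]")

# An escaped dot matches only a literal dot, not any character.
dots = re.findall(r"\.", "a.b-c")

# Without the escape, "[" opens a character set and the pattern is incomplete.
try:
    re.compile("[")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

print(brackets)      # ['[', '[']
print(dots)          # ['.']
print(unescaped_ok)  # False
```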

[...]

This indicates a set of characters, for example [abc] means any of the characters a, b or c. You can also use ranges, for example [a-z] for any lower case character.

[^...]

The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character.
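
Sets, ranges and complemented sets are written identically in Python's `re` module, which makes it a convenient place to experiment. A small sketch with the standard library:

```python
import re

any_of = re.findall(r"[abc]", "aXbYcZ")          # any of a, b or c
lower = re.findall(r"[a-z]", "Hi 5!")            # a range of lowercase letters
non_alpha = re.findall(r"[^A-Za-z]", "a1 b2")    # complement: not a letter

print(any_of)     # ['a', 'b', 'c']
print(lower)      # ['i']
print(non_alpha)  # ['1', ' ', '2']
```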

^

This matches the start of a line (unless used inside a set, see above).

$

This matches the end of a line.
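
The line anchors can be tried in Python's `re` module, with one difference to keep in mind: Python anchors to the whole string unless the `re.MULTILINE` flag is given, so the sketch below passes that flag to get per-line behaviour similar to what is described here.

```python
import re

text = "alpha\nbeta\nalphabet"

line_starts = re.findall(r"^alpha", text, re.MULTILINE)   # lines starting with "alpha"
whole_lines = re.findall(r"^alpha$", text, re.MULTILINE)  # lines that are exactly "alpha"
line_ends = re.findall(r"bet$", text, re.MULTILINE)       # lines ending with "bet"

# Inside a set, "$" and a non-leading "^" are ordinary characters.
literals = re.findall(r"[$^]", "a^b$")

print(line_starts)  # ['alpha', 'alpha']
print(whole_lines)  # ['alpha']
print(line_ends)    # ['bet']
print(literals)     # ['^', '$']
```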

*

This matches the preceding expression zero or more times. For example, Sa*m matches Sm, Sam, Saam, Saaam and so on.

+

This matches the preceding expression one or more times. For example, Sa+m matches Sam, Saam, Saaam and so on.
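
The difference between the two closures shows up on the string Sm, which Sa*m accepts and Sa+m rejects. A short check with Python's `re` module, where * and + have the same meaning:

```python
import re

names = ["Sm", "Sam", "Saam", "Saaam", "Sim"]

# "a*" allows zero or more "a" between "S" and "m".
star = [n for n in names if re.fullmatch(r"Sa*m", n)]

# "a+" requires at least one "a".
plus = [n for n in names if re.fullmatch(r"Sa+m", n)]

print(star)  # ['Sm', 'Sam', 'Saam', 'Saaam']
print(plus)  # ['Sam', 'Saam', 'Saaam']
```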