
NAME

Chemistry::Pattern - Chemical substructure pattern matching

SYNOPSIS

    use Chemistry::Pattern;
    use Chemistry::Mol;
    use Chemistry::File::SMILES;

    # Create a pattern and a molecule from SMILES strings
    my $mol_str = "C1CCCC1C(Cl)=O";
    my $patt_str = "C(=O)Cl";
    my $mol = Chemistry::Mol->parse($mol_str, format => 'smiles');
    my $patt = Chemistry::Pattern->parse($patt_str, format => 'smiles');

    # try to match the pattern
    while ($patt->match($mol)) {
        my @matched_atoms = $patt->atom_map;
        print "Matched: (@matched_atoms)\n";
        # should print something like "Matched: (a6 a8 a7)"
    }

DESCRIPTION

This module implements basic pattern matching for molecules. The Chemistry::Pattern class is a subclass of Chemistry::Mol, so patterns have all the properties of molecules and can be read from the same file formats. Some formats (such as SMARTS) are used exclusively to describe patterns.

To perform a pattern matching operation on a molecule, follow these steps.

1) Create a pattern object, either by parsing a file or string, or by adding atoms and bonds by hand using Chemistry::Mol methods. Note that atoms and bonds in a pattern should be Chemistry::Pattern::Atom and Chemistry::Pattern::Bond objects. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.

2) Execute the pattern on the molecule by calling $patt->match($mol).

3) If $patt->match() returns true, extract the "map" that relates the pattern to the molecule by calling $patt->atom_map or $patt->bond_map. These methods return a list of the atoms or bonds in the molecule that are matched by the corresponding atoms or bonds in the pattern. Thus $patt->atom_map(1) would be analogous to the $1 special variable used for regular expression matching. The difference between Chemistry::Pattern and Perl regular expressions is that atoms and bonds are always captured.

4) If more than one match for the molecule is desired, repeat from step (2) until match() returns false.
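
The four steps above can be sketched as follows. This is only an illustrative sketch: it reuses the SMILES strings from the SYNOPSIS, assumes Chemistry::File::SMILES is installed, and prints object IDs through the id method rather than relying on how atoms and bonds stringify.

```perl
use strict;
use warnings;
use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

# Step 1: build the molecule and the pattern (strings taken from the SYNOPSIS)
my $mol  = Chemistry::Mol->parse("C1CCCC1C(Cl)=O", format => 'smiles');
my $patt = Chemistry::Pattern->parse("C(=O)Cl", format => 'smiles');

my $n_matches = 0;

# Steps 2 and 4: call match() repeatedly until it returns false
while ($patt->match($mol)) {
    $n_matches++;

    # Step 3: extract the maps for the current match
    my @atom_ids = map { $_->id } $patt->atom_map;  # in pattern-atom order
    my @bond_ids = map { $_->id } $patt->bond_map;  # in pattern-bond order

    # atom_map(1) plays the role of $1: the molecule atom matched by the
    # first pattern atom (taken in list context to be safe)
    my ($first) = $patt->atom_map(1);

    print "Match $n_matches: atoms (@atom_ids), bonds (@bond_ids), ",
          "first atom ", $first->id, "\n";
}
print "Total matches: $n_matches\n";
```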

METHODS

Chemistry::Pattern->new(name => value, ...)

Create a new empty pattern. This is just like the Chemistry::Mol constructor, with one additional option: "options", which expects a hash reference (the options themselves are described under the options() method).

$pattern->options(option => value,...)

Available options:

overlap

If true, matches may overlap. For example, the CC pattern could match twice on propane if this option is true, but only once if it is false. This option is true by default.

permute

Sometimes there is more than one way of matching the same set of pattern atoms on the same set of molecule atoms. If true, return these "redundant" matches. For example, the CC pattern could match ethane with two different permutations (forwards and backwards). This option is false by default.
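
The effect of these two options can be explored with a small counting script. The sketch below uses a hypothetical helper, count_matches (not part of the module), and always passes both options explicitly so that it does not depend on how options() treats options that are left out. According to the descriptions above, CC should match propane twice with overlap on and once with it off, and should match ethane once without permute and twice with it.

```perl
use strict;
use warnings;
use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

# Hypothetical helper (not part of Chemistry::Pattern): count how many
# times a SMILES pattern matches a SMILES molecule under the given options.
sub count_matches {
    my ($patt_str, $mol_str, %options) = @_;
    my $mol  = Chemistry::Mol->parse($mol_str, format => 'smiles');
    my $patt = Chemistry::Pattern->parse($patt_str, format => 'smiles');
    $patt->options(%options);

    my $count = 0;
    $count++ while $patt->match($mol);
    return $count;
}

# CC on propane, with and without overlapping matches
my $propane_overlap    = count_matches("CC", "CCC", overlap => 1, permute => 0);
my $propane_no_overlap = count_matches("CC", "CCC", overlap => 0, permute => 0);

# CC on ethane, without and with redundant permutations
my $ethane_no_permute = count_matches("CC", "CC", overlap => 1, permute => 0);
my $ethane_permute    = count_matches("CC", "CC", overlap => 1, permute => 1);

print "propane, overlap on:  $propane_overlap\n";
print "propane, overlap off: $propane_no_overlap\n";
print "ethane, permute off:  $ethane_no_permute\n";
print "ethane, permute on:   $ethane_permute\n";
```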

$patt->reset

Reset the state of the pattern matching object, so that it begins the next match from scratch instead of where it left off after the last one.
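
As a rough illustration, the following sketch matches CC against propane, resets the pattern, and matches again. Under the behavior described above, the second call should start over and report the same atoms as the first one instead of moving on to the next match. The molecule and pattern strings are chosen only for this example.

```perl
use strict;
use warnings;
use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

my $mol  = Chemistry::Mol->parse("CCC", format => 'smiles');
my $patt = Chemistry::Pattern->parse("CC", format => 'smiles');

# First match: remember which molecule atoms were matched
$patt->match($mol) or die "pattern did not match";
my @first = map { $_->id } $patt->atom_map;

# Discard the matching state, then match again from scratch
$patt->reset;
$patt->match($mol) or die "pattern did not match after reset";
my @again = map { $_->id } $patt->atom_map;

print "first match: (@first)\n";
print "after reset: (@again)\n";
```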

$pattern->atom_map

Returns the list of atoms that matched the last time $pattern->match was called.

$pattern->bond_map

Returns the list of bonds that matched the last time $pattern->match was called.

$pattern->match($mol, %options)

Returns true if the pattern matches the molecule. If called again for the same molecule, continues matching where it left off (in a way similar to global regular expressions in scalar context). When there are no matches left, returns false. To force the match to always start from scratch instead of continuing where it left off, the reset option may be used.

    $pattern->match($mol, atom => $atom)

If atom => $atom is given as an option, match will only look for matches that start at $atom (which should be an atom in $mol, of course). This is somewhat analogous to anchored regular expressions.

To find out which atoms and bonds matched, use the atom_map and bond_map methods.
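
An anchored match can be tried on each atom of the SYNOPSIS molecule in turn, as in the sketch below. It assumes that "start at $atom" means the first atom of the pattern must map to $atom, and it parses a fresh pattern for every attempt so that no matching state carries over between anchors; both points are assumptions of this example rather than documented requirements.

```perl
use strict;
use warnings;
use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

my $mol = Chemistry::Mol->parse("C1CCCC1C(Cl)=O", format => 'smiles');

my @anchors;   # molecule atoms at which the anchored match succeeds

for my $atom ($mol->atoms) {
    # A fresh pattern per anchor keeps the attempts independent
    my $patt = Chemistry::Pattern->parse("C(=O)Cl", format => 'smiles');

    # Only look for matches that start at this particular atom
    if ($patt->match($mol, atom => $atom)) {
        my @ids = map { $_->id } $patt->atom_map;
        print "Anchored at ", $atom->id, ": matched (@ids)\n";
        push @anchors, $atom;
    }
}

print "Anchors that matched: ", scalar(@anchors), "\n";
```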

VERSION

0.27

SEE ALSO

Chemistry::Pattern::Atom, Chemistry::Pattern::Bond, Chemistry::Mol, Chemistry::File, Chemistry::File::SMARTS.

The PerlMol website http://www.perlmol.org/

AUTHOR

Ivan Tubert-Brohman <itub@cpan.org>

COPYRIGHT

Copyright (c) 2009 Ivan Tubert-Brohman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.