Chemistry::MidasPattern - Select atoms in macromolecules
    use Chemistry::MidasPattern;
    use Chemistry::File::PDB;

    # read a molecule
    my $mol = Chemistry::MacroMol->read("test.pdb");

    # define a pattern matching carbons alpha and beta
    # in all valine residues
    my $str  = ':VAL@CA,CB';
    my $patt = Chemistry::MidasPattern->new($str);

    # apply the pattern to the molecule
    $patt->match($mol);

    # extract the results
    for my $atom ($patt->atom_map) {
        printf "%s\t%s\n", $atom->attr("pdb/residue_name"), $atom->name;
    }
    printf "FOUND %d atoms\n", scalar($patt->atom_map);
This module partially implements a pattern matching engine for selecting atoms in macromolecules by using Midas/Chimera patterns. See http://www.cmpharm.ucsf.edu/~troyer/troff2html/midas/Midas-uh-3.html#sh-2.1 for a detailed description of this language.
This module shares the same interface as Chemistry::Pattern; to perform a pattern matching operation on a molecule, follow these steps.
1) Create a pattern object, by parsing a string. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.
2) Execute the pattern on the molecule by calling $patt->match($mol).
3) If $patt->match() returns true, extract the "map" that relates the pattern to the molecule by calling $patt->atom_map. This method returns a list of the atoms in the molecule that are matched by the pattern. Thus $patt->atom_map(1) would be analogous to the $1 special variable used for regular expression matching. The difference between Chemistry::Pattern and Perl regular expressions is that atoms are always captured, and that each atom always uses one "slot".
The current implementation does not have the concept of a model, only of residues and atoms.
What follows is not exactly a formal grammar specification, but it should give a general idea:
SELECTOR = ((:RESIDUE)*(@ATOM)*)*
The star here means "zero or more", and the parentheses are used to delimit the effect of the star. The : and @ are used verbatim.
RESIDUE can be a name (e.g., LYS), a sequence number (e.g., 108), a range (e.g., 1-10), or a comma-separated list of RESIDUEs (e.g. 1-10,6,LYS).
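As a rough illustration of the RESIDUE forms above, the following sketch tests whether a residue (given by name and sequence number) satisfies a comma-separated specification. The function name residue_matches is hypothetical and this is not the module's own parser; wildcards and open-ended ranges are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::MidasPattern.
# Returns true if a residue with the given name and sequence number
# satisfies a comma-separated RESIDUE spec such as "1-10,6,LYS".
sub residue_matches {
    my ($spec, $name, $number) = @_;
    for my $item (split /,/, $spec) {
        if ($item =~ /^(\d+)-(\d+)$/) {        # range, e.g. 1-10
            return 1 if $number >= $1 && $number <= $2;
        }
        elsif ($item =~ /^\d+$/) {             # sequence number, e.g. 108
            return 1 if $number == $item;
        }
        else {                                 # residue name, e.g. LYS
            return 1 if $name eq $item;
        }
    }
    return 0;
}

print residue_matches("1-10,6,LYS", "LYS", 108) ? "match\n" : "no match\n";
print residue_matches("1-10,6,LYS", "VAL", 42)  ? "match\n" : "no match\n";
```

A list matches as soon as any one of its items matches, which is why the items are tried in turn and the function returns early.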
ATOM is an atom name, a serial number (this is a non-standard extension) or a comma-separated list of ATOMs.
Names can have wildcards: * matches the whole name; ? matches one character; and = matches zero or more characters. An @ATOM specification is associated with the closest preceding residue specification.
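One way to picture the wildcard rules is as a translation into a Perl regular expression. The sketch below assumes exactly the three wildcards described above; name_to_regex is a hypothetical helper, not a function exported by this module, and the module may implement matching differently.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a Midas-style name pattern into a Perl regex.
#   *  matches any whole name
#   ?  matches exactly one character
#   =  matches zero or more characters
sub name_to_regex {
    my ($pattern) = @_;
    return qr/\A.*\z/ if $pattern eq '*';
    my $re = '';
    for my $ch (split //, $pattern) {
        if    ($ch eq '?') { $re .= '.' }
        elsif ($ch eq '=') { $re .= '.*' }
        else               { $re .= quotemeta $ch }   # literal character
    }
    return qr/\A$re\z/;
}

# Compare two patterns against a few atom names.
for my $name (qw(C CA CB CG1 N)) {
    printf "%-4s C=: %-3s C?: %s\n", $name,
        ($name =~ name_to_regex('C=') ? 'yes' : 'no'),
        ($name =~ name_to_regex('C?') ? 'yes' : 'no');
}
```

Anchoring the regex at both ends keeps C? from matching three-letter names such as CG1.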
DISTANCE_SELECTOR = SELECTOR za< DISTANCE
Atoms within a certain distance of those that are matched by a selector can be selected by using the za< operator, where DISTANCE is a number in Angstroms.
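The idea of a distance zone can be sketched with plain data structures. The example below assumes each atom is a hash with a name and x, y, z coordinates in Angstroms, and that the reference atoms themselves count as being inside the zone; both are assumptions made for illustration, and within_zone is a hypothetical function rather than part of the module.

```perl
use strict;
use warnings;

# Euclidean distance between two atoms stored as hashes (assumed layout).
sub distance {
    my ($p, $q) = @_;
    return sqrt( ($p->{x} - $q->{x})**2
               + ($p->{y} - $q->{y})**2
               + ($p->{z} - $q->{z})**2 );
}

# Return every atom in $all that lies within $cutoff of at least one atom
# in $selected (a za<-style zone, computed by brute force).
sub within_zone {
    my ($all, $selected, $cutoff) = @_;
    my @hits;
    ATOM: for my $atom (@$all) {
        for my $ref (@$selected) {
            if (distance($atom, $ref) < $cutoff) {
                push @hits, $atom;
                next ATOM;          # one close reference atom is enough
            }
        }
    }
    return @hits;
}

my @atoms = (
    { name => 'N',  x => 0.0,  y => 0.0, z => 0.0 },
    { name => 'CA', x => 1.5,  y => 0.0, z => 0.0 },
    { name => 'O',  x => 12.0, y => 0.0, z => 0.0 },
);
my @near = within_zone(\@atoms, [ $atoms[0] ], 5.0);
print join(',', map { $_->{name} } @near), "\n";
```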
EXPR = ( SELECTOR | DISTANCE_SELECTOR ) (& (SELECTOR | DISTANCE_SELECTOR))*
The result of two or more selectors can be intersected using the & operator.
    :ARG                    All arginine atoms
    :ARG@*                  All arginine atoms
    @CA                     All alpha carbons
    :*@CA                   All alpha carbons
    :ARG@CA                 Arginine alpha carbons
    :VAL@C=                 Valine carbons
    :VAL@C?                 Valine carbons with two-letter names
    :ARG,VAL@CA             Arginine and valine alpha carbons
    :ARG:VAL@CA             All arginine atoms and valine alpha carbons
    :ARG@CA,CB              Arginine alpha and beta carbons
    :ARG@CA@CB              Arginine alpha and beta carbons
    :1-10                   Atoms in residues 1 to 10
    :48-*                   Atoms in residues 48 to the last one
    :30-40@CA & :ARG        Alpha carbons in residues 30-40 which are also arginines
    @123                    Atom 123
    @123 za<5.0             Atoms within 5.0 Angstroms of atom 123
    @123 za>30.0            Atoms not within 30.0 Angstroms of atom 123
    @CA & @123 za<5.0       Alpha carbons within 5.0 Angstroms of atom 123
If a feature does not appear in any of the examples, it is probably not implemented. Unimplemented features include the zr< zone specifier, atom properties, and Chimera extensions such as chains.
The zone specifiers (selection by distance) currently use a brute-force N^2 algorithm. You can optimize an & expression by putting the most restrictive selector first; for example:
    :1-20 za<10.0 & :38     atoms in residue 38 within 10 A of atoms
                            in residues 1-20 (slow)

    :38 & :1-20 za<10.0     atoms in residue 38 within 10 A of atoms
                            in residues 1-20 (not so slow)
In the first case, the N^2 search measures the distance between every atom in the molecule and every atom in residues 1-20, and then intersects the results with the atom list of residue 38; the second case only measures the distance between every atom in residue 38 and every atom in residues 1-20. The second way is much faster for large systems.
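To get a feel for the difference, the following back-of-the-envelope sketch counts the pairwise distance evaluations a brute-force search would need, as an upper bound, for each ordering. The atom counts are made-up illustrative numbers, not measurements, and pair_count is a hypothetical helper.

```perl
use strict;
use warnings;

# Upper bound on distance evaluations for a brute-force zone search:
# every candidate atom is compared against every reference atom.
sub pair_count {
    my ($candidates, $references) = @_;
    return $candidates * $references;
}

my $total_atoms = 5000;   # atoms in the whole molecule (assumed)
my $ref_atoms   = 160;    # atoms in residues 1-20 (assumed)
my $res38_atoms = 8;      # atoms in residue 38 (assumed)

my $slow = pair_count($total_atoms, $ref_atoms);   # zone first, then & :38
my $fast = pair_count($res38_atoms, $ref_atoms);   # :38 first, then the zone

printf "zone first: %d distance evaluations\n", $slow;
printf ":38 first:  %d distance evaluations\n", $fast;
```

With these assumed sizes the first ordering needs up to 800000 evaluations and the second at most 1280, and the gap grows with the size of the molecule.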
Some day, a future version may implement a smarter algorithm...
0.10
Chemistry::File::MidasPattern, Chemistry::Pattern
The PerlMol website http://www.perlmol.org/
Ivan Tubert <itub@cpan.org>
Copyright (c) 2004 Ivan Tubert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.