20 Oct 2005 21:04:30 UTC
- Distribution: Chemistry-File-SMILES
- Module version: 0.45
- Dependencies
- Chemistry::Bond::Find
- Chemistry::Canonicalize
- Chemistry::Mol
- Chemistry::Ring
- List::Util
- and possibly others
NAME
Chemistry::File::SMILES - SMILES linear notation parser/writer
SYNOPSIS
    #!/usr/bin/perl
    use Chemistry::File::SMILES;

    # parse a SMILES string
    my $s = 'C1CC1(=O)[O-]';
    my $mol = Chemistry::Mol->parse($s, format => 'smiles');

    # print a SMILES string
    print $mol->print(format => 'smiles');

    # print a unique (canonical) SMILES string
    print $mol->print(format => 'smiles', unique => 1);

    # parse a SMILES file
    my @mols = Chemistry::Mol->read("file.smi", format => 'smiles');

    # write a multiline SMILES file
    Chemistry::Mol->write("file.smi", mols => \@mols);
DESCRIPTION
This module parses and writes SMILES (Simplified Molecular Input Line Entry System) strings. It is a file I/O driver for the PerlMol project (http://www.perlmol.org/) and registers the 'smiles' format with Chemistry::Mol.
This parser interprets anything after whitespace as the molecule's name; for example, when the following SMILES string is parsed, $mol->name will be set to "Methyl chloride":
CCl Methyl chloride
The name is not included by default on output. However, if the name option is defined, the name will be included after the SMILES string, separated by a tab.
    print $mol->print(format => 'smiles', name => 1);
Multiline SMILES and SMILES files
A file or string can contain multiple molecules, one per line.
CCl Methyl chloride
CO Methanol
Files with the extension '.smi' are assumed to have this format.
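The one-molecule-per-line layout, with the name following the first run of whitespace, can be illustrated with a small pure-Perl sketch. It is not the module's own parser: split_smiles_lines is a hypothetical helper written for this example, and it only separates each line into a SMILES token and a name without interpreting the SMILES itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chemistry::File::SMILES): split a
# multiline SMILES string into [smiles, name] pairs.
sub split_smiles_lines {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;    # skip blank lines
        # The first whitespace-delimited token is the SMILES string;
        # everything after that whitespace is taken as the name.
        my ($smiles, $name) = $line =~ /^\s*(\S+)(?:\s+(.*?))?\s*$/;
        push @records, [ $smiles, $name ];
    }
    return @records;
}

my @records = split_smiles_lines("CCl Methyl chloride\nCO Methanol\n");
for my $rec (@records) {
    my ($smiles, $name) = @$rec;
    my $label = (defined $name && length $name) ? $name : '(no name)';
    print "$smiles => $label\n";
}
# prints:
#   CCl => Methyl chloride
#   CO => Methanol
```

To get actual molecule objects, pass the same text to Chemistry::Mol->parse or Chemistry::Mol->read with format => 'smiles', as in the SYNOPSIS.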
Atom Mapping Numbers
As an extension for reaction processing, SMILES strings may have atom mapping numbers, which are introduced after a colon in a bracketed atom, for example [C:1]. The mapping number need not be unique. This module reads the mapping numbers and stores them as the name of the atom ($atom->name).
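As a rough illustration of the notation only, the following pure-Perl sketch pulls the mapping numbers out of bracketed atoms with a regular expression. atom_map_numbers is a hypothetical helper for this example and is not how the module parses SMILES. It returns the raw bracket contents before the colon without splitting them into isotope, symbol, hydrogen count, and charge.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chemistry::File::SMILES): return a list of
# [bracket contents, map number] pairs for bracketed atoms of the form [X:n].
sub atom_map_numbers {
    my ($smiles) = @_;
    my @maps;
    # Match "[", anything except "]" or ":", then ":" and digits, then "]".
    while ($smiles =~ /\[([^\]:]+):(\d+)\]/g) {
        push @maps, [ $1, $2 ];
    }
    return @maps;
}

for my $pair (atom_map_numbers('[CH3:1][CH2:2][OH:3]')) {
    print "$pair->[0] has map number $pair->[1]\n";
}
# prints:
#   CH3 has map number 1
#   CH2 has map number 2
#   OH has map number 3
```

Atoms written without brackets, or bracketed atoms without a colon, are simply skipped by this sketch.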
On output, atom names are not included by default. See the number and auto_number options below for ways of including them.
OPTIONS
The following options are supported in addition to the options mentioned for Chemistry::File, such as mol_class, format, and fatal.
- aromatic
  On output, detect aromatic atoms and bonds by means of the Chemistry::Ring module, and represent the organic aromatic atoms with lowercase symbols.
- unique
  When used on output, canonicalize the structure if it hasn't been canonicalized already and generate a unique SMILES string. This option implies "aromatic".
- number
  For atoms that have a defined name, print the name as the "atom number". For example, if an ethanol molecule has the name "42" for the oxygen atom and the other atoms have undefined names, the output would be:
  CC[OH:42]
- auto_number
  When used on output, number all the atoms explicitly and sequentially. The output for ethanol would look something like this:
  [CH3:1][CH2:2][OH:3]
- name
  Include the molecule name on output, as described in the previous section.
- kekulize
  When used on input, assign single or double bond orders to "aromatic" or otherwise unspecified bonds (i.e., generate the Kekule structure). If false, the bond orders will remain single. This option is true by default. This uses assign_bond_orders from the Chemistry::Bond::Find module.
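The options above might be combined as in the following sketch. It assumes that input options such as kekulize are passed to parse in the same way that output options are passed to print. smiles_variants is a hypothetical wrapper written for this example. The module is loaded at run time so that the script still runs, and reports the problem, when PerlMol is not installed. No expected output is shown because the exact strings depend on the installed versions.

```perl
use strict;
use warnings;

# Hypothetical wrapper: return a hash of output variants for one SMILES
# string, or an empty list if Chemistry::File::SMILES cannot be loaded.
sub smiles_variants {
    my ($smiles) = @_;
    return unless eval { require Chemistry::File::SMILES; 1 };

    # kekulize is an input option and is true by default; it is spelled out
    # here only to show where input options go (an assumption, see above).
    my $mol = Chemistry::Mol->parse($smiles, format => 'smiles', kekulize => 1);

    return (
        plain       => $mol->print(format => 'smiles'),
        aromatic    => $mol->print(format => 'smiles', aromatic    => 1),
        unique      => $mol->print(format => 'smiles', unique      => 1),
        auto_number => $mol->print(format => 'smiles', auto_number => 1),
        with_name   => $mol->print(format => 'smiles', name        => 1),
    );
}

my %out = smiles_variants('c1ccccc1 benzene');
if (%out) {
    print "$_: $out{$_}\n" for sort keys %out;
}
else {
    print "Chemistry::File::SMILES is not installed\n";
}
```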
CAVEATS
Stereochemistry is not supported! Stereochemical descriptors such as @, @@, /, and \ will be silently ignored on input, and will certainly not be produced on output.
Branches that start before an atom, such as (OC)C, are not supported on input. According to some variants of the SMILES specification, this form should be equivalent to C(OC) and COC, but many other tools don't implement this rule either.
The kekulize option works by increasing the bond orders of atoms that don't have their usual valences satisfied. This may cause problems if you have atoms with explicitly low hydrogen counts.
VERSION
0.45
SEE ALSO
Chemistry::Mol, Chemistry::File
The SMILES Home Page at http://www.daylight.com/dayhtml/smiles/
The Daylight Theory Manual at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
The PerlMol website http://www.perlmol.org/
AUTHOR
Ivan Tubert-Brohman <itub@cpan.org>
COPYRIGHT
Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.