Introduction to SMILES & SMARTS
-
7/31/2019 Introduction to SMILES & SMARTS
1/21
08/29/12 1
An Introduction to Line Notations
SMILES and SMARTS
By Bhushan Bapat
-
CONTENTS
Line notations to represent molecular structures
SMILES
  Specification rules
  Isomeric SMILES
  General conventions
  Reaction SMILES
SMARTS
  Specifications
  Recursive SMARTS
SMILES vs SMARTS
Acknowledgements
-
Line Notations
International Chemical Identifier (InChI)
ROSDAL
Wiswesser Line Notation (WLN)
Simplified Molecular-Input Line-Entry System (SMILES)
SMILES Arbitrary Target Specification (SMARTS)
SYBYL Line Notation (SLN)
-
SMILES
Simplified Molecular-Input Line-Entry System
Describes the structure of molecules using short ASCII strings
The strings can be converted into two-dimensional drawings or three-dimensional models of the molecules.
Developed by Arthur Weininger and David Weininger in the late 1980s
Modified and extended by Daylight Chemical Information Systems, Inc.
In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community.
-
Rules for Encoding
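Five rules cover atoms, bonds, branches, ring closures and disconnected structures.
Rule for specifying atoms
Atoms are denoted by their atomic symbols in square brackets []
[Se], [Au]
Atoms of the organic subset (B, C, N, O, P, S, F, Cl, Br and I) may be written without brackets when their number of attached hydrogens matches the lowest normal valence consistent with the explicit bonds; those hydrogens are then implied
C methane, O water
Within brackets, attached hydrogens and charges must be written explicitly
[H+] proton, [OH-] hydroxide ion, [Fe+2] iron(II) cation
Aliphatic atoms are written with upper-case symbols, aromatic atoms with lower-case symbols (c, n, o, ...)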
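To make the atom rules concrete, here is a minimal Python sketch that picks out the atom tokens of a SMILES string and labels each one as a bracket atom, an unbracketed aliphatic organic-subset atom, or an aromatic one. The regular expression and the name `classify_atoms` are illustrative assumptions, not part of any SMILES toolkit; the sketch assumes valid input and only knows the organic subset listed above.

```python
import re

# Bracket atoms first, then the two-letter organic-subset symbols ("Br", "Cl") so they
# are not split, then one-letter aliphatic (upper-case) and aromatic (lower-case) symbols.
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def classify_atoms(smiles: str) -> list[tuple[str, str]]:
    """Return (token, kind) pairs; kind is 'bracket', 'aliphatic' or 'aromatic'.

    Bond symbols, ring digits, parentheses and '.' are skipped by findall.
    """
    pairs = []
    for token in ATOM_RE.findall(smiles):
        if token.startswith("["):
            kind = "bracket"      # element, H count and charge are spelled out
        elif token[0].islower():
            kind = "aromatic"
        else:
            kind = "aliphatic"
        pairs.append((token, kind))
    return pairs
```

For example, `classify_atoms("[Na+].[O-]c1ccccc1")` returns the two bracket ions followed by six aromatic carbons. Trying "Br" and "Cl" before the one-letter symbols is what keeps "Cl" from being read as C followed by an unknown "l".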
-
Rules for specifying bonds
Single bonds are written as - or omitted
CC ethane (CH3CH3)
Double bonds are written as =
C=C ethene (CH2=CH2)
Triple bonds are written as #
C#N hydrogen cyanide (HCN)
Aromatic bonds are written as : or omitted
c1ccccc1 benzene
For linear structures, SMILES is a simple diagrammatic notation in which hydrogens and single bonds are omitted
C=CCC=CC 1,4-hexadiene
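Because hydrogens on organic-subset atoms are implied, the molecular formula of a simple chain can be recovered from the bond orders alone. The sketch below assumes an unbranched, acyclic chain of aliphatic organic-subset atoms and fills each atom up to the lowest normal valence consistent with its explicit bonds; `linear_formula` and the valence table are illustrative assumptions, not a complete implementation.

```python
import re
from collections import Counter

# Usual normal valences of the organic subset (simplified table).
VALENCES = {"B": [3], "C": [4], "N": [3, 5], "O": [2], "P": [3, 5],
            "S": [2, 4, 6], "F": [1], "Cl": [1], "Br": [1], "I": [1]}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]")


def linear_formula(smiles: str) -> dict[str, int]:
    """Element counts for an unbranched, acyclic SMILES of aliphatic organic-subset atoms."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("only linear chains of aliphatic organic-subset atoms are handled")
    atoms: list[str] = []
    bond_sums: list[int] = []   # sum of explicit bond orders per atom
    order = 1                   # an omitted bond symbol means a single bond
    for tok in tokens:
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
            continue
        if atoms:               # bond the new atom to the previous one
            bond_sums[-1] += order
            bond_sums.append(order)
        else:
            bond_sums.append(0)
        atoms.append(tok)
        order = 1
    counts = Counter(atoms)
    for sym, used in zip(atoms, bond_sums):
        # implicit H fill up to the lowest normal valence that is >= the bonds used
        target = next((v for v in VALENCES[sym] if v >= used), used)
        counts["H"] += target - used
    return dict(counts)
```

For instance, `linear_formula("C=CCC=CC")` gives six carbons and ten hydrogens (C6H10), matching 1,4-hexadiene.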
Rules for specifying branches
Branches are enclosed in parentheses, written immediately after the atom they are attached to
CCN(CC)CC triethylamine
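Branches can be read with a stack: an opening parenthesis remembers the current atom, and the closing parenthesis returns to it so the main chain continues from the branch point. A minimal sketch, assuming acyclic input (no ring digits) and ignoring bond orders; `branch_bonds` is a hypothetical helper.

```python
import re

ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def branch_bonds(smiles: str) -> list[tuple[int, int]]:
    """Bonds as (atom index, atom index) pairs for an acyclic SMILES with branches."""
    bonds: list[tuple[int, int]] = []
    stack: list = []      # branch points waiting for their ')'
    prev = None           # atom the next atom will bond to
    n_atoms = 0
    pos = 0
    while pos < len(smiles):
        ch = smiles[pos]
        if ch == "(":
            stack.append(prev)      # the branch starts at the current atom
            pos += 1
        elif ch == ")":
            prev = stack.pop()      # resume the main chain at the branch point
            pos += 1
        elif ch in "-=#:":
            pos += 1                # bond orders are ignored in this sketch
        else:
            m = ATOM_RE.match(smiles, pos)
            if m is None:
                raise ValueError(f"unsupported character {ch!r} at position {pos}")
            if prev is not None:
                bonds.append((prev, n_atoms))
            prev = n_atoms
            n_atoms += 1
            pos = m.end()
    return bonds
```

For `CCN(CC)CC` it returns `[(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)]`, so the nitrogen (atom 2) has three carbon neighbors, as expected for triethylamine.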
-
Rules for specifying cyclic structures
One bond in each ring is broken; the two atoms of each broken (ring-closure) bond are both followed by the same digit
An atom may carry more than one ring-closure digit, as in cubane, where five ring bonds are broken
The SMILES for cubane is C12C3C4C1C5C4C3C25
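Ring closures can be handled by remembering which atom opened each digit and adding a bond when the same digit appears again; the digit is then free to be reused. A minimal sketch, assuming an unbranched string with single-digit ring numbers and no bond symbols (`ring_bonds` is a hypothetical name):

```python
import re

ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def ring_bonds(smiles: str) -> list[tuple[int, int]]:
    """Bonds for an unbranched SMILES with single-digit ring-closure numbers."""
    bonds: list[tuple[int, int]] = []
    open_rings: dict[str, int] = {}   # ring digit -> atom that opened it
    prev = None
    n_atoms = 0
    pos = 0
    while pos < len(smiles):
        ch = smiles[pos]
        if ch.isdigit():
            if ch in open_rings:
                # second occurrence closes the ring; the digit can be reused later
                bonds.append((open_rings.pop(ch), prev))
            else:
                open_rings[ch] = prev   # first occurrence opens a ring bond
            pos += 1
            continue
        m = ATOM_RE.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character {ch!r} at position {pos}")
        if prev is not None:
            bonds.append((prev, n_atoms))   # ordinary chain bond
        prev = n_atoms
        n_atoms += 1
        pos = m.end()
    if open_rings:
        raise ValueError(f"unclosed ring digits: {sorted(open_rings)}")
    return bonds
```

Applied to the cubane string, it yields 7 chain bonds plus 5 ring-closure bonds, 12 in total, and every carbon ends up with exactly three neighbors.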
-
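Rules for specifying disconnected structures
Disconnected components are written as individual structures separated by a period (.)
Example: sodium phenoxide, [Na+].[O-]c1ccccc1
Atoms separated by a period are not bonded to each other
However, ring-closure digits still pair up across a period:
C1.C1 means CC, i.e. ethane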
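The effect of the period, and of a ring-closure digit that spans it, can be checked by counting connected components with a union-find structure. A minimal sketch, assuming unbranched input (no parentheses); `count_components` is a hypothetical helper.

```python
import re

ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_components(smiles: str) -> int:
    """Connected components of an unbranched SMILES that may contain '.' and ring digits."""
    parent: list[int] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    open_rings: dict[str, int] = {}
    prev = None
    pos = 0
    while pos < len(smiles):
        ch = smiles[pos]
        if ch == ".":
            prev = None                     # no bond to the next atom
            pos += 1
        elif ch.isdigit():
            if ch in open_rings:
                union(open_rings.pop(ch), prev)   # ring closure bonds even across '.'
            else:
                open_rings[ch] = prev
            pos += 1
        elif ch in "-=#:":
            pos += 1
        else:
            m = ATOM_RE.match(smiles, pos)
            if m is None:
                raise ValueError(f"unsupported character {ch!r} at position {pos}")
            parent.append(len(parent))      # new atom is its own set
            if prev is not None:
                union(prev, len(parent) - 1)
            prev = len(parent) - 1
            pos = m.end()
    return len({find(i) for i in range(len(parent))})
```

`count_components("[Na+].[O-]c1ccccc1")` is 2, while `count_components("C1.C1")` is 1 because the shared digit bonds the two carbons.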
-
Isomeric SMILES
Used to specify isotopes, configuration around double bonds, and chirality
Isotopic specification: the atomic mass is written before the atomic symbol, inside brackets
[12C] carbon-12
[13C] carbon-13
[13CH4] carbon-13 methane
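A bracket atom packs the isotope, element, chirality, hydrogen count and charge into one token. The sketch below, a hypothetical `parse_bracket_atom` that covers only these common fields (no atom-class labels, no extended chirality classes), pulls them apart; note that inside brackets a missing H means zero hydrogens.

```python
import re

BRACKET_RE = re.compile(
    r"\[(?P<isotope>\d+)?"                      # optional atomic mass, e.g. 13
    r"(?P<symbol>[A-Z][a-z]?|se|as|[bcnops])"   # element (lower case = aromatic)
    r"(?P<chiral>@@?)?"                         # optional @ or @@
    r"(?P<hcount>H\d*)?"                        # optional attached hydrogens
    r"(?P<charge>\+\+|--|[+-]\d*)?\]"           # optional charge
)


def parse_bracket_atom(text: str) -> dict:
    m = BRACKET_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {text!r}")
    iso, sym, chiral, h, charge = m.group("isotope", "symbol", "chiral", "hcount", "charge")
    if charge is None:
        q = 0
    elif charge in ("++", "--"):
        q = 2 if charge == "++" else -2
    else:
        magnitude = int(charge[1:]) if len(charge) > 1 else 1
        q = magnitude if charge[0] == "+" else -magnitude
    return {
        "isotope": int(iso) if iso else None,     # None: mass not specified
        "symbol": sym,
        "chirality": chiral,                       # None, "@" or "@@"
        "h_count": (int(h[1:]) if len(h) > 1 else 1) if h else 0,  # absent H means none
        "charge": q,
    }
```

For example, `parse_bracket_atom("[13CH4]")` gives isotope 13, symbol C, four hydrogens and charge 0.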
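Configuration around double bonds: denoted by the directional bonds / and \
F/C=C/F or F\C=C\F: trans-1,2-difluoroethene (F atoms on opposite sides)
F/C=C\F or F\C=C/F: cis-1,2-difluoroethene (F atoms on the same side)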
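For the simple four-atom layout used above, with one substituent written before the first carbon and one after the second, the geometry depends only on whether the two marks agree. A minimal sketch, assuming exactly that layout; `double_bond_geometry` is a hypothetical name, and general parsers must handle arbitrary neighbors and branches.

```python
import re

# Substituent, bond mark, C=C, bond mark, substituent, e.g. F/C=C\F.
PATTERN = re.compile(r"([A-Z][a-z]?)([/\\])C=C([/\\])([A-Z][a-z]?)")


def double_bond_geometry(smiles: str) -> str:
    """Relative position of the two written substituents in a string like F/C=C/F."""
    m = PATTERN.fullmatch(smiles)
    if m is None:
        raise ValueError("expected exactly X/C=C/Y with / or \\ on each side")
    before, after = m.group(2), m.group(3)
    # In this layout, the same mark on both sides puts the substituents on
    # opposite sides of the double bond; different marks put them on the same side.
    return "trans" if before == after else "cis"
```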
-
Chiral specification
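Configuration around tetrahedral centers
The tetrahedral center, a C atom with four different substituents attached, is the most common chiral structure. It is indicated by @ or @@ written after the atom symbol inside brackets:
@ : looking from the first neighbor toward the chiral atom, the remaining neighbors are listed anticlockwise
N[C@](C)(F)C(=O)O
@@ : looking from the first neighbor toward the chiral atom, the remaining neighbors are listed clockwise
N[C@@](F)(C)C(=O)O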
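Swapping any two neighbors in the written order inverts the meaning of @ versus @@, so two strings describe the same center when the permutation between their neighbor orders is even and the tags agree, or odd and the tags differ. A minimal sketch of that parity check; `same_tetrahedral_center` and the neighbor labels are illustrative assumptions.

```python
def same_tetrahedral_center(order1: list[str], tag1: str,
                            order2: list[str], tag2: str) -> bool:
    """True if two neighbor orderings with @/@@ tags describe the same configuration.

    Each ordering lists the four neighbors of the chiral atom in the order
    they appear in the SMILES string (preceding atom first).
    """
    if sorted(order1) != sorted(order2) or len(set(order1)) != 4:
        raise ValueError("orderings must contain the same four distinct neighbors")
    perm = [order1.index(x) for x in order2]
    inversions = sum(1 for i in range(4) for j in range(i + 1, 4) if perm[i] > perm[j])
    if inversions % 2 == 0:
        return tag1 == tag2   # even permutation: the tag keeps its meaning
    return tag1 != tag2       # odd permutation: the tag must flip
```

For the two examples above, the neighbor orders are N, CH3, F, COOH and N, F, CH3, COOH; they differ by one swap, so @ in the first and @@ in the second describe the same stereocenter.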
-
General conventions in SMILES
Hydrogens
Hydrogens must be written explicitly for:
1) Charged hydrogen, e.g. a proton [H+]
2) Hydrogen bonded to hydrogen, e.g. the hydrogen molecule [H][H]
3) Bridging hydrogen, i.e. H connected to two atoms
4) Isotopic hydrogen, e.g. heavy water [2H]O[2H]
Aromaticity
Aromaticity is identified using Hückel's rule:
1) All ring atoms are sp2-hybridized
2) The number of pi electrons in the ring satisfies the 4n+2 rule
Kekulé form → aromatic form:
C1=COC=C1 → c1cocc1 (furan)
C1=CN=C[NH]C(=O)1 → c1cnc[nH]c(=O)1
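The 4n+2 condition itself is simple arithmetic once each ring atom's pi-electron contribution is known. A minimal sketch; the contribution table and its labels are hypothetical simplifications covering only a few ring-atom types, and the sp2/planarity condition is assumed rather than checked.

```python
def huckel_aromatic(pi_electrons: int) -> bool:
    """Hückel's rule: 4n + 2 pi electrons for some n = 0, 1, 2, ..."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Hypothetical pi-electron contributions for a few ring-atom types.
PI_CONTRIBUTION = {
    "C=": 1,   # carbon taking part in a ring double bond
    "N=": 1,   # pyridine-type nitrogen
    "NH": 2,   # pyrrole-type nitrogen, lone pair in the ring
    "O": 2,    # furan-type oxygen, one lone pair in the ring
}


def ring_pi_electrons(ring_atoms: list[str]) -> int:
    return sum(PI_CONTRIBUTION[a] for a in ring_atoms)


furan = ["C=", "C=", "O", "C=", "C="]   # ring atoms of C1=COC=C1
print(huckel_aromatic(ring_pi_electrons(furan)))  # True
```

Furan's ring (four carbons in double bonds plus the oxygen) gives 4 + 2 = 6 pi electrons, which passes, while a cyclobutadiene-like ring of four such carbons gives 4 and fails.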
-
Aromatic Nitrogen compounds
In all of these, aromatic nitrogen is written with the lower-case symbol n
1) Pyridine
n1ccccc1
2) Pyridine N-oxide
O=n1ccccc1
[O-][n+]1ccccc1
3) 1-Methylpyrrole and 1H-pyrrole
Cn1cccc1
[nH]1cccc1
-
Reaction SMILES
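Reactions are written as
reactants>agents>products
Molecules within each part are separated by periods; the agent part may be left empty (>>)
Examples
C=CCBr>>C=CCI (allyl bromide to allyl iodide)
C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-] (the same reaction with the sodium and iodide ions listed and acetone as the agent)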
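Splitting a reaction SMILES into its three parts only needs the two > separators and the periods inside each part. A minimal sketch; `split_reaction` is a hypothetical helper, and it assumes each period-separated piece is a separate molecule, so it does not merge pieces joined by a shared ring-closure digit as in C1.C1.

```python
def split_reaction(rxn: str) -> dict[str, list[str]]:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction SMILES has exactly two '>' separators")
    names = ("reactants", "agents", "products")
    # '.' separates molecules within a part; an empty part (as in '>>') gives []
    return {name: [m for m in part.split(".") if m] for name, part in zip(names, parts)}
```

For the second example, the reactants are C=CCBr, [Na+] and [I-], the agent is acetone (CC(=O)C), and the products are C=CCI, [Na+] and [Br-]; in the first example the agent list is empty.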
-
SMARTS
SMILES Arbitrary Target Specification
A language for specifying substructure patterns to search for within molecules
An extension of the SMILES rules
Adds logical operators on atoms (nodes) and bonds (edges)
Specifications
Atomic and bond primitives
Logical operators and recursive SMARTS
-
Atomic Primitives
Symbol | Symbol name | Atomic property requirements | Default
* | wildcard | any atom | (no default)
a | aromatic | aromatic | (no default)
A | aliphatic | aliphatic | (no default)
D<n> | degree | <n> explicit connections | exactly one
H<n> | total-H-count | <n> attached hydrogens | exactly one
h<n> | implicit-H-count | <n> implicit hydrogens | at least one
R<n> | ring membership | in <n> SSSR rings | any ring atom
r<n> | ring size | in smallest SSSR ring of size <n> | any ring atom
v<n> | valence | total bond order <n> | exactly one
X<n> | connectivity | <n> total connections | exactly one
-
Symbol | Symbol name | Atomic property requirements | Default
x<n> | ring connectivity | <n> total ring connections | at least one
-<n> | negative charge | -<n> charge | -1 charge (-- is -2, etc.)
+<n> | positive charge | +<n> formal charge | +1 charge (++ is +2, etc.)
#<n> | atomic number | atomic number <n> | (no default)
@ | chirality | anticlockwise | anticlockwise, default class
@@ | chirality | clockwise | clockwise, default class
@<c><n> | chirality | chiral class <c>, chirality <n> | (no default)
@<c><n>? | chiral or unspecified | chirality <c><n> or unspecified | (no default)
<n> | atomic mass | explicit atomic mass | unspecified mass
-
Bond Primitives
Symbol | Bond property requirements
- | single bond (aliphatic)
/ | directional bond "up"
\ | directional bond "down"
/? | directional bond "up or unspecified"
\? | directional bond "down or unspecified"
= | double bond
# | triple bond
: | aromatic bond
~ | any bond (wildcard)
@ | any ring bond
-
Logical operators
Atom and bond primitives are combined with logical operators to form expressions
Examples
[CH2] - aliphatic carbon with two hydrogens (methylene carbon)
[!C;R] - (NOT aliphatic carbon) AND in ring
[X3&H0] - atom with 3 total connections and no hydrogens
[35*] - any atom of mass 35
Symbol | Expression | Meaning
exclamation | !e1 | not e1
ampersand | e1&e2 | e1 and e2 (high precedence)
comma | e1,e2 | e1 or e2
semicolon | e1;e2 | e1 and e2 (low precedence)
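The precedence rules in the table (! binds tightest, then & and implicit AND, then the comma, then ;) can be captured by splitting on ; first, then on the comma, then on &. The sketch below evaluates a bracketed SMARTS atom expression against one atom whose properties are assumed to be precomputed into a plain dict; the dict keys and function names are hypothetical, only a handful of primitives are supported (element symbols, *, a, A, bare R, H<n>, X<n>, D<n>, #<n> and a leading mass), and recursive SMARTS is not handled.

```python
import re

ALIPHATIC = {"C", "N", "O", "S", "P", "F", "Cl", "Br", "I"}
AROMATIC = {"c", "n", "o", "s", "p"}
# One primitive, optionally preceded by '!'. Two-letter symbols are tried first.
PRIMITIVE_RE = re.compile(r"(!?)(Cl|Br|#\d+|[HXD]\d*|\d+|[CNOSPFI]|[cnosp]|[RAa*])")


def _primitive(tok: str, atom: dict) -> bool:
    if tok in ALIPHATIC:
        return atom["symbol"] == tok and not atom["aromatic"]
    if tok in AROMATIC:
        return atom["symbol"] == tok.upper() and atom["aromatic"]
    if tok == "*":
        return True
    if tok == "A":
        return not atom["aromatic"]
    if tok == "a":
        return atom["aromatic"]
    if tok == "R":
        return atom["ring"]                       # bare R: any ring atom
    if tok.startswith("#"):
        return atom["atomic_number"] == int(tok[1:])
    if tok[0] in "HXD":
        n = int(tok[1:]) if len(tok) > 1 else 1   # default: exactly one
        return atom[tok[0]] == n
    return atom["mass"] == int(tok)               # leading digits: atomic mass


def _all_primitives(text: str, atom: dict) -> bool:
    """Adjacent primitives (e.g. 'CH2') are joined by an implicit high-precedence AND."""
    pos = 0
    while pos < len(text):
        m = PRIMITIVE_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported primitive in {text!r}")
        negated, tok = m.group(1) == "!", m.group(2)
        if _primitive(tok, atom) == negated:
            return False
        pos = m.end()
    return True


def atom_matches(expr: str, atom: dict) -> bool:
    """Evaluate a bracketed SMARTS atom expression; precedence: ! > & > , > ;"""
    body = expr.strip("[]")
    return all(                                                  # ';' low-precedence AND
        any(                                                     # ',' OR
            all(_all_primitives(t, atom) for t in alt.split("&"))  # '&' high-precedence AND
            for alt in part.split(",")
        )
        for part in body.split(";")
    )


benzene_c = {"symbol": "C", "aromatic": True, "atomic_number": 6, "mass": None,
             "H": 1, "X": 3, "D": 2, "ring": True}
print(atom_matches("[!C;R]", benzene_c), atom_matches("[X3&H0]", benzene_c))  # True False
```

With the aromatic carbon from benzene, [!C;R] matches (not aliphatic carbon, and in a ring) while [X3&H0] does not, because that carbon carries one hydrogen.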
-
ACKNOWLEDGEMENT
Daylight Chemical Information Systems, Inc.
Wikipedia
Thank You!