Introduction to SMILES & SMARTS

Upload: bhushan-anil-bapat

Post on 04-Apr-2018

  • 7/31/2019 Introduction to SMILES & SMARTS

    08/29/12 1

    An Introduction to Line Notations

    SMILES and SMARTS

    By Bhushan Bapat

    CONTENTS

    Line notations to represent molecular structures

    SMILES

    Specification Rules

    Isomeric SMILES

    General conventions

    Reaction SMILES

    SMARTS

    Specifications

    Recursive SMARTS

    SMILES vs SMARTS

    Acknowledgements

    Line Notations

    International Chemical Identifier (InChI)

    ROSDAL

    Wiswesser Line Notation (WLN)

    Simplified Molecular-Input Line-Entry System (SMILES)

    SMILES Arbitrary Target Specification (SMARTS)

    SYBYL Line Notation (SLN)

    SMILES

    Simplified Molecular-Input Line-Entry System

    Describes the structure of molecules using short ASCII strings

    Can be converted into two-dimensional drawings or three-dimensional models of the molecules

    Developed by Arthur Weininger and David Weininger in the late 1980s

    Modified and extended by Daylight Chemical Information Systems, Inc.

    In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community.

    Rules for Encoding

    Five rules for specifying atoms, bonds, branching, ring closures and disconnections.

    Rule for specifying Atoms

    Atoms are denoted by their atomic symbols in square brackets []

    [Se], [Au]

    B, C, N, O, P, S, F, Cl, Br and I (the organic subset) do not need []; their missing hydrogens are implied

    C methane (CH4), O water (H2O)

    Within brackets, attached hydrogens and charges must be specified

    [H+] proton

    [OH-] hydroxide ion

    [Fe+2] iron(II) cation

    Aliphatic atoms are written with a capital symbol, aromatic atoms with a lower-case symbol
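The bracket-atom rule can be illustrated with a small parser. This is a minimal sketch, assuming only the fields shown on this slide (isotope, symbol, hydrogen count, charge); the names `BRACKET_ATOM` and `parse_bracket_atom` are hypothetical, and chirality marks and atom classes are not handled.

```python
import re

# Bracket atom: optional isotope, element symbol, optional H count,
# optional charge. Chirality and atom classes are left out of this sketch.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]+\d*)?\]"
)


def parse_bracket_atom(text):
    """Split a bracket atom such as '[Fe+2]' into its parts."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")

    # 'H' alone means one hydrogen, 'H4' means four, absent means none.
    hcount = m.group("hcount")
    if hcount is None:
        hydrogens = 0
    elif hcount == "H":
        hydrogens = 1
    else:
        hydrogens = int(hcount[1:])

    # '+' is +1, '++' is +2, '+2' is +2; the same holds for '-'.
    charge_text = m.group("charge") or ""
    if charge_text:
        sign = 1 if charge_text[0] == "+" else -1
        digits = charge_text.lstrip("+-")
        charge = sign * (int(digits) if digits else len(charge_text))
    else:
        charge = 0

    symbol = m.group("symbol")
    isotope = m.group("isotope")
    return {
        "isotope": int(isotope) if isotope else None,
        "symbol": symbol,
        "aromatic": symbol.islower(),  # lower case marks an aromatic atom
        "hydrogens": hydrogens,
        "charge": charge,
    }


for example in ["[H+]", "[OH-]", "[Fe+2]", "[Se]"]:
    print(example, parse_bracket_atom(example))
```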

    Rules for specifying Bonds

    Single bond by - (can be omitted)

    CC ethane (CH3CH3)

    Double bond by =

    C=C ethene (CH2=CH2)

    Triple bond by #

    C#N hydrogen cyanide

    Aromatic bond by : (can be omitted)

    c1ccccc1 benzene (the digit 1 marks the ring closure)

    For linear structures, SMILES is a simple diagrammatic notation with hydrogens and single bonds omitted

    C=CCC=CC 1,4-hexadiene

    Rules for specifying Branches

    Branches are enclosed in parentheses, placed to the right of the atom they attach to

    CCN(CC)CC triethylamine
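The bond and branch rules can be sketched as a stack-based reader: an opening parenthesis remembers the current atom, and a closing parenthesis returns to it. This is an illustrative sketch, not a full SMILES parser; it assumes acyclic input built from organic-subset atoms only, and `parse_acyclic` is a hypothetical name.

```python
import re

# Organic-subset atoms, explicit bonds and branch parentheses only.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_acyclic(smiles):
    """Return (atoms, bonds) for an acyclic SMILES string.

    bonds is a list of (atom_index, atom_index, bond_order) tuples.
    """
    atoms, bonds = [], []
    stack = []    # atoms to return to when a branch closes
    prev = None   # atom the next atom will be bonded to
    order = 1     # an omitted bond symbol means a single bond
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character at position {pos}")
        tok = m.group()
        pos = m.end()
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        else:
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev = idx
            order = 1
    return atoms, bonds


atoms, bonds = parse_acyclic("CCN(CC)CC")
print(atoms)
print(bonds)
```

In the output for CCN(CC)CC, the nitrogen (index 2) appears in three bonds: the parenthesised ethyl group and the trailing ethyl group both attach to it.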

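    Rules for specifying Cyclic structures

    A ring is written by breaking one bond in each ring; the ring-opening and ring-closing atoms are marked by the same digit following them, e.g. C1CCCCC1 for cyclohexane

    For cubane, more than one ring closure is needed, and one atom can carry several closure digits

    The SMILES for cubane is C12C3C4C1C5C4C3C25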
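To see how ring-closure digits pair up, the sketch below walks a SMILES string and records which two atoms each digit joins. It assumes single-digit closures and unbracketed atoms only (a bracket atom such as [13C] would be misread); `ring_closures` is a hypothetical helper name.

```python
import re

# Unbracketed atoms (aliphatic and aromatic) and single ring-closure digits.
ATOM_OR_DIGIT = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\d")


def ring_closures(smiles):
    """Return the (opening_atom, closing_atom) index pair for each ring bond."""
    open_rings = {}   # digit -> index of the atom that opened it
    closures = []
    atom_index = -1
    for tok in ATOM_OR_DIGIT.findall(smiles):
        if tok.isdigit():
            if tok in open_rings:
                # Second occurrence of the digit: close the ring.
                closures.append((open_rings.pop(tok), atom_index))
            else:
                # First occurrence: remember the atom that opened it.
                open_rings[tok] = atom_index
        else:
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring digits: {sorted(open_rings)}")
    return closures


print(ring_closures("C1CCCCC1"))             # → [(0, 5)]
print(ring_closures("C12C3C4C1C5C4C3C25"))   # → [(0, 3), (2, 5), (1, 6), (0, 7), (4, 7)]
```

For cubane, the five ring-closure bonds plus the seven bonds along the written chain give the twelve C-C bonds of the cube.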

    Rules for specifying Disconnected structures

    Written as individual structures separated by .

    Example: sodium phenoxide, [Na+].[O-]c1ccccc1

    Atoms separated by . are not bonded to each other

    But ring-closure digits still pair across the dot:

    C1.C1 means CC, i.e. ethane
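The dot rule and its ring-closure exception can be checked by counting connected components. The sketch below is a simplified reader, assuming single-digit ring closures and treating each bracket atom as one opaque token; `count_components` is a hypothetical name, and sodium phenoxide is written here as [Na+].[O-]c1ccccc1.

```python
import re

# Bracket atoms, unbracketed atoms, ring digits, bonds, branches and dots.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d|[-=#:()./\\]")
BOND_CHARS = set("-=#:/\\")


def count_components(smiles):
    """Count the disconnected fragments described by a SMILES string."""
    parent = []   # union-find parent pointers, one entry per atom

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    stack, open_rings = [], {}
    prev = None
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character at position {pos}")
        tok = m.group()
        pos = m.end()
        if tok == ".":
            prev = None                 # next atom is not bonded to the last one
        elif tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok.isdigit():
            if tok in open_rings:
                union(open_rings.pop(tok), prev)   # ring closure is a real bond
            else:
                open_rings[tok] = prev
        elif tok in BOND_CHARS:
            pass                        # bond order does not affect connectivity
        else:
            parent.append(len(parent))  # new atom starts as its own component
            idx = len(parent) - 1
            if prev is not None:
                union(prev, idx)
            prev = idx
    return len({find(i) for i in range(len(parent))})


print(count_components("[Na+].[O-]c1ccccc1"))  # → 2
print(count_components("C1.C1"))               # → 1
```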

    Isomeric SMILES

    Used to specify isotopes, configuration around double bonds and chirality

    Isotopic specification

    The desired atomic mass precedes the atomic symbol, inside brackets

    [12C] carbon-12

    [13C] carbon-13

    [13CH4] carbon-13 methane

    Configuration around double bond

    Denoted by / and \, called directional bonds

    F/C=C/F or F\C=C\F trans-1,2-difluoroethene (E)

    F/C=C\F or F\C=C/F cis-1,2-difluoroethene (Z)
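The directional-bond convention for the four difluoroethene strings reduces to a simple comparison: matching symbols on both sides of the double bond mean trans, differing symbols mean cis. The sketch below covers only the narrow pattern X/C=C/Y with one written substituent on each side; `double_bond_geometry` is a hypothetical name and this is not a general stereo perception routine.

```python
import re

# Matches only strings of the form  X/C=C/Y  (with / or \ in either slot).
DIRECTIONAL = re.compile(r"^[A-Za-z]+([/\\])C=C([/\\])[A-Za-z]+$")


def double_bond_geometry(smiles):
    """Return 'trans' or 'cis' for the two written substituents."""
    m = DIRECTIONAL.match(smiles)
    if m is None:
        raise ValueError(f"pattern not supported by this sketch: {smiles!r}")
    first, second = m.groups()
    # Same direction on both sides puts the substituents on opposite sides.
    return "trans" if first == second else "cis"


for s in ["F/C=C/F", "F\\C=C\\F", "F/C=C\\F", "F\\C=C/F"]:
    print(s, double_bond_geometry(s))
```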

    Chiral specification

    Configuration around Tetrahedral Centers

    The tetrahedral center, a carbon atom with four different groups attached, is the most common chiral structure

    Indicated by @ and @@

    @ when, viewed from the first neighbor, the remaining neighbors are listed anticlockwise

    N[C@](C)(F)C(=O)O

    @@ when, viewed from the first neighbor, the remaining neighbors are listed clockwise

    N[C@@](F)(C)C(=O)O
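The two example strings differ both in the mark (@ versus @@) and in the order of the methyl and fluorine neighbors, so they describe the same configuration: swapping two neighbors inverts the written sense, and changing the mark inverts it back. The sketch below checks this with permutation parity. The neighbor labels and the function names are illustrative assumptions, and the labels must be distinct.

```python
def permutation_parity(order, reference):
    """Return 0 if order is an even permutation of reference, 1 if odd."""
    positions = [reference.index(x) for x in order]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2


def same_configuration(mark_a, neighbors_a, mark_b, neighbors_b):
    """Compare two tetrahedral centers written with @ or @@.

    An odd reordering of the neighbors must be accompanied by a change
    of mark for the two descriptions to denote the same configuration.
    """
    order_flipped = permutation_parity(neighbors_b, neighbors_a) == 1
    mark_flipped = mark_a != mark_b
    return order_flipped == mark_flipped


# Neighbors of the central carbon in the order they are written.
first = ("@", ["N", "CH3", "F", "COOH"])    # N[C@](C)(F)C(=O)O
second = ("@@", ["N", "F", "CH3", "COOH"])  # N[C@@](F)(C)C(=O)O

print(same_configuration(*first, *second))  # → True
```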

    General conventions in SMILES

    Hydrogens

    Written explicitly when

    1) Charged hydrogen, i.e. a proton [H+]

    2) Hydrogen molecule [H][H]

    3) Bridging hydrogen, i.e. H connected to two atoms

    4) Isotopic hydrogen, e.g. heavy water [2H]O[2H]

    Aromaticity

    Uses Hückel's rule to identify aromaticity

    1) All ring atoms are sp2 hybridized

    2) The number of pi electrons satisfies the 4n+2 rule

    C1=COC=C1 is equivalent to c1cocc1 (furan)

    C1=CN=C[NH]C(=O)1 is equivalent to c1cnc[nH]c(=O)1
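The 4n+2 condition itself is a one-line arithmetic check. The sketch below assumes the caller already knows how many pi electrons each ring atom contributes (for example 1 for each aromatic carbon and 2 for the lone pair of the furan oxygen); it does not perceive aromaticity from a SMILES string, and `satisfies_huckel` is a hypothetical name.

```python
def satisfies_huckel(pi_electrons):
    """True if pi_electrons equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Per-atom pi-electron contributions, supplied by hand for this sketch.
furan = [1, 1, 2, 1, 1]            # c, c, o, c, c
cyclobutadiene = [1, 1, 1, 1]      # four sp2 carbons

print(satisfies_huckel(sum(furan)))           # → True
print(satisfies_huckel(sum(cyclobutadiene)))  # → False
```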

    Aromatic Nitrogen compounds

    All can be represented as lower case atomic symbol, n

    1) Pyridine

    n1ccccc1

    2) Pyridine-N-oxide

    O=n1ccccc1

    [O-][n+]1ccccc1

    3) Methyl- and 1H-pyrrole

    Cn1cccc1

    [nH]1cccc1

    Reaction SMILES

    Reactions written as

    reactant > agent > product

    Examples

    C=CCBr>>C=CCI

    C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]
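The reactant > agent > product layout can be taken apart with two string splits. This is a minimal sketch with a hypothetical function name; it splits molecules on every dot, so it assumes no ring-closure digits span a dot.

```python
def split_reaction(reaction_smiles):
    """Split a reaction SMILES into reactant, agent and product lists."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")

    def molecules(part):
        # An empty section (as in 'A>>B') holds no molecules.
        return part.split(".") if part else []

    reactants, agents, products = (molecules(p) for p in parts)
    return {"reactants": reactants, "agents": agents, "products": products}


print(split_reaction("C=CCBr>>C=CCI"))
print(split_reaction("C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]"))
```

In the second example the agent section holds CC(=O)C (acetone), which takes part in neither side of the transformation.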

    SMARTS

    SMILES Arbitrary Target Specification

    Language that allows searching of substructure within a structure

    Extension of SMILES rules

    Includes logical operators on nodes (atoms) and edges (bonds)

    Specifications

    Atomic and Bond Primitives

    Logical operators and Recursive SMARTS

    Atomic Primitives

    Symbol    Symbol name        Atomic property requirements       Default
    *         wildcard           any atom                           (no default)
    a         aromatic           aromatic                           (no default)
    A         aliphatic          aliphatic                          (no default)
    D<n>      degree             <n> explicit connections           exactly one
    H<n>      total-H-count      <n> attached hydrogens             exactly one
    h<n>      implicit-H-count   <n> implicit hydrogens             at least one
    R<n>      ring membership    in <n> SSSR rings                  any ring atom
    r<n>      ring size          in smallest SSSR ring of size <n>  any ring atom
    v<n>      valence            total bond order <n>               exactly one
    X<n>      connectivity       <n> total connections              exactly one
    x<n>      ring connectivity  <n> total ring connections         at least one
    -<n>      negative charge    -<n> charge                        -1 charge (-- is -2, etc.)
    +<n>      positive charge    +<n> formal charge                 +1 charge (++ is +2, etc.)
    #<n>      atomic number      atomic number <n>                  (no default)
    @         chirality          anticlockwise                      anticlockwise, default class
    @@        chirality          clockwise                          clockwise, default class
    @<c><n>   chirality          chiral class <c> chirality <n>     (no default)
    @<c><n>?  chiral or unspec   chirality <c><n> or unspecified    (no default)
    <n>       atomic mass        explicit atomic mass               unspecified mass

    (<n> stands for an integer, <c> for a chiral class)

    Bond Primitives

    Symbol Atomic property requirements

    - single bond (aliphatic)

    / directional bond "up"

    \ directional bond "down"

    /? directional bond "up or unspecified"

    \? directional bond "down or unspecified"

    = double bond

    # triple bond

    : aromatic bond

    ~ any bond (wildcard)

    @ any ring bond

    Logical operators

    Atom and bond specifications are combined to form expressions

    Examples

    [CH2] aliphatic carbon with two hydrogens (methylene carbon)

    [!C;R] (NOT aliphatic carbon) AND in ring

    [X3&H0] atom with 3 total bonds and no H's

    [35*] any atom of mass 35

    Symbol        Expression   Meaning
    exclamation   !e1          not e1
    ampersand     e1&e2        e1 and e2 (high precedence)
    comma         e1,e2        e1 or e2
    semicolon     e1;e2        e1 and e2 (low precedence)
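The operator precedence in the table (! binds tightest, then &, then comma, then ;) can be sketched by splitting an atom expression on the operators from lowest to highest precedence. This is an illustrative evaluator over a hand-written atom record, not a SMARTS engine: the atom dictionary keys, the small set of supported primitives and the function names are all assumptions, and implicit AND by juxtaposition (as in CH2), atomic mass and recursive SMARTS are not handled.

```python
import re

ALIPHATIC = {"C", "N", "O", "S"}
AROMATIC = {"c", "n", "o", "s"}


def matches_primitive(primitive, atom):
    """Evaluate one supported primitive against an atom record."""
    if primitive == "*":
        return True
    if primitive in ALIPHATIC:
        return atom["symbol"] == primitive and not atom["aromatic"]
    if primitive in AROMATIC:
        return atom["symbol"] == primitive.upper() and atom["aromatic"]
    if primitive == "R":
        return atom["in_ring"]
    m = re.fullmatch(r"([XH])(\d+)", primitive)
    if m:
        key = "connections" if m.group(1) == "X" else "hydrogens"
        return atom[key] == int(m.group(2))
    raise ValueError(f"primitive not supported by this sketch: {primitive!r}")


def matches_expression(expression, atom):
    """Evaluate an atom expression (written without the outer brackets)."""

    def term(text):
        stripped = text.lstrip("!")
        negations = len(text) - len(stripped)
        result = matches_primitive(stripped, atom)
        return result if negations % 2 == 0 else not result

    # ';' is the loosest AND, ',' is OR, '&' is the tightest AND.
    return all(
        any(
            all(term(t) for t in and_group.split("&"))
            for and_group in or_group.split(",")
        )
        for or_group in expression.split(";")
    )


pyridine_n = {"symbol": "N", "aromatic": True, "in_ring": True,
              "connections": 2, "hydrogens": 0}
benzene_c = {"symbol": "C", "aromatic": True, "in_ring": True,
             "connections": 3, "hydrogens": 1}

print(matches_expression("!C;R", pyridine_n))    # → True
print(matches_expression("X3&H0", benzene_c))    # → False
```

The difference between the two AND operators shows up with expressions such as c,n&H0 versus c,n;H0: the first reads as c OR (n AND H0), the second as (c OR n) AND H0, so a benzene carbon with one hydrogen matches only the first.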

    ACKNOWLEDGEMENT

    Daylight Chemical Information Systems, Inc.

    Wikipedia

    Thank You!