Lecture 10 Cheminformatics of TCM
Y.Z. Chen
Department of Pharmacy
National University of Singapore
Tel: 65-6616-6877; Email: [email protected]; Web: http://bidd.nus.edu.sg
Content
• TCM ingredients and databases
• Digital representation of TCM ingredients
• Molecular descriptors
• TCM ingredient classification by molecular descriptors
TCM Ingredients
Pharmacology & Therapeutics 2000, 86:191-198
Medicinal Herb Databases at BIDD
Comparison with existing TCM databases (number of entries):
Formula: TCM-ID: 1000; TCHFL: 270
Herb: TCM-ID: 1200; TCSHL: 520; TCMD: 1500
Compound: TCM-ID: 9000; CNPD: 3000; TCMD: 6800
[Figure: TCM-ID data hierarchy linking TCM formula, herbs, compounds, proteins, functions, and structures, alongside the labels TCMD, TCHF Library, CNPD, and TCSH Library]
TCM-ID: Traditional Chinese Medicine - Information Database
Only database providing integrated and comprehensive information about:
• TCM formulas, constituent herbs, herbal ingredients, and their effects on proteins
• Molecular structure
• Function at the formula, herb, and compound levels
TCM-ID Database at BIDD
http://bidd.nus.edu.sg/group/TCMsite/Default.aspx
PubChem Database
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=90781
Representation of Herbal Ingredients by SMILES
• Simplified Molecular Input Line Entry System (SMILES)
• Widely used and computationally efficient
• Uses atomic symbols and a set of intuitive rules
• Uses hydrogen-suppressed molecular graphs (HSMG)
SMILES Bonds
Single*: -
Double: =
Triple: #
Aromatic*: :
* can be omitted
Butanols
[Figure: structures of 2-Butanol, iso-Butanol, and tert-Butanol]
SMILES Branches
• Represented by enclosure in parentheses
• Can be nested or stacked
• Examples:
  – CC(O)CC is 2-Butanol
  – OCC(C)C is iso-Butanol
  – OC(C)(C)C is tert-Butanol
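The branch rule can be illustrated with a minimal sketch. It assumes a deliberately restricted SMILES dialect (single-letter uppercase atoms, implicit single bonds, parentheses only); the function names `parse_branched_smiles` and `carbon_neighbours_of_carbinol` are hypothetical and not part of any toolkit.

```python
def parse_branched_smiles(smiles):
    """Build a hydrogen-suppressed graph from a restricted SMILES string.

    Supports only single-letter uppercase atoms, implicit single bonds,
    and parentheses for branches. Returns (atoms, bonds).
    """
    atoms, bonds, stack = [], [], []
    prev = None  # index of the atom the next atom will bond to
    for ch in smiles:
        if ch == "(":
            stack.append(prev)      # remember the branch point
        elif ch == ")":
            prev = stack.pop()      # return to the branch point
        elif ch.isalpha() and ch.isupper():
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx))
            prev = idx
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, bonds


def carbon_neighbours_of_carbinol(smiles):
    """Count carbons attached to the carbon that carries the oxygen."""
    atoms, bonds = parse_branched_smiles(smiles)

    def neighbours(i):
        return [b if a == i else a for a, b in bonds if i in (a, b)]

    carbinol = neighbours(atoms.index("O"))[0]
    return sum(1 for n in neighbours(carbinol) if atoms[n] == "C")


for name, smi in [("2-Butanol", "CC(O)CC"),
                  ("iso-Butanol", "OCC(C)C"),
                  ("tert-Butanol", "OC(C)(C)C")]:
    print(name, smi, carbon_neighbours_of_carbinol(smi))
```

The three strings contain the same atoms, but the parenthesised branches give the oxygen-bearing carbon two, one, and three carbon neighbours respectively, which is what distinguishes the secondary, primary, and tertiary alcohols.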
SMILES Bonds
Ethene: C=C
Chloroethene: ClC=C
1,1-Dichloroethene: ClC(Cl)=C
cis-1,2-Dichloroethene: ClC=CCl (as written, the cis/trans geometry is unspecified; Cl/C=C\Cl specifies cis)
Trichloroethene: ClC(Cl)=CCl
Perchloroethene: ClC(Cl)=C(Cl)Cl
SMILES Atoms
• Use normal chemical symbols
• Add punctuation symbols if necessary
• No super- or subscripts
SMILES Symbols
• String of alphanumeric characters and certain punctuation symbols
• Terminates at the first space encountered when read left to right
• The ORGANIC SUBSET:
B, C, N, O, P, S, F, Cl, Br, I
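A short sketch of how these symbol rules might be applied when reading a SMILES field. It assumes only organic-subset atoms, the bond symbols `-`, `=`, `#`, `:`, and parentheses; the names `tokenize` and `atom_counts` are hypothetical, and real SMILES readers handle many more cases (brackets, ring digits, aromatic lowercase atoms).

```python
import re
from collections import Counter

# Organic-subset atoms; two-letter symbols are listed first so that
# "Cl" is not read as carbon followed by an unknown "l".
ATOM = r"Cl|Br|[BCNOPSFI]"
TOKEN = re.compile(rf"{ATOM}|[-=#:()]")


def tokenize(smiles_field):
    """Split a SMILES field into atom, bond, and branch tokens."""
    # The SMILES string terminates at the first space.
    smiles = smiles_field.split(" ", 1)[0]
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def atom_counts(smiles_field):
    """Count heavy atoms by element symbol."""
    return Counter(t for t in tokenize(smiles_field)
                   if re.fullmatch(ATOM, t))


print(tokenize("ClC(Cl)=CCl trichloroethene"))
print(dict(atom_counts("ClC(Cl)=C(Cl)Cl")))
```

Anything after the first space (here a compound name) is ignored, so the same field can carry a SMILES string followed by free-text annotation.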
Other SMILES Atoms
• Aliphatic or nonaromatic carbon: C
• Atom in aromatic ring: lowercase letter
• Designate ring closure with pairs of matching digits, e.g.
c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas
C1CCCCC1 is Cyclohexane
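The difference between the aromatic and aliphatic ring notations can be checked with a small sketch that counts implicit hydrogens. It is a simplification under stated assumptions: carbon atoms only, no branches, single-digit ring closures, and each aromatic carbon treated as using one extra valence for the ring pi system. The names `parse_ring_smiles` and `hydrogen_count` are hypothetical.

```python
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_ring_smiles(smiles):
    """Parse carbon-only SMILES with bond symbols and ring-closure digits.

    Returns (aromatic_flags, bonds), where bonds are (i, j, order).
    """
    aromatic, bonds = [], []
    open_rings = {}   # digit -> (atom index, pending bond order)
    prev, order = None, 1
    for ch in smiles:
        if ch in "Cc":
            aromatic.append(ch == "c")   # lowercase marks an aromatic atom
            idx = len(aromatic) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev, order = idx, 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring digit before any atom")
            if ch in open_rings:         # matching digit closes the ring
                start, ring_order = open_rings.pop(ch)
                bonds.append((start, prev, max(order, ring_order)))
            else:                        # first digit opens the ring
                open_rings[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if open_rings:
        raise ValueError("unclosed ring")
    return aromatic, bonds


def hydrogen_count(smiles):
    """Total implicit hydrogens, assuming every carbon has valence 4."""
    aromatic, bonds = parse_ring_smiles(smiles)
    used = [1 if flag else 0 for flag in aromatic]
    for i, j, bond_order in bonds:
        used[i] += bond_order
        used[j] += bond_order
    return sum(4 - u for u in used)


for smi in ("c1ccccc1", "C1=CC=CC=C1", "C1CCCCC1"):
    print(smi, hydrogen_count(smi))
```

Both benzene notations give six hydrogens (C6H6), while the uppercase ring with no double bonds gives twelve (C6H12), which is why the letter case matters.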
SMILES Charges
• Specify attached hydrogens and charges in square brackets
• The number of attached hydrogens is the symbol H followed by an optional digit
• Examples:
  – [H+] proton
  – [OH-] hydroxide anion
  – [OH3+] hydronium cation
  – [Fe++] iron(II) cation
  – [NH4+] ammonium cation
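As a sketch of how bracket atoms could be read, the regular expression below extracts the element symbol, attached-hydrogen count, and formal charge. It assumes only the simple forms shown in the examples (no isotopes, chirality, or atom classes); `parse_bracket_atom` is a hypothetical helper name.

```python
import re

# [symbol][optional H + digits][optional charge], e.g. [NH4+] or [Fe++]
BRACKET = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)(?P<h>H\d*)?(?P<charge>[+-]\d+|\++|-+)?\]"
)


def parse_bracket_atom(text):
    """Return (symbol, attached_hydrogens, formal_charge)."""
    match = BRACKET.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    h = match.group("h")
    # "H" alone means one hydrogen; "H3" means three.
    hydrogens = 0 if h is None else int(h[1:] or 1)
    charge_text = match.group("charge") or ""
    if charge_text[1:].isdigit():        # forms such as "+2" or "-1"
        charge = int(charge_text)
    else:                                # forms such as "+", "++", "-"
        charge = charge_text.count("+") - charge_text.count("-")
    return match.group("symbol"), hydrogens, charge


for atom in ("[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"):
    print(atom, parse_bracket_atom(atom))
```

Repeated signs and signed digits are treated as equivalent here, so `[Fe++]` and `[Fe+2]` both yield a charge of +2.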
SMILES Cyclic Structures
• Break one single or one aromatic bond in each ring
• Number in any order
  – Designate ring-breaking atoms by the same digit following the atomic symbol
Representation of Herbal Ingredients by Molecular Descriptor
• Molecular descriptors are numerical values that characterize properties of molecules
• Examples:
  – Physicochemical properties (empirical)
  – Values from algorithms, such as 2D fingerprints
• Vary in complexity of encoded information and in compute time
Molecular Descriptors
• Constitutional
  – Molecular weight (MW), number of atoms
• Topological
  – Connectivity, Wiener index
• Electrostatic
  – Polarity, polarizability, partial charges
• Geometrical
  – Length, width, molecular volume
• Quantum chemical
  – HOMO and LUMO energies
  – Vibrational frequencies
  – Bond orders
  – Total energy
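Two of the simpler descriptor classes can be sketched directly. The example below computes a constitutional descriptor (molecular weight from element counts) and a topological one (the Wiener index, taken here as the sum of shortest-path bond counts over all pairs of heavy atoms). The bond lists are written out by hand as an assumption for illustration, the atomic masses are rounded standard values, and the function names are hypothetical.

```python
from collections import deque

# Rounded standard atomic masses (g/mol) for the elements used below.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


def molecular_weight(element_counts):
    """Constitutional descriptor: sum of atomic masses."""
    return sum(ATOMIC_MASS[el] * n for el, n in element_counts.items())


def wiener_index(n_atoms, bonds):
    """Topological descriptor: sum of shortest-path distances between
    all pairs of atoms in a connected hydrogen-suppressed graph."""
    adjacency = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    total = 0
    for source in range(n_atoms):
        # Breadth-first search gives bond-count distances from source.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(distance.values())
    return total // 2   # each pair was counted from both ends


# Heavy-atom graphs written by hand (atom order follows the SMILES).
two_butanol = [(0, 1), (1, 2), (1, 3), (3, 4)]    # CC(O)CC
tert_butanol = [(0, 1), (1, 2), (1, 3), (1, 4)]   # OC(C)(C)C

print(molecular_weight({"C": 4, "H": 10, "O": 1}))
print(wiener_index(5, two_butanol), wiener_index(5, tert_butanol))
```

Both alcohols share the formula C4H10O and therefore the same molecular weight, while the Wiener index differs because the more compact tert-butanol skeleton has shorter paths between atoms.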
Molecular Descriptors
• van der Waals volume
  – The sum of the non-overlapping volumes of the van der Waals spheres of each atom of the molecule
• Molecular surface
  – The area of the surface contours generated by rolling a probe sphere against the surface atoms of the molecule
Molecular Descriptors
• Molecular size vectors
  – Define ranges for distances and angles
[Figure: example pattern over atoms labeled C(u), O(s1), O(s1), A, A, [O,S], and O, with distance ranges 3.6 - 4.6 Å, 3.3 - 4.3 Å, and 6.8 - 7.8 Å]
Molecular Descriptors for Large Data Sets
• Descriptors representing properties of complete molecules
  – Examples: LogP, molar refractivity
• Descriptors calculated from 2D graphs
  – Examples: topological indexes, 2D fingerprints
• Descriptors requiring 3D representations
  – Example: pharmacophore descriptors
Molecular Descriptors Calculated From 2D Structures
• Simple counts of features
  – Lipinski Rule of Five (H bonds, MW, etc.)
  – Number of ring systems
  – Number of rotatable bonds
• Not likely to discriminate sufficiently when used alone
• Combined with other descriptors for best effect
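A minimal sketch of a count-based filter, using the commonly quoted Rule of Five thresholds (molecular weight at most 500, LogP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The `Descriptors` container and `rule_of_five_violations` function are hypothetical names, and the numeric inputs are made-up values for illustration, not measurements of any real TCM ingredient.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptor values for one compound (hypothetical)."""
    mol_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Return the names of the Rule of Five criteria that are violated."""
    checks = {
        "mol_weight > 500": d.mol_weight > 500,
        "log_p > 5": d.log_p > 5,
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Illustrative inputs only; not real compound data.
small = Descriptors(mol_weight=180.0, log_p=1.2,
                    h_bond_donors=2, h_bond_acceptors=4)
large = Descriptors(mol_weight=812.0, log_p=6.2,
                    h_bond_donors=7, h_bond_acceptors=12)

print(rule_of_five_violations(small))
print(rule_of_five_violations(large))
```

Returning the list of violated criteria, rather than a single pass/fail flag, makes it easier to combine these counts with other descriptors.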
Physicochemical Properties
• Hydrophobicity
  – LogP: the logarithm of the partition coefficient between n-octanol and water
• ClogP (Leo and Hansch): based on a small set of values derived from a small set of simple molecules
  – BioByte: http://www.biobyte.com/
  – Daylight's MedChem Help page: http://www.daylight.com/dayhtml/databases/medchem/medchem-help.html
  – Isolating carbon: a carbon not doubly or triply bonded to a heteroatom
Molecular Descriptor LogP
Octanol-Water Partition Coefficients
• P = C(octanol) / C(water)
• log P is analogous to a free energy: ΔrG = -RT ln Keq
• Measures hydrophobic versus hydrophilic character
• The larger P is, the more hydrophobic the compound
[Figure: solute partitioning between an octanol phase and a water (H2O) phase]
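The definition can be turned into a short worked sketch. It computes log P from two equilibrium concentrations and, following the free-energy analogy, a transfer free energy under the assumption that P is treated as the equilibrium constant for moving the solute from water into octanol. The function names and the example concentrations are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def log_p(c_octanol, c_water):
    """log10 of the partition coefficient P = C(octanol) / C(water)."""
    return math.log10(c_octanol / c_water)


def transfer_free_energy(c_octanol, c_water, temperature=298.15):
    """Free energy (J/mol) of water-to-octanol transfer, treating P as
    an equilibrium constant: dG = -R * T * ln(P)."""
    return -R * temperature * math.log(c_octanol / c_water)


# Illustrative concentrations in the same units for both phases.
print(log_p(100.0, 1.0))
print(transfer_free_energy(100.0, 1.0))
```

A solute that is 100 times more concentrated in octanol has log P = 2 and a negative transfer free energy, consistent with a hydrophobic compound favouring the octanol phase.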
TCM Ingredient Classification by Molecular Descriptors
J. Chem. Inf. Model., Vol. 47, No. 6, 2007
[Figure: Classification of TCM ingredients of specific chemical classes by the decision tree method]
[Figure: Distribution of TCM ingredients of specific chemical classes without using the decision tree method]
Acknowledgement
Current Group Members:
• Computer-Aided Drug Design: CY Ung, XH Ma, XH Liu, Pankaj Kumar, F Zhu, X Liu, J Jia
• Protein Function, Interaction, Network: HL Zhang, CY Ung, XH Ma, F Zhu, WK Teo, Z Shi
• Databases and Servers: J Jia
• Medicinal Herb: CY Ung, Pankaj Kumar, Cao Jinyi (undergraduate students)
• Microarray and biomarkers: J Jia, ZQ Tang
Former Members:
• PhD: ZW Cao (Prof SCBIT, Tongji U), ZL Ji (Assoc Prof Xiamen U), X Chen (Assoc Prof Zhejiang U), CW Yap (Assist Prof NUS), LY Han (Postdoc NIH), CJ Zheng (Postdoc NIH), HH Lin (Postdoc Harvard), J Cui (Postdoc U Georgia), H Li (Postdoc Einstein College Med)
• Research Fellow/Assistant: ZR Li (Assoc Prof SiChuan U), Y Xue (Prof SiChuan U), W Liu (Assoc Prof DUT), D Mi (Assoc Prof DUT), CZ Cai (Prof ChongQing U), DG Zhi (Postdoc, Berkeley)
• MSc: Y.J. Guo (Postdoc NIH), L.Z. Sun (RA, U Tenn.), J.F. Wang (MSU), L.X. Yao (Columbia), S Ong (Washington U), H Zhou (local company), B Xie (local company)
• BSc: W.K. Yeo (IMCB, Novartis)