RMSD: routine measure stirs doubts



Poster presented at the 230th National ACS meeting in Washington, D. C., 2005.


Jeremy J. Yang

3600 Cerrillos Road Suite 1107 Santa Fe, New Mexico 87507

505.473.7385 [email protected]

www.eyesopen.com

Introduction

Root mean square deviation (RMSD) measurements between molecular conformations are routine and widespread in the scientific literature. It appears to be widely accepted that a single RMSD calculation is sufficient to measure the difference or sameness between conformers, that is, a distance in "conformation space". Is the current widespread use of RMSD justified, and if so, how? This study examines the uses and limitations of RMSD, along with some alternative or supplementary measures of distance between conformers.

References and Notes

1.  W. Kabsch, "A solution for the best rotation to relate two sets of vectors", Acta Cryst. (1976), A32, 922-923.

2.  K. Kaindl, B. Steipe, "Metric properties of the root-mean-square deviation of vector sets", Acta Cryst. (1997), A53, 809.

3.  R. Brüschweiler, "Efficient RMSD measures for the comparison of two molecular ensembles", Proteins: Structure, Function, and Genetics (2003), 50(1), 26-34.

4.  All algorithms implemented using OEChem (OpenEye Scientific Software).

5.  All conformations generated with Omega (OpenEye Scientific Software).

6.  Shape similarity calculated using ROCS (OpenEye Scientific Software).

Conclusions

RMSD is a useful and convenient measure, but it has limitations that can lead to errors and oversights. At a minimum, investigators should be alert to cases where RMSD should be supplemented by other measures.
•  In some cases, RMSD alone is insufficient.
•  In general, larger RMSDs warrant closer inspection.
•  As molecule size increases, the range of RMSD values increases and its descriptive power decreases.
•  Several other measures are available to help characterize geometric relationships not well handled by RMSD.
•  Low correlations between RMSD and these alternative measures indicate that RMSD does not reveal the information they provide.
•  Conformer discrimination tests should include a variance or maximum atom-distance test in addition to RMSD.

Definition of RMSD

The root mean square deviation can be calculated for any two equal-sized sets of vectors:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - y_i)^2}$$

RMSD reduces N comparisons to a single scalar measure. In molecular geometry, RMSD is used to compare two sets of atomic coordinates: x_i and y_i are the two 3D positions of the ith atom, and (x_i − y_i)² is the squared distance between them. The set of atoms may comprise a complete molecule or a substructure. The coordinates may exist in a defined reference frame (e.g. a protein receptor site), in which case each geometry is said to represent a pose of the molecule. Alternatively, the coordinates may have only relative meaning and thus comprise a conformer of the molecule. In that case, the RMSD is normally minimized by finding the optimal alignment of the conformers.
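For poses that already share a reference frame, the definition can be computed directly. A minimal Python sketch, assuming both geometries are given as equal-length lists of (x, y, z) tuples matched atom-for-atom; the function name `rmsd` is our own, and no alignment is performed:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """RMSD between two matched coordinate sets, with no alignment step."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    # Sum of squared Euclidean distances between corresponding atoms.
    total = sum(math.dist(x, y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))


# Two poses of a 3-atom fragment; the second is shifted 1 unit along x.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
b = [(x + 1.0, y, z) for x, y, z in a]
print(rmsd(a, b))  # 1.0
```

For conformers, the same sum would be evaluated only after the optimal superposition has been applied.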

Minimized RMSD

For any two conformers, there is an alignment (a rigid-body rotation and translation) for which the RMSD is minimized, and it can be determined analytically [1]. The alignment can be a means or an end in itself: many modelling tasks require geometric alignment.
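The poster cites Kabsch's analytic solution [1]. As a dependency-free sketch we swap in Horn's closed-form unit-quaternion method, which gives the same minimum RMSD over proper rotations and translations: after centering both sets, min RMSD² = (E0 − 2λmax)/N, where E0 is the sum of squared coordinate norms and λmax is the largest eigenvalue of a 4×4 symmetric matrix built from the cross-covariance. The eigenvalue is found with a small Jacobi iteration; all function names are hypothetical, and this is not the OEChem implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _centered(ps: Sequence[Point]) -> List[Point]:
    """Translate a coordinate set so its centroid is at the origin."""
    n = len(ps)
    cx = sum(p[0] for p in ps) / n
    cy = sum(p[1] for p in ps) / n
    cz = sum(p[2] for p in ps) / n
    return [(p[0] - cx, p[1] - cy, p[2] - cz) for p in ps]


def _max_eigenvalue(a: List[List[float]], sweeps: int = 100) -> float:
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def min_rmsd(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """RMSD after optimal superposition (rotation + translation) of xs onto ys."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    xc, yc = _centered(xs), _centered(ys)
    # Cross-covariance S[a][b] = sum_i x_i[a] * y_i[b].
    S = [[sum(x[i] * y[j] for x, y in zip(xc, yc)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 key matrix; its top eigenvalue is max sum(y . R x).
    K = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    e0 = sum(c * c for p in xc + yc for c in p)
    msd = (e0 - 2.0 * _max_eigenvalue(K)) / len(xs)
    return math.sqrt(max(msd, 0.0))  # clamp tiny negative round-off


# A 4-atom set and a copy rotated 90 degrees about z, then translated.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
b = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in a]
print(round(min_rmsd(a, b), 4))  # 0.0
```

Because unit quaternions describe only proper rotations, a non-planar geometry is not superimposed onto its mirror image, consistent with the usual reflection-corrected Kabsch procedure.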

For what tasks is RMSD used?

•  Compare two conformations.
•  Compare two poses of the same or different conformations.
•  Compare the coordinates of substructures.
•  Measure the quality of a computed model vs. reference data (X-ray crystallographic or NMR).
•  Measure the diversity of an ensemble of conformers or poses.
•  Characterize and compare ensembles of conformations.

Advantages of RMSD

•  Easily calculated.
•  Unique, analytic minimum [1].
•  Metric property [2] (triangle inequality satisfied), hence a more intuitive measure of “conformational space”.
•  Emphasizes variance relative to the ordinary mean distance: RMSD² equals the squared mean distance plus the variance of the per-atom distances.
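The identity behind the last point can be checked numerically: for per-atom distances d_i, RMSD² is the squared mean distance plus the population variance of the distances, so RMSD exceeds the mean exactly when the distances are uneven. A short Python check with made-up distances:

```python
import math
import statistics

# Hypothetical per-atom distances (after alignment), chosen for illustration.
d = [1.0, 1.0, 1.0, 5.0]

mean = statistics.fmean(d)      # ordinary mean distance
var = statistics.pvariance(d)   # population variance of the distances
rms = math.sqrt(statistics.fmean([x * x for x in d]))  # the RMSD

print(mean, var, round(rms, 4))                 # 2.0 3.0 2.6458
print(math.isclose(rms ** 2, mean ** 2 + var))  # True
```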

What about the variance?

A large substructure may be perfectly aligned despite a large overall RMSD. The variance of the per-atom distances and/or the maximum distance can reveal this. The fundamental problem with RMSD: the tasks of interest require an assessment of a geometric relationship that cannot always be summarized well by a single scalar.
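A minimal sketch of such a distance profile in Python, assuming the two geometries have already been superimposed and their atoms matched; the function name and the coordinates are hypothetical, chosen so that six atoms coincide and two are displaced:

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_profile(xs: Sequence[Point], ys: Sequence[Point]) -> Dict[str, float]:
    """Summary statistics of per-atom distances between two already-aligned geometries."""
    d = [math.dist(x, y) for x, y in zip(xs, ys)]
    return {
        "rmsd": math.sqrt(statistics.fmean([v * v for v in d])),
        "mean": statistics.fmean(d),
        "median": statistics.median(d),
        "max": max(d),
        "variance": statistics.pvariance(d),
    }


# Six atoms coincide exactly; two are displaced by 3 units.
core = [(float(i), 0.0, 0.0) for i in range(6)]
a = core + [(0.0, 1.0, 0.0), (5.0, 1.0, 0.0)]
b = core + [(0.0, 4.0, 0.0), (5.0, 4.0, 0.0)]
print(distance_profile(a, b))
# {'rmsd': 1.5, 'mean': 0.75, 'median': 0.0, 'max': 3.0, 'variance': 1.6875}
```

The median of 0 flags the matched core, while the maximum and variance expose the displaced atoms. The RMSD of 1.5 alone would not distinguish this case from a uniform 1.5-unit shift of every atom, which has zero variance.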

New methods yield new information [4]

Max distance (of RMSD-aligned conformations) (+): simple, intuitive, provides criterion for further analysis

Big RMSDs less informative

What counts as “big” depends on molecule size. Hence, while RMSD can define a conformation “space” for a single molecule, its scale depends on molecule size and other graph-topological descriptors, so its use in describing heterogeneous databases is hampered.

Symmetry - a critical detail

For molecules with symmetry, calculating the correct minimized RMSD requires the optimal automorphism, i.e. the symmetry-equivalent atom mapping that yields the lowest RMSD. This implementation detail requires rigorous cheminformatics to avoid errors.
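A minimal sketch of symmetry-corrected RMSD in Python, assuming poses in a shared frame: enumerate atom permutations that preserve element labels and bonds, and keep the lowest RMSD. The brute-force enumeration and all names here are our own illustration; it scales factorially and is only usable for tiny molecules, whereas real toolkits rely on graph-matching algorithms. For conformers, the superposition would have to be repeated for each mapping.

```python
import itertools
import math
from typing import FrozenSet, Iterator, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def automorphisms(labels: Sequence[str], bonds: Set[FrozenSet[int]]) -> Iterator[Tuple[int, ...]]:
    """Brute-force label- and bond-preserving atom permutations (tiny molecules only)."""
    n = len(labels)
    for perm in itertools.permutations(range(n)):
        if any(labels[perm[i]] != labels[i] for i in range(n)):
            continue
        if {frozenset((perm[i], perm[j])) for i, j in map(tuple, bonds)} == bonds:
            yield perm


def symmetric_rmsd(xs: Sequence[Point], ys: Sequence[Point],
                   labels: Sequence[str], bonds: Set[FrozenSet[int]]) -> float:
    """Minimum RMSD over all automorphisms, coordinates in a shared frame."""
    n = len(xs)
    best = math.inf
    for perm in automorphisms(labels, bonds):  # identity is always included
        total = sum(math.dist(xs[i], ys[perm[i]]) ** 2 for i in range(n))
        best = min(best, math.sqrt(total / n))
    return best


# O=C=O fragment; the two poses differ only in how the oxygens are numbered.
labels = ["O", "C", "O"]
bonds = {frozenset((0, 1)), frozenset((1, 2))}
pose_a = [(-1.2, 0.0, 0.0), (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
pose_b = [(1.2, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.2, 0.0, 0.0)]
print(symmetric_rmsd(pose_a, pose_b, labels, bonds))  # 0.0
```

With the identity mapping alone the same pair scores about 1.96, an error that is purely an artifact of atom numbering. Passing the same label for every atom turns this into an uncolored-graph comparison.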

Example: 2 conformations of NCI 130813 (RMSD = 4.47, variance = 11.34, max distance = 12.33, shape Tanimoto = 0.88)

Variance (of RMSD-aligned conformations) (+): simple, intuitive, provides criterion for further analysis

Mean distance (+): most intuitive (-): no unique analytic minimum (-): no variance weighting

Methods for further study: RMST over all torsions, including rings. Could also be centrality-weighted. (+): more comprehensive than straight RMST

Uncolored graph RMSD

Like shape similarity, it indicates geometric equivalence with the chemistry suppressed.

Median distance (+): can reveal substructure equality (-): ignores outliers

N-atoms is an arbitrary descriptor of size. Other descriptors investigated: N-rotors, mol-length (3D), enhanced Wiener index, Randić coefficient (branching).
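Two of the graph-topological size descriptors can be sketched in Python. The poster's "enhanced" Wiener index is not specified, so this computes only the plain Wiener index (sum of shortest-path bond counts over all atom pairs) and the standard Randić connectivity index (sum over bonds of 1/√(deg u · deg v)). The molecule is assumed to be a hydrogen-suppressed adjacency dict; names are our own.

```python
import math
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]


def wiener_index(g: Graph) -> int:
    """Sum of shortest-path (bond-count) distances over all unordered atom pairs."""
    total = 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends


def randic_index(g: Graph) -> float:
    """Randić connectivity index: sum over bonds of 1/sqrt(deg(u) * deg(v))."""
    return sum(1.0 / math.sqrt(len(g[u]) * len(g[v])) for u in g for v in g[u] if u < v)


# Hydrogen-suppressed n-butane: a 4-carbon chain.
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_index(butane))            # 10
print(round(randic_index(butane), 4))  # 1.9142
```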

Test molecule: benzylpenicillin (41 atoms, 28 conformations [5]). Also tested: dopamine (22 atoms, 7 conformations), methotrexate (54 atoms, 290 conformations).

Pharmacophore RMSD Instead of all atoms, use pharmacophore points (-): requires expertise (-): not readily automated (+): involves expertise

3D maximum common substructure

3D match criteria needed (-): high computational cost

Shape similarity (+): intuitive, rigorous, physical, fast using OE Shape Toolkit (-): shape alone ignores chemistry
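As a rough illustration only, a Python sketch of an overlap-based shape Tanimoto, T = V_AB / (V_AA + V_BB − V_AB), for two molecules that are assumed to be pre-aligned. Each atom is modelled as a spherical Gaussian with one shared, arbitrarily chosen exponent (`ALPHA` is hypothetical), and only pairwise overlaps are summed. This does not reproduce how ROCS computes or optimizes overlap.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
ALPHA = 0.8  # hypothetical Gaussian exponent, shared by every atom


def overlap(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of pairwise Gaussian overlap integrals between two atom sets."""
    # Integral of exp(-ALPHA|r-p|^2) * exp(-ALPHA|r-q|^2) over all space.
    pref = (math.pi / (2.0 * ALPHA)) ** 1.5
    return sum(pref * math.exp(-0.5 * ALPHA * math.dist(p, q) ** 2) for p in a for q in b)


def shape_tanimoto(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Overlap Tanimoto of two pre-aligned atom sets (no overlay optimization)."""
    v_ab = overlap(a, b)
    return v_ab / (overlap(a, a) + overlap(b, b) - v_ab)


mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.25, 1.3, 0.0)]
print(shape_tanimoto(mol, mol))  # 1.0
print(round(shape_tanimoto(mol, bent), 3))
```

Because it ignores atom types, this score shares the limitation noted above: shape alone ignores chemistry.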


RMST (torsion) (+): not size dependent (+): fast, no minimization (-): ignores rings
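Torsions are angles, so differences must be wrapped before squaring: 350° and 10° are only 20° apart. A minimal Python sketch, assuming the torsion angles (in degrees) of the same rotatable bonds have already been measured for both conformers; the poster does not give its exact torsion definitions, and the function names are our own. No superposition is needed.

```python
import math
from typing import Sequence


def torsion_diff(a: float, b: float) -> float:
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rmst(t1: Sequence[float], t2: Sequence[float]) -> float:
    """Root-mean-square torsion difference (degrees) over matched rotatable bonds."""
    if len(t1) != len(t2) or not t1:
        raise ValueError("torsion lists must be non-empty and of equal length")
    return math.sqrt(sum(torsion_diff(a, b) ** 2 for a, b in zip(t1, t2)) / len(t1))


print(round(rmst([10.0, 350.0, 60.0], [350.0, 10.0, 60.0]), 2))  # 16.33
```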

Low correlation reflects information missed by RMSD.

RMST, centrality-weighted: weight = subtree ratio (+): lever-arm effect considered (+): still fast

Increased correlation confirms that centrality-weighting works. Weights could be squared to increase effect.
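The poster gives the weight only as a "subtree ratio". One plausible reading, used here purely as an assumption, is the size of the smaller fragment divided by the size of the larger fragment obtained by cutting the rotatable bond, so central bonds (long lever arms on both sides) approach 1 and peripheral bonds approach 0. A Python sketch with hypothetical helper names, valid only for acyclic bonds:

```python
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[int]]


def fragment_size(g: Graph, start: int, blocked: int) -> int:
    """Atoms reachable from `start` without crossing the bond start-blocked (acyclic bond assumed)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in g[u]:
            if v not in seen and not (u == start and v == blocked):
                seen.add(v)
                stack.append(v)
    return len(seen)


def subtree_ratio(g: Graph, bond: Tuple[int, int]) -> float:
    """Hypothetical weight: smaller fragment size / larger fragment size."""
    u, v = bond
    a, b = fragment_size(g, u, v), fragment_size(g, v, u)
    return min(a, b) / max(a, b)


def weighted_rmst(t1: Sequence[float], t2: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted root-mean-square torsion difference, with wrapped angle differences."""
    num = sum(w * ((a - b + 180.0) % 360.0 - 180.0) ** 2 for a, b, w in zip(t1, t2, weights))
    return math.sqrt(num / sum(weights))


# Hexane-like chain 0-1-2-3-4-5 with torsions about bonds 1-2, 2-3, 3-4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
w = [subtree_ratio(chain, b) for b in [(1, 2), (2, 3), (3, 4)]]
print(w)  # [0.5, 1.0, 0.5]
print(weighted_rmst([180.0] * 3, [60.0, 180.0, 180.0], w))            # 60.0
print(round(weighted_rmst([180.0] * 3, [180.0, 60.0, 180.0], w), 1))  # 84.9
```

The same 120° change scores higher about the central bond than about a peripheral one, whereas unweighted RMST would score both identically.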

Distance matrix RMSD (+): no minimization, fast (-): less intuitive
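Interatomic distances do not change under rotation or translation, which is why no minimization is needed. A minimal Python sketch, assuming matched atoms and averaging over all pairs i < j (the poster does not state its normalization; the function name is our own):

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix_rmsd(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """RMS difference of corresponding interatomic distances (i < j); no superposition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two matched coordinate sets with at least two atoms")
    diffs = [
        math.dist(xs[i], xs[j]) - math.dist(ys[i], ys[j])
        for i, j in itertools.combinations(range(len(xs)), 2)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in a]  # rotated 90 deg about z, translated
mirror = [(x, y, -z) for x, y, z in a]
print(round(distance_matrix_rmsd(a, moved), 9))   # 0.0
print(round(distance_matrix_rmsd(a, mirror), 9))  # 0.0
```

Reflections also preserve interatomic distances, so this measure cannot tell a geometry from its mirror image.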

[Figure panels: benzylpenicillin, dopamine, methotrexate]