CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND
CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING
by
Anita Karen Nivedha
(Under the direction of Robert J. Woods)
ABSTRACT
Carbohydrates play a pivotal role in various life processes, including energy metabolism,
storage, immune recognition, transportation, signaling and biosynthesis. In these roles,
they often interact with other integral components of the living system, such as proteins
and lipids. Understanding how these molecules interact can further our knowledge of
crucial biological processes, and this understanding begins with knowledge of the
three-dimensional structures of these complexes. However, owing to the challenges involved
in crystallizing oligosaccharide structures, theoretical modeling methods such as molecular
docking are often used to predict how oligosaccharides interact with protein receptors.
Docking programs, however, rely on generalized scoring functions, which often produce
unnatural oligosaccharide conformations during docking. In this thesis, we present two
approaches to improve protein-carbohydrate docking by accounting for specific intra- and
intermolecular interaction energies relating to carbohydrates that existing docking
methodologies do not currently address. In the first approach, we developed a set of
Carbohydrate Intrinsic (CHI) energy functions to account for the intramolecular energies
of carbohydrate ligands, which are primarily determined by the conformations of the
glycosidic torsion angles connecting individual saccharides. This work resulted in the
development of Vina-Carb (the incorporation of the CHI energy functions within the scoring
function of AutoDock Vina), which significantly improved the conformations of
oligosaccharide binding mode predictions. In the second approach, we developed a scoring
function by fitting a mathematical model to data from the literature describing the energy
contributed by CH/π interactions. This energy function was used to score the crucial
interactions between CH groups lining the carbohydrate ring and the π electron density of
aromatic amino acids in the interacting proteins. Employing the CH/π interaction energy
function to rescore docked protein-carbohydrate complexes improved the rankings of accurate
pose predictions made by both AutoDock Vina and Vina-Carb. The scoring functions developed
and used in this work are transferable and can therefore be used with other docking
programs, as well as in the refinement of experimental carbohydrate structures.
INDEX WORDS: AutoDock, AutoDock Vina, Molecular Docking, Protein-Carbohydrate
Docking, Docking Scoring Functions, Internal Energies, Carbohydrate, Carbohydrate
Intrinsic Energy Functions, CHI Energy Functions, Vina-Carb, Antibody, Antigen,
Lectin, Enzyme, Carbohydrate Binding Module, CH/π Interactions
CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND
CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING
by
Anita Karen Nivedha
B. Tech., Vellore Institute of Technology University, India, 2008
A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2015
© 2015
Anita Karen Nivedha
All Rights Reserved
Major Professor: Robert J. Woods
Committee: James H. Prestegard
Liming Cai
Donald Evans
Electronic Version Approved:
Suzanne Barbour
Dean of the Graduate School
The University of Georgia
December 2015
DEDICATION
I would like to dedicate this work to my beloved parents, Jenetta and Joshwa.
ACKNOWLEDGEMENTS
Firstly, I would like to acknowledge and extend my gratitude to my major
professor, Dr. Robert J. Woods, for his support, encouragement and guidance, and for giving
me the wonderful opportunity to be a part of the Woods’ Group Family. I would like to
thank my PhD Advisory Committee, Dr. James H. Prestegard, Dr. Liming Cai and Dr.
Donald L. Evans for their valuable advice, insight and suggestions over the years as my
dissertation took shape. I would like to thank colleagues who were directly involved in
my research, Dr. B. Lachele Foley, Dr. Matthew B. Tessier, Dr. Spandana Makeneni and
David F. Thieker. It has been a great learning experience and a pleasure collaborating and
working with each one of you.
I would like to acknowledge the support of my peers in the Woods’ group: Dr.
Arunima Singh, Amika Sood, Dr. Jodi Hadden, Mark Baine, Dr. Xiaocong Wang, Dr.
Keigo Ito, Dr. Oliver Grant, Huimin Hu, Dr. Valerie Murphy, Dr. Mari DeMarco, Mia Ji,
Dr. Elisa Fadda, Dr. Joanne Martin and Dr. Hannah Smith. Matt, thank you for helping
me when I was a newbie in the group, and amongst other things, for teaching me to do
docking, which constitutes a major portion of my dissertation today. Arunima, Amika,
Spandana and Jodi, thank you for being with me through the ups and downs in Graduate
School. Keigo, thank you for helping me with all my QM questions and for your tips on
scientific writing. Mark, thank you for being a huge support during my time in the group
and for all of your efforts in keeping everything around the lab in order.
I am thankful to God for being my Provider and for all of His blessings at every stage
of my life as a graduate student. I would like to acknowledge the unconditional love,
support and encouragement given by Mama and Papa. Thank you for being my greatest
cheerleaders. I would like to extend my heartfelt thanks to Amy, Ashley, Niranjana,
Madison, Jagadish, Cookieday, Adwoa, Ken, Femi, Anna, Ebenezer, Adeline, Savior
Karnik and Manikins, for being there for me, for believing in me, cheering me on and
supporting me throughout Graduate School. I could not have done it without your solid
support.
I would certainly not be where I am if not for all of the wonderful people who have
sown into my life and my career. For them, I am forever grateful.
LIST OF TABLES
Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape
RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the
crystallographic ligands.
Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and
CHI-cutoff.
Table 5.2 PRMSDmin(5) produced by ADV and VC1|2 for the 12 test systems with ligands
containing 1,6-linkages.
Table 5.3 Comparison between ADV and VC1|2 for the apo proteins Test Set.
Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2
before and after rescoring as a function of the CH/π interaction energy
coefficients. The systems are divided into different groups based on the number of
detected CH/π interactions.
LIST OF FIGURES
Figure 2.1 An illustration of the conversion between the open-chain and ring forms of glucose.
Figure 2.2 A representation of the two chair conformations of glucose, namely, 4C1 and 1C4.
Figure 2.3 A 1-3 glycosidic linkage formation between a glucopyranose (Glcp) unit and a
galactopyranose (Galp) unit. The D in the name denotes the configuration at the
highest-numbered stereocenter (C5), which matches that of D-glyceraldehyde; it does not
indicate the direction in which the molecule rotates plane-polarized light.
Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose
and mannose are C2 epimers.
Figure 3.1 a.) Rigid Docking b.) Flexible Ligand Docking
Figure 3.2 The workflow within the AutoDock Vina algorithm.
Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the
grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green
dot represents the center of the grid box (0,0,11). (b) Aligned orientation of an
antibody antigen-binding fragment (Fab), with respect to the internal reference
axes. The region in red + pink represents the VH domain (CDRs (red) and
framework regions (pink) of the heavy chain) of the antibody, while the region in
blue represents the VL domain (CDRs (dark blue) and framework regions (cyan)
of the light chain). The X-axis for the alignment was defined by a vector passing
through the CoM of the variable light chain (VL domain, which contains the light
chain CDRs and framework sequences), and the CoM of the variable heavy chain
(VH domain). The Z-axis was defined as a vector normal to the X-axis, and
passing through the CoM of the entire variable region, or variable fragment (Fv).
The antibody was then translated so that the CoM of the CDRs was placed at the
origin. The Y-axis was defined as a vector perpendicular to the XZ-plane, and
passing through the origin. The docking grid box was aligned to the internal co-ordinate
axes with its center offset from the origin by 11 Å along the Z-axis, so as
to optimally encompass the CDR loops, while also permitting adequate volume
for the movement of the ligand during docking. Such a definition enabled the
docking grid box to be consistently aligned with respect to the CDRs.
Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD and
SRMSD, respectively, of a representative docked pose with respect to its crystal
ligand. (a) The PRMSD is the RMSD between the ring atoms of a representative
docked structure (white) and the corresponding crystal structure (black). (b) The
SRMSD is the RMSD value obtained after the docked structure (white) is
superimposed on the crystal structure (black).
Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected
linkages, as indicated by the dashed rectangle. Data are presented, in order, for
AD3 (black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing
the experimentally-determined values is highlighted with a light blue outline. The
bin containing the structure with the lowest docked energy is indicated as follows:
AD3, yellow; AD4.2, orange; ADV, green.
Figure 4.4 Representation of the 8 model disaccharides pertinent to the development of
CHI energy functions. The models depicting 1,2-linkages can be used to model
1,4-linkages due to symmetry about the O5 atom.
Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for
models (see Figure 4.4) whose linkages have similar local geometries.
Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion
angle distributions of carbohydrates from experimental co-crystal structures
(histograms).
Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between
SRMSD and docked energies after rescoring, for each of the three docking
programs. Points before rescoring are shown in dark grey and points after
rescoring are shown in light grey. Shown in the insets are SRMSD vs. docked
energy plots of only the overall lowest PRMSD structure for each of the six
antibody systems before (dark grey) and after (light grey) rescoring. The black
rectangles in all insets enclose plot areas with SRMSD ≤ 1 Å and energies ≤ 0 kcal/mol.
Figure 4.8 Graphs showing the distribution of conformations produced by AD3 ( ),
AD4.2 ( ) and ADV ( ) plotted onto the corresponding CHI energy curves for
each of the representative linkage combinations; the curves are offset from each
other by 6 kcal/mol.
Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2
and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the
lowest energy poses for all six systems from all three docking programs, before
(dark grey) and after (light grey) rescoring.
Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to
the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after
inclusion of the CHI energy (white) compared to the crystal ligand (black);
PRMSD = 0.6 Å.
Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA).
(a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared
to the crystal ligand (black); PRMSD = 5.5Å. (b) Lowest energy pose from ADV
for 1MFD after rescoring (white) compared to the crystal ligand (black); PRMSD
= 1.0Å. (c) and (d) show the 1MFD antibody in transparent surface representation
along with the oxygen atom belonging to the water molecule from the
crystallographic co-complex, WAT 601; in (c) the crystal ligand from 1MFD is
shown in CPK representation, and in (d) the lowest energy pose from ADV for
1MFD before rescoring is shown (in CPK representation), with the Gal residue
replacing Abe within the binding pocket. (e) The Gal residue from the
ligand in 1MFD (in van der Waals representation) after being superimposed onto
the Abe residue from the ligand in 1MFA is shown within the 1MFA binding site.
A cross-section of the 1MFA antibody is represented as a transparent surface with
potential steric clashes visible between the Gal residue and the antibody. (f) Same
as (e) but with the 1MFA antibody represented as an opaque surface thus more
clearly depicting potential steric clashes between the O-3 and O-6 groups of the
Gal residue and the interior of the binding pocket.
Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose
before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9Å.
(b) Lowest energy pose after rescoring (white) compared to the crystal ligand
(black); PRMSD = 10.7Å.
Figure 5.1 a.) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed
line) and 5 (dotted line) to the original VCΦ|β curve. b.) The effect of applying a
CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2).
Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various CHI-energy
coefficients of VC: a.) SRMSDavg amongst the 5 top-ranked poses; b.) PRMSDmin(5).
Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curve to
the distribution of glycosidic linkages in carbohydrate crystal structures in the
PDB. The bottom X-axis and left Y-axis correspond to the histogram which
depicts the distribution of PDB structures, while the top X-axis and right Y-axis
correspond to the CHI-energy curves.
Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for 12 test
systems containing one or more 1,6-linkages overlaid against the reference crystal
structure ω angles (red dots) and the corresponding CHI energy curve.
Figure 5.5 a.) The PRMSDmin(5) pose from ADV compared to the reference ligand (blue).
b.) The PRMSDmin(5) pose from VC1|2 compared to the reference ligand (blue). c.)
The Φ torsion angles of α-sugars from the docked poses of the 3C6S ligand from
both ADV (yellow triangles) and VC1|2 (green squares) plotted onto the CHI
curve. The torsion angles corresponding to the reference are plotted as blue
circles.
Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD)
is depicted in complex with a tetrasaccharide ligand. All amino acids further than
5 Å away from the ligand are colored grey. Those residues within 5 Å are colored
orange if they are cyclic and red if acyclic.
Figure 5.7 a.) Models representing the PRMSDmin(5) produced by docking 1MFC with
ADV (yellow) and VC at CHI1|2 (green). The primary difference between docked
models is a rhamnose ring that is flipped approximately 180 degrees, highlighted
by the orange arrows. b.) Ligands from two crystal structures, 1MFB (blue) and
1MFC (cyan), also differ by the orientation of the RAM 524 ring.
Figure 5.8 A depiction of the ranks of acceptable poses (Rankacc), i.e., the lowest-ranked
pose with PRMSD ≤ 2Å, produced by ADV and VC1|2 from docking
oligosaccharide ligands onto apo protein structures.
Figure 5.9 a) The ligands from five crystal structures (PDB ID: 2E0P, 2EO7, 2EEX,
2EJ1, and 2EQD) of the Cel44A enzyme are superimposed on the protein from
PDB ID: 2EQD. Amino acids reported to be involved in substrate binding (N45,
R47, W64, W71, W327, W331, E359, and W392) are colored orange or red,
depending on whether the residue is aromatic or not.146 The catalytic residue
(Q186) is colored yellow. All other amino acids are grey. The active site has been
separated into a (-) and (+) site. The circled values represent the position of each
residue relative to the glycosidic linkage that is cleaved during catalysis. The
ligands exclusive to the (-) side of the active site are depicted by varying shades
of purple. The octasaccharide that extends across both the (-) and (+) site (2EQD)
is colored blue. Each carbohydrate ring is colored according to whether the CHI
energy penalty is applied to the surrounding Φ/Ψ values. Rings are either green or
red depending on whether VC is or is not applied, respectively. b) A
representation of the PRMSDmin(5) and PRMSDmin(20) poses from ADV and VC1|2.
c) The glycosidic linkages of the octasaccharide that extends across the active site
(2EQD) are labeled according to the penalty received by the CHI energy curve.
Penalties greater than 2 kcal/mol are highlighted in red. VC is not applied to the (-1)
residue since it is neither a 4C1 nor a 1C4 chair, so the ring is colored red and the
penalties are not listed.
Figure 6.1 In the study by Waters et al.,154 replacing the aromatic group in (A) with the
aliphatic group in (B), in an interaction with a tetraacetylglucose molecule, led to a
decrease in the interaction energy of the system.
Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic
amino acids, namely, a Tryptophan and a Tyrosine, in the binding pocket of an
antibody Fab fragment (PDB ID: 1MFE).33
Figure 6.3 Representation of CH/π interactions between β-D-Glucopyranose (βDGlcp)
and Phenylalanine.
Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to
describe the interaction between a CH-group and an aromatic moiety.
Figure 6.5 Detection of CH/π interactions a.) An average position of the co-ordinates of
the atoms C2, O5 and O1 is determined. In order to find the vector C1H1, the
negative of the vector between points C1 and the average of atom positions C2, O5
and O1 (computed in (a.)) is determined. b.) The distance between the centroid of
the aromatic ring and the plane of the carbohydrate ring delineated by atoms O5,
C2, C3 and C5 is determined, dcenters (≤ 7Å). c.) The carbon atoms in the
carbohydrate ring are projected onto the aromatic ring plane and the distances
between each of these projections and the centroid of the aromatic ring are
determined, dcp (≤ 2.5Å). Shown in green are the CH bond vectors pointing
towards the aromatic ring (scored), and shown in red are the CH bond vectors
pointing away from the aromatic ring (not scored).
Figure 6.6 The effect of applying the CH/π interaction to the top-ranked pose produced
by VC1|2 before and after rescoring. Shown in green is the crystal ligand, in white
is the top-ranked pose before rescoring (PRMSD = 5.6Å) and in blue is the top-ranked
pose after rescoring (PRMSD = 0.9Å).
Figure 6.7 Model systems used by Ringer et al. to quantify CH/π interactions using
quantum mechanical calculations.
Figure 6.8 a.) The individual interaction energy curves for the models (as described in
Figure 6.7) used by Ringer et al.,155 alongside the average of the individual
curves. b.) The average curve from (a) shown alongside the mathematical model used
in the current study.
CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
1. Introduction
2. Carbohydrates: Biological Significance and Structure
3. Computational Methods/Molecular Docking
4. The Importance of Ligand Conformational Energies in Carbohydrate Docking:
Sorting the Wheat from the Chaff
Abstract
Introduction
Methods
Results and Discussion
Conclusions
Individual Author Contributions
5. Vina-Carb: Improving Glycosidic Angles During Carbohydrate Docking
Abstract
Introduction
Methods
Results & Discussion
Conclusions
Individual Author Contributions
6. The Consideration of CH/π Interactions in Carbohydrate-Protein Docking
Introduction
Methods
Results and Discussion
Conclusions
Future Directions
7. CONCLUSIONS
8. REFERENCES
9. Appendix
Supplementary Information Chapter 4
Supplementary Information Chapter 5
Supplementary Information Chapter 6
1. INTRODUCTION
This dissertation can be sub-divided into the following sections:
1. The comparison of docking programs for carbohydrate docking and the
development of Carbohydrate Intrinsic (CHI) Energy Functions, which describe
the rotational preferences of oligosaccharides about the glycosidic linkage.
2. The development and evaluation of Vina-Carb, formed by incorporating the CHI
energy functions within the scoring function of AutoDock Vina, and comparison
to the original program, AutoDock Vina.
3. The development of a CH/π interaction energy term to score CH/π interactions in
protein-carbohydrate complexes and the application of the function to docked
protein-carbohydrate complexes.
The above topics, along with a literature review of background information and the
computational methods applied in each case are presented in the following manner:
CHAPTER 2: CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND
STRUCTURE
Chapter 2 discusses the structure and biological significance of carbohydrates and of
protein-carbohydrate interactions.
CHAPTER 3: MOLECULAR DOCKING
Chapter 3 discusses the theory behind molecular docking, a computational method used to
predict intermolecular interactions. It further discusses the challenges associated with
docking carbohydrate ligands and, in particular, describes the AutoDock Vina docking
algorithm. This chapter also introduces the research described in the following chapters.
CHAPTER 4: IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN
CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF
Chapter 4 is an original research study in which the performance of various versions of
the popular docking program AutoDock is compared using a set of antibody-carbohydrate
complexes. A set of Carbohydrate Intrinsic (CHI) energy functions is developed to
describe the conformational preferences of the glycosidic linkages that constitute
oligosaccharides. The CHI energy functions are then employed to rescore the docked
poses. The results from this study were published as a journal article.
A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem.
2014, 35, 526–539.
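The CHI energy functions are evaluated on glycosidic torsion angles (φ and ψ, plus ω for 1,6-linkages), so measuring those angles from atomic coordinates is the first practical step. The Python below is a minimal sketch of a signed torsion-angle calculation from four points. The atom definitions used for φ and ψ (φ = O5-C1-Ox-Cx, ψ = C1-Ox-Cx-C(x-1)) are an assumption made for illustration, and `glycosidic_phi_psi` and `wrap_360` are hypothetical helpers, not functions from AutoDock Vina or Vina-Carb.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)              # bond vector p1 -> p0
    b1 = _sub(p2, p1)              # central bond p1 -> p2
    b2 = _sub(p3, p2)              # bond vector p2 -> p3
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def wrap_360(angle: float) -> float:
    """Map an angle in degrees onto [0, 360)."""
    return angle % 360.0


def glycosidic_phi_psi(o5: Vec3, c1: Vec3, ox: Vec3, cx: Vec3,
                       cx_prev: Vec3) -> Tuple[float, float]:
    """Hypothetical helper: phi = O5-C1-Ox-Cx, psi = C1-Ox-Cx-C(x-1) (assumed convention)."""
    return dihedral(o5, c1, ox, cx), dihedral(c1, ox, cx, cx_prev)
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` returns 90.0 (up to floating-point rounding), and `wrap_360(-90.0)` gives 270.0, which places the value on a 0 to 360° axis. A torsion-based energy term can then be looked up or evaluated at these angles.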
CHAPTER 5: VINA-CARB: IMPROVING GLYCOSIDIC ANGLES DURING
CARBOHYDRATE DOCKING
Chapter 5 describes original research in which the CHI energy functions were
incorporated within AutoDock Vina’s scoring function, leading to the development of
Vina-Carb. The performances of Vina-Carb and AutoDock Vina were evaluated using a
set of protein-carbohydrate complexes consisting of antibodies, lectins, carbohydrate
binding modules and enzymes. This work has been accepted for publication.
A. K. Nivedha, D. F. Thieker, R. J. Woods, J. Chem. Theory Comput. 2015
CHAPTER 6: THE CONSIDERATION OF CH/π INTERACTIONS IN
CARBOHYDRATE-PROTEIN DOCKING
Chapter 6 describes original research in which a mathematical model for scoring CH/π
interactions in protein-carbohydrate complexes was developed from data available in the
literature and then employed to rescore docking results from AutoDock Vina and
Vina-Carb, for a test set consisting of lectin-carbohydrate complexes.
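The caption of Figure 6.4 identifies the CH/π model as a Lennard-Jones potential, and the caption of Figure 6.5 lists the geometric criteria used to detect CH/π contacts (dcenters ≤ 7 Å, dcp ≤ 2.5 Å, and a CH bond vector pointing toward the aromatic ring). The Python below is a minimal sketch of how such a term could be used to rescore poses, under loudly stated assumptions: it uses a generic 12-6 Lennard-Jones form with its minimum of -ε at r = r_min, placeholder values for ε and r_min (not the fitted values from this work), an assumed definition of the distance r, and a simple additive combination with the docking score. All names (`lj_energy`, `Contact`, `Pose`, `rescore`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical placeholder parameters; NOT the values fitted in this work.
EPSILON = 1.0   # well depth in kcal/mol (assumed)
R_MIN = 4.0     # distance of the energy minimum in Å (assumed)

# Detection cutoffs as listed in the caption of Figure 6.5.
D_CENTERS_MAX = 7.0  # Å
D_CP_MAX = 2.5       # Å


def lj_energy(r: float, epsilon: float = EPSILON, r_min: float = R_MIN) -> float:
    """12-6 Lennard-Jones form: E(r) = eps*[(rmin/r)^12 - 2*(rmin/r)^6]; E(r_min) = -eps."""
    s6 = (r_min / r) ** 6
    return epsilon * (s6 * s6 - 2.0 * s6)


@dataclass
class Contact:
    r: float             # CH-to-ring distance fed to the energy term (assumed definition)
    d_centers: float     # aromatic centroid to carbohydrate ring-plane distance
    d_cp: float          # projected ring carbon to aromatic centroid distance
    points_toward: bool  # True if the C-H vector points toward the aromatic ring


def is_scored(c: Contact) -> bool:
    """Only contacts passing every detection criterion contribute energy."""
    return c.d_centers <= D_CENTERS_MAX and c.d_cp <= D_CP_MAX and c.points_toward


@dataclass
class Pose:
    name: str
    dock_score: float                    # docking score in kcal/mol (lower is better)
    contacts: List[Contact] = field(default_factory=list)


def rescore(poses: List[Pose]) -> List[Tuple[str, float]]:
    """Add the CH/π energy to each docking score and sort, best (lowest) first."""
    out = []
    for p in poses:
        ch_pi = sum(lj_energy(c.r) for c in p.contacts if is_scored(c))
        out.append((p.name, p.dock_score + ch_pi))
    return sorted(out, key=lambda t: t[1])
```

As an illustration with made-up inputs (not results from this work), a pose "A" scored -7.0 with no contacts and a pose "B" scored -6.5 with one qualifying contact at r = 4.0 Å plus one contact failing the dcp cutoff are re-ranked to `[("B", -7.5), ("A", -7.0)]`. Keeping the geometric gate separate from the energy function makes it easy to swap in a different distance model without changing the detection logic.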
CHAPTER 7: CONCLUSIONS AND FUTURE DIRECTIONS
Chapter 7 summarizes the main conclusions from the preceding chapters and discusses
future directions.
2. CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND STRUCTURE
Carbohydrates play a central role in energy metabolism and biological recognition, and
serve as structural components in living organisms.1-6 Carbohydrate-binding proteins
are required for transportation, degradation, biosynthesis, storage, antigen binding and
signaling.7,8 Carbohydrates may exist either as freestanding entities or covalently linked
to macromolecules such as proteins (glycoproteins) and lipids (glycolipids). These
glycoconjugates are frequently found attached to outer cell surfaces, where they are
conveniently positioned to modulate interactions between various components of the
living system by mediating cell-cell and cell-molecule interactions.9 When
oligosaccharides are organized in the form of glycoconjugates, the size of the attached
oligosaccharides alone can influence the interactions of the glycoconjugates with other
molecules. For example, N-glycosylation and O-glycosylation are common
post-translational modifications of proteins,10-14 which protect the protein from
degradation and assist in intracellular trafficking and secretion.2 Aberrant glycosylation
is often a hallmark of diseases such as rheumatoid arthritis15-19 and cancer.20-23
Many carbohydrate-based host-pathogen interactions are currently known.24
Surface polysaccharides are the most common structures found on the outer surfaces of
bacterial cells.25,26 In Gram-negative bacteria, carbohydrates are constituents of
lipopolysaccharides, lipooligosaccharides and capsular polysaccharides.27 The conjugation
of a polysaccharide to a carrier protein has led to commercially available vaccines, such
as those against Haemophilus influenzae28 and Streptococcus pneumoniae.29 Many bacterial
and viral pathogens bind to host tissue via interactions with carbohydrates on the
surfaces of host cells. Antibodies contain glycans as part of their structure, and some
antibodies are reactive against sugars found on the cell surfaces of bacteria such as
Shigella and Salmonella.30-35
Of the four major classes of macromolecules found in living organisms, namely,
nucleic acids, proteins, carbohydrates and lipids, carbohydrates are the most structurally
diverse.36 They are primarily defined as polyhydroxyaldehydes or polyhydroxyketones,
and in their simplest form exist as monosaccharides, which combine with each other via
glycosidic linkages to form oligosaccharides. Monosaccharides can exist in both open-chain
and ring forms. When the carbonyl group (C=O) of the open-chain form lies at one end of
the chain, forming an aldehyde, the monosaccharide is called an aldose; when the carbonyl
group lies within the chain, forming a ketone, it is called a ketose. For an aldohexose
such as glucose, the ring form, which is the preferred form in aqueous solution and in
oligosaccharides, is formed when the oxygen on C5 (O5) bonds to the carbonyl carbon (C1)
and transfers its hydrogen to the carbonyl oxygen, forming a hydroxyl group. This creates
a chiral anomeric center at C1. The oxygen at C1 (O1) can be either axial or equatorial
with respect to the carbohydrate ring. Although steric effects alone would favor the less
hindered equatorial orientation, stereoelectronic effects stabilize the axial orientation
of the electronegative O1 atom. This is known as the anomeric, or more accurately, the
endo-anomeric effect.
6
Figure 2.1. Conversion between the open-chain and ring forms of glucose, showing the anomeric carbon and the α- and β-glucopyranose anomers.
Monosaccharides forming a five-membered ring are called furanoses, and those forming a six-membered ring are called pyranoses. Like cyclohexanes, six-membered monosaccharides most often exist in one of two chair conformations, designated ¹C₄ and ⁴C₁, where the letter C stands for ‘chair’ and the superscript and subscript numbers indicate the ring atoms lying above and below, respectively, the reference plane formed by atoms C2, C3, C5 and O5 (Figure 2.2).
Figure 2.2 The two chair conformations of glucose, ⁴C₁ and ¹C₄.
The individual units constituting proteins and nucleic acids are generally connected in a linear fashion by a single type of linkage: the amide linkage between amino acids in proteins, and the 3′ to 5′ phosphodiester linkage in nucleic acids. 37 Oligosaccharides, however, can be linear or branched, and each monosaccharide unit can be linked to another via glycosidic linkages of different types, depending on the configuration (α or β) at C1 of the non-reducing sugar and on the position of the linking atom on the reducing sugar. A disaccharide is formed when two monosaccharides combine via a condensation reaction, releasing a water molecule and forming a glycosidic bond. The resulting molecule has a reducing sugar at one end and a non-reducing sugar at the other.
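Figure 2.3 Formation of a 1-3 glycosidic linkage between a β-D-glucopyranose (Glcp) unit and a β-D-galactopyranose (Galp) unit, with the release of H2O, yielding βDGlc1-3βDGal. The D prefix denotes the configuration of the stereocenter farthest from the carbonyl carbon (C5 in hexoses), which matches that of D-glyceraldehyde; it does not indicate the direction in which the molecule rotates plane-polarized light.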
Different kinds of sugars exist in nature, and most saccharides differ mainly in the orientation of their hydroxyl groups with respect to the plane of the carbohydrate ring, which results in significant differences in their physical and chemical properties. Glucose and mannose are C2 epimers, while glucose and galactose are C4 epimers (Figure 2.4). These hexoses share the molecular formula C6H12O6. The stereochemical configurations of these aldohexoses were determined by the German chemist Emil Fischer in the late 19th century. 38
Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose
and mannose are C2 epimers.
The three-dimensional structures of carbohydrates are greatly influenced by the conformations of the glycosidic linkages connecting individual monosaccharide units. The lone pairs of electrons on the ring oxygen (O5) have a significant effect on the conformational stability and orientation of the glycosidic linkage. 39,40 Because of the anomeric effect, the electronegative substituent at C1 tends to adopt the axial rather than the equatorial orientation, contrary to expectations based solely on sterics. 41-46
Previous analyses of glycosidic linkage preferences show that carbohydrates strongly prefer a single rotamer for each of the Φ and Ψ torsion angles, and that the preferred range of values is broader for Ψ than for Φ. Some proteins are also known to distort carbohydrate ring shapes, and consequently the glycosidic linkages, upon binding. However, a survey of protein-carbohydrate crystal complexes in the PDB, in which the oligosaccharide is bound to enzymes as well as to other proteins such as lectins and antibodies, revealed that distortion of the glycosidic linkage by carbohydrate-binding partners is rare. 47,48
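Φ and Ψ are ordinary dihedral angles, so measuring them from atomic coordinates only needs a signed four-atom torsion. The sketch below assumes the atom definitions used later in this chapter, φ = O5′-C1′-Ox-Cx and ψ = C1′-Ox-Cx-C(x-1); the function names are hypothetical, and it should follow the IUPAC sign convention (clockwise rotation of the front bond onto the rear bond is positive).

```python
import math
from typing import Sequence

def _sub(a: Sequence[float], b: Sequence[float]) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Sequence[float], b: Sequence[float]) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, within [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = tuple(a - _dot(b0, b1) * c for a, c in zip(b0, b1))
    w = tuple(a - _dot(b2, b1) * c for a, c in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Usage (coordinates would come from a PDB file; none are given here):
# phi = dihedral(o5_prime, c1_prime, o_x, c_x)
# psi = dihedral(c1_prime, o_x, c_x, c_x_minus_1)
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes, for example, a φ of +60° from one of -60°.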
Carbohydrate-Protein Complexes
Proteins that bind to carbohydrates have a great diversity of binding-site topologies and functions, and include enzymes, lectins, antibodies and periplasmic receptors. 49 Complex formation is driven primarily by hydrogen bonding, van der Waals contacts and hydrophobic interactions. 50 Hydrogen bonding contributes to specificity 51 by virtue of the directionality of the hydroxyl groups, whereas the latter two contribute to affinity through non-specific interactions. 52 Being highly polar molecules, sugars are highly solvated in aqueous solution. The hydroxyl groups in a sugar molecule are involved in cooperative hydrogen bonds, bidentate hydrogen bonds and hydrogen-bonding networks. 53 Each hydroxyl group in a saccharide can take part in hydrogen bonds both as the donor of one hydrogen bond and as the acceptor of two, through its sp3 lone pairs. When the sugar hydroxyl group is the donor, the hydrogen bonds formed are shorter and stronger than those formed when it is the acceptor. 54
In cooperative hydrogen bonds, the sugar hydroxyl group acts as both a donor and an acceptor. A bidentate hydrogen bond is formed when two adjacent hydroxyl groups of a ⁴C₁ sugar each interact with a different atom of the same planar polar side chain. Together, cooperative and bidentate hydrogen bonds create networks of hydrogen bonds between the sugar and the interacting amino acids, and when these planar polar residues in turn hydrogen bond with nearby polar residues, a more elaborate network results. The hydrogen bonds formed are strong enough to stabilize the complex, yet weak enough to accommodate ligand dynamics. Amino acids with planar polar side chains capable of forming all three kinds of hydrogen bonds, such as Glu, Gln, Asp, Asn, Arg and His, are abundant in sugar-binding sites. 51
Van der Waals interactions make a significant contribution to protein-carbohydrate complex formation, as do other interactions such as the stacking of hydrophobic patches of carbohydrate rings against aromatic amino acids lining the binding site. An analysis of protein-carbohydrate complexes in the PDB has revealed that carbohydrate-binding sites have a higher propensity than the rest of the protein for the aromatic amino acids tryptophan, tyrosine, phenylalanine and histidine. 55-57 Aromatic amino acids in the sugar-binding site also contribute to specificity by admitting or excluding particular sugar epimers through a combination of steric hindrance and a favorable or unfavorable polar environment. 58
A wealth of information can be gained from an understanding of the structure and dynamics of protein-carbohydrate interactions. However, carbohydrates are extremely flexible molecules, 59 which makes protein-carbohydrate complexes particularly challenging to crystallize. As a result, computational methods such as molecular docking and molecular dynamics simulations can be employed to gain insight into the physical and biochemical properties of carbohydrate molecules, both free in solution and in complex with proteins. The knowledge thus gained has applications including gene therapy and the design of carbohydrate-based biotherapeutic agents.
3. COMPUTATIONAL METHODS/MOLECULAR DOCKING
A detailed understanding of the three-dimensional structure, and consequently the function, of carbohydrates is vital to our understanding of crucial biological processes. However, obtaining experimental 3D structures of carbohydrates is challenging, 60 and theoretical modeling methods can therefore be employed to help understand the relationship between the structure and function of oligosaccharides. Molecular docking and molecular dynamics simulations are key computational approaches used in the study of carbohydrate molecules. In this chapter, we focus on molecular docking methodologies, specifically in relation to oligosaccharide ligands.
Molecular docking predicts the binding orientation and affinity of a small molecule (the ligand) with respect to a larger molecule (the macromolecule). The region around the predicted ligand binding site on the macromolecule is specified using a grid box. The two main steps in docking are searching and scoring: the search algorithm explores the available conformational space for favorable binding modes of the ligand with respect to the macromolecule, while the scoring function evaluates each pose generated by the algorithm. A compromise must be made between speed and thoroughness in sampling the available conformational space. At the end of a docking run, the program typically produces several models, which are ranked by their calculated binding affinities.
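The search, score and rank steps described above can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: a "pose" is reduced to the ligand's center position, the "scoring function" is a made-up quadratic well around an assumed binding-site point, and the search is plain random sampling inside the grid box rather than any real docking algorithm.

```python
import random

# Hypothetical binding-site point and toy scoring function (lower is better,
# as with docking energies). Neither comes from a real docking program.
SITE = (1.0, -2.0, 3.0)

def toy_score(pose):
    return sum((p - s) ** 2 for p, s in zip(pose, SITE)) - 7.0

def dock(center, size, n_samples=2000, n_keep=5, seed=0):
    """Random search inside the grid box, score every pose, return the best n_keep."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_samples):
        # Search: draw a pose uniformly from within the grid box.
        pose = tuple(c + rng.uniform(-s / 2, s / 2) for c, s in zip(center, size))
        # Scoring: evaluate each generated pose.
        scored.append((toy_score(pose), pose))
    # Ranking: sort so that the most favorable pose comes first.
    scored.sort(key=lambda item: item[0])
    return scored[:n_keep]

for score, pose in dock(center=(0.0, 0.0, 0.0), size=(10.0, 10.0, 10.0)):
    print(f"{score:7.2f}  {pose}")
```

The `n_samples` parameter plays the role of the speed-versus-thoroughness trade-off: more samples explore the box more thoroughly at a higher cost.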
There are different approaches to docking, such as rigid docking and flexible docking (Figure 3.1). When all torsion angles are frozen during docking, it is termed rigid docking; during flexible docking, some, if not all, of these torsion angles are allowed to vary. If significant conformational change occurs in the protein, the ligand, or both upon complex formation, rigid docking is inadequate to model the binding event. In such cases, flexible docking, which allows for induced fit during complex formation, should be the method of choice. The user can set the level of computational complexity of a docking run by adjusting the flexibility of the ligand and the macromolecule. Proteins can often be docked rigidly because a comparison of experimental protein-ligand complexes with their unbound counterparts has revealed that, in most cases, only a few side chains in the active site of the protein change conformation.
Figure 3.1 (a) Rigid docking. (b) Flexible ligand docking. In each panel, the ligand is docked to the macromolecule within a grid box, and the resulting docked complexes (1, 2, …, n) are ranked according to their binding affinities.
A scoring function assesses protein-ligand complementarity rather than true binding affinity; even ligands that do not bind can be docked and assigned a binding affinity score. Nevertheless, docking has proved to be an indispensable computational tool for obtaining a 3D starting structure of a bound protein-ligand complex that could not be obtained experimentally. It also makes it possible to assess the binding of multiple small molecules to a single protein target and to compare their binding affinities. Protein-ligand complementarity is a prerequisite for binding, but it cannot be used as the sole criterion for evaluation.
Docking scoring functions evaluate how well the predicted binding pose of a ligand complements the protein binding site, and can be empirical or knowledge-based. Empirical scoring functions assume that binding affinities can be evaluated as a sum of independent interaction energy terms, in most cases a weighted sum of electrostatic, hydrogen-bonding, hydrophobic and repulsion terms. The coefficients of the individual terms are derived by fitting to experimentally determined Ki values of protein-ligand complexes with solved crystal structures. In general, these scoring functions suffer from a significant dependence on ligand size: the larger the docked ligand, the more favorable the calculated binding affinity. Knowledge-based scoring functions are derived from a statistical analysis of experimentally determined protein-ligand complexes, on the assumption that contacts which occur at a statistically significant rate must be favorable, and that contacts which are statistically rare must be unfavorable.
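The two families of scoring functions can be contrasted in a few lines. This is a sketch under stated assumptions: the weights are hypothetical placeholders, not those of any published program; the fitting targets come from the standard relation ΔG = RT ln Ki (Ki in mol/L); and the knowledge-based term uses a common inverse-Boltzmann form, E = -RT ln(observed / expected), which is one way, not the only way, such potentials are built.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

# Hypothetical weights for illustration only; real empirical functions fit their own.
WEIGHTS = {"electrostatics": 0.15, "hbond": -0.6, "hydrophobic": -0.03, "repulsion": 0.8}

def empirical_score(terms):
    """Empirical scoring: a weighted sum of independent interaction terms (kcal/mol)."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())

def dg_from_ki(ki_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from an inhibition constant: dG = RT ln Ki.

    Values like this are the experimental targets to which empirical weights are fitted.
    """
    return R_KCAL * temperature * math.log(ki_molar)

def knowledge_based_energy(observed_count, expected_count, temperature=298.15):
    """Inverse-Boltzmann energy for one contact type.

    Contacts seen more often than expected (ratio > 1) get a negative, favorable energy;
    contacts seen less often than expected get a positive, unfavorable one.
    """
    return -R_KCAL * temperature * math.log(observed_count / expected_count)

print(round(dg_from_ki(1e-6), 2))  # -8.19
```

Because every contact adds another term to a sum like `empirical_score`, a larger ligand tends to accumulate a more favorable total, which is the size dependence noted above.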
Several factors affect the performance of a docking scoring function, including the physical and chemical properties of the input molecules, the preparation of the input and the individual terms of the scoring function. Docking scoring functions are usually developed for high-throughput virtual screening of relatively small, rigid, drug-like molecules. In this thesis, we study the performance of such docking methodologies with carbohydrate ligands, which are larger, more flexible molecules ranging from disaccharides to dodecasaccharides connected by 1,x-linkages (x = 2, 3, 4 or 6). Applying these generalized scoring functions to carbohydrate docking usually causes the carbohydrate ligands to deviate unfavorably from their natural conformations, so it may be useful to customize docking scoring functions specifically for carbohydrate ligands.
The glycosidic torsion angles connecting individual monosaccharide units have a major influence on the overall conformation of an oligosaccharide ligand. Although these linkages are generally flexible, this flexibility spans a limited range of preferred torsion angles, which has been identified from a survey of carbohydrate crystal structures in the PDB. 48 That survey included all protein-carbohydrate complexes found in the PDB, with carbohydrates both covalently and non-covalently bound to proteins such as lectins, antibodies, enzymes and carbohydrate-binding modules. Previous efforts to incorporate the conformational preferences of carbohydrates into molecular docking include re-calibrating an existing docking scoring function to model carbohydrate properties, adding interaction energy terms that are crucial to protein-carbohydrate binding, and adding a carbohydrate conformational energy score to an existing docking scoring function.
In this thesis, the performance of several docking programs is evaluated and compared using a set of antibody-carbohydrate complexes with solved X-ray crystal structures from the PDB, and a standardized protocol for docking oligosaccharide ligands to antibodies is described. A set of energy functions that calculate the conformational energies of carbohydrates has been derived using quantum mechanical methods. These carbohydrate internal energy functions, known as Carbohydrate Intrinsic (CHI) energy functions, score a disaccharide based on the orientations of its glycosidic torsion angles. Adding the CHI energies to the docked energies significantly improved the ranking of accurate binding poses. Finally, the CHI energy functions were incorporated into the scoring function of the docking program AutoDock Vina, leading to the development of Vina-Carb. The performance of Vina-Carb was evaluated against a set of 72 protein-carbohydrate complexes with solved crystallographic structures from the PDB, and compared to that of the original docking program without the CHI energy functions, AutoDock Vina.
For each AutoDock Vina docking job, multiple runs are started from random conformations. The number of individual runs is determined by the exhaustiveness parameter, which can be set by the user. Each run consists of a series of sequential steps, the number of which is determined heuristically from the number of flexible bonds in the system under study. Each step consists of three stages: a random perturbation of the system, a local optimization using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, and a selection stage in which the step is either accepted or rejected. Each local optimization involves numerous evaluations of the docking scoring function, and its termination is decided by convergence and other criteria. Each run can produce multiple promising results, which are stored and finally merged, clustered and sorted to produce the final set of docked poses (Figure 3.2).
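The run/step structure described above can be sketched on a one-dimensional toy problem. This is not AutoDock Vina's code: the energy surface is hypothetical, plain gradient descent stands in for BFGS, the selection stage is assumed to use a Metropolis criterion, the number of steps per run is a fixed parameter rather than Vina's heuristic, and clustering is a simple distance threshold.

```python
import math
import random

def toy_energy(x):
    """Hypothetical rugged 1-D surface standing in for a docking score."""
    return 0.05 * x * x + math.sin(3.0 * x)

def toy_gradient(x):
    return 0.1 * x + 3.0 * math.cos(3.0 * x)

def local_optimize(x, step=0.02, iterations=200):
    """Local optimization by plain gradient descent (a simple stand-in for BFGS)."""
    for _ in range(iterations):
        x -= step * toy_gradient(x)
    return x

def monte_carlo_run(rng, n_steps, kT=0.6):
    """One run: random start, then repeated perturb -> optimize -> select steps."""
    x = local_optimize(rng.uniform(-10.0, 10.0))
    e = toy_energy(x)
    found = [(e, x)]
    for _ in range(n_steps):
        trial = x + rng.gauss(0.0, 1.0)   # 1. random perturbation
        trial = local_optimize(trial)     # 2. local optimization
        e_trial = toy_energy(trial)
        # 3. selection: Metropolis criterion (an assumption for this sketch)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            found.append((e, x))
    return found

def merge_cluster_sort(results, tol=0.1):
    """Sort by energy and keep one representative per cluster of nearby poses."""
    ranked = []
    for e, x in sorted(results):
        if all(abs(x - kept_x) > tol for _, kept_x in ranked):
            ranked.append((e, x))
    return ranked

def dock(exhaustiveness=8, n_steps=50, seed=1):
    rng = random.Random(seed)
    merged = []
    for _ in range(exhaustiveness):  # one independent run per unit of exhaustiveness
        merged.extend(monte_carlo_run(rng, n_steps))
    return merge_cluster_sort(merged)

for energy, pose in dock()[:3]:
    print(f"E = {energy:6.3f} at x = {pose:6.3f}")
```

Raising `exhaustiveness` adds independent runs, which improves the chance of finding the global minimum at a proportional cost in scoring-function evaluations.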
Figure 3.2 The workflow within the AutoDock Vina algorithm. Each run (R1 … RN) consists of steps (S1 … SN); each step comprises a random perturbation, a local optimization (BFGS) involving evaluations of the scoring function, and a selection. The results of all runs are merged, refined, clustered and sorted to give the final result.
4. THE IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN
CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF
_____________________________
A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem. 2014, 35, 526–539. Reprinted here with the permission of the publisher.
Abstract
Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced
accuracy because they are unable to incorporate ligand-specific conformational energies.
Here, we develop internal energy functions, Carbohydrate Intrinsic (CHI), to account for
the rotational preferences of the glycosidic torsion angles in carbohydrates. The relative
energies predicted by the CHI energy functions mirror the conformational distributions of
glycosidic linkages determined from a survey of oligosaccharide-protein complexes in
the Protein Data Bank. Addition of CHI energies to the standard docking scores in AutoDock 3, 4.2, and Vina consistently improves the pose ranking of oligosaccharides docked to a set of anti-carbohydrate antibodies. The CHI energy functions are also independent of the docking algorithm and, with minor modifications, may be incorporated into both theoretical modeling methods and experimental NMR or X-ray structure refinement programs.
Introduction
Protein-carbohydrate interactions are crucial in numerous aspects of biology, including metabolism, gene expression, cell-cell communication, growth, development, and immune response. 9 In vivo, complex carbohydrates (glycans) are found on cell surfaces as glycoconjugates (glycoproteins/glycolipids) or polysaccharides, mediating biological function through their direct interaction with proteins such as receptors (lectins), enzymes, and antibodies. Cancer is marked by aberrant glycosylation, which can serve as a disease-related marker or as a target for therapeutic intervention. 22,61-63 Conversely, endogenous cell-surface glycans are frequently exploited by infectious agents, as in the hemagglutinin-mediated adhesion of influenza A virus. 64-66 A physical understanding of carbohydrate-protein interactions aids in the development of therapeutic agents designed to block such interactions, 67-70 such as antibodies that target specific glycans. 71,72 A better understanding of the immune system’s response to carbohydrate-based vaccines 73-76 facilitates the prediction and rationalization 71 of hazardous or misleading cross-reactivities between antibodies against disease-related carbohydrates and endogenous glycans. 77,78
The challenges involved in obtaining co-complexed carbohydrate-protein structures using experimental methods such as X-ray crystallography and NMR spectroscopy include production and purification of the protein, isolation or synthesis of the glycan, and co-crystallization of the complex. 60 There is therefore a long-standing interest in applying theoretical modeling methods (automated docking) to aid in characterizing the 3D structure of carbohydrate-protein complexes. 71,79-84 However, these methods also have limitations. Automated docking faces the triple challenge of accurately predicting 1) the ligand orientation in the binding site (pose); 2) the ligand conformation in the binding site (shape); and 3) the relative affinity of the optimal pose (interaction energy). Docking algorithms model ligand internal energies only approximately, mainly by considering energies associated with internal steric repulsion. Such an approximation inherently degrades the accuracy of docking predictions, because different ligand classes have specific conformational properties. The glycosidic torsion angles between the monosaccharides forming a glycan are crucial in defining its 3D structure and dynamics. The accurate prediction of oligosaccharide conformations requires the additional consideration of the stereo-electronic properties responsible for the anomeric, exo-anomeric, and gauche effects. 85 Their omission frequently leads to incorrect predictions of docked oligosaccharide conformations. 86-88
Docking programs treat interaction energy terms as empirically adjustable components, which may be tuned for a particular ligand class, such as carbohydrates. 89 Including carbohydrate conformational energies in the docking energy function would likely require reoptimization of the empirical weighting, resulting in a non-transferable, carbohydrate-specific implementation of the algorithm. Instead, we wished to develop a carbohydrate-specific conformational energy function that predicts oligosaccharide energies independently of the docking algorithm, and that could potentially also be employed to evaluate the conformational energies of experimentally determined oligosaccharide structures. We focused on modeling the conformational properties intrinsic to glycosidic linkages between pyranoses, with the criterion that the method should also be generalizable to other carbohydrate ring forms, such as furanoses, and to other linkages, such as 1-6, 2-3 and 2-6. Tetrahydropyran and related analogs have long been employed as representative carbohydrates in quantum mechanical calculations for this purpose, 90-97 on the assumption that any additional effects on the conformational properties, for example from hydrogen bonding, overlay the intrinsic properties of the linkages between pyran rings. Quantum mechanical calculations were performed on a set of glycosidically linked tetrahydropyrans representing all two-bond linkages between pyranoses, and the rotational energy profiles for these linkages were used to derive the desired carbohydrate intrinsic (CHI) energy functions. Given a 3D oligosaccharide structure, the CHI energy functions may be employed to estimate the energy arising from any distortion of the glycosidic linkages, relative to their lowest-energy conformations.
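The way an energy function of this kind could be applied to a structure is sketched below: each glycosidic linkage contributes one φ term and one ψ term, each measured relative to the minimum of its own rotational profile, and the linkage penalties are summed. The sum-of-Gaussians functional form and every parameter value here are assumptions for illustration only; they are not the published CHI functions or parameters.

```python
import math

# Hypothetical (amplitude, center in degrees, width in degrees) terms.
# Negative amplitudes create energy wells. NOT the published CHI parameters.
PHI_TERMS = [(-3.0, -70.0, 35.0), (-1.0, 60.0, 30.0)]
PSI_TERMS = [(-2.5, 180.0, 50.0), (-1.5, 0.0, 45.0)]

def wrap(delta):
    """Map an angular difference onto [-180, 180) degrees (torsions are periodic)."""
    return (delta + 180.0) % 360.0 - 180.0

def profile_energy(theta, terms):
    """Sum-of-Gaussians rotational energy (kcal/mol) at torsion angle theta."""
    return sum(a * math.exp(-wrap(theta - b) ** 2 / (2.0 * c * c)) for a, b, c in terms)

def relative_profile(terms):
    """Energy function shifted so that its minimum on a 1-degree grid is zero."""
    e_min = min(profile_energy(t, terms) for t in range(-180, 180))
    return lambda theta: profile_energy(theta, terms) - e_min

PHI_ENERGY = relative_profile(PHI_TERMS)
PSI_ENERGY = relative_profile(PSI_TERMS)

def chi_like_energy(linkages):
    """Total penalty (kcal/mol) for a list of (phi, psi) pairs, one per glycosidic linkage."""
    return sum(PHI_ENERGY(phi) + PSI_ENERGY(psi) for phi, psi in linkages)

# A linkage sitting in both minima costs ~0; a distorted one is penalized.
print(round(chi_like_energy([(-70.0, 180.0)]), 3))
print(round(chi_like_energy([(60.0, 90.0)]), 3))
```

Because the penalty depends only on torsion angles, it can be evaluated for a docked pose or a crystal structure alike, which is the algorithm independence the text aims for.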
Because of the important roles of anti-carbohydrate antibodies in therapeutic and diagnostic applications, and the challenges associated with experimentally defining their 3D structures, they have been the subject of numerous automated docking studies. 98-104 We chose six crystallographically determined antibody-carbohydrate complexes to evaluate the ability of the CHI energy functions to improve the predicted rankings of docked poses. These systems were selected based on the diversity of their antibody binding-site topologies (canyon, valley, crater) 105 and the size variation of their carbohydrate ligands (tri- to pentasaccharides, including linear and branched sequences).
Methods
System selection and docking protocol
Docking was performed using AutoDock 3.0.5 (AD3), 106 4.2 (AD4.2) 107 and Vina 1.1.2 (ADV). 108 Details of the reference systems, including PDB IDs, ligand sequences and biological origin, are presented in Table 4.1. In each case, the protein chain containing the ligand with the lowest average B-factor was selected for docking. The carbohydrate ligands in systems 1UZ8, 1S3K and 1M7I were built using the Carbohydrate Builder on GLYCAM-Web (www.glycam.org). 109 The remaining ligands contain the non-standard sugar residues abequose and 2-deoxy-rhamnose; oligosaccharides containing these deoxy residues were assembled using the tLEaP 110 module of the AMBER package, employing GLYCAM06i force field parameters and PREP residue structure files available for download at www.glycam.org (S4.11). The antibody structures were obtained from the PDB (www.rcsb.org). 111 All protein and ligand files were prepared for docking using AutoDock Tools 1.5.4 (ADT). 107 The choice of partial charges was based on the method used to calibrate the scoring function of each docking program: Kollman charges 112 were added to the protein for docking with AD3, while Gasteiger charges 113 were used to prepare proteins for docking with AD4.2 and ADV; in each case, Gasteiger charges were assigned to the ligands. AutoDock distributes any non-zero residual net charge across the macromolecule. Hydrogen atoms were added to the protein using ADT, whereas GLYCAM hydrogens were retained in the ligands. A standard grid box (dimensions: 26.25 × 26.25 × 37.50 Å) was employed for all runs, centered relative to the complementarity determining regions (CDRs) of the antibody (Figure 4.1a). Before docking, the ligand was translated to the center of mass (CoM) of the CDRs but maintained in the default GLYCAM orientation and conformation. VMD 109 was used for molecular visualization and image rendering.
Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the crystallographic ligands.

| PDB ID: Chain ID (Resolution)a | Ligand (average B-factor)b | SRMSDa,c | Biological origin |
|---|---|---|---|
| 1MFA 69,d: L/H (1.7) | DAbepα1-3[DGalpα1-2]DManpα-OMe (25.1) | 0.6 | Mus musculus |
| 1MFD 70,d: L/H (2.1) | DAbepα1-3[DGalpα1-2]DManpα-OMe (30.1) | 0.5 | Mus musculus |
| 1UZ8 71: A/B (1.8) | DGalpβ1-4[LFucpα1-3]DGlcpNAcβ-OMe (41.8) | 0.3 | Mus musculus |
| 1M7D 72: A/B (2.3) | LRhapα1-3(2-deoxy)LRhapα1-3DGlcpNAcβ-OMe (39.8) | 0.3 | Mus musculus |
| 1S3K 73: L/H (1.9) | LFucpα1-2DGalpβ1-4[LFucpα1-3]DGlcpNAcα-OH (26.6) | 0.4 | Homo sapiens, Mus musculus |
| 1M7I 72: A/B (2.5) | LRhapα1-2LRhapα1-3LRhapα1-3DGlcpNAcβ1-2LRhapα-OMe (35.4) | 1.1 | Mus musculus |

[Graphic representations of the ligands, with symbols for mannose (Man), galactose (Gal), fucose (Fuc), 2-deoxy rhamnose, abequose (Abe), N-acetyl glucosamine (GlcNAc), rhamnose (Rha) and the aglycon (OMe/OH), not reproduced.]
a In Å.
b In Å².
c SRMSD is defined in the section "Shape, and pose, RMSD values".
d 1MFA and 1MFD contain the trisaccharide antigen from Salmonella serotype B. In 1MFD, the trisaccharide is bound to a Fab antibody fragment, while in 1MFA it is bound to a single-chain Fv (scFv) fragment of the antibody. Although the antigen-binding sites of the Fab and scFv fragments are essentially the same and bind the same trisaccharide antigen, in the Fv complex a water molecule has become inserted into an internal hydrogen bond within the trisaccharide, perturbing the trisaccharide conformation.
In all ligands, the hydroxyl groups and glycosidic torsion angles were defined as flexible, while the C5-C6 bonds were restrained at the orientations present in the reference crystal structures. The protein was kept rigid. In AD3 and AD4.2, 100 runs of the Lamarckian Genetic Algorithm were employed, with 800,000 energy evaluations per run and a population size of 200. The translation step size was 2 Å, and the quaternion and dihedral step sizes were each 50°. The ADV source code was modified to increase the total number of output structures from 20 to 100 (Supplementary Material, S4.1). The maximum energy difference between the best and worst binding modes was set to 10 kcal/mol, and the exhaustiveness was set to 8. The complete set of docking parameters used is given in S4.2, S4.3 and S4.4.
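The Vina settings reported above, together with the standard grid box, could be written out as a Vina configuration file; a minimal sketch follows. Assumptions: the 37.50 Å edge lies along the internal Z-axis, the center (0, 0, 11) is expressed in the aligned antibody frame described in the next section, and the receptor and ligand file names are hypothetical. The text notes that the ADV source was modified to allow 100 output structures, so an unmodified build may not honor `num_modes = 100`. The helper also converts the box to AutoGrid grid-point counts for AD3/AD4.2, assuming the commonly used spacing of 0.375 Å (the spacing is not stated in the text).

```python
def vina_config(center, size, exhaustiveness=8, energy_range=10, num_modes=100,
                receptor="receptor.pdbqt", ligand="ligand.pdbqt"):
    """Return the text of an AutoDock Vina configuration file (key = value lines)."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value}")
    lines += [f"exhaustiveness = {exhaustiveness}",
              f"energy_range = {energy_range}",  # kcal/mol between best and worst mode
              f"num_modes = {num_modes}"]        # needs the modified ADV build
    return "\n".join(lines) + "\n"

def autogrid_npts(size, spacing=0.375):
    """Grid points per axis for an AutoGrid box of the given edge lengths (Å)."""
    return tuple(round(edge / spacing) for edge in size)

BOX_CENTER = (0.0, 0.0, 11.0)   # Å, in the aligned antibody frame
BOX_SIZE = (26.25, 26.25, 37.5)  # Å

print(vina_config(BOX_CENTER, BOX_SIZE))
print(autogrid_npts(BOX_SIZE))  # (70, 70, 100)
```

At 0.375 Å spacing, the box edges divide evenly into 70 × 70 × 100 grid intervals, which may explain the somewhat unusual edge lengths, though the text does not say so.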
Antibody and docking grid box alignment
Consistent grid box placement on the CDRs was achieved by positioning the box relative to three points defined by specific CoMs within the CDRs. The CDRs were identified using the AbM definition, 114,115 based on both the Kabat 116 and Chothia 117 numbering schemes. To ensure a consistent orientation of the antibody surface relative to the grid points, the protein coordinates were transformed with respect to a set of internal coordinate axes, as shown in Figure 4.1. This protocol removes issues arising from the fact that the grid is box-shaped rather than spherical, which could otherwise result in different regions of each antibody being included within the grid.
Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the
grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green dot
represents the center of the grid box (0,0,11). (b) Aligned orientation of an antibody
antigen-binding fragment (Fab), with respect to the internal reference axes. The region in
red + pink represents the VH domain (CDRs (red) and framework regions (pink) of the
heavy chain) of the antibody, while the region in blue represents the VL domain (CDRs
(dark blue) and framework regions (cyan) of the light chain). The X-axis for the
alignment was defined by a vector passing through the CoM of the variable light chain (VL domain, which contains the light chain CDRs and framework sequences) and the CoM of the variable heavy chain (VH domain). The Z-axis was defined as a vector normal to the X-axis and passing through the CoM of the entire variable region, or variable fragment (Fv). The antibody was then translated so that the CoM of the CDRs was placed at the origin. The Y-axis was defined as a vector perpendicular to the XZ-plane and passing through the origin. The docking grid box was aligned to the internal coordinate axes with its center offset from the origin by 11 Å along the Z-axis, so as to optimally encompass the CDR loops while also permitting adequate volume for the movement of the ligand during docking. This definition enabled the docking grid box to be consistently aligned with respect to the CDRs.
Quantum mechanical calculations
Quantum mechanical calculations were performed using Gaussian09. 118 Structures were optimized at the HF/6-31++G(2d,2p) level of theory, and single-point energies were calculated at the B3LYP/6-31++G(2d,2p) level, consistent with the approach used in the development of the GLYCAM force field. 94 Rotational energy profiles were computed at 15° increments, allowing complete relaxation of all other coordinates.
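The bookkeeping for such a scan is simple: a full 360° rotation at 15° increments gives 24 constrained optimizations per torsion, and absolute energies in hartree are converted to kcal/mol relative to the lowest point (1 hartree ≈ 627.5095 kcal/mol). The sketch below assumes that workflow; the energies in the tests are illustrative numbers, not computed results.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 hartree in kcal/mol

def scan_angles(increment=15):
    """Torsion values for a full rotational scan, in degrees: -180, -165, ..., 165."""
    return list(range(-180, 180, increment))

def relative_energies(energies_hartree):
    """Convert absolute energies (hartree) to kcal/mol relative to the lowest point."""
    e_min = min(energies_hartree)
    return [(e - e_min) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]

angles = scan_angles()
print(len(angles))  # 24
```

A relative energy of 1 millihartree corresponds to about 0.63 kcal/mol, which is why scans are normally reported relative to their minimum rather than as raw totals.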
Shape, and pose, RMSD values
Pose RMSD (PRMSD) values were obtained by calculating the RMSD between the ring atoms of the crystal ligand in its native co-crystallized position and the corresponding ring atoms of the docked ligand in its docked position (Figure 4.2a). A pose with a PRMSD ≤ 2 Å was considered to have been successfully docked. Shape RMSD (SRMSD) values were obtained by first superimposing the crystal and docked ligands and then calculating the RMSD between their respective ring atoms (Figure 4.2b). The SRMSD thus quantifies the dissimilarity between the 3D conformations of the docked and crystal ligands, irrespective of their relative positions on the protein surface.
Figure 4.2 PRMSD and SRMSD calculation. Panels (a) and (b) show the PRMSD and SRMSD, respectively, of a representative docked pose with respect to its crystal ligand. (a) The PRMSD is the RMSD between the ring atoms of the docked structure (white) and the corresponding crystal structure (black). (b) The SRMSD is the RMSD obtained after the docked structure (white) is superimposed on the crystal structure (black). For the pose shown, PRMSD = 5.5 Å and SRMSD = 1.1 Å.
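The difference between the two measures is whether the docked ring atoms are superimposed on the crystal ring atoms before the RMSD is taken. The sketch below uses NumPy and, for the superposition, the Kabsch algorithm (an SVD-based optimal rotation); the text does not say which superposition method was used, so that choice and the function names are assumptions. Inputs are matched (N, 3) arrays of ring-atom coordinates in the same atom order.

```python
import numpy as np

def rmsd(a, b):
    """Plain RMSD between matched atoms, with no superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def pose_rmsd(docked_ring, crystal_ring):
    """PRMSD: both ligands stay in place on the protein."""
    return rmsd(docked_ring, crystal_ring)

def shape_rmsd(docked_ring, crystal_ring):
    """SRMSD: RMSD after optimal superposition (Kabsch algorithm)."""
    p = np.asarray(docked_ring, dtype=float)
    q = np.asarray(crystal_ring, dtype=float)
    p = p - p.mean(axis=0)                      # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))      # avoid an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rmsd(p @ rotation.T, q)

def docked_successfully(docked_ring, crystal_ring, cutoff=2.0):
    """Success criterion used in the text: PRMSD <= 2 Å."""
    return pose_rmsd(docked_ring, crystal_ring) <= cutoff
```

A pose can therefore have a low SRMSD (correct shape) but a high PRMSD (wrong place), as in the 5.5 Å / 1.1 Å example of Figure 4.2.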
Results and Discussion
Assessment of current docking methodologies
The six ligands extracted from their co-crystal structures could be successfully docked back rigidly into the same protein structures (results not shown), an outcome observed previously in studies of carbohydrate-protein docking. 103,119 Although necessary, success in this experiment is not sufficient to validate a docking method, since both molecules in a co-crystallized complex are already in the correct conformation for binding and do not require induced fit to occur during docking.
[Figure 4.2 panel labels: (a) Pose RMSD = 5.5Å; (b) Shape RMSD = 1.1Å]
Independently generated oligosaccharide 3D structures were employed as ligands to test
the ability of the docking methods to predict bound conformations in carbohydrate-protein
complexes of unknown structure. These starting structures were generated using GLYCAM,
which is known to produce low-energy carbohydrate conformations; the resulting structures
were essentially equivalent to the corresponding ligands in the co-crystal structures, as
indicated by their SRMSDs (Table 1) and by a comparison of their glycosidic torsion angles
(S4.5). The average SRMSD between the crystallographic ligands and the theoretical
structures was 0.53Å. This preliminary SRMSD analysis also showed that the ligand in each
antibody complex adopted a low-energy conformation, similar to that expected for the free
ligand.
A second requirement for a general docking protocol is to permit the ligands a
reasonable level of freedom by allowing their glycosidic torsion angles and hydroxyl
groups complete flexibility. This approach enables comparisons to be made between
structures of the experimental and theoretical ligands, facilitating an assessment of the
impact of induced fit in the ligand on the outcome from docking analysis.
After docking, the φ (O5’-C1’-Ox-Cx) and ψ (C1’-Ox-Cx-Cx-1) glycosidic torsion angles
of the docked poses (Figure 4.4-I) were measured and compared with the torsion angles of
the corresponding linkages in the experimental co-crystal structure and in the initial
GLYCAM theoretical structure. The analysis indicated that the torsion angle values of the
docked poses frequently deviated considerably from both the crystal and the GLYCAM
reference values (S4.5). Five examples of this analysis are highlighted in Figure 4.3.
Figure 4.3a presents an instance in which all three docking programs identified the
lowest energy pose correctly (that is, with the glycosidic angles falling within 30° of
the corresponding torsion angles in the crystal structure). Figures 4.3b, c and d present
cases in which only one of the docking programs identified the correct pose, and Figure
4.3e shows an example in which all three programs failed to produce the correct torsion
angles. All of the methods were able to generate some conformations within 30° of the
crystallographic φ and ψ values; however, these were often not the poses with the best
docking energy. Thus, in a routine application of docking, they would not be identified
as the most likely (highest-ranked) pose. Overall, each algorithm generated a very broad
range of torsion angles (and therefore 3D shapes), suggesting that a conformational energy
function could be employed as an additional filter to identify unlikely conformations in
the docking data.
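The torsion measurement and the 30° criterion can be sketched as below. The dihedral formula follows the usual IUPAC sign convention and reports angles on the 0° to 360° scale used in Figure 4.3; the function names are hypothetical, and the angular difference accounts for the periodicity of torsion angles (e.g. 350° and 10° differ by 20°, not 340°).

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, mapped to [0, 360).

    For phi the atoms would be O5'-C1'-Ox-Cx; for psi, C1'-Ox-Cx-Cx-1.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b1 = b1 / np.linalg.norm(b1)
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, respecting periodicity."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def within_tolerance(docked_deg, crystal_deg, tol_deg=30.0):
    """True if a docked torsion lies within tol_deg of the crystal value."""
    return angular_difference(docked_deg, crystal_deg) <= tol_deg
```

Without the periodic difference, a docked φ of 350° would wrongly be judged far from a crystallographic value of 10°.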
[Figure 4.3 image: percentage of structures versus φ and ψ (30° bins) for panels (a) 1UZ8, (b) 1MFD, (c) 1MFA, (d) 1S3K and (e) 1M7I, with the experimental (Expt.) torsion angle marked in each panel]
Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected
linkages, as indicated by the dashed rectangle. Data are presented, in order, for AD3
(black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing the
experimentally-determined values is highlighted with a light blue outline. The bin
containing the structure with the lowest docked energy is indicated as follows: AD3,
yellow; AD4.2, orange; ADV, green.
Development and validation of the CHI energy functions
Quantum mechanical conformational energies for a variety of model disaccharides were
obtained by employing tetrahydropyran (THP) as the minimal model of a carbohydrate ring.
Two THP molecules were used to model each glycosidic linkage (1-2, 1-3 and 1-4) between
pyranoses in the ⁴C₁ and ¹C₄ conformations. Given that there are two anomeric
configurations (α and β) and two hydroxyl orientations (axial (ax) and equatorial (eq))
associated with each linkage, the development of each CHI energy function required
analysis of the glycosidic rotational energies of at least four structures per linkage.
For example, the models used for the 1-3 linkage are presented in Figure 4.4.
Figure 4.4 Representation of the 8 model disaccharides pertinent to the development of
CHI energy functions. The models depicting 1,2-linkages can be used to model 1,4-
linkages due to symmetry about the O5 atom.
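The count of eight model disaccharides follows from the combinations described above; a short enumeration makes the arithmetic explicit. The labels are illustrative only, and the assignment of Roman numerals I to VIII in Figure 4.4 is not reproduced.

```python
from itertools import product

# The 1-2 models also represent 1-4 linkages (symmetry about O5), so only
# two linkage positions need explicit models.
LINKAGE_POSITIONS = ("1-2 (also 1-4)", "1-3")
ANOMERS = ("alpha", "beta")
HYDROXYL_ORIENTATIONS = ("ax", "eq")

models = list(product(LINKAGE_POSITIONS, ANOMERS, HYDROXYL_ORIENTATIONS))

for position, anomer, orientation in models:
    print(f"{anomer}-{position}, {orientation} hydroxyl")

print(len(models))  # 2 x 2 x 2 = 8 model disaccharides
```

Each linkage position contributes the "at least four structures" mentioned above (two anomers times two hydroxyl orientations).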
Individual rotational energy profiles were determined for both the φ (O5’-C1’-Ox-Cx) and
ψ (C1’-Ox-Cx-Cx-1) glycosidic torsion angles of the various disaccharide models
(Figure 4.5). A similar approach has been employed by A. D. French to examine the
properties of various disaccharides and disaccharide analogs. 96,98,112,120 Models with
similar local symmetries gave rise to similar torsional energy profiles and were grouped
together, and an average energy curve was then obtained for each group. On this basis,
two average energy curves were computed for the φ angle: one for all models with an
α-linkage (Figure 4.5a), and the other for all models with a β-linkage (Figure 4.5b).
Similarly, two average curves were computed for the ψ angle, based on division of the
linkages into the following two groups: 1-2ax, 1-4ax, 1-3eq (Figure 4.5c); and 1-2eq,
1-4eq, 1-3ax (Figure 4.5d).
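A minimal sketch of the averaging step, assuming every individual profile is sampled on the same torsion-angle grid and already expressed in kcal/mol. The grouping dictionary mirrors the ψ groups listed above; shifting the average to a zero minimum follows the adjustment described for the final CHI curves. The names and sample values are hypothetical.

```python
# psi groupings described in the text (model labels are illustrative).
PSI_GROUPS = {
    "figure_4_5c": ("1-2ax", "1-4ax", "1-3eq"),
    "figure_4_5d": ("1-2eq", "1-4eq", "1-3ax"),
}


def average_curve(curves):
    """Point-wise mean of energy curves sampled on the same angle grid,
    shifted so that the lowest point is 0 kcal/mol."""
    n = len(curves[0])
    if any(len(c) != n for c in curves):
        raise ValueError("all curves must share the same angle grid")
    mean = [sum(c[i] for c in curves) / len(curves) for i in range(n)]
    lowest = min(mean)
    return [e - lowest for e in mean]


# Usage with made-up profiles on a coarse 4-point grid (kcal/mol).
profiles = {
    "1-2ax": [0.0, 2.0, 6.0, 1.0],
    "1-4ax": [0.5, 2.5, 5.0, 1.5],
    "1-3eq": [0.1, 1.5, 7.0, 0.5],
}
group = [profiles[name] for name in PSI_GROUPS["figure_4_5c"]]
print(average_curve(group))
```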
[Figure 4.4 image: model disaccharides I to VIII, with the φ and ψ torsions and the axial (ax) or equatorial (eq) linkage positions labeled]
Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for
models (see Figure 4.4) whose linkages have similar local geometries.
The CHI energy functions (S4.6) were generated by fitting Gaussian expansions (Eqn 4.1)
to the average energy values for each of the curves in Figure 4.5, using the default
fitting routine in Gnuplot (ver. 4.0) 113:

$$ f(x) = \sum_{i=1}^{N} a_i \, e^{-(x - b_i)^{2} / c_i} \qquad \text{(Eqn 4.1)} $$

where N is the number of individual Gaussian functions used for each CHI energy equation,
x is the glycosidic torsion angle (φ or ψ), and a_i, b_i and c_i are the magnitude,
mid-point and width parameter of each Gaussian, respectively. All curves (S4.7) were
adjusted to a minimum value of 0 kcal/mol, and may therefore be considered conformational
energy penalty functions. To apply the energy curves shown in Figure 4.5 to linkages
containing L-sugars, it is simply necessary to employ the mirror image of the relevant
energy curve.
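Eqn 4.1 and the L-sugar mirroring can be illustrated with a few lines of Python. The (a_i, b_i, c_i) values used below are placeholders, not the fitted CHI coefficients given in S4.6, and the mirror operation is assumed to map a torsion angle x to 360° − x (i.e. −x on a 0° to 360° scale).

```python
import math


def gaussian_expansion(x, params):
    """Evaluate f(x) = sum_i a_i * exp(-(x - b_i)**2 / c_i), as in Eqn 4.1.

    params is a sequence of (a_i, b_i, c_i) triples; x is in degrees.
    """
    return sum(a * math.exp(-((x - b) ** 2) / c) for a, b, c in params)


def chi_energy(angle_deg, params, l_sugar=False):
    """Penalty for one torsion angle, using the mirror-image curve for L-sugars."""
    x = angle_deg % 360.0
    if l_sugar:
        x = (360.0 - x) % 360.0  # assumed mirroring about 0/360 degrees
    return gaussian_expansion(x, params)


# Placeholder parameters for illustration only (not the published fit).
example_params = [(5.0, 180.0, 2000.0), (2.0, 60.0, 500.0)]
print(chi_energy(180.0, example_params))
print(chi_energy(100.0, example_params), chi_energy(260.0, example_params, l_sugar=True))
```

Because the Gaussian sum is not periodic by construction, the input angle is first wrapped onto 0° to 360°, the range over which the curves were fitted.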
The experimental distribution of glycosidic angles in carbohydrate-protein crystal
structures in the PDB provides an independent metric for comparison with the predicted
CHI energies. Glycosidic torsion angle data for over 13,000 glycosidic linkages were
extracted using the GlyTorsion web-tool 121 (S4.8), binned, and plotted against the
corresponding CHI energy curves (Figure 4.6). The comparison leads to the important
conclusion that the majority of proteins that recognize oligosaccharides select low energy
(solution-like) conformations of the glycosidic linkage. This has considerable importance
for carbohydrate docking, as it supports the view that biasing selection toward low energy
linkage conformations should enhance the likelihood of correct pose prediction.
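Binning of the extracted torsion angles can be sketched as follows, assuming angles on a 0° to 360° scale and the 5° bins used in Figure 4.6 (labeled "0 to 4", "5 to 9", and so on). The function names are hypothetical and GlyTorsion's own export format is not modeled.

```python
def percent_histogram(angles_deg, bin_width=5.0):
    """Percentage of torsion angles falling in each bin over [0, 360)."""
    if not angles_deg:
        raise ValueError("no angles supplied")
    n_bins = round(360.0 / bin_width)
    counts = [0] * n_bins
    for angle in angles_deg:
        idx = int((angle % 360.0) // bin_width)
        counts[min(idx, n_bins - 1)] += 1  # guard against rounding at 360
    return [100.0 * c / len(angles_deg) for c in counts]


def bin_label(i, bin_width=5.0):
    """Axis label for bin i in integer degrees, e.g. '25 to 29' for i = 5."""
    lo = int(i * bin_width)
    return f"{lo} to {lo + int(bin_width) - 1}"


percentages = percent_histogram([0.0, 2.5, 7.0, 359.9])
print(bin_label(0), percentages[0])
```

Expressing counts as percentages lets the distribution be drawn on a secondary axis next to the CHI energy curve, as in Figure 4.6.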
Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion
angle distributions of carbohydrates from experimental co-crystal structures (histograms).
Refinement of the docking results using the CHI energy functions
An assessment of the performance of each docking algorithm can be made by plotting the
conformational difference between the docked ligands and the ligand in the co-complex
(SRMSD) against the predicted interaction energies. Ideally, poses with correct ligand
shapes should have lower interaction energies than poses with incorrect shapes. Plots of
interaction energy versus SRMSD were generated for AD3, AD4.2 and ADV (Figure 4.7), and
the coefficient of determination (R²) was computed by linear regression. In each case,
only weak linear relationships between ligand shape and
interaction energy were observed (R² ≤ 0.19), and for ADV the slope was slightly negative.
Following rescoring of the docked poses, by adding the CHI energy of each glycosidic angle
to the docked energy of the structure, R² increased markedly for all three programs
(0.60 ≤ R² ≤ 0.68).
It should be reiterated here that none of the three docking algorithms include internal
rotational energies (torsion terms), and at best account for ligand internal energies in a
general steric sense. In the case of glycosidic linkages, this internal energy was found to
be less than approximately 0.2 kcal/mol. Thus, while some double counting of internal
energy is introduced by adding the CHI energy directly to the total docking energy, it
does not result in a significant error.
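The rescoring and the regression statistic can be sketched as below. The pose data are invented for illustration, and the per-angle CHI energies are assumed to have been evaluated already; R² is computed for an ordinary least-squares line, where it equals the squared Pearson correlation coefficient.

```python
def rescore(docked_energy, chi_energies):
    """CHI-adjusted energy: docked energy plus the CHI energy of every
    glycosidic torsion angle in the pose (all in kcal/mol)."""
    return docked_energy + sum(chi_energies)


def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Hypothetical poses: (SRMSD in angstroms, docked energy, per-angle CHI energies).
poses = [(0.4, -8.1, [0.2, 0.5]), (1.8, -7.9, [3.1, 2.4]), (3.2, -8.3, [6.0, 5.5])]
srmsds = [s for s, _, _ in poses]
before = [e for _, e, _ in poses]
after = [rescore(e, chi) for _, e, chi in poses]
print(r_squared(srmsds, before), r_squared(srmsds, after))
```

Because distorted ligands accumulate large CHI energies, rescoring tilts the energy-versus-SRMSD relationship upward, which is what raises R².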
Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between
SRMSD and docked energies after rescoring, for each of the three docking programs.
Points before rescoring are shown in dark grey and points after rescoring are shown in
light grey. Shown in the insets are SRMSD vs. docked energy plots of only the overall
lowest PRMSD structure for each of the six antibody systems before (dark grey) and after
(light grey) rescoring. The black rectangles in all insets enclose plot areas with SRMSD ≤
1 Å and energies ≤ 0 kcal/mol.
[Figure 4.7 image: energy (kcal/mol) versus SRMSD (Å) for AD3 (R² = 0.09 before, 0.60 after rescoring), AD4.2 (R² = 0.19 before, 0.68 after) and ADV (R² = 0.12 before, 0.66 after)]
Prior to inclusion of the CHI energies, all poses from AD3 and ADV, and a majority of
those from AD4.2, were predicted to have favorable (negative) interaction energies, a
result of the nearly horizontal slope of the SRMSD-versus-interaction-energy curves.
Addition of the CHI energies led to positive slopes and frequently to unfavorable
(positive) interaction energies for high-energy ligand conformations. Therefore, an
intuitive interaction energy cut-off of 0 kcal/mol could be defined as a convenient filter
for eliminating the most unlikely structures.
For all six antibody complexes, the poses most similar to the co-crystal (lowest PRMSD
poses) also have CHI-adjusted interaction energies ≤ 0 kcal/mol, the single exception
being the AD4.2 results for 1M7I (Figure 4.7b). All 100 docked poses of that
pentasaccharide received positive rescored interaction energies, reflecting the
sub-optimal quality of the conformations produced by AD4.2 for this system. In this case,
the pose closest to the co-complex displayed a PRMSD of 3.4Å and a CHI-corrected
interaction energy of 14.7 kcal/mol; rescoring cannot compensate for the absence of a
correct pose. Thus, adding the CHI energy to the docked energy scores provides a cutoff
(0 kcal/mol) below which all poses may be considered possible binders.
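A sketch of the 0 kcal/mol filter, under the assumption that each pose carries its docked energy and a list of per-angle CHI energies; the tuple layout and pose names are hypothetical.

```python
def plausible_poses(poses, cutoff=0.0):
    """Rescore poses with their CHI energies and keep those at or below cutoff.

    Each pose is a (name, docked_energy, chi_energies) tuple, a hypothetical
    layout chosen for illustration; energies are in kcal/mol.
    Returns (name, rescored_energy) pairs sorted from lowest to highest.
    """
    rescored = [(name, energy + sum(chi)) for name, energy, chi in poses]
    kept = [pose for pose in rescored if pose[1] <= cutoff]
    return sorted(kept, key=lambda pose: pose[1])


poses = [
    ("pose_1", -9.0, [0.5, 1.0]),
    ("pose_2", -6.0, [7.0, 4.6]),  # strained linkage: rescored energy > 0
    ("pose_3", -7.5, [0.2]),
]
print(plausible_poses(poses))
```

An empty result corresponds to the 1M7I/AD4.2 situation described above, where no pose survives the cutoff.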
Figure 4.8 presents the φ and ψ torsion angles of the docked poses from all six
antibody-carbohydrate systems, overlaid onto the corresponding CHI energy curves. These
plots clearly indicate that the docking algorithms sample a disproportionately large
number of high-energy ligand conformations, particularly AD4.2 and ADV. Several
low-energy regions, particularly for the ψ angles, are also not well represented. In
quantitative terms, for AD3 more than 45% of the poses contain ligands with at least one
bond in a high-energy conformation (CHI energy > 2 kcal/mol), compared with 73% and 77%
for AD4.2 and ADV, respectively.
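The percentages above can be computed as in the sketch below, assuming a list of CHI energies for each pose; whether the original analysis evaluated each torsion separately or summed φ and ψ per linkage is not specified here, so the grouping is an assumption.

```python
def percent_strained(poses_chi, threshold=2.0):
    """Percentage of poses with at least one entry above threshold (kcal/mol).

    poses_chi holds, for each pose, a list of CHI energies (one per torsion
    or per linkage, depending on how the energies were grouped).
    """
    if not poses_chi:
        raise ValueError("no poses supplied")
    strained = sum(1 for chi in poses_chi if any(e > threshold for e in chi))
    return 100.0 * strained / len(poses_chi)


# Made-up example: two of the four poses contain a strained entry.
print(percent_strained([[0.1, 0.3], [2.5, 0.0], [1.9, 2.0], [3.0, 4.0]]))
```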
Figure 4.8 Graphs showing the distribution of conformations produced by AD3, AD4.2 and
ADV plotted onto the corresponding CHI energy curves for each of the representative
linkage combinations; the curves are offset from each other by 6 kcal/mol.
Pose ranking after including the CHI energy
In 9 of the 18 cases (6 antibodies x 3 docking algorithms), the top-ranked pose
remained the same before and after inclusion of the CHI energies (Figure 4.9), with an
average SRMSD of 0.3Å. That the ranking of these poses did not change is unsurprising,
given that inclusion of the CHI energy function does not greatly alter the interaction
energy if the ligand is already in a low-energy conformation. However, in 7 of the 9
remaining cases, the SRMSD of the top-ranked pose improved by an average of 0.8Å,
after rescoring and reranking.
Prior to rescoring, poses with PRMSDs ≤ 1Å were obtained from the 100 docking runs in 17
of the 18 cases; however, they were not necessarily the lowest energy poses, highlighting
the challenge of recognizing a correctly docked pose amongst all poses produced by a
docking run. The impact of the CHI energy on the ability of docking both to produce a
correctly docked pose and to rank it as the lowest energy structure is shown in terms of
PRMSDs in Figure 4.9b. In several instances in which the lowest energy pose produced by
the docking program was incorrect (PRMSD > 2Å), reranking after including the CHI energy
led to lowest energy structures with both PRMSD and SRMSD < 1Å.
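The comparison behind Figure 4.9b can be sketched as selecting the lowest-energy pose before and after adding the CHI energies, then testing its PRMSD against the 2Å success criterion. The dictionary keys and example numbers are hypothetical.

```python
def top_pose_prmsd(poses, success_cutoff=2.0):
    """PRMSD of the top-ranked pose before and after CHI rescoring.

    poses: list of dicts with hypothetical keys 'energy' (docked energy),
    'chi' (per-angle CHI energies) and 'prmsd' (angstroms).
    Returns {'before': (prmsd, success), 'after': (prmsd, success)}.
    """
    best_before = min(poses, key=lambda p: p["energy"])
    best_after = min(poses, key=lambda p: p["energy"] + sum(p["chi"]))
    return {
        "before": (best_before["prmsd"], best_before["prmsd"] <= success_cutoff),
        "after": (best_after["prmsd"], best_after["prmsd"] <= success_cutoff),
    }


# Made-up run: the best-scoring pose is strained and misplaced.
run = [
    {"energy": -10.0, "chi": [6.0], "prmsd": 5.0},
    {"energy": -8.5, "chi": [0.5], "prmsd": 0.8},
]
print(top_pose_prmsd(run))
```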
Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2
and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the lowest
energy poses for all six systems from all three docking programs, before (dark grey) and
after (light grey) rescoring.
The impact of rescoring on the conformations (SRMSDs) and orientations (PRMSDs) of the
top-ranked poses is illustrated below for several examples. Docking of the tetrasaccharide
ligand onto the 1S3K antibody using AD3
(Figure 4.10) and docking of the trisaccharide ligand onto the 1M7D antibody using AD4.2
both yielded top-ranked poses with PRMSDs > 5Å. These poses had high CHI energy scores of
7.0 kcal/mol and 11.6 kcal/mol, respectively. After reranking, the lowest energy poses had
PRMSDs of 0.6Å (1S3K/AD3) and 0.5Å (1M7D/AD4.2), with lower CHI energies of 1.0 kcal/mol
and 0.9 kcal/mol, respectively.
Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to
the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after inclusion of the
CHI energy (white) compared to the crystal ligand (black); PRMSD = 0.6 Å.
Prior to rescoring, the lowest energy structures obtained for 1MFD from all three
programs had PRMSDs > 5Å, with CHI energies > 4 kcal/mol for the poses from AD4.2 and
ADV, and 1.3 kcal/mol for the pose from AD3 (Figure 4.11a, S4.9). After rescoring, the
lowest energy pose from AD3 remained unchanged, whereas the corresponding pose from AD4.2
was replaced by a pose with a lower CHI energy score; however, the newly top-ranked pose
still had a high PRMSD. Even though rescoring did not produce correctly docked lowest
energy poses in either of these cases, it improved the ranking of the lowest PRMSD
structures (PRMSDs < 1Å) from 18 to 9 in AD3, and from 13 to 2 in AD4.2 (S4.10). It
should also be noted that the second lowest energy pose in AD3 (PRMSD = 1Å) retained its
ranking after rescoring. In contrast, the relatively high CHI energy score of the lowest
energy pose from ADV led to this pose being replaced, after rescoring, by a correctly
docked structure with a lower CHI energy score (Figure 4.11b, S4.9, S4.10).
The ligand in 1MFD is a branched trisaccharide composed of mannose (Man), galactose
(Gal), and abequose (Abe). Abe is an analog of Gal (3,6-dideoxyGal) and is the anchoring
residue for the trisaccharide in the crystal structure 32 (Figure 4.11c). An
examination of the docking results indicated that all three docking programs consistently
generated better scores for poses in which the Gal residue replaces Abe in the binding site
(Figure 4.11d), with little increase in the SRMSD for the incorrect pose. That is, the
trisaccharide can fit equally well into the binding site in the two possible orientations
effectively flipped by 180°. The theoretical preference for Gal in the binding site appears
to be a consequence of its ability to make additional hydrogen bonds with the protein
relative to the more hydrophobic Abe. This observation suggests that the balance
between contributions from hydrogen bonding versus hydrophobic interactions is
imperfect in these docking algorithms. In addition, the 1MFD crystal structure reveals
the presence of a water molecule within the binding pocket, mediating hydrogen bond
interactions between the antibody and the ligand’s Abe residue. Given that explicit
waters are not generally included in docking studies, the algorithms may be
compensating for their absence by placing the more polar Gal inside the binding pocket.
This conclusion is supported by the observation that one of the hydroxyl groups of the
Gal residue (O-4) occupies a position in close proximity to this water molecule (PDB
residue name: WAT 601) originally found in the crystal complex (Figure 4.11d).
The flipping of the carbohydrate ligand observed in 1MFD was not observed for its scFv
counterpart (1MFA); instead, all three lowest energy poses (AD3, AD4.2 and ADV) for 1MFA
had orientations similar to that of the crystal ligand (PRMSDs < 2 Å). Since the ligands
docked to both antibodies are identical, we can infer that the two binding sites are not
identical (Table 1). To better understand the difference between the two binding
pockets, their volumes were calculated using Fpocket 122; the volume of the 1MFA binding
pocket was calculated to be 423.01Å³, while that of 1MFD was 582.51Å³. The 1MFD binding
pocket, being approximately 160Å³ larger, is able to accommodate the flipped orientation
of the Gal residue, whereas the smaller 1MFA binding pocket is less accommodating of this
ligand orientation, owing to possible steric clashes. This potential steric clash was
confirmed by superimposing the coordinates of the Gal residue onto those of Abe in 1MFA
(Figure 4.11e, f).
Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA).
(a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared to the
crystal ligand (black); PRMSD = 5.5Å. (b) Lowest energy pose from ADV for 1MFD
after rescoring (white) compared to the crystal ligand (black); PRMSD = 1.0Å. (c) and
(d) show the 1MFD antibody in transparent surface representation along with the oxygen
atom belonging to the water molecule from the crystallographic co-complex, WAT 601;
in (c) the crystal ligand from 1MFD is shown in CPK representation, and in (d) the
lowest energy pose from ADV for 1MFD before rescoring (in CPK representation)
showing the Gal residue replacing Abe within the binding pocket is shown. (e) The Gal
residue from the ligand in 1MFD (in van der Waals representation) after being
superimposed onto the Abe residue from the ligand in 1MFA is shown within the 1MFA
binding site. A cross-section of the 1MFA antibody is represented as a transparent surface
with potential steric clashes visible between the Gal residue and the antibody. (f) Same as
(e) but with the 1MFA antibody represented as an opaque surface thus more clearly
depicting potential steric clashes between the O-3 and O-6 groups of the Gal residue and
the interior of the binding pocket.
The known challenge associated with docking large, flexible molecules using AD4.2 108,123
was encountered with the linear pentasaccharide ligand in 1M7I. None of the 100 poses
were correctly docked (all PRMSDs > 2Å); the lowest energy pose had a PRMSD of 3.9Å and a
CHI energy of 18.5 kcal/mol (Figure 4.12a). After rescoring, the lowest energy pose had a
considerably improved CHI energy score of 4.3 kcal/mol; however, it still had a high
PRMSD (Figure 4.12b). It has been suggested that the number of rotatable bonds be limited
to a maximum of 10 when employing AD4.2. 123 The ligand in 1M7I has nearly double that
number, at 19, making this quite a challenging system to dock using AD4.2. In AD3,
although only 4 of the 100 output poses were correctly docked, they occupied the top 4
ranks both before and after rescoring. In ADV, 7 of the 100 output poses were correctly
docked, of which 5 were amongst the 8 top-ranked poses both before and after rescoring.
Although both AD3 and ADV appear to have had difficulty finding the correct pose for the
pentasaccharide, whenever such a pose was found, both programs scored it favorably. As
these poses also had low SRMSD values, they were identified as the lowest energy poses
after rescoring.
Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose
before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9Å. (b)
Lowest energy pose after rescoring (white) compared to the crystal ligand (black);
PRMSD = 10.7Å.
Conclusions
This study presents a solution to a major challenge encountered in flexible carbohydrate
docking: intrinsic energy terms for carbohydrates that quantify the relative energy of
their glycosidic torsion angles. In 7 of the 18 cases (6 systems x AD3/AD4.2/ADV), the
lowest energy poses generated by the docking programs had PRMSDs > 2Å; after rescoring
using the CHI energy functions, the PRMSDs improved in 4 of these 7 cases, with correctly
docked poses (PRMSDs ≤ 2Å) replacing incorrect poses, increasing the total count of
correctly docked lowest energy poses to 15 out of 18. Rescoring also led to lowest energy
poses with SRMSDs ≤ 1Å in 16 out of 18 cases, and SRMSDs ≤ 1.5Å in the two remaining
cases. Among the three docking programs employed in this study, ADV was the most
successful in producing and appropriately ranking the correct ligand pose, with a success
rate of 83% before rescoring and 100% after rescoring. Inclusion of the CHI energy term
in rescoring docked poses enabled the filtering of poses based on their conformations,
increasing the chances of finding the correct pose amongst all output poses generated.
In most docking applications, locating the correctly docked pose amongst the
numerous output poses largely depends on the ranking of these poses based on their
energy scores. The CHI energy functions may in principle be used in the assessment of
carbohydrate structures obtained from any theoretical or experimental method. By
favoring energetically reasonable ligand conformations, the CHI energies significantly
improve the pose ranking for structures obtained from docking algorithms, making the
rescored energy a better predictor of the quality of the docked pose. This improvement
was observed across all three programs indicating that the CHI energy functions may be
employed independently of the scoring functions. The CHI energy functions could also
be incorporated directly within docking programs as a component of the scoring function,
although that might require a reoptimization of the scoring functions. Application to
crystallographic data leads to the conclusion that proteins primarily recognize low-energy
conformations of carbohydrates. This final observation has considerable relevance to the
design of carbohydrate-based inhibitors and vaccines.
Individual Author Contributions
Anita K. Nivedha: Authored portions of the paper and prepared figures for the paper;
designed docking protocols and the antibody alignment algorithm; performed the
dockings; developed the CHI energy functions and applied the functions to docking
results; provided tools for analysis, analyzed and interpreted the data.
Spandana Makeneni: Authored portions of the paper; co-designed docking protocols and
the antibody alignment algorithm; performed binding site volume calculations and
provided tools for the analysis of data.
B. Lachele Foley: Contributed to the design of the antibody alignment algorithm and the
development of the CHI energy functions.
Matthew B. Tessier: Contributed to the design of preliminary docking protocols,
provided PREP files for the non-standard sugar residues, and scripts for the collection of
quantum mechanical data.
Robert J. Woods: Authored the paper; conceived and designed the experiment, and
contributed to the analysis and interpretation of data.
5. VINA-CARB: IMPROVING GLYCOSIDIC ANGLES DURING
CARBOHYDRATE DOCKING
_____________________________
A. K. Nivedha, D. F. Thieker, R. J. Woods. Accepted by J. Chem. Theory Comput.
Reprinted here with permission of publisher.
Abstract
Docking programs are primarily designed to dock rigid, drug-like fragments onto
macromolecules, and frequently encounter issues predicting more flexible carbohydrate
molecules. The primary source of flexibility within a carbohydrate is the glycosidic
linkage. Previous efforts have developed Carbohydrate Intrinsic (CHI) energy functions
that reflect glycosidic torsion angle preferences. The following work represents the
incorporation of the CHI-energy functions into the AutoDock Vina (ADV) scoring
function, subsequently termed Vina-Carb (VC). Carbohydrate models generated by VC
are penalized according to the CHI-energy profiles. Two new, user-adjustable parameters
have been introduced; namely, a CHI-energy weight term (chi_coeff) that affects the
magnitude of the CHI-energy penalty, and a CHI-cutoff term (chi_cutoff) that negates
CHI-energy penalties lower than the specified value. A dataset consisting of 76 protein-
carbohydrate complexes and 29 apoprotein structures was used in the development of
VC, including antibodies, lectins and carbohydrate binding modules. Accounting for the
intramolecular energies of carbohydrate ligands produced docked models that better
reflected the natural configuration on the protein surface. VC produced accurate
structures ranked within the top five models amongst 68% of the systems tested,
compared to a success rate of 49% for ADV. Finally, a single enzyme system was
employed in order to demonstrate the potential application of VC to proteins which
distort glycosidic linkages of carbohydrate ligands upon binding. VC represents a
significant step towards accurately predicting protein-carbohydrate interactions. In
addition, the approach we present is generalizable to any other class of ligands that
populate multiple well-defined conformational states.
Introduction
Carbohydrates represent one of the four major classes of organic macromolecules,
and are involved in a range of processes that are critical for proper cellular function.9
Structural characterization of glycans and their binding partners (i.e. antibodies, lectins,
carbohydrate binding modules, enzymes, etc.) has advanced our understanding of the
molecular recognition process; however, obtaining three dimensional structures of these
interactions is particularly challenging due to the inherent flexibility of glycans.124,125
This flexibility stems from the two or three freely rotatable bonds that constitute each
glycosidic linkage. 126 In contrast, rotation about the peptide backbone is restricted by
the partial double-bond character of amide linkages. 127 As a result of the increased
molecular motion present within carbohydrates, the majority of glycan-binding partners
are not resolved in complex with their substrate. 59 Theoretical methods offer an
alternative means of studying intermolecular glycan interactions that can complement
experimental results. 94,95,128
Molecular docking is one such method, which aims to predict the modes of non-covalent
interaction between a macromolecule and a ligand and ranks the results based on binding
energies. 129 In general, docking energy functions are a summation of the energy
contributions from various non-bonded interactions in protein-ligand complexes, such as
electrostatics, van der Waals, hydrogen bonding and hydrophobic interactions. 129,130
These semi-empirical scoring functions are generalized for small-molecule ligands with
limited flexibility and often produce unnatural glycosidic angles when docking
carbohydrates. 48 This distortion is especially pronounced for large oligosaccharides,
which contain a higher number of degrees of freedom. 108
Previous studies have customized docking scoring functions for carbohydrates either by
recalibrating existing terms 89 or by including additional energy terms that model
specific protein-carbohydrate interactions. 131 For example, the SLICK scoring function
131 within BALLDock 132 includes an energy term for CH/π stacking interactions, and was
calibrated using a set of carbohydrate-lectin complexes. In contrast, the previously
reported CHI-energy functions 48 assign relative energies to the torsion angles of the
glycosidic linkages. The CHI-energy functions were derived quantum mechanically from the
torsional energy profiles of several tetrahydropyran-based disaccharide models. Although
the functions were developed using unbound carbohydrate models, the distribution of
glycosidic torsion angles in protein-carbohydrate complexes obtained from the Protein
Data Bank (PDB) was shown to correspond with the CHI-energy profiles. 48 This
conformational similarity between bound and unbound carbohydrates suggested that the
CHI-energy functions would perform well within a docking program. The CHI-energy
functions are transferable between scoring functions, and could also be employed in the
evaluation and refinement of carbohydrate conformations obtained using experimental
methods.
Vina-Carb (VC) represents the incorporation of the CHI-energy functions48 into the AutoDock Vina 1.1.2 (ADV) scoring function.108 The CHI energy is calculated for each carbohydrate pose generated by VC and added to the respective intermolecular interaction energy. Energetically unfavorable carbohydrate conformations generated by the program are therefore penalized, and often rejected, within the Metropolis subroutine. The user can control how the CHI-energy penalty is applied in VC by adjusting two input variables: a CHI-energy coefficient (chi_coeff) and an energy cutoff (chi_cutoff). Changing the CHI-energy coefficient alters the magnitude of the CHI-energy penalty relative to the other energy terms in the ADV scoring function. The CHI-energy cutoff prevents the penalization of poses whose conformations deviate from the ideal because of induced fit: any glycosidic torsion angle that would receive a penalty smaller than the cutoff value has its penalty reduced to zero. Here we expand the previous set of CHI-energy functions to include the ω-angle associated with glycosidic linkages to the O6 atom.
Unlike BALLDock/SLICK, which was calibrated on a set of lectin-sugar complexes, the optimum settings for Vina-Carb were determined using a set of 72 carbohydrate ligands crystallized with antibodies, lectins or carbohydrate-binding modules (CBMs) from the PDB. Ligands within the development set range in length from a disaccharide to an undecasaccharide. A test set consisting of the apo forms of receptors from the development set was used to examine the optimized settings of VC and to compare them with the original ADV. Finally, an application of VC to an enzyme system is demonstrated.
Methods
File Preparation
Antibody, lectin and CBM complexes containing carbohydrate ligands were collected from the PDB and employed as the Development Set for VC. Details of the test systems are provided in the Supporting Information (S5.1). When duplicate protein chains were present in a PDB file, the chain whose associated ligand atoms had the lowest average B-factor was used for docking. For the apo-protein structures, which were employed as a Test Set, the average B-factor of the individual protein chains was used to select between duplicate chains. The antibodies were aligned to the Z-axis based on their CDR regions, as described previously.48 The protein and ligand coordinates were formatted for docking with AutoDock Tools (ADT)107 using the protocol described previously.48 Each docking event consists of a rigid macromolecule and a flexible ligand. Unless otherwise noted, all rotatable bonds within the ligand were flexible except for carbon-carbon and carbon-nitrogen bonds.
Docking Parameters
The dimensions and centers of the grid boxes are described in the SI. The maximum number of binding modes was limited to 20, and the energy range was set to 10 kcal/mol. Two user-adjustable parameters have been added to the scoring function in VC: 1) chi_coeff, a weighting term for the CHI energies that scales the strength of the energetic penalty applied to the glycosidic torsion angles within the ligand (Figure 5.1a); and 2) chi_cutoff, a parameter that introduces a flat-bottom potential by neutralizing the penalty assigned by the CHI-energy curves to those glycosidic torsion angles that would receive a penalty smaller than the cutoff value (Figure 5.1b). For example, a chi_coeff of 2 is denoted in this work as VC2, and a chi_coeff of 2 combined with a chi_cutoff of 4 is denoted VC2|4.
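The sketch below illustrates, in Python, how chi_coeff and chi_cutoff might reshape the penalty of a single glycosidic torsion, and how the VC shorthand is built. It is not the Vina-Carb source: the cosine curve is a hypothetical stand-in for a CHI function, and the order of operations (scale first, then compare the scaled penalty with the cutoff, leaving larger penalties unshifted) is an assumption.

```python
import math


def toy_chi_energy(phi_deg: float) -> float:
    """Hypothetical stand-in for a CHI curve in kcal/mol (NOT the published function):
    one cosine well with its 0 kcal/mol minimum at phi = 60 degrees."""
    return 5.0 * (1.0 - math.cos(math.radians(phi_deg - 60.0)))


def apply_chi_settings(raw_penalty: float, chi_coeff: float = 1.0,
                       chi_cutoff: float = 0.0) -> float:
    """Scale a CHI penalty by chi_coeff, then zero it if it is below chi_cutoff.

    Assumptions: the cutoff is tested against the scaled penalty, and penalties
    at or above the cutoff are kept as they are (no downward shift)."""
    scaled = chi_coeff * raw_penalty
    return 0.0 if scaled < chi_cutoff else scaled


def vc_label(chi_coeff: float, chi_cutoff: float = 0.0) -> str:
    """Shorthand used in the text: VC2 for chi_coeff = 2, VC2|4 if chi_cutoff = 4."""
    label = f"VC{chi_coeff:g}"
    return label if chi_cutoff == 0 else f"{label}|{chi_cutoff:g}"
```

With this toy curve, a torsion at 90° (raw penalty of about 0.67 kcal/mol) would be unpenalized under VC1|2, whereas one at 180° keeps its full 7.5 kcal/mol penalty.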
Figure 5.1 a.) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed line) and 5 (dotted line) to the original CHIΦ|β curve. b.) The effect of applying a CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2). Both panels plot ΔE [kcal/mol] against ϕ [deg].
Analysis
The results of each ADV docking experiment vary because of the random seed used by its stochastic (Monte Carlo) search. To account for this variation, the results from multiple independent docking experiments were averaged for each system tested. Unless otherwise stated, each Root Mean Square Deviation (RMSD) reported in this chapter is the average result of 10 docking events. This method of analysis aims to eliminate spurious results and allows a more accurate comparison between ADV and VC. To increase comparability, the 10 random seeds generated for the 10 ADV docking experiments were explicitly reused for the 10 corresponding VC docking events.
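The paired-seed design can be sketched as follows. The snippet only builds argument lists; --config, --seed, --num_modes, --energy_range and --out are standard AutoDock Vina options, while the "vina-carb" executable name, the --chi_coeff/--chi_cutoff flag spellings, the configuration file name and the master seed are hypothetical placeholders.

```python
import random


def replicate_seeds(n_runs: int = 10, master_seed: int = 2022) -> list[int]:
    """Draw one seed per replicate run; master_seed is an arbitrary illustrative value."""
    rng = random.Random(master_seed)
    return [rng.randrange(1, 2**31) for _ in range(n_runs)]


def docking_command(config: str, seed: int, out: str, chi=None) -> list[str]:
    """Argument list for one run with 20 binding modes and a 10 kcal/mol energy range.

    If chi = (chi_coeff, chi_cutoff) is given, a Vina-Carb run is built instead.
    The "vina-carb" name and the --chi_coeff/--chi_cutoff flags are assumptions."""
    exe = "vina" if chi is None else "vina-carb"
    args = [exe, "--config", config, "--seed", str(seed),
            "--num_modes", "20", "--energy_range", "10", "--out", out]
    if chi is not None:
        args += ["--chi_coeff", str(chi[0]), "--chi_cutoff", str(chi[1])]
    return args


# Each seed is shared by one ADV run and the matching VC1|2 run.
seeds = replicate_seeds()
paired = [
    (docking_command("system.conf", s, f"adv_{i}.pdbqt"),
     docking_command("system.conf", s, f"vc_{i}.pdbqt", chi=(1, 2)))
    for i, s in enumerate(seeds)
]
```

Reusing the same seed for each ADV/VC pair means that differences within a pair can be attributed to the CHI-energy term rather than to a different random trajectory.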
Docking accuracy is assessed with two types of RMSD, namely pose RMSD and shape RMSD. Both compare the positions of the docked ligand's ring atoms (C1,
C2, C3, C4, C5, and O5) with those of the equivalent atoms in the crystal structure. The pose RMSD (PRMSD) measures the deviation of the docked model from the position of the reference structure in space; the PRMSD therefore represents the accuracy of docking the ligand to the receptor. In contrast, the shape RMSD (SRMSD) uses least-squares fitting to compare the docked model with the reference structure irrespective of their positions in space; the SRMSD therefore represents the deviation of the docked model's shape from that of the reference structure. The rmsd and match functions within Chimera133 were used to calculate the PRMSD and SRMSD values, respectively. PRMSDmin(5) and PRMSDmin(20) represent the minimum PRMSD among the top 5 and top 20 ranked models, respectively, averaged across the 10 docking events. SRMSDavg was calculated by averaging the SRMSD values of all 20 models from the 10 docking experiments. Standard deviations were calculated as sample standard deviations.
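A minimal NumPy sketch of the two RMSD definitions and of the summary statistics is given below. It is an illustration rather than the Chimera rmsd/match code used here; it assumes that the model and reference ring atoms (C1 to C5 and O5) are supplied as N×3 arrays in the same order, and it uses the Kabsch algorithm for the least-squares fit.

```python
import statistics

import numpy as np


def pose_rmsd(model: np.ndarray, ref: np.ndarray) -> float:
    """PRMSD: RMSD of matched ring atoms (N x 3) with no superposition."""
    return float(np.sqrt(((model - ref) ** 2).sum(axis=1).mean()))


def shape_rmsd(model: np.ndarray, ref: np.ndarray) -> float:
    """SRMSD: RMSD after optimal least-squares superposition (Kabsch algorithm)."""
    p = model - model.mean(axis=0)
    q = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return pose_rmsd(p @ rotation.T, q)


def prmsd_min(prmsds_per_run, k):
    """PRMSDmin(k): best PRMSD among the top-k ranked models of each run,
    returned as (mean, sample standard deviation) across runs."""
    minima = [min(run[:k]) for run in prmsds_per_run]
    return statistics.mean(minima), statistics.stdev(minima)


def srmsd_avg(srmsds_per_run):
    """SRMSDavg: mean (and sample SD) of the SRMSD of every model in every run."""
    values = [v for run in srmsds_per_run for v in run]
    return statistics.mean(values), statistics.stdev(values)
```

statistics.stdev divides by n − 1, matching the sample standard deviation described above.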
Images of the molecules were prepared using Visual Molecular Dynamics (VMD).134 The ligands are colored according to their source: crystal structures are colored blue, and output from ADV and VC is colored yellow and green, respectively. Additionally, each carbohydrate ring is colored according to whether the CHI-energy penalty is applied to the surrounding Φ/Ψ values: rings in the ¹C₄ and ⁴C₁ chair conformations are colored green, and rings in other conformations, which would be skipped by VC, are colored red. Ring conformations were determined according to the Cremer-Pople definition.135
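For reference, the Cremer-Pople puckering parameters of a six-membered ring can be computed as in the following sketch. The ring-atom order (for example O5, C1, C2, C3, C4, C5) and the chair thresholds in is_chair (θ within 25° of a pole, Q ≥ 0.1 Å) are illustrative assumptions; which pole corresponds to ⁴C₁ and which to ¹C₄ depends on the atom order and the orientation of the normal, so the sketch only distinguishes chairs from other conformations.

```python
import numpy as np


def cremer_pople(ring):
    """Return (Q, theta, phi) for a six-membered ring given as a (6, 3) array.

    Q is in the input length unit; theta and phi are in degrees."""
    ring = np.asarray(ring, dtype=float)
    if ring.shape != (6, 3):
        raise ValueError("expected six ring atoms with x, y, z coordinates")
    centered = ring - ring.mean(axis=0)
    j = np.arange(6)
    # Mean-plane normal from the Cremer-Pople R' and R'' vectors.
    r1 = (centered * np.sin(2 * np.pi * j / 6)[:, None]).sum(axis=0)
    r2 = (centered * np.cos(2 * np.pi * j / 6)[:, None]).sum(axis=0)
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)
    z = centered @ normal  # out-of-plane displacement of each ring atom
    # m = 2 (boat/twist) component and m = 3 (chair) component for N = 6.
    q2_cos = np.sqrt(2 / 6) * np.sum(z * np.cos(4 * np.pi * j / 6))
    q2_sin = -np.sqrt(2 / 6) * np.sum(z * np.sin(4 * np.pi * j / 6))
    q3 = np.sqrt(1 / 6) * np.sum(z * (-1.0) ** j)
    q2 = np.hypot(q2_cos, q2_sin)
    big_q = np.hypot(q2, q3)
    theta = np.degrees(np.arccos(np.clip(q3 / big_q, -1.0, 1.0))) if big_q > 0 else 0.0
    phi = np.degrees(np.arctan2(q2_sin, q2_cos)) % 360.0
    return float(big_q), float(theta), float(phi)


def is_chair(ring, tol_deg=25.0, min_q=0.1):
    """Hypothetical chair test: puckered ring with theta close to 0 or 180 degrees."""
    q, theta, _ = cremer_pople(ring)
    return q >= min_q and (theta <= tol_deg or theta >= 180.0 - tol_deg)
```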
CHI Energy Integration
Parsing the Ligand: The atom names of carbohydrate residues within the ligand file must follow standard atom-naming conventions in order to be identified by the CHI-energy scoring function of VC. While the carbohydrate ligand file is parsed within parse_pdbqt.cpp, information about the atoms and residues of the ligand is stored in the data structure ligand_info, and the relevant glycosidic linkages, namely (1,2), (1,3), (1,4) and (1,6) linkages, are detected. Since the CHI-energy functions were originally developed for chair conformations of the monosaccharide rings, the conformations of the residues comprising the input oligosaccharide ligand must be determined before the energy functions are applied.
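The linkage-detection step can be illustrated with a small Python sketch. It does not reproduce parse_pdbqt.cpp or ligand_info; the Atom record and the bond list are hypothetical inputs, and a glycosidic linkage is assumed to be any bond between the C1 atom of one residue and an O2, O3, O4 or O6 atom of another residue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    residue: int  # residue sequence number
    name: str     # PDB atom name, e.g. "C1" or "O4"


# Linkage types handled by the CHI-energy functions, keyed by the acceptor oxygen.
SUPPORTED = {"O2": "(1,2)", "O3": "(1,3)", "O4": "(1,4)", "O6": "(1,6)"}


def find_glycosidic_linkages(bonds):
    """Return sorted (donor residue, acceptor residue, linkage type) triples."""
    found = []
    for a, b in bonds:
        for c, o in ((a, b), (b, a)):  # a bond may be listed in either order
            if c.name == "C1" and o.name in SUPPORTED and c.residue != o.residue:
                found.append((c.residue, o.residue, SUPPORTED[o.name]))
    return sorted(found)
```

Bonds inside a residue (for example C1-C2 or C1-O5) are ignored, because only inter-residue C1-On bonds are treated as glycosidic linkages.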
Determination of Ligand Carbohydrate Ring Conformation: The ring conformations are identified using a modified version of the Best-Fit-Four-Membered-Plane (BFMP) method.136 The selection of the appropriate CHI-energy function for each linkage is stored in the data structures glyco_info and ligand_glyco_info. According to the BFMP method, a carbohydrate ring must satisfy three criteria in order to be classified as a ¹C₄ or ⁴C₁ sugar, namely the internally defined 2d5, 4d1 and 6d3 conformations (for ¹C₄) or 5d2, 1d4 and 3d6 conformations (for ⁴C₁). When the program encounters carbohydrate conformations for which the CHI-energy functions are not applicable, it simply ignores the associated linkages. In certain protein-carbohydrate systems, however, the sugar rings are only slightly distorted from the standard ⁴C₁ and ¹C₄ conformations and still merit the application of CHI-energy penalties. To accommodate such minor ring distortions, the current implementation of the BFMP method classifies a saccharide as a ¹C₄ or ⁴C₁ sugar if any 2 of the 3 criteria are satisfied.
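The relaxed "2 of 3" rule can be sketched as below. This is loosely modeled on the BFMP idea rather than on the VC implementation: each criterion "a d b" is interpreted, as an assumption, as ring atom a lying above and atom b lying below the best-fit plane of the remaining four atoms, with "above" defined by the ring's mean-plane normal. Ring positions 1 to 6 are assumed to be C1 to C5 and O5, and the 0.1 Å tolerance is a hypothetical choice. Because the sign convention is assumed, the returned labels follow the mapping stated in the text but may be swapped relative to the real BFMP definitions.

```python
import numpy as np

# Criteria from the text as 1-based ring positions (a, b) for "a d b".
# Assumed ring order: positions 1 to 5 = C1 to C5, position 6 = O5.
CRITERIA = {
    "1C4": [(2, 5), (4, 1), (6, 3)],
    "4C1": [(5, 2), (1, 4), (3, 6)],
}


def mean_plane_normal(ring):
    """Unit normal of the ring mean plane (Cremer-Pople construction)."""
    centered = ring - ring.mean(axis=0)
    angles = 2.0 * np.pi * np.arange(6) / 6.0
    r1 = (centered * np.sin(angles)[:, None]).sum(axis=0)
    r2 = (centered * np.cos(angles)[:, None]).sum(axis=0)
    normal = np.cross(r1, r2)
    return normal / np.linalg.norm(normal)


def criterion_met(ring, a, b, ref_normal, tol):
    """True if atom a lies above, and atom b below, the best-fit plane of the other four."""
    ia, ib = a - 1, b - 1
    others = [i for i in range(6) if i not in (ia, ib)]
    centroid = ring[others].mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    normal = np.linalg.svd(ring[others] - centroid)[2][-1]
    if normal @ ref_normal < 0:  # orient "above" consistently for all three planes
        normal = -normal
    da = (ring[ia] - centroid) @ normal
    db = (ring[ib] - centroid) @ normal
    return bool(da > tol and db < -tol)


def classify_chair(ring, tol=0.1, needed=2):
    """Return "1C4" or "4C1" if at least `needed` of the three criteria hold, else None."""
    ring = np.asarray(ring, dtype=float)
    ref = mean_plane_normal(ring)
    for label, pairs in CRITERIA.items():
        if sum(criterion_met(ring, a, b, ref, tol) for a, b in pairs) >= needed:
            return label
    return None
```

Setting needed=3 recovers the strict three-criteria rule, so the relaxation described above corresponds to the default needed=2.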
Scoring Individual Ligand Poses: Each docking run consists of a number of steps, determined heuristically. Each step consists of a random perturbation and a local optimization, followed by an evaluation of the generated pose. The random perturbation either translates or rotates the ligand, or adjusts one of the flexible torsion angles. A new function, eval_chi, has been introduced within model.cpp to calculate the CHI-energy penalty for every oligosaccharide pose generated, using data from ligand_glyco_info. The CHI-energy penalty calculated for each glycosidic torsion angle within eval_chi is modified according to the two user-adjustable parameters (chi_coeff and chi_cutoff). The total CHI energy of a given oligosaccharide is the sum of the CHI energies of all glycosidic torsion angles in the model, and it is combined with the interaction energy natively calculated by ADV within the function eval_deriv. This composite energy is used within the metropolis_accept function in monte_carlo.cpp to calculate the acceptance probability of each ligand pose. A ligand pose with unfavorable glycosidic torsion angles is therefore penalized by the CHI energies, which increases its probability of rejection.
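The way the composite energy feeds into the acceptance step can be summarized with the following Python sketch. It mirrors the description above rather than the C++ code of model.cpp or monte_carlo.cpp: the flat-bottom treatment repeats the assumptions made for chi_coeff and chi_cutoff elsewhere (scale, then zero penalties below the cutoff), the per-torsion penalties are assumed to have been computed already for chair residues only, and the temperature is a free parameter here rather than the value used by ADV.

```python
import math
import random


def total_chi_energy(torsion_penalties, chi_coeff=1.0, chi_cutoff=2.0):
    """Sum the per-torsion CHI penalties after scaling and the flat-bottom cutoff.

    Defaults correspond to the VC1|2 setting; torsions attached to non-chair
    residues are assumed to have been filtered out beforehand."""
    total = 0.0
    for raw in torsion_penalties:
        scaled = chi_coeff * raw
        total += 0.0 if scaled < chi_cutoff else scaled
    return total


def composite_energy(intermolecular, torsion_penalties, chi_coeff=1.0, chi_cutoff=2.0):
    """Intermolecular score (as computed by the docking program) plus the CHI energy."""
    return intermolecular + total_chi_energy(torsion_penalties, chi_coeff, chi_cutoff)


def metropolis_accept(old_energy, new_energy, temperature, draw=random.random):
    """Metropolis criterion: always accept a lower energy; otherwise accept
    with probability exp(-(new - old) / temperature)."""
    if new_energy < old_energy:
        return True
    return draw() < math.exp((old_energy - new_energy) / temperature)
```

For instance, a pose whose torsion penalties are 0.5, 3.0 and 9.0 kcal/mol receives a CHI energy of 12.0 kcal/mol at VC1|2, because only the first penalty falls below the cutoff; that large uphill step makes the pose very unlikely to be accepted.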
Log file: A VC log file (VC_log.txt) is written for each execution of the program. It lists the glycosidic linkages identified by the program and states whether CHI-energy penalties were applied to each linkage.
Results & Discussion
Implementation of the CHI-energy function aims to improve docking accuracy by correcting the shape of the carbohydrate ligand. To determine whether correcting the ligand shape alone would be sufficient to produce an accurate model of a complex, each crystal structure was first subjected to a dedicated docking procedure in which the glycosidic linkages of the ligand were restrained to the angles present in the crystal structure. Of the 87 crystal structures selected for evaluation, 11 failed this initial positive control. Failure at this step suggests that alternative modifications to the ADV scoring function would be necessary to produce accurate models for these 11 complexes; therefore, optimization of VC continued with the remaining 76 structures.
Optimization of the CHI-Energy Coefficient
Incorporation of the CHI-energy term into the ADV scoring function immediately produced carbohydrate conformations comparable to those in X-ray crystal structures (ADV vs. VC1 in Figure 5.2a). However, since the CHI-energy term was developed independently of the ADV scoring function, its magnitude may be disproportionate to the other terms. Therefore, a range of CHI-energy coefficients (1, 2, 3, 4, 5, 10, and 50) was examined. The effect of varying the CHI-coefficient for a set of 14 antibody-carbohydrate systems is reported in Figure 5.2. Every CHI-coefficient value led to poses with better ligand conformations (lower SRMSDavg(20) values) than those produced with ADV. The CHI-energy term penalizes torsions outside the local minima of the CHI-energy curves, thereby suppressing incorrect oligosaccharide conformations during docking. Increasing the magnitude of the CHI-energy contribution generally led to a corresponding decrease in SRMSDavg(20). This trend was particularly noticeable for systems containing more than 5 carbohydrate residues, owing to the larger number of glycosidic linkages affected (Figure 5.2a). Interestingly, the largest CHI-coefficient (VC50) increased the SRMSDavg(20) for ligands containing fewer than 4 carbohydrate residues. This result is most likely due to induced fit upon ligand binding, which caused the glycosidic linkages of the crystallized ligand to deviate from the theoretical minima that are heavily favored by VC50.
Notably, pose accuracy (Figure 5.2b) diminished as the CHI contribution became large (i.e. VC10 and VC50), even though the ligand conformations remained similar to the reference structure (Figure 5.2a). This suggests a problem with pose identification. To demonstrate this, the lowest-energy model generated by flexibly docking the 3C6S35 ligand with VC50 (SRMSD = 1.13 Å; PRMSD = 23.8 Å) was rigidly re-docked. Ten docking experiments consistently produced an accurate model, with a PRMSDmin(5) of 1.98 Å. Rigid re-docking allowed the scoring function to discriminate between poses solely on the basis of intermolecular interactions between the protein and the ligand. During flexible docking, however, the harsh penalty applied by VC50 eliminated any model that deviated from the minima of the energy curves. Since very few of the generated models met this criterion, only those unaffected by the CHI-energy penalty remained, including incorrectly positioned ones. The intramolecular forces imparted by a high CHI-energy penalty thus appear to outweigh the contributions from intermolecular protein-ligand interactions.
The effect of over-weighting the CHI contribution suggests that a fine balance between inter- and intramolecular interactions is required to dock carbohydrate ligands successfully. Accordingly, lower CHI-energy coefficients (less than 4) produced more accurate models by enabling favorable glycosidic torsion angles without overshadowing the intermolecular forces involved in ligand binding. The performance of ADV and VC is comparable for systems containing di-, tri-, tetra- and pentasaccharide ligands; however, VC outperforms ADV for larger oligosaccharide ligands. For example, the improvements in PRMSDmin(5) from ADV to VC1 for 1MFB,34 3BZ4,35 and 3C6S were 1.1, 2.0, and 2.27 Å, respectively. VC1 and VC2 produced acceptable PRMSDmin(5) poses for 13 of the 14 systems. As a result, only CHI coefficients of 1 or 2 were considered in subsequent experiments.
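Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various CHI-energy coefficients of VC (VC1, VC2, VC3, VC4, VC5, VC10 and VC50): a.) SRMSDavg(20) [Å]; b.) PRMSDmin(5) [Å]. Systems are grouped by ligand size (di- to undecasaccharide) and are, in order: 1OP3 (2G12), 1UZ8 (291-2G3-A), 1M7D (SYA/J6), 1MFA (scFv SE155-4), 1MFD (Fab SE155-4), 1MFE (SE155-4), 1S3K (HU3S193), 1CLY (BR96), 1CLZ (BR96), 1MFC (SE155-4), 1M7I (SYA/J6), 1MFB (SE155-4), 3BZ4 (F22-4) and 3C6S (F22-4).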
Optimization of the CHI-Energy Cutoff
The CHI-energy functions were originally developed by modeling the rotational properties of disaccharide analogs in vacuo. The minima of the CHI-energy curves generally correspond to the glycosidic torsion angles observed in oligosaccharide crystal structures;48 however, oligosaccharides often undergo conformational changes upon binding (induced fit), which may cause glycosidic linkages to deviate from idealized low-energy values. Rather than defining the well bottom as a range of allowable torsion angles, its limits are defined by a CHI-energy range. The chi_cutoff term negates the penalty associated with glycosidic linkage conformations surrounding the absolute energy minima of the CHI-energy curves. Use of a flat-bottomed CHI-energy potential allows induced fit to occur with no internal energy
penalty. Within this region, the pose is scored solely on the basis of the intermolecular
interactions dictated by the native ADV scoring function.
To identify a setting that permits an acceptable range of glycosidic angles, CHI-energy cutoffs were evaluated at integer values from 1 to 5 kcal/mol (Table S5.2). For each CHI-coefficient (VC1 and VC2), optimal results were obtained with CHI-cutoff values of either 1 or 2 kcal/mol (VC1|1, VC1|2, VC2|1 and VC2|2). These four settings of VC identified acceptable binding modes within the top 20 poses for each of the 14 antibody systems, and within the top 5 poses for at least 13 of the 14 antibodies. To examine the applicability of VC to protein-carbohydrate complexes other than antibodies, and to further optimize the VC parameters, the study was extended to 62 additional carbohydrate-protein complexes, including carbohydrate-binding modules (CBMs), lectins, and enzymes. The best performance was attained with a CHI-coefficient of 1 and a CHI-cutoff of 2 (VC1|2), which generated an acceptable pose amongst the top 5 models for 75% of the systems, compared with a 56% success rate for ADV (Table 5.1). Although each of the 76 systems passed a positive control in which the reference structure was successfully docked with rigid glycosidic linkages, VC1|2 was unable to identify an acceptable pose for 25% of these systems. Challenges that may have prevented VC from identifying correct models are discussed in the following section.
Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curves with the distribution of glycosidic linkages in carbohydrate crystal structures in the PDB. The bottom x-axis (ϕ in 5° bins) and left y-axis (percentage of structures) correspond to the histogram depicting the distribution of PDB structures, while the top x-axis (ϕ [deg]) and right y-axis (ΔE [kcal/mol]) correspond to the CHI-energy curves.
Similar to the analysis performed by Nivedha et al.,48 the carbohydrate crystal structures in the PDB were surveyed using the GlyTorsion tool from www.glycosciences.de137 in order to calculate the percentage of glycosidic linkages exempted from penalization when applying VC1|1, VC1|2, VC2|1, and VC2|2. At VC1|2, the CHI-energy penalty was nullified for 87% of the glycosidic linkages in the PDB (Figure 5.3), compared with 77%, 62% and 76% for VC1|1, VC2|1 and VC2|2, respectively. Therefore, VC1|2 allowed the greatest flexibility of glycosidic
linkages without penalization by the CHI-energy functions. Although VC1|2 was selected as the default, the alternative settings (VC1|1, VC2|1 and VC2|2) were nearly as effective in binding mode prediction (Table 5.1); therefore, the CHI-cutoff and CHI-coefficient parameters remain user-adjustable.
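The survey statistic amounts to a simple count, sketched here in Python. The toy CHI curve and the torsion list are hypothetical stand-ins for a real CHI function and GlyTorsion output, and a torsion counts as exempt when its scaled penalty is below the cutoff (an assumption about how the two parameters interact). Under that assumption VC2|2 would exempt exactly the same torsions as VC1|1, whereas 76% and 77% are reported above, so the actual implementation may differ in detail.

```python
import math


def toy_chi_curve(phi_deg):
    """Hypothetical CHI-like curve (kcal/mol) with its minimum at 60 degrees."""
    return 5.0 * (1.0 - math.cos(math.radians(phi_deg - 60.0)))


def percent_exempt(torsions_deg, chi_curve, chi_coeff, chi_cutoff):
    """Percentage of observed torsions whose scaled CHI penalty is below chi_cutoff."""
    exempt = sum(1 for t in torsions_deg if chi_coeff * chi_curve(t) < chi_cutoff)
    return 100.0 * exempt / len(torsions_deg)


observed = [60.0, 90.0, 120.0, 180.0]  # hypothetical torsion survey
for coeff, cutoff in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    pct = percent_exempt(observed, toy_chi_curve, coeff, cutoff)
    print(f"VC{coeff}|{cutoff}: {pct:.0f}% exempt")
```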
Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and CHI-cutoff.
System type    No. of systems    Success rate* [%]
                                 ADV    VC1|1    VC1|2    VC2|1    VC2|2
Antibodies     14                79     100      93       100      100
Lectins        42                55     64       71       67       67
CBMs           20                35     50       60       55       45
Totals         76                56     71       75       74       71
*Success rate is defined as producing an accurate binding mode (PRMSDmin(5) < 2 Å).
Performance with Ligands Containing 1,6-Linkages
In total, there are 12 systems, consisting of lectins and CBMs, with ligands containing one or more 1,6 glycosidic linkages. The success rates for these systems (an accurate pose prediction amongst the top-5 poses) were 25% for ADV and 42% for VC1|2 (Table 5.2).
Table 5.2 PRMSDmin(5) [Å] produced by ADV and VC1|2 for the 12 test systems with ligands containing 1,6-linkages.
PDB ID   1jpc  1k9i  1tei  1zhs  2vco  4gk9  2vuz  2yfz  1oh4  2ypj  2j73  2i74
ADV      5.0   4.7   1.1   1.4   3.5   4.3   3.4   3.3   3.6   5.2   0.7   3.6
VC1|2    5.7   5.0   0.6   1.6   1.7   1.2   7.6   1.8   3.0   9.9   2.7   4.0
The performance of VC1|2 and ADV was further compared for these systems by binning the ω angles of all docked poses from both programs into 10° bins. The resulting histograms were compared with the corresponding CHI-energy curves; for example, the data for 1,6-linked sugars in which the O4 atom of the reducing sugar is equatorial are shown in Figure 5.4. The distribution of ω angles produced by VC1|2 can be divided into three regions centered around 60°, 180° and 300°, in agreement with the low-energy regions of the corresponding CHI-energy curve. Additionally, the ω angles of the reference crystal structures fall within the two lowest-energy wells of the CHI-energy curve. In contrast, the ω angles produced by ADV are more evenly distributed across the 0° to 360° range. The challenges that the docking programs faced with the test set used in this study are outlined below.
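The 10° binning behind Figure 5.4 can be reproduced with NumPy as sketched below. Angles are wrapped into [0°, 360°) and counted per bin; percent_near is an additional, hypothetical summary (its 30° half-width is an arbitrary choice) that reports the share of ω values lying near the 60°, 180° and 300° regions mentioned above.

```python
import numpy as np


def torsion_histogram(angles_deg, bin_width=10.0):
    """Return bin start angles and the percentage of angles in each bin over [0, 360)."""
    wrapped = np.mod(np.asarray(angles_deg, dtype=float), 360.0)
    edges = np.arange(0.0, 360.0 + bin_width, bin_width)
    counts, _ = np.histogram(wrapped, bins=edges)
    return edges[:-1], 100.0 * counts / counts.sum()


def percent_near(angles_deg, centers=(60.0, 180.0, 300.0), half_width=30.0):
    """Percentage of angles within half_width degrees (circularly) of any center."""
    a = np.asarray(angles_deg, dtype=float)[:, None]
    # Signed circular difference in (-180, 180] between each angle and each center.
    diff = (a - np.asarray(centers, dtype=float)[None, :] + 180.0) % 360.0 - 180.0
    return 100.0 * float((np.abs(diff) <= half_width).any(axis=1).mean())
```

Applied separately to the ADV and VC1|2 poses, a higher percent_near value would reflect the clustering into the three low-energy regions seen for VC1|2.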
Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for the 12 test systems containing one or more 1,6-linkages, overlaid on the ω angles of the reference crystal structures (red dots) and the corresponding CHI-energy curve. Axes: ω [°] (10° bins), energy [kcal/mol] and percentage of structures.
Docking Challenges
Both ADV and VC encountered recurring difficulties while docking ligands in the
development set. These challenges resulted from issues inherent to the docking program,
as well as ambiguities in atomic placement within the crystal structures that were used as
a reference.
Excessive Carbohydrate-Protein Interactions
Obtaining a docked oligosaccharide in which part of the ligand extends away
from the protein is particularly difficult for automated docking algorithms.138,139
Docking
predicts complexes using a scoring function that maximizes favorable intermolecular
interactions. This approach favors models in which many residues interact with the protein. For example, neither ADV nor VC1|2 identified an acceptable pose amongst the 5 top-ranked poses when docking the tetrasaccharide ligand to the lectin binding domain of lectinolysin (PDB ID: 4GWI140). Only one residue of the ligand fully interacts with the protein surface in the crystal structure; however, the models produced during docking do not reproduce this orientation. Although VC1|2 produced poses similar to the crystal ligand (PRMSD = 2.2 Å), they were ranked lower than other models that interact with the protein surface along their entire length. One approach to surmount this problem would be to dock only the portion of the oligosaccharide that is in direct contact with the protein. Such a minimal binding determinant may be inferred from experimental binding data, such as glycan array screening.141,142 VC also improves the likelihood that the non-interacting segment will remain distal from the protein surface by penalizing unlikely glycosidic torsion angles. As an example, the docking results produced by ADV and VC1|2 for the largest oligosaccharide in this test set (PDB ID: 3C6S) are displayed in Figure 5.5. While the residues that interact with the protein are correctly predicted in both cases, the model produced by VC1|2 better represents the solvent-exposed residues. The glycosidic torsion angles of the reference structure have been plotted on the CHIφ|α energy curve alongside those of the 20 models produced by either ADV or VC1|2 (Figure 5.5c). Approximately half of the ADV torsions exceeded the 2 kcal/mol cutoff, and some of them would receive CHI-energy penalties greater than 8 kcal/mol. In contrast, none of the torsion angles produced by VC exceeded the 2 kcal/mol cutoff, so none were penalized by the CHI-energy function.
Figure 5.5 a.) The PRMSDmin(5) pose from ADV compared with the reference ligand (blue). b.) The PRMSDmin(5) pose from VC1|2 compared with the reference ligand (blue). c.) The Φ torsion angles of α-sugars from the docked poses of the 3C6S ligand from both ADV (yellow triangles) and VC1|2 (green squares), plotted onto the CHIΦ|α curve (CHI energy [kcal/mol] vs. torsion angle [°]). The torsion angles of the reference structure are plotted as blue circles.
Aromatic Stacking
The importance of aromatic residues within the binding site has been demonstrated by the decrease in affinity observed upon their substitution with other amino acids;143 however, aromatic stacking interactions are currently omitted from most docking scoring functions. As a result, docking algorithms can encounter difficulties when predicting the binding modes of ligands that stack against aromatic amino acids. For example, the carbohydrate ligand in 4AFD144 stacks against four tryptophan residues (Trp55, Trp60, Trp99 and Trp108) in the binding groove of the corresponding CBM. Neither ADV nor VC accurately predicts the binding mode, giving high PRMSDmin(5) values of 8.9 Å and 5.4 Å, respectively (Figure 5.6). In such situations, including aromatic stacking interactions in the docking scoring function would be expected to improve the results. Previous efforts have been made to incorporate CH/π stacking effects in carbohydrate docking.132,145
Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD) in complex with a tetrasaccharide ligand. All amino acids farther than 5 Å from the ligand are colored grey. Residues within 5 Å are colored orange if they are cyclic and red if acyclic.
Low-Resolution Experimental Data
Docking the tetrasaccharide ligand to the SE155-4 antibody (PDB ID: 1MFC34) appeared more challenging for VC than for ADV (Figure 5.7a); however, the results were comparable for the other three ligands crystallized with this antibody (PDB IDs: 1MFA,31 1MFD,32 and 1MFE33). These three systems contain the same trisaccharide ligand, which differs from the tetrasaccharide by one rhamnose (Rha) residue. This extra residue is responsible for the difference in PRMSDmin(5) values between VC1|2 and ADV (Figure 5.7a). While the positions of three of the four residues closely align in the individual structures, the pyranose ring of Rha-524 in the model produced by VC is flipped approximately 180° about the glycosidic ψ-angle relative to the model produced by ADV. In the reported crystal structure of this complex,34 residue Rha-524 was described as "disordered" and was placed in both the expected94 and the flipped orientation, in structures 1MFB and 1MFC, respectively. The ADV orientation aligns more closely with the "flipped" ligand of 1MFC, giving a lower PRMSDmin(5) than VC, which predicts the normal conformation94 to be preferred.
While complexation with the protein is expected to distort the conformation of a bound oligosaccharide, the preponderance of crystallographic data (Figure 5.7b) indicates that large distortions, such as the flip of the glycosidic ψ-angle in 1MFC, are rare. Thus, there is a clear role for the CHI-energy functions in crystal structure refinement and/or curation, by flagging such distorted glycosidic linkages as high in energy.
Figure 5.7 a.) Models representing the PRMSDmin(5) poses produced by docking 1MFC with ADV (yellow) and VC1|2 (green). The primary difference between the docked models is a rhamnose ring flipped by approximately 180 degrees, highlighted by the orange arrows. b.) Ligands from two crystal structures, 1MFB (blue) and 1MFC (cyan), also differ in the orientation of the Rha-524 ring (residue name RAM).
An Assessment of ADV and VC using a Test Set of Apo Proteins
Cognate docking is useful for determining the ability of the docking algorithm to place the ligand correctly when the binding site is already preorganized to receive it; however, if the ultimate goal of docking is to predict protein-ligand interactions in the absence of a preconfigured binding site, performance must also be assessed on apo proteins. Apo protein crystal structures were available for a subset of systems from the cognate development set and were employed as test cases to compare the performance of ADV and VC1|2. The average difference in amino acid positions between the apo and corresponding cognate proteins, for residues within 5 Å of the ligand, was 0.77 Å. ADV correctly predicted the binding modes in 35% of the systems, whereas VC1|2 succeeded in 55%. When the top-20 poses were considered instead of only the top-5, the success rates for ADV and VC1|2 increased to 55% and 83%, respectively (Table 5.3). VC1|2 also improved the ranking of these acceptable pose predictions (Figure 5.8). When a docking run contains multiple poses with PRMSD ≤ 2 Å, the best-ranked of them is taken as the acceptable pose.
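The acceptable-pose rank and the success rates in Table 5.3 can be computed as in the following sketch. It follows the text's inclusive criterion (PRMSD ≤ 2 Å), whereas the table footnotes use a strict inequality, so the threshold comparison is a detail to adjust if needed. The input is assumed to be one list of PRMSD values per system, ordered by docking rank.

```python
def rank_acc(prmsds_ranked, threshold=2.0):
    """1-based rank of the best-ranked pose with PRMSD <= threshold, or None."""
    for rank, value in enumerate(prmsds_ranked, start=1):
        if value <= threshold:
            return rank
    return None


def success_rate(systems, top_k, threshold=2.0):
    """Percentage of systems whose acceptable pose appears within the top_k poses."""
    hits = 0
    for prmsds in systems:
        r = rank_acc(prmsds, threshold)
        if r is not None and r <= top_k:
            hits += 1
    return 100.0 * hits / len(systems)
```

For example, a system whose first acceptable pose is ranked 6th counts as a success for PRMSDmin(20) but not for PRMSDmin(5), which is the difference between the two column groups of Table 5.3.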
Table 5.3 Comparison between ADV and VC1|2 for the apo-protein Test Set.
System type    No. of systems    Success rate* [%]
                                 PRMSDmin(5)       PRMSDmin(20)
                                 ADV    VC1|2      ADV    VC1|2
Antibodies     7                 71     86         71     100
Lectins        10                50     50         70     90
CBMs           12                0      42         33     67
Totals         29                35     55         55     83
*Success rate is defined as finding an accurate binding mode (PRMSDmin < 2 Å).
Figure 5.8 The ranks of acceptable poses (Rankacc), i.e., of the best-ranked pose with PRMSD ≤ 2 Å, produced by ADV and VC1|2 when docking oligosaccharide ligands onto apo protein structures.
[Figure 5.8: pie charts for VC1|2 and ADV; categories Rankacc < 5, 6 < Rankacc < 10, and Unacceptable.]
Evaluation of Docking to an Enzyme System using ADV and Optimized VC
Enzyme active sites often distort monosaccharide ring shapes during catalysis, which makes docking to this class of proteins particularly challenging. Because the CHI-energy functions were developed for low-energy ring conformations, they are not necessarily applicable to the distorted glycans found in enzyme complexes; hence, VC is unlikely to offer a considerable improvement over ADV when applied to carbohydrate-processing enzymes. An exception is segments of the oligosaccharide that extend beyond the active site, for which the CHI functions in VC should provide some enhanced accuracy. A single example of docking to a retaining glycoside hydrolase is presented here to demonstrate the potential application of VC to enzymes. Kitago et al. produced a series of crystal structures of wild-type cellulase 44A (Cel44A), and of a catalytic knockout, in complex with cellulosic fragments.146 Of the five structures produced, four contained ligands bound only to the (-) site (relative to the catalytic nucleophile), while only one contained a ligand that spanned the entire active site (PDB ID: 2EQD146) (Figure 5.9a). In that work, a reaction mechanism was proposed in which initial substrate binding enhanced activity through an assortment of interactions with the carbohydrate in the (-) site, while a dearth of interactions in the (+) site promoted product release.146
VC successfully produced a model of the complex for the four ligands in the (-) site, but failed to correctly position the largest ligand, which crosses into the (+) site (Figure 5.9b). Although ADV failed to generate a correct model for the ligands bound to the (-) site, it outperformed VC when docking the ligand that extends across the active site. This result is unsurprising considering the high torsional penalties that VC would apply to some of the glycosidic linkages within the crystal structure (Figure 5.9c). While VC would not penalize the glycosidic linkages of the (-1) residue, because of its non-chair ring conformation, there are other uncommon torsion values in the distal regions of the ligand. For example, the Φ angle of the linkage between residues (+1) and (+2) of the reference structure would receive a penalty of 8 kcal/mol from the CHI energy function, effectively precluding selection of such a model by VC.
Figure 5.9 a) The ligands from five crystal structures (PDB IDs: 2E0P, 2EO7, 2EEX, 2EJ1, and 2EQD) of the Cel44A enzyme, superimposed on the protein from PDB ID 2EQD. Amino acids reported to be involved in substrate binding (N45, R47, W64, W71, W327, W331, E359, and W392) are colored orange or red, depending on whether the residue is aromatic or not.146 The catalytic residue (Q186) is colored yellow, and all other amino acids are grey. The active site is separated into a (-) and a (+) site. The circled values (-3 to +5) give the position of each residue relative to the glycosidic linkage that is cleaved during catalysis. The ligands confined to the (-) side of the active site are depicted in varying shades of purple, and the octasaccharide that extends across both the (-) and (+) sites (2EQD) is colored blue. Each carbohydrate ring is colored according to whether the CHI energy penalty is applied to the surrounding Φ/Ψ values: green if VC is applied, red if it is not. b) The PRMSDmin(5) and PRMSDmin(20) values (PRMSDmin [Å]) of the ADV and VC1|2 poses for each of the five structures. c) The glycosidic linkages of the octasaccharide that extends across the active site (2EQD), labeled with the penalty assigned by the CHI energy curves; penalties greater than 2 kcal/mol are highlighted in red. VC is not applied to the (-1) residue since it is neither a 4C1 nor a 1C4 chair, so that ring is colored red and its penalties are not listed.
Conclusions
The CHI energy functions were incorporated into ADV in order to improve carbohydrate docking results. Docking performance was evaluated with 72 antibody, lectin, or CBM systems. Although various CHI energy coefficients were evaluated, the original energy profiles (chi_coeff = 1) produced accurate models with the highest frequency. Although exocyclic groups were omitted when the CHI energy curves were modeled (tetrahydropyran-based models were used), the remaining interaction energy terms in the ADV scoring function account for the interactions arising from these exocyclic groups. An additional term that leaves a range of glycosidic torsion angles unpenalized has been implemented to enhance docking performance (chi_cutoff = 2). Although these settings have been selected as default values, both variables remain user-adjustable. VC1|2 produced accurate docked models for more systems than ADV when docking to either holo- or apo-protein receptors; however, ADV outperformed VC in a few cases where the reference ligands contained high-energy glycosidic linkages according to the CHI energy curves. This result suggests that accurately predicting distorted glycosidic linkages, such as those found within the active site of an enzyme, would be difficult for VC. Although VC was not designed for enzymes, the results from docking to a cellulase demonstrate its potential for accurately predicting enzyme-glycan interactions.
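The two user-adjustable settings can be illustrated with a small sketch. It assumes, hypothetically, that each glycosidic Φ or Ψ angle contributes a CHI energy E (kcal/mol), that energies at or below chi_cutoff are left unpenalized, and that only the excess above the cutoff is scaled by chi_coeff. The exact expression used in Vina-Carb may differ, and the function names are illustrative only.

```python
from typing import Iterable


def chi_penalty(chi_energy: float, chi_coeff: float = 1.0, chi_cutoff: float = 2.0) -> float:
    """Hypothetical per-torsion penalty (kcal/mol).

    Torsions whose CHI energy is at or below chi_cutoff are not penalized;
    above the cutoff, only the excess energy is scaled by chi_coeff, which
    keeps the penalty continuous at the cutoff (an assumption of this sketch).
    """
    excess = chi_energy - chi_cutoff
    return chi_coeff * excess if excess > 0.0 else 0.0


def total_chi_penalty(torsion_energies: Iterable[float],
                      chi_coeff: float = 1.0, chi_cutoff: float = 2.0) -> float:
    """Sum the hypothetical penalties over all glycosidic Φ/Ψ torsions of a pose."""
    return sum(chi_penalty(e, chi_coeff, chi_cutoff) for e in torsion_energies)


# Illustrative CHI energies for three torsions (kcal/mol)
print(total_chi_penalty([0.1, 1.3, 8.0]))  # 6.0
```

Under this sketch, setting chi_coeff = 0 removes the torsional term entirely, and lowering chi_cutoff to 0 penalizes every torsion with a nonzero CHI energy.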
The systems that neither ADV nor VC could accurately reproduce shared a few features. Ligands that partially extend into solution were difficult to reproduce because of the lack of intermolecular interactions; for these ligands, better results may be obtained by docking only those parts of the ligand that are expected to interact with the protein. A few other systems were identified that may benefit from a term that accounts for aromatic stacking. Finally, a few low-resolution crystal structures were identified that contained ambiguous coordinates for the reference ligands, indicating a potential role for the CHI energy functions in the validation of crystallographic models.
VC is currently applicable to the most common saccharide moieties and linkages, such as chair conformations and 1,x-linkages. Additional residues, such as sialic acid, may be incorporated into VC once the corresponding CHI energy functions become available.
The source code for VC is freely available at http://glycam.org/publication-materials/vina-carb
Individual Author Contributions
Anita K. Nivedha: Authored portions of the paper; coded CHI energy functions within
AutoDock Vina; co-designed docking protocols and analysis methodologies; provided
tools for the analysis of data and made images for the paper.
David F. Thieker: Authored portions of the paper; co-designed docking protocols and
analysis methodologies; provided tools for the analysis of data and made images for the
paper.
Robert J. Woods: Authored the paper; conceived and designed the experiment, and
contributed to the analysis and interpretation of data.
6. THE CONSIDERATION OF CH/π INTERACTIONS IN CARBOHYDRATE-PROTEIN DOCKING
Introduction
CH/π interactions occur between CH groups and the π-electron density of aromatic molecules. These interactions were first postulated by Tamres in 1952,147 who noted that dissolving benzene in chloroform is exothermic. This observation was followed by extensive NMR and IR studies 148,149 showing that this type of noncovalent interaction is qualitatively similar to a hydrogen bond. CH/π interactions have been described as interactions between a weak acid (the C-H donor) and a weak base (the π acceptor), and they are stable in both polar and nonpolar solvents.150 Individually, these interactions are relatively weak, each contributing 0.5-1.0 kcal/mol to the overall stabilization energy of the complex,150,151 but the cumulative effect of multiple CH/π interactions has a pronounced influence on stability.
It has been proposed that the strength of the CH/π interaction originates primarily from charge transfer,152 whereas other work indicates that dispersive forces play a major role in these interactions.153 The hydrophobic effect also contributes favorably to this interaction when water is the solvent. However, it is not the major contributing factor. Waters et al. 154 showed that replacing a phenylalanine with a synthetic analog in which the phenyl ring was replaced by a more hydrophobic cyclohexane ring weakened the interaction energy of the system with an acetylated monosaccharide from -0.5 kcal/mol to -0.1 kcal/mol. This result showed that the hydrophobic effect was not the major contributor to the interaction energy when CH/π interactions could form. Additionally, CH/π interactions can occur even in vacuum,155,156 whereas the hydrophobic effect stems from a molecule's interaction with water.
Figure 6.1 Replacement of the aromatic group in (A) by the aliphatic group in (B), in the study by Waters et al.,154 weakened the interaction with a tetraacetylglucose molecule.
Multiple surveys of the Protein Data Bank (PDB) have been performed to investigate the presence of CH/π interactions in protein crystal structures, and the sheer number of these interactions points to their importance in protein structure stability and function.50,157 For example, a 2001 survey of 1154 non-redundant protein structures in the PDB detected 31,087 individual CH/π interactions that satisfied the authors' selection criteria.151 Nearly three-fourths of all tryptophan residues, half of all tyrosine and phenylalanine residues, and one-fourth of all histidine residues were involved as acceptors in CH/π interactions. In addition to contributing to the stabilization of protein structures,151 CH/π interactions also occur in complexes of proteins with ligands or cofactors, nucleotides, carbohydrates or peptides.151,158,159 They are particularly common in carbohydrate-binding proteins, where they affect binding affinity and conformation. For example, human lysozyme is an endoglycosidase that binds chitin, the β-1,4-linked homopolymer of N-acetylglucosamine (GlcNAc) and a major cell-wall component of fungi. The enzyme has several aromatic amino acids in its binding pocket that are crucial for ligand recognition, and altering these aromatic residues by site-directed mutagenesis affected both the affinity and the catalytic efficiency of the enzyme.158
Protein-carbohydrate interactions are at the heart of several life processes, including fertilization, embryogenesis, tissue maturation and tumor metastasis.160 The affinities associated with this class of molecular recognition are often strengthened by multivalency,161,162 as well as by interactions between polar or charged groups (hydrogen bonds, salt bridges), van der Waals contacts, and interactions between aromatic amino acids and CH groups in carbohydrate residues (CH/π interactions) 55 (Figure 6.3). These CH/π interactions cause carbohydrate rings to stack roughly parallel or perpendicular to aromatic amino acids.51,163 They have been observed in most protein-carbohydrate complexes, including enzymes and receptors, and more specifically in lectins, plant toxins, antibodies and transport proteins.56,164 Antibodies raised against carbohydrate antigens can likewise bind sugars through stacking interactions.25
Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic amino acids, a tryptophan and a tyrosine, in the binding pocket of an antibody Fab fragment (PDB ID: 1MFE).33
Any pyranoside has two distinct faces that can interact with an aromatic residue. Experimental and theoretical studies show that the presence of several axially oriented CH bonds facing the aromatic ring is favored, while interactions with axially oriented OH groups are disfavored.160 In a typical carbohydrate CH/π interaction, the hydrogen atoms of two or three CH groups on the hydrophobic face of a monosaccharide overlap with the π-electron density of an aromatic amino acid (Fig. 3.1b). It has also been shown experimentally that eliminating aromatic residues within these binding sites decreases the affinity of the protein-carbohydrate interaction,165 and that replacing one aromatic residue with another can modulate the properties of the interaction. Affinity was also found to increase with the size of the interacting aromatic ring, whereas adding electron-withdrawing groups, such as fluorine, to the ring decreased affinity.147,158,166,167
Figure 6.3 Representation of CH/π interactions between β-D-Glucopyranose (βDGlcp)
and Phenylalanine.
In the present work, we derived a CH/π interaction energy function from previous experimental knowledge of the nature of the interaction between the two groups and its contribution to the overall interaction energy of the complex.154 We examined whether the resulting CH/π function improves the ranking of theoretical poses for a test set of 60 lectin-carbohydrate systems comprising complexes in which CH/π interactions visibly contributed to binding, complexes with fewer than four CH/π interactions in the binding site, and complexes in which these interactions were absent. The theoretical structures were generated by automated docking with AutoDock Vina 108 and Vina-Carb.48 The CH/π function was applied after docking, and its ability to improve the ranking of the theoretical poses was assessed relative to the known crystal structures of the 60 systems.
Methods
CH/π interaction energy function
The CH/π interaction energy curve between a CH group and an aromatic ring moiety can be described using a Lennard-Jones potential with a well depth of ~0.5 kcal/mol (a minimum of about -0.5 kcal/mol), which corresponds to the reported contribution of an individual CH/π interaction.154 The equation used to model these interactions is shown in Figure 6.4.
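Assuming the standard 12-6 form V(x) = 4ε[(σ/x)^12 - (σ/x)^6], which the annotation "4ε = 1.84; σ = 3.26" in Figure 6.4 suggests but the text does not print, the model can be evaluated as in the sketch below. The units (kcal/mol and Å) and the function name are assumptions. The sketch also confirms that these parameters place the minimum near 3.66 Å with a depth of 0.46 kcal/mol, consistent with the ~0.5 kcal/mol per interaction quoted above.

```python
FOUR_EPS = 1.84  # 4ε, kcal/mol (value from Figure 6.4; units assumed)
SIGMA = 3.26     # σ, Å (value from Figure 6.4; units assumed)


def ch_pi_energy(x: float, four_eps: float = FOUR_EPS, sigma: float = SIGMA) -> float:
    """12-6 Lennard-Jones energy (kcal/mol) at C-to-ring-centroid distance x (Å)."""
    if x <= 0.0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / x) ** 6
    return four_eps * (sr6 * sr6 - sr6)


# The minimum of a 12-6 potential lies at x = 2^(1/6)·σ, where V = -ε.
x_min = 2 ** (1 / 6) * SIGMA
print(round(x_min, 2), round(ch_pi_energy(x_min), 2))  # 3.66 -0.46
```

The energy is zero at x = σ, repulsive at shorter distances and decays towards zero beyond about 6 Å.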
Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to describe the interaction between a CH group and an aromatic moiety.
[Plot: energy [kcal/mol] versus distance x [Å] for the model, where 4ε = 1.84 and σ = 3.26.]
Evaluation of Results
The RMSD of the docked ligand pose relative to the crystal structure (PRMSD) was computed for the carbohydrate ring atoms. We have previously reported that PRMSD values are a convenient quantitative measure of the quality of a theoretical carbohydrate pose.48,168
For rescoring, the cumulative CH/π interaction energy score of each pose was added to the docked energy obtained from ADV or VC, and the new energies were used to re-rank the docked models. The rank of the model with the lowest PRMSD (PRMSDmin) was calculated before and after rescoring.48
Test systems
The test set consisted of 60 lectin-carbohydrate crystal structures extracted from the PDB; details about the systems are provided in the Supplementary Information (SI). For PDB files containing multiple copies of the complex, the monomer with the lowest average B-factor for the carbohydrate ligand was selected. The systems were prepared for docking using AutoDockTools.107 The docking grid box was centered on the binding site of the protein; the (x, y, z) coordinates of the grid box center are provided in S6.3. Docking was performed in ten independent runs, with the requested number of output models set to 20 for each run. In each run, the model with the lowest PRMSD was identified, and the average of these ten lowest PRMSD values was taken as the PRMSDmin for the system, as described by Nivedha et al.168 Following docking, the CH/π interaction energy scoring function was applied to each docked model; the algorithm used for this post-docking step is described in the following section.
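The evaluation steps described above (PRMSD over ring atoms, rescoring by adding the CH/π score, the energy rank of the lowest-PRMSD pose, and averaging of the per-run minima) can be sketched as follows. This is a minimal illustration rather than the code used in this work: it assumes the docked and crystal ring atoms are already matched one-to-one in the same receptor frame (so no superposition is applied), that lower energies rank higher, and that the CH/π score is scaled by a coefficient before being added. All function names and example numbers are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """PRMSD (Å) over matched ring atoms, computed in place without superposition."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must list the same ring atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def rescore(energies: Sequence[float], ch_pi_scores: Sequence[float],
            coeff: float = 0.7) -> List[float]:
    """Add the coefficient-scaled cumulative CH/π score to each docked energy."""
    return [e + coeff * s for e, s in zip(energies, ch_pi_scores)]


def rank_of_min_prmsd(energies: Sequence[float], prmsds: Sequence[float]) -> int:
    """1-based energy rank (lowest energy = rank 1) of the pose with the lowest PRMSD."""
    best = min(range(len(prmsds)), key=lambda i: prmsds[i])
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return order.index(best) + 1


def prmsd_min(per_run_prmsds: Sequence[Sequence[float]]) -> float:
    """Average, over independent runs, of the lowest PRMSD found in each run."""
    return mean(min(run) for run in per_run_prmsds)


# Hypothetical run with three poses (energies in kcal/mol, PRMSD in Å)
energies = [-7.2, -6.9, -6.5]
prmsds = [5.6, 3.1, 0.9]
ch_pi = [0.0, -0.4, -1.8]
print(rank_of_min_prmsd(energies, prmsds))                  # 3
print(rank_of_min_prmsd(rescore(energies, ch_pi), prmsds))  # 1
```

In this toy example the accurate pose (0.9 Å) moves from rank 3 to rank 1 because it makes the most favorable CH/π contacts.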
Automatic Detection of CH/π interactions
The program reads in the protein and carbohydrate structure files (PDB format) and calculates the equation of the plane (ax + by + cz + d = 0) of each pyranose ring from the coordinates of the ring atoms. For the five carbon atoms in the ring, the positions of the attached hydrogen atoms are calculated as described previously 48 (Figure 6.5a). From the H and C atomic coordinates, CH vectors are generated, and the center of the carbohydrate ring plane delineated by atoms O5, C2, C3 and C5 is determined.
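The hydrogen-placement and ring-center steps can be sketched as follows. The sketch assumes that the construction shown in Figure 6.5a for C1 (the H placed opposite the average position of C2, O5 and O1) generalizes to every ring carbon using its heavy-atom neighbours, and it assumes a C-H bond length of 1.09 Å, which the text does not specify. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(points: Sequence[Coord]) -> Coord:
    """Arithmetic mean of a set of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def place_hydrogen(carbon: Coord, neighbors: Sequence[Coord],
                   bond_length: float = 1.09) -> Coord:
    """Place the H on a ring carbon along the negative of the vector from the
    carbon to the average position of its heavy-atom neighbours (Figure 6.5a).
    bond_length (Å) is an assumed value."""
    avg = centroid(neighbors)
    d = tuple(c - a for c, a in zip(carbon, avg))  # = -(avg - carbon)
    length = math.sqrt(sum(x * x for x in d))
    if length == 0.0:
        raise ValueError("carbon coincides with the neighbour average")
    return tuple(c + bond_length * x / length for c, x in zip(carbon, d))


def ring_plane_center(o5: Coord, c2: Coord, c3: Coord, c5: Coord) -> Coord:
    """Center of the pyranose plane delineated by O5, C2, C3 and C5."""
    return centroid([o5, c2, c3, c5])


# Example: for C1 the neighbours would be C2, O5 and O1 (toy coordinates)
h1 = place_hydrogen((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])
```

Placing the H opposite the neighbour average is exact for an ideal tetrahedral carbon and a good approximation for a ring carbon in a chair.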
The program identifies all tyrosine, tryptophan and phenylalanine residues by their residue names in the PDB file and stores the coordinates of all atoms comprising the aromatic rings. The centroid of each aromatic ring is calculated (one per ring, and therefore two for tryptophan), and the distances between each aromatic centroid and the centers of all carbohydrate ring planes are determined (dcenters) (Figure 6.5b). If any dcenters distance is less than 7 Å, the distances between the projections of the pyranose ring carbon atoms onto the aromatic ring plane and the centroid of that aromatic ring are calculated (dcp) (Figure 6.5c). For each dcp distance less than 2.5 Å, if the corresponding CxHx bond points towards the aromatic ring, a CH/π interaction energy score is calculated for that interaction, using the distance between the carbon atom and the centroid of the aromatic ring (dcπ) as input. Summing all CH/π interaction energy scores over the entire carbohydrate gives the total CH/π interaction energy score for that pose. The performance of several CH/π interaction energy score coefficients was examined, namely 0.3, 0.5, 0.7, 1.0, 1.5 and 2.0.
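The detection and scoring rules above can be combined into a single sketch. The 7 Å and 2.5 Å thresholds and the coefficient come from the text; the following are assumptions: the aromatic ring normal is obtained with Newell's method from the ordered ring atoms, a C-H bond is taken to point towards the ring when the C→H vector has a positive dot product with the C→centroid vector, and the per-contact energy is the 12-6 Lennard-Jones model with the Figure 6.4 parameters. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

FOUR_EPS, SIGMA = 1.84, 3.26  # Figure 6.4 parameters (units assumed: kcal/mol, Å)


def lj_energy(x: float) -> float:
    """12-6 Lennard-Jones CH/π energy at C-to-centroid distance x (assumed form)."""
    sr6 = (SIGMA / x) ** 6
    return FOUR_EPS * (sr6 * sr6 - sr6)


def _sub(a: Coord, b: Coord) -> Coord:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Coord, b: Coord) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _centroid(points: Sequence[Coord]) -> Coord:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def _unit_normal(ring: Sequence[Coord]) -> Coord:
    """Normal of an (approximately) planar ring via Newell's method; atoms in ring order."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def ch_pi_score(carbons, hydrogens, sugar_center, aromatic_rings,
                coeff=0.7, d_centers_max=7.0, d_cp_max=2.5) -> float:
    """Coefficient-scaled CH/π score for one pyranose ring of a pose."""
    total = 0.0
    for ring in aromatic_rings:
        centroid = _centroid(ring)
        if math.dist(centroid, sugar_center) >= d_centers_max:  # d_centers test
            continue
        normal = _unit_normal(ring)
        for carbon, hydrogen in zip(carbons, hydrogens):
            # Project the ring carbon onto the aromatic plane.
            height = _dot(_sub(carbon, centroid), normal)
            projection = tuple(c - height * n for c, n in zip(carbon, normal))
            if math.dist(projection, centroid) >= d_cp_max:  # d_cp test
                continue
            # Assumed orientation test: the C->H vector must point towards the centroid.
            if _dot(_sub(hydrogen, carbon), _sub(centroid, carbon)) <= 0.0:
                continue
            total += coeff * lj_energy(math.dist(carbon, centroid))  # d_Cπ
    return total
```

For a whole oligosaccharide, the function would be called once per pyranose ring and the results summed; tryptophan contributes its five- and six-membered rings as two separate entries in aromatic_rings.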
Figure 6.5 Detection of CH/π interactions. a.) The average position of atoms C2, O5 and O1 is determined; the vector C1H1 is taken as the negative of the vector from C1 to this average position. b.) The distance between the centroid of the aromatic ring and the center of the carbohydrate ring plane delineated by atoms O5, C2, C3 and C5 is determined, dcenters (≤ 7 Å). c.) The carbon atoms of the carbohydrate ring are projected onto the aromatic ring plane, and the distance between each projection and the centroid of the aromatic ring is determined, dcp (≤ 2.5 Å). CH bond vectors pointing towards the aromatic ring (scored) are shown in green, and those pointing away from it (not scored) in red.
Docking protocol
The protein and ligand files were prepared using AutoDockTools (version 1.5.4).107 All C-O bonds in the carbohydrate ligands were allowed to rotate. Following the protocol used in our earlier work,168 docking was performed 10 times using AutoDock Vina and Vina-Carb. The random seeds for the 10 docking runs were explicitly defined in order to increase comparability between results.
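Fixing the seeds makes each run repeatable. A minimal sketch that only builds the command lines (it does not run anything) is shown below. It uses AutoDock Vina's --seed and --num_modes options, with 20 output models as described above; the executable name, file names, grid box size and seed values are hypothetical placeholders, and a Vina-Carb build may require additional options not shown here.

```python
from typing import List, Sequence


def vina_commands(receptor: str, ligand: str,
                  center: Sequence[float], size: Sequence[float],
                  seeds: Sequence[int], num_modes: int = 20,
                  executable: str = "vina") -> List[List[str]]:
    """Return one argument list per docking run, each with an explicit seed."""
    commands = []
    for run, seed in enumerate(seeds, start=1):
        args = [executable, "--receptor", receptor, "--ligand", ligand]
        for axis, c, s in zip("xyz", center, size):
            args += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
        args += ["--num_modes", str(num_modes), "--seed", str(seed),
                 "--out", f"run{run:02d}.pdbqt"]
        commands.append(args)
    return commands


# Hypothetical inputs: ten fixed seeds, one per independent run
cmds = vina_commands("receptor.pdbqt", "ligand.pdbqt",
                     center=(10.0, 12.5, -3.0), size=(20, 20, 20),
                     seeds=range(1, 11))
print(" ".join(cmds[0]))
```

Each argument list can be passed directly to subprocess.run, and reusing the same seed list for AutoDock Vina and Vina-Carb keeps the two programs' runs paired.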
Results and Discussion
The systems in the test set were divided according to the number of detected CH/π interactions, n (Table 6.1). Following Boisbouvier's work,169 the distances dCπ between all ring carbon atoms of each carbohydrate ligand and the centroids of all aromatic rings in the interacting protein were first calculated, and a CH/π interaction was considered present if dCπ ≤ 4.3 Å. Both programs had a greater success rate in making accurate binding mode predictions for complexes with a greater number of intermolecular CH/π interactions in their binding pockets. This result may indicate the crucial role that these interactions play in determining the binding specificity of carbohydrate ligands for their receptors.
Among the systems for which the programs accurately predicted the ligand binding modes, the rank of the accurate PRMSDmin pose improved after the addition of the CH/π interaction energy score, especially in cases with a greater number of CH/π interactions (n ≥ 2). For systems with n ≤ 1, adding the CH/π interaction energy term worsened the rank of the PRMSDmin pose (Table 6.1).
Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2 before and after rescoring, as a function of the CH/π interaction energy coefficient. The systems are grouped by the number of detected CH/π interactions; the number of systems in each group is given in parentheses.
VC1|2
No. of CH/π      PRMSDmin  Rank    Rank after addition of CH/π energies (CH/π coefficient)
interactions     [Å]       before  0.3     0.5     0.7     1.0     1.5     2.0
0 (8)            0.96      3.88    4.24    4.45    4.90    5.30    5.75    6.09
1 (3)            0.50      1.33    1.37    1.60    1.77    2.83    3.80    4.63
2 (2)            0.98      2.95    2.45    2.35    2.35    2.10    2.05    2.15
3 (4)            1.13      2.98    2.80    2.68    2.60    2.53    2.58    2.60
4 (8)            0.95      2.32    2.02    2.00    1.91    1.91    1.92    2.01
5 (7)            1.30      2.83    2.14    1.84    1.69    1.60    1.59    1.51
6 (1)            0.87      2.00    2.00    2.00    1.00    1.00    1.00    1.00
7 (1)            1.13      1.00    1.00    1.00    1.00    1.00    1.00    1.00
9 (2)            0.54      5.55    1.00    1.00    1.05    1.15    1.25    1.40
10 (2)           0.72      7.85    1.40    1.20    1.00    1.20    1.00    1.00
12 (1)           1.00      7.10    3.20    2.20    1.90    2.00    2.10    2.40
16 (1)           0.42      1.00    1.00    1.00    1.00    1.00    1.00    1.00
Total (40)       0.87      3.40    2.05    1.94    1.85    1.97    2.09    2.23
ADV (CH/π coefficient = 0.7)
No. of CH/π      PRMSDmin  Rank    Rank
interactions     [Å]       before  after
0 (5)            1.09      4.04    4.58
1 (3)            0.78      1.00    2.00
2 (1)            0.62      1.00    1.00
3 (4)            1.12      2.25    2.10
4 (8)            0.86      2.54    1.88
5 (7)            1.31      4.23    2.90
6 (1)            0.79      2.00    1.00
7 (1)            1.26      3.30    6.30
9 (2)            0.63      4.75    3.80
10 (2)           0.95      5.00    1.00
12 (0)           -         -       -
16 (1)           0.42      1.00    1.00
Total (35)       0.89      2.83    2.51
Among the CH/π interaction energy coefficients tested, values ≥ 0.7 produced the largest improvements in pose ranking, but for systems with fewer CH/π interactions, higher coefficients caused the rank of the PRMSDmin pose to worsen. Therefore, based on the data obtained, a coefficient of 0.7, which gave the lowest average rank over all VC1|2 systems (1.85), was chosen as the optimal value for rescoring docked carbohydrate poses with the CH/π interaction energy function.
For systems 1VEO and 1ITC, applying the CH/π interaction energy scores improved the ranking of the accurate PRMSDmin poses by 7.6 and 10 places, respectively. For example, for VC1|2, the PRMSD of the top-ranked pose was 5.6 Å before rescoring and 0.9 Å after rescoring (Figure 6.6).
Figure 6.6 The effect of applying the CH/π interaction energy term on the top-ranked pose produced by VC1|2, before and after rescoring. Shown in green is the crystal ligand, in white the top-ranked pose before rescoring (PRMSD = 5.6 Å), and in blue the top-ranked pose after rescoring (PRMSD = 0.9 Å).
Conclusions
The incorporation of the CH/π interaction energy term improved the rankings of accurate PRMSDmin pose predictions produced by both ADV and VC1|2. A CH/π interaction energy coefficient of 0.7 produced optimal results for the test set considered. In at least 40% of the total test systems, both docking programs were unable to produce accurate binding mode predictions. Including the CH/π interaction energy function within the VC scoring function itself, rather than only for post-docking rescoring, can be expected to improve binding mode prediction further, because any such interaction between a docked pose and the protein receptor would then be scored favorably for every pose generated by the algorithm. This would in turn decrease the probability that such poses are rejected during the selection stage of the algorithm, before the final results are assembled. An appropriate CH/π interaction coefficient would also need to be included and its optimal value determined.
The algorithm for the detection of CH/π interactions can be further improved, for
instance, by considering the angle of the CH vectors with respect to the normal to the
aromatic ring plane. The test set can also be expanded to increase diversity, both with
respect to receptor and ligand types, and also with respect to systems with or without
intermolecular CH/π interactions. The consideration of pivotal CH/π interactions in
protein-carbohydrate complexes, and accounting for the energies that these non-covalent
interactions contribute to protein-carbohydrate binding can improve our binding mode
predictions, and help us better understand the factors influencing biological recognition.
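As a sketch of the suggested refinement, the angle between a C-H vector and the aromatic ring normal can be computed as below. The function is hypothetical; it uses the absolute value of the cosine so that the sign of the normal does not matter, which means the towards/away direction test must still be applied separately, and any angular cutoff would have to be chosen by benchmarking.

```python
import math
from typing import Sequence


def ch_normal_angle(carbon: Sequence[float], hydrogen: Sequence[float],
                    ring_normal: Sequence[float]) -> float:
    """Angle in degrees (0 to 90) between the C-H bond and the aromatic ring normal.

    0 means the bond is parallel to the normal (perpendicular to the ring plane);
    90 means the bond lies parallel to the ring plane.
    """
    ch = [h - c for h, c in zip(hydrogen, carbon)]
    norm_ch = math.sqrt(sum(v * v for v in ch))
    norm_n = math.sqrt(sum(v * v for v in ring_normal))
    if norm_ch == 0.0 or norm_n == 0.0:
        raise ValueError("zero-length vector")
    cos_angle = abs(sum(a * b for a, b in zip(ch, ring_normal))) / (norm_ch * norm_n)
    return math.degrees(math.acos(min(1.0, cos_angle)))


print(round(ch_normal_angle((0, 0, 3.7), (0, 0, 2.6), (0, 0, 1)), 1))  # 0.0
print(round(ch_normal_angle((0, 0, 3.7), (1, 0, 2.7), (0, 0, 1)), 1))  # 45.0
```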
Future Directions
The CH/π interaction energy function presented in this study is a first-order approximation of the energy curve describing the interaction between an aliphatic CH group and an aromatic ring. The model can be further improved using data from the available literature on these interactions. In a 2006 study, Ringer et al. 155 performed QM calculations to estimate the contribution of CH/π interactions to the total interaction energy in model systems using symmetry-adapted perturbation theory (SAPT) 170 analysis. The computations were performed on model systems consisting of methane, as a model for aliphatic side chains, and benzene, phenol or indole as the aromatic components of phenylalanine, tyrosine and tryptophan. The model systems used are shown in Figure 6.7.
Figure 6.7 Model Systems used by Ringer et al. to quantify CH/π interactions using
quantum mechanical calculations
They obtained potential energy curves by varying the distance between the methane molecule and the aromatic moiety in each model complex. We observed that the reported energies were remarkably similar in terms of maximum interaction energy and shape of the interaction potential, and we therefore developed a generic CH/π function by averaging the QM data and fitting a Lennard-Jones potential to the averaged values (Figure 6.8). This new energy function can be used to score CH/π interactions.
$$V(x) = \varepsilon\left[\left(\frac{\sigma}{x}\right)^{12} - \left(\frac{\sigma}{x}\right)^{6}\right] \qquad [6.1]$$
where x is the distance between the carbon atom and the aromatic ring centroid.
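The fitting procedure is not spelled out here, so the sketch below shows one simple option. Because Equation 6.1 is linear in A = εσ^12 and B = εσ^6 (V = A/x^12 - B/x^6), the averaged curve can be fitted by ordinary linear least squares, after which σ = (A/B)^(1/6) and ε = B²/A. The sketch assumes the individual curves are sampled on a common distance grid and uses synthetic data generated from known parameters, not the values of Ringer et al.; the function names are hypothetical. With the prefactor written as in Equation 6.1, the well depth is ε/4 at x = 2^(1/6)σ.

```python
from typing import List, Sequence, Tuple


def average_curves(curves: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of energy curves sampled on the same distance grid."""
    return [sum(values) / len(values) for values in zip(*curves)]


def fit_lj(xs: Sequence[float], energies: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of V(x) = eps*[(sigma/x)**12 - (sigma/x)**6]; returns (eps, sigma)."""
    u = [x ** -12 for x in xs]
    w = [x ** -6 for x in xs]
    # Normal equations for V = a*u + c*w, with a = eps*sigma**12 and c = -eps*sigma**6
    suu = sum(p * p for p in u)
    sww = sum(q * q for q in w)
    suw = sum(p * q for p, q in zip(u, w))
    suv = sum(p * v for p, v in zip(u, energies))
    swv = sum(q * v for q, v in zip(w, energies))
    det = suu * sww - suw * suw
    if det == 0.0:
        raise ValueError("need at least two distinct distances")
    a = (suv * sww - suw * swv) / det
    b = -(suu * swv - suw * suv) / det
    if a <= 0.0 or b <= 0.0:
        raise ValueError("data do not describe a bound 12-6 well")
    return b * b / a, (a / b) ** (1 / 6)


# Synthetic check: two identical curves from known parameters (eps = 1.0, sigma = 3.5)
grid = [3.0 + 0.25 * i for i in range(17)]
curve = [1.0 * ((3.5 / x) ** 12 - (3.5 / x) ** 6) for x in grid]
eps, sigma = fit_lj(grid, average_curves([curve, curve]))
print(round(eps, 6), round(sigma, 6))  # 1.0 3.5
```

A linear fit avoids choosing starting values for a nonlinear optimizer; with noisy QM data, restricting the grid to the attractive region would weight the well more heavily than the steep repulsive wall.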
Figure 6.8 a.) The individual interaction energy curves for the model systems (described in Figure 6.7) used by Ringer et al.,155 alongside the average of the individual curves. b.) The average curve from (a) shown alongside the mathematical model used in the current study.
[Plots: energy [kcal/mol] versus distance [Å]. Panel a.) curves labeled Model 6.4a, 6.4b, 6.4c and 6.4d, together with the average curve; panel b.) the average curve and the model.]
7. CONCLUSIONS
In Chapter 4, the performances of three docking programs, namely AutoDock 3.0.5, AutoDock 4.2 and AutoDock Vina, were compared, and AutoDock Vina had the most success in accurately predicting the binding modes of the carbohydrate ligands. A set of six antibody-carbohydrate systems was used in this study. An algorithm for aligning the antibody structures to the coordinate axes prior to docking, based on the complementarity determining regions, was developed in order to increase the comparability and reproducibility of the results; it is also useful in an automated docking pipeline to be implemented in GlycamWeb (www.glycam.com). A set of disaccharide models was used to develop the Carbohydrate Intrinsic (CHI) energy functions, which score oligosaccharide structures based on the conformations of their glycosidic linkages. Application of the CHI energy functions improved the rankings of the accurate pose predictions. A survey of the PDB for carbohydrate crystal structures, consisting of carbohydrates linked either covalently or non-covalently to various receptors including lectins, antibodies, enzymes and carbohydrate binding modules, revealed that the glycosidic torsion preferences of these structures were similar despite their being bound to different kinds of receptors. A majority of the glycosidic torsion angles fall into the same energy well for each CHI energy curve. These energy functions can therefore also aid in the refinement of experimental oligosaccharide structures.
The research presented in Chapter 5 described the incorporation of the CHI energy functions within AutoDock Vina's scoring function, leading to the development of Vina-Carb. The performance of Vina-Carb and the original AutoDock Vina was evaluated and compared on a set of protein-carbohydrate systems consisting of lectins, antibodies, carbohydrate binding modules and enzymes. Vina-Carb significantly improved the conformations of the docked oligosaccharide poses. Integrating the CHI energy functions within the program penalized unfavorable glycosidic torsion angles, increasing the proportion of output poses with energetically favorable glycosidic linkages. The improvements in the conformation of the carbohydrate ligand in turn improved the chances of VC making accurate binding mode predictions. The source code of Vina-Carb ver. 1.0 is available for download at: http://glycam.org/publication-materials/vina-carb. The suite of CHI energy functions could be further expanded to include 2,x linkages and other standard sugar conformations as needed.
In Chapter 6, the role of CH/π interactions in binding specificity and affinity in protein-carbohydrate complexes was outlined. Previously available quantum mechanical data describing the interaction between models of CH groups and aromatic amino acids were used to obtain mathematical models describing the CH/π interaction energy in such complexes. This CH/π interaction energy function, when applied to lectin-carbohydrate docked complexes with significant CH/π contacts in the binding pocket, improved the rankings of accurate binding mode predictions. This function can be incorporated within Vina-Carb's scoring function so that CH/π interactions are favored during docking, which could further improve oligosaccharide binding mode predictions.
8. REFERENCES
(1) Drickamer, K.; Taylor, M. E. Biology of Animal Lectins. Annu. Rev. Cell
Biol. 1993, 9, 237-264.
(2) Varki, A. Biological Roles of Oligosaccharides: All of the Theories are
Correct. Glycobiology 1993, 3, 97-130.
(3) Haltiwanger, R. S.; Lowe, J. B. Role of glycosylation in development.
Annu Rev Biochem 2004, 73, 491-537.
(4) Cobb, B. A.; Kasper, D. L. Coming of age: carbohydrates and immunity.
European Journal of Immunology 2005, 35, 352-356.
(5) Beuvery, E. C.; Vanrossum, F.; Nagel, J. Comparison of the Induction of Immunoglobulin-M and Immunoglobulin-G Antibodies in Mice with Purified Pneumococcal Type-3 and Meningococcal Group-C Polysaccharides and Their Protein Conjugates. Infection and Immunity 1982, 37, 15-22.
(6) Brown, G. D.; Gordon, S. Immune recognition: A new receptor for β-glucans. Nature 2001, 413, 36-37.
(7) Rademacher, T. W.; Parekh, R. B.; Dwek, R. A. Glycobiology. Ann. Rev.
Biochem. 1988, 57, 785-838.
(8) Feizi, T. Carbohydrate differentiation antigens: probable ligands for cell
adhesion molecules. Trends in Biochemical Sciences 1991, 16, 84-86.
(9) Varki, A.; Cummings, R.; Esko, J.; Freeze, H.; Hart, G.; Marth, J. Essentials of Glycobiology; Cold Spring Harbor Laboratory Press: New York, 1999.
(10) Roth, Z.; Yehezkel, G.; Khalaila, I. Identification and quantification of
protein glycosylation. International Journal of Carbohydrate Chemistry 2012, 2012.
(11) Chou, C.-F.; Smith, A. J.; Omary, M. Characterization and dynamics of O-
linked glycosylation of human cytokeratin 8 and 18. Journal of Biological Chemistry
1992, 267, 3901-3906.
(12) Jackson, S. P.; Tjian, R. O-Glycosylation of Eukaryotic Transcription Factors: Implications for Mechanisms of Transcriptional Regulation. Cell 1988, 55, 125-133.
(13) Gerken, T. A.; Butenhof, K. J.; Shogren, R. Effects of Glycosylation on
the Conformation and Dynamics of O-Linked Glycoproteins: Carbon-13 NMR Studies of
Ovine Submaxillary Mucin. Biochem. 1989, 28, 5536-5543.
(14) Wittwer, A. J.; Howard, S. C.; Carr, L. S.; Harakas, N. K.; Feder, J.;
Parekh, R. B.; Rudd, P. M.; Dwek, R. A.; Rademacher, T. W. Effects of N-Glycosylation
on in Vitro Activity of Bowes Melanoma and Human Colon Fibroblast Derived Tissue
Plasminogen Activator. Biochem. 1989, 28, 7662-7669.
(15) Saso, L.; Silvestrini, B.; Guglielmotti, A.; Lahita, R.; Cheng, C. Y. Abnormal Glycosylation of Alpha(2)-Macroglobulin, a Non-Acute-Phase Protein, in Patients with Autoimmune Diseases. Inflammation 1993, 17, 465-479.
(16) Rook, G. A. W.; Steele, J.; Brealey, R.; Whyte, A.; Isenberg, D.; Sumar,
N.; Nelson, L.; Bodman, K. B.; Young, A.; Roitt, I. M.; Hutchison, F.; Williams, P.;
Scragg, I.; Edge, C. J.; Arkwright, P.; Ashford, D.; Wormald, M.; Rudd, P.; Redman, C.;
Dwek, R. A.; Rademacher, T. W. Changes in IgG Glycoform Levels may be Relevant to
Remission of Arthritis During Pregnancy.
(17) Rademacher, T. W.; Parekh, R. B.; Dwek, R. A.; Isenberg, D.; Rook, G.;
Axford, J. S.; Roitt, I. The Role of IgG Glycoforms in the Pathogenesis of Rheumatoid
Arthritis. Springer Semin. Immunopathol. 1988, 10, 231-249.
(18) Renaudineau, Y.; Saraux, A.; Dueymes, M.; Le Goff, P.; Youinou, P.
Importance of IgG Glycosylation in Rheumatoid Arthritis. Rev. Rhum. 1998, 65, 429-
433.
(19) Watson, M.; Rudd, P.; Bland, M.; Dwek, R.; Axford, J. S. Sugar Printing
Rheumatic Diseases: A Potential Method for Disease Differentiation Using
Immunoglobulin G Oligosaccharides. Arthritis Rheum. 1999, 42, 1682-1690.
(20) Brockhausen, I.: Glycodynamics of mucin biosynthesis in gastrointestinal
tumor cells. In Glycobiology and Medicine; Axford, J. S., Ed.; Advances in Experimental
Medicine and Biology, 2003; Vol. 535; pp 163-188.
(21) Porowska, H.; Paszkiewicz-Gadek, A.; Anchim, T.; Wolczynski, S.;
Gindzienski, A. Inhibition of the O-glycan elongation limits MUC1 incorporation to cell
membrane of human endometrial carcinoma cells. International Journal of Molecular
Medicine 2004, 13, 459-464.
(22) Hakomori, S. I. Aberrant Glycosylation in Tumors and Tumor-Associated
Carbohydrate Antigens. Advances in Cancer Research 1989, 52, 257-331.
(23) Dennis, J. W.; Granovsky, M.; Warren, C. E. Glycoprotein glycosylation
and cancer progression. Biochimica et Biophysica Acta 1999, 1473, 21-34.
(24) Paulson, J. C.; Blixt, O.; Collins, B. E. Sweet Spots in Functional
Glycomics. Nat Chem Biol 2006, 2, 238-248.
(25) Murase, T.; Zheng, R. B.; Joe, M.; Bai, Y.; Marcus, S. L.; Lowary, T. L.;
Ng, K. K. S. Structural Insights into Antibody Recognition of Mycobacterial
Polysaccharides. Journal of Molecular Biology 2009, 392, 381-392.
(26) Kotra, L. P.; Golemi, D.; Amro, N. A. Dynamics of the
Lipopolysaccharide Assembly on the Surface of Escherichia coli. J. Am. Chem. Soc.
1999, 121, 8707-8711.
(27) Park, B. S.; Song, D. H.; Kim, H. M.; Choi, B.-S.; Lee, H.; Lee, J.-O. The
Structural Basis of Lipopolysaccharide Recognition by the TLR4–MD-2 Complex.
Nature 2009, 458, 1191-1195.
(28) Kelly, D. F.; Moxon, E. R.; Pollard, A. J. Haemophilus influenzae type b
conjugate vaccines. Immunology 2004, 113, 163-174.
(29) Darkes, M. J. M.; Plosker, G. L. Pneumococcal conjugate vaccine
(Prevnar; PNCRM7): a review of its use in the prevention of Streptococcus pneumoniae
infection. Paediatric drugs 2002, 4, 609-630.
(30) Vyas, N. K.; Vyas, M. N.; Chervenak, M. C.; Johnson, M. A.; Pinto, B.
M.; Bundle, D. R.; Quiocho, F. A. Molecular Recognition of Oligosaccharide Epitopes
by a Monoclonal Fab Specific for Shigella Flexneri Y Lipopolysaccharide: X-ray
Structures and Thermodynamics. Biochemistry 2002, 41, 13575-13586.
(31) Zdanov, A.; Li, Y.; Bundle, D. R.; Deng, S.-J.; MacKenzie, C. R.; Narang,
S. A.; Young, N. M.; Cygler, M. Structure of a Single-Chain Antibody Variable Domain
(Fv) Fragment Complexed with a Carbohydrate Antigen at 1.7-Å Resolution. Proc. Natl.
Acad. Sci. USA 1994, 91, 6423-6427.
(32) Bundle, D. R.; Baumann, H.; Brisson, J.-R.; Gagné, S. M.; Zdanov, A.;
Cygler, M. Solution Structure of a Trisaccharide-Antibody Complex: Comparison of
NMR Measurements with a Crystal Structure. Biochemistry 1994, 33, 5183-5192.
(33) Cygler, M.; Rose, D. R.; Bundle, D. R. Recognition of a Cell-Surface
Oligosaccharide of Pathogenic Salmonella by an Antibody Fab Fragment. Science 1991,
253, 442-445.
(34) Cygler, M.; Wu, S.; Zdanov, A.; Bundle, D. R.; Rose, D. R. Recognition
of a carbohydrate antigenic determinant of Salmonella by an antibody. Biochem Soc
Trans 1993, 21, 437-441.
(35) Vulliez-Le Normand, B.; Saul, F. A.; Phalipon, A.; Bélot, F.; Guerreiro,
C.; Mulard, L. A.; Bentley, G. A. Structures of synthetic O-antigen fragments from
serotype 2a Shigella flexneri in complex with a protective monoclonal antibody.
Proceedings of the National Academy of Sciences of the United States of America 2008,
105, 9976-9981.
(36) Roseman, S. Reflections on glycobiology. Journal of Biological
Chemistry 2001, 276, 41527-41542.
(37) Dwek, R. A. Glycobiology: Toward Understanding the Function of
Sugars. Chem Rev 1996, 96, 683-720.
(38) Fischer, E. Ueber die Configuration des Traubenzuckers und seiner
Isomeren. II. Berichte der deutschen chemischen Gesellschaft 1891, 24, 2683-2687.
(39) Juaristi, E.; Cuevas, G.: The anomeric effect; CRC press, 1994.
(40) Juaristi, E.; Cuevas, G. Recent Studies of the Anomeric Effect.
Tetrahedron 1992, 48, 5019-5087.
(41) Tvaroska, I.; Carver, J. P. The Anomeric, Reverse Anomeric and Exo-
Anomeric Effects in C-, N-, and S- Glycosyl Compounds. Manuscript.
(42) Anomeric Effect. Origin and Consequences; Szarek, W. A.; Horton, D.,
Eds.; American Chemical Society: Washington, D.C., 1979; Vol. 87, pp 132.
(43) Tvaroska, I.; Kozar, T. The Conformational Properties of the Glycosidic
Linkage. Carbohydr. Res. 1981, 90, 173-185.
(44) Kirby, A. J.: The Anomeric Effect and Related Stereoelectronic Effects at
Oxygen; Springer-Verlag: New York, 1983.
(45) Fuchs, B.; Schleifer, L.; Tartakovsky, E. Probing the Anomeric Effect:
The Structural Criterion. Nouveau Journal de Chimie 1984, 8, 275-278.
(46) Tvaroska, I.; Bleha, T.: Anomeric and Exo-Anomeric Effects in
Carbohydrate Chemistry. In Adv. Carbohydr. Chem. Biochem.; Tipson, R. S., Horton, D.,
Eds.; Academic Press: New York, 1989; Vol. 47; pp 45-123.
(47) Agirre, J.; Davies, G.; Wilson, K.; Cowtan, K. Carbohydrate anomalies in
the PDB. Nature chemical biology 2015, 11, 303-303.
(48) Nivedha, A. K.; Makeneni, S.; Foley, B. L.; Tessier, M. B.; Woods, R. J.
Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat
from the chaff. J Comput Chem 2013.
(49) Bourne, Y.; van Tilbeurgh, H.; Cambillau, C. Protein-Carbohydrate
Interactions. Curr. Opin. Struct. Biol. 1993, 3, 681-686.
(50) Vyas, N. K. Atomic Features of Protein-Carbohydrate Interactions. Curr.
Opin. Struct. Biol. 1991, 1, 732-740.
(51) Quiocho, F. A. Carbohydrate-Binding Proteins: Tertiary Structures and
Protein-Sugar Interactions. Ann. Rev. Biochem. 1986, 55, 287-315.
(52) Munske, G. R.; Krakauer, H.; Magnuson, J. A. Calorimetric study of
carbohydrate binding to concanavalin A. Archives of biochemistry and biophysics 1984,
233, 582-587.
(53) Bundle, D. R.; Young, N. M. Carbohydrate-protein Interactions in
Antibodies and Lectins. Curr. Opin. Struct. Biol. 1992, 2, 666-673.
(54) Quiocho, F. A.; Vyas, N. K. Novel Stereospecificity of the L-Arabinose-
Binding Protein. Nature 1984, 310, 381-386.
(55) Kozmon, S.; Matuska, R.; Spiwok, V.; Koča, J. Dispersion Interactions
of Carbohydrates with Condensate Aromatic Moieties: Theoretical Study on the CH–π
Interaction Additive Properties. Phys. Chem. Chem. Phys. 2011, 13, 14215–14222.
(56) Elgavish, S.; Shaanan, B. Lectin-Carbohydrate Interactions: Different
Folds, Common Recognition Principles. Trends Biochem. Sci. 1997, 22, 462-467.
(57) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Sugar and signal-transducer
binding sites of the Escherichia coli galactose chemoreceptor protein. Science 1988, 242,
1290-1295.
(58) Quiocho, F. A. Protein-carbohydrate interactions: basic molecular
features. Pure and Applied Chemistry 1989, 61, 1293-1306.
(59) DeMarco, M. L.; Woods, R. J. Structural Glycobiology: A Game of
Snakes and Ladders. Glycobiology 2008, 18, 426-440.
(60) Woods, R. J.; Tessier, M. B. Computational Glycoscience: Characterizing
the Spatial and Temporal Properties of Glycans and Glycan–Protein Complexes. Curr.
Opin. Struct. Biol. 2010, 20, 575-583.
(61) Ghazarian, H.; Idoni, B.; Oppenheimer, S. B. A Glycobiology Review:
Carbohydrates, Lectins and Implications in Cancer Therapeutics. Acta Histochem. 2011,
113, 236-247.
(62) Hakomori, S. Tumor-associated carbohydrate antigens. Annu Rev Immunol
1984, 2, 103-126.
(63) Fukuda, M. Possible roles of tumor-associated carbohydrate antigens.
Cancer Research 1996, 56, 2237-2244.
(64) Eisen, M. B.; Sabesan, S.; Skehel, J. J.; Wiley, D. C. Binding of the
Influenza A Virus to Cell-Surface Receptors: Structures of Five Hemagglutinin–
Sialyloligosaccharide Complexes Determined by X-Ray Crystallography. Virology 1997,
232, 19-31.
(65) Suzuki, Y.; Nagao, Y.; Kato, H.; Matsumoto, M.; Nerome, K.; Nakajima,
K.; Nobusawa, E. Human influenza A virus hemagglutinin distinguishes
sialyloligosaccharides in membrane-associated gangliosides as its receptor which
mediates the adsorption and fusion processes of virus infection. Specificity for
oligosaccharides and sialic acids and the sequence to which sialic acid is attached.
Journal of Biological Chemistry 1986, 261, 17057-17061.
(66) Wiley, D. C.; Skehel, J. J. The structure and function of the hemagglutinin
membrane glycoprotein of influenza virus. Annual review of biochemistry 1987, 56, 365-
394.
(67) Magnani, J. L.; Ernst, B. From Carbohydrate Leads to Glycomimetic
Drugs. Nature Reviews Drug Discovery 2009, 8, 661-677.
(68) Dreitlein, W. B.; Maratos, J.; Brocavich, J. Zanamivir and oseltamivir:
Two new options for the treatment and prevention of influenza. Clinical Therapeutics
2001, 23, 327-355.
(69) Moscona, A. Neuraminidase Inhibitors for Influenza. N Engl J Med 2005,
353, 1363-1373.
(70) Kevin, H. M.: Galectins and Disease Implication for Targeted
Therapeutics. In American Chemical Society, 2012; pp 61-77.
(71) Tessier, M. B.; Grant, O. C.; Heimburg-Molinaro, J.; Smith, D.; Jadey, S.;
Gulick, A. M.; Glushka, J.; Deutscher, S. L.; Rittenhouse-Olson, K.; Woods, R. J.
Computational Screening of the Human TF-Glycome Provides a Structural Definition for
the Specificity of Anti-Tumor Antibody JAA-F11. PLoS One 2013, 8, e54874.
(72) Woods, R.; Yongye, A.: Computational Techniques Applied to Defining
Carbohydrate Antigenicity. In Anticarbohydrate Antibodies; Kosma, P., Müller-Loennies,
S., Eds.; Springer Vienna, 2012; pp 361-383.
(73) Kadirvelraj, R.; González-Outeiriño, J.; Foley, B. L.; Beckham, M. L.;
Jennings, H. J.; Foote, S.; Ford, M. G.; Woods, R. J. Understanding the Bacterial
Polysaccharide Antigenicity of Streptococcus agalactiae versus Streptococcus
pneumoniae. PNAS 2006, 103, 8149-8154.
(74) Yongye, A. B.; González-Outeiriño, J.; Glushka, J.; Schultheis, V.; Woods,
R. J. The Conformational Properties of Methyl α-(2,8)-di/trisialosides and Their N-acyl
Analogs: Implications for Anti-Neisseria meningitidis B Vaccine Design. Biochemistry
2008, 47, 12493–12514.
(75) Calarese, D. A.; Scanlan, C. N.; Zwick, M. B.; Deechongkit, S.; Mimura,
Y.; Kunert, R.; Zhu, P.; Wormald, M. R.; Stanfield, R. L.; Roux, K. H.; Kelly, J. W.;
Rudd, P. M.; Dwek, R. A.; Katinger, H.; Burton, D. R.; Wilson, I. A. Antibody Domain
Exchange Is an Immunological Solution to Carbohydrate Cluster Recognition. Science
2003, 300, 2065-2071.
(76) Dyekjær, J. D.; Woods, R. J.: Predicting the Three-Dimensional Structures
of Anti-Carbohydrate Antibodies: Combining Comparative Modeling and MD
Simulations. In NMR Spectroscopy and Computer Modeling of Carbohydrates. Recent
Advances; Vliegenthart, J. F. G., Woods, R. J., Eds.; ACS Symposium Series 930;
American Chemical Society: Washington, 2006; Vol. 930; pp 203-219.
(77) Gildersleeve, J.; Roach, T. A.; Li, Z.; Gildersleeve, J. C. Supplier-
Dependent Antiglycan Monoclonal Antibody Specificities: Comment On "High-
Throughput Carbohydrate Microarray Profiling of 27 Antibodies Demonstrates
Widespread Specificity Problems". Glycobiology 2008, 18, 746-756.
(78) Pincus, S. H.; Moran, E.; Maresh, G.; Jennings, H. J.; Pritchard, D. G.;
Egan, M. L.; Blixt, O. Fine specificity and cross-reactions of monoclonal antibodies to
group B streptococcal capsular polysaccharide type III. Vaccine 2012, 30, 4849-4858.
(79) Cooke, R. M.; Hale, R. S.; Lister, S. G.; Shah, G.; Weir, M. P. The
Conformation of the Sialyl Lewis X Ligand Changes upon Binding to E-Selectin.
Biochemistry 1994, 33, 10591-10596.
(80) Mahmoudian, M. The cannabinoid receptor: computer-aided molecular
modeling and docking of ligand. Journal of Molecular Graphics and Modelling 1997, 15,
149-153.
(81) Laederach, A.; Dowd, M. K.; Coutinho, P. M.; Reilly, P. J. Automated
Docking of Maltose, 2-Deoxymaltose, and Maltotetraose into the Soybean β-Amylase
Active Site. Proteins: Structure, Function and Genetics 1999, 37, 166-175.
(82) Goodsell, D. S.; Morris, G. M.; Olson, A. J. Automated docking of
flexible ligands: Applications of AutoDock. Journal of Molecular Recognition 1996, 9, 1-5.
(83) Sotriffer, C. A.; Flader, W.; Winger, R. H.; Rode, B. M.; Liedl, K. R.;
Varga, J. M. Automated Docking of Ligands to Antibodies: Methods and Applications.
Methods, Companion to Methods in Enzymol. 2000, 20, 280-291.
(84) Jorgensen, W. L. The many roles of computation in drug discovery.
Science 2004, 303, 1813-1818.
(85) Foley, B. L.; Tessier, M. B.; Woods, R. J. Carbohydrate Force Fields.
WIREs Computational Molecular Science 2011, 1-69.
(86) Laederach, A.; Reilly, P. J. Modeling Protein Recognition of
Carbohydrates. Proteins: Struct. Funct. Genet. 2005, 60, 591-597.
(87) Sapay, N.; Nurisso, A.; Imberty, A.: Simulation of Carbohydrates, from
Molecular Docking to Dynamics in Water. In Biomolecular Simulations; Monticelli, L.,
Salonen, E., Eds.; Methods in Molecular Biology; Humana Press, 2013; Vol. 924; pp
469-483.
(88) Bras, N. F.; Fernandes, P. A.; Ramos, M. J. Docking and molecular
dynamics studies on the stereoselectivity in the enzymatic synthesis of carbohydrates.
Theor. Chem. Acc. 2009, 122, 283-296.
(89) Laederach, A.; Reilly, P. J. Specific Empirical Free Energy Function for
Automated Docking of Carbohydrates to Proteins. J. Comput. Chem. 2003, 24, 1748-
1757.
(90) Hwang, M.-J.; Ni, X.; Waldman, M.; Ewig, C. S.; Hagler, A. T.
Derivation of Class II Force Fields. VI. Carbohydrate Compounds and Anomeric
Effects. Biopolymers 1998, 45, 435-468.
(91) Woods, R. J.; Edge, C. J.; Wormald, M. R.; Dwek, R. A.: GLYCAM_93:
A Generalized Parameter Set for Molecular Dynamics Simulations of Glycoproteins and
Oligosaccharides. Application to the Structure and Dynamics of a Disaccharide Related
to Oligomannose. In Complex Carbohydrates in Drug Research; Bock, K., Clausen, H.,
Krogsgaard-Larsen, P., Kofod, H., Eds.; Munksgaard: Copenhagen, Denmark, 1993; Vol.
36; pp 15-36.
(92) Weldon, A. J.; Tschumper, G. S. Intrinsic Conformational Preferences of
and an Anomeric-Like Effect in 1-Substituted Silacyclohexanes. Int. J. Quantum Chem.
2007, 107, 2261-2265.
(93) Woodcock, H. L.; Moran, D.; Pastor, R. W.; MacKerell, A. D.; Brooks, B.
R. Ab initio modeling of glycosyl torsions and anomeric effects in a model carbohydrate:
2-Ethoxy tetrahydropyran. Biophysical Journal 2007, 93, 1-10.
(94) Kirschner, K. N.; Yongye, A. B.; Tschampel, S. M.; González-Outeiriño,
J.; Daniels, C. R.; Foley, B. L.; Woods, R. J. GLYCAM06: A Generalizable
Biomolecular Force Field. Carbohydrates. J. Comput. Chem. 2008, 29, 622–655.
(95) Guvench, O.; Mallajosyula, S. S.; Raman, E. P.; Hatcher, E.;
Vanommeslaeghe, K.; Foster, T. J.; Jamison, F. W.; MacKerell, A. D. CHARMM
Additive All-Atom Force Field for Carbohydrate Derivatives and Its Utility in
Polysaccharide and Carbohydrate–Protein Modeling. J. Chem. Theory Comput. 2011, 7,
3162-3180.
(96) French, A. D.; Kelterer, A.-M.; Johnson, G. P.; Dowd, M. K.; Cramer, C.
J. HF/6-31G* Energy Surfaces for Disaccharide Analogs. J. Comput. Chem. 2001, 22, 65.
(97) French, A. D.; Dowd, M. K. Exploration of Disaccharide Conformations
by Molecular Mechanics. J. Mol. Struct. (Theochem) 1993, 286, 183-201.
(98) Talavera, A.; Eriksson, A.; Ökvist, M.; López-Requena, A.; Fernández,
Y.; Pérez, R.; Moreno, E.; Krengel, U. Crystal Structure of an Anti-ganglioside Antibody,
and Modeling of the Functional Mimicry of its NeuGc-GM3 Antigen by an Anti-
idiotypic Antibody. Molec. Immun. 2009, 46, 3466-3475.
(99) Paula, S.; Monson, N.; Ball, W. J., Jr. Molecular Modeling of Cardiac
Glycoside Binding by the Human Sequence Monoclonal Antibody 1B3. Proteins 2005,
60, 382-391.
(100) Blaszczyk-Thurin, M.; Murali, R.; Westerink, M. A. J.; Steplewski, Z.;
Sung Co, M.; Kieber-Emmons, T. Molecular recognition of the Lewis Y antigen by
monoclonal antibodies. Protein Engineering 1996, 9, 447-459.
(101) Vyas, N. K.; Vyas, M. N.; Chervenak, M. C.; Bundle, D. R.; Pinto, B. M.;
Quiocho, F. A. Structural Basis of Peptide-Carbohydrate Mimicry in an Antibody-
Combining Site. Proc. Natl. Acad. Sci. USA 2003, 100, 15023-15028.
(102) Agostino, M.; Sandrin, M. S.; Thompson, P. E.; Ramsland, P. A.; Yuriev,
E. Peptide Inhibitors of Xenoreactive Antibodies Mimic the Interaction Profile of the
Native Carbohydrate Antigens. Pep. Sci. 2011, 96, 193-206.
(103) Agostino, M.; Jene, C.; Boyle, T.; Ramsland, P. A.; Yuriev, E. Molecular
Docking of Carbohydrate Ligands to Antibodies: Structural Validation against Crystal
Structures. J. Chem. Inf. Model. 2009, 49, 2749-2760.
(104) Agostino, M.; Sandrin, M. S.; Thompson, P. E.; Yuriev, E.; Ramsland, P.
A. In Silico Analysis of Antibody-carbohydrate Interactions and its Application to
Xenoreactive Antibodies. Glycobiol. 2009, 47, 105-115.
(105) Lee, M.; Lloyd, P.; Zhang, X.; Schallhorn, J. M.; Sugimoto, K.; Leach, A.
G.; Sapiro, G.; Houk, K. N. Shapes of Antibody Binding Sites: Qualitative and
Quantitative Analyses Based on a Geomorphic Classification Scheme. J. Org. Chem.
2006, 71, 5082-5092.
(106) Huey, R.; Morris, G. M.; Olson, A. J.; Goodsell, D. S. A Semiempirical
Free Energy Force Field with Charge-Based Desolvation. J. Comput. Chem. 2007, 28,
1145-1152.
(107) Morris, G. M.; Huey, R.; Lindstrom, W.; Sanner, M. F.; Belew, R. K.;
Goodsell, D. S.; Olson, A. J. Autodock4 and AutoDockTools4: Automated Docking with
Selective Receptor Flexibility. J. Comput. Chem. 2009, 30, 2785-2791.
(108) Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and
Accuracy of Docking with a New Scoring Function, Efficient Optimization and
Multithreading. J. Comput. Chem. 2010, 31, 455-461.
(109) GLYCAM Web. http://www.glycam.org.
(110) Zhang, W.; Hou, T.; Schafmeister, C.; Ross, W. S.; Case, D. A. AmberTools
Version 1.0. 2008.
(111) Bernstein, F. C.; Koetzle, T. F.; Williams, G. J. B.; Meyer, E. F.; Brice, M.
D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. Protein Data Bank -
Computer-Based Archival File for Macromolecular Structures. J. Mol. Biol. 1977, 112,
535-542.
(112) French, A. D.; Johnson, G. P.; Cramer, C. J.; Csonka, G. I.
Conformational analysis of cellobiose by electronic structure theories. Carbohydrate
research 2012, 350, 68-76.
(113) Williams, T.; Kelley, C. gnuplot 4.6 (2013). URL http://www.gnuplot.info/documentation.html.
(114) Martin, A.; Cheetham, J. C.; Rees, A. R. Modeling antibody hypervariable
loops: a combined algorithm. Proceedings of the National Academy of Sciences 1989, 86,
9268-9272.
(115) Martin, A. C. R.; Cheetham, J. C.; Rees, A. R. Molecular Modeling of
Antibody Combining Sites. Methods Enzymol. 1991, 203, 121-153.
(116) Wu, T. T.; Kabat, E. An analysis of the sequences of the variable regions
of Bence Jones proteins and myeloma light chains and their implications for antibody
complementarity. The Journal of experimental medicine 1970, 132, 211-250.
(117) Chothia, C.; Lesk, A. M. Canonical Structures for the Hypervariable
Regions of Immunoglobulins. J. Mol. Biol. 1987, 196, 901-917.
(118) Frisch, M.; Trucks, G.; Schlegel, H. B.; Scuseria, G.; Robb, M.;
Cheeseman, J.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. Gaussian 09,
Revision A.02; Gaussian, Inc.: Wallingford, CT, 2009.
(119) Hill, A. D.; Reilly, P. J. A Gibbs free energy correlation for automated
docking of carbohydrates. Journal of computational chemistry 2008, 29, 1131-1141.
(120) French, A. D.; Tran, V. H.; Pérez, S.: Conformational Analysis of a
Disaccharide (Cellobiose) with the Molecular Mechanics Program (MM2). In Computer
Modeling of Carbohydrate Molecules; French, A. D., Brady, J. W., Eds.; American
Chemical Society: Washington, DC, 1990; ACS Symposium Series 430; pp 191-212.
(121) Lütteke, T.; von der Lieth, C.-W. Data Mining the PDB for Glyco-
Related Data. Methods in Molecular Biology, Glycomics: Methods and Protocols 2009,
534, 293-310.
(122) Le Guilloux, V.; Schmidtke, P.; Tuffery, P. Fpocket: An Open Source
Platform For Ligand Pocket Detection. BMC Bioinform. 2009, 10, 168.
(123) Chang, M. W.; Ayeni, C.; Breuer, S.; Torbett, B. E. Virtual Screening for
HIV Protease Inhibitors: A Comparison of AutoDock 4 and Vina. PLoS ONE 2010, 5, 9.
(124) Bohne, A.; Lang, E.; von der Lieth, C.-W. W3-SWEET: Carbohydrate
Modeling By Internet. Journal of Molecular Modeling 1998, 4, 33-43.
(125) Fadda, E.; Woods, R. J. Molecular Simulations of Carbohydrates and
Protein–carbohydrate Interactions: Motivation, Issues and Prospects. Drug Discov. Today
2010, 15, 596-609.
(126) Imberty, A. Oligosaccharide Structures: Theory Versus Experiment. Curr.
Opin. Struct. Biol. 1997, 7, 617-623.
(127) Pauling, L.: The Nature of the Chemical Bond, 3rd ed.; Cornell University Press:
Ithaca, NY, 1960.
(128) Damm, W.; Frontera, A.; Tirado-Rives, J.; Jorgensen, W. L. OPLS All-
Atom Force Field for Carbohydrates. J. Comput. Chem. 1997, 18, 1955-1970.
(129) Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Principles of Docking:
An Overview of Search Algorithms and a Guide to Scoring Functions. Proteins:
Structure, Function and Genetics 2002, 47, 409-443.
(130) Schulz-Gasch, T.; Stahl, M. Scoring functions for protein–ligand
interactions: a critical perspective. Drug Discovery Today: Technologies 2004, 1, 231-
239.
(131) Kerzmann, A.; Neumann, D.; Kohlbacher, O. SLICK − Scoring and
Energy Functions for Protein−Carbohydrate Interactions. J. Chem. Inf. Model. 2006, 46,
1635–1642.
(132) Kerzmann, A.; Fuhrmann, J.; Kohlbacher, O.; Neumann, D.
BALLDock/SLICK: A new method for protein-carbohydrate docking. Journal of
Chemical Information and Modeling 2008, 48, 1616-1625.
(133) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt,
D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - A Visualization System for
Exploratory Research and Analysis. J. Comput. Chem. 2004, 25, 1605-1612.
(134) Humphrey, W.; Dalke, A.; Schulten, K. VMD - Visual Molecular
Dynamics. J. Molec. Graphics 1996, 14, 33-38.
(135) Cremer, D.; Pople, J. A. A General Definition of Ring Puckering
Coordinates. J. Am. Chem. Soc. 1975, 97, 1354-1358.
(136) Makeneni, S.; Foley, B. L.; Woods, R. J. BFMP: A Method for
Discretizing and Visualizing Pyranose Conformations. Journal of chemical information
and modeling 2014, 54, 2744-2750.
(137) Lütteke, T.; Bohne-Lang, A.; Loss, A.; Goetz, T.; Frank, M.; von der
Lieth, C.-W. GLYCOSCIENCES.de: an Internet portal to support glycomics and
glycobiology research. Glycobiology 2006, 16, 71R-81R.
(138) Sternberg, M. J. E.: Protein Structure Prediction: A Practical Approach,
1996.
(139) Schwede, T.: Computational Structural Biology: Methods and
Applications, 2008.
(140) Lawrence, S.; Feil, S.; Holien, J.; Kuiper, M.; Doughty, L.; Dolezal, O.;
Mulhern, T.; Tweten, R.; Parker, M. Manipulating the Lewis antigen specificity of the
cholesterol-dependent cytolysin lectinolysin. Frontiers in Immunology 2012, 3.
(141) Oyelaran, O.; Gildersleeve, J. C. Glycan Arrays: Recent Advances and
Future Challenges. Current Opinion in Chemical Biology 2009, 13, 406-413.
(142) Taylor, M. E.; Drickamer, K. Structural Insights into what Glycan Arrays
tell us About how Glycan-binding Proteins Interact with their Ligands. Glycobiology
2009, 19, 1155–1162.
(143) Muraki, M.; Morikawa, M.; Jigami, Y.; Tanaka, H. The roles of conserved
aromatic amino-acid residues in the active site of human lysozyme: a site-specific
mutagenesis study. Biochimica et Biophysica Acta (BBA) - Protein Structure and
Molecular Enzymology 1987, 916, 66-75.
(144) Luís, A. S.; Venditto, I.; Temple, M. J.; Rogowski, A.; Baslé, A.; Xue, J.;
Knox, J. P.; Prates, J. A.; Ferreira, L. M.; Fontes, C. M. Understanding how noncatalytic
carbohydrate binding modules can display specificity for xyloglucan. Journal of
Biological Chemistry 2013, 288, 4799-4809.
(145) Kerzmann, A.; Neumann, D.; Kohlbacher, O. SLICK - Scoring and
Energy Functions for Protein-Carbohydrate Interactions. J. Chem. Inf. Model. 2006, 46,
1635-1642.
(146) Kitago, Y.; Karita, S.; Watanabe, N.; Kamiya, M.; Aizawa, T.; Sakka, K.;
Tanaka, I. Crystal structure of Cel44A, a glycoside hydrolase family 44 endoglucanase
from Clostridium thermocellum. The Journal of biological chemistry 2007, 282, 35703-
35711.
(147) Tamres, M. Aromatic Compounds as Donor Molecules in Hydrogen
Bonding. Journal of the American Chemical Society 1952, 74, 3375-3378.
(148) Reeves, L.; Schneider, W. Nuclear magnetic resonance measurements of
complexes of chloroform with aromatic molecules and olefins. Canadian Journal of
Chemistry 1957, 35, 251-261.
(149) Pimentel, G. C.; McClellan, A. L.: The Hydrogen Bond; W. H. Freeman
and Company: New York, 1960.
(150) Nishio, M. The CH/π hydrogen bond: Implication in chemistry. Journal of
Molecular Structure 2012, 1018, 2-7.
(151) Brandl, M.; Weiss, M. S.; Jabs, A.; Sühnel, J.; Hilgenfeld, R. C–H⋯π-interactions
in proteins. Journal of Molecular Biology 2001, 307, 357-377.
(152) Kitaura, K.; Morokuma, K. A new energy decomposition scheme for
molecular interactions within the Hartree‐Fock approximation. International Journal of
Quantum Chemistry 1976, 10, 325-340.
(153) Tsuzuki, S.; Honda, K.; Uchimaru, T.; Mikami, M.; Tanabe, K. The
Magnitude of the CH/π Interaction between Benzene and Some Model Hydrocarbons.
Journal of the American Chemical Society 2000, 122, 3746-3753.
(154) Laughrey, Z. R.; Kiehna, S. E.; Riemen, A. J.; Waters, M. L.
Carbohydrate−π Interactions: What Are They Worth? J. Am. Chem. Soc. 2008, 130,
14625–14633.
(155) Ringer, A. L.; Figgs, M. S.; Sinnokrot, M. O.; Sherrill, C. D. Aliphatic
C−H/π Interactions: Methane−Benzene, Methane−Phenol, and Methane−Indole
Complexes. The Journal of Physical Chemistry A 2006, 110, 10822-10828.
(156) Mohamed, M. N. A.; Watts, H. D.; Guo, J.; Catchmark, J. M.; Kubicki, J.
D. MP2, density functional theory, and molecular mechanical calculations of C–H···π
and hydrogen bond interactions in a cellulose-binding module–cellulose model system.
Carbohydrate Research 2010, 345, 1741-1751.
(157) Asensio, J. L.; Ardá, A.; Cañada, F. J.; Jiménez-Barbero, J. Carbohydrate–
Aromatic Interactions. Acc. Chem. Res. 2012, 46, 946-954.
(158) Muraki, M. The Importance of CH/π Interactions to the Function of
Carbohydrate Binding Proteins. Protein and Peptide Letters 2002, 9, 195-209.
(159) Krapp, S.; Mimura, Y.; Jefferis, R.; Huber, R.; Sondermann, P. Structural
Analysis of Human IgG-Fc Glycoforms Reveals a Correlation Between Glycosylation
and Structural Integrity. J. Mol. Biol. 2003, 325, 979-989.
(160) Fernández-Alonso, M. d. C.; Cañada, F. J.; Jiménez-Barbero, J.; Cuevas,
G. Molecular Recognition of Saccharides by Proteins. Insights on the Origin of the
Carbohydrate−Aromatic Interactions. Journal of the American Chemical Society 2005,
127, 7379-7386.
(161) Thobhani, S.; Ember, B.; Siriwardena, A.; Boons, G.-J. Multivalency and
the Mode of Action of Bacterial Sialidases. J. Am. Chem. Soc. 2002, A-B.
(162) Kitov, P. I.; Bundle, D. R. On the nature of the multivalency effect: a
thermodynamic model. Journal of the American Chemical Society 2003, 125, 16271-
16284.
(163) Neumann, D.; Kohlbacher, O. In Tilte2009.
(164) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Comparison of the Periplasmic
Receptors for L-Arabinose, D-Glucose/D-Galactose, and D-Ribose. J. Biol. Chem. 1991,
266, 5226-5237.
(165) Vardakou, M.; Flint, J.; Christakopoulos, P.; Lewis, R. J.; Gilbert, H. J.;
Murray, J. W. A family 10 Thermoascus aurantiacus xylanase utilizes arabinose
decorations of xylan as significant substrate specificity determinants. J Mol Biol 2005,
352, 1060-1067.
(166) Chávez, M. I.; Andreu, C.; Paloma, V.; Aboitiz, N.; Freire, F.; Groves, P.;
Asensio, J. L.; Asensio, G.; Muraki, M.; Cañada, F. J.; Jiménez-Barbero, J. On the
Importance of Carbohydrate-Aromatic Interactions for the Molecular Recognition of
Oligosaccharides by Proteins: NMR Studies of the Structure and Binding Affinity of
AcAMP2-like Peptides with Non-Natural Naphthyl and Fluoroaromatic Residues. Chem.
Eur. J. 2005, 11, 7060-7074.
(167) Wojciechowski, M.; Lesyng, B. Generalized Born Model: Analysis,
Refinement, and Applications to Proteins.
(168) Nivedha, A. K.; Thieker, D. F.; Woods, R. J. Vina-Carb: Improving
Glycosidic Angles During Carbohydrate Docking. J. Chem. Theory Comput. 2015,
(Accepted).
(169) Plevin, M. J.; Bryce, D. L.; Boisbouvier, J. Direct detection of CH/π
interactions in proteins. Nat Chem 2010, 2, 466-471.
(170) Jeziorski, B.; Moszynski, R.; Szalewicz, K. Perturbation theory approach
to intermolecular potential energy surfaces of van der Waals complexes. Chem Rev 1994,
94, 1887-1930.
9. APPENDIX
Supplementary Information Chapter 4
S4.1. Editing AutoDock Vina’s source code to give 100 docked poses
ADV’s source code was downloaded, the variable par.mc.num_saved_mins in
main_procedure (in the file main.cpp) was set to 100, and the program was recompiled.
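The source edit in S4.1 can be scripted so that it is reproducible. The sketch below is hypothetical: it assumes the assignment appears in main.cpp as a single statement of the form par.mc.num_saved_mins = N; and only rewrites the integer. The regular expression may need adjusting for other Vina releases, and the helper names are illustrative.

```python
import re
from pathlib import Path

# Hypothetical pattern: assumes a single-line assignment such as
# "par.mc.num_saved_mins = 20;" inside main_procedure in main.cpp.
PATTERN = re.compile(r"(par\.mc\.num_saved_mins\s*=\s*)\d+(\s*;)")


def set_num_saved_mins(source: str, value: int = 100) -> str:
    """Return the source text with num_saved_mins set to `value`."""
    new_source, count = PATTERN.subn(rf"\g<1>{value}\g<2>", source)
    if count != 1:
        # Refuse to guess if the assignment is missing or appears twice.
        raise ValueError(f"expected one assignment, found {count}")
    return new_source


def patch_file(path: str, value: int = 100) -> None:
    """Rewrite main.cpp in place; Vina must then be recompiled."""
    p = Path(path)
    p.write_text(set_num_saved_mins(p.read_text(), value))
```

Failing loudly when the pattern is not found exactly once keeps the script from silently producing an unmodified binary.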
S4.2. AD3 parameters
a) Docking parameters
outlev 1 # diagnostic output level
seed pid time # seeds for random generator
types COH # atom type names
move 1MFA_ligand.pdbq # small molecule
about 0.003 0.036 -0.094 # small molecule center
tran0 random # initial coordinates/A or random
quat0 random # initial quaternion
ndihe 15 # number of active torsions
dihe0 random # initial dihedrals (relative) or random
tstep 2.0 # translation step/A
qstep 50.0 # quaternion step/deg
dstep 50.0 # torsion step/deg
torsdof 7 0.3113 # torsional degrees of freedom and coefficient
intnbp_r_eps 4.00 0.0222750 12 6 # C-C lj
intnbp_r_eps 3.60 0.0257202 12 6 # C-O lj
intnbp_r_eps 3.00 0.0081378 12 6 # C-H lj
intnbp_r_eps 3.20 0.0297000 12 6 # O-O lj
intnbp_r_eps 2.60 0.0093852 12 6 # O-H lj
intnbp_r_eps 2.00 0.0029700 12 6 # H-H lj
rmstol 2.0 # cluster_tolerance/A
extnrg 1000.0 # external grid energy
e0max 0.0 10000 # max initial energy; max number of retries
ga_pop_size 200 # number of individuals in population
ga_num_evals 800000 # maximum number of energy evaluations
ga_num_generations 27000 # maximum number of generations
ga_elitism 1 # number of top individuals to survive to next generation
ga_mutation_rate 0.02 # rate of gene mutation
ga_crossover_rate 0.8 # rate of crossover
ga_window_size 10 #
ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution
ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution
set_ga # set the above parameters for GA or LGA
sw_max_its 300 # iterations of Solis & Wets local search
sw_max_succ 4 # consecutive successes before changing rho
sw_max_fail 4 # consecutive failures before changing rho
sw_rho 1.0 # size of local search space to sample
sw_lb_rho 0.01 # lower bound on rho
ls_search_freq 0.06 # probability of performing local search on individual
set_sw1 # set the above Solis & Wets parameters
ga_run 100 # do this many hybrid GA-LS runs
analysis # perform a ranked cluster analysis
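The intnbp_r_eps lines above specify the ligand's internal pairwise terms as an equilibrium distance Req (Å), a well depth eps, and the exponents n = 12 and m = 6. The torsdof line gives the number of torsional degrees of freedom and a per-torsion coefficient. The sketch below evaluates both. It assumes the usual n-m conversion E(r) = C_n/r^n - C_m/r^m with C_n = eps·m/(n-m)·Req^n and C_m = eps·n/(n-m)·Req^m, which places the minimum at r = Req with depth -eps. It also assumes the torsional penalty is the linear product of the two torsdof values (7 × 0.3113). The function names are hypothetical.

```python
def pair_coefficients(r_eq, eps, n=12, m=6):
    """Convert (Req, eps, n, m) into the C_n and C_m prefactors (assumed n-m form)."""
    c_n = eps * m / (n - m) * r_eq ** n
    c_m = eps * n / (n - m) * r_eq ** m
    return c_n, c_m


def pair_energy(r, r_eq, eps, n=12, m=6):
    """n-m pair energy at separation r; minimum of -eps at r = r_eq."""
    c_n, c_m = pair_coefficients(r_eq, eps, n, m)
    return c_n / r ** n - c_m / r ** m


# (Req in Å, eps) copied from the intnbp_r_eps lines above.
INTNBP_R_EPS = {
    ("C", "C"): (4.00, 0.0222750),
    ("C", "O"): (3.60, 0.0257202),
    ("C", "H"): (3.00, 0.0081378),
    ("O", "O"): (3.20, 0.0297000),
    ("O", "H"): (2.60, 0.0093852),
    ("H", "H"): (2.00, 0.0029700),
}


def torsional_penalty(n_torsdof, coefficient):
    """Assumed linear penalty: coefficient times torsional degrees of freedom."""
    return n_torsdof * coefficient


if __name__ == "__main__":
    r_eq, eps = INTNBP_R_EPS[("C", "O")]
    print(pair_energy(r_eq, r_eq, eps))   # should equal -eps up to rounding
    print(torsional_penalty(7, 0.3113))   # 7 x 0.3113
```

With these settings the torsional term works out to 7 × 0.3113 = 2.1791 in the units of the scoring function.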
b) Grid parameters
npts 70 70 100 # num.grid points in xyz
spacing 0.375 # spacing(A)
gridcenter 0.0 0.0 11.0 # xyz-coordinates or auto
smooth 0.5 # store minimum energy w/in rad(A)
dielectric -0.1465 # <0, AD4 distance-dep.diel;>0, constant
S4.3. AD4.2 parameters
a) Docking parameters
autodock_parameter_version 4.2 # used by autodock to validate parameter set
outlev 1 # diagnostic output level
intelec # calculate internal electrostatics
seed pid time # seeds for random generator
ligand_types C HD OA # atom types in ligand
move 1MFD_ligand.pdbqt # small molecule
about 0.045 -0.063 -0.041 # small molecule center
tran0 random # initial coordinates/A or random
axisangle0 random # initial orientation
dihe0 random # initial dihedrals (relative) or random
tstep 2.0 # translation step/A
qstep 50.0 # quaternion step/deg
dstep 50.0 # torsion step/deg
torsdof 15 # torsional degrees of freedom
rmstol 2.0 # cluster_tolerance/A
extnrg 1000.0 # external grid energy
e0max 0.0 10000 # max initial energy; max number of retries
ga_pop_size 200 # number of individuals in population
ga_num_evals 800000 # maximum number of energy evaluations
ga_num_generations 27000 # maximum number of generations
ga_elitism 1 # number of top individuals to survive to next generation
ga_mutation_rate 0.02 # rate of gene mutation
ga_crossover_rate 0.8 # rate of crossover
ga_window_size 10 #
ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution
ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution
set_ga # set the above parameters for GA or LGA
sw_max_its 300 # iterations of Solis & Wets local search
sw_max_succ 4 # consecutive successes before changing rho
sw_max_fail 4 # consecutive failures before changing rho
sw_rho 1.0 # size of local search space to sample
sw_lb_rho 0.01 # lower bound on rho
ls_search_freq 0.06 # probability of performing local search on individual
set_psw1 # set the above pseudo-Solis & Wets parameters
unbound_model bound # state of unbound ligand
ga_run 100 # do this many hybrid GA-LS runs
write_all # write all conformations in a cluster
analysis # perform a ranked cluster analysis
b) Grid parameters
npts 70 70 100 # num.grid points in xyz
spacing 0.375 # spacing(A)
gridcenter 0.0 0.0 11.0 # xyz-coordinates or auto
smooth 0.5 # store minimum energy w/in rad(A)
dielectric -0.1146 # <0, distance-dep.diel;>0, constant
S4.4. ADV Docking parameters
center_x = 0
center_y = 0
center_z = 11
size_x = 26.25
size_y = 26.25
size_z = 37.5
energy_range = 10
num_modes = 100
cpu = 8
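The ADV search box appears to match the AutoGrid maps used for AD3 and AD4.2. Its center equals gridcenter, and each edge equals npts × spacing (70 × 0.375 = 26.25 Å and 100 × 0.375 = 37.5 Å). The minimal Python sketch below derives the Vina box keys from the AutoGrid settings. The helper name vina_box_from_autogrid is hypothetical. The sketch assumes the AutoGrid convention that npts counts grid intervals, so the map edge length is npts × spacing.

```python
# AutoGrid settings used for the AD3 and AD4.2 grids above.
NPTS = (70, 70, 100)           # grid intervals along x, y, z
SPACING = 0.375                # grid spacing in angstroms
GRIDCENTER = (0.0, 0.0, 11.0)  # box center in angstroms


def vina_box_from_autogrid(npts, spacing, center):
    """Return Vina center_*/size_* values covering the same box as an AutoGrid map."""
    box = {}
    for axis, c in zip("xyz", center):
        box[f"center_{axis}"] = c
    for axis, n in zip("xyz", npts):
        # Assumption: npts counts intervals (npts + 1 points), so edge = npts * spacing.
        box[f"size_{axis}"] = n * spacing
    return box


box = vina_box_from_autogrid(NPTS, SPACING, GRIDCENTER)
print("\n".join(f"{key} = {value:g}" for key, value in box.items()))
```

The printed lines follow the same order and values as the center and size entries of the ADV configuration above.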
S4.5. Comparison of the glycosidic torsion angles in the crystal carbohydrate ligands with those in the Glycam ligands, and their corresponding CHI energy scores.
System  Torsion  Experimental angle  Experimental CHI  Glycam angle  Glycam CHI  Disaccharide unit
1MFA    φ        71.5                0.0               60.0          0.4         DAbepα1-3DManpα-OMe
1MFA    φ        77.4                0.1               60.0          0.4         DGalpα1-2DManpα-OMe
1MFA    ψ        -135.1              0.1               -120.2        0.3         DAbepα1-3DManpα-OMe
1MFA    ψ        -94.7               0.0               -118.1        0.3         DGalpα1-2DManpα-OMe
1MFD    φ        76.1                0.0               60.0          0.4         DAbepα1-3DManpα-OMe
1MFD    φ        103.7               1.4               60.0          0.4         DGalpα1-2DManpα-OMe
1MFD    ψ        -139.4              0.1               -120.2        0.3         DAbepα1-3DManpα-OMe
1MFD    ψ        -151.6              0.2               -118.1        0.3         DGalpα1-2DManpα-OMe
1UZ8    φ        -66.7               0.0               -65.4         0.0         DGalpβ1-4DGlcpNAcβ-OMe
1UZ8    φ        -82.7               0.2               -65.5         0.2         LFucpα1-3DGlcpNAcβ-OMe
1UZ8    ψ        128.0               0.3               125.1         0.3         DGalpβ1-4DGlcpNAcβ-OMe
1UZ8    ψ        -99.4               0.1               -101.1        0.1         LFucpα1-3DGlcpNAcβ-OMe
1M7D    φ        -78.3               0.1               -60.0         0.4         (2-deoxy)LRhapα1-3DGlcpNAcβ-OMe
1M7D    φ        -63.2               0.3               -60.0         0.4         LRhapα1-3DGlcpNAcβ-OMe
1M7D    ψ        -114.1              0.3               -120.2        0.3         (2-deoxy)LRhapα1-3DGlcpNAcβ-OMe
1M7D    ψ        111.2               0.2               120.1         0.3         LRhapα1-3DGlcpNAcβ-OMe
1S3K    φ        -77.8               0.1               -66.9         0.1         LFucpα1-3DGlcpNAcα-OH
1S3K    φ        -77.7               0.3               -66.4         0.0         DGalpβ1-4DGlcpNAcα-OH
1S3K    φ        -78.1               0.1               -69.4         0.1         LFucpα1-2DGalpβ
1S3K    ψ        -103.4              0.1               -99.4         0.1         LFucpα1-3DGlcpNAcα-OH
1S3K    ψ        139.3               0.3               127.8         0.3         DGalpβ1-4DGlcpNAcα-OH
1S3K    ψ        140.4               0.3               125.5         0.3         LFucpα1-2DGalpβ
1M7I    φ        -90.2               1.0               -62.4         0.1         LRhapα1-3DGlcpNAcβ
1M7I    φ        -115.3              2.3               -69.5         0.1         LRhapα1-2LRhapα
1M7I    φ        -81.7               0.1               -69.0         0.1         LRhapα1-3LRhapα
1M7I    φ        -63.5               0.3               -65.9         0.2         LRhapα1-2LRhapα
1M7I    ψ        53.4                0.9               112.4         0.3         LRhapα1-2LRhapα
1M7I    ψ        -90.9               0.0               -114.8        0.3         LRhapα1-2LRhapα
1M7I    ψ        121.0               0.3               114.7         0.3         LRhapα1-3LRhapα
1M7I    ψ        149.3               0.1               112.3         0.3         LRhapα1-2LRhapα
Torsion angles are given in degrees; CHI energy scores are given in kcal/mol.
S4.6. CHI Energy Functions
Equation for the φ angle in α-linkages:
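$$
\begin{aligned}
E(\phi) ={}& 2.977\,e^{-\frac{(\phi+1.9949\times10^{2})^{2}}{6.7781\times10^{2}}}
+ 1.0225\times10^{2}\,e^{-\frac{(\phi-1.706\times10^{2})^{2}}{1.6968\times10^{3}}}
+ 1.0745\times10^{1}\,e^{-\frac{(\phi+1.0531\times10^{2})^{2}}{4.7246\times10^{3}}} \\
&+ 3.6735\,e^{-\frac{(\phi-6.2012)^{2}}{1.3477\times10^{3}}}
+ 2.061\,e^{-\frac{(\phi-9.1655\times10^{1})^{2}}{1.5\times10^{3}}}
+ 6.1939\,e^{-\frac{(\phi+2.2979\times10^{1})^{2}}{2.1223\times10^{3}}} \\
&- 2.1115\,e^{-\frac{(\phi-8.3602\times10^{1})^{2}}{1.2541\times10^{3}}}
- 9.8001\times10^{1}\,e^{-\frac{(\phi-1.7001\times10^{2})^{2}}{1.5987\times10^{3}}}
\end{aligned}
$$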
Equation for the φ angle in β-linkages:
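$$
\begin{aligned}
E(\phi) ={}& 4.5054\times10^{2}\,e^{-\frac{(\phi+3.3077\times10^{2})^{2}}{4.4498\times10^{3}}}
+ 2.3712\times10^{1}\,e^{-\frac{(\phi-3.0463\times10^{2})^{2}}{8.3752\times10^{3}}}
+ 5.9353\,e^{-\frac{(\phi+1.5208\times10^{2})^{2}}{6.0498\times10^{3}}} \\
&+ 2.2467\times10^{1}\,e^{-\frac{(\phi+2.3516\times10^{1})^{2}}{6.0690\times10^{2}}}
+ 1.0036\times10^{1}\,e^{-\frac{(\phi-1.2096\times10^{2})^{2}}{4.038\times10^{3}}}
- 1.8141\times10^{1}\,e^{-\frac{(\phi+2.4268\times10^{1})^{2}}{5.4305\times10^{2}}} \\
&+ 5.8823\,e^{-\frac{(\phi-1.9632\times10^{1})^{2}}{8.9793\times10^{2}}}
- 2.1283
\end{aligned}
$$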
Equation for the ψ angle in 1-2ax, 1-4ax and 1-3eq linkages:
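$$
\begin{aligned}
E(\psi) ={}& 4.6237\,e^{-\frac{(\psi-5.0456)^{2}}{5.0058\times10^{3}}}
+ 4.6139\,e^{-\frac{(\psi-3.6249\times10^{2})^{2}}{2.0906\times10^{3}}}
+ 4.9419\,e^{-\frac{(\psi-1.212\times10^{2})^{2}}{2.0938\times10^{3}}} \\
&+ 4.029\times10^{-1}\,e^{-\frac{(\psi-2.4143\times10^{2})^{2}}{4.5683\times10^{2}}}
+ 7.9888\times10^{-1}\,e^{-\frac{(\psi-6.8425\times10^{1})^{2}}{6.7881\times10^{2}}}
+ 2.2299\times10^{-1}\,e^{-\frac{(\psi-1.9293\times10^{2})^{2}}{3.4725\times10^{2}}} \\
&- 1.2565\times10^{-1}
\end{aligned}
$$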
Equation for the ψ angle in 1-2eq, 1-4eq and 1-3ax linkages:
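$$
\begin{aligned}
E(\psi) ={}& 4.4681\,e^{-\frac{(\psi-10^{-30})^{2}}{1.2796\times10^{3}}}
+ 4.382\,e^{-\frac{(\psi-3.5777\times10^{2})^{2}}{6.0501\times10^{3}}}
+ 2.8495\times10^{2}\,e^{-\frac{(\psi-1.4664\times10^{2})^{2}}{1.5518\times10^{3}}} \\
&+ 4.7613\,e^{-\frac{(\psi-2.2068\times10^{2})^{2}}{5.8929\times10^{3}}}
- 1.692\times10^{2}\,e^{-\frac{(\psi-1.4737\times10^{2})^{2}}{1.7425\times10^{3}}}
- 1.1844\times10^{2}\,e^{-\frac{(\psi-1.4606\times10^{2})^{2}}{1.3598\times10^{3}}} \\
&+ 1.0220
\end{aligned}
$$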
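All four CHI functions in S4.6 share one form: a sum of Gaussian terms a·exp(−(x − b)²/c) plus a constant d. The Python sketch below evaluates them with the coefficients transcribed from the equations above. This section does not state the angle ranges, so the sketch makes an assumption based on where the term centers lie: φ is mapped into [−180°, 180°) and ψ is shifted into [0°, 360°). With that convention it returns about 0.43 kcal/mol for an α-linkage φ of 60° and about 0.28 kcal/mol for ψ = −120.2° with the first ψ set. These values agree, after rounding, with the 0.4 and 0.3 kcal/mol Glycam entries in S4.5. The names chi_phi and chi_psi are hypothetical, L-sugars are not treated specially, and this is not the Vina-Carb source code.

```python
import math

# Each term (a, b, c) contributes a * exp(-(x - b)**2 / c); d is the constant offset.
# Coefficients transcribed from the four equations above.
ALPHA_PHI = ([(2.977, -199.49, 677.81), (102.25, 170.6, 1696.8),
              (10.745, -105.31, 4724.6), (3.6735, 6.2012, 1347.7),
              (2.061, 91.655, 1500.0), (6.1939, -22.979, 2122.3),
              (-2.1115, 83.602, 1254.1), (-98.001, 170.01, 1598.7)], 0.0)
BETA_PHI = ([(450.54, -330.77, 4449.8), (23.712, 304.63, 8375.2),
             (5.9353, -152.08, 6049.8), (22.467, -23.516, 606.90),
             (10.036, 120.96, 4038.0), (-18.141, -24.268, 543.05),
             (5.8823, 19.632, 897.93)], -2.1283)
# psi in 1-2ax, 1-4ax and 1-3eq linkages
PSI_SET_1 = ([(4.6237, 5.0456, 5005.8), (4.6139, 362.49, 2090.6),
              (4.9419, 121.2, 2093.8), (0.4029, 241.43, 456.83),
              (0.79888, 68.425, 678.81), (0.22299, 192.93, 347.25)], -0.12565)
# psi in 1-2eq, 1-4eq and 1-3ax linkages
PSI_SET_2 = ([(4.4681, 1e-30, 1279.6), (4.382, 357.77, 6050.1),
              (284.95, 146.64, 1551.8), (4.7613, 220.68, 5892.9),
              (-169.2, 147.37, 1742.5), (-118.44, 146.06, 1359.8)], 1.0220)


def gaussian_sum(x, params):
    """Evaluate sum(a * exp(-(x - b)^2 / c)) + d for one parameter set."""
    terms, d = params
    return sum(a * math.exp(-((x - b) ** 2) / c) for a, b, c in terms) + d


def chi_phi(phi, anomer):
    """CHI energy (kcal/mol) for phi; assumption: phi is mapped into [-180, 180)."""
    params = {"alpha": ALPHA_PHI, "beta": BETA_PHI}[anomer]
    return gaussian_sum((phi + 180.0) % 360.0 - 180.0, params)


def chi_psi(psi, psi_set):
    """CHI energy (kcal/mol) for psi; assumption: psi is shifted into [0, 360)."""
    params = {1: PSI_SET_1, 2: PSI_SET_2}[psi_set]
    return gaussian_sum(psi % 360.0, params)


print(round(chi_phi(60.0, "alpha"), 2))   # Glycam alpha phi listed in S4.5
print(round(chi_psi(-120.2, 1), 2))      # Glycam psi, DAbepα1-3DManpα-OMe
```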
S4.7. Plots showing agreement between the quantum mechanical data points (black dots) and CHI energy curves (grey lines)
Root mean squared deviations (RMSDs) were calculated between the quantum mechanical data points and corresponding data points on the CHI energy curve.
[Figure: four panels plotting ΔE (kcal/mol) against the torsion angle from 0 to 360°; two φ panels (RMSD 0.16 and 0.02) and two ψ panels (RMSD 0.02 and 0.04).]
S4.8. GlyTorsion analysis
The various searches performed using the web tool are tabulated below. S1 refers to the non-reducing sugar residue, while S2 refers to the reducing sugar residue.
GlyTorsion searches performed for Figure 4.6 (main text):
Figure 8a Figure 8b
S. No. S1 linkage S2 S. No. S1 linkage S2
1 a-D-* 1-2 *-Manp* 1 b-D-* 1-2 *-Manp*
2 a-D-* 1-2 *-Galp* 2 b-D-* 1-2 *-Galp*
3 a-D-* 1-2 *-Glcp* 3 b-D-* 1-2 *-Glcp*
4 a-L-* 1-2 *-Manp* 4 b-L-* 1-2 *-Manp*
5 a-L-* 1-2 *-Galp* 5 b-L-* 1-2 *-Galp*
6 a-L-* 1-2 *-Glcp* 6 b-L-* 1-2 *-Glcp*
7 a-D-* 1-4 *-Manp* 7 b-D-* 1-4 *-Manp*
8 a-D-* 1-4 *-Galp* 8 b-D-* 1-4 *-Galp*
9 a-D-* 1-4 *-Glcp* 9 b-D-* 1-4 *-Glcp*
10 a-L-* 1-4 *-Manp* 10 b-L-* 1-4 *-Manp*
11 a-L-* 1-4 *-Galp* 11 b-L-* 1-4 *-Galp*
12 a-L-* 1-4 *-Glcp* 12 b-L-* 1-4 *-Glcp*
13 a-D-* 1-3 *-Manp* 13 b-D-* 1-3 *-Manp*
14 a-D-* 1-3 *-Galp* 14 b-D-* 1-3 *-Galp*
15 a-D-* 1-3 *-Glcp* 15 b-D-* 1-3 *-Glcp*
16 a-L-* 1-3 *-Manp* 16 b-L-* 1-3 *-Manp*
17 a-L-* 1-3 *-Galp* 17 b-L-* 1-3 *-Galp*
18 a-L-* 1-3 *-Glcp* 18 b-L-* 1-3 *-Glcp*
Figure 8c Figure 8d
S. No. S1 linkage S2 S. No. S1 linkage S2
1 *-Manp* 1-2 *-D-Manp* 1 *-Manp* 1-2 *-D-Glcp*
2 *-Galp* 1-2 *-D-Manp* 2 *-Galp* 1-2 *-D-Glcp*
3 *-Glcp* 1-2 *-D-Manp* 3 *-Glcp* 1-2 *-D-Glcp*
4 *-Manp* 1-2 *-L-Manp* 4 *-Manp* 1-2 *-L-Glcp*
5 *-Galp* 1-2 *-L-Manp* 5 *-Galp* 1-2 *-L-Glcp*
6 *-Glcp* 1-2 *-L-Manp* 6 *-Glcp* 1-2 *-L-Glcp*
7 *-Manp* 1-3 *-D-Glcp* 7 *-Manp* 1-2 *-D-Galp*
8 *-Galp* 1-3 *-D-Glcp* 8 *-Galp* 1-2 *-D-Galp*
9 *-Glcp* 1-3 *-D-Glcp* 9 *-Glcp* 1-2 *-D-Galp*
10 *-Manp* 1-3 *-L-Glcp* 10 *-Manp* 1-2 *-L-Galp*
11 *-Galp* 1-3 *-L-Glcp* 11 *-Galp* 1-2 *-L-Galp*
12 *-Glcp* 1-3 *-L-Glcp* 12 *-Glcp* 1-2 *-L-Galp*
13 *-Manp* 1-3 *-D-Manp* 13 *-Manp* 1-4 *-D-Glcp*
14 *-Galp* 1-3 *-D-Manp* 14 *-Galp* 1-4 *-D-Glcp*
15 *-Glcp* 1-3 *-D-Manp* 15 *-Glcp* 1-4 *-D-Glcp*
16 *-Manp* 1-3 *-L-Manp* 16 *-Manp* 1-4 *-L-Glcp*
17 *-Galp* 1-3 *-L-Manp* 17 *-Galp* 1-4 *-L-Glcp*
18 *-Glcp* 1-3 *-L-Manp* 18 *-Glcp* 1-4 *-L-Glcp*
19 *-Manp* 1-3 *-D-Galp* 19 *-Manp* 1-4 *-D-Galp*
20 *-Galp* 1-3 *-D-Galp* 20 *-Galp* 1-4 *-D-Galp*
21 *-Glcp* 1-3 *-D-Galp* 21 *-Glcp* 1-4 *-D-Galp*
22 *-Manp* 1-3 *-L-Galp* 22 *-Manp* 1-4 *-L-Galp*
23 *-Galp* 1-3 *-L-Galp* 23 *-Galp* 1-4 *-L-Galp*
24 *-Glcp* 1-3 *-L-Galp* 24 *-Glcp* 1-4 *-L-Galp*
25 *-Manp* 1-4 *-D-Manp*
26 *-Galp* 1-4 *-D-Manp*
27 *-Glcp* 1-4 *-D-Manp*
28 *-Manp* 1-4 *-L-Manp*
29 *-Galp* 1-4 *-L-Manp*
30 *-Glcp* 1-4 *-L-Manp*
In GlyTorsion, the ψ torsion angle is defined with respect to the Cx+1 atom of the reducing sugar. Because the CHI energy functions apply only to ψ angles defined with respect to the Cx-1 atom, each ψ value obtained from the web tool was converted to the Cx-1 definition by adding or subtracting 120°, depending on the D/L configuration of the reducing sugar.
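The conversion described above is a fixed ±120° shift followed by re-wrapping of the angle. The minimal Python sketch below implements it. The function names are hypothetical, and the choice of (−180°, 180°] as the output range is an assumption. The text does not say which sign applies to D- or L-configured reducing sugars, so the caller has to supply the offset explicitly rather than have the sketch guess it.

```python
def wrap_signed(angle):
    """Map an angle in degrees into the interval (-180, 180]."""
    r = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if r == -180.0 else r


def psi_cx_plus1_to_cx_minus1(psi_plus, offset):
    """Re-express a GlyTorsion psi (Cx+1 reference) with respect to Cx-1.

    offset must be +120.0 or -120.0 degrees; which sign goes with a D- or
    L-configured reducing sugar is left to the caller (not stated in the text).
    """
    if offset not in (120.0, -120.0):
        raise ValueError("offset must be +120 or -120 degrees")
    return wrap_signed(psi_plus + offset)


print(psi_cx_plus1_to_cx_minus1(170.0, 120.0))
print(psi_cx_plus1_to_cx_minus1(-170.0, -120.0))
```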
S4.9. CHI energy scores (kcal/mol) of top-ranked poses, before and after rescoring
System  AD3 before  AD3 after  AD4.2 before  AD4.2 after  ADV before  ADV after
1MFA    1.31        1.31       1.48          1.48         2.84        1.73
1MFD    1.29        1.29       4.42          1.84         4.21        2.51
1UZ8    1.29        0.29       0.65          0.65         0.76        0.76
1M7D    3.14        1.19       11.60         0.93         1.11        1.11
1S3K    7.02        0.99       1.26          1.26         1.67        1.67
1M7I    7.72        2.33       18.45         4.31         10.20       2.31
S4.10. Rank of lowest PRMSD structure, before and after rescoring
System  AD3 before  AD3 after  AD4.2 before  AD4.2 after  ADV before  ADV after
1MFA    55          67         10            14           1           1
1MFD    18          9          13            2            2           1
1UZ8    4           3          4             2            1           1
1M7D    10          4          7             2            1           1
1S3K    2           1          5             3            1           1
1M7I    1           3          31            50           1           1
S4.11. tLEaP input files to assemble ligands with deoxy sugars
a) Assembling the 1MFA/1MFD ligand
# ----- leaprc for loading the GLYCAM_06 force field
addPdbResMap {{ 0 "OLS" "NOLS" } { 1 "OLS" "COLS" } { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }
{ 0 "OLP" "NOLP" } { 1 "OLP" "COLP" } { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }}
# load atom type hybridizations
addAtomTypes {{ "C" "C" "sp2" } { "CG" "C" "sp3" } { "CY" "C" "sp3" } { "H" "H"
"sp3" } { "H1" "H" "sp3" } { "H2" "H" "sp3" } { "HC" "H" "sp3" } { "HO" "H" "sp3" } {
"HW" "H" "sp3" } { "N" "N" "sp2" } { "OH" "O" "sp3" } { "OS" "O" "sp3" } { "O" "O"
"sp2" } { "O2" "O" "sp2" } { "OW" "O" "sp3" } {"OY" "O" "sp3" } { "S" "S" "sp3" }}
# load the main parameter set
parm94 = loadamberparams /usr/local/programs/amber10/dat/leap/parm/parm94.dat
glycam_06 = loadamberparams /usr/local/programs/glycam06/parameters/Glycam_06_current.dat
# load all prep files for polysaccharides
loadamberprep /usr/local/programs/glycam06/prep_files/Glycam_06_current.prep
# load lib files
loadOff solvents.lib
loadOff ions94.lib
amber_seq = sequence { OME ZMA }
set amber_seq tail amber_seq.2.O3
amber_seq=sequence { amber_seq 0AE }
set amber_seq tail amber_seq.2.O2
amber_seq=sequence { amber_seq 0LA }
impose amber_seq {3 2} { {H1 C1 O3 C3 -60.0} }
impose amber_seq {3 2} { {C1 O3 C3 H3 0.0} }
impose amber_seq {4 2} { {H1 C1 O2 C2 -60.0} }
impose amber_seq {4 2} { {C1 O2 C2 H2 0.0} }
charge amber_seq
savepdb amber_seq salmonella_glycam.pdb
b) Assembling the 1M7D ligand
# ----- leaprc for loading the GLYCAM_06 force field
addPdbResMap {{ 0 "OLS" "NOLS" } { 1 "OLS" "COLS" } { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }
{ 0 "OLP" "NOLP" } { 1 "OLP" "COLP" } { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }}
# load atom type hybridizations
addAtomTypes {{ "C" "C" "sp2" } { "CG" "C" "sp3" } { "CY" "C" "sp3" } { "H" "H"
"sp3" } { "H1" "H" "sp3" } { "H2" "H" "sp3" } { "HC" "H" "sp3" } { "HO" "H" "sp3" } {
"HW" "H" "sp3" } { "N" "N" "sp2" } { "OH" "O" "sp3" } { "OS" "O" "sp3" } { "O" "O"
"sp2" } { "O2" "O" "sp2" } { "OW" "O" "sp3" } { "OY" "O" "sp3" } { "S" "S" "sp3" }}
# load the main parameter set
parm94 = loadamberparams /usr/local/programs/amber10/dat/leap/parm/parm94.dat
glycam_06 = loadamberparams /usr/local/programs/glycam06/parameters/Glycam_06_current.dat
# load all prep files for polysaccharides
loadamberprep /usr/local/programs/glycam06/prep_files/Glycam_06_current.prep
amber_seq = sequence { OME 3YB 2DR 0hA }
impose amber_seq {3 2} { {H1 C1 O3 C3 60.0} }
impose amber_seq {3 2} { {C1 O3 C3 H3 0.0} }
impose amber_seq {4 3} { {H1 C1 O3 C3 60.0} }
impose amber_seq {4 3} { {C1 O3 C3 H3 0.0} }
charge amber_seq
savepdb amber_seq 1M7D_glycam.pdb
Supplementary Information Chapter 5
S5.1. A comparison of PRMSDmin(5) poses obtained using ADV, VC1 and VC2 at all 5 CHI-cutoff values (1 to 5) is shown in (a). The corresponding standard deviation values are shown in (b).
a)
PDB    1OP3  1UZ8  1M7D  1MFA  1MFD  1MFE  1S3K  1CLY  1CLZ  1MFC  1M7I  1MFB  3BZ4  3C6S
Ab (in column order): 2G12, 291-2G3-A, SYA/J6, scFv SE155-4, Fab SE155-4, SE155-4, HU3S193, BR96, BR96, SE155-4, SYA/J6, SE155-4, F22-4, F22-4
ADV    0.27  0.19  0.26  1.11  0.52  0.59  0.30  0.45  0.68  1.01  1.31  3.1   3.5   6.9
VC1|0  0.254 0.18  0.281 0.947 0.39  0.986 0.293 0.54  0.43  1.76  0.748 1.1   1.1   2.8
VC1|1  0.33  0.19  0.26  1.01  0.52  0.59  0.30  0.45  0.66  1.78  0.71  1.5   1.3   1.5
VC1|2  0.21  0.18  0.26  1.00  0.52  0.59  0.30  0.45  0.68  1.72  0.54  1.4   1.7   2.4
VC1|3  0.33  0.18  0.26  1.12  0.52  0.59  0.30  0.45  0.68  1.70  0.54  1.3   2.7   5.8
VC1|4  0.21  0.19  0.26  1.23  0.52  0.59  0.30  0.45  0.68  1.04  0.69  1.8   3.2   5.7
VC1|5  0.39  0.19  0.26  1.10  0.52  0.59  0.29  0.44  0.68  1.01  0.72  2.3   3.7   5.7
VC2|0  0.32  0.18  0.51  0.96  0.37  0.89  0.19  0.65  0.59  1.88  0.80  1.3   1.2   3.6
VC2|1  0.21  0.19  0.26  0.96  0.52  0.59  0.30  0.45  0.65  1.80  0.64  1.5   1.3   1.3
VC2|2  0.21  0.19  0.26  1.12  0.52  0.59  0.30  0.45  0.68  1.72  0.54  1.4   1.6   1.9
VC2|3  0.21  0.19  0.26  1.15  0.52  0.59  0.30  0.44  0.68  1.76  0.54  1.5   2.7   7.0
VC2|4  0.33  0.19  0.26  1.16  0.52  0.59  0.30  0.46  0.69  0.97  0.76  1.9   3.3   5.8
VC2|5  0.21  0.19  0.26  1.18  0.52  0.59  0.30  0.45  0.68  1.00  0.73  2.4   3.8   5.6
Those highlighted in green are less than 2.0 Å
b)
PDB    1OP3  1UZ8  1M7D  1MFA  1MFD  1MFE  1S3K  1CLY  1CLZ  1MFC  1M7I  1MFB  3BZ4  3C6S
Ab (in column order): 2G12, 291-2G3-A, SYA/J6, scFv SE155-4, Fab SE155-4, SE155-4, HU3S193, BR96, BR96, SE155-4, SYA/J6, SE155-4, F22-4, F22-4
ADV    0.2   0.0   0.0   0.3   0.0   0.0   0.0   0.0   0.0   0.0   0.1   0.1   0.5   1.2
VC1|0  0.0   0.0   0.0   0.1   0.0   0.3   0.0   0.0   0.0   0.2   0.1   0.1   0.2   0.9
VC1|1  0.2   0.0   0.0   0.3   0.0   0.0   0.0   0.0   0.0   0.2   0.1   0.1   0.1   0.7
VC1|2  0.0   0.0   0.0   0.3   0.0   0.0   0.0   0.0   0.0   0.2   0.0   0.5   0.6   1.5
VC1|3  0.2   0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.3   0.0   0.4   0.9   0.3
VC1|4  0.0   0.0   0.0   0.1   0.0   0.0   0.0   0.0   0.0   0.4   0.3   0.4   0.2   0.6
VC1|5  0.3   0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.0   0.3   0.5   0.2   0.1
VC2|0  0.0   0.0   0.4   0.0   0.0   0.3   0.0   0.1   0.1   0.1   0.0   0.3   0.1   1.0
VC2|1  0.0   0.0   0.0   0.3   0.0   0.0   0.0   0.0   0.1   0.1   0.1   0.1   0.1   0.0
VC2|2  0.0   0.0   0.0   0.3   0.0   0.0   0.0   0.0   0.0   0.2   0.0   0.4   0.5   1.3
VC2|3  0.0   0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.2   0.0   0.3   0.9   2.3
VC2|4  0.2   0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.4   0.4   0.4   0.4   0.7
VC2|5  0.0   0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.0   0.3   0.5   0.1   0.3
S5.2. A list of all protein-carbohydrate crystal structures employed in the study. The carbohydrate sequences were obtained using the pdbcare tool at glycosciences.de.
S.No. PDB ID LINUCS
1 1hql a-D-Galp-(1-3)-b-D-Galp-(1-1)-methyl
2 1lte b-D-Galp-(1-4)-b-D-Glcp
3 1niv a-D-Manp-(1-3)-a-D-Manp-(1-1)-methyl
4 1qos a-D-GlcpNAc-(1-4)-b-D-GlcpNAc
5 1slt b-D-Galp-(1-4)-a-D-GlcpNAc
6 2aai b-D-Galp-(1-4)-b-D-Glcp
7 2ovu a-D-Manp-(1-2)-a-D-Manp-(1-1)-methyl
8 2pel b-D-Galp-(1-4)-a-D-Glcp
b-D-Galp-(1-4)-b-D-Glcp
9 3o0x a-D-Glcp-(1-3)-a-D-Manp-(1-2)-a-D-Manp-(1-2)-a-D-Manp
10 4g1r a-D-Manp-(1-2)-a-D-Manp
11 4g1s a-D-Manp-(1-2)-a-D-Manp
12 1jpc a-D-3,6-deoxy-Manp
a-D-Manp-(1-6)+
|
a-D-Manp
|
a-D-Manp-(1-3)+
13 1qot a-L-Fucp-(1-2)-b-D-Galp
a-L-Fucp-(1-2)-b-D-Galp-(1-4)-b-D-Glcp
14 1sl6 b-D-Galp-(1-4)+
|
a-D-GlcpNAc
|
a-L-Fucp-(1-3)+
15 2auy b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)-a-D-Manp-(1-1)-methyl
16 2bos a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp-(1-1)-butyl
a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp
a-D-Galp-(1-4)-b-D-Galp
17 2e6v a-D-Manp-(1-2)-a-D-Manp-(1-3)-b-D-Manp
18 2eal a-D-GalpNAc-(1-3)-b-D-GalpNAc-(1-3)-b-D-Galp
19 2g7c a-D-Galp-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc
20 2vxj a-D-Galp-(1-3)-b-D-Galp-(1-4)-b-D-Glcp
a-D-Galp-(1-3)-b-D-Galp-(1-4)-a-D-Glcp
a-D-Galp-(1-4)-D-1-deoxy-Galp
21 3ef2 a-D-Galp-(1-3)+
|
b-D-Galp
|
a-L-Fucp-(1-2)+
a-D-Galp-(1-3)+
|
a-D-Galp
|
a-L-Fucp-(1-2)+
22 1gsl a-L-Fucp-(1-2)-b-D-Galp-(1-4)+
|
b-D-GlcpNAc-(1-1)-methyl
|
a-L-Fucp-(1-3)+
23 1j8r b-D-GalpNAc-(1-3)-a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp
24 1led a-L-Fucp-(1-4)+
|
b-D-GlcpNAc-(1-1)-methyl
|
a-L-Fucp-(1-2)-b-D-Galp-(1-3)+
25 1ulf a-D-GalpNAc-(1-3)+
|
b-D-Galp-(1-4)-b-D-Glcp
|
a-L-Fucp-(1-2)+
26 1w8f b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)+
|
b-D-Glcp
|
a-L-Fucp-(1-3)+
b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)+
|
b-D-Glcp
|
a-L-Fucp-(1-3)+
b-D-Galp-(1-4)+
|
b-D-Glcp
|
a-L-Fucp-(1-3)+
27 2zhk b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc
28 3lek a-L-Fucp-(1-4)+
|
a-D-GlcpNAc
|
a-L-Fucp-(1-2)-b-D-Galp-(1-3)+
29 3o0w a-D-Glcp-(1-3)-a-D-Manp-(1-2)-a-D-Manp-(1-2)-a-D-Manp
30 3wg3 a-D-GalpNAc-(1-3)+
|
b-D-Galp-(1-4)-b-D-GlcpNAc
|
a-L-Fucp-(1-2)+
31 3zwe a-D-Galp-(1-3)+
|
b-D-Galp-(1-4)-b-D-Glcp
|
a-L-Fucp-(1-2)+
32 4gwi a-L-Fucp-(1-2)-b-D-Galp-(1-4)+
|
b-D-GlcpNAc
|
a-L-Fucp-(1-3)+
33 4mrd b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-D-1-deoxy-GlcpNAc
34 1k9i b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+
|
a-D-Manp
|
b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)+
35 1tei b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+
|
a-D-Manp
|
b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)+
36 1zhs a-D-Manp-(1-6)+
|
b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-b-D-GlcpNAc
|
a-D-Manp-(1-3)+
37 2o2l a-D-GalpNAc-(1-3)+
|
b-D-Galp-(1-4)+
| |
a-L-Fucp-(1-2)+ b-D-Glcp
|
a-L-Fucp-(1-3)+
38 2vco a-D-Manp-(1-6)+
|
b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-b-D-GlcpNAc
|
a-D-Manp-(1-3)+
39 4gk9 a-D-Manp-(1-6)+
|
a-D-Manp-(1-6)+
| |
a-D-Manp-(1-3)+ b-D-Manp
|
a-D-Manp-(1-3)+
40 2wt2 b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc
41 2zhm b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc
b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc
42 2vuz b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+
|
b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-b-D-GlcpNAc
|
b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)+
43 2ygm a-D-Galp-(1-3)+
|
b-D-Galp-(1-4)-b-D-GlcpNAc
|
a-L-Fucp-(1-2)+
44 1j84 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
45 2i74 a-D-Manp-(1-6)+
|
a-D-Manp-(1-6)-a-D-Manp
|
a-D-Manp-(1-3)+
46 2j1t a-L-Fucp-(1-2)-b-D-Galp-(1-4)+
|
b-D-GlcpNAc
|
a-L-Fucp-(1-3)+
47 2j1u a-D-GalpNAc-(1-3)+
|
b-D-Galp-(1-4)-b-D-Glcp
|
a-L-Fucp-(1-2)+
48 2j72 a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp
49 2j73 a-D-Glcp-(1-6)-a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp
a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp
50 3ach b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
51 1gu3 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
52 1of4 b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp
53 1uxx b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp
54 3aci b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
55 2y6l b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp
56 2yfz b-D-Galp-(1-2)-a-D-Xylp-(1-6)+
|
b-D-Glcp-(1-4)-b-D-Glcp
|
b-D-Glcp-(1-4)+
57 2zex b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
58 1gwl b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp
59 1gwm b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-a-D-Glcp
60 1oh3 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-a-D-Glcp
61 1oh4 a-D-Galp-(1-6)+
|
a-D-Galp-(1-6)+ b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp
| |
b-D-Manp-(1-4)+
|
b-D-Manp-(1-4)+
62 2ypj a-D-Xylp-(1-6)+
|
a-D-Xylp-(1-6)+ b-D-Glcp-(1-4)-b-D-Glcp
| |
b-D-Glcp-(1-4)+
|
a-D-Xylp-(1-6)-b-D-Glcp-(1-4)+
63 2eo7 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Manp
64 2eex b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
65 2ej1 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
66 1mfa a-D-3-deoxy-Fucp-(1-3)+
|
a-D-Manp-(1-1)-methyl
|
a-D-Galp-(1-2)+
67 1mfd a-D-3-deoxy-Fucp-(1-3)+
|
a-D-Manp-(1-1)-methyl
|
a-D-Galp-(1-2)+
68 1mfc a-D-3-deoxy-Fucp-(1-3)+
|
a-D-Manp-(1-4)-a-L-Rhap
|
a-D-Galp-(1-2)+
69 1mfb a-D-3-deoxy-Fucp-(1-3)+
|
a-D-Manp-(1-4)-a-L-Rhap
|
a-D-Galp-(1-2)-a-D-Manp-(1-4)-a-L-Rhap-(1-3)-a-D-Galp-(1-2)+
70 1mfe a-D-3-deoxy-Fucp-(1-3)+
|
a-D-Manp
|
a-D-Galp-(1-2)+
71 1s3k a-L-Fucp-(1-2)-b-D-Galp-(1-4)+
|
a-D-GlcpNAc
|
a-L-Fucp-(1-3)+
72 1uz8 b-D-Galp-(1-4)+
|
b-D-GlcpNAc-(1-1)-methyl
|
a-L-Fucp-(1-3)+
73 1m7d a-L-Rhap-(1-3)-a-L-2,6-deoxy-Glcp-(1-3)-b-D-GlcpNAc-(1-1)-methyl
74 1m7i a-L-Rhap-(1-2)-a-L-Rhap-(1-3)-a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-1)-methyl
75 3bz4 a-D-Glcp-(1-4)+
|
a-D-Glcp-(1-4)+ a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-1)-methyl
| |
a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+
|
a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+
76 3c6s a-D-Glcp-(1-4)+
|
a-D-Glcp-(1-4)+ a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-L-1-deoxy-Rhap
| |
a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+
|
a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+
77 1cly a-L-Fucp-(1-2)-b-D-Galp-(1-4)+
|
b-D-GlcpNAc-(1-1)-<C10O2>
|
a-L-Fucp-(1-3)+
78 1clz a-L-Fucp-(1-2)-b-D-Galp-(1-4)+
|
b-D-GlcpNAc-(1-1)-<C10O2>
|
a-L-Fucp-(1-3)+
79 1op3 a-D-Manp-(1-2)-a-D-Manp
80 2eqd b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
Systems which failed the positive control test.
1 1hlc b-D-Galp-(1-4)-b-D-Glcp
2 2dur a-D-Manp-(1-2)-a-D-Manp
3 3a0e a-D-Manp-(1-3)-a-D-Manp
4 2zx4 a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp
5 1sl4 a-D-Manp-(1-6)+
|
a-D-Manp-(1-6)-a-D-Manp
|
a-D-Manp-(1-3)+
6 2zl6 a-L-Fucp-(1-2)-b-D-Galp-(1-3)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-Glcp
7 1sl5 b-D-Galp-(1-4)+
|
b-D-GlcpNAc-(1-3)-b-D-Galp
|
a-L-Fucp-(1-3)+
8 4afd b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp
9 2j1v a-L-Fucp-(1-2)-b-D-Galp-(1-4)-b-D-GlcpNAc
10 2jcq b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc
11 2yg0 a-D-Galp-(1-6)-b-D-Manp
Supplementary Information Chapter 6
S6.1. PDB IDs of Lectin-Carbohydrate Systems used to test the CH/π
interaction energy function.
VC1|2
Accurate Predictions Made?
Yes No
1led 2auy 2gvy 1cxf
1veo 2bos 1eo5 1qos
3azs 2ovu 1hlc 2gou
2aai 1lte 2vuz 4gwi
1itc 4g1r 2zl6 2zx4
1gsl 1tei
1sl6
2pel 4g1s 1k9i
1gwl 2g7c 2vxj
1pj9 3o0w 1jpc
1hql 3o0x 1ulf
3pfz 1niv 1w8f
1mxd 2o2l 1sl4
3wg3 2dur 1vbp
1j8r 3lek 1sl5
2wdb 1zhs 3a0e
4gk9 2e6v
3ef2
1uh3
2zhk
1qot
2vco
2wt2
2zhm
3zwe
ADV
1veo 1cxf 1hlc 2gou
1led 1lte 2gvy 1niv
3azs 1qos 2vuz 1ulf
2aai 1tei 2pel 3lek
1gsl 1zhs 2vco 4gwi
1pj9 2auy 1eo5 2vxj
3zwe 2bos 4gk9 2zx4
1mxd 2dur
1jpc
3wg3 2g7c 2e6v
1hql 2o2l 1k9i
3pfz 2ovu 1sl6
1gwl 3o0w 1w8f
1itc 3o0x 1sl5
1qot 4g1r 1sl4
1uh3 4g1s 1vbp
2wdb
3a0e
3ef2
2zl6
2wt2
2zhk
1j8r
2zhm
S6.2. CHI energy functions to score the ω glycosidic torsion angle in
1,6-linkages.
Using the GlyTorsion tool available at www.glycosciences.de, the distribution of ω glycosidic torsion angles for 1,6-linkages was collected.
[Plot: distribution of structures (%) against ω (°), shown separately for the equatorial-O4 and axial-O4 sets.]
Figure S6.2: The distribution of ω glycosidic angles from carbohydrate crystal structures
with 1,6-linkages divided into two sets based on the position of attachment
(equatorial/axial) of the O4 atom to the reducing sugar.
The dataset was divided into two sets according to whether the O4 atom of the reducing sugar is axial or equatorial on the ring. For each set, three parabolas, one for each of the three possible rotamers, were joined to form the CHI energy equation for these linkages. The relative energies of the minima of the three parabolic curves were derived from the crystal-structure populations using the two-state Boltzmann distribution, N_i/N_j = exp[−(E_i − E_j)/RT]. The resulting equations are as follows:
$$E = k\,(x - \theta)^{2} + b$$
where k = 0.0025 and x is the ω torsion angle in degrees.
When O4 is equatorial on the carbohydrate ring:
if x ∈ [0, 120], θ = 60, b = 0.21;
if x ∈ (120, 240], θ = 180, b = 1.39;
if x ∈ (240, 360], θ = 300, b = 0.
When O4 is axial on the carbohydrate ring:
if x ∈ [0, 120], θ = 60, b = 0;
if x ∈ (120, 240], θ = 180, b = 0.3;
if x ∈ (240, 360], θ = 300, b = 1.0.
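The piecewise ω function above and the two-state Boltzmann step used to set its offsets can be sketched in Python as follows. Several details are assumptions not stated in the source: ω values outside [0°, 360°) are shifted into range, the temperature is 298.15 K, and the names chi_omega and relative_energies are hypothetical. The populations in the usage line are made up. The sketch does not reproduce the b values, because the crystal-structure counts behind them are not listed here.

```python
import math

K = 0.0025  # curvature shared by all three parabolas
# (upper bound of x interval, theta, b) for each omega rotamer, transcribed from S6.2.
OMEGA_PARAMS = {
    "equatorial": [(120.0, 60.0, 0.21), (240.0, 180.0, 1.39), (360.0, 300.0, 0.0)],
    "axial": [(120.0, 60.0, 0.0), (240.0, 180.0, 0.3), (360.0, 300.0, 1.0)],
}
R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def chi_omega(omega, o4_position):
    """Piecewise-parabolic CHI energy (kcal/mol) for omega in a 1,6-linkage."""
    x = omega % 360.0  # assumption: angles are shifted into [0, 360)
    for upper, theta, b in OMEGA_PARAMS[o4_position]:
        if x <= upper:
            break  # first interval whose upper bound contains x
    return K * (x - theta) ** 2 + b


def relative_energies(populations, temperature=298.15):
    """Two-state Boltzmann: E_i - E_ref = -RT ln(N_i / N_ref), ref = most populated."""
    n_ref = max(populations.values())
    return {name: -R_KCAL * temperature * math.log(n / n_ref)
            for name, n in populations.items()}


print(chi_omega(-60.0, "equatorial"))
print(relative_energies({"gg": 100, "gt": 50}))
```

Taking the most populated rotamer as the zero of energy mirrors the equatorial-O4 parameters above, where the ω = 300° minimum has b = 0.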
S6.3. Gridbox Centers of Test Systems (Å)
1gsl 1veo 2g7c 2vuz 1tei 3azs 2zx4 4gk9
center_x 30.779 31.309 26.162 -10.236 37.595 25.555 39.534 5.942
center_y 14.986 13.767 14.95 -34.789 4.058 -0.181 20.789 2.899
center_z 32.023 40.95 4.176 3.215 -43.079 7.932 46.443 53.234
1jpc 2gou 2o2l 2wt2 1ulf 1eo5 3a0e 3pfz
center_x 56.063 8.405 15.19 22.86 60.265 85.067 20.259 -22.802
center_y 45.229 -8.714 -55.302 6.174 3.52 60.767 -37.806 30.906
center_z 25.17 22.111 -15.243 12.353 5.723 44.647 3.05 -19.125
1k9i 2gvy 2ovu 3ef2 1vbp 1j8r 3lek 1cxf
center_x 8.948 28.687 25.16 2.304 109.791 15.851 4.052 44.796
center_y 46.79 23.377 20.968 64.669 108.52 12.655 4.27 89.74
center_z 44.049 -19.395 21.031 3.633 127.115 62.124 19.329 47.379
1niv 1gwl 2vxj 3wg3 1w8f 1qos 3o0w 1hlc
center_x 113.529 8.285 16.514 54.63 -2.792 -15.015 -18.2 11.686
center_y 52.413 -10.664 -30.606 -40.764 4.137 43.955 -18.233 32.302
center_z 138.995 27.185 115.835 -3.111 34.364 31.99 14.748 84.653
1sl4 1pj9 2zhk 3zwe 2aai 1qot 3o0x 1hql
center_x 41.291 41.326 7.879 -5.218 31.715 77.957 31.2 23.809
center_y 28.944 85.533 42.544 11.166 41.265 8.421 11.371 8.536
center_z 16.404 46.799 7.094 -1.876 11.239 29.285 -16.511 12.031
1sl5 1uh3 2zhm 4g1r 2bos 1zhs 4gwi 1led
center_x 29.388 37.554 -5.21 -1.592 14.64 13.899 -3.557 30.923
center_y -7.683 25.783 -5.945 -7.611 3.347 18.066 -3.798 14.723
center_z -4.434 49.609 14.265 11.645 56.284 -12.694 -20.05 31.526
1sl6 2wdb 2zl6 4g1s 2dur 2pel 1itc 1lte
center_x 127.995 40.041 25.33 10.35 114.805 56.016 16.854 18.882
center_y 48.794 21.235 40.009 -12.03 16.861 27.792 -18.317 -2.108
center_z 38.953 14.007 -18.489 7.428 48.688 64.361 -12.045 36.227
2e6v 2vco 1mxd 2auy
center_x 36.602 51.993 24.973 3.166
center_y 9.703 2.279 31.414 43.303
center_z 39.651 -20.137 1.73 25.044