Conformational degeneracy restricts the effective information

content of heparan sulfate†

Timothy R. Rudd and Edwin A. Yates*

Received 10th November 2009, Accepted 18th January 2010

First published as an Advance Article on the web 15th February 2010

DOI: 10.1039/b923519a

The linear, sulfated polysaccharide heparan sulfate occupies a pivotal position in intercellular

signalling events, interacting with numerous proteins on the cell surface and in the extracellular

matrix. Its complex sequences suggest high potential information content but, despite extensive

efforts, a clear relationship between its substitution pattern and biological activity remains elusive.

This results from technical limitations, compounded by attempts to correlate substitution pattern

directly with activity without considering other conformational factors. For a series of

systematically modified analogues of heparan sulfate, the relationship between substitution

pattern and experimental 13C NMR chemical shifts, which act as reporters of the presence of

conformational change, particularly around the glycosidic linkages, was explored through

chemometric analysis. From analysis of the experimental data it was evident that wide linkage

variation arose from O-sulfation in iduronate and N-sulfation in glucosamine residues, but their

effects were distinct, while 6-O-sulfation had much less impact. Models of saccharide sequences

showed that the maximum spread of variation in glycosidic linkages occurred before maximum

sequence diversity and revealed a highly degenerate system: a fraction of possible sequences is

sufficient to provide diverse backbone conformations to satisfy particular protein binding

requirements. The unique information content potentially available in HS sequences, defined

ultimately by conformation, is vastly inferior to the potential sequence diversity.

1. Introduction

The glycosaminoglycan, heparan sulfate, forms an important

nexus between mammalian cells and their immediate environment,

the extracellular matrix, and is therefore of major biological

and potential medical significance. Heparan sulfate (HS) has

become the focus of considerable interest following its

implication in many diverse biological and medical functions,

ranging from the apparently structure-specific interaction with

antithrombin (AT) and subsequent inhibition of factor Xa, to

the much less specific interaction with thrombin (factor IIa), as

well as to a wide variety of other proteins,1 which include the

Alzheimer’s β-secretase,2,3 fibroblast growth factors and

receptors (FGF/FGFRs),4–7 other growth factors, such as

GDNF,8,9 components of the WNT signalling pathway,10

microbial surface proteins, including the viral coat protein of

herpes simplex virus (HSV),11,12 HIV 13–15 and the inhibition

of microbial attachment.16 Many of these examples involve

interactions between HS chains and single proteins; others

require formation of ternary complexes and might be expected

to exhibit higher levels of structural specificity. However, the

search for an understanding of structure–function relation-

ships has, so far, proved elusive and remains the subject of

much debate. Only the interactions between AT and a

relatively restricted series of pentasaccharides in heparin17

or that between the HSV coat protein and 3-O-sulfated

structures in HS11 approach the term ‘‘specific’’ in the accepted

biochemical sense.

Some hold the view that specificity is high, but conclude this on the basis of a set of test compounds of limited sequence variety. Others speculate that HS activities have little or no structural specificity and that their properties are due to general charge density considerations.18 For the vast majority of HS binding proteins, a broad range of binding and activities are observed experimentally as the sequence is varied and no simple relationship between substitution pattern and activity usually emerges. The field has, therefore, reached something of an impasse, with both mechanistic insight and progress towards exploiting the many medical and pharmaceutical applications of HS derivatives being effectively hampered by the lack of understanding of structure–function relationships.

School of Biological Sciences, University of Liverpool, Liverpool, UK L69 7ZB. E-mail: [email protected]; Tel: +44 (0)151-795-4429

† Electronic supplementary information (ESI) available: Table S1. Systematically modified heparins used in the analysis. Table S2. 13C chemical shift assignments (ppm) for 12 chemically modified heparins. Table S3. The relative contribution of the first three principal components (c1 (accounting for 38.8% of the variance in the data), c2 (22.3%) and c3 (18.5%)) to the range of 13C chemical shift values observed at the linkage positions among the 12 modified polysaccharides. Fig. S1. Combinatorial modelling of A-1–I-4 linkages, then I-1–A-4 linkages; linkage variation against number of sulfates, kernel density plots of the linkage variation distributions and variation of the linkage distance at every level of sulfation for 2–16 residues. Table S4. Analysis of the variation in individual glycosidic linkages from combinatorial modelling of all possible oligosaccharide stretches from 2 to 16 residues, linkages A-1, I-4 and I-1, A-4 individually. Fig. S2. Distribution of linkage variation (r.s.s. distances) along all possible sequence combinations for 4 and 6 residues, respectively. Fig. S3. Variability of linkage positions in all possible tetrasaccharide stretches at each linkage position. Table S5. Breakdown of individual linkage variation for all possible sequences of 4 residues. Fig. S4A. Dissimilarities between all stretches of 4 residues running in the same direction. Fig. S4B. Dissimilarities between all stretches of 4 residues running in opposite directions. Fig. S5. Histogram of the dissimilarity values for all stretches of 4 residues and 6 residues running in the same direction and the comparison of forward and backwards measuring dissimilarity along the sequence, rather than overall variation in the linkage. See DOI: 10.1039/b923519a

902 | Mol. BioSyst., 2010, 6, 902–908. This journal is © The Royal Society of Chemistry 2010

PAPER www.rsc.org/molecularbiosystems | Molecular BioSystems

Published on 15 February 2010.

Heparan sulfate and its close structural analogue, heparin,

as well as their chemically modified derivatives, share a

common structural backbone, based on a repeating 1,4-linked

disaccharide unit [Scheme 1] comprising a uronic acid (either

β-D-GlcA or α-L-IdoA) and α-D-glucosamine, with varying

patterns of sulfation, at position-2 of the uronic acid and/or

-6 of glucosamine with either N-acetyl, N-sulfonamido

(N-sulfate) or free amino groups at position-2. Other, rarer

sulfations can also occur, most notably 3-O-sulfation in

glucosamine and 2-O-sulfation in GlcA residues. It has been

speculated that HS may have high information content, an

idea that is based on its potentially vast sequence diversity, but

the extent to which this is exploited in nature and, if it is, the

nature of the relationships between substitution pattern,

structure and function, remain important questions. Where

the conformational details have not been specifically pursued

by in-depth studies on small numbers of well-defined oligo-

saccharides, there has been a tendency to treat these molecules

as comprising linear chains with appended charged groups

(there is often an emphasis on sulfates), while the backbone

geometry, although likely to play a major role in defining the

binding and hence activity of HS with proteins, has largely

been overlooked. Indeed, it can be argued that providing the

ability for HS chains to adopt appropriate backbone geometries

is a prerequisite to disporting the charged groups in suitable

geometric arrangements.

A first step towards understanding these molecules and

hence their complex interactions with proteins must be to

delineate the relationship between their substitution patterns

and conformational properties, which may help provide an

explanation for experimentally observed structure–function

relationships. Without doubt, the most informative current

method for the study of conformation in solution is nuclear

magnetic resonance (NMR) spectroscopy and a considerable

body of work has pursued this aim, usually for individual

oligosaccharides8,11,19–25 or heparin-related polysaccharides.26–28

However, there are a huge number of combinations of

substitution patterns for even modestly sized oligosaccharides

and the extent to which this diversity is exploited, or indeed,

the extent to which it needs to be exploited in nature, is not

known. In the present article, general characteristics of this

system, which underlie the relationship of sequence diversity

to information content, are sought and the latter issue is

addressed.

Individual saccharides are very difficult to purify from

natural sources and tackling the problem through studying

the conformation of the many synthetic oligosaccharides that

would be required is currently difficult to envisage for practical

reasons. Here, the nature of the relationship between substitution

(O-sulfation, N-sulfation, N-acetylation) and the observed 13C NMR chemical shift patterns in 12 systematically modified

polysaccharides is examined, employing multivariate statistical

techniques, with the aim of revealing the underlying relationships

between substitution pattern and changes in conformation,

particularly at the glycosidic linkages, which may help account

for the activities of HS saccharides.

2. Results and discussion

The experimental 13C NMR chemical shift data of 12

model polysaccharides have been analysed to determine the

extent to which the ability to vary the substitution pattern at

each of the three main positions of substitution: position-2 of

iduronate (I-2), position-2 of glucosamine (A-2) or position-6

of glucosamine (A-6), either alone, or in combination, affects

the overall degree of structural variability observed, parti-

cularly in the linkages (defined by A-4 to I-1 and I-4 to A-1)

but also at I-3 and I-5, which report conformational changes

in iduronate residues [Table 2]. These seemingly esoteric

questions are of fundamental importance to an understanding

of HS–heparin structure–function relationships because the

relative structural significance, in terms of backbone geometry,

of the three modifications which mimic the biosynthetic steps

(2- or 6-O-sulfation in IdoA or GlcN, or N-sulfation/

N-acetylation in GlcN) can be determined. This will be a

crucial first step in relating structure to activity in a way that

moves beyond simple correlations between sulfation and

activity.

Sulfation at I-2 or A-2 (component 1 or 2) has significant effects at I-3 and I-5, indicative of conformational change [Table 1], whereas sulfation at A-6 (component 3) has negligible effect. Assignment of the 13C NMR spectra28 was carried out using multi-dimensional homonuclear (1H–1H) and heteronuclear (1H–13C) NMR [Table S1, ESI†]. Subsequent principal component analysis identified three principal components (c1, c2 and c3) describing 38.8, 22.3 and 18.5% of the variation (79.6% overall) in the data. It was noteworthy that c1, c2 and c3 correlated strongly with the substitution state of positions I-2, A-2 and A-6, respectively, and also that the dependencies of variation of 13C chemical shift values on the 3 components varied considerably at the linkage positions [Table 2].

Scheme 1 Schematic of general repeating disaccharide structure of heparan sulfate and modified heparin polysaccharides: [–4) α-L-IdoA 1–4 α-D-glucosamine (1–], where R1 = H or SO3−, R2 = H/COCH3 or SO3− and R3 = H or SO3−. The α-L-IdoA can be replaced by its C-5 epimer, β-D-GlcA.

Table 1 Influence of modifications on 13C NMR chemical shifts at I-3 and I-5

Position    Component 1    Component 2    Component 3
I-3         0.77           0.60            0.01
I-5         0.77           0.50           −0.06
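The decomposition described above (mean-centring followed by extraction of principal components and their loadings) can be illustrated with a short Python sketch. It is a minimal illustration only: it uses a plain singular value decomposition rather than the factor-analysis software used for the paper, the function name `pca_loadings` is hypothetical, and the input matrix is synthetic placeholder data standing in for the 12 polysaccharides, not the measured chemical shifts.

```python
import numpy as np


def pca_loadings(shifts, n_components=3):
    """Mean-centre a (samples x positions) matrix and extract principal components.

    Returns (loadings, scores, explained) where `explained` holds the fraction
    of total variance carried by each retained component.
    """
    X = np.asarray(shifts, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centring of each position
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2
    explained = variance / variance.sum()
    k = n_components
    # Loadings: component directions scaled by the component standard deviation
    loadings = Vt[:k].T * (s[:k] / np.sqrt(X.shape[0] - 1))
    scores = U[:, :k] * s[:k]                    # sample coordinates on each component
    return loadings, scores, explained[:k]


# Synthetic placeholder data (NOT the published shifts): 12 samples x 12 carbon
# positions, built from three on/off "modifications" plus a little noise.
rng = np.random.default_rng(0)
design = rng.integers(0, 2, size=(12, 3))        # hypothetical substitution states
effects = rng.normal(size=(3, 12))               # hypothetical per-position effects
shifts = 70.0 + design @ effects + 0.05 * rng.normal(size=(12, 12))

loadings, scores, explained = pca_loadings(shifts)
print(np.round(explained, 3))
```

With real data, the rows of `loadings` corresponding to A-1, I-4, I-1 and A-4 would play the role of the contributions reported in Table 2, and the correlation of `scores` with the known substitution state would identify which modification each component tracks.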

The relative contributions of each component to the

overall variation in chemical shifts at the glycosidic linkage

positions (A-1, I-4, I-1 and A-4) were extracted from the

dataset comprising all the chemical shift values and are

shown in Fig. 1 and broken down in Table 2 to show the

effect at individual linkage positions. This revealed the relative

extent to which the ability to vary substitution pattern

at a given position could be related to variations in

chemical shift values at the four linkage positions, A-1, I-4,

I-1 and A-4.

2.1 Linkages are influenced differentially by the three

substitution positions

Measures of the variation in 13C chemical shift values at

those positions involved in the glycosidic linkages, A-1, I-4, I-1

and A-4, were extracted from the loading plots, which had

themselves been derived using all the chemical shifts in the

molecule [Table S1, ESI†]. Variation at positions A-1 and I-1

depended heavily on modification at A-2 (component 2; c2 in

Table 2) and modification at I-2 (c1) respectively. Variation

at A-4 depended on both modification at A-6 (c3) and

modification at A-2 (c2), while variation at I-4, on the

other hand, depended on c1 (modification at I-2) and c2

(modification at A-2) [Fig. 1A–C]. Variation at none of the linkage positions depended heavily on the substitution condition at both I-2 and A-6 (c1 and c3), indicating that modifications at I-2 and A-6 created variation at the linkage positions independently of each other. Looked at another way, the linkages either side of glucosamine were influenced primarily by substitution at A-2 and A-6, but not I-2 (i.e. by c2 and c3, but not c1) and those either side of the iduronate residue were influenced by substitution at I-2 and A-2, but not A-6 (i.e. by c1 and c2, but not c3). It is noteworthy that c3 (modification at A-6) had a significant effect only at A-4 and then the effect was moderate, while two substitutions resulted in effects at I-4: either at I-2 or A-2 (c1 and c2).

Table 2 Combinations of components 1, 2 and 3 (representing an ability to modify positions I-2, A-2 and A-6 respectively) and their resulting ‘‘distance’’ (in terms of overall (root sum of squares) changes in chemical shift) away from the unmodified compound (defined as the origin)

Combination of    Contribution    A-1     I-4     I-1     A-4     r.s.s.
components        from            (R1)    (R2)    (R3)    (R4)    ‘‘distance’’ a

c1                c1              0.05    0.52    0.98    0.02    1.11
c2                c2              0.87    0.84    0.02    0.19    1.22
c3                c3              0.04    0.02    0.01    0.51    0.51
c1 + c2           c1              0.05    0.52    0.98    0.02    1.65
                  c2              0.87    0.84    0.02    0.19
c1 + c3           c1              0.05    0.52    0.98    0.02    1.22
                  c3              0.04    0.02    0.01    0.51
c2 + c3           c2              0.87    0.84    0.02    0.19    1.33
                  c3              0.04    0.02    0.01    0.51
c1 + c2 + c3      c1              0.05    0.52    0.98    0.02    1.73
                  c2              0.87    0.84    0.02    0.19
                  c3              0.04    0.02    0.01    0.51
None (origin)     —               0       0       0       0       Origin
r.s.s. variation
at each position b  —             3.83    5.51    3.94    2.88    —

a $[\sum_{i=1}^{4} r_i^2]^{1/2}$. b $[\sum r_i^2]^{1/2}$.

Fig. 1 Contributions (loading barplots) of the three principal

components to variation at each glycosidic linkage position in the

disaccharide repeating unit of modified heparin derivatives: A.

Component 1, B. Component 2, C. Component 3. [blue star highlights

I-1 or I-4, red stars highlight A-1 or A-4]. D. Overall extent of

variation in the linkage positions bestowed by the ability to modify

substituents at I-2, A-2 and A-6 (correlated with c1, c2 and c3

respectively) alone and in combination (root sum of squares) plotted

as the root sum of squares distance (r.s.s. distance) from starting

structure (to scale). The effects of c1, c2 and c3 alone are shown in

blue, red and green, and correlate with the substitution condition at

I-2, A-2 and A-6 respectively.


2.2 Analysis of the overall effects of single and combined

modifications, and at each type of linkage

Modifications at A-2 caused the most significant changes to

linkage positions, followed by those at I-2 and then, to a lesser

extent, by those at A-6. It was also possible to examine the

combined effects of multiple modifications on the variation in

chemical shift values at each linkage position. Summing

the relative contributions (root sum of squares distance:

$(r_1^2 + r_2^2 + r_3^2 + r_4^2)^{1/2}$) over all linkages represents the

overall ‘‘distance’’, a measure of the variation from an

unmodified starting structure (the origin) that can be generated

by having the capability of making particular single or

combined modifications, whose magnitudes are represented

in Fig. 1D.
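The r.s.s. ‘‘distance’’ values can be recomputed directly from the contributions listed in Table 2. The Python sketch below is illustrative: the names `LOADINGS`, `rss_distance` and `all_distances` are hypothetical, and it assumes that the contributions of combined components are added in quadrature over every listed entry, which is the reading that reproduces the tabulated distances.

```python
from itertools import combinations
from math import sqrt

# Contributions at A-1, I-4, I-1 and A-4 (R1..R4), copied from Table 2
LOADINGS = {
    "c1": (0.05, 0.52, 0.98, 0.02),  # modification at I-2
    "c2": (0.87, 0.84, 0.02, 0.19),  # modification at A-2
    "c3": (0.04, 0.02, 0.01, 0.51),  # modification at A-6
}


def rss_distance(components):
    """Root sum of squares of all contributions of the chosen components."""
    return sqrt(sum(v * v for c in components for v in LOADINGS[c]))


def all_distances():
    """Distance from the origin for every single and combined modification."""
    names = sorted(LOADINGS)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result[" + ".join(combo)] = round(rss_distance(combo), 2)
    return result


for label, distance in all_distances().items():
    print(f"{label:<14}{distance:.2f}")
```

Run as written, this gives 1.11, 1.22 and 0.51 for c1, c2 and c3 alone, and 1.65, 1.22, 1.33 and 1.73 for c1 + c2, c1 + c3, c2 + c3 and c1 + c2 + c3, matching the final column of Table 2.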

The degree of variation at each linkage position [i.e. A-1,

I-4, I-1 and A-4] was analysed, in the case of single and then

combined modifications, by summing the contributions at

each linkage position of the relevant components for each

combination of possible modification: at I-2 (c1 alone), at A-2

(c2 alone), at A-6 (c3 alone), then at I-2 and A-2 (c1 + c2) etc.,

with ‘‘none’’ defining the origin, i.e. no substitution [Table 2].

Similarly, the extent to which variation in the 13C chemical shift values for each particular linkage [i.e. A-1…I-4 or I-1…A-4] was bestowed by an ability to alter substituents at single or combined positions could also be determined. At each linkage position, these values were summed [i.e. down the columns in Table 2] to give the overall variation possible for all combinations of modifications at each individual linkage position. I-4 was the most strongly affected position, and greater variation was apparent in the A-1…I-4 linkage (3.83 and 5.51) than in I-1…A-4 (3.94 and 2.88; values taken from the last row of Table 2).

The order of significance (i.e. how far the resulting struc-

tures were from the starting structure) was the following: the

substitution state at A-2 was most significant, then I-2 and the

least significant was the substitution state at A-6. The bio-

synthetic enzymes are thought to act in the same order. This

suggests that the ability to modify the backbone geometry

underlies the relationship between sequence and structure in

HS–heparin. It was also interesting that 6-O-sulfation was not

responsible for widespread structural changes in the backbone;

its most significant effect being a moderate influence on

chemical shift values at A-4. In contrast, the modifications at

A-2 and I-2, which occur earlier in the biosynthetic pathway,

have more significant effects on the backbone structure.

Modification at A-2 clearly has widespread effects on linkage

positions, influencing geometry significantly at A-1, I-4 and

moderately so at A-4, while modification at I-2 primarily

influences I-1 and I-4 [Fig. 1A–C and Table 2].

2.3 Implications for heterogeneous sequences

All of the examples above were derived from essentially

homogeneous polysaccharides. The implications for hetero-

geneous sequences were next examined in a sample, in which

randomly distributed chemical modifications involving partial

de-O-sulfation at A-6 and I-2 generated statistical variability

in the structure,28,29 with only A-2 remaining completely

N-acetylated. The principal signals at the anomeric positions

I-1 and A-1 were recorded for combinations of nearest

neighbours, represented in Table 3.

A-1 is insensitive to I-2 modification and I-1 is insensitive to

A-6 modification in their adjacent residues, agreeing with the

findings for the homogeneously modified polysaccharides

(Section 2.2). In addition, despite a mixed population of

adjacent residues, no signals lying outside those values

observed in the homogeneous polysaccharides were evident.

This suggests that the limits of the extent of linkage variation

in heterogeneous sequences are formed by those observed for

the homogeneous polysaccharides.

2.4 Evaluating the effects of substitution patterns at the

glycosidic linkages

The analyses reported here are for sodium salts, in which there is no possibility of inter- or intra-residue bridging by multivalent cations.30,31 Inter-residue hydrogen bonding does exist for some derivatives, however, and has been identified previously in these compounds.27,32 For the present case, if the total number of possible ways of influencing the linkage geometry is considered, and taking values from Table 2, where any significant effect at A-1, I-4, I-1 and A-4 is represented as 1 and no effect as 0 (values ≤0.05 are set to 0 and >0.05 to 1), the relationship can be simplified [Table 4] to reveal the presence (1) or absence (0) of an effect. Summing the contributions made by the components in the same combinations as above [Table 2] (but using binary addition, akin to Boolean OR logic [disjunction, x ∨ y], so that one or more effects are registered in the same way, indicated by 1: i.e. 0 + 0 = 0, 1 + 0 = 1, 0 + 1 = 1, 1 + 1 = 1) simplifies the table and permits the existence of change at each glycosidic linkage to be recorded. The possible changes at the four linkage positions, A-1, I-4, I-1 and A-4, were simplified to two linkages, L1 and L2 [Table 4], shown in the right hand columns; L1 representing linkage A-1…I-4 and L2 representing I-1…A-4.

Table 3 13C chemical shifts at linkage positions A-1 and I-1 in homogeneous and heterogeneous sequences in a partially modified heparin polysaccharide. In both instances, the middle residue (labelled B) is shown with the adjacent residues either side at A and C. In homogeneous sequences, the residues at A and C are identical, while in heterogeneous sequences, they are different. The chemical shifts of A-1 (a) and I-1 (b) do not differ significantly in cases when (at A and C) adjacent residues differ compared to those of the homogeneous cases (when A = C). This indicates that the effects of the neighbouring groups (towards the non-reducing end direction) are not great on the linkage positions and no unexpected shifts in the residue at B occur

(a) I2X–A6XNAc(A-1)–I2X: A-1 of IdoA–GlcN–IdoA

A           B             C            A-1 δ 13C (ppm)    Δ (ppm)
IdoA2S      –GlcNAc       –IdoA2S      96.8               0.0
IdoA                                   96.8
IdoA        –GlcNAc       –IdoA        97.1               0.4
IdoA2S                                 97.5
IdoA2S      –GlcNAc6S     –IdoA2S      96.6               0.1
IdoA                                   96.7
IdoA2S      –GlcNAc6S     –IdoA        N/A                N/A
IdoA                                   97.1
ΔGlcNAc ± 6S                                              0.3

(b) A6XNAc–I2X(I-1)–A6XNAc: I-1 of GlcN–IdoA–GlcN

A           B             C            I-1 δ 13C (ppm)    Δ (ppm)
GlcNAc      –IdoA         –GlcNAc      104.3              0.3
GlcNAc6S                               104.6
GlcNAc      –IdoA         –GlcNAc6S    104.6              0.2
GlcNAc6S                               104.8
GlcNAc      –IdoA2S       –GlcNAc      102.3              0.1
GlcNAc6S                               102.2
GlcNAc      –IdoA2S       –GlcNAc6S    102.4              0.2
GlcNAc6S                               102.2
ΔIdoA/IdoA2S                                              0.1

X denotes the possible substitutions: IdoA2S/IdoA2OH and GlcNAc6S/GlcNAc6OH. The error in the reported 13C chemical shift values is ± 0.1 ppm.

This shows that certain combinations of substitutions

altered the linkages, L1 and L2, in distinct ways, but also that

there was considerable redundancy, with six unique ways of

simultaneously modifying both glycosidic linkages. Notably,

the 8 combinations failed to generate all 4 possible patterns of

variation in linkage positions (i.e. 0 0, 1 0, 0 1 and 1 1); note

that c2 and c2 + c3 and also c1 + c2 and c1 + c2 + c3 gave

the same result; 1101 and 1111 (= 1 1), respectively, but no

combination of substitutions was able to achieve the pattern

(1 0) i.e., to influence L1 (A-1–I-4) alone.
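The binary simplification in this section can be reproduced with a few lines of Python. The sketch is illustrative and rests on two stated assumptions: the contributions are those listed in Table 2, with values at or below 0.05 treated as no effect, and a linkage (L1 or L2) is counted as affected when either of its two carbon positions is affected. The second rule is not spelled out in the text; it is inferred from the way 1101 and 1111 are both reduced to (1 1). All function names are hypothetical.

```python
from itertools import combinations

THRESHOLD = 0.05  # contributions <= 0.05 are treated as "no effect"

# Contributions at A-1, I-4, I-1 and A-4, copied from Table 2
LOADINGS = {
    "c1": (0.05, 0.52, 0.98, 0.02),  # modification at I-2
    "c2": (0.87, 0.84, 0.02, 0.19),  # modification at A-2
    "c3": (0.04, 0.02, 0.01, 0.51),  # modification at A-6
}


def binarise(values, threshold=THRESHOLD):
    """Map each contribution to 1 (effect) or 0 (no effect)."""
    return tuple(int(v > threshold) for v in values)


def combine(components):
    """Boolean OR of the binarised contributions of the chosen components."""
    bits = [binarise(LOADINGS[c]) for c in components]
    if not bits:
        return (0, 0, 0, 0)          # unmodified origin
    return tuple(int(any(column)) for column in zip(*bits))


def linkage_pattern(bits):
    """Reduce (A-1, I-4, I-1, A-4) to (L1, L2); assumed rule: OR within a linkage."""
    a1, i4, i1, a4 = bits
    return (int(a1 or i4), int(i1 or a4))


def all_patterns():
    """(L1, L2) pattern for the origin and every combination of components."""
    names = sorted(LOADINGS)
    patterns = {"none": linkage_pattern(combine(()))}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            patterns[" + ".join(combo)] = linkage_pattern(combine(combo))
    return patterns


for label, pattern in all_patterns().items():
    print(f"{label:<14}{pattern}")
```

Under these assumptions the eight combinations yield only the patterns (0 0), (0 1) and (1 1): six combinations give (1 1), c3 alone gives (0 1), and no combination gives (1 0).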

2.5 Consequences of the limited influence of substitution

pattern on linkage variability in all possible sequences

with 2 to 16 residues

Following the result in Section 2.4, all possible combinations

of pairwise sequences from 2 to 16 residues were modelled,

calculating for each sequence combination the overall

variation in linkage geometry [as a ‘‘distance’’ from the origin

i.e., from an unmodified structure of the same length] which

was plotted against the number of sulfate groups it contained

[Fig. 2A]. It is important to note that these models represent

segments of this length as if they were in a polysaccharide

chain, not as free oligosaccharides. The results were also

arrayed as plots of frequency, or density, (i.e. the number of

structures with a given ‘‘distance’’) [Fig. 2B] and the range of

variability for each level of sulfation was determined [Fig. 2C].

Shorter sequences had a rather uneven ‘‘distance’’ distribution

(left-skewed), but this became smoother with increasing chain

length [Fig. 2B]. Furthermore, it was interesting to note

that the widest spread of structural variation (i.e. spread of

‘‘distance’’) occurred at moderate sulfation levels and always

before the maximum sequence diversity [Fig. 2C]. This was a

consequence of the modest influence of the modification at A-6

on the linkage geometry and shows that considerable linkage

variability can already be attained with moderate sulfation

levels and only 2 types of substitution: at A-2 and I-2. The

analysis also illustrated the huge degeneracy present in the

system. For longer oligosaccharides, the areas in the centre of the
plots were occupied by large numbers of structures [Fig. 2A]; for
16-residue stretches, there are ~10^6 sequences with 12 sulfates that
lie within ±1% of the average r.s.s. distance. It is also interesting
to note that the two linkages (A-1...I-4 and I-1...A-4) have different
levels of variation, with that at A-1...I-4 being greater than that at
I-1...A-4 [ESI, Fig. S1]. These results imply that only a modest
subset of all possible structures needs to be synthesised to generate
substantial variation in linkage geometry. The variability is 'uneven'
within the structure but the system is, in terms of overall linkage
variation, highly degenerate.
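The kind of enumeration behind Fig. 2C can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the numerical loadings are hypothetical placeholders (chosen only so that A-6 has a small effect, as reported), contributions are assumed to add linearly, and each disaccharide is treated independently of its neighbours.

```python
from itertools import product
from math import sqrt

# HYPOTHETICAL placeholder loadings, NOT measured values: assumed change at the
# linkage positions (A-1, I-4, I-1, A-4) when a sulfate is present at each site.
LOADINGS = {
    "A-2": (1.0, 0.9, 0.0, 0.7),
    "I-2": (0.0, 1.1, 0.8, 0.0),
    "A-6": (0.0, 0.0, 0.0, 0.1),
}
SITES = tuple(LOADINGS)

# The 8 substitution patterns of one disaccharide (sulfate present or absent per site).
DISACCHARIDES = list(product((0, 1), repeat=len(SITES)))

def rss_distance(sequence):
    """Root sum of squares of the linkage-position changes over a whole sequence."""
    total = 0.0
    for pattern in sequence:
        for pos in range(4):
            shift = sum(on * LOADINGS[site][pos] for on, site in zip(pattern, SITES))
            total += shift * shift
    return sqrt(total)

def spread_by_sulfation(n_disaccharides):
    """Return the sequence count and Dmax - Dmin for each number of sulfates."""
    by_count = {}
    n_sequences = 0
    for sequence in product(DISACCHARIDES, repeat=n_disaccharides):
        n_sequences += 1
        n_sulfates = sum(sum(pattern) for pattern in sequence)
        by_count.setdefault(n_sulfates, []).append(rss_distance(sequence))
    spread = {k: max(v) - min(v) for k, v in sorted(by_count.items())}
    return n_sequences, spread

n_sequences, spread = spread_by_sulfation(3)  # hexasaccharides
print(n_sequences)
for n_sulfates, d_range in spread.items():
    print(n_sulfates, round(d_range, 3))
```

With three disaccharides the loop visits 8^3 = 512 sequences. The spread is necessarily zero at the two extremes (no sulfates and full sulfation), since each corresponds to a single sequence, so any non-zero spread must occur at intermediate sulfation levels.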

3. Experimental

The approach presented here aims to deduce the location

and severity of changes in environment at positions in the

repeating structure by analysis of the 13C NMR chemical shift

values for a series of modified heparin polysaccharides, acting

as models of HS. Experimental 13C NMR chemical shift
values were measured for a library of chemically modified
heparin polysaccharides and, after mean-centring, analysed by
factor analysis with factors extracted through principal
components. The loadings derived from the analyses report the
effective change in chemical shift when a modification was made
at specific positions in the molecules; the modifications causing
the change could be identified from the component regression scores.
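As an illustration of the mean-centring and principal-component extraction step, the following minimal sketch finds the first component of a small data matrix by power iteration. It is not the SPSS procedure used here: the three-spectrum matrix is invented toy data, and power iteration is swapped in as a simple stand-in for a full eigen-decomposition.

```python
from math import sqrt

def mean_centre(rows):
    """Subtract the column mean from every entry (rows = samples, columns = positions)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centred):
    """Sample covariance matrix of already mean-centred data."""
    n = len(centred)
    p = len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Leading eigenvector (loadings) and eigenvalue by power iteration.

    The all-ones start vector is an arbitrary choice; it would fail only if it
    were exactly orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue

# TOY DATA (invented for illustration): 3 samples x 3 carbon positions, in ppm.
SHIFTS = [
    [100.0, 70.0, 60.0],
    [101.0, 72.0, 60.0],
    [102.0, 74.0, 60.0],
]

centred = mean_centre(SHIFTS)
loadings, eigenvalue = first_component(covariance(centred))
scores = [sum(x * l for x, l in zip(row, loadings)) for row in centred]
print(loadings, eigenvalue, scores)
```

In this toy case the second position moves twice as far as the first and the third does not move at all, so the loadings single out which positions respond, while the scores order the samples along that change.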

A combinatorial approach was taken to model a number of

different length sequences by combining the loadings derived

from the previous analysis of the experimental 13C NMR

chemical shift data. The overall effect on the linkages was

investigated by calculating the root sum squared contributions

from all linkage positions for all possible disaccharides
[8 possibilities] to hexadecasaccharides [16 777 216 possibilities].
This reports the overall effect on the chemical shifts in
the molecule due to the particular modification. For the
combinatorial modelling of all possible (even) sequences from
2–16 residues, the overall (i.e. combined for all linkages in each
sequence) root sum of squares variation from the origin (using
combinations of principal components) was calculated for
each possible sequence. These values were plotted [Fig. 2] as:
A. root sum of squares distances from the origin, B. kernel
density, i.e. number of sequences, with particular distances
from the origin and C. as variation in ‘‘distance from the
origin’’ (difference between maximum and minimum ‘‘distance’’,
Dmax − Dmin) for populations of sequences with particular
numbers of sulfate groups representing the overall range of
structural variation within that population. The same procedure
was also conducted for each type of linkage, either I-A or A-I,
and the results are shown in Fig. S1, ESI†. The root sum

squares were calculated for variation in 13C chemical shift

values for the specific linkage positions along the sequence, to

assess the contributions to linkage variation in both forward

Table 4 The effect of combinations of components (representing the ability to modify substituents at A-2, I-2 and A-6) at the four positions involved in the glycosidic linkage and in a reduced form addressing the presence (1) or absence (0) of changes in the glycosidic linkages L1 (A-1...I-4) and L2 (I-1...A-4). Single or multiple effects are recorded as 1, no effect as 0

                 Linkage positions          Linkages
Combination      A-1   I-4   I-1   A-4      L1   L2
None              0     0     0     0        0    0
c1                0     1     1     0        1    1
c2                1     1     0     1        1    1
c3                0     0     0     1        0    1
c1 + c2           1     1     1     1        1    1
c1 + c3           0     1     1     1        1    1
c2 + c3           1     1     0     1        1    1
c1 + c2 + c3      1     1     1     1        1    1

Mol. BioSyst., 2010, 6, 902–908. This journal is © The Royal Society of Chemistry 2010

and reverse directions. These were given equal weight and

averaged along the chain to report the overall linkage

variation. This was calculated for all possible sequences of

4 [64 possibilities] and 6 sugar residues [512 possibilities]

[Fig. S2, ESI†].

Factor analysis using factors extracted by principal
components was performed using SPSS [SPSS UK Ltd.,
Woking, United Kingdom]. All other analysis and data

manipulations were performed using Microsoft Excel (Office

2007) [data manipulation], SPSS [distance correlation matrix]

and R [R Development Core Team (2009). R: A language and

environment for statistical computing, R Foundation for

Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0,

URL http://www.R-project.org] [histograms, large dataset

plotting, kernel density plot]. Other figures were produced

using ChemDraw Ultra 11 [CambridgeSoft, Cambridge, UK]

and SigmaPlot 11 [Systat Software UK Limited,

Hounslow, UK].

4. Conclusion

The variation in linkage geometry of heparin and HS saccharides

is dependent on substitution pattern and this provides a subtle
link between sulfation pattern and the observed biological
activities. The conformational consequences are distinct at the
two linkages [A-1...I-4 and I-1...A-4] and at each type
of glycosidic linkage position [A-1, I-4, I-1 and A-4]. The
modifications which occur earlier in the biosynthetic pathway
(at A-2 and I-2) have the greatest effect on the linkages, while
6-O-sulfation has the least apparent influence on backbone and
overall conformation.

The effect on linkage variation in short sequences arising

from an ability to modify I-2, A-2 and/or A-6 with sulfate

groups, or not, is predicted to result in uneven linkage

variation. The overall effect of this on the linkage ‘‘distance’’

becomes obscured for longer stretches by virtue of the vast

number of possible sequence combinations and their huge

apparent degeneracy, although when structural variety at each
linkage position along the chains is considered, the degeneracy
is reduced (<20%) [Fig. S5, ESI†] but is still very significant
[~3 × 10^6 possible sequences for hexadecasaccharides].

Given the absence of strict sequence specificity for most

HS–protein interactions, this argues for a considerable degree

of conformational degeneracy in linkages as a function of

substitution pattern and implies either that only a relatively

small subset of structures may be required in nature to satisfy

the various geometric requirements for HS–protein binding, or

that there may be many alternative sequences, any of which

will suffice in a given situation. In other words, the unique

information content potentially available in HS sequences,

defined ultimately by conformation, is vastly inferior to the

potential sequence diversity. The results also suggest that the

definition of specificity for these compounds needs to be stated

carefully and should include reference to both sequence and
overall chain and linkage conformation. The level of

degeneracy apparently present within the system implies

that several HS sequences should satisfy particular binding

requirements and has implications for the design of GAG

mimetics. It should be emphasised that, for valid structure–

function conclusions to be drawn employing an approach

based on screening libraries for activities, a sufficiently wide

range of structures (to cover geometric variety sufficiently, not

simply to occupy degenerate sequences) first needs to be

screened.29 Those structure–activity relationships which are

derived from experiments employing, for example, structurally

limited naturally occurring HS from a particular source, may

not contain sufficient geometric diversity to allow rigorous

conclusions to be drawn concerning issues of structural variety

and specificity more widely. HS domains should be viewed in

terms of their conformational characteristics and not simply

the type of sulfation pattern.

Fig. 2 Combinatorial modelling of overall linkage variation for all possible oligosaccharides (pairwise from 2 to 16 residues): A. Combined linkage ‘r.s.s. distances’ against number of sulfates for the linkages A-1, I-4 and I-1, A-4 combined. B. Kernel density plots of the linkage distance distributions in A. C. Total variation (maximum r.s.s. distance, Dmax − minimum r.s.s. distance, Dmin) of the combined linkages at each level of sulfation for all possible sequences.

An important observation is that high backbone diversity

was attainable even with low levels of sulfation, providing the

possibility of many structurally suitable molecular scaffolds

(defined by backbone geometry) for direct interactions with

single proteins, but this also allows scope for differentiation

during the formation of higher complexes (e.g. FGF/FGFR)

through additional modifications.

The findings presented here may also be of use for projected

structure–function studies; one of the limiting factors has

always been the huge number of possible structures which

need to be considered. This analysis suggests the beginning of

a rationale for the selection of possible synthetic target

structures.

Acknowledgements

The authors gratefully acknowledge funding from The

Wellcome Trust, The Royal Society and BBSRC. Prof. D. G.

Fernig is thanked for useful discussions.

Notes and references

1 A. Ori, M. C. Wilkinson and D. G. Fernig, Front. Biosci., 2008, 4309–4338.
2 Z. Scholefield, E. A. Yates, G. Wayne, A. Amour, W. McDowell and J. E. Turnbull, J. Cell Biol., 2003, 163, 97–107.
3 J. van Horssen, P. Wesseling, L. P. van den Heuvel, R. M. de Waal and M. M. Verbeek, Lancet Neurol., 2003, 2, 482–492.
4 B. L. Allen and A. C. Rapraeger, J. Cell Biol., 2003, 163, 637–648.
5 M. Mohammadi, S. K. Olsen and O. A. Ibrahimi, Cytokine Growth Factor Rev., 2005, 16, 107–137.
6 L. Pellegrini, Curr. Opin. Struct. Biol., 2001, 11, 629–634.
7 Z. L. Wu, L. Zhang, T. Yabe, B. Kuberan, D. L. Beeler, A. Love and R. D. Rosenberg, J. Biol. Chem., 2003, 278, 17121–17129.
8 J. A. Davies, E. A. Yates and J. E. Turnbull, Growth Factors, 2003, 21, 109–119.
9 S. M. Rickard, R. S. Mummery, B. Mulloy and C. C. Rider, Glycobiology, 2003, 13, 419–426.
10 K. Itoh and S. Y. Sokol, Development, 1994, 120, 2703–2711.
11 R. Copeland, A. Balasubramaniam, V. Tiwari, F. Zhang, A. Bridges, R. J. Linhardt, D. Shukla and J. Liu, Biochemistry, 2008, 47, 5774–5783.
12 D. Pinna, P. Oreste, T. Coradin, A. Kajaste-Rudnitski, S. Ghezzi, G. Zoppetti, A. Rotola, R. Argnani, G. Poli, R. Manservigi and E. Vicenzi, Antimicrob. Agents Chemother., 2008, 52, 3078–3084.
13 M. Moulard, H. Lortat-Jacob, I. Mondor, G. Roca, R. Wyatt, J. Sodroski, L. Zhao, W. Olson, P. D. Kwong and Q. J. Sattentau, J. Virol., 2000, 74, 1948–1960.
14 M. Tyagi, M. Rusnati, M. Presta and M. Giacca, J. Biol. Chem., 2001, 276, 3254–3261.
15 R. R. Vives, A. Imberty, Q. J. Sattentau and H. Lortat-Jacob, J. Biol. Chem., 2005, 280, 21353–21357.
16 E. L. G. M. Tonnaer, T. G. Hafmans, T. H. Van Kuppevelt, E. A. M. Sanders, P. E. Verweij and J. H. A. J. Curfs, Microbes Infect., 2006, 8, 316–322.
17 M. Petitou, Nouv. Rev. Fr. Hematol., 1984, 26, 221–226.
18 J. Kreuger, D. Spillmann, J. P. Li and U. Lindahl, J. Cell Biol., 2006, 174, 323–327.
19 J. Angulo, M. Hricovini, M. Gairi, M. Guerrini, J. L. de Paz, R. Ojeda, M. Martin-Lomas and P. M. Nieto, Glycobiology, 2005, 15, 1008–1015.
20 W. L. Chuang, M. D. Christ and D. L. Rabenstein, Anal. Chem., 2001, 73, 2310–2316.
21 R. Lucas, J. Angulo, P. M. Nieto and M. Martin-Lomas, Org. Biomol. Chem., 2003, 1, 2253–2266.
22 D. Mikhailov, K. H. Mayo, I. R. Vlahov, T. Toida, A. Pervin and R. J. Linhardt, Biochem. J., 1996, 318(Pt 1), 93–102.
23 K. Sugahara, R. Tohno-oka, S. Yamada, K. H. Khoo, H. R. Morris and A. Dell, Glycobiology, 1994, 4, 535–544.
24 G. Torri, B. Casu, G. Gatti, M. Petitou, J. Choay, J. C. Jacquinet and P. Sinay, Biochem. Biophys. Res. Commun., 1985, 128, 134–140.
25 S. Yamada, Y. Yamane, H. Tsuda, K. Yoshida and K. Sugahara, J. Biol. Chem., 1998, 273, 1863–1871.
26 B. Mulloy and M. J. Forster, Glycobiology, 2000, 10, 1147–1156.
27 E. A. Yates, F. Santini, B. De Cristofano, N. Payre, C. Cosentino, M. Guerrini, A. Naggi, G. Torri and M. Hricovini, Carbohydr. Res., 2000, 329, 239–247.
28 E. A. Yates, F. Santini, M. Guerrini, A. Naggi, G. Torri and B. Casu, Carbohydr. Res., 1996, 294, 15–27.
29 E. A. Yates, S. E. Guimond and J. E. Turnbull, J. Med. Chem., 2004, 47, 277–280.
30 S. E. Guimond, T. R. Rudd, M. A. Skidmore, A. Ori, D. Gaudesi, C. Cosentino, M. Guerrini, R. Edge, D. Collison, E. J. McInnes, G. Torri, J. E. Turnbull, D. G. Fernig and E. A. Yates, Biochemistry, 2009, 48, 4772–4779.
31 T. R. Rudd, S. E. Guimond, M. A. Skidmore, L. Duchesne, M. Guerrini, G. Torri, C. Cosentino, A. Brown, D. T. Clarke, J. E. Turnbull, D. G. Fernig and E. A. Yates, Glycobiology, 2007, 17, 983–993.
32 T. R. Rudd, M. A. Skidmore, S. E. Guimond, C. Cosentino, G. Torri, D. G. Fernig, R. M. Lauder, M. Guerrini and E. A. Yates, Glycobiology, 2009, 19, 52–67.
