knowledge-based chemical fragment analysis in protein binding sites

The Wolfson Institute for Biomedical Research, The Cruciform Building, UCL. Knowledge-based Chemical Fragment Analysis in Protein Binding Sites. Edith Chan, Roman Laskowski, David Selwood. University College London, UK. 19 June 2014. [email protected] www.ucl.ac.uk/wibr/research/drug-discovery/edith-chan

Upload: cresset

Post on 02-Jun-2015

TRANSCRIPT

Page 1: Knowledge-based chemical fragment analysis in protein binding sites

The Wolfson Institute for Biomedical Research, The Cruciform Building, UCL

Knowledge-based Chemical Fragment Analysis in Protein Binding Sites

Edith Chan

Roman Laskowski, David Selwood

University College London, UK

19 June 2014

[email protected]

www.ucl.ac.uk/wibr/research/drug-discovery/edith-chan

Page 2: Knowledge-based chemical fragment analysis in protein binding sites

A medicinal chemist’s problem

vFLIP – IKKgamma, 3cl3.pdb

A common question from medicinal chemists: what compounds should be made, given a protein target?

Page 3: Knowledge-based chemical fragment analysis in protein binding sites

A medicinal chemist’s problem

vFLIP – IKKgamma, 3cl3.pdb

• A common question from medicinal chemists: what compounds should be made, given a protein target?

• Chemists are very good at generating new ideas when they see other structures.

Page 4: Knowledge-based chemical fragment analysis in protein binding sites

How can a medicinal chemist decide on a chemical strategy?

– Screening to find leads

– Natural ligand

– Virtual screening (hit rate improvement of 10x)

– SAR studies

– Fragment-based approaches

What medicinal chemists need

– A simple way to select likely binding molecules for a protein binding site

Page 5: Knowledge-based chemical fragment analysis in protein binding sites

Fragment-based drug discovery

• Fragment-based drug discovery (FBDD) has been an established and successful paradigm for the past 10 years.

• Small chemical structures are screened to probe the binding site, and the hits are then used to identify larger molecules that bind.

• Most platforms are laboratory based, such as X-ray crystallography and NMR.

[Figure: fragments probing a target site]

Page 6: Knowledge-based chemical fragment analysis in protein binding sites

Design of fragments: current status

• Physicochemical filtering: Ro3 (Ro2½), small, soluble

• Screen out reactive groups

• Chemical handles for further manipulation

• Most groups use 500–2000 compounds; some use 10,000 or more

• A good binding affinity: ligand efficiency is often used to express the binding affinity of fragments

– Typical good LE is > 0.2

$$\mathrm{LE} = \frac{-\Delta G}{\mathrm{HAC}} = \frac{-RT \ln(K_d)}{\mathrm{HAC}}$$

LE = ligand efficiency, HAC = heavy atom count
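The ligand efficiency formula above can be evaluated directly. The following is a minimal sketch, assuming Kd is given in mol/L, a temperature of 298.15 K, and the gas constant in kcal/(mol·K); the function name and the example fragment (Kd = 1 mM, 12 heavy atoms) are illustrative and do not come from the slides.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """Return LE = -RT ln(Kd) / HAC in kcal/mol per heavy atom."""
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("Kd and heavy atom count must be positive")
    delta_g = R_KCAL * temp_k * math.log(kd_molar)  # binding free energy, negative for Kd < 1 M
    return -delta_g / heavy_atoms


# Hypothetical fragment: Kd = 1 mM, 12 heavy atoms
le = ligand_efficiency(1e-3, 12)
print(f"LE = {le:.2f} kcal/mol per heavy atom")
```

Under these assumptions a weak (millimolar) binder with only 12 heavy atoms still scores well, which is why LE rather than raw affinity is used to compare fragments.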

Page 7: Knowledge-based chemical fragment analysis in protein binding sites

Common computational approaches

• Fragment/property centered

– Analyzing drug-related databases for molecular frameworks, properties, diversity, privileged scaffolds, etc.

[Figure: drug databases (DrugBank, WDI) analysed by MW, HA, HD, PSA, clogP, diversity, and Rule of 3]
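A property-centered filter of this kind can be sketched in a few lines. This is only an illustration: the thresholds are the commonly quoted Rule of 3 values (MW < 300, at most 3 hydrogen bond donors, at most 3 acceptors, clogP ≤ 3), which the slides name but do not spell out, and the `Fragment` class and descriptor values are hypothetical and assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    mw: float     # molecular weight (Da)
    hbd: int      # hydrogen bond donors (HD)
    hba: int      # hydrogen bond acceptors (HA)
    clogp: float  # calculated logP


def passes_ro3(f: Fragment) -> bool:
    """Commonly quoted Rule of 3 thresholds (an assumption, not taken from the slides)."""
    return f.mw < 300 and f.hbd <= 3 and f.hba <= 3 and f.clogp <= 3


# Hypothetical candidates with made-up descriptor values
candidates = [
    Fragment("frag_a", 121.1, 1, 2, 0.9),
    Fragment("frag_b", 342.4, 2, 5, 3.8),
]
kept = [f.name for f in candidates if passes_ro3(f)]
print(kept)
```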

Page 8: Knowledge-based chemical fragment analysis in protein binding sites

Common computational approaches

• 3D fragments, experiment centered

– Most current fragments are based around aromatic or heteroaromatic structures

– Use diversity-oriented synthesis to design new fragment-based libraries

Page 9: Knowledge-based chemical fragment analysis in protein binding sites

Common computational approaches

• Binding site centered

• FragFEATURE: predicts fragments / functional groups based on the structural environment of binding sites.

• LigFrag-RPM: preferences of small fragments / functional groups and their amino acid environment.

• These would be too complicated for chemists to understand.

Tang et al. PLOS Computational Biology 2014, 10, e1003589

Wang et al. Chemical Information & Modeling 2011, 51, 807-815

Page 10: Knowledge-based chemical fragment analysis in protein binding sites

Target centred

• Given a new or unknown target, what kind of scaffolds or fragments should chemists try?

• Using X-ray structures to define fragments

• A cheminformatics database identifying the chemical motifs or fragments that preferentially interact with particular protein side chains in the binding site.

Page 11: Knowledge-based chemical fragment analysis in protein binding sites

Our approach

• Use the PDB as a source of ligand–protein information, deriving the interacting “fragment”

• High quality information

Definition of fragment

• The largest ring assembly containing the atoms involved in hydrogen bond(s) to one of the side chains

• Looked at Asp, Glu, Arg and His as the most common amino acids involved in H-bonding interactions

Page 12: Knowledge-based chemical fragment analysis in protein binding sites

PDB

• The Protein Data Bank (PDB) has over 100K structures, about 77K of which have a co-crystallized ligand.

• PDB 3D co-crystallized structures provide protein-ligand interaction information.

• Analyses of specific ligand/side chain interactions have tended to focus on single ligand atoms rather than chemical fragments.

Page 13: Knowledge-based chemical fragment analysis in protein binding sites

Protein-ligand interactions – 3D

[Figure: hydrogen bonds in a 3D binding site]

Interactions in 3D: H-bonds, hydrophobic, van der Waals, etc.

Page 14: Knowledge-based chemical fragment analysis in protein binding sites

Protein-ligand interactions – 2D

[Figure: LigPlot 2D diagram of the protein-ligand interactions]

Page 15: Knowledge-based chemical fragment analysis in protein binding sites

Fragment motif definition & extraction

[Figure: ligand structure with the interacting motif highlighted]

• The interacting motif is the largest ring assembly containing the atoms involved in the hydrogen bond(s).

• Other substituents on the ring assembly are removed if they are not involved in the hydrogen bond to the relevant protein side chain.
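The motif definition above can be illustrated on a toy molecular graph. This is a rough sketch under stated assumptions, not the authors' implementation: ring assemblies are assumed to be precomputed as sets of atom indices, a substituent is simplified to a single atom bonded directly to the ring, and the function name, inputs, and example molecule are all hypothetical.

```python
def extract_motif(ring_assemblies, neighbors, hbond_atoms):
    """Pick the largest ring assembly that accounts for every H-bonding atom.

    ring_assemblies: list of sets of atom indices (assumed precomputed)
    neighbors: dict mapping atom index -> list of bonded atom indices
    hbond_atoms: set of ligand atoms that H-bond to the side chain
    Returns the motif's atom indices, or None if no assembly qualifies.
    """
    best = None
    for ring in ring_assemblies:
        # Atoms bonded directly to the ring assembly but not part of it
        substituents = {n for a in ring for n in neighbors[a]} - ring
        kept = substituents & hbond_atoms  # keep only H-bonding substituents
        motif = ring | kept
        if not hbond_atoms or not hbond_atoms <= motif:
            continue  # this assembly does not contain all H-bonding atoms
        if best is None or len(ring) > len(best[0]):
            best = (ring, motif)
    return None if best is None else best[1]


# Toy example: a six-membered ring (atoms 0-5, atom 0 a ring N) with an amino N
# (atom 6) on atom 1 and a methyl C (atom 7) on atom 4; atoms 0 and 6 H-bond.
neighbors = {0: [1, 5], 1: [0, 2, 6], 2: [1, 3], 3: [2, 4],
             4: [3, 5, 7], 5: [4, 0], 6: [1], 7: [4]}
motif = extract_motif([{0, 1, 2, 3, 4, 5}], neighbors, {0, 6})
print(sorted(motif))
```

In this toy case the amino nitrogen is kept because it takes part in the hydrogen bond, while the methyl group is stripped.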

Page 16: Knowledge-based chemical fragment analysis in protein binding sites

Amino acids studied

• Acidic and negatively charged: aspartic acid (Asp), glutamic acid (Glu)

• Basic and positively charged: arginine (Arg)

• Basic and polar: histidine (His)

[Figure: side chain structures of Asp, Glu, Arg, and His]

Asp, Glu, Arg, and His – important in binding sites

• Account for 55% of all catalytic residues in enzyme active sites

• Most frequently in contact with ligand in 50 diverse protein binding sites

Villar, FEBS Lett 1994 349 125

Bartlett, J Mol Biol 2002 324 105

Page 17: Knowledge-based chemical fragment analysis in protein binding sites

Ligands

Filter Criteria Used to Retrieve Relevant Ligands from the PDB

1. 100 < molecular weight (Da) < 800

2. Number of atoms > 5

3. Only atoms H, C, N, O, P, S, F, Cl, Br, I

4. Number of oxygen and nitrogen atoms < 16

5. Number of hydrogen bond donor atoms < 8

6. Number of rotatable bonds < 16

7. Not a metal ion or inorganic compound, such as AlF3

8. Not a common solvent used in X-ray, such as GOL (Glycerol), EDO (1,2-ethanediol), TRS (2-amino-2-hydroxymethyl-propane-1,3-diol)

9. Not an impurity or unknown, such as UNX, UNK, UNL, ARG, O, C, N

10. Not a negative ion, such as NO3 (nitrate), SO4 (sulfate)

11. Not an amino acid

12. Not a common sugar or lipid

13. Not a cofactor, such as ATP, ADP, SAM, FAM

Chan, et al. J. Med. Chem. 2010, 53, 3086–3094
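The numeric criteria in the list above translate naturally into a predicate. The sketch below is illustrative only: the `Ligand` record and its precomputed fields are hypothetical, the exclusion set contains only the residue codes named on the slide, and criteria that need curated lists (amino acids, sugars, lipids, the full set of cofactors) are not implemented.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = {"H", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}
# Only the codes named on the slide; a real run would need much longer lists.
EXCLUDED_CODES = {"GOL", "EDO", "TRS", "UNX", "UNK", "UNL", "ARG", "O", "C", "N",
                  "NO3", "SO4", "ATP", "ADP", "SAM", "FAM"}


@dataclass
class Ligand:
    code: str            # PDB residue code
    mol_weight: float    # Da
    n_atoms: int
    elements: frozenset  # element symbols present in the ligand
    n_o_and_n: int       # number of oxygen plus nitrogen atoms
    n_h_donors: int
    n_rot_bonds: int


def is_relevant(lig: Ligand) -> bool:
    """Apply criteria 1-6 plus the named exclusion codes."""
    return (lig.code not in EXCLUDED_CODES
            and 100 < lig.mol_weight < 800
            and lig.n_atoms > 5
            and lig.elements <= ALLOWED_ELEMENTS
            and lig.n_o_and_n < 16
            and lig.n_h_donors < 8
            and lig.n_rot_bonds < 16)


# Hypothetical drug-like ligand and glycerol (descriptor values are illustrative)
drug_like = Ligand("XYZ", 350.4, 25, frozenset({"C", "H", "N", "O"}), 6, 2, 5)
glycerol = Ligand("GOL", 92.1, 6, frozenset({"C", "H", "O"}), 3, 3, 2)
print(is_relevant(drug_like), is_relevant(glycerol))
```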

Page 18: Knowledge-based chemical fragment analysis in protein binding sites

Data set from PDB

• The data set for this study was compiled in April 2009.

• Approximately 26,000 structures.

• H-bonds between ligands and specific protein side chains were extracted from the data files in PDBsum.

• PDBsum uses the HBPLUS program to calculate potential hydrogen bonds and non-bonded contacts.

• The 3D coordinates of the ligand and interacting side chain of interest were then extracted from the parent PDB file and translated into MDL SD format for further processing.

[Workflow figure: PDB → protein-ligand complexes (X-ray only; R < 2.2 Å) → H-bond with D, E, R or H? → protein-ligand interaction (H-bond) → fragment]

Page 19: Knowledge-based chemical fragment analysis in protein binding sites

HBond

[Figure: three histograms of frequency (%) against bond length (Å, 2.0 to 3.4) for the O-H•••O, N-H•••O, and N-H•••N categories, each with series for Asp, Glu, Arg, and His]

• H-bond length: the distance between O-O, N-O, and N-N atoms

• Typical H-bond values between 2 heteroatoms: 2.5 – 3.5 Å (15 - 20 kcal/mol)

• The bond lengths of the different types of H-bond are in line with typical values

– O-O shortest

– N-N longest

– Due to atomic radii
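Histograms of this kind can be built from heteroatom coordinates. The following is a minimal sketch, assuming eight bins of width 0.2 Å starting at 2.0 Å (read off the axis ticks, so the exact binning is an assumption); the coordinates, distances, and function name are made up for illustration.

```python
import math


def length_histogram(lengths, start=2.0, width=0.2, n_bins=8):
    """Return the percentage of H-bond lengths falling in each bin."""
    counts = [0] * n_bins
    for d in lengths:
        idx = int((d - start) // width)
        if 0 <= idx < n_bins:  # lengths outside the plotted range are ignored
            counts[idx] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# Heavy-atom distance for one hypothetical N-H...O bond (coordinates in Å)
donor_n = (0.0, 0.0, 0.0)
acceptor_o = (2.9, 0.0, 0.0)
one_length = math.dist(donor_n, acceptor_o)

# A few made-up N-O distances, binned as in the plots
hist = length_histogram([2.71, 2.85, one_length, 3.12])
print(hist)
```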

Page 20: Knowledge-based chemical fragment analysis in protein binding sites

Hbond: N-O category

[Figure: histogram of frequency (%) against bond length (Å, 2.0 to 3.4) for N-H•••O hydrogen bonds, with series for Asp, Glu, Arg, and His]

• In the N-O category, the mean bond lengths for Asp, Glu, and Arg are around 2.8–3.0 Å.

• For His, however, the mean value is slightly shorter, around 2.6–2.8 Å, and the distribution of bond lengths is more spread out.

• This may imply that His can form a wider range (stronger to weaker) of H-bonds with ligands.

Page 21: Knowledge-based chemical fragment analysis in protein binding sites

Hbond: N-N category

[Figure: histogram of frequency (%) against bond length (Å, 2.0 to 3.4) for N-H•••N hydrogen bonds, with series for Asp, Glu, Arg, and His]

In the N-N category:

• H-bonds between His and N-containing ligands tend to be longer, and therefore weaker, than those formed by Arg.

• Most of the PDB entries in this category are Zn-binding proteins.

• The His residues coordinate to Zn as well as to the ligand. In this case, His donates electrons to the metal and forms a weaker H-bond with the ligand.

[Figure: 2q1q, carbonic anhydrase]

Page 22: Knowledge-based chemical fragment analysis in protein binding sites

Protein families

[Figure: pie charts of the protein family distribution (hydrolase, oxidoreductase, transferase, lyase, isomerase, others) for His, Glu, Arg, and Asp; the percentage labels cannot be reliably matched to the charts in this transcript]

• Our data set contains structures from

– all enzyme classes

– various receptor families.

• 5 protein families dominate

– Hydrolase

– Oxidoreductase

– Transferase

– Lyase

– Isomerase

• For Asp and Glu, the family distribution profiles are similar.

• For Arg, the top protein family is oxidoreductase.

The dominance of enzyme classes over receptor families reflects the fact that, in almost all cases, hydrogen bonding is a requirement in enzyme active sites.

Page 23: Knowledge-based chemical fragment analysis in protein binding sites

• Fragments showing two H-bonds to Asp and Glu side chains

Table 1. Fragments showing two hydrogen bonds to Asp and Glu side chains.

[The drawn structures for the generic fragments and examples are not recoverable from the transcript; the counts are listed in slide order.]

Generic fragment (f^a) | n^c for the highest frequency examples^b

Aryl amidines, Asp (11) Glu (2) | 72 2; 30 0; 12 0

Guanidines, Asp (2) Glu (5) | 14 2; 0 1; 1 1

1-aza-2-aminoaryls, Asp (8) Glu (6) | 9 0; 7 10; 4 0

Azaheteroaryl-7-amines, Asp (2) Glu (1) | 0 3; 2 0

Dihydropyridazines, Asp (2) Glu (0) | 2 0; 1 0

Cyclic diols, Asp (20) Glu (13) | 69 65; 54 105; 10 5

[Figure: a motif forming two hydrogen bonds to an Asp or Glu side chain]

Page 24: Knowledge-based chemical fragment analysis in protein binding sites

Asp vs Glu – O-mediated motifs are similar

[Figure: six O-mediated motif structures with their frequencies, and a diol motif hydrogen bonding to an Asp or Glu side chain]

Asp: 34 30 8 7 10 1

Glu: 45 69 5 5 4 11

• O-mediated motifs are similar

Page 25: Knowledge-based chemical fragment analysis in protein binding sites

Asp vs Glu: N-mediated motifs are different

[Figure: three rows of N-mediated motif structures with their frequencies for Asp and Glu]

Asp: 72 30 14 12 11 2
Glu: 0 0 1 0 0 0

Asp: 6 0 0 1 0 0
Glu: 13 11 4 4 3 1

Asp: 6 1 1 2 1 1
Glu: 0 0 0 0 2 0

Page 26: Knowledge-based chemical fragment analysis in protein binding sites

Cytochrome c peroxidase

• Membrane-bound hemoproteins that are essential for electron transport. They are capable of undergoing oxidation and reduction.

• Asp can H-bond to a variety of motifs – 15 (10%)

[Figure: motif structures labelled with PDB codes: 2euu, 1dso, 1dsp, 1ds4, 1dse, 1aej, 1aes, 1cmp; 2eun, 2aqd, 2eut, 2eup, 2anz; 2as2; 2as4; 2rbu, 2as6, 2as1; 1ryc, 1kxm; 2rc0, 2rbx; 1aeo, 1aeg; 1aen; 1aee; 1aem, 1aek]

Page 27: Knowledge-based chemical fragment analysis in protein binding sites

Arg – COOH motifs

• Arg is positively charged at pH values below its pKa (~12).

• Arg is mainly an H-bond donor.

• In our study, Arg almost exclusively forms H-bonds with O-mediated ligands.

• The most frequent motif is a carboxylic acid.

[Figure: O-mediated motif structures that hydrogen bond to the Arg side chain]

Page 28: Knowledge-based chemical fragment analysis in protein binding sites

• Although the acidic motif is the most common, there are occasional exceptions.

Arg – via N: 1yfx, 1vfs

[Figure: ligand motif hydrogen bonding to Arg through a nitrogen atom]

Page 29: Knowledge-based chemical fragment analysis in protein binding sites

Fragments showing H-bonds to His side chains

[The drawn structures for the generic fragments and examples are not recoverable from the transcript; the counts are listed in slide order.]

Generic fragment (f^a) | n^c for the highest frequency examples^b

Aza heterocycle | 8 14 7 4

Sulfonamides | 9 5 1 1

Cyclic alcohols | 24 4 3 3

Phenols | 20 88 13 2

Carboxylic acids | 18 17 8 2

Carbonyls | 28 5 4 4

• pKa ~ 6

• Frequently binds to metals

• Can be a hydrogen bond acceptor or donor

• The most frequent fragments are –OH, –C=O, –COOH

• N-heterocyclics

[Figure: His side chain structure]

Page 30: Knowledge-based chemical fragment analysis in protein binding sites

Common and unique

• Common in all 4 amino acids: universal H-bond partners

• Unique in Asp but not others

• Most frequent, for Asp, Glu, Arg, and His

[Figure: motif structures for each of the three groups above]

Page 31: Knowledge-based chemical fragment analysis in protein binding sites

Table 2. Analysis and statistics for the 4 amino acids

Property | Asp | Glu | Arg | His
Side chain moiety | carboxylic acid | carboxylic acid | guanidinium | imidazole
Side chain property | negatively charged, acidic | negatively charged, acidic | positively charged, basic | polar, basic
PDB^a | 3851 | 2541 | 3764 | 2736
Nonredundant PDB^b | 2428 | 1043 | 1568 | 1019
Protein family^c | 186 | 150 | 214 | 166
Unique ligand^d | 992 | 893 | 710 | 883
Unique motif | 161 | 144 | 137 | 133
Diversity ratio^e | 0.16 | 0.16 | 0.19 | 0.15
Hydrogen bond mediated atom: N | 106 (66%) | 80 (56%) | 7 (5%) | 35 (26%)
Hydrogen bond mediated atom: O | 53 (33%) | 56 (39%) | 124 (91%) | 85 (63%)
Hydrogen bond mediated atom: F/Cl | 0 | 0 | 5 | 2
Hydrogen bond mediated atom: S | 2 | 2 | 1 | 5
Hydrogen bond mediated atom: Mixed^f | 0 | 6 | 0 | 6

• The diversity ratio measures the diversity of the fragment motifs across all the ligands: diversity ratio = number of unique motifs / number of unique ligands.

• Asp, Glu, and His have similar ratios (0.15–0.16), while Arg (0.19) is higher.

• A greater variety of fragment motifs interacts with Arg in the PDB, even though Arg has the smallest number of unique ligands.
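The diversity ratio defined above can be recomputed from the counts in Table 2. This short sketch uses only the table values; the variable names are illustrative.

```python
# Counts taken from Table 2
unique_ligands = {"Asp": 992, "Glu": 893, "Arg": 710, "His": 883}
unique_motifs = {"Asp": 161, "Glu": 144, "Arg": 137, "His": 133}

# Diversity ratio = number of unique motifs / number of unique ligands
ratios = {aa: round(unique_motifs[aa] / unique_ligands[aa], 2) for aa in unique_ligands}

for aa, ratio in ratios.items():
    print(f"{aa}: {ratio:.2f}")
```

The recomputed values match the diversity ratio row of the table, with Arg the highest.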

Page 32: Knowledge-based chemical fragment analysis in protein binding sites

Table 2. Analysis and statistics for the 4 amino acids

Hydrogen bond mediated atom | Asp | Glu | Arg | His
N | 106 (66%) | 80 (56%) | 7 (5%) | 35 (26%)
O | 53 (33%) | 56 (39%) | 124 (91%) | 85 (63%)
F/Cl | 0 | 0 | 5 | 2
S | 2 | 2 | 1 | 5
Mixed^f | 0 | 6 | 0 | 6

• The acidic side chains, Asp and Glu, have a higher tendency to form hydrogen bonds with N-mediated motifs.

• The basic residues, Arg and His, have a higher tendency to form hydrogen bonds with O-mediated ones.

• Most interesting is that Arg almost exclusively forms hydrogen bonds with O-mediated ligands (91%), suggesting it could be most effective to use O-mediated motifs to form hydrogen bonds with Arg.

• Another reassuring fact is that no F or Cl (hydrogen bond acceptors) is detected forming hydrogen bonds with Asp or Glu, confirming that a binding site’s Asp and Glu are always hydrogen bond acceptors as well as negatively charged in protein structures.
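The N versus O preference described above can be checked from the motif counts in Table 2. This is a small sketch using only the table values; the function name is illustrative, and shares recomputed this way can differ by a point from the percentages printed on the slide because of rounding.

```python
# Unique motif counts per hydrogen bond mediating atom, from Table 2
counts = {
    "Asp": {"N": 106, "O": 53, "F/Cl": 0, "S": 2, "Mixed": 0},
    "Glu": {"N": 80, "O": 56, "F/Cl": 0, "S": 2, "Mixed": 6},
    "Arg": {"N": 7, "O": 124, "F/Cl": 5, "S": 1, "Mixed": 0},
    "His": {"N": 35, "O": 85, "F/Cl": 2, "S": 5, "Mixed": 6},
}


def dominant_atom(amino_acid):
    """Return the most frequent mediating atom and its share (%) of the motifs."""
    row = counts[amino_acid]
    total = sum(row.values())
    atom = max(row, key=row.get)
    return atom, 100.0 * row[atom] / total


for aa in counts:
    atom, share = dominant_atom(aa)
    print(f"{aa}: mostly {atom}-mediated ({share:.0f}% of unique motifs)")
```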

Page 33: Knowledge-based chemical fragment analysis in protein binding sites

http://www.ucl.ac.uk/~rmgzawe/suppinfo/index.html

Page 34: Knowledge-based chemical fragment analysis in protein binding sites

Conclusions

• The data show a conserved number of fragments.

• The most common fragments should be represented in fragment screening sets for given targets.

• We need to expand and refine the analyses, e.g. to cover the missing amino acids.

• Some fragments (3D) are under-represented and could be embedded in novel scaffolds – new chemistry.

Chan, et al. J. Med. Chem. 2010, 53, 3086–3094

Page 35: Knowledge-based chemical fragment analysis in protein binding sites

Acknowledgment

• Funding from Cancer Research UK
