Bioinformatics in the Bourne Lab

Bioinformatics in the Bourne Lab Philip E. Bourne [email protected] BILD 94 May 3, 2012 August 14, 2009 5/3/12 UCSD BILD 94 1

Upload: philip-bourne

Post on 06-May-2015


DESCRIPTION

A lecture in BILD94 at UCSD on introducing undergraduates to various aspects of bioinformatics.

TRANSCRIPT

Page 1: Bioinformatics in the Bourne Lab

UCSD BILD 94 1

Bioinformatics in the Bourne Lab

Philip E. Bourne

[email protected]

BILD 94, May 3, 2012

August 14, 2009

5/3/12

Page 2: Bioinformatics in the Bourne Lab

Some Personal Background…

Page 3: Bioinformatics in the Bourne Lab


Page 4: Bioinformatics in the Bourne Lab

The Life of One Scientist – The Early Years

So That You Might Not Make the Same Mistakes

• My high school teacher Mr. Wilson said I would be a failure at chemistry

• My PhD is in chemistry

• The opportunity to live in different places shaped my life

• Good friends are forever

Page 5: Bioinformatics in the Bourne Lab

40+ Years Later

Ten Simple Rules for Starting a Company

PLoS Comp Biol 2012 8(3): e1002439

Page 6: Bioinformatics in the Bourne Lab


Page 7: Bioinformatics in the Bourne Lab

PhD in Physical Chemistry

Page 8: Bioinformatics in the Bourne Lab

Always Loved Computing

Circa 1974

Page 9: Bioinformatics in the Bourne Lab

Postdoctoral Work – The Molecular Basis of How the Body Works

• Regrets: never learnt another language

Page 10: Bioinformatics in the Bourne Lab

Post Doc

Page 11: Bioinformatics in the Bourne Lab

Some Things Stay with You Your Whole Life

Page 12: Bioinformatics in the Bourne Lab

Senior Scientist, HHMI, Columbia University, New York

• Driven not by career but by wanting to live in New York City

Page 13: Bioinformatics in the Bourne Lab

~1990 Got Involved with the Human Genome

• Was only possible by applying computers to problems in biology

• Developed algorithms to support physical and genetic mapping of Chr 13

Page 14: Bioinformatics in the Bourne Lab

Came to UCSD to Apply Computers to Big Biological Problems

• Possibly the best place in the world to do computational biology

Page 15: Bioinformatics in the Bourne Lab


Page 16: Bioinformatics in the Bourne Lab

The Protein Kinase Family

• A large family important to signal transduction in eukaryotes and many bacteria.

• Phosphotransferases: transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on the target protein, producing a range of downstream signaling effects.

• PKA: an example of a typical protein kinase (TPK) fold, shown in “open book” format

Page 17: Bioinformatics in the Bourne Lab

Sometimes Ya Got to Just Do It Yourself

Page 18: Bioinformatics in the Bourne Lab

The Growth of Data is a Major Driver in Biology

[Figure: number of released entries (y-axis) by year (x-axis)]

Page 20: Bioinformatics in the Bourne Lab

Big Research Questions in the Lab

1. Can we improve how science is disseminated and comprehended?

2. What is the ancestry of the protein structure universe and what can we learn from it?

3. Are there alternative ways to represent proteins from which we can learn something new?

4. What really happens when we take a drug?

5. Can we contribute to the treatment of neglected {tropical} diseases?

Page 21: Bioinformatics in the Bourne Lab

Studying Evolution Through Structure

Page 22: Bioinformatics in the Bourne Lab

Nature’s Reductionism

There are ~20^300 possible proteins >>>> all the atoms in the Universe

11.2M protein sequences from 10,854 species (source RefSeq)

38,221 protein structures yield 1,195 domain folds (SCOP 1.75)
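
The size comparison on this slide can be checked with a few lines of Python. This is only a sketch under two assumptions that the slide does not state: the figure is read as 20^300 (a 300-residue chain over the 20 standard amino acids), and the number of atoms in the observable universe is taken as the commonly quoted rough estimate of 10^80.

```python
import math

# Assumptions (not from the slide): chain length 300, alphabet of 20
# amino acids, and roughly 1e80 atoms in the observable universe.
ALPHABET_SIZE = 20
CHAIN_LENGTH = 300
LOG10_ATOMS_IN_UNIVERSE = 80

# Work in log10 space so the two magnitudes are easy to compare.
log10_sequences = CHAIN_LENGTH * math.log10(ALPHABET_SIZE)

print(f"20^300 is about 10^{log10_sequences:.0f}")
print(f"ratio to atom count is about 10^{log10_sequences - LOG10_ATOMS_IN_UNIVERSE:.0f}")
```

Under these assumptions the sequence space exceeds the atom count by hundreds of orders of magnitude, which is the point the slide makes before contrasting it with the small number of observed folds.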

Page 23: Bioinformatics in the Bourne Lab

Initial Question:

With the current coverage of proteomes by structure and assuming we know a high percentage of all folds, is structure a useful discriminator of species?

Page 24: Bioinformatics in the Bourne Lab

Chapter 2 Initial Findings

Russ Doolittle, Professor

Center for Molecular Genetics, UCSD

Song Yang, Post Doc, UC Berkeley

Department of Chemistry and Biochemistry, UCSD

Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

Page 25: Bioinformatics in the Bourne Lab

To Answer this Question We Only Need to Make Use of Existing Resources

• SCOP – Further catalogs Nature’s reductionism into structural domains, folds, families and superfamilies

• SUPERFAMILY assigns the above to fully sequenced proteomes

Page 26: Bioinformatics in the Bourne Lab

The SCOP Hierarchy v1.75

Based on 38,221 Structures

Level          Count
Classes        7
Folds          1,195
Superfamilies  1,962
Families       3,902
Domains        110,800
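
The superfamily identifiers used later in the talk (for example a.102.1) follow SCOP's dotted class.fold.superfamily.family notation, so each level of the hierarchy can be read off the identifier. The sketch below illustrates that; the `Sccs` type and `parse_sccs` function are hypothetical names introduced here, not part of any SCOP or SUPERFAMILY tool.

```python
from typing import NamedTuple, Optional

class Sccs(NamedTuple):
    """Levels of the SCOP hierarchy encoded in one dotted identifier."""
    scop_class: str
    fold: str
    superfamily: Optional[str]
    family: Optional[str]

def parse_sccs(sccs: str) -> Sccs:
    """Split an identifier such as 'a.102.1' into its hierarchy levels."""
    parts = sccs.split(".")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"unexpected identifier: {sccs!r}")

    def prefix(n: int) -> Optional[str]:
        # Join the first n components, or None if the level is absent.
        return ".".join(parts[:n]) if len(parts) >= n else None

    return Sccs(parts[0], prefix(2), prefix(3), prefix(4))

print(parse_sccs("a.102.1"))
```

A superfamily-level identifier therefore carries its class and fold with it, which is why a presence/absence table keyed on superfamilies can be rolled up to folds without extra lookups.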

Page 27: Bioinformatics in the Bourne Lab

Is Structure a Useful Discriminator of Species? - Maybe…

Distribution among the three kingdoms as taken from SUPERFAMILY

• Superfamily distributions would seem to be related to the complexity of life

• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563

[Venn diagram: SCOP fold (765 total)]

Kingdom totals: Eukaryota (650), Archaea (416), Bacteria (564)

Region                 Count
Archaea only           2
Bacteria only          42
Archaea + Eukaryota    10
Eukaryota only         135
Bacteria + Eukaryota   118
All three kingdoms     387
Archaea + Bacteria     17

[Second Venn diagram: any genome / all genomes, region values 153/14, 9/1, 21/2, 310/0, 645/49, 29/0, 68/0]
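
The seven region counts in the first Venn diagram can be cross-checked against the three kingdom totals. In the sketch below, the assignment of each count to a region is an inference, chosen so that the totals on the slide are reproduced; the transcript itself does not say which number sits in which region.

```python
# Region counts from the slide. The region each count belongs to is an
# inference: it is chosen so that the three kingdom totals come out right.
REGIONS = {
    frozenset({"Archaea"}): 2,
    frozenset({"Bacteria"}): 42,
    frozenset({"Archaea", "Eukaryota"}): 10,
    frozenset({"Eukaryota"}): 135,
    frozenset({"Bacteria", "Eukaryota"}): 118,
    frozenset({"Archaea", "Bacteria", "Eukaryota"}): 387,
    frozenset({"Archaea", "Bacteria"}): 17,
}
SLIDE_TOTALS = {"Eukaryota": 650, "Archaea": 416, "Bacteria": 564}

def kingdom_total(kingdom):
    """Sum every region whose membership includes the given kingdom."""
    return sum(n for members, n in REGIONS.items() if kingdom in members)

for kingdom, expected in SLIDE_TOTALS.items():
    print(kingdom, kingdom_total(kingdom), "slide says", expected)
```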

Page 28: Bioinformatics in the Bourne Lab

Method – Distance Determination

Rows: SCOP superfamilies (FSF); columns: organisms; assignments from SUPERFAMILY

Presence/Absence Data Matrix

FSF      C. intestinalis  C. briggsae  F. rubripes
a.1.1    1                1            1
a.1.2    1                1            1
a.10.1   0                0            1
a.100.1  1                1            1
a.101.1  0                0            0
a.102.1  0                1            1
a.102.2  1                1            1

Distance Matrix

                 C. intestinalis  C. briggsae  F. rubripes
C. intestinalis  0                101          109
C. briggsae                       0            144
F. rubripes                                    0
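
One way to turn a presence/absence matrix into pairwise distances is to count the superfamilies present in exactly one organism of each pair (a Hamming distance). The slide does not state which distance measure was used, so that choice is an assumption here, and the function name is hypothetical. The sketch uses only the seven rows visible on the slide, so its numbers are much smaller than the 101, 109 and 144 obtained from the full matrix.

```python
from itertools import combinations

# Presence (1) / absence (0) rows copied from the slide excerpt.
ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]
PRESENCE = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def hamming_distances(presence, organisms):
    """For each organism pair, count superfamilies found in only one of them."""
    distances = {}
    for i, j in combinations(range(len(organisms)), 2):
        differing = sum(1 for row in presence.values() if row[i] != row[j])
        distances[(organisms[i], organisms[j])] = differing
    return distances

DISTANCES = hamming_distances(PRESENCE, ORGANISMS)
for pair, d in DISTANCES.items():
    print(pair, d)
```

A distance matrix of this kind is what a standard tree-building method takes as input.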

Page 29: Bioinformatics in the Bourne Lab

Is Structure a Useful Discriminator of Species? - Yes

[Figure labels: Archaea, Bacteria, Eukaryota]

The method cleanly placed all species in their correct superkingdoms

Page 30: Bioinformatics in the Bourne Lab

The Answer Would Appear to be Yes

• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome

Page 31: Bioinformatics in the Bourne Lab

Environmental Influence

Chris Dupont, Scripps Institution of Oceanography, UCSD

Dupont, Yang, Palenik, Bourne (2006) PNAS 103(47) 17822-17827

Page 32: Bioinformatics in the Bourne Lab

Consider the Distribution of Disulfide Bonds among Folds

• Disulfides are only stable under oxidizing conditions

• Oxygen content gradually accumulated during the earth’s evolution

• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago

• Oxygen began to accumulate ~2.0 billion years ago

• Logical deduction – disulfides more prevalent in folds (organisms) that evolved later

• This would seem to hold true

• Can we take this further?

[Venn diagram: SCOP fold (708 total), share of folds containing disulfides per region]

Region                 Disulfide-containing folds
Archaea only           0% (0/2)
Bacteria only          16.7% (7/42)
Archaea + Eukaryota    0% (0/10)
Eukaryota only         31.9% (43/135)
Bacteria + Eukaryota   14.4% (17/118)
All three kingdoms     4.7% (18/387)
Archaea + Bacteria     5.9% (1/17)
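
The percentages in the Venn diagram follow directly from the fold counts beside them. The short sketch below recomputes them. The region names are inferred by matching the denominators to the fold-distribution Venn diagram earlier in the talk and are not stated in the transcript.

```python
# (folds with disulfides, folds in region), as printed on the slide.
# Region names are inferred from the earlier Venn diagram, not given here.
DISULFIDE_COUNTS = {
    "Archaea only": (0, 2),
    "Bacteria only": (7, 42),
    "Archaea + Eukaryota": (0, 10),
    "Eukaryota only": (43, 135),
    "Bacteria + Eukaryota": (17, 118),
    "All three kingdoms": (18, 387),
    "Archaea + Bacteria": (1, 17),
}

def percent(part, whole):
    """Percentage rounded to one decimal place, as on the slide."""
    return round(100 * part / whole, 1)

PERCENTAGES = {region: percent(*counts) for region, counts in DISULFIDE_COUNTS.items()}
for region, value in PERCENTAGES.items():
    print(f"{region}: {value}%")
```

The folds found only in eukaryotes carry the highest share of disulfides, and the folds shared by all three kingdoms one of the lowest, which is the pattern the bullet points argue for.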

Page 33: Bioinformatics in the Bourne Lab

Evolution of the Earth

• 4.5 billion years of change

• 300 ± 50 K

• 1-5 atmospheres

• Constant photoenergy

• Chemical and geological changes

• Life has evolved in this time

• The ocean was the “cradle” for 90% of evolution

Page 34: Bioinformatics in the Bourne Lab

Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History

[Figure: concentration (O2 in arbitrary units, Zn and Fe in moles L-1) against billions of years before present (4.5 to 0); curves for oxygen, zinc, iron, cobalt and manganese; tree symbols for Bacteria, Archaea and Eukarya along the top]

• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines, euxinic ocean: dashed lines).

• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.

Replotted from Saito et al, 2003 Inorganica Chimica Acta 356: 308-318

Page 35: Bioinformatics in the Bourne Lab

Superfamily Distribution As Well As Overall Content Has Changed

Bacteria Fe superfamilies

a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5

Eukaryotic Fe superfamilies

a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5

Page 36: Bioinformatics in the Bourne Lab

Hypothesis

• Emergence of cyanobacteria changed oxygen concentrations

• Impacted metal concentrations in the ocean

• Organisms used new metals in new ways to evolve new biological processes, e.g., complex signaling

• This in turn further impacted the environment

Page 37: Bioinformatics in the Bourne Lab

Big Research Questions in the Lab

1. Can we improve how science is disseminated and comprehended?

2. What is the ancestry of the protein structure universe and what can we learn from it?

3. Are there alternative ways to represent proteins from which we can learn something new?

4. What really happens when we take a drug?

5. Can we contribute to the treatment of neglected {tropical} diseases?

Page 38: Bioinformatics in the Bourne Lab

Our Motivation

• Tykerb – Breast cancer

• Gleevec – Leukemia, GI cancers

• Nexavar – Kidney and liver cancer

• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive

Collins and Workman 2006 Nature Chemical Biology 2 689-700

Page 39: Bioinformatics in the Bourne Lab

Our Broad Approach

• Involves the fields of:
  – Structural bioinformatics
  – Cheminformatics
  – Biophysics
  – Systems biology
  – Pharmaceutical chemistry

• L. Xie, L. Xie, S.L. Kinnings and P.E. Bourne 2012 Novel Computational Approaches to Polypharmacology as a Means to Define Responses to Individual Drugs, Annual Review of Pharmacology and Toxicology 52: 361-379

• L. Xie, S.L. Kinnings, L. Xie and P.E. Bourne 2012 Predicting the Polypharmacology of Drugs: Identifying New Uses Through Bioinformatics and Cheminformatics Approaches in Drug Repurposing M. Barrett and D. Frail (Eds.) Wiley and Sons. (available upon request)

Page 40: Bioinformatics in the Bourne Lab

Approach - Need to Start with a 3D Drug-Receptor Complex – Either Experimental or Modeled

Name          Other Name          Treatment                            PDB ID
Lipitor       Atorvastatin        High cholesterol                     1HWK, 1HW8…
Testosterone  Testosterone        Osteoporosis                         1AFS, 1I9J…
Taxol         Paclitaxel          Cancer                               1JFF, 2HXF, 2HXH
Viagra        Sildenafil citrate  ED, pulmonary arterial hypertension  1TBF, 1UDT, 1XOS…
Digoxin       Lanoxin             Congestive heart failure             1IGJ

Page 41: Bioinformatics in the Bourne Lab

A Reverse Engineering Approach to Drug Discovery Across Gene Families

1. Characterize ligand binding site of primary target (Geometric Potential)

2. Identify off-targets by ligand binding site similarity (Sequence order independent profile-profile alignment)

3. Extract known drugs or inhibitors of the primary and/or off-targets

4. Search for similar small molecules

5. Dock molecules to both primary and off-targets

6. Statistical analysis of docking score correlations

Xie and Bourne 2009 Bioinformatics 25(12) 305-312

Page 42: Bioinformatics in the Bourne Lab

Characterization of the Ligand Binding Site - The Geometric Potential

• Initially assign each Cα atom a value that is its distance to the environmental boundary

• Update the value with those of surrounding Cα atoms dependent on distances and orientation – atoms within a 10 Å radius define the neighbors i

$$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$$

where P is the atom's initial value, P_i is the value of neighboring Cα atom i, D_i is the distance to that neighbor, and α_i is the angle describing its relative orientation.

Conceptually similar to hydrophobicity or electrostatic potential that is dependent on both global and local environments

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
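
The update rule on this slide can be sketched in a few lines. The formula used below, GP = P + Σ P_i/(D_i + 1.0) × (cos(α_i) + 1.0)/2.0 over neighbors within 10 Å, is a reconstruction from the garbled slide text and should be checked against Xie and Bourne (2007). The function name, argument layout and numbers are hypothetical and for illustration only.

```python
import math

def geometric_potential(p, neighbors, cutoff=10.0):
    """GP = P + sum over neighbors of P_i/(D_i + 1.0) * (cos(alpha_i) + 1.0)/2.0.

    p          initial value of the central C-alpha atom
    neighbors  iterable of (p_i, d_i, alpha_i), alpha_i in radians
    cutoff     neighbors farther away than this (in angstroms) are ignored
    """
    total = p
    for p_i, d_i, alpha_i in neighbors:
        if d_i <= cutoff:
            # Closer neighbors and neighbors with small alpha contribute more.
            total += p_i / (d_i + 1.0) * (math.cos(alpha_i) + 1.0) / 2.0
    return total

# Hypothetical values, not taken from any structure.
example = geometric_potential(
    5.0,
    [
        (4.0, 3.0, 0.0),       # aligned neighbor: contributes 4/4 * 1 = 1.0
        (6.0, 5.0, math.pi),   # opposite orientation: contributes 0
        (9.0, 12.0, 0.0),      # beyond the 10 angstrom cutoff: ignored
    ],
)
print(example)
```

The orientation term scales each neighbor's contribution between 0 and 1, so an atom's potential reflects both how far it is from the boundary and how it sits relative to its surroundings.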

Page 43: Bioinformatics in the Bourne Lab

Discrimination Power of the Geometric Potential

[Figure: distributions of geometric potential (x-axis, 0 to 99) for binding sites and non-binding sites (y-axis, 0 to 4)]

• Geometric potential can distinguish binding and non-binding sites

[Figure: Geometric Potential Scale (100 to 0) For Residue Clusters]

Page 44: Bioinformatics in the Bourne Lab

Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm

[Figure: residues L, E, R and V, K, D, L shown for Structure A and Structure B]

• Build an associated graph from the graph representations of the two structures being compared. Each node is assigned a weight from the similarity matrix

• The maximum-weight clique corresponds to the optimum alignment of the two structures

Xie and Bourne 2008 PNAS, 105(14) 5441
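
A toy version of the graph construction and clique search might look like the sketch below. Several things here are assumptions rather than the published method: node weights are 1.0 when two residues fall in the same chemical group (groups taken from the talk's similarity-matrix slide), edges join residue pairs whose inter-residue distances agree within a tolerance, the clique search is exhaustive, and all coordinates are invented.

```python
from itertools import combinations
from math import dist

# Chemical groups from the talk's similarity-matrix slide.
GROUPS = ("LVIMC", "AGSTP", "FYW", "EDNQKRH")

def group_weight(res_a, res_b):
    """Hypothetical node weight: 1.0 if both residues share a chemical group."""
    return 1.0 if any(res_a in g and res_b in g for g in GROUPS) else 0.0

def association_graph(a, b, tol=0.5):
    """Nodes pair residue i of A with residue j of B; edges join pairs that
    preserve the inter-residue distance to within tol (an assumed rule)."""
    nodes = {}
    for i, (res_a, _) in enumerate(a):
        for j, (res_b, _) in enumerate(b):
            w = group_weight(res_a, res_b)
            if w > 0:
                nodes[(i, j)] = w
    edges = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        same_geometry = abs(dist(a[i][1], a[k][1]) - dist(b[j][1], b[l][1])) <= tol
        if i != k and j != l and same_geometry:
            edges[(i, j)].add((k, l))
            edges[(k, l)].add((i, j))
    return nodes, edges

def max_weight_clique(nodes, edges):
    """Exhaustive search; fine for toy inputs, exponential in general."""
    best_weight, best_clique = 0.0, []

    def extend(clique, weight, candidates):
        nonlocal best_weight, best_clique
        if weight > best_weight:
            best_weight, best_clique = weight, clique
        for n in sorted(candidates):
            candidates = candidates - {n}  # avoid revisiting the same clique
            extend(clique + [n], weight + nodes[n], candidates & edges[n])

    extend([], 0.0, set(nodes))
    return best_weight, sorted(best_clique)

# Invented coordinates (angstroms); B is the same triangle in a different order.
STRUCTURE_A = [("L", (0, 0, 0)), ("E", (4, 0, 0)), ("R", (0, 3, 0))]
STRUCTURE_B = [("K", (0, 3, 0)), ("V", (0, 0, 0)), ("D", (4, 0, 0))]

NODES, EDGES = association_graph(STRUCTURE_A, STRUCTURE_B)
WEIGHT, ALIGNMENT = max_weight_clique(NODES, EDGES)
print(WEIGHT, ALIGNMENT)
```

The recovered alignment pairs L with V, E with D and R with K even though the residues appear in a different order in the two lists, which is what "sequence-order independent" means in the slide title.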

Page 45: Bioinformatics in the Bourne Lab

Similarity Matrix of Alignment

Chemical Similarity

• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)

• Amino acid chemical similarity matrix

Evolutionary Correlation

• Amino acid substitution matrix such as BLOSUM45

• Similarity score between two sequence profiles

$$d = \sum_{i} f_a^{i} S_b^{i} + \sum_{i} f_b^{i} S_a^{i}$$

f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively

S_a, S_b are the PSSMs of profiles a and b, respectively
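
The profile-profile score for one pair of aligned positions can be sketched as below, reading the slide's formula as d = Σ_i f_a^i S_b^i + Σ_i f_b^i S_a^i (a reconstruction from the garbled transcript). The function name is hypothetical, and the example uses an invented three-letter alphabet instead of the 20 amino acids to keep the arithmetic visible.

```python
def profile_score(f_a, s_a, f_b, s_b):
    """d = sum_i f_a[i] * s_b[i] + sum_i f_b[i] * s_a[i].

    f_a, f_b  target frequencies at one position of profiles a and b
    s_a, s_b  PSSM scores at the same positions
    """
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all four vectors must cover the same alphabet")
    # Score each profile's frequencies against the other profile's PSSM.
    a_against_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_against_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_against_b + b_against_a

# Invented values over a three-letter alphabet, for illustration only.
d = profile_score(
    f_a=[0.5, 0.5, 0.0], s_a=[2, 1, -1],
    f_b=[1.0, 0.0, 0.0], s_b=[3, -2, 0],
)
print(d)  # (0.5*3 + 0.5*-2) + (1.0*2) = 2.5
```

Because each profile is scored against the other, the measure is symmetric in a and b.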

Page 46: Bioinformatics in the Bourne Lab

The Problem with Tuberculosis

• One third of global population infected

• 1.7 million deaths per year

• 95% of deaths in developing countries

• Anti-TB drugs hardly changed in 40 years

• MDR-TB and XDR-TB pose a threat to human health worldwide

• Development of novel, effective and inexpensive drugs is an urgent priority

Page 47: Bioinformatics in the Bourne Lab

The TB-Drugome

1. Determine the TB structural proteome

2. Determine all known drug binding sites from the PDB

3. Determine which of the sites found in 2 exist in 1

4. Call the result the TB-drugome

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Page 48: Bioinformatics in the Bourne Lab

1. Determine the TB Structural Proteome

[Figure: TB proteome (3,996 proteins): 284 solved structures, 1,446 homology models, 2,266 remaining]

• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
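
The coverage figures on this slide can be reproduced from the counts in the diagram. The sketch assumes the 1,446 homology models cover proteins that have no solved structure, so the two counts simply add; the slide does not say this explicitly, but the arithmetic matches its percentages.

```python
# Counts read from the slide's diagram.
TB_PROTEOME = 3996
SOLVED_STRUCTURES = 284
HOMOLOGY_MODELS = 1446  # assumed not to overlap with the solved structures

solved_pct = round(100 * SOLVED_STRUCTURES / TB_PROTEOME, 1)
with_models_pct = round(100 * (SOLVED_STRUCTURES + HOMOLOGY_MODELS) / TB_PROTEOME, 1)
uncovered = TB_PROTEOME - SOLVED_STRUCTURES - HOMOLOGY_MODELS

print(f"solved structures only: {solved_pct}%")
print(f"with homology models:   {with_models_pct}%")
print(f"proteins without structural coverage: {uncovered}")
```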

Page 49: Bioinformatics in the Bourne Lab

2. Determine all Known Drug Binding Sites in the PDB

• Searched the PDB for protein crystal structures bound with FDA-approved drugs

• 268 drugs bound in a total of 931 binding sites

[Figure: No. of drugs (y-axis, 0 to 140) against No. of drug binding sites (x-axis, 1 to 37); labeled drugs: Methotrexate, Chenodiol, Alitretinoin, Conjugated estrogens, Darunavir, Acarbose]

Page 50: Bioinformatics in the Bourne Lab

Map 2 onto 1 – The TB-Drugome

http://funsite.sdsc.edu/drugome/TB/

Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).

Page 51: Bioinformatics in the Bourne Lab

Research is a Good Life