Computational Protein Design. 3. Applications in Systems and Synthetic Biology

Computational Protein Design
3. Applications of Computational Protein Design

Pablo Carbonell [email protected]

iSSB, Institute of Systems and Synthetic Biology
Genopole, University d’Évry-Val d’Essonne, France

mSSB: December 2010

Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 1 / 58


Outline

1 Applications in Systems and Synthetic Biology

2 Protein Affinity Enhancement

3 Protein Modular Design

4 Protein Promiscuity Reengineering

5 Conclusions

1 Applications in Systems and Synthetic Biology

Applications of CPD in Systems Biology

The challenge: robust and reliable methods for information correlation and integration of high-throughput (HT) -omics networks

Unveiling new relationships that close the gap between

    molecular characteristics of proteins and other compounds within the cell
    systems characteristics of the cell as a whole

Computational intelligence algorithms for large-scale discovery studies

Choosing the right set of descriptors

Generating cellular interaction networks: the structural interactome

The Structural Interactome


Applications of CPD in Synthetic Biology

Engineering signal transduction: modifying the specificity of receptors

Engineering genetic networks
    Modifying transcription
    Targeting gene repair and modification

Novel biosensors

Minimal cells and synthetic genomes

Metabolic pathway engineering

Feedback loop design and sensitivity analysis

Programmable switches: allosteric, epigenetic, riboswitches
    Conditional delivery of drugs
    Modulation of signal transduction pathways
    Inhibition of protein function
    Adoption of a toxic conformation

Cell-cell communication

Orthogonal genes

Mathematical dynamical models

2 Protein Affinity Enhancement

Antibody-Antigen Interactions

Antibodies are gamma globulin proteins found in the immune system of vertebrates

Basic structural units:
    Two large heavy chains (variable domain VH)
    Two small light chains (variable domain VL)

The Fab region, or fragment antigen-binding, is the region of an antibody that binds to antigens

The Fc region, or fragment crystallizable region, is the tail region that interacts with cell surface receptors

The FV region: the variable domain


The Variable Domain FV

The variable domain is the most important region for binding to antigens

The FV contains
    3 variable loops between β-strands on the light chain (VL)
    3 variable loops between β-strands on the heavy chain (VH)

These loops are referred to as the complementarity-determining regions (CDRs)


In Silico Design of Immunodiagnostic Assays for Anti-TNF-α

Tumor necrosis factor-alpha (TNF-α), a cytokine involved in systemic inflammation, can induce several cell responses depending on the cellular context:

    activation of NF-κB-mediated proliferative programs
    programmed cell death

Early detection of unusual concentrations of TNF-α serves as a diagnostic biomarker of inflammatory conditions such as metabolic disorders (obesity), rheumatoid arthritis, tuberculosis, and cancer.

Moreover, TNF-α inhibitors have emerged in recent years as a new therapeutic approach for immune-mediated inflammatory diseases.

The TNF-α inhibitory molecules currently in use are antibodies or soluble TNF receptors, which sequester TNF-α.


Computational Protein Affinity Design for Anti TNF-α Antibodies


Building the Model

No crystal structure is available for the TNF-α antibody-antigen complex

Therefore, our first step is to build a model of the complex through homology modeling and docking

TNF-α trimer

Anti-TNF-α model from Swiss-Model


Docking and Scoring

Using ZDOCK (Accelrys Inc.) for the generation of docked complexes

    Fast Fourier Transform based protein docking program
    The top 2000 ranked predictions are returned

Scoring the complexes through the use of FastContact

    Contact binding free energy scoring tool for protein-protein complex structures
    The estimates are based on rigid bodies
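A toy illustration of why the FFT makes the translational search cheap, assuming a 1D periodic grid in place of the 3D grids a real docking program works with. The functions `fft` and `circular_correlation` and the grids `receptor` and `ligand` are hypothetical names introduced here. This is not ZDOCK's scoring function, only the correlation identity it relies on: the score of every ligand shift comes out of one forward transform per molecule and one inverse transform.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def circular_correlation(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[(x - t) % n] for every shift t."""
    n = len(receptor)
    fr = fft([complex(v) for v in receptor])
    fl = fft([complex(v) for v in ligand])
    # Correlation theorem for real inputs: multiply by the conjugate spectrum.
    product = [r * l.conjugate() for r, l in zip(fr, fl)]
    inverse = fft(product, invert=True)
    return [round((v / n).real, 6) for v in inverse]


# Toy grids (assumed values): a favourable pocket at cells 2-3,
# and a ligand that occupies two adjacent cells.
receptor = [0, 0, 1, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]

scores = circular_correlation(receptor, ligand)
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(best_shift, scores[best_shift])
```

In 3D the same identity would be applied to grids that encode the scoring terms, and repeated for each sampled ligand rotation.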


Hot-spots and Energy Minimization

Predicting hot-spots
    Using FoldX, we performed an in silico alanine scan to predict consensus hot-spots for the models.
    These hot-spots were verified experimentally by the laboratory group.

3 initial models were selected based on different criteria:

    minimum predicted binding energy in FastContact
    highest coverage of known hot-spots in anti-TNF-α

Energy was then minimized for the complexes using Discovery Studio (Accelrys Inc.).


In Silico Combinatorial Library

In silico combinatorial libraries of mutants around the complementarity-determining regions (CDRs) were built as follows:

Models for single-mutation variants were computed using Biopolymer and Builder (Accelrys Inc.) for rotamer selection and side-chain positioning

Mutants were then submitted to a cluster of 64 × 4-core nodes for local energy minimization of the CDRs using GROMACS
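The enumeration step behind such a library can be sketched as follows. This is only the bookkeeping that lists the single-point variants; the structural modeling and minimization are done by the tools named above. The sequence fragment, the chosen positions, and the function name `single_mutants` are illustrative assumptions, not data from the study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(sequence, positions):
    """Yield (label, mutant_sequence) for each single substitution.

    positions are 0-based indices into sequence; labels use 1-based numbering.
    """
    for pos in positions:
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the silent "mutation"
            label = f"{wild_type}{pos + 1}{aa}"
            yield label, sequence[:pos] + aa + sequence[pos + 1:]


# Hypothetical CDR fragment and two positions to scan (assumed values).
cdr_fragment = "GYTFTSYW"
library = list(single_mutants(cdr_fragment, positions=[2, 3]))
print(len(library), library[0])
```

Each scanned position contributes 19 variants, so the library grows linearly with the number of CDR positions at this stage.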


Virtual Screening

The most beneficial mutations were selected in order to build a combinatorial library of double and triple mutants.

Variants with the lowest predicted binding energy (that is, the highest predicted affinity) were shortlisted and compared with beneficial mutations reported in the literature.

Computation time: 2 weeks on a cluster of 64 nodes × 4 cores.

The 6 best mutations were transferred to the molecular biology laboratory to be tested through ELISA immunoprecipitation assays.

Then, a new round of virtual screening was launched starting from the best predicted variants.

After three rounds, values close to a 3-fold improvement in binding affinity (measured as −log10 Kd) were obtained.
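A minimal sketch of how double and triple mutants might be assembled from a shortlist of single mutations. The mutation list, the scores, and the additive ranking are assumptions made for illustration: the workflow above scores each modeled variant, whereas this sketch simply sums hypothetical single-mutation energy changes to get a first ordering.

```python
from itertools import combinations


def combine_mutations(beneficial, max_order=3):
    """Return double..max_order mutants built from single mutations.

    beneficial is a list of (position, amino_acid) pairs; combinations that
    would place two substitutions at the same position are discarded.
    """
    variants = []
    for order in range(2, max_order + 1):
        for combo in combinations(sorted(beneficial), order):
            positions = {pos for pos, _ in combo}
            if len(positions) == order:
                variants.append(combo)
    return variants


def rank_by_additive_score(variants, single_scores):
    """Sort variants by summed single-mutation energy change (lowest first)."""
    return sorted(variants, key=lambda combo: sum(single_scores[m] for m in combo))


# Hypothetical shortlist and predicted energy changes in kcal/mol (assumed).
beneficial = [(31, "W"), (31, "F"), (52, "R"), (100, "Y")]
single_scores = {(31, "W"): -0.8, (31, "F"): -0.3, (52, "R"): -0.5, (100, "Y"): -0.2}

variants = combine_mutations(beneficial)
ranked = rank_by_additive_score(variants, single_scores)
print(len(variants), ranked[0])
```

Additivity is a strong simplification; mutations at neighbouring positions can interact, which is one reason each combination is remodeled and rescored in practice.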

3 Protein Modular Design

The Modular Organization of Binding Sites


The Modular Distribution of Domain-Domain Binding

Why choose domains?

Domains form independent structural and functional units

Domains are building blocks that can be rearranged to create proteins with different functions

Domains are evolutionarily conserved: different organisms use the same domains in protein-protein interactions

Objective: large-scale topological analysis of binding domains

Dataset

Source: iPfam

330 protein domains

370 domain-domain interactions

Multiple alignments

5 organisms: E. coli, S. cerevisiae, C. elegans, D. melanogaster, H. sapiens

Binding site clustering:


Graph Modular Decomposition

Domains can be decomposed further into connectivity modules by clustering the domain contact map G(V, E, C)

Girvan-Newman algorithm [PNAS (2002)] with maximum modularity stop rule [Kashtan and Alon, PNAS (2005)]:

1 The betweenness of all existing edges in the network is calculated first. Edge betweenness: the number of shortest paths between pairs of nodes that run along the edge

2 The edge with the highest betweenness is removed

3 The betweenness of all edges affected by the removal is recalculated

4 Repeat 2 and 3 until the modularity Q for the K connected clusters in the network becomes maximum

$$Q = \sum_{s=1}^{K} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] \qquad (1)$$

l_s = number of edges between nodes in module s

d_s = sum of node degrees in module s

L = total number of edges in the network
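The four steps can be sketched in plain Python under a few stated assumptions: the graph is unweighted and undirected, given as a hypothetical adjacency dict `adj` mapping each node to a set of neighbours; betweenness is recomputed for all edges after each removal, which is simpler than updating only the affected ones; and rather than stopping at the first maximum, the sketch removes every edge and keeps the partition with the highest Q as defined in Eq. (1).

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness via breadth-first search from every node (Brandes style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, paths = {source: 0}, {source: 1.0}
        preds = {node: [] for node in adj}
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    paths[nb] = 0.0
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    preds[nb].append(node)
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for p in preds[node]:
                share = paths[p] / paths[node] * (1.0 + credit[node])
                score[frozenset((p, node))] += share
                credit[p] += share
    # Every node pair was visited from both ends, so halve the totals.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adj[node] - part)
        seen |= part
        parts.append(part)
    return parts


def modularity(parts, adj):
    """Q = sum_s [ l_s / L - (d_s / 2L)^2 ], evaluated on the original graph."""
    total_edges = sum(len(nbs) for nbs in adj.values()) / 2
    q = 0.0
    for part in parts:
        d_s = sum(len(adj[n]) for n in part)
        l_s = sum(1 for n in part for nb in adj[n] if nb in part) / 2
        q += l_s / total_edges - (d_s / (2 * total_edges)) ** 2
    return q


def girvan_newman(adj):
    """Remove highest-betweenness edges one by one; keep the best-Q partition."""
    work = {n: set(nbs) for n, nbs in adj.items()}
    best_parts = components(work)
    best_q = modularity(best_parts, adj)
    while any(work.values()):
        scores = edge_betweenness(work)
        u, v = max(scores, key=scores.get)
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        q = modularity(parts, adj)
        if q > best_q:
            best_parts, best_q = parts, q
    return best_parts, best_q


# Toy contact map (assumed): two triangles joined by the single edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
modules, q_max = girvan_newman(graph)
print(modules, round(q_max, 3))
```

On this toy graph the bridge edge carries all shortest paths between the two triangles, so it is removed first and the triangles are recovered as modules.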


Modularity

Modularity Q_s is a measure of how tightly members of a module s interact

$$Q_s = \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \qquad (2)$$

l_s = number of edges between nodes in module s
d_s = sum of node degrees in module s
L = total number of edges in the network

l_s / L: fraction of edges in the network that connect vertices in the module s

(d_s / 2L)^2: the expected value of the same quantity if edges fall at random

$$\hat{l}_s = \frac{d_s}{2}\, p_s = \frac{d_s}{2} \cdot \frac{d_s/2}{L} \qquad (3)$$

p_s: probability of an edge to connect nodes in module s

In a randomly partitioned network, the expected modularity is \hat{Q}_s = 0
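A small worked example may help fix the notation. The network is an assumption chosen for illustration: two triangles joined by a single bridging edge, so L = 7, and each triangle taken as a module s has l_s = 3 and d_s = 2 + 2 + 3 = 7.

```latex
\begin{align*}
Q_s &= \frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2
     = \frac{3}{7} - \left(\frac{7}{14}\right)^2
     = \frac{3}{7} - \frac{1}{4} = \frac{5}{28} \approx 0.179 \\
\hat{l}_s &= \frac{d_s}{2}\cdot\frac{d_s/2}{L}
     = \frac{3.5 \times 3.5}{7} = 1.75,
     \qquad \frac{\hat{l}_s}{L} = 0.25 = \left(\frac{d_s}{2L}\right)^2 \\
Q &= \sum_{s=1}^{2} Q_s = 2 \times \frac{5}{28} = \frac{5}{14} \approx 0.357
\end{align*}
```

Each triangle holds 3 internal edges where 1.75 would be expected at random, which is what makes Q_s positive.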


Binding Site and Modular Overlaps

Modular composition of binding site j:

$$m_j = (m_{j1}, m_{j2}, \ldots, m_{jM}) \qquad (4)$$

Similarity in modular composition between binding sites i and j:

$$M(i, j) = \frac{\sum_{k=1}^{M} m_{ik}\, m_{jk}}{|m_i|\,|m_j|} \qquad (5)$$

Relative interface between i and j:

$$C(i, j) = \frac{1}{2} \left[ \frac{n_i}{N_i} + \frac{n_j}{N_j} \right] \qquad (6)$$

n_i (n_j): number of residues in i (j) with contacts in j (i)
N_i (N_j): number of residues in binding site i (j)

Kringle domain (PF00051)

Binding site A (blue)

Binding site B (red)

$$C(A, B) = \frac{1}{2} \left( \frac{4}{10} + \frac{3}{8} \right) \qquad (7)$$

$$M(A, B) = \frac{(2, 8, 0, 0, 0) \cdot (0, 2, 3, 3, 0)^T}{\sqrt{68}\,\sqrt{22}} \qquad (8)$$
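Both overlap measures are short enough to evaluate directly. The sketch below recomputes the Kringle-domain example from the vectors and residue counts given above; the function names `modular_similarity` and `relative_interface` are hypothetical labels introduced here.

```python
import math


def modular_similarity(m_i, m_j):
    """M(i, j): cosine similarity between two modular-composition vectors."""
    dot = sum(a * b for a, b in zip(m_i, m_j))
    norm_i = math.sqrt(sum(a * a for a in m_i))
    norm_j = math.sqrt(sum(b * b for b in m_j))
    return dot / (norm_i * norm_j)


def relative_interface(n_i, size_i, n_j, size_j):
    """C(i, j): mean fraction of residues of each site in contact with the other."""
    return 0.5 * (n_i / size_i + n_j / size_j)


# Values read from the Kringle domain (PF00051) example.
m_a = (2, 8, 0, 0, 0)
m_b = (0, 2, 3, 3, 0)

c_ab = relative_interface(4, 10, 3, 8)
m_ab = modular_similarity(m_a, m_b)
print(round(c_ab, 4), round(m_ab, 4))
```

Only the second module is shared between the two sites, so the similarity comes entirely from the product 8 × 2 in the numerator.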


The Modular Organization of Domain-Domain Interfaces

Non-overlapping binding sites are assigned to different modules

Modules with high modularity Q contain a significant percentage of binding site regions

[Del Sol, Carbonell, PLOS Comp. Biology, (2007)]


Using Modularity to Identify Binding Regions

Modularity can be used to identify binding surfaces

    Accuracy and coverage of modularity and surface hydrophobic patches are greater than those of residue conservation
    Combining modularity with the other two methods notably improves performance


Intra-Module Cooperativity and Inter-Module Independence

Human IL-4: a cytokine that plays a regulatory role in the immune system

IL-4 contains 3 energetically independent clusters of hot-spots located in 3 modules

These hot-spots can be used to generate binding affinity and specificity


Intra-Module Cooperativity and Inter-Module Independence

TEM-1 β-lactamase confers antibiotic resistance to E. coli

This enzyme is inhibited by BLIP

A mutagenesis study showed that there are 2 hot-spot clusters which are energetically independent

These clusters are located in different modules


Intra-Module Cooperativity and Inter-Module Independence

TCR hVβ2.1 (TSST-1 antibody). 2 cooperative distant clusters of hot-spots around the binding site, located in 1 module

CI-2 (chymotrypsin inhibitor 2, a serine protease inhibitor). A cluster of hot-spots located far away from the binding interface

hGHbp (human growth hormone binding protein). Cooperative hot-spots distant from the binding site

RI (ribonuclease inhibitor). Hot-spots located in different modules are known to be independent


Modularity as a Measure of Residue Cooperativity

Protein domains can be decomposed into a set of modules that contain groups of specialized residues

Binding sites are usually located in highly cooperative modules

Modularity, combined with sequence conservation and surface patches, can be used to predict functional regions

This modular architecture confers robustness to protein structures and contributes to the determination of binding affinity and specificity


Energetic Determinants of Protein Binding Affinity

The modular decomposition of protein structures is a structural characterization of protein interactions

To learn more about the interplay between binding affinity and specificity, a thermodynamic characterization is necessary

In this study we focus on one specific interactome: the yeast interactome (main source: MIPS)

Structural interactome: 259 hubs (>5 partners) participating in 877 different interactions


Binding Site Clustering

Single and multiple interfaces

Binding sites correspond to residues interacting with the partner at a distance ≤ 5 Å

Binding sites are mapped onto the reference sequence of the hub and clustered using a version of the algorithm in Teyra et al. [2008]

1 Compute the N × N binary distance matrix D where

$$D(i, j) = \delta_{ij} = \begin{cases} 1 & i \cap j \neq \emptyset \\ 0 & i \cap j = \emptyset \end{cases} \qquad (9)$$

2 Start with k = N clusters

3 Compute the (k − 1)-means clustering of D

4 Recompute D for the k − 1 clusters

5 Repeat step 3 while all binding sites within clusters overlap

Total interfaces: 539, involved in 1 to 5 interactions
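A simplified sketch of the grouping idea, with one substitution that should be stated plainly: instead of the iterated k-means step described above, it uses a greedy merge that joins two clusters only when every binding site in one overlaps every binding site in the other. The overlap matrix follows Eq. (9); the residue sets and function names are hypothetical.

```python
def overlap_matrix(sites):
    """D[i][j] = 1 if binding sites i and j share at least one residue, else 0."""
    n = len(sites)
    return [[1 if sites[i] & sites[j] else 0 for j in range(n)] for i in range(n)]


def cluster_sites(sites):
    """Greedily merge clusters whose binding sites all overlap pairwise."""
    d = overlap_matrix(sites)
    clusters = [[i] for i in range(len(sites))]  # start with k = N clusters
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if all(d[i][j] for i in clusters[a] for j in clusters[b]):
                    clusters[a] += clusters[b]
                    del clusters[b]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical binding sites of one hub, as sets of residue numbers.
sites = [{10, 11, 12}, {12, 13}, {11, 12, 14}, {40, 41}, {41, 42}]
interfaces = cluster_sites(sites)
print(interfaces)
```

Each resulting cluster corresponds to one interface of the hub, and the number of sites in a cluster is the number of interactions that interface takes part in.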


Protein Binding Affinity and Specificity

Binding energies and alanine scanning for each complex estimated using FoldX [Schymkowitz et al., 2005]

Specific binding sites tend to bind their partners with higher affinity than promiscuous sites

Interactions between promiscuous binding sites tend to be weaker

Interaction type            −ΔG [(kcal/mol)/residue]
Specific-specific           0.93
Promiscuous-promiscuous     0.85
Specific-promiscuous        0.50


Hot-Spots and Partner Motifs

A hot-spot: $|\Delta\Delta G_{bind}| = |\Delta G_{MUT \to ALA} - \Delta G_{WT}| \geq 2$ kcal/mol

Are hot-spots specific?

    In most cases, hot-spots are specific to one interaction
    Some of them are promiscuous

Binding site motifs of interacting partners are determinants of specificity

As the promiscuity of the hot-spots increases, the number of common motifs in the partners increases

A common evolutionary origin of divergent partners in promiscuous binding

Number of interactions in hot-spots    Average number of common motifs interacting with hot-spots
1                                      1.4
2                                      2.5
3                                      3.0
4                                      4.0
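The hot-spot criterion translates into a one-line filter over alanine-scanning results. In this sketch the binding energies are invented numbers and `find_hot_spots` is a hypothetical helper; only the 2 kcal/mol threshold comes from the definition above.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, from the hot-spot definition


def find_hot_spots(dg_wild_type, dg_alanine_mutants, threshold=HOT_SPOT_THRESHOLD):
    """Return {residue: ddG} for residues with |ddG_bind| >= threshold."""
    hot = {}
    for residue, dg_mutant in dg_alanine_mutants.items():
        ddg = dg_mutant - dg_wild_type
        if abs(ddg) >= threshold:
            hot[residue] = round(ddg, 2)
    return hot


# Assumed binding free energies (kcal/mol) for a wild-type complex and
# four alanine mutants; these are illustrative values, not FoldX output.
dg_wt = -12.0
dg_ala = {"Y32": -9.1, "S52": -11.6, "W100": -8.5, "D54": -10.5}

hot_spots = find_hot_spots(dg_wt, dg_ala)
print(hot_spots)
```

Repeating the filter for each partner of a hub and counting how many interactions share a given residue gives the "number of interactions in hot-spots" column of the table.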


Hot-spots Modular Distribution and Specificity

We have already shown examples of energetic independence of hot-spots in modules

Furthermore, the relative number of binding site modules containing hot-spots increases with the number of partners

A small fraction of hot-spots participate in more than one interaction, probably acting as binding site anchors

[Carbonell, Nussinov, Del Sol, Proteomics, 2009]


Modular Distribution of Hot-spots and Specificity

Ubiquitin. A promiscuous protein with weak interactions

cdc42 GTPase. It contains a central module acting as a site anchor

Cytochrome b. An example of a specific binding site

Calmodulin-dependent kinase. An example of a specific binding site


The Role of Thermodynamics in Promiscuous Binding

In general, protein-protein interactions involving promiscuous binding sites are weaker

Proteins generally interact with partners with a similar degree of promiscuity

Hot-spots in promiscuous binding sites tend to be more distributed over different modules

Knowing the modular distribution of hot-spots involved in different interactions might allow us to rationally modify binding specificity and affinity


Large-scale Analysis Workflow

4 Protein Promiscuity Reengineering

Applications in Synthetic Biology: Design of Metabolic Pathways

The Bio-RetroSynth project

ANR Chair d’Excellence, Faulon’s Lab


Tasks in the Bio-RetroSynth project

Bioretrosynthesis. Graphs for heterologous compound production in E. coli

Computational protein design. Machine learning to mine genomic databases for predicting protein function

Pathway design. Rank pathways to select the best to engineer

Quantitative Structure-Activity Relationship (QSAR) for enzyme activity and inhibition based on experimental databases and toxicity assays.

Metabolic engineering. E. coli plasmids to construct combinatorial libraries of the highest-ranked heterologous pathways found to produce a target product

Engineering optimization. Flux Balance Analysis (FBA) and non-linear optimization methods to maximize target yield


The Signature Reaction Space σ(R)


Examples of Retrosynthesis Graphs in the Reaction Signature Space

RetroPath: an online tool for retrosynthesis search of metabolic pathways

[D. Fichera, P. Carbonell, J.L. Faulon, Predicting heterologous compound-forming reaction pathways through retrosynthesis hypergraphs, in preparation]

Penicillin (antibiotic)

Galantamine (treatment of Alzheimer’s disease)


Ranking Pathways

Gene heterogeneity

Heterologous gene expression

Enzyme performance for the specified reaction

Compound toxicity

Estimation of nominal fluxes

Consistency of the predicted phenotype

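$$C(p) = \sum_{\mathrm{genes}(p)} \left( \frac{1}{\mathrm{perf}(\mathrm{gene})} + \mathrm{het}(\mathrm{gene}) + \sum_{\mathrm{prod}(\mathrm{gene})} \mathrm{tox}(\mathrm{prod}) \right) + \frac{1}{\mathrm{flux}} \qquad (10)$$

$$p^* = \arg\min_p C(p) \qquad (11)$$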
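Equations (10) and (11) can be read as a plain sum followed by a minimum. The sketch below assumes each pathway is described by a hypothetical dict holding its nominal flux and, per gene, an enzyme performance score, a heterology penalty, and the toxicities of the gene's products; all numbers are invented for illustration, and how each term is estimated is outside the scope of the sketch.

```python
def pathway_cost(pathway):
    """C(p) = sum over genes of (1/perf + het + sum of product tox) + 1/flux."""
    gene_terms = sum(
        1.0 / gene["perf"] + gene["het"] + sum(gene["tox"])
        for gene in pathway["genes"]
    )
    return gene_terms + 1.0 / pathway["flux"]


def best_pathway(pathways):
    """p* = argmin_p C(p): name of the pathway with the lowest cost."""
    return min(pathways, key=lambda name: pathway_cost(pathways[name]))


# Two hypothetical candidate pathways (assumed values).
pathways = {
    "A": {
        "flux": 0.5,
        "genes": [
            {"perf": 0.8, "het": 1.0, "tox": [0.1]},
            {"perf": 0.5, "het": 0.0, "tox": [0.2, 0.1]},
        ],
    },
    "B": {
        "flux": 0.25,
        "genes": [{"perf": 1.0, "het": 1.0, "tox": [0.0]}],
    },
}

costs = {name: pathway_cost(p) for name, p in pathways.items()}
winner = best_pathway(pathways)
print(costs, winner)
```

Because the terms are summed without weights, they need comparable scales; a low flux can outweigh the advantage of a short pathway, as the second candidate nearly shows.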


Predicting Compound Toxicity

MIC (IC50) assays in E. coli for commercial chemical compounds, including antibiotics

Molecular signature-based QSAR model

[A.G. Planson, E. Paillard, F. Vogliolo, P. Carbonell, J.L. Faulon, unpublished]


Enzyme Performance

Putative reactions R∗ discovered in the signature space ^hσ(R) by the retrosynthesis algorithm often lack annotated enzyme sequences in databases

A protein design procedure has to be implemented in order to identify the best heterologous enzyme sequence candidate to insert

Conceptually, the idea is to define
    a metric in the reaction σ(R) and sequence σ(S) signature spaces
    a convolution operation * between both spaces that generates the kernel function k((R1, S1), (R2, S2))
    a machine-learning algorithm

In practical terms, we are searching in the sequence space S for enzymes with a putative level of promiscuity for the desired reaction R∗


Taking Advantage of Enzyme Promiscuity in Protein Engineering

Enzymes can potentially process multiple substrates or reactions

We can study enzyme promiscuity to enhance enzyme efficiency by protein engineering techniques

Enzyme promiscuity is an intermediate step in directed evolution

[Tracewell and Arnold, 2009]


A Quantitative Definition of Enzyme Promiscuity

Definitions

Enzyme multispecificity: the ability of enzymes to transform a broad range of closely related substrates

Promiscuous function: enzyme activities other than the native one

Using reaction signatures to measure promiscuity:

An enzyme is promiscuous if it catalyzes at least 2 reactions with different signatures

Reaction chemical diversity for reactions R_A and R_B at height h:

$${}^h d(R_A, R_B) = 1 - \frac{\| {}^h\sigma(R_A) \cdot {}^h\sigma(R_B) \|}{\| {}^h\sigma(R_A) \|^2 + \| {}^h\sigma(R_B) \|^2 - \| {}^h\sigma(R_A) \cdot {}^h\sigma(R_B) \|} \qquad (12)$$

Depending on the chosen h range, it is possible to distinguish between catalytic promiscuity and substrate specificity
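Equation (12) has the form of a Tanimoto distance between two count vectors. A minimal sketch, assuming each reaction signature is stored as a sparse dict from a fragment label to its count; the labels and counts below are placeholders, not real molecular signatures.

```python
def tanimoto_distance(sig_a, sig_b):
    """1 - (a.b) / (|a|^2 + |b|^2 - a.b) for sparse count vectors.

    Both signatures must be non-zero, otherwise the denominator vanishes.
    """
    dot = sum(count * sig_b.get(key, 0) for key, count in sig_a.items())
    norm_a = sum(count * count for count in sig_a.values())
    norm_b = sum(count * count for count in sig_b.values())
    return 1.0 - dot / (norm_a + norm_b - dot)


# Placeholder signatures (assumed): fragment label -> count.
reaction_a = {"x": 1, "y": 2}
reaction_b = {"y": 2, "z": 1}
reaction_c = {"w": 3}

d_ab = tanimoto_distance(reaction_a, reaction_b)
d_aa = tanimoto_distance(reaction_a, reaction_a)
d_ac = tanimoto_distance(reaction_a, reaction_c)
print(round(d_ab, 3), d_aa, d_ac)
```

The distance is 0 for identical signatures and 1 for signatures with no fragment in common, so an enzyme whose reactions all sit at distance 0 at a given height h would not count as promiscuous at that height.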


Catalytic and Substrate Promiscuity

Given two reactions R_A and R_B that an enzyme can process:

The enzyme has catalytic promiscuity if

$${}^{1}\sigma(R_A) \neq {}^{1}\sigma(R_B) \qquad (13)$$

(We look at the bonds that are created and/or broken by the chemical transformation)

The enzyme has substrate promiscuity if

$${}^{0-3}\sigma(R_A) \neq {}^{0-3}\sigma(R_B) \qquad (14)$$

(We look at the chemical structures of the substrates)


Molecular Signatures-Based Prediction of Enzyme Promiscuity

Building the dataset


Support Vector Machine Algorithm

Signature space is high-dimensional:

2-mers: 20^2

3-mers: 20^3

4-mers: 20^4

...

The SVM algorithm selects the weighted combination of data points (support vectors) that performs the best separation

From the support vectors we compute the contribution, or α-value, of each signature to the prediction of promiscuity
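To make the k-mer signature space concrete, the sketch below counts overlapping k-mers in a sequence and evaluates a linear kernel between two such count vectors. It assumes the sequence signature is a plain k-mer count vector; the sequences are made up, and the SVM training itself is not shown.

```python
from collections import Counter


def kmer_signature(sequence, k):
    """Sparse count vector of all overlapping k-mers in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k):
    """Dot product of two k-mer count vectors (a linear kernel on signatures)."""
    sig_a, sig_b = kmer_signature(seq_a, k), kmer_signature(seq_b, k)
    return sum(count * sig_b[kmer] for kmer, count in sig_a.items())


# Made-up peptide fragments (assumed), using k = 4 as in the 4-mer case.
signature = kmer_signature("ALAALAA", 4)
similarity = spectrum_kernel("ALAALAA", "ELAALAA", 4)
dimension = 20 ** 4  # size of the full 4-mer space

print(dict(signature), similarity, dimension)
```

Only k-mers that actually occur are stored, which is what keeps a space of 160,000 possible 4-mers manageable for sequences a few hundred residues long.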


Performance of the SVM Predictor

Accuracy reaches 85% for the whole dataset
    Eukaryotes 88%
    Prokaryotes 87%

4-mer    α-value    frequency [%]
ALAA     10.9       13.9
AVAA     10.4       12.7
LAAA     11.3       11.4
ELAA     11.5       10.9
...      ...        ...

Distance to catalytic residues (Catalytic Site Atlas)

The distribution of top k-mers provides insight into promiscuous active regions of the enzyme

Top k-mers are depleted around catalytic sites of non-promiscuous enzymes


Secondary Structure Around Catalytic Sites

Average deviation from random

Secondary structure distribution

                    Beta      Helix     Loop
All residues        15.69%    40.64%    43.67%
Catalytic sites     23.79%    32.15%    44.05%
Non-promiscuous     20.85%    33.65%    45.50%
Promiscuous         30.00%    29.00%    41.00%

Helices are in general underrepresented in catalytic residues

Beta strands are significantly overrepresented in promiscuous enzymes


Top k-mers in Promiscuity


Application: Reverse Engineering of a Promiscuous Transaminase

Promiscuity induced by directed evolution [Rothman and Kirsch, 2003]:

AATase (EC 2.6.1.1) → TATase (EC 2.6.1.5)

Signatures (k-mers) with highest α-value change

[Carbonell, P., Faulon, J.L., Bioinformatics, 2010]

5 Conclusions

Conclusions

Computational analysis of biological networks can provide insights into the mechanisms of protein binding affinity and specificity

We use molecular graph descriptors in combination with systems-level characteristics to train machine-learning predictors of protein activity

Applications

Protein optimization

Understanding protein function and evolution

Design of synthetic biological circuits


Acknowledgments

University of Evry / Genopole, iSSB - Faulon’s Lab: Metabolic Engineering & Synthetic Biology
Jean-Loup Faulon, Anne-Gaelle Planson, Davide Fichera, Ioana Popescu, Julio Peyroncely, Elodie Paillard, Florence Vogliolo, Chloe Sarnowski, Antoine Decrulle

Fujirebio: Structural Bioinformatics
Antonio del Sol, Hirotomo Fujihashi, Dolors Amoros, Marcos Arauzo-Bravo

Swiss Institute of Bioinformatics: Peptide identification in HPLC/MS
Ron D. Appel, Alexandre Masselot

National Museum of Natural History: Promiscuity & Evolution
Guillaume Lecointre

National Cancer Institute (NIH): Hot-spots & Specificity
Ruth Nussinov

University of North Carolina: NMR spectroscopy
Andrew Lee

Polytechnic University of Valencia: Computational Intelligence
Jose Luis Navarro, Adolfo Hilario

Polytechnic Institute of NYU: Nonlinear dynamics
Zhong-Ping Jiang, Shiwendra Panwar


Bibliography

S. C. Rothman and J. F. Kirsch. How does an enzyme evolved in vitro compare to naturally occurring homologs possessing the targeted function? Tyrosine aminotransferase from aspartate aminotransferase. Journal of Molecular Biology, 327(3):593–608, March 2003. ISSN 0022-2836. URL http://view.ncbi.nlm.nih.gov/pubmed/12634055.

Joost Schymkowitz, Jesper Borg, Francois Stricher, Robby Nys, Frederic Rousseau, and Luis Serrano. The FoldX web server: an online force field. Nucleic Acids Research, 33(Web Server issue), July 2005. ISSN 1362-4962. doi: 10.1093/nar/gki387. URL http://dx.doi.org/10.1093/nar/gki387.

Joan Teyra, Maciej Paszkowski-Rogacz, Gerd Anders, and M. Teresa Pisabarro. SCOWLP classification: structural comparison and analysis of protein binding regions. BMC Bioinformatics, 9:9+, January 2008. ISSN 1471-2105. doi: 10.1186/1471-2105-9-9. URL http://dx.doi.org/10.1186/1471-2105-9-9.

Cara A. Tracewell and Frances H. Arnold. Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Current Opinion in Chemical Biology, 13(1):3–9, February 2009. ISSN 1879-0402. doi: 10.1016/j.cbpa.2009.01.017. URL http://dx.doi.org/10.1016/j.cbpa.2009.01.017.
