Page 1: A network-based representation of protein fold space

A network-based representation of protein fold space

Spencer Bliven

Qualifying Examination 6/6/2011

Page 2: A network-based representation of protein fold space

Overview

1. Background & Motivation
2. Preliminary Research
3. Proposed Future Research

Page 3: A network-based representation of protein fold space

Fold Space

What protein folds are possible?
Discrete or continuous? Both? Neither?
What portion of fold space is utilized by nature?
Long-debated questions. Why?
  Understanding of structure-function relationships
  Protein design/engineering
  Protein evolution
  Classification

Page 4: A network-based representation of protein fold space

Previous Work

Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp. 485-500

Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp. 123-38

Holm and Sander. Science (1996) vol. 273 (5275) pp. 595-603

Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60

Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp. 2386-90

Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61

Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8

Figure labels: α, α+β, β, α/β

Page 5: A network-based representation of protein fold space

Why can we do better?

More structures
Sampling of globular folds “saturated”
  Few novel folds being discovered
  Geometric arguments for saturation of small protein folds
Recent all-vs-all computation
  Cluster sequences to 40% identity
  17,852 representatives (updated weekly)
  189 million FATCAT rigid-body alignments

73503

Source: http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100 (accessed 5/31/2011)

Page 6: A network-based representation of protein fold space

Structural Similarity Graph

Nodes: PDB chains, non-redundant to 40%
Edges: FATCAT-rigid alignments
“Significant” edges:
  p < 0.001
  Length > 25
  Coverage > 50

Hierarchically cluster to reduce complexity in visualization

Legend: a, b, a/b, a+b, Multi, Membrane, Small
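The edge filter on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: the `Alignment` record, its field names, and the toy chain identifiers are all hypothetical, and the coverage threshold of 50 is assumed to be a percentage.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one pairwise alignment result."""
    chain_a: str
    chain_b: str
    p_value: float
    length: int       # number of aligned residues
    coverage: float   # assumed to be a percentage (0-100)


def is_significant(aln, max_p=0.001, min_length=25, min_coverage=50.0):
    """Apply the three thresholds listed on the slide."""
    return (aln.p_value < max_p
            and aln.length > min_length
            and aln.coverage > min_coverage)


def build_graph(alignments):
    """Return an undirected adjacency map containing only significant edges."""
    graph = defaultdict(set)
    for aln in alignments:
        if is_significant(aln):
            graph[aln.chain_a].add(aln.chain_b)
            graph[aln.chain_b].add(aln.chain_a)
    return dict(graph)


# Toy input with made-up chain names and scores.
alignments = [
    Alignment("n1", "n2", 1e-5, 120, 80.0),  # passes all thresholds
    Alignment("n1", "n3", 0.01, 90, 70.0),   # fails the p-value threshold
    Alignment("n2", "n3", 1e-4, 20, 90.0),   # fails the length threshold
    Alignment("n3", "n4", 1e-6, 60, 55.0),   # passes all thresholds
]
graph = build_graph(alignments)
```

Keeping the thresholds as keyword arguments makes it easy to rebuild the graph under different cutoffs.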

Page 7: A network-based representation of protein fold space

Agreement with SCOP

Class: p < 10^-6

Fold: p < 10^-7

Superfamily: p < 10^-10

Page 8: A network-based representation of protein fold space

Continuity

Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. 167-85

Skolnick claims ≤ 7 intermediates between any two proteins

We observe network diameter = 15

Can find interesting paths
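For an unweighted graph, both the diameter and a path between two nodes can be found with breadth-first search. The sketch below assumes the network is stored as an adjacency map and uses a made-up five-node graph. It reports the longest shortest path within any connected component, which is one common way to define the diameter when a graph is not fully connected.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Longest shortest path found within any connected component."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)


def shortest_path(graph, start, goal):
    """One shortest path from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Toy graph: a chain a-b-c-d with an extra node e attached to b.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
toy_diameter = diameter(toy_graph)
toy_path = shortest_path(toy_graph, "a", "d")
intermediates = len(toy_path) - 2  # nodes strictly between the endpoints
```

Running one search per node costs O(V·(V+E)) overall, which is feasible for a graph of this size but may call for sampling on much larger networks.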

Page 9: A network-based representation of protein fold space

Symmetry

Beta Propellers: C4, C5, C6, C7

Page 10: A network-based representation of protein fold space

Symmetry

Functionally important
  Protein evolution (e.g. beta-trefoil)
  DNA binding
  Allosteric regulation
  Cooperativity
Widespread (~20% of proteins)
Focus of algorithmic work

FGF-1 (Lee & Blaber. PNAS 2011)

TATA Binding Protein (1TGH)

Hemoglobin (4HHB)

Page 11: A network-based representation of protein fold space

Cross-class example

3GP6.A
  PagP, modifies lipid A
  f.4.1 (transmembrane beta-barrel)

1KT6.A
  Retinol-binding protein
  b.60.1 (Lipocalins)

Page 12: A network-based representation of protein fold space

Summary of Preliminary Research

Calculated all-vs-all alignment
  Prlić A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985
Built network of significant alignments
  Approximately matches SCOP classifications
Improved structural alignment algorithms
  Identify symmetry, circular permutations, topology-independent alignments
  Discussed more in report

Page 13: A network-based representation of protein fold space

Future Research

Improve the network
  1. Improve all-vs-all comparison algorithm
  2. Tune parameters during graph generation
Annotate the network & draw biological inferences
  3. Annotate nodes with functional information
  4. Compare with other networks
Create new networks
  5. Enhance structural comparison algorithms

Page 14: A network-based representation of protein fold space

1. Improve all-vs-all comparison algorithm

Need domain decomposition
Use Combinatorial Extension (CE)

Page 15: A network-based representation of protein fold space

2. Tune parameters during graph generation

Don’t use p-values
  Shouldn’t compare p-values, statistically*
  Not normalized by secondary structure
  Not accurate due to multiple testing problem
Use TM-score
  RMSD, normalized to the alignment length
Determine optimal thresholds for determining “significance”
  For instance, train an SVM

* Technically ok here, since one-to-one with the FATCAT score
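As a reference point for the TM-score item, the sketch below evaluates the standard Zhang and Skolnick formula for one fixed superposition: TM = (1/L) Σ 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L − 15)^(1/3) − 1.8, where L is the length of the protein used for normalization and d_i are distances between aligned residue pairs. The function names are illustrative. The full TM-score also maximizes over superpositions, which is omitted here, and very short proteins (where d0 would not be positive) are rejected rather than handled specially.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstroms) used by TM-score."""
    if target_length <= 15:
        raise ValueError("target_length must exceed 15 in this sketch")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("d0 is not positive for such a short protein")
    return d0


def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: distances (Angstroms) between aligned residue pairs.
    target_length: length of the protein used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect full-length match scores 1; pairs that are d0 apart each count 0.5.
perfect = tm_score([0.0] * 100, 100)
half_covered = tm_score([tm_d0(100)] * 50, 100)
```

Because the sum is divided by the target length instead of the alignment length, unaligned residues lower the score, unlike a raw RMSD over the aligned pairs only.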

Page 16: A network-based representation of protein fold space

FATCAT p-value by Class

Performs poorly on all-alpha in “twilight zone”

Terrible on membrane proteins
  Probably reflects non-structural considerations in SCOP assignment

Page 17: A network-based representation of protein fold space

3. Annotate nodes with functional information

SCOP/CATH classifications
GO terms
Metal binding
Ligand binding
Symmetry

Legend: a, b, a/b, a+b, Multi, Membrane, Small

Page 18: A network-based representation of protein fold space

4. Compare with other networks

Define other types of network over the set of protein representatives
  Protein-protein interactions
  Co-expression

Correlate to the structural similarities

Structural similarity

Protein-protein interaction
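One simple way to relate two networks defined over the same set of proteins is to measure how much their edge sets overlap. The sketch below uses the Jaccard index for that purpose. This is an assumption made for illustration, not a measure named in the talk, and the protein labels are made up.

```python
def undirected_edges(edges):
    """Normalize (u, v) pairs so that direction does not matter."""
    return {frozenset(edge) for edge in edges}


def edge_jaccard(edges_a, edges_b):
    """Shared edges divided by the total number of distinct edges."""
    a = undirected_edges(edges_a)
    b = undirected_edges(edges_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy networks over the same four hypothetical proteins.
structural = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
interaction = [("p2", "p1"), ("p3", "p4"), ("p1", "p4")]
overlap = edge_jaccard(structural, interaction)
```

A real comparison would also need a null model, for example shuffling node labels, to judge whether the observed overlap exceeds what chance would produce.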

Page 19: A network-based representation of protein fold space

5. Enhance structural comparison algorithms

Improve automated pseudo-symmetry detection

Find topology-independent relationships

C3

Page 20: A network-based representation of protein fold space

Summary

Fold space as network
Improve network creation
Annotate network with functional information
Improve structural similarity detection

Page 21: A network-based representation of protein fold space

Acknowledgments

Bourne Lab
  Philip Bourne
  Andreas Prlić
  Lab & PDB members

Qualifying Exam Committee
  Ruben Abagyan
  Patricia Jennings
  Andy McCammon

Collaborators
  Philippe Youkharibache
  Jean-Pierre Changeux

Rotation Advisors
  Pavel Pevzner
  Philip Bourne
  José Onuchic & Pat Jennings
  Mike MacCoss
  Virgil Woods