Upload: howard-bond

Post on 29-Jan-2016


Lessons from CASP targets

ShuoYong Shi, Lisa Kinch, Jimin Pei, Ruslan Sadreyev, and Nick V. Grishin

Howard Hughes Medical Institute, Department of Biochemistry,

University of Texas Southwestern

Medical Center at Dallas

http://prodata.swmed.edu/CASP8

1. New folds: 397_1, 496_1;

2. A few known folds are predicted no better than new folds: 460, 407_2;

3. Short motif recognition = success: 465;

4. Short motif recognition = failure: 467;

5. Structural changes not predicted: 510;

6. Inspect your alignments carefully: 480

NF – new fold – historic category in CASP

New fold: were there any?

2008 – where did the new folds go?

176 domains: 2 possibly new folds: ~1%

N-domain of T0397: 3d4r chain A, residues -7 to 82

New fold #1: N-domain of T0397

First models for T0397_1: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one server's first model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest, respectively. A family of curves with varying bandwidth is shown. The bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively.

First server models for T0397_1
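The family of curves described in the caption can be sketched in a few lines of plain Python. This is only an illustration of Gaussian kernel density estimation over a bandwidth ramp, not the authors' plotting code: the function names, the evaluation grid and the example scores are assumptions, and the scores are made up rather than taken from CASP8.

```python
import math

def gaussian_kde(scores, bandwidth, x):
    """Density at x: the mean of Gaussians (std = bandwidth) centred on each score."""
    norm = bandwidth * math.sqrt(2.0 * math.pi) * len(scores)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores) / norm

def bandwidth_ramp(bw_min=0.3, bw_max=8.2, step=0.1):
    """Bandwidths from bw_min to bw_max inclusive, rounded to one decimal."""
    count = int(round((bw_max - bw_min) / step)) + 1
    return [round(bw_min + i * step, 1) for i in range(count)]

def kde_family(scores, grid, bandwidths):
    """One density curve per bandwidth, each evaluated on the same grid."""
    return {h: [gaussian_kde(scores, h, x) for x in grid] for h in bandwidths}

# Hypothetical first-model GDT-TS scores (illustration only, not CASP8 data).
example_scores = [12.5, 14.0, 15.2, 15.8, 16.1, 18.4, 22.0, 31.7]
grid = [i * 0.1 for i in range(1001)]   # 0 to 100 GDT-TS units
bandwidths = bandwidth_ramp()           # 0.3, 0.4, ..., 8.2
family = kde_family(example_scores, grid, bandwidths)

# The three curves the caption draws thicker.
thick_curves = {h: family[h] for h in (1.0, 2.0, 4.0)}
```

Narrow bandwidths resolve individual servers, while wide ones show only the overall shape of the score distribution, which is presumably why the whole ramp is drawn rather than a single curve.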

Structure and topology diagrams of the ferredoxin fold – the fold closest to T0397_1

Most similar: ferredoxin-like fold

N-domain of T0496: 3do9 chain A, residues 4-126

New fold #2: N-domain of T0496

First models for T0496_1: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one server's first model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest, respectively. A family of curves with varying bandwidth is shown. The bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively.

First server models for T0496_1
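The bar coloring used in these score spectra (green for the top 10 servers, gray for the bottom 25%, black for the rest) can be written as a small ranking rule. The sketch below is one reading of the caption, not the authors' code: the function name, the tie handling (stable sort order), the rounding of the bottom quarter (rounded down) and the precedence of green over gray are all assumptions.

```python
def color_bars(scores, top_n=10, bottom_fraction=0.25):
    """Return one color per score: green (top_n best), gray (worst fraction), black (rest)."""
    # Indices of the scores ordered from best to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_bottom = int(len(scores) * bottom_fraction)  # assumption: round down
    colors = ["black"] * len(scores)
    for i in order[len(order) - n_bottom:]:
        colors[i] = "gray"
    # Assumption: if the two groups overlap (very few servers), green wins.
    for i in order[:top_n]:
        colors[i] = "green"
    return colors

# Hypothetical scores for 40 servers (illustration only).
example_scores = [float(s) for s in range(40)]
example_colors = color_bars(example_scores)
```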

Structure and topology diagrams of the RNase H fold – the fold closest to T0496_1

Most similar: RNase H fold

E.g.#1: T0460

Known fold: some predicted no better than new!

First models for T0460: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one server's first model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest, respectively. A family of curves with varying bandwidth is shown. The bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively.
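For readers unfamiliar with the score on the horizontal axis: GDT-TS is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of model residues whose Cα atom lies within the cutoff of its position in the target structure. The sketch below is a simplification and not the official scoring program: it assumes the per-residue distances come from a single, already chosen superposition, whereas the real score searches for the best superposition at each cutoff. The function name and the example distances are hypothetical.

```python
def gdt_ts(ca_distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0 to 100) from Calpha distances of one fixed superposition."""
    n = len(ca_distances)
    if n == 0:
        raise ValueError("need at least one residue")
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(1 for d in ca_distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical per-residue distances in angstroms (illustration only).
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
example_score = gdt_ts(example_distances)  # fractions 0.2, 0.4, 0.6, 0.8
```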

T0460: very difficult target

Cartoon diagram of 460: 2k4n model 1 residues 1-52,67-10

Jumping through 20 NMR models of 2k4n

Cartoon diagram of 460: 2k4n model 1 residues 1-52,67-10

Cartoon diagram of NADH-quinone oxidoreductase:

2fug chain 5 residues 1-106

T0460 is homologous to Nqo5

This homologous template was NOT FOUND BY ANY SERVER!

Why?

Singleton sequence!
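Why a singleton hurts template detection can be illustrated with a toy profile. Assuming "singleton" here means a sequence with no detectable homologs in sequence databases, a profile built from it contains only the query itself, so every column is perfectly "conserved" and carries no information about which substitutions the family tolerates. The sketch below computes per-column Shannon entropy for two made-up alignments; the sequences and function name are hypothetical and this is not any server's actual profile construction.

```python
import math
from collections import Counter

def column_entropies(alignment):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    entropies = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gaps
        total = len(residues)
        counts = Counter(residues)
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies

# Hypothetical toy sequences (illustration only).
singleton = ["MKTLVA"]
family = ["MKTLVA", "MRTLIA", "MKSLVG"]

singleton_entropy = column_entropies(singleton)  # no variation in any column
family_entropy = column_entropies(family)        # variable columns score above zero
```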

E.g.#2: C-domain of T0407

Known fold: some predicted no better than new!

First models for T0407_2: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one server's first model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest, respectively. A family of curves with varying bandwidth is shown. The bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively.

Date: Mon, 2 Jun 2008 23:56:39 -0500 (CDT)
From: Nick Grishin
To: David Baker
Cc: Ruslan Sadreyev, Robert M Vernon
Subject: Re: C-terminus of T0407

I liked IG because of 1) length; 2) ~7 strands; 3) many IG are interaction domains in enzymes.

These are very compelling reasons.

Cartoon diagram of 407, C-domain: 3e38 chain A residues 277-363

Cartoon diagram of VAP-A MSP Homology Domain: 1z9l

T0407_2 has Immunoglobulin fold

IG-based Baker model

Top GDT server model: Phyre_de_novo TS1

No server predicted IG fold for T0407_2

Cartoon diagram of 407, C-domain: 3e38 chain A residues 277-363

T0465: who found the template?

HHpred !!!

T0465 is a diverged FYSH domain

FYSH domain of hypothetical protein AF0491: 1t95 chain A residues 11-94

Cartoon diagram of T0465: 3dfd chain A residues 21-136

T0465 fold is predicted by HHpred

HHpred2 TS1

Cartoon diagram of T0465: 3dfd chain A residues 21-136

Falcon TS1

T0467: most interesting target!

Bioinfo.pl provides these predictions:

T0467: is bioinfo.pl correct?

T0467 OB-fold C-terminal fragment: 2k5q model 1 residues 64-97

Sso7d SH3-fold C-terminal fragment: 2bf4 chain A residues 30-64

You can say so (if you want)

However, only the local prediction is correct: extending it to cover the whole domain results in a wrong fold prediction!

T0467 OB-fold: 2k5q model 1 residues 7-97

Sso7d SH3-fold: 2bf4 chain A
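A quick way to see how little of the domain the correct local match supports is to compute its coverage from the residue ranges quoted on these slides: the matching fragment spans residues 64-97 of T0467, while the OB-fold domain spans residues 7-97. The helper below is a hypothetical sketch of that arithmetic (the function name is an assumption, and no threshold for "too little coverage" is implied by the source).

```python
def coverage(fragment, domain):
    """Fraction of the domain's residues that fall inside the fragment.

    Both arguments are inclusive (start, end) residue ranges.
    """
    f_start, f_end = fragment
    d_start, d_end = domain
    overlap = max(0, min(f_end, d_end) - max(f_start, d_start) + 1)
    return overlap / (d_end - d_start + 1)

# Residue ranges as quoted on the slides for T0467 (2k5q model 1).
t0467_fragment = (64, 97)   # C-terminal fragment with the correct local match
t0467_domain = (7, 97)      # full OB-fold domain

t0467_coverage = coverage(t0467_fragment, t0467_domain)  # 34 of 91 residues
```

Only about a third of the domain is backed by the local similarity, so the remaining two thirds of any full-domain model built on this template rest on extrapolation.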

T0510: “server only” target with a twist

 

Cartoon diagram of 510 domains: 3doa, N-, middle and C-domains are shown in blue, green and red, respectively.

Cartoon diagram of MutM domains: 1ee8_A, N-, middle and C-domains are shown in blue, green and red, respectively.

Closer look at the N-domains reveals large topological differences

 

N-domain of 510: 3doa residues 1-165

N-domain of MutM: 1ee8 chain A residues 1-121

The insertion close to the N-terminus is red; the insertion in the middle of the domain is blue.

N-domains are nevertheless homologous

T0480: easy alignment with templates

NADH pyrophosphatase intervening domain 1vk6: residues 94-127

Ribbon diagram of 480: 2k4x model 1 residues 17-50. Zinc ion is shown in magenta and side chains of its ligands (four Cys) are displayed.

T0480: most predictions had an error

480 MULTICOM-CLUSTER TS1

Jumping through 20 NMR models of 2k4x

Ribbon diagram of 480: 2k4x model 1 residues 17-50. Zinc ion is shown in magenta and side chains of its ligands (four Cys) are displayed.

T0480: unusual bulge

T0480: bulge could have been predicted
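One simple kind of alignment inspection, offered here only as an illustration and not as the check the authors performed, is to confirm that functionally constrained template positions (for a zinc ribbon, the four Cys ligands) are aligned to the same residue type in the target. The sketch below assumes a gapped pairwise alignment given as two equal-length strings; the sequences, residue numbers and function names are all hypothetical.

```python
def aligned_residues(target_aln, template_aln, template_positions, template_start=1):
    """Map each requested template residue number to the target character aligned to it."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    wanted = set(template_positions)
    mapping = {}
    number = template_start - 1
    for target_res, template_res in zip(target_aln, template_aln):
        if template_res == "-":
            continue  # gap in the template: no template residue in this column
        number += 1
        if number in wanted:
            mapping[number] = target_res
    return mapping

def misaligned_ligands(mapping, expected="C"):
    """Template positions whose aligned target residue is not the expected type."""
    return sorted(pos for pos, res in mapping.items() if res != expected)

# Hypothetical toy alignment; the template has Cys at positions 2, 5, 12 and 15.
ligands = [2, 5, 12, 15]
good = aligned_residues("RCPECGGEVKFCPHCG", "KCPVCGAETRYCSKCG", ligands)
shifted = aligned_residues("RCPECGGEVKF-CPHCG", "KCPVCGAETRYCSKCG-", ligands)

good_errors = misaligned_ligands(good)        # all four Cys line up
shifted_errors = misaligned_ligands(shifted)  # a one-column shift breaks two of them
```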

Summary:

1. New folds constitute less than 2% of newly solved non-redundant structures.

2. Many known folds cannot be predicted because templates are impossible to find.

3. Globalization of a correct local alignment may or may not yield a correct fold prediction.

4. Large structural changes happen in protein cores.

5. Careful inspection of alignments may solve some modeling problems.

Acknowledgement

Our group:

Shuoyong Shi, Jing Tong, Ruslan Sadreyev, Lisa Kinch, Jimin Pei, Ming Tang, Sasha Safronova, Yuan Qi, Hua Cheng, Jamie Wrabl, Indraneel Majumdar, Erik Nelson, Yong Wang, S. Sri Krishna, Bong-Hyun Kim, Dorothee Staber

Collaborators:

David Baker (U. Washington), Kimmen Sjölander (UC Berkeley), William Noble (U. Washington)

HHMI, NIH, UTSW, The Welch Foundation