Development of automatic score to simulate manual assessment for CASP FM targets Qian Cong from Grishin lab

Post on 12-Jan-2016

Page 1: Development of automatic score to simulate manual assessment for CASP FM targets Qian Cong from Grishin lab

Page 3

Tesla: curiosity driven. Me: laziness driven.

36 targets (whole chain + domain), around 18,000 models

Page 5

Inspiration from expert's manual analysis

Expert: global features + local features
Local feature: secondary structure assignment of each residue
Global feature: global positions of each Secondary Structure Element (SSE); packing and interactions between SSEs

Develop a score to "mimic" expert inspection: check each secondary structure element, and inspect their packing and interactions.

Page 7

Overview of the features considered

Measurements on a single secondary structure element or residue:
The global position of each SSE
The length of each SSE
The residue DSSP assignment

Measurements on secondary structure pairs or residue pairs:
The angle between each SSE pair
The interactions between each SSE pair
The residue contact score (used for CASP8)

Local features act as a modulator.

Page 11

Step 1.1: Get SSE definition and vector set for target

Target: T0531. SSE definitions from PALSSE (I. Majumdar et al. (2005) BMC Bioinformatics):

Type   Start    End      Length
HELIX  SER 26   PRO 32   7
SHEET  GLU 14   CYS 19   6
SHEET  GLU 19   CYS 24   6
SHEET  GLU 39   CYS 43   5
SHEET  GLY 43   SER 48   6
SHEET  SER 49   CYS 57   9
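
As a minimal sketch, the T0531 records on this slide could be read into a small data structure as follows. The record layout (type, start residue name and number, end residue name and number, length) is copied from the slide; the `SSE` class and `parse_sse_records` function are hypothetical names for illustration, and this is not a parser for PALSSE's actual output files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSE:
    """One secondary structure element (hypothetical helper type)."""
    kind: str   # "HELIX" or "SHEET"
    start: int  # residue number of the first residue
    end: int    # residue number of the last residue

    @property
    def length(self):
        return self.end - self.start + 1


def parse_sse_records(text):
    """Parse lines of the form: TYPE RES START RES END LENGTH."""
    sses = []
    for line in text.strip().splitlines():
        fields = line.split()
        sse = SSE(fields[0], int(fields[2]), int(fields[4]))
        # Cross-check the stated length against the residue range.
        if sse.length != int(fields[5]):
            raise ValueError(f"inconsistent length in record: {line!r}")
        sses.append(sse)
    return sses


# Records for target T0531, as listed on the slide.
T0531_RECORDS = """
HELIX SER 26 PRO 32 7
SHEET GLU 14 CYS 19 6
SHEET GLU 19 CYS 24 6
SHEET GLU 39 CYS 43 5
SHEET GLY 43 SER 48 6
SHEET SER 49 CYS 57 9
"""

sses = parse_sse_records(T0531_RECORDS)
```

The length check doubles as a sanity test on the transcription: every record on the slide satisfies end - start + 1 = length.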

Page 15

Step 1.2: Get the interacting residue pairs

Target: T0531

Interaction criteria:
1. The shortest distance between the central parts of two SSEs
2. Below 8.5 Å

Interacting residue pairs:
(15, 46), (15, 52), (17, 42), (20, 44), (21, 42), (21, 55), (23, 29), (32, 41), (32, 54), (42, 52), (45, 55)
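
The two criteria can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's implementation: the slide does not say what counts as the "central part" of an SSE, so the hypothetical `central_residues` helper simply trims one residue from each end, and distances are taken between C-alpha atoms supplied as a `{residue number: (x, y, z)}` dictionary.

```python
import math
from itertools import combinations

CUTOFF = 8.5  # angstroms, from the slide


def central_residues(start, end, trim=1):
    """Residues of an SSE with `trim` residues removed at each end.

    The trimming rule is an assumption; the slide only says "central part".
    """
    residues = list(range(start, end + 1))
    if len(residues) > 2 * trim:
        return residues[trim:-trim]
    return residues


def interacting_pairs(ca, sses, cutoff=CUTOFF, trim=1):
    """Return the closest residue pair for every SSE pair within the cutoff."""
    pairs = []
    for (s1, e1), (s2, e2) in combinations(sses, 2):
        # Shortest distance between the central parts of the two SSEs.
        dist, a, b = min(
            (math.dist(ca[a], ca[b]), a, b)
            for a in central_residues(s1, e1, trim)
            for b in central_residues(s2, e2, trim)
        )
        if dist < cutoff:
            pairs.append((a, b))
    return pairs


# Toy coordinates (made up): two strands 5 A apart and a third one far away.
ca = {}
for i in range(5):
    ca[1 + i] = (3.3 * i, 0.0, 0.0)
    ca[11 + i] = (3.3 * i, 5.0, 0.0)
    ca[21 + i] = (3.3 * i, 30.0, 0.0)

pairs = interacting_pairs(ca, [(1, 5), (11, 15), (21, 25)])
```

Only the first two toy strands pass the 8.5 Å cutoff, so a single residue pair is reported for them.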

Page 18

Step 2: Simplify models into vectors and key points

The SSE definitions and the interacting residue pair definitions are propagated to the models, so each model is likewise simplified to a set of vectors and point pairs.

Example models shown: TS490_2, TS399_4
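
A minimal sketch of this propagation, assuming each model is available as a `{residue number: C-alpha coordinate}` dictionary. The way a vector is derived here (mean of the first few and last few C-alpha atoms of the element) is an assumption made for illustration and is not PALSSE's vector definition; `sse_vector` and `simplify_model` are hypothetical names.

```python
def mean_point(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def sse_vector(ca, start, end, k=2):
    """Approximate an SSE by a (start point, end point) pair.

    Uses the target's residue range on the model's coordinates.
    Returns None when the model has fewer than two of the residues.
    """
    present = [ca[r] for r in range(start, end + 1) if r in ca]
    if len(present) < 2:
        return None
    k = min(k, len(present) // 2)
    return mean_point(present[:k]), mean_point(present[-k:])


def simplify_model(ca, sses, pairs):
    """Reduce a model to SSE vectors and interacting point pairs."""
    vectors = [sse_vector(ca, start, end) for start, end in sses]
    points = [(ca.get(a), ca.get(b)) for a, b in pairs]
    return vectors, points


# Toy model (made-up coordinates): residues 1-4 on the x axis, residue 9 elsewhere.
model_ca = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0),
            3: (2.0, 0.0, 0.0), 4: (3.0, 0.0, 0.0),
            9: (0.0, 5.0, 0.0)}
vectors, points = simplify_model(model_ca, [(1, 4), (20, 25)], [(2, 9)])
```

Because the residue ranges and pairs come from the target, every model is reduced to the same number of vectors and point pairs, which is what makes the later element-by-element comparison possible.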

Page 19

What should we look at?

[Figure panels: Target; A good model; A "bad" model]

Page 24

Step 3.1: compare the global position of SSE vectors

Definition of global position: the distance between the geometric center of an SSE and the center of the whole protein.

$$ s_{Position}(i) = \frac{1}{1 + \left( \frac{P_i(M) - P_i(R)}{0.5\, P_i(R)} \right)^2} $$

$$ S_{Position} = \frac{\sum_i w_i \, s_{Position}(i)}{\sum_i w_i} $$

Here $P_i(M)$ and $P_i(R)$ are the global positions of SSE $i$ in the model and in the reference (target) structure, and $w_i$ is the weight of SSE $i$.
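
The position comparison can be sketched as below, assuming the per-element term is 1 / (1 + x²) with x = (P_i(M) − P_i(R)) / (0.5 · P_i(R)), and the overall score is the weighted mean of those terms. The slide does not define the weights, so equal weights are an assumption here, and all function names are hypothetical.

```python
import math


def centroid(points):
    """Geometric center of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def global_position(sse_points, protein_points):
    """Distance between the SSE center and the whole-protein center."""
    return math.dist(centroid(sse_points), centroid(protein_points))


def position_term(p_model, p_ref):
    """Per-SSE term: 1 when positions agree, 0.5 when off by half of p_ref."""
    # Note: undefined if p_ref is exactly 0 (SSE center on the protein center).
    x = (p_model - p_ref) / (0.5 * p_ref)
    return 1.0 / (1.0 + x * x)


def position_score(p_models, p_refs, weights=None):
    """Weighted mean of the per-SSE terms (equal weights assumed by default)."""
    if weights is None:
        weights = [1.0] * len(p_refs)
    total = sum(w * position_term(m, r)
                for w, m, r in zip(weights, p_models, p_refs))
    return total / sum(weights)


# Made-up values: one SSE placed correctly, one 50% too far from the center.
score = position_score([10.0, 15.0], [10.0, 10.0])
```

Since only distances to the protein center are compared, the term needs no superposition of model and target.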

Page 28

Step 3.2: compare the length of SSE vectors

Assumption: a wrong secondary structure prediction or an improper break of an SSE makes the vector length differ from that of the target.

$$ s_{Length}(i) = \frac{1}{1 + \left( \frac{L_i(M) - L_i(R)}{0.25\, L_i(R)} \right)^2} $$

$$ S_{Length} = \frac{\sum_i w_i \, s_{Length}(i)}{\sum_i w_i} $$

Here $L_i(M)$ and $L_i(R)$ are the lengths of SSE vector $i$ in the model and in the reference (target) structure.

Vector length alone is too broad a measure of secondary structure quality.
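
As a worked illustration of the length term, assume it has the form $s_{Length}(i) = 1 / (1 + x^2)$ with $x = (L_i(M) - L_i(R)) / (0.25\, L_i(R))$, and take made-up lengths that do not come from any CASP target: a target SSE vector of 12 Å modeled as 9 Å deviates by exactly one quarter of the target length.

```latex
\[
s_{Length}(i)
  = \frac{1}{1 + \left( \dfrac{9 - 12}{0.25 \times 12} \right)^{2}}
  = \frac{1}{1 + (-1)^{2}}
  = 0.5
\]
```

Under the same assumption, a model length of 12 Å gives 1, and a model length of 6 Å gives 1 / (1 + 4) = 0.2, so the term falls off quickly once an element is broken or badly shortened.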

Page 29

Step 3.3: compare DSSP assignment

Percent agreement of the DSSP assignment reflects the detailed quality of secondary structures.

$$ S_{DSSP} = \frac{\text{Correct}}{\text{Total}} $$
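
A minimal sketch of the agreement ratio, assuming the DSSP assignments of target and model are already available as equal-length strings aligned by residue. Comparing the raw one-letter DSSP codes and counting unmodeled residues as wrong are both assumptions; `dssp_agreement` is a hypothetical name.

```python
def dssp_agreement(target_ss, model_ss):
    """Fraction of residues whose DSSP code matches the target's."""
    if len(target_ss) != len(model_ss):
        raise ValueError("assignments must be aligned and of equal length")
    if not target_ss:
        return 0.0
    correct = sum(t == m for t, m in zip(target_ss, model_ss))
    return correct / len(target_ss)


# Made-up assignments: the model extends the strand by one residue.
agreement = dssp_agreement("HHHHEEEE--", "HHHEEEEE--")
```

Here 9 of the 10 positions agree, so the ratio is 0.9.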

Page 30

Step 3.4: compare the angle between SSE vector pairs

$$ s_{Angle}(i,j) = \frac{1}{1 + \left( \frac{\theta_{i,j}(M) - \theta_{i,j}(R)}{0.7} \right)^2} $$

$$ S_{Angle} = \frac{\sum_{i,j} w_{i,j} \, s_{Angle}(i,j)}{\sum_{i,j} w_{i,j}} $$

Here $\theta_{i,j}$ is the angle between SSE vectors $i$ and $j$, in the model (M) and in the reference (target) structure (R).
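
The angle comparison can be sketched as follows, assuming the per-pair term is 1 / (1 + x²) with x = (θ(M) − θ(R)) / 0.7. Treating the angles as radians (0.7 rad is roughly 40°) and using equal pair weights are assumptions, since the slide states neither; the function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_term(theta_model, theta_ref, scale=0.7):
    """Per-pair term: 1 for identical angles, 0.5 at a deviation of `scale`."""
    x = (theta_model - theta_ref) / scale
    return 1.0 / (1.0 + x * x)


def angle_score(model_vecs, ref_vecs, pairs, weights=None):
    """Weighted mean of the per-pair terms over the given SSE index pairs."""
    if weights is None:
        weights = [1.0] * len(pairs)
    total = 0.0
    for w, (i, j) in zip(weights, pairs):
        theta_m = angle_between(model_vecs[i], model_vecs[j])
        theta_r = angle_between(ref_vecs[i], ref_vecs[j])
        total += w * angle_term(theta_m, theta_r)
    return total / sum(weights)


# Made-up directions: parallel in the target, perpendicular in the model.
ref_vecs = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_vecs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
score = angle_score(model_vecs, ref_vecs, [(0, 1)])
```

Like the position term, this compares quantities internal to each structure, so no superposition is involved.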

Page 31

Step 3.5: compare the interactions between SSE pairs

Motivation: some key interactions define the general packing of the elements, so they should be emphasized more.

Formula as it appears on the slide (the same expression as the length score in Step 3.2):

$$ s_{Length}(i) = \frac{1}{1 + \left( \frac{L_i(M) - L_i(R)}{0.25\, L_i(R)} \right)^2} $$

$$ S_{Length} = \frac{\sum_i w_i \, s_{Length}(i)}{\sum_i w_i} $$

Page 32

Step 3.6: compare all C-alpha contact score

The C-alpha contact score is added as a modulator for the key SSE interaction score.

All C-alpha contacts are defined at a cutoff of 8.44 Å; a similar program was shown to be a good measure by the CASP8 assessors.

$$ s_{Contact}(i) = 2^{-\left( \frac{D_i(M) - D_i(R)}{2.0} \right)^2} $$

$$ S_{Contact} = \frac{\sum_i s_{Contact}(i)}{N} $$

Shuoyong Shi, Jimin Pei, Ruslan I. Sadreyev, Lisa N. Kinch, Indraneel Majumdar, Jing Tong, Hua Cheng, Bong-Hyun Kim, Nick V. Grishin. Analysis of CASP8 targets, predictions and assessment methods. Database: The Journal of Biological Databases and Curation (2009).
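
A rough sketch of a contact comparison in this spirit. It assumes one reading of the slide's formula, in which each target contact contributes 2 raised to −((D_i(M) − D_i(R)) / 2.0)², so a contact off by 2 Å counts half, and the terms are averaged over the N target contacts. The minimum sequence separation of 3 and the zero score for unmodeled residues are further assumptions, and this is not the CASP8 assessors' program.

```python
import math
from itertools import combinations


def target_contacts(ca_ref, cutoff=8.44, min_separation=3):
    """C-alpha pairs in the target closer than the cutoff (in angstroms)."""
    contacts = []
    for a, b in combinations(sorted(ca_ref), 2):
        if b - a >= min_separation:  # separation threshold is an assumption
            d_ref = math.dist(ca_ref[a], ca_ref[b])
            if d_ref < cutoff:
                contacts.append((a, b, d_ref))
    return contacts


def contact_score(ca_model, ca_ref, scale=2.0):
    """Mean per-contact term over all target contacts."""
    contacts = target_contacts(ca_ref)
    if not contacts:
        return 0.0
    total = 0.0
    for a, b, d_ref in contacts:
        # Contacts involving residues missing from the model contribute 0.
        if a in ca_model and b in ca_model:
            d_model = math.dist(ca_model[a], ca_model[b])
            total += 2.0 ** (-(((d_model - d_ref) / scale) ** 2))
    return total / len(contacts)


# Made-up coordinates: one target contact (6 A), stretched to 8 A in the model.
ca_ref = {1: (0.0, 0.0, 0.0), 5: (6.0, 0.0, 0.0), 9: (30.0, 0.0, 0.0)}
ca_model = {1: (0.0, 0.0, 0.0), 5: (8.0, 0.0, 0.0), 9: (30.0, 0.0, 0.0)}
score = contact_score(ca_model, ca_ref)
```

Because the contact list is built from the target only, extra contacts that appear only in the model are not penalized in this sketch.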

Page 36

Let's sum up all the scores

$$ S_{Position} + S_{Length} + S_{DSSP} + S_{Angle} + S_{Interaction} + S_{Contact} $$

Superimposition-independent, global and local comparison, manual analysis simulating score: SIGLACMASS?

Qian Cong score = QCS = Quality control score
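
The combination step can be sketched as a weighted mean of the six component scores. Equal weights follow the plain sum on the slide; rescaling to 0-100 is an assumption made here only so that the output is on the same scale as percentage-style scores, and `combine_scores` is a hypothetical name.

```python
COMPONENTS = ("position", "length", "dssp", "angle", "interaction", "contact")


def combine_scores(scores, weights=None):
    """Weighted mean of the six component scores, scaled to 0-100.

    `scores` maps each component name to a value in [0, 1].
    Equal weights are used unless `weights` is given.
    """
    if weights is None:
        weights = {name: 1.0 for name in COMPONENTS}
    total_weight = sum(weights[name] for name in COMPONENTS)
    weighted = sum(weights[name] * scores[name] for name in COMPONENTS)
    return 100.0 * weighted / total_weight


# Made-up component values for one model.
example = {"position": 0.8, "length": 0.6, "dssp": 0.9,
           "angle": 0.7, "interaction": 0.5, "contact": 0.7}
total = combine_scores(example)
```

Keeping the weights as an explicit argument makes it easy to test alternative weightings against manual scores later.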

Page 38

Global view: Correlations

$$ S_{Position} + S_{Length} + S_{DSSP} + S_{Angle} + S_{Interaction} + S_{Contact} $$

Optimize the weights? Doing so improves the correlation by 2%.

Page 41

Go to individual: Top picks

In most cases, the top models selected by GDT agree with the top models selected by QCS and by manual assessment.

There are cases where QCS reveals features we like.

Page 42

Example where QCS reveals a better model

Target 561

Model     QCS    GDT
TS324_5   67.4   39.4
TS382_5   80.7   30.9

Page 45

QCS reveals good global topology

[Figure labels: T0561, 324_5, 382_5]

TS324_5: QCS 67.4, GDT 39.4

Page 46

QCS reveals good global topology

[Figure labels: T0561, 324_5]

TS382_5: QCS 80.7, GDT 30.9

Page 49

QCS reveals model with good interactions

[Figure captions: Get all 3 key SSE interactions; Get only 1 key SSE interaction]

Page 50

Being lazy cannot be a final solution

Assessing CASP requires a great deal of diligent, efficient, smart and careful analysis on a large scale.