Mining the Structural Genomics Pipeline: …predrag/classes/2006springi...4/28/2006 2:10 AM MSG...

Mining the Structural Genomics Pipeline: Identification of Protein Properties that Affect High-throughput Experimental Analysis, by Chern-Sing Goh, Ning Lan et al. Presented by Jingjun Sun (MSG Pipeline, 4/28/2006 2:10 AM). 14 pages.

TRANSCRIPT

Page 1

Mining the Structural Genomics Pipeline: Identification of Protein Properties that Affect High-throughput Experimental Analysis

by Chern-Sing Goh, Ning Lan et al.

Presented by Jingjun Sun

1/14, 4/28/2006 2:10 AM, MSG Pipeline, Jingjun Sun

Page 2

Background

Methods

Results

Conclusion

Page 3

Structural genomics generates unique datasets in terms of protein physical and chemical properties.

Discover significant protein features that influence whether a protein is determined structurally.

Identify potential bottlenecks in various stages of the structural genomics pipeline.

Page 4

Decision trees

Features: conservation, charged residues, hydrophobic patches, binding partners, length

Pipeline stages: cloning, expression, purification, structural determination
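The slide names decision trees but does not say which algorithm or split criterion was used. As a rough sketch of how a tree picks a rule such as "length < t", the snippet below searches for the threshold that minimises weighted Gini impurity. Gini is only a common choice, assumed here for illustration; the function names and the toy data are hypothetical and are not taken from the paper or from TargetDB.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the best 'value < t' split."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the unsplit impurity
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: protein length and whether expression succeeded.
lengths = [120, 180, 240, 300, 520, 610, 700, 850]
expressed = [1, 1, 1, 1, 0, 1, 0, 0]
print(best_threshold(lengths, expressed))
```

A full tree would apply the same search recursively to each side of the chosen split, over every feature rather than length alone.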

Page 5

Structure vs No Structure

\[ \frac{\text{proteins that have COGs}}{\text{total targets}} = \frac{15952}{27267} \approx 0.59 \]

\[ \frac{\text{proteins with structure that have COGs}}{\text{total structure targets}} = \frac{297}{350} \approx 0.85 \]

[Figure: decision tree over TargetDB targets. Root split: COG < 1 [27267]. Further splits: DE < 9.7, GAVLI < 31.7, C < 1.8.]

Node        Fraction with structure   No structure   Structure
COG < 1     .005                      11262          53
COG >= 1                              15655          297
leaf        .002                      2432           6
leaf        .006                      2600           16
leaf        .03                       8044           249
leaf        .01                       2579           26
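The fractions at the tree nodes can be recomputed from the two counts shown at each node. This is a minimal sketch, assuming each pair of counts means (targets without structure, targets with structure). That reading is inferred from the arithmetic rather than stated on the slide, and the names `NODES` and `structure_fraction` are illustrative.

```python
# Counts per node as read from the slide: (no structure, structure).
# The meaning of the two columns is an assumption inferred from the fractions.
NODES = {
    "COG < 1": (11262, 53),
    "leaf .002": (2432, 6),
    "leaf .006": (2600, 16),
    "leaf .03": (8044, 249),
    "leaf .01": (2579, 26),
}


def structure_fraction(no_structure: int, structure: int) -> float:
    """Fraction of the targets at a node that reached a structure."""
    return structure / (no_structure + structure)


for name, (neg, pos) in NODES.items():
    print(f"{name:10s} {structure_fraction(neg, pos):.3f}")

# The four leaves should add up to the targets that have COGs.
leaves = [counts for name, counts in NODES.items() if name.startswith("leaf")]
with_cogs = sum(neg + pos for neg, pos in leaves)
print(with_cogs, round(with_cogs / 27267, 2))
```

Under this reading the four leaves sum to the 15952 targets with COGs, and the COG < 1 branch accounts for the remaining 11315 of the 27267 targets.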

Page 6

Feature   Total targets   Structure targets
COGs      0.59            0.85
GAVLI     34.6%           38.4%
hp_aa     15              7
Length    291             243
KR        12.5%           12.6%
…

\[ \frac{\text{proteins that have COGs}}{\text{total targets}} = \frac{15952}{27267} \approx 0.59 \]

\[ \frac{\text{proteins with structure that have COGs}}{\text{total structure targets}} = \frac{297}{350} \approx 0.85 \]

Page 7

Feature   Description                                        Total targets   SD targets
GAVLI     Average GAVLI composition (%)                      34.6            38.4
hp_aa     Average No. of hp residues within a hp stretch     15              7

SEGAVLIWKAAVLIRGAVLI…

NQSEGAVLIEKDWRLIRMS…

20
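The two features on this slide can be illustrated on the visible part of the example sequences. This is a sketch under stated assumptions: a "hp stretch" is taken to be a run of at least two consecutive residues from the set G, A, V, L, I, which the slide does not define, and the function names are hypothetical. The sequences are truncated on the slide, so only the visible prefix is used.

```python
import re

HYDROPHOBIC = "GAVLI"  # residue set named on the slide


def gavli_percent(seq: str) -> float:
    """Percentage of residues in seq that are G, A, V, L or I."""
    return 100.0 * sum(seq.count(r) for r in HYDROPHOBIC) / len(seq)


def mean_stretch_length(seq: str, min_len: int = 2) -> float:
    """Average length of runs of GAVLI residues.

    Assumed stand-in for hp_aa; min_len is an illustrative parameter.
    """
    pattern = f"[{HYDROPHOBIC}]{{{min_len},}}"  # e.g. "[GAVLI]{2,}"
    runs = [len(m) for m in re.findall(pattern, seq)]
    return sum(runs) / len(runs) if runs else 0.0


# Visible prefixes of the two example sequences from the slide.
for seq in ("SEGAVLIWKAAVLIRGAVLI", "NQSEGAVLIEKDWRLIRMS"):
    print(seq, round(gavli_percent(seq), 1), mean_stretch_length(seq))
```

The published feature may use a different hydrophobicity definition or minimum stretch length, so the numbers from this sketch are not directly comparable to the 15 and 7 in the table.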

Page 8

Expressed vs cloned & not expressed

\[ \frac{\text{not expressed proteins that have COGs}}{\text{total not expressed}} = \frac{3182}{5622} \approx 0.57 \]

\[ \frac{\text{expressed proteins that have COGs}}{\text{total expressed}} = \frac{5827}{8319} \approx 0.70 \]

[Figure: decision tree over cloned targets. Root split: COG < 1 [14385]. Further splits: any_partners < 1, length < 524, pI < 5.9. Node labels in slide order: .58 (2203, 3043); 3182, 5827; .73, .76, .44, .13; (979, 2785), (1816, 1776), (208, 659), (416, 57).]

Page 9

Purified vs expressed & not purified

\[ \frac{\text{not purified proteins that have COGs}}{\text{total not purified}} = 0.65 \]

\[ \frac{\text{purified proteins that have COGs}}{\text{total purified}} = 0.75 \]

Structure vs purified & not structure

\[ \frac{\text{proteins without structure that have COGs}}{\text{total without structure}} = 0.75 \]

\[ \frac{\text{proteins with structure that have COGs}}{\text{total with structure}} = 0.85 \]

Page 10

Percents that belong to COGs

[Figure: pipeline flow chart splitting targets into cloned / not cloned, expressed / not expressed, purified / not purified, structure / no structure. Number of targets in parentheses.]

Stage            Belong to COGs   Targets
total            59%              (27711)
cloned           63%              (14767)
not cloned       54%              (12944)
expressed        70%              (8587)
not expressed    53%              (6180)
purified         75%              (4115)
not purified     65%              (4472)
structures       85%              (370)
not structures   75%              (3745)
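The target counts in parentheses also give the success rate at each pipeline stage, which the slide does not state directly. A minimal sketch, assuming each stage splits exactly the targets that passed the previous one; the names `STAGES` and `stage_yields` are illustrative.

```python
# (stage, passed, failed) target counts as read from the slide.
STAGES = [
    ("cloned", 14767, 12944),
    ("expressed", 8587, 6180),
    ("purified", 4115, 4472),
    ("structure", 370, 3745),
]


def stage_yields(stages, total):
    """Return {stage: fraction of entering targets that passed the stage}."""
    yields = {}
    entering = total
    for name, passed, failed in stages:
        # Each stage should partition the targets that entered it.
        assert passed + failed == entering, name
        yields[name] = passed / entering
        entering = passed
    return yields


for name, y in stage_yields(STAGES, 27711).items():
    print(f"{name:10s} {y:.1%}")
```

The internal assertion doubles as a consistency check on the transcribed counts: every pair of passed and failed targets adds up to the count entering that stage.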

Page 11

Numbers of hydrophobic residues

[Figure: pipeline flow chart splitting targets into cloned / not cloned, expressed / not expressed, purified / not purified, structure / no structure. Number of targets in parentheses.]

Stage            Hydrophobic residues   Targets
total            15.1                   (27711)
cloned           16.3                   (14767)
not cloned       13.8                   (12944)
expressed        11.6                   (8587)
not expressed    22.7                   (6180)
purified         6.5                    (4115)
not purified     16.3                   (4472)
structures       6.6                    (370)
not structures   6.5                    (3745)
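If the values on this slide are per-target averages, each parent value should equal the count-weighted mean of its two children. The sketch below checks that under this assumption; the slide does not say how the numbers were computed, and `pooled_mean` and `SPLITS` are illustrative names.

```python
def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Count-weighted mean of two subgroup means."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# (parent stage, parent value, (passed value, n), (failed value, n)) from the slide.
SPLITS = [
    ("total", 15.1, (16.3, 14767), (13.8, 12944)),
    ("cloned", 16.3, (11.6, 8587), (22.7, 6180)),
    ("expressed", 11.6, (6.5, 4115), (16.3, 4472)),
    ("purified", 6.5, (6.6, 370), (6.5, 3745)),
]

for name, parent, (m_pass, n_pass), (m_fail, n_fail) in SPLITS:
    pooled = pooled_mean(m_pass, n_pass, m_fail, n_fail)
    print(f"{name:10s} slide={parent:5.1f} pooled={pooled:6.2f}")
```

Each pooled value lands within 0.1 of the parent value on the slide, which is consistent with rounding to one decimal place.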

Page 12

Charged residue composition: bottleneck at purification

Number of binding partners: bottlenecks at purification and structure

Length: bottleneck at expression

Page 13

COGs

Charged residue composition

Number of hydrophobic residues in hydrophobic stretches

Number of binding partners it has

Length

Nuclear localization signals

…

Page 14

Thanks!