Localization prediction of transmembrane proteins

Stefan Maetschke, Mikael Bodén and Marcus Gallagher
The University of Queensland

Post on 14-Jan-2016


TRANSCRIPT

Page 1: Localization prediction of transmembrane proteins

Localization prediction of transmembrane proteins
Stefan Maetschke, Mikael Bodén and Marcus Gallagher
The University of Queensland

Page 2: Localization prediction of transmembrane proteins

Maetschke et al, The University of Queensland2

Protein classes

Protein
  Soluble
  Membrane
    Peripheral
    Integral
      Anchored
      Transmembrane
        β-barrel
        α-helical
          Single-spanning
          Multi-spanning

Page 3: Localization prediction of transmembrane proteins

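Transmembrane protein types

[Figure: membrane topologies of Type-I, Type-II, Type-III and Type-IV (multi-spanning) transmembrane proteins, with N- and C-termini marked relative to the cytosol (inside); a signal peptide is labeled.]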

Page 4: Localization prediction of transmembrane proteins

Eukaryotic cell

[Figure: eukaryotic cell with labeled components: nucleus, mitochondrion, peroxisome, lysosome, endoplasmic reticulum, Golgi complex, ERGIC, endosome, RNA, ribosome.]

Page 5: Localization prediction of transmembrane proteins

Secretory and endocytic pathway

Page 6: Localization prediction of transmembrane proteins

Problem and hypothesis

• Sorting signals for transmembrane proteins serve multiple purposes (targeting, retention, retrieval, avoidance) and are largely unknown (the problem is challenging/multi-faceted)

• Current localization prediction of eukaryotic transmembrane proteins is poor (models based on soluble proteins are ill-suited) (previous work is inadequate/incomplete)

• Localization prediction for transmembrane proteins is virtually unexplored (paucity/variance of data) (it is an open problem)

• Explicit modelling of protein topology should enhance localization prediction accuracy (parameter tuning receives explicit guidance to biologically sensible solutions) (the way to do it!)

Page 7: Localization prediction of transmembrane proteins

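Hidden Markov model

Initial state probabilities: $\pi = \{\pi_i\}$, $\pi_i = P(q_1 = S_i)$

State transition probabilities: $A = \{a_{ij}\}$, $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$

Observation probabilities: $B = \{b_i(k)\}$, $b_i(k) = P(o_t = V_k \mid q_t = S_i)$

[Figure: three-state HMM (S1, S2, S3) with self-transitions a11, a22, a33, forward transitions a12, a23, and emission distributions b1, b2, b3 over the 20 amino acids, numbered 1 to 20 (A, R, ..., V). The example state sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3 is shown above an observation sequence.]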
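As an illustration of how a state sequence like the one on this slide relates to an observation sequence, the following is a minimal Viterbi decoding sketch for a three-state left-to-right HMM. All parameter values, the helper `peaked`, the example sequence and the amino acid ordering are assumptions made up for the example; they are not taken from the presentation.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 symbols; this ordering is an assumption


def viterbi(obs, pi, A, B):
    """Most probable state path for an amino acid string.

    pi[i]   : P(q_1 = S_i)
    A[i][j] : P(q_{t+1} = S_j | q_t = S_i)
    B[i][k] : P(o_t = V_k | q_t = S_i), k indexes AMINO_ACIDS
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    idx = [AMINO_ACIDS.index(c) for c in obs]
    # delta[i]: best log-probability of any path ending in state i
    delta = [log(pi[i]) + log(B[i][idx[0]]) for i in range(n)]
    back = []
    for k in idx[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            ptr.append(best)
            delta.append(prev[best] + log(A[best][j]) + log(B[j][k]))
        back.append(ptr)
    # Trace back from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def peaked(symbol, p=0.62):
    """Hypothetical emission row: probability p on one symbol, uniform elsewhere."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return [p if c == symbol else rest for c in AMINO_ACIDS]


# Toy left-to-right model (illustrative values only)
pi = [1.0, 0.0, 0.0]
A = [[0.7, 0.3, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
B = [peaked("A"), peaked("L"), peaked("K")]

path = viterbi("AAALLLLLLK", pi, A, B)
print(" ".join(f"s{s + 1}" for s in path))
```

With these toy parameters the decoded path has the same shape as the state sequence shown on the slide: three positions in the first state, six in the second, one in the third.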

Page 8: Localization prediction of transmembrane proteins

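2nd-order Hidden Markov model

Initial state probabilities: $\pi = \{\pi_i\}$, $\pi_i = P(q_1 = S_i)$

State transition probabilities: $A = \{a_{ij}\}$, $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$

Observation probabilities: $B = \{b_i(k)\}$, $b_i(k) = P(o_t = V_k \mid q_t = S_i)$

[Figure: three-state HMM (S1, S2, S3) with self-transitions a11, a22, a33, forward transitions a12, a23, and emission distributions b1, b2, b3 over the 400 dipeptides, numbered 1 to 400 (AA, AR, AN, AD, ..., VV). The example state sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3 is shown above an observation sequence.]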
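The numbering of the 400 emission symbols on this slide (AA = 1, AR = 2, AN = 3, AD = 4, ..., VV = 400) is consistent with reading each dipeptide as a base-20 number over the amino acid ordering A, R, N, D, C, Q, ..., V. The sketch below encodes a sequence this way. The full ordering string, the function names, and the use of overlapping dipeptides are assumptions; the slides do not state how the observation sequence is built.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed ordering; agrees with AA, AR, AN, AD, ..., VV


def kmer_index(kmer):
    """1-based index of a k-mer, reading it as a base-20 number."""
    index = 0
    for residue in kmer:
        index = index * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)
    return index + 1


def to_kmer_observations(sequence, k=2):
    """Overlapping k-mers of a sequence as 1-based symbol indices (assumed scheme)."""
    return [kmer_index(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


print(to_kmer_observations("AARN"))  # dipeptides AA, AR, RN
```

The same function with `k=3` yields indices in the range 1 to 8000 for tripeptides.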

Page 9: Localization prediction of transmembrane proteins

3rd-order Hidden Markov model

Initial state probabilities: $\pi = \{\pi_i\}$, $\pi_i = P(q_1 = S_i)$

State transition probabilities: $A = \{a_{ij}\}$, $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$

Observation probabilities: $B = \{b_i(k)\}$, $b_i(k) = P(o_t = V_k \mid q_t = S_i)$

[Figure: three-state HMM (S1, S2, S3) with self-transitions a11, a22, a33, forward transitions a12, a23, and emission distributions b1, b2, b3 over the 8000 tripeptides, numbered 1 to 8000 (AAA, AAR, AAN, AAD, AAC, AAQ, ..., VVV). The example state sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3 is shown above an observation sequence.]
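The emission alphabet sizes shown across the three HMM slides (20, 400, 8000) follow from 20^k for k-mers of length 1, 2 and 3. The short sketch below makes the growth explicit, together with the number of free emission parameters per state (one less than the table size, because each row sums to 1). The function names are hypothetical.

```python
def emission_table_size(order, alphabet_size=20):
    """Number of emission symbols per state for k-mers of length `order`."""
    return alphabet_size ** order


def free_emission_parameters(order, alphabet_size=20):
    """Free parameters per state: one less than the table size (rows sum to 1)."""
    return emission_table_size(order, alphabet_size) - 1


for order in (1, 2, 3):
    print(order, emission_table_size(order), free_emission_parameters(order))
```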

Page 10: Localization prediction of transmembrane proteins

Signal peptide

[Figure: signal peptide model with N-terminal region, hydrophobic core and cleavage region, followed by the mature protein.]

Page 11: Localization prediction of transmembrane proteins

Transmembrane domain

icap TMD ocap

Page 12: Localization prediction of transmembrane proteins

Protein topology model

[Figure: protein topology model with the components N-term, SP, ocap, TMD, icap and C-term; regions are labeled outside and inside.]

Page 13: Localization prediction of transmembrane proteins

Localization model (5 x topology models)

[Figure: eukaryotic cell with labeled compartments: nucleus, mitochondrion, peroxisome, lysosome, endoplasmic reticulum, Golgi complex, ERGIC, endosome.]
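The slide title suggests five topology models, one per location. The slides do not say how these are combined, so the following is only one plausible reading: score a sequence under each location's HMM with the forward algorithm and pick the location with the highest log-likelihood. The toy models, the two-symbol alphabet and all function names are assumptions for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, log_pi, log_A, log_B):
    """Forward algorithm: log P(obs | model) for 0-based integer observations."""
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][o]
            for j in range(n)
        ]
    return logsumexp(alpha)


def predict_location(obs, models):
    """models maps a location name to (log_pi, log_A, log_B)."""
    return max(models, key=lambda loc: log_likelihood(obs, *models[loc]))


def single_state_model(emissions):
    """Hypothetical one-state model; a real topology model has several states."""
    return [0.0], [[0.0]], [[math.log(p) for p in emissions]]


# Toy models over a two-symbol alphabet (illustrative values only)
models = {
    "PM": single_state_model([0.9, 0.1]),
    "ER": single_state_model([0.1, 0.9]),
}
print(predict_location([0, 0, 1, 0], models))
```

An alternative design, equally compatible with the slide title, would join the five topology models into one larger HMM and read the location off the decoded path.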

Page 14: Localization prediction of transmembrane proteins

LOCATE dataset

• Subset of the LOCATE database (FANTOM3, mouse proteome)
• Filter for transmembrane proteins
• No multi-targeted proteins
• Redundancy reduced (<25%)
• TMDs and SPs are labeled (predicted)
• High quality localization annotation

Location                 Proteins
Plasma Membrane          873
Endoplasmic Reticulum    261
Golgi Complex            141
Lysosome                 45
Endosome                 31
Total                    1351
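The class counts above are strongly imbalanced. As a small worked example, the sketch below computes each location's share of the dataset and the accuracy of a trivial predictor that always answers with the largest class. The variable names are hypothetical; the counts are the ones listed on the slide.

```python
counts = {
    "Plasma Membrane": 873,
    "Endoplasmic Reticulum": 261,
    "Golgi Complex": 141,
    "Lysosome": 45,
    "Endosome": 31,
}

total = sum(counts.values())
fractions = {location: n / total for location, n in counts.items()}

# Accuracy of always predicting the most frequent location
majority_baseline = max(fractions.values())

for location, fraction in fractions.items():
    print(f"{location:<22} {fraction:6.1%}")
print(f"majority baseline      {majority_baseline:6.1%}")
```

Because roughly two thirds of the proteins are plasma membrane proteins, plain accuracy on this dataset can look high for an uninformative predictor, which is one common reason to report a correlation measure instead.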

Page 15: Localization prediction of transmembrane proteins

Prediction performance

[Figure: bar chart of prediction performance (MCC, axis from 0 to 0.5) for SVM-1, SVM-2, HMM-1, HMM-2 and HMM-3.]

• LOCATE dataset
• Mean correlation coefficient
• 10-fold cross-validation, 10 times
• Five locations (ER, PM, GO, EN, LY)
• SVM: linear kernel
• 1st-, 2nd- and 3rd-order HMMs

[Figure: confusion matrix for HMM-2.]

=> Di-peptide composition superior to single amino acid composition

=> Topological model superior to non-topological model
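The slide reports performance as MCC over five locations. Assuming this refers to the Matthews correlation coefficient computed one-vs-rest for each location and then averaged (the slides do not spell this out), a minimal sketch from a confusion matrix could look as follows. The function names and the example matrices are made up for illustration.

```python
import math


def per_class_mcc(confusion, cls):
    """One-vs-rest Matthews correlation for class `cls`.

    confusion[t][p] counts samples of true class t predicted as class p.
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[r][cls] for r in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when the coefficient is undefined
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_mcc(confusion):
    """Average of the per-class one-vs-rest MCC values."""
    n = len(confusion)
    return sum(per_class_mcc(confusion, c) for c in range(n)) / n


print(mean_mcc([[5, 0], [0, 5]]))  # perfect prediction
print(mean_mcc([[3, 1], [1, 3]]))  # one error per class
```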

Page 16: Localization prediction of transmembrane proteins

Predictor comparison

[Figure: bar chart of prediction accuracy in %, axis from 0 to 80. Bar values in slide order:]

Predictor    Accuracy (%)
CELLO        18
WolfPSort    33
PAnalyst     48
HMM-2        75

CELLO 2.5: http://cello.life.nctu.edu.tw/
WolfPSort: http://wolfpsort.seq.cbrc.jp/
ProteomeAnalyst 2.5: http://www.cs.ualberta.ca/~bioinfo/PA/Sub/
HMM-2: http://pprowler.itee.uq.edu.au/TMPHMMLoc

Test set (20 PM, 20 ER, 20 Golgi)
• HMM: only three classes, but test set is not part of the train set
• Other predictors: more classes, but test set may be part of the train set

→ difficult to compare!

Page 17: Localization prediction of transmembrane proteins

Conclusion

• Novel predictor for subcellular localization of transmembrane proteins along the secretory pathway: http://pprowler.itee.uq.edu.au/TMPHMMLoc

• Protein model has fewer states than topology predictors (TMHMM, HMMTOP, etc.) but is of second order

• Localization model is trained and tested using LOCATE, a recent, high-quality localization dataset

• Overall better performance than current localization predictors (transmembrane proteins, eukaryotic, secretory pathway)
  – Di-peptide composition superior to single amino acid composition
  – "Topological" model superior to "non-topological" baseline model