
© Paul Horton 2013

Recent advances in understanding of protein sub-cellular localisation signals

Paul Horton, Computational Biology Research Center, AIST, Japan

Talk Outline

• Brief Summary of Protein Sub-cellular Localization

• Brief Discussion of Prediction and Causality

• Nuclear Localization Signals

• Mitochondrial Localization Signals

– My group is working on:

• Matrix Targeting Signal Prediction

• Prediction of cleavage sites by mitochondrial peptidases (3 types)

• mRNA localization and co-translational translocation of mitochondrial proteins

– Saved for another 45-minute talk

• I'll be around this week. Ask me if you are interested!


Joe Cell

Motivation

• Aberrant localization has been implicated in many diseases

– Will use the Treacher Collins Syndrome gene as an example

– Zellweger Syndrome

– ...

• Co-localization can help validate protein-protein interaction and other “omic” data

– co-localization is a necessary condition for a biologically relevant interaction

Why Predict?

• Large-scale data is available for some organisms or organelles

– yeast: Huh et al., Nature, 2003; Kumar et al., Genes & Dev., 2002

• but these data still appear to contain many artifacts (effects of tags, expression levels, etc.)

– Independent error estimates of around 20%: Nair & Rost, JMB, 2005; Heazlewood et al., Plant Cell, 2004.

• Most organisms have little or no direct experimental evidence for protein localization

Why not use sequence similarity alone?

• Orthologous proteins would generally be expected to have the same localization

• The degree of sequence similarity needed for high confidence of co-localization is much higher than that needed for high confidence of similar 3D structure (Nair & Rost, Protein Science, 2002).

• The situation is complicated because isoforms of the same protein may have different localizations (Nakao et al., NAR, 2005).

Protein Subcellular Localization (aka Protein Sorting) in Eukaryotes

• Nuclear-encoded proteins are produced in the cytosol and generally require specific mechanisms to pass across membranes

• The amino acid sequence of a protein contains much of the signal information that determines its localization, and also much non-causal information that correlates strongly with localization site in natural proteins

• Translocation across membranes usually requires energy: GTP, ATP, a proton gradient, etc.

Organelles and Function

• cytosol: protein synthesis,...

• endoplasmic reticulum: membrane protein insertion, protein glycosylation, calcium sequestration

• Golgi body: modification (phosphorylation, removal/addition of sugar groups) of proteins and lipids

• mitochondria: aerobic respiration

• chloroplasts: photosynthesis,...

• lysosome: low-pH hydrolysis

• peroxisome: fatty acid decomposition

• nucleus: transcription, handling of chromosomes...

Sorting Signals Often Compared to Postal Address

Figure: Interleukin 24 (IL-24); E.R., Golgi, Vesicle, Extracellular Space; ER-Bound Ribosome; “Do not return to sender!”

Protein trafficking pathways

Figure by Christine Suetterlin (darwin.bio.uci.edu/~bardwell/231B_2006_Suetterlin_Lec1.ppt)

Representative N-terminal Sorting Signals

• Bacterial “signal peptide”: exports proteins out of the cytoplasm.

• Eukaryotic “signal peptide”: an N-terminal signal with a variable-length hydrophobic section; causes proteins to be co-translationally transported through (or into) the E.R. membrane

• Mitochondrial Targeting Sequence: roughly similar, but often longer and somewhat less hydrophobic; can form a helix with positive charge on one side

• Chloroplast Targeting Sequence
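The “positive charge on one side” property of mitochondrial targeting sequences can be quantified with a helical-wheel moment. A minimal sketch, assuming an ideal α-helix (100° per residue) and a hypothetical charge table (K/R = +1, D/E = -1, His neutral). This is a charge analogue of the Eisenberg hydrophobic moment, not a method from this talk, and the two test sequences are toys rather than real targeting sequences.

```python
import math

# Hypothetical charge table (assumption): K/R = +1, D/E = -1, everything else 0.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def charge_moment(seq: str, degrees_per_residue: float = 100.0) -> float:
    """Length of the vector sum of residue charges placed on a helical wheel.

    Residue i is drawn at angle i * degrees_per_residue (100 degrees for an
    ideal alpha-helix). Charges clustered on one face add up; charges spread
    around the wheel largely cancel.
    """
    x = y = 0.0
    for i, aa in enumerate(seq.upper()):
        q = CHARGE.get(aa, 0.0)
        theta = math.radians(i * degrees_per_residue)
        x += q * math.cos(theta)
        y += q * math.sin(theta)
    return math.hypot(x, y)


if __name__ == "__main__":
    # Toy sequences with four lysines each.
    one_face = "KAAAKAAKAAAK"  # K at 0, 4, 7, 11 -> wheel angles 0, 40, 340, 20
    spread = "KKKKAAAAAAAA"    # K at 0..3 -> wheel angles 0, 100, 200, 300
    print(round(charge_moment(one_face), 2), round(charge_moment(spread), 2))
```

Both toys carry the same net charge; only the arrangement on the wheel differs, which is exactly what the moment is meant to pick up.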

N-terminal signals cont.

• Signal peptide and related sorting signals all involve membrane translocation/insertion

– signals and receptors homologous to each other

• Cytosol --> E.R.

– co-translational

• Cytosol --> Mito or Chloro

– either co- or post-translational. Unfolded (by chaperones).

N-terminal signals largely independent of carrier protein

• Numerous experiments show that signal peptides are generally interchangeable between different proteins

• Often cleaved

• Limited to the first ~90 residues

• Cleavage, location at the N-terminus, and co-translational recognition make signal peptides largely orthogonal to the rest of the protein

– but not the perfect separation of a postal address

Sequence Logo for Eukaryotic Signal Peptide

Figure source: http://clc.bio.com, I think from the SignalP paper, Nielsen et al.
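The height of each column in a sequence logo is its information content. A minimal sketch of the standard calculation, assuming a uniform background over the 20 amino acids and no small-sample correction; real logo tools (including whichever produced this figure) may use other backgrounds and corrections, and the toy alignment is made up.

```python
import math
from collections import Counter


def information_content(aligned: list[str]) -> list[float]:
    """Per-column information content (bits) of equal-length aligned sequences.

    IC = log2(20) - H, where H = -sum(p * log2 p) over residues in the column.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    max_bits = math.log2(20)
    ic = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(max_bits - entropy)
    return ic


if __name__ == "__main__":
    # Toy alignment: column 0 fully conserved, column 1 four different residues.
    print(information_content(["AL", "AK", "AV", "AF"]))
```

A fully conserved column reaches the maximum of about 4.32 bits; a column with four equally frequent residues loses 2 bits.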

C-terminal Sorting Signals

• KDEL (soluble) or KKXX (membrane protein) signal for E.R. retention

• SKL for peroxisomal targeting (soluble)

• NPIR for the vacuole

• LPXTG for the bacterial cell wall

• Y (or other aromatic residue) for β-barrels of Gram-negative bacterial outer membranes
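Because the first three signals sit at the very end of the chain, a first-pass check is a suffix match. A minimal sketch; the helper name `c_terminal_signals` is hypothetical, and it matches only the literal KDEL, KKXX and SKL motifs listed above, whereas real retention and peroxisomal signals are more degenerate (for example HDEL in yeast). The other signals in the list are omitted to keep the sketch short.

```python
import re

# Literal C-terminal motifs from the slide; '$' anchors each one at the C-terminus.
C_TERMINAL_SIGNALS = {
    "ER retention (soluble, KDEL)": re.compile(r"KDEL$"),
    "ER retention (membrane, KKXX)": re.compile(r"KK..$"),
    "peroxisome (PTS1, SKL)": re.compile(r"SKL$"),
}


def c_terminal_signals(seq: str) -> list[str]:
    """Return the names of all listed motifs found at the sequence's C-terminus."""
    s = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    return [name for name, pattern in C_TERMINAL_SIGNALS.items() if pattern.search(s)]


if __name__ == "__main__":
    for seq in ["MAAAKDEL", "MAAAKKAE", "MAAASKL", "MAAAA"]:
        print(seq, c_terminal_signals(seq))
```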

Internal Sorting Signals

• Nuclear localization signals occur on the surface of the folded protein (possibly after a conformational change) but can be anywhere in the 1D sequence

• There are others...

Figure: fatty acid binding protein with an NLS and an NES, both closer in 3D than in 1D

Two kinds of Correlation

• Non-causal

– Good for predictions of naturally occurring proteins

– Easy to obtain from localization-site-labeled sequences

– Includes much information from sequence similarity and amino acid content

• Causal

– Robust even when applied to artificial proteins

– Difficult for machine learning methods to distinguish from non-causal

– Mutational analysis results may be useful here, but are not found in UniProt...
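As an illustration of how cheaply non-causal features can be obtained from localization-labelled sequences, here is an amino acid composition vector, a common input feature for localization predictors. A minimal sketch; a classifier trained on such features can exploit composition differences between compartments without identifying which residues actually cause sorting.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """20-dimensional amino acid composition (fractions, in AMINO_ACIDS order).

    Letters outside the 20 standard amino acids are ignored.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vec = composition("AAK")
    print(dict(zip(AMINO_ACIDS, vec)))
```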

(non)-Causal Correlations Example

Diagram: NLS, Zn Finger; Nuclear Localization, DNA Binding; High Evolutionary Fitness; Appropriate Transcription Regulation

(non)-Causal Correlations Example Revisited

Diagram: NLS; Nuclear Localization; DNA Binding; Transcription Regulation; Evolutionary Fitness; DNA Binding Region; Nuclear Retention?; annotation: “often overlap (e.g. Zn Finger)”


Summary of Causality Discussion

• Surprisingly easy to overlook or confuse causality issues

• Whether heavy reliance on non-causal correlation is okay depends on the application

Talk Outline

• Brief Summary of Protein Sub-cellular Localization

• Brief Discussion of Prediction and Causality

• Nuclear Localization Signals

– Novel predictor for nuclear export signals

– Work on identifying cargo-carrier and carrier-signal relationships for nuclear import

NES predictor: NESsential

“Prediction of leucine-rich nuclear export signal containing proteins with NESsential”, Szu-Chin Fu, Kenichiro Imai, Paul Horton, Nucleic Acids Research, online, June 24, 2011. http://seq.cbrc.jp/NESsential/

Note to US passport control: This has nothing to do with exporting nuclear weapons!!

Why does it take so long to get a US travel visa...

Rated as “must read” by F1000

"ValidNESs: a database of validated leucine-rich nuclear export signals", Szu-Chin Fu et al., Nucleic Acids Research, Jan;41(Database issue):D338-43, 2013.

Background on NES's

● We focus on the classical or “leucine-rich” Nuclear Export Signal

● Export signal that moves proteins out of the nucleus

● Recognized by a protein called CRM1 (exportin 1)

● NES's found in many viral proteins (e.g. HIV Rev2, Influenza A NS2 protein) and cancer-related proteins (e.g. p53, BRCA1, Survivin, nucleophosmin)

Leucine-rich Nuclear Export Signal (NES)

Figure: 2-way traffic through the Nuclear Pore Complex between Nucleus and Cytosol; the Exportin-1/CRM1 mediated export pathway, showing an NES-containing protein (NES: N to C, 10~12-mer) with CRM1 (Exportin 1) and Ran GTP

e.g. NES of HIV-1 REV: LPPLERLTL; NES of MAPKK: LQKKLEELEL

First proposed consensus pattern: L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI]
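The first proposed consensus, written in PROSITE-like notation, translates directly into a regular expression, with x(2,3) becoming `.{2,3}`. A minimal sketch using Python's `re` module, checked against the HIV-1 Rev and MAPKK examples above.

```python
import re

# L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI], where x is any residue.
NES_CONSENSUS = re.compile(r"L.{2,3}[LIVFM].{2,3}L.[LI]")

if __name__ == "__main__":
    examples = {"HIV-1 Rev": "LPPLERLTL", "MAPKK": "LQKKLEELEL"}
    for name, seq in examples.items():
        match = NES_CONSENSUS.search(seq)
        print(name, match.group() if match else "no match")
```

With this strict pattern, the leukemia-associated nucleophosmin NES LclaVeeVsL (uppercased to LCLAVEEVSL) finds no match, which illustrates how restrictive this early consensus is.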

Kevin Nguyen, M. Holloway, R. Altura, Int J Biochem Mol Biol 2012;3:137-51

Nucleophosmin & Leukemia -- a motivating example --

Investigate how the acquisition of classical nuclear export signals in Nucleophosmin occurs in acute myeloid leukemia.

● Nucleophosmin is a multifunctional phosphoprotein which normally localizes mainly to the nucleolus

● ~30% of de novo acute myeloid leukemia (AML) cases carry NPM1 gene mutations that cause aberrant nucleophosmin accumulation in the cytoplasm of leukemic cells

Nucleophosmin localization features (record from ValidNESs)

Weak, wild-type NES's; ...WQW...: both W's bind to nucleoli

Common nucleophosmin mutation in AML patients creates an NES

Falini et al., N Engl J Med, 352:254-66, 2005.

Duplication of 4 bases causes a C-terminal frameshift

Deletes the W's and creates a new NES signal, e.g. LclaVeeVsL

Mechanism of altered localization of mutant nucleophosmin

The de novo NES plays a key role in the etiology of acute myeloid leukemia!

Bolli et al. Cancer Res. 67:6230-7, 2007.

The Exportin-1/CRM1-mediated export pathway: the major export pathway, with a broad range of substrates.

TRENDS in Cell Biology, 15:3 2005.

100+ proteins had been verified (now 200+)…

• which contain leucine-rich NES’s

• and are exported by the Exportin-1/CRM1-mediated export pathway

NESbase (la Cour et al. 2003)

Contains 64 NES-containing proteins with experimental data on CRM1 (Exportin 1) dependency

NESbase has not been updated since 2003!

• Database of NES-containing proteins (Nucleic Acids Res 2003, 31:393-396.)

• Web prediction server of NES’s (Protein Engineering Design and Selection 2004, 17:527-536.)

NetNES web server (la Cour et al. 2004)

• Trained on NES-containing proteins in NESbase

• Uses a combination of neural networks and hidden Markov models

Tested on only 5 independent NES-containing proteins discovered in 2004!

The NetNES server is the only predictor currently available, but a license is required for the standalone version.

NES had been neglected!

Project Goals

● Provide an updated NES dataset

– We collected 70 proteins, 85 sites

– Later expanded to 221 proteins, 262 sites (ValidNESs)

● Provide an open source predictor that can effectively be used to screen proteomes for promising new NES's

● seq.cbrc.jp/NESsential/

Sequence logos of NES's

Figure panels: 6-mer pattern, 154 seqs; 7-mer pattern, 114 seqs.

Characteristic pattern of hydrophobic residues: O..O.O or O...O.O, where 'O' is a hydrophobic residue [LIVMF] and '.' is any residue. This pattern is sometimes preceded by another upstream hydrophobic residue.
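Scanning a sequence for these two patterns needs overlapping matches, since hydrophobic-rich stretches often contain several. A minimal sketch using zero-width lookahead regular expressions; the function name `nes_pattern_matches` is hypothetical and this is not the NESsential code.

```python
import re

# 'O' = hydrophobic residue [LIVMF]; '.' = any residue.
# Wrapping each pattern in a lookahead makes finditer report overlapping hits.
SIX_MER = re.compile(r"(?=([LIVMF]..[LIVMF].[LIVMF]))")     # O..O.O
SEVEN_MER = re.compile(r"(?=([LIVMF]...[LIVMF].[LIVMF]))")  # O...O.O


def nes_pattern_matches(seq: str) -> list[tuple[int, str, str]]:
    """Return sorted (start, kind, matched_text) for every 6-mer and 7-mer match."""
    s = seq.upper()
    hits = [(m.start(), "6-mer", m.group(1)) for m in SIX_MER.finditer(s)]
    hits += [(m.start(), "7-mer", m.group(1)) for m in SEVEN_MER.finditer(s)]
    return sorted(hits)


if __name__ == "__main__":
    print(nes_pattern_matches("LPPLERLTL"))   # HIV-1 Rev NES
    print(nes_pattern_matches("LclaVeeVsL"))  # mutant nucleophosmin NES
```

Both example NES's end in an O..O.O match, even though the mutant nucleophosmin sequence does not fit the stricter first-proposed consensus.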

NES site prediction as a binary classification problem

Although several exceptions exist, most confirmed NES's match either the 6-mer O..O.O or the 7-mer O...O.O consensus, where O ∈ {L,I,V,F,M} is a hydrophobic residue.

Prediction of NES sites can be formulated as a binary classification problem: is a given position in a protein the start of an NES or not?

However, the ratio of false to true examples is extremely high, around 100:1 even within NES-containing proteins, and the boundaries of NES sites are not always well defined.

We alleviate these problems by assuming that NES sites always match the consensus pattern (at the cost of having no hope of predicting exceptional NES's).
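Under the assumption that every NES starts at a consensus match, the classification candidates are the matching start positions, and a candidate is positive only if it coincides with an annotated NES start. A minimal sketch of that labelling step; `label_candidates`, `class_counts` and the toy poly-leucine sequence are hypothetical, chosen to show how quickly negatives accumulate. Annotated sites that do not match the consensus are simply dropped, which is the cost described above.

```python
import re

# Either consensus (O..O.O or O...O.O) inside a zero-width lookahead,
# so every start position is reported once, including overlapping ones.
CANDIDATE = re.compile(r"(?=[LIVMF]..[LIVMF].[LIVMF]|[LIVMF]...[LIVMF].[LIVMF])")


def label_candidates(seq: str, known_starts) -> list[tuple[int, bool]]:
    """Return (start, is_nes) for every position where a consensus match begins."""
    known = set(known_starts)
    return [(m.start(), m.start() in known) for m in CANDIDATE.finditer(seq.upper())]


def class_counts(labelled) -> tuple[int, int]:
    """Return (negatives, positives) among the labelled candidates."""
    positives = sum(1 for _, is_nes in labelled if is_nes)
    return len(labelled) - positives, positives


if __name__ == "__main__":
    cands = label_candidates("LLLLLLLLLL", known_starts=[2])
    print(cands, class_counts(cands))
```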

The regions surrounding NES's tend to be disordered

Figure panels: 6-mer, 7-mer

Error bars represent standard error, not standard deviation.

Disorder prediction by POODLE-L (Hirose et al. 2007) and DISOPRED (Ward et al. 2004)
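The distinction matters because the standard error of the mean is the standard deviation divided by √n, so SE error bars shrink as more sites are averaged while SD bars do not. A minimal sketch with Python's `statistics` module (sample standard deviation, n - 1 denominator); the input values are a toy example.

```python
import math
import statistics


def mean_sd_se(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    se = sd / math.sqrt(n)         # SE = SD / sqrt(n)
    return mean, sd, se


if __name__ == "__main__":
    print(mean_sd_se([1, 2, 3, 4, 5]))
```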

NES's tend to be disordered over a long range (around 100 residues)

Figure panels: 6-mer, 7-mer; POODLE-L prediction
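One way to look at disorder “over a long range” is to average per-residue disorder predictions in a window centred on each site. A minimal sketch, assuming the scores are per-residue disorder probabilities from a predictor such as POODLE-L or DISOPRED, and a hypothetical half-width of 50 residues (about 100 in total) clipped at the sequence ends; the exact windowing behind the plots is not stated here.

```python
def mean_disorder_around(scores, site_start, site_len, flank=50):
    """Mean per-residue disorder score in a window centred on a site.

    scores:     per-residue disorder scores (assumed to lie in 0..1).
    site_start: 0-based start of the site; site_len: its length in residues.
    flank:      hypothetical number of residues taken on each side of the
                site centre; the window is clipped at the sequence ends.
    """
    centre = site_start + site_len // 2
    lo = max(0, centre - flank)
    hi = min(len(scores), centre + flank + 1)
    window = scores[lo:hi]
    return sum(window) / len(window)


if __name__ == "__main__":
    # Toy profile: ordered first half, disordered second half.
    toy_scores = [0.0] * 100 + [1.0] * 100
    print(mean_disorder_around(toy_scores, 150, 6))
    print(mean_disorder_around(toy_scores, 97, 6))
```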

Some NES's appear disordered

● NES's are more likely to be disordered

● The distribution for NES's may be bimodal

Figure panels: 6-mer, 7-mer; POODLE-L prediction


NESsential prediction pipeline

NESsential screening results

Top-scoring NESsential proteins often contain true NES's

Figure panel: Yeast proteins

Both methods trained on older database entries and tested on post-2003 entries
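“Top-scoring proteins often contain true NES's” is a statement about precision among the highest-ranked predictions. A minimal sketch of precision at k, assuming one score and one true/false label per protein; the names and toy numbers are hypothetical and are not the NESsential evaluation results.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true positives among the k highest-scoring items.

    Ties keep their input order, because Python's sort is stable even with
    reverse=True.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, is_true in ranked[:k] if is_true) / k


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.3, 0.7]
    toy_labels = [True, False, False, True]
    for k in (1, 2, 3, 4):
        print(k, precision_at_k(toy_scores, toy_labels, k))
```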

NESsential Conclusions

● True NES sites are significantly more likely to be disordered than spurious matches

● NESsential can be used effectively to screen proteomes for promising candidate novel NES's

● But even for NESsential, the coverage is quite low

● See the NAR paper for the bad news...

Importins/Exportins (Karyopherins): cargo specificity

● In humans, the 21 importin-β family proteins transport proteins and RNA molecules across the nuclear pore complex

● Why so many kinds of carriers?

– Presumably for regulation

● The carrier(s) specific to most cargo proteins have not been identified

● A small step in this direction

– Kimura et al., Molecular & Cellular Proteomics, 2012.

Experimental Screen for Transportin cargoes

Using mass spectrometry, light proteins imported from outside the nucleus can be distinguished from heavy proteins already there.
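In such a screen each protein yields a light and a heavy intensity, and the light share indicates how much of the nuclear pool was newly imported. A minimal sketch, assuming intensities have already been quantified per protein; the protein IDs, values and the 0.5 cutoff are hypothetical, and a real analysis would compare against control conditions rather than rely on a fixed threshold.

```python
def light_fraction(light: float, heavy: float) -> float:
    """Share of a protein's signal coming from the light (newly imported) form."""
    total = light + heavy
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return light / total


def flag_imported(intensities: dict[str, tuple[float, float]],
                  threshold: float = 0.5) -> list[str]:
    """Return IDs whose light fraction exceeds a hypothetical threshold."""
    return sorted(pid for pid, (light, heavy) in intensities.items()
                  if light_fraction(light, heavy) > threshold)


if __name__ == "__main__":
    toy = {"P1": (80.0, 20.0), "P2": (10.0, 90.0), "P3": (50.0, 50.0)}
    print(flag_imported(toy))
```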

Schematic of NLS types and carriers

For clarity, I outline the results first.

PY-NLS is recognized by Trn

The “BIB-domain-like” NLS is recognized by both Transportin and Importin-β. It has similar properties to the classical NLS recognized by Importin-α in an Importin-α:β heterodimer.

We confirmed that the newly identified Trn cargoes include both PY-NLS's and BIB-domain-like NLS's.
