

Mapping Protein-Protein Interactions

MEDG 505 (Genome Analysis), 13 January 2005

• Morin:
  - Overview
  - IP-MS
  - Data integration

• Student presentations:
  - Y2H interactions
  - RNA vs. protein expression analysis

• Discussion:
  - Lessons
  - Application

Central Dogma

DNA → RNA → Protein → Function

Humans:
- ~25,000 genes
- 25-40% with functional annotations

General goal: annotation of the proteome
- Identify disease-related proteins
- Identify therapeutic targets

How can protein functions be identified?

Protein Function

The general purpose of proteins is to interact with other molecules:
- Enzyme/substrate
- Protein/protein

Cellular processes are governed by complex networks of interacting proteins.
- Determining protein-protein interactions suggests functional hypotheses.

Protein Annotation

Large-scale methods for annotation of protein function:
- Genetic
  - Mutational analysis in model organisms: verifies biological role; translation to humans is problematic, because differences in biology cloud interpretation
  - Yeast 2-hybrid: can verify biological role; binary interactions; often uses protein fragments; high false positives; extensively employed
- Genomic
  - mRNA profiling: comprehensive and high-throughput (HTP); mRNAs used to infer the proteome; identifies expression changes; silent to post-translational modifications (PTMs); cause and effect difficult to infer; interactions difficult to predict
- Biochemical
  - MS analysis of purified protein complexes: identifies interactions directly; yields higher-order interactions; identifies PTMs; binding affinity can be employed; technically challenging

Lesson: All methods need to be employed to fully annotate the proteome.

IP-MS

Immunoprecipitation-Mass Spectrometry

Workflow: immunoprecipitate interaction partners → gel separation → excise bands → LC-MS/MS fragmentation → protein identification

Tagged Protein Structure (CMV promoter-driven constructs)

- N-tagged construct: CMV-FLAG-lox-ORF-lox
- C-tagged construct: CMV promoter, ORF, C-terminal FLAG tag, two lox sites

Properties of Immunoprecipitated Protein Complexes

Types of interacting proteins:
• Background binding to bait/matrix/MS (filter?)
• Proteins from throughout the bait's lifespan
• Processing/transport/degradation proteins (filter?)
• Weak affinity (less reproducible?)
• Strong affinity
• Primary interactors
• Secondary interactors
• High data volume

Experimental design and analysis should be tailored to these expectations.

Methodology for evaluation:
1- Experimental validation
2- Bioinformatic evaluation
3- Experimental reproducibility
  - transfection/IP protocols

Method Characterization

Characterization Project

1- 49 baits from diverse protein families; both N and C termini tagged
2- IP-MS, repeated 4+ times
3- 190 preys scored as hits. A hit is a prey that is:
  - observed 2+ times
  - below 5% database observation frequency
4- Analyze
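The two hit criteria can be expressed as a small filter. This is a minimal sketch, assuming that per-bait observation counts across replicate IPs and a database observation frequency for every prey are already available; the function name `call_hits` and the input layout are hypothetical, not the project's actual pipeline.

```python
from typing import Dict, List

def call_hits(
    obs_counts: Dict[str, int],
    db_frequency: Dict[str, float],
    min_obs: int = 2,
    max_freq: float = 0.05,
) -> List[str]:
    """Return the preys for one bait that meet both hit criteria, sorted by name.

    obs_counts:   prey -> number of replicate IPs of this bait in which the prey was seen
    db_frequency: prey -> fraction of all database IPs in which the prey appears
    """
    hits = []
    for prey, n_seen in obs_counts.items():
        reproduced = n_seen >= min_obs          # "observed 2+ times"
        rare = db_frequency[prey] < max_freq    # "frequency less than 5%"
        if reproduced and rare:
            hits.append(prey)
    return sorted(hits)

counts = {"PREY_A": 3, "PREY_B": 1, "PREY_C": 4}
freqs = {"PREY_A": 0.01, "PREY_B": 0.02, "PREY_C": 0.20}
print(call_hits(counts, freqs))  # ['PREY_A']
```

PREY_B fails the reproducibility criterion, and PREY_C is seen often enough in the database to be treated as likely background.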

N- & C-Tag Hit Overlap (sample: 33 baits)

Category                          # hits   Fraction of total hits
Seen with N-tag only                 110   0.68
Seen with C-tag only                  29   0.18
Seen with both N & C tags             15   0.09
Seen only when N + C are combined      8   0.05
Total                                162

Fraction of total hits observed: N-tag-only experiment 0.77; C-tag-only experiment 0.27.

Lessons:
1) ~5 hits per bait.
2) N-tags interfere less than C-tags.
3) Both tags are needed to get good representation.
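The fractions on this slide follow from simple arithmetic on the hit counts. The sketch below assumes, although the slide does not say so explicitly, that a single-tag experiment recovers its own exclusive hits plus the hits shared by both tags. It also assumes that such an experiment loses the hits that reach 2+ observations only when N- and C-tag data are pooled. Under those assumptions it reproduces the 0.77 and 0.27 figures. Names are illustrative.

```python
# Hit counts from the N- & C-tag overlap slide (33 baits).
HITS = {
    "N only": 110,
    "C only": 29,
    "both N & C": 15,
    "only when N+C combined": 8,
}
N_BAITS = 33

def tag_summary(hits, n_baits):
    """Return total hits, per-category fractions, single-tag recovery, and hits per bait."""
    total = sum(hits.values())
    fractions = {category: count / total for category, count in hits.items()}
    # Assumed: a single-tag experiment sees its exclusive hits plus the shared ones.
    n_tag_only = (hits["N only"] + hits["both N & C"]) / total
    c_tag_only = (hits["C only"] + hits["both N & C"]) / total
    return total, fractions, n_tag_only, c_tag_only, total / n_baits

total, fractions, n_only_exp, c_only_exp, per_bait = tag_summary(HITS, N_BAITS)
print(total)                 # 162
print(round(n_only_exp, 2))  # 0.77
print(round(c_only_exp, 2))  # 0.27
print(round(per_bait, 1))    # 4.9
```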

Prey Reproducibility

Sample: 42 baits, 190 preys

Note: ~50% of C-tags have a reproducibility rate of 1.0.

Lesson: Improve immunoprecipitation conditions.

Question: How many trials are needed to see a prey 2 times?

[Figure: Observed reproducibility rate. Fraction of hits (0.00-0.50) vs. reproducibility rate (0-1) for N-tag, C-tag and average; bar labels 0.01, 0.02, 0.07, 0.39, 0.01, 0.04, 0.17, 0.31.]

[Figure: Number of trials needed to observe prey 2+ times. Fraction of hit pool (0-1) vs. reproducibility rate (0.0-1.0), one curve per number of trials (2-6).]

Note: If a hit must be observed 3+ times, the probability is 0.125 (0.5³, i.e. 3 trials at rate 0.5).

Planning Trial Size

Theoretical probability of 2+ observations in X trials

Reproducibility   Number of trials (X)
rate              2     3     4     5     6
0.0               0.00  0.00  0.00  0.00  0.00
0.1               0.01  0.03  0.05  0.08  0.11
0.2               0.04  0.10  0.18  0.26  0.34
0.3               0.09  0.22  0.35  0.47  0.58
0.4               0.16  0.35  0.52  0.66  0.77
0.5               0.25  0.50  0.69  0.81  0.89
0.6               0.36  0.65  0.82  0.91  0.96
0.7               0.49  0.78  0.92  0.97  0.99
0.8               0.64  0.90  0.97  0.99  1.00
0.9               0.81  0.97  1.00  1.00  1.00
1.0               1.00  1.00  1.00  1.00  1.00

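Coin-flip illustration (rate = 0.5; H = prey observed, T = not observed):
- 2 trials: HH, HT, TH, TT; 1 of 4 outcomes has 2+ observations (P = 0.25)
- 3 trials: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT; 4 of 8 outcomes have 2+ observations (P = 0.50)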

p: prey observation frequency

n: number of trials

k: number of observations required (2)

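Binomial distribution equation:

$$P_{\text{obs}}(k) = \frac{n!}{k!\,(n-k)!}\, p^{k}\,(1-p)^{n-k}$$

The probability of 2+ observations in n trials is the sum of P_obs(k) for k = 2, ..., n:

$$P(\geq 2) = \sum_{k=2}^{n} \frac{n!}{k!\,(n-k)!}\, p^{k}\,(1-p)^{n-k}$$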
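The table of theoretical probabilities can be regenerated from the binomial equation by summing over k = 2, ..., n. This is a minimal sketch, assuming independent trials with a constant per-trial observation probability p. `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(p: float, n: int, k: int = 2) -> float:
    """P(prey observed in >= k of n independent trials), each with observation probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Rebuild the "2+ observations in X trials" table for rates 0.0-1.0 and 2-6 trials.
rates = [round(0.1 * i, 1) for i in range(11)]
print("rate  " + "  ".join(f"{n:>4}" for n in range(2, 7)))
for p in rates:
    print(f"{p:<5} " + "  ".join(f"{prob_at_least(p, n):.2f}" for n in range(2, 7)))
```

Passing `k=3` gives the stricter "3+ observations" criterion. For example, `prob_at_least(0.5, 3, k=3)` is 0.125.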

Predicted Fraction of Observed Prey Pool Found in X Trials

Reproducibility   Fraction of   Number of trials (X)
rate              prey pool     2     3     4     5     6
0.0               0.00          0.00  0.00  0.00  0.00  0.00
0.1               0.00          0.00  0.00  0.00  0.00  0.00
0.2               0.01          0.00  0.00  0.00  0.00  0.00
0.3               0.02          0.00  0.00  0.01  0.01  0.01
0.4               0.07          0.01  0.02  0.03  0.04  0.05
0.5               0.39          0.10  0.19  0.27  0.31  0.34
0.6               0.01          0.00  0.01  0.01  0.01  0.01
0.7               0.04          0.02  0.03  0.04  0.04  0.04
0.8               0.17          0.11  0.15  0.16  0.17  0.17
0.9               0.00          0.00  0.00  0.00  0.00  0.00
1.0               0.31          0.31  0.31  0.31  0.31  0.31
Total             1.00          0.55  0.72  0.83  0.89  0.93
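Each cell of the predicted-fraction table appears to be the prey-pool fraction at a given reproducibility rate, multiplied by the binomial probability of 2+ observations at that rate. The Total row is then the sum down each column. The sketch below reproduces that calculation under this assumption. The slide's fractions are rounded and sum to 1.02, so recomputed totals can drift from the slide by a point or two at larger trial counts. The dictionary name `OBSERVED` and the helper names are hypothetical.

```python
from math import comb

def prob_at_least(p, n, k=2):
    """Binomial probability of >= k observations in n trials at per-trial rate p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Reproducibility rate -> fraction of the observed prey pool (slide values, rounded).
# Rates 0.0, 0.1 and 0.9 had a fraction of 0.00 and are omitted.
OBSERVED = {0.2: 0.01, 0.3: 0.02, 0.4: 0.07, 0.5: 0.39,
            0.6: 0.01, 0.7: 0.04, 0.8: 0.17, 1.0: 0.31}

def fraction_found(dist, n, k=2):
    """Expected fraction of the prey pool observed >= k times in n trials."""
    return sum(frac * prob_at_least(rate, n, k) for rate, frac in dist.items())

def fraction_missed(dist, n, k=2):
    """Expected fraction of the prey pool observed fewer than k times in n trials."""
    return sum(frac * (1 - prob_at_least(rate, n, k)) for rate, frac in dist.items())

for n in range(2, 7):
    print(n, f"found={fraction_found(OBSERVED, n):.2f}", f"missed={fraction_missed(OBSERVED, n):.2f}")
```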

Lessons:
• Identifies suspect data
• Improving the reproducibility rate reduces the number of trials needed.


False Negative Rate

[Figure: Predicted false negative rate. Fraction of hit pool (0-1) vs. reproducibility rate (0.0-1.0), one curve per number of trials (2-6).]

Theoretical probability of NOT observing 2+ in X trials

Reproducibility   Number of trials (X)
rate              2     3     4     5     6
0.0               1.00  1.00  1.00  1.00  1.00
0.1               0.99  0.97  0.95  0.92  0.89
0.2               0.96  0.90  0.82  0.74  0.66
0.3               0.91  0.78  0.65  0.53  0.42
0.4               0.84  0.65  0.48  0.34  0.23
0.5               0.75  0.50  0.31  0.19  0.11
0.6               0.64  0.35  0.18  0.09  0.04
0.7               0.51  0.22  0.08  0.03  0.01
0.8               0.36  0.10  0.03  0.01  0.00
0.9               0.19  0.03  0.00  0.00  0.00
1.0               0.00  0.00  0.00  0.00  0.00

Predicted Fraction of Prey Pool NOT Found in X Trials

Reproducibility   Fraction of   Number of trials (X)
rate              prey pool     2     3     4     5     6
0.0               0.00          0.00  0.00  0.00  0.00  0.00
0.1               0.00          0.00  0.00  0.00  0.00  0.00
0.2               0.01          0.00  0.00  0.00  0.00  0.00
0.3               0.02          0.01  0.01  0.01  0.01  0.01
0.4               0.07          0.05  0.04  0.03  0.02  0.01
0.5               0.39          0.29  0.19  0.12  0.07  0.04
0.6               0.01          0.01  0.00  0.00  0.00  0.00
0.7               0.04          0.02  0.01  0.00  0.00  0.00
0.8               0.17          0.06  0.02  0.01  0.00  0.00
0.9               0.00          0.00  0.00  0.00  0.00  0.00
1.0               0.31          0.00  0.00  0.00  0.00  0.00
Total             1.00          0.45  0.28  0.17  0.11  0.07

Lesson:
• 1 or 2 trials provide a highly incomplete dataset.
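To turn this lesson into a planning rule, one can search for the smallest number of trials that keeps the predicted false negative rate for a prey at or below a chosen target. The 5% target, the 50-trial search limit and the function name `min_trials` are assumptions for illustration, not values from the project.

```python
from math import comb

def prob_at_least(p, n, k=2):
    """Binomial probability of >= k observations in n trials at per-trial rate p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def min_trials(p, max_miss=0.05, k=2, n_max=50):
    """Smallest n with P(fewer than k observations in n trials) <= max_miss, else None."""
    if p <= 0:
        return None  # a prey that is never observed can never be recovered
    for n in range(k, n_max + 1):
        if 1 - prob_at_least(p, n, k) <= max_miss:
            return n
    return None

for rate in (0.3, 0.5, 0.8, 1.0):
    print(rate, min_trials(rate))
```

With a 5% target, a prey reproduced at rate 0.8 needs 4 trials, while one at rate 0.5 needs 8. This shows why improving the reproducibility rate reduces the number of trials needed.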

Predicted False Positive Rate

[Figure: Predicted false positive rate vs. database frequency. False positive frequency (0-1) vs. PathMap (global) observation frequency (0-1), one curve per number of trials (2-6).]

Method:
- Determine prey frequency in the database
- Assume background proteins have a uniform random distribution
- Assume background does not change with time or experimental conditions
- Compare prey frequency to the predicted observation rate

$$E_{\text{falsepositive}}(\text{cutoff}) = \sum_{p=0}^{\text{cutoff}} \text{risk}(p)\,\text{Numhits}(p)$$

$$\text{risk}(p) = \sum_{k=2}^{n} \frac{n!}{k!\,(n-k)!}\, p^{k}\,(1-p)^{n-k}$$

p: prey observation frequency

n: number of trials

k: number of observations required (2)

Efalsepositive: expected number of false positives

cutoff: frequency cutoff

Numhits(p): number of hits at each prey observation frequency

Chart annotations: 5% frequency cutoff; false positive frequency < 0.05 (the “safe” region).
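The expected-false-positive formula can be evaluated directly once the hits are binned by database observation frequency. This sketch assumes, as the method does, that a background prey is observed independently in each trial with probability equal to its database frequency. The input histogram in the example is made up for illustration, and the function names are hypothetical.

```python
from math import comb

def risk(p, n, k=2):
    """Probability that a background prey with frequency p is seen >= k times in n trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def expected_false_positives(numhits, n, cutoff=0.05, k=2):
    """E_falsepositive(cutoff): sum of risk(p) * Numhits(p) over frequencies p <= cutoff.

    numhits maps a prey observation frequency p to the number of hits at that frequency.
    Hits above the cutoff are excluded because the frequency filter removes them.
    """
    return sum(count * risk(p, n, k) for p, count in numhits.items() if p <= cutoff)

# Hypothetical frequency histogram, for illustration only.
example = {0.01: 100, 0.05: 10, 0.20: 5}
for n in range(2, 7):
    print(n, round(expected_false_positives(example, n), 3))
```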

Estimated Experimental False Positive Rate

Random Sampling Method
- Randomly reassign bait labels for each IP for all 49 baits; repeat
- Obtain 3-, 4-, and 5-trial sets (49 baits each) with preys randomly assigned to a bait (5% database frequency)
- Assume a random distribution (no relation between baits)

# trials   Observed reproduced hits (false positives)   Calculated number of
           Number    Percent (n = 190)                  false positives
3          1         1%                                 < 1
4          6         3%                                 < 2
5          7         4%                                 < 3

Results
- False positive rate 2-3X greater than calculated
- Non-uniform distribution

Reasons
- Experiments are not independent
- Non-random: baits are related
- Cross-contamination: equipment contamination
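The random sampling method above can be sketched as a label-permutation test. The sketch shuffles which bait each IP is attributed to, then counts the preys that still look reproduced. It is a minimal version under assumed inputs: a list of `(bait, preys)` tuples, one per IP. It does not apply the 5% database-frequency filter, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def shuffled_reproduced_hits(ips, min_obs=2, rng=None):
    """Count (bait, prey) pairs seen >= min_obs times after randomly reassigning bait labels.

    ips: list of (bait_label, preys) tuples, one per immunoprecipitation.
    Shuffling the labels keeps the number of IPs per bait but breaks any real
    bait/prey relationship, so the surviving "hits" estimate false positives.
    """
    if rng is None:
        rng = random.Random()
    labels = [bait for bait, _ in ips]
    rng.shuffle(labels)
    counts = defaultdict(Counter)
    for label, (_, preys) in zip(labels, ips):
        counts[label].update(set(preys))
    return sum(1 for c in counts.values() for seen in c.values() if seen >= min_obs)

def mean_false_positives(ips, repeats=100, seed=0, min_obs=2):
    """Average number of reproduced hits over repeated random label assignments."""
    rng = random.Random(seed)
    return sum(shuffled_reproduced_hits(ips, min_obs, rng) for _ in range(repeats)) / repeats
```

If the experiments were truly independent and the baits unrelated, this permutation estimate should match the binomial calculation. The slide's observation that real false positives run 2-3X higher points to violations of those assumptions.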

Managing False Positives

1- Control subtraction
  - empty vector immunoprecipitation
  - irrelevant protein immunoprecipitation
2- Reproducibility
  - 2+ times
  - 3-4 biological replicates
3- Database frequency
  - observation frequency cutoff
4- Prioritization
  - annotation
5- Validation
  - reciprocal immunoprecipitation
  - co-expression
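Two of these strategies, control subtraction and reciprocal immunoprecipitation, reduce to simple set operations. The sketch below is an illustration under assumed inputs: a mapping from each bait to the set of preys it pulled down, and the set of preys seen in control IPs. A reciprocal check only applies where both partners were used as baits. The function names are hypothetical.

```python
from typing import Dict, Set, Tuple

def subtract_controls(bait_preys: Dict[str, Set[str]], control_preys: Set[str]) -> Dict[str, Set[str]]:
    """Drop any prey also seen in control IPs (e.g. empty vector or irrelevant protein)."""
    return {bait: preys - control_preys for bait, preys in bait_preys.items()}

def reciprocal_pairs(bait_preys: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b was pulled down by bait a AND a was pulled down by bait b."""
    pairs = set()
    for a, preys in bait_preys.items():
        for b in preys:
            if b != a and a in bait_preys.get(b, set()):
                pairs.add(tuple(sorted((a, b))))
    return pairs

data = {"A": {"B", "HSP"}, "B": {"A", "C"}, "C": set()}
clean = subtract_controls(data, {"HSP"})
print(clean["A"])              # {'B'}
print(reciprocal_pairs(clean)) # {('A', 'B')}
```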

Interaction Network Example

Human Pathway Pilot Project

Contract design:
- 20 baits, chosen by customer (17 actually provided)
- N & C FLAG tags, constructed by MDSP
- Report all observed interactions

Additional design parameters:
- Expressed and immunoprecipitated 4 times each
- Report all interactions classified as hits

TNF pathway
- Proinflammatory cytokine expressed mainly by activated monocytes and macrophages
- Highly studied
  - Pathway members provide ready availability of baits
  - Understanding incomplete, providing opportunity for discovery
- Disease involvement
  - Tumor progression and killing
  - Diabetes
  - Infection
  - Inflammation
- Pharmaceutical potential
  - Find protein targets that perform isolated TNF functions without side effects

TNFα Pathway: Inflammation/Cancer
- 17 baits
- Both N & C tags
- 4 immunoprecipitations

[Figure: TNFα pathway interaction network. Legend: TNFα bait protein; other TNFα pathway protein; prey protein; interactions with bait protein; activation; inhibition; causal (indirect) interactions with preys. Labeled regions include nucleus, NK cell function, endocytosis regulation, Na channels, cell cycle control, transcription, transcriptional regulation, cell death, protein sorting/targeting, protein transport and DNA repair/damage. Named proteins include TNFr, Fas, FADD, CLARP, caspases, TRAF2, TRAF3, TRAF6, TANK, NIK, IKK-1, IKB, NF-kB, A20, RIPK2, IKAP, TBK1, CD40, Jak, Stat, IL10RA, IL10RB, TGFr, Src, SGK, ENaC, Rab5, CDC2, XRCC7, GYS1, PP1CB, PPP1R3, PITSLRE and 14-3-3; most prey names are anonymized ("xxx").]

TNF Pathway Project Summary

Bait information
- Baits: 17
- Membrane baits: 3
- Expressed: 14 (2 not expected to express)
- Membrane baits expressed: 2
- Baits with interactions: 13
- Expressed baits with no TNF context: 7

Bait/prey information
- Preys: 99
- Known interactions: 13
- New interactions: 86
- Baits placed in context: 5
- New bait/prey/bait linkages: 4 (also observed 1 known linkage)

Prey information
- Enzymes: 37
- Proteins in druggable families: 20+ (protease, GTPase, ATPase, kinase, phosphatase, receptor)
- Proteins with no known function: 13 (6 enzymes, 1 receptor)
- Hypothetical proteins: 4
- Transmembrane (TM) domain-containing proteins: 15
  - 7-TM: 1 (receptor?)
  - Potential plasma membrane proteins: 8 (potential antibody targets); others ER or mitochondrial

Integrating Proteomic and Genomic Information

Genes Regulating Cell Growth and Division

Systematic identification of pathways that couple cell growth and division in yeast

Science 297: 395-400, 2002.

Paul Jorgensen, Joy L. Nishikawa, Bobby-Joe Breitkreutz, Mike Tyers

Program in Molecular Biology and Cancer, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada

Genetic Screen for Yeast Size Mutants

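- 4812 strains screened (~2 yrs)
- Size mutants identified against the wild-type size profile: lge (large) and whi (small)

[Figure: Cell size distributions (cell volume, 10-110 fL) for wild type, lge and whi mutants; expression of SFP1-regulated genes in sfp1, GAL-SFP1 and WT strains (color scale -5 to 5).]

SFP1-regulated genes:
- GAL genes (10)
- Nucleotide biosynthesis (12)
- tRNA synthetases (6)
- Ribosome biogenesis (21)
- RNA polymerases I and III (10)
- Nucleolus (29)
- Translation initiation and elongation (17)
- Ribosomal protein genes (136)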

Yeast Interaction Map

Ho et al. Nature 415: 180-3, 2002. FLAG IP > LC-MS/MS

- 725 bait attempts
- 493 baits > 1578 preys
- 646 unannotated preys

Protein interactions

Overlap of Genetic, Expression & Interaction Data

[Figure: Nucleolar network. Edge types: common mRNA regulation; genetic interaction.]

Gene Regulation in Breast Cancer

98 breast tumors × 25,000 genes

Reporter gene sets: BRCA1 (430 genes), prognostic (231 genes), ER (2460 genes)

van’t Veer et al. (2002) Nature 415, 530-6.

“genes that are overexpressed in tumors with a poor prognosis profile are potential targets for the rational development of new cancer drugs”

Proteins in the functional pathway of disease associated genes may identify additional or better therapeutic targets.

Overlap of PathMap and Breast Cancer Genes

van’t Veer et al. (2002) Nature 415, 530-6.

Reporter     Rosetta   MDSP primary              MDSP secondary
             genes     Genes   %      Enzymes    Genes   Enzymes
ER           2460      194     8%     42         515     87
BRCA1        430       27      6%     7          208     38
Prognostic   231       28      12%    9          27      4

Protein Networks in Prognosis Reporters

[Figure: Protein networks in prognosis reporters. Legend: up-regulated; down-regulated; enzyme.]

Interaction network provides context.

Integrated Genomic/Proteomic Breast Cancer Project

van’t Veer et al. (2002) Nature 415, 530-6.

Reporter      # of genes
ER            2460
BRCA1         430
Prognostic    231

• Profile gene expression changes during tumor progression
• Assemble experimental gene set:
  - genes with expression changes
  - genes suspected of involvement in breast cancer progression
• Perform IP-MS to determine interacting proteins
• Analyze for regulatory networks and critical pathways
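One hedged way to sketch the final analysis step is to place an expression-derived gene set into an IP-MS interaction network. The sketch separates the genes that appear in the network from the interaction partners they bring in. This split is an assumption that loosely mirrors the primary/secondary columns of the PathMap overlap table, not the project's documented method. The function name, gene names and edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def network_context(gene_set: Set[str], interactions: Iterable[Tuple[str, str]]):
    """Split an expression-derived gene set by its position in an interaction network.

    Returns (in_network, partners): genes from gene_set that take part in any
    interaction, and their interaction partners that are not themselves in gene_set.
    """
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    in_network = {g for g in gene_set if g in neighbors}  # `in` does not create keys
    partners: Set[str] = set()
    for g in in_network:
        partners |= neighbors[g]
    return in_network, partners - gene_set

edges = [("G1", "P1"), ("P1", "P2"), ("G2", "G1")]
print(network_context({"G1", "G2", "G3"}, edges))
```

Partners that are enzymes or belong to druggable families would be the natural place to look for the "additional or better therapeutic targets" mentioned earlier.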