Identification, annotation and visualisation of extreme changes in splicing with SwitchSeq

Mar Gonzàlez-Porta

Functional Genomics team

Upload: mar-gonzalez-porta

Post on 22-Nov-2014

DESCRIPTION

Talk for the ECCB'14 workshop: Analysis of differential isoform usage by RNA-seq: statistical methodologies and open software - Strasbourg, 7th September 2014

TRANSCRIPT

Page 1: Identification, annotation and visualisation of extreme changes in splicing with SwitchSeq

Identification, annotation and visualisation of extreme changes in splicing with SwitchSeq

Mar Gonzàlez-Porta

Functional Genomics team

Page 2

Outline

•  The extent of transcriptome diversity

•  Applications:

•  Improving the existing annotation

•  Detecting extreme changes in splicing with SwitchSeq

•  Evaluating the impact of splicing at the protein level

Page 3

Gonzàlez-Porta, Frankish, Rung, Harrow & Brazma. Genome Biology 14, R70 (2013)

Page 4

What is the extent of transcriptome diversity?

~95% of human genes have more than one splice form expressed

What are the expression levels for the different transcripts from a given gene in a given sample?

[Pan et al 2008; Wang et al 2008; Djebali and Davis et al 2012 (ENCODE)]

Page 5

Methodology

•  3 tools for transcript expression estimation: MISO, Cufflinks, MMSEQ

•  2 mapping strategies: TopHat (genome), Bowtie (transcriptome)

•  2 different datasets (46 samples):

•  ENCODE: 5 cell lines, PE (50bp x 2), GAII

•  Illumina Body Map: 16 tissues, PE (50bp x 2), HiSeq 2000

Page 6
Page 7

Most protein coding genes express one dominant transcript

TopHat + MISO

Page 8

Most protein coding genes express one dominant transcript

TopHat + MISO

The evaluation of different methods led to a consistent outcome
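
To make the notion of a dominant transcript concrete, here is a minimal sketch of how a major transcript could be picked per gene from transcript-level expression estimates. This is not the code used in the study: the input layout (gene, then transcript, then expression value), the function name and the example identifiers are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input: gene -> {transcript -> expression (e.g. FPKM)} for one sample.
GeneExpression = Dict[str, Dict[str, float]]


def major_transcripts(expr: GeneExpression) -> Dict[str, Tuple[str, float]]:
    """Return, per expressed gene, the most abundant transcript and its
    share of the gene's total expression."""
    result = {}
    for gene, transcripts in expr.items():
        total = sum(transcripts.values())
        if total <= 0:
            continue  # gene not expressed in this sample
        top = max(transcripts, key=transcripts.get)
        result[gene] = (top, transcripts[top] / total)
    return result


# Made-up values, for illustration only.
sample = {
    "geneA": {"A-001": 90.0, "A-002": 10.0},
    "geneB": {"B-001": 4.0, "B-002": 4.0, "B-003": 2.0},
    "geneC": {"C-001": 0.0},
}
for gene, (tx, share) in major_transcripts(sample).items():
    print(f"{gene}: major transcript {tx} ({share:.0%} of gene expression)")
```

A gene such as geneB, where two transcripts tie, still gets a "major" transcript in this sketch, so a real analysis would probably also require a minimum margin over the runner-up.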

Page 9

Outline

•  The extent of transcriptome diversity

•  Applications:

•  Improving the existing annotation

•  Detecting extreme changes in splicing with SwitchSeq

•  Evaluating the impact of splicing at the protein level

Page 11

Major transcripts do not always contain the longest CDS

Page 12

Major transcripts do not always code for proteins

Page 13

Outline

•  The extent of transcriptome diversity

•  Applications:

•  Improving the existing annotation

•  Detecting extreme changes in splicing with SwitchSeq

•  Evaluating the impact of splicing at the protein level

Page 14

On the concept of switch event

Switch event (SE)

Page 15

Detecting switch events with SwitchSeq

https://github.com/mgonzalezporta/SwitchSeq

Goal: detect changes in major transcripts across conditions
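
The goal stated above can be illustrated with a small sketch that flags genes whose major transcript differs between two conditions. It is an illustration of the idea only, not SwitchSeq's implementation: the function names, the data layout and the assumption that each condition has already been summarised to one expression value per transcript are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Expression = Dict[str, Dict[str, float]]  # gene -> {transcript -> expression}


def major(transcripts: Dict[str, float]) -> Optional[str]:
    """Most abundant transcript of a gene, or None if nothing is expressed."""
    if not transcripts or max(transcripts.values()) <= 0:
        return None
    return max(transcripts, key=transcripts.get)


def switch_events(cond1: Expression, cond2: Expression) -> List[Tuple[str, str, str]]:
    """Genes whose major transcript differs between the two conditions."""
    events = []
    for gene in sorted(cond1.keys() & cond2.keys()):
        m1, m2 = major(cond1[gene]), major(cond2[gene])
        if m1 is not None and m2 is not None and m1 != m2:
            events.append((gene, m1, m2))
    return events


# Made-up per-condition summaries, for illustration only.
normal = {"geneA": {"A-001": 30.0, "A-002": 5.0}, "geneB": {"B-001": 12.0, "B-002": 3.0}}
tumour = {"geneA": {"A-001": 4.0, "A-002": 25.0}, "geneB": {"B-001": 20.0, "B-002": 1.0}}
print(switch_events(normal, tumour))  # [('geneA', 'A-001', 'A-002')]
```

Only genes present and expressed in both conditions are compared here; how replicates are summarised per condition is left open.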

Page 16

Detecting switch events with SwitchSeq

INPUT

•  Annotation [switchseq -t get_data]

•  Transcript expression levels:

-  Recommended: focus on differentially spliced genes (e.g. MMDIFF, DEXSeq…)

-  Any matrix will do

Page 17

Detecting switch events with SwitchSeq

OUTPUT

•  Self-contained html (+txt, JSON)

•  High resolution plots

Page 18

Example use case (I): switch events in cancer

45 matched samples, PE (100bp x 2), HiSeq 2000

The transcriptome is broadly altered in ccRCC: ~40% of the expressed genes are differentially spliced (n = 7,842)

Big and recurrent changes in splicing are rare: ~25% of the differentially spliced genes undergo switch events (n = 3,943)

Context: the CAGEKID project (ICGC), for the genomic, transcriptomic and epigenetic characterisation of kidney cancer (ccRCC).

Scelo*, Riazalhosseini*, Greger* et al. Nature Communications (in press).

Page 19

Example use case (I): switch events in cancer

PPP2R4

Ensembl: switch between two PC transcripts

APPRIS: principal transcript in N, but not in T

EMBOSS Needle + UniPDB: <35% protein overlap

Page 20

Example use case (I): switch events in cancer

SRSF6

Ensembl: switch from PC to NMD

APPRIS: principal transcript in N, but not in T

Page 21

Example use case (II): switch events across human tissues

27 human tissues (171 samples), PE (100bp x 2), HiSeq 2000 + HiSeq 2500

Fagerberg et al. Molecular & Cellular Proteomics (2013)

Context: E-MTAB-1733

[Figure. y-axis: % differentially spliced genes; groups: 1 FPKM, 5 FPKM, 10 FPKM; legend: all SE, 2-fold dominant SE, 5-fold dominant SE]
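
The expression cut-offs (1, 5 and 10 FPKM) and the 2-fold and 5-fold dominance categories on this slide suggest a filtering step along the following lines. This is a sketch under assumptions: the slide does not define the criteria, so "dominant" is read here as the major transcript being at least N-fold more abundant than the second one in both conditions, and the FPKM value as a minimum expression for the major transcript. All function and parameter names are hypothetical.

```python
from typing import Dict


def is_dominant(transcripts: Dict[str, float], min_expr: float, min_fold: float) -> bool:
    """True if the major transcript reaches min_expr and is at least
    min_fold times more abundant than the second most abundant transcript."""
    values = sorted(transcripts.values(), reverse=True)
    if not values or values[0] < min_expr:
        return False
    second = values[1] if len(values) > 1 else 0.0
    return values[0] >= min_fold * second


def keep_switch_event(before: Dict[str, float], after: Dict[str, float],
                      min_expr: float = 1.0, min_fold: float = 2.0) -> bool:
    """Keep a candidate switch event only if the major transcript is
    dominant (as defined above) in both conditions."""
    return (is_dominant(before, min_expr, min_fold)
            and is_dominant(after, min_expr, min_fold))


# Made-up expression values (FPKM) for one gene in two tissues.
tissue1 = {"T1": 12.0, "T2": 3.0}
tissue2 = {"T1": 2.0, "T2": 11.0}
for fold in (2.0, 5.0):
    kept = keep_switch_event(tissue1, tissue2, min_expr=1.0, min_fold=fold)
    print(f"{fold:g}-fold dominant: {kept}")
```

Tightening either threshold can only shrink the set of retained events, which matches the nested categories in the figure legend.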

Page 22

Example use case (II): switch events across human tissues

CLTB

Ensembl: switch between two PC transcripts

APPRIS: principal transcript in both conditions

EMBOSS Needle: 92.1% protein overlap

Page 23

Outline

•  The extent of transcriptome diversity

•  Applications:

•  Improving the existing annotation

•  Detecting extreme changes in splicing with SwitchSeq

•  Evaluating the impact of splicing at the protein level

Page 24

Can changes in splicing be recapitulated at the protein level?

Most efforts so far have aimed at detecting proteins from alternatively spliced transcripts, rather than at validating changes in splicing

[e.g. Blakeley et al. 2010; Ezkurdia et al. 2012; Leoni et al. 2011]

Context:

•  Control vs PRPF8* KD Cal51 cells (human)

•  RNA-seq + MS data

*core spliceosomal factor

Integration of RNA-seq + SWATH-MS data

11/17 switch events could be recapitulated

Page 25

Outline

•  Most protein coding genes express one dominant transcript

•  Major transcript predictions can be used to improve the existing annotation

•  SwitchSeq enables the study of extreme changes in splicing

•  Splicing changes can be recapitulated at the protein level

Page 26

Acknowledgements

Thesis committee: John Marioni, Jan Korbel, Simon Tavaré

Everyone who provided feedback: Wolfgang Huber, Nicholas Luscombe, Roderic Guigó, Sushma-Nagaraja Grellscheid and the Functional Genomics team

Collaborators: Yansheng Liu

Supervisor: Alvis Brazma