Identification, annotation and visualisation of extreme changes in splicing with SwitchSeq
DESCRIPTION
Talk for the ECCB'14 workshop: Analysis of differential isoform usage by RNA-seq: statistical methodologies and open software - Strasbourg, 7th September 2014
TRANSCRIPT
Identification, annotation and visualisation of extreme changes in splicing with SwitchSeq
Mar Gonzàlez-Porta
Functional Genomics team
Outline
• The extent of transcriptome diversity
• Applications:
• Improving the existing annotation
• Detecting extreme changes in splicing with SwitchSeq
• Evaluating the impact of splicing at the protein level
Gonzàlez-Porta, Frankish, Rung, Harrow & Brazma. Genome Biology 14, R70 (2013)
What is the extent of transcriptome diversity?
~95% of human genes have more than one splice form expressed [Pan et al 2008; Wang et al 2008; Djebali and Davis et al 2012 (ENCODE)]
What are the expression levels for the different transcripts from a given gene in a given sample?
Methodology
• 3 tools for transcript expression estimation: MISO, Cufflinks, MMSEQ
• 2 mapping strategies: TopHat (genome), Bowtie (transcriptome)
• 2 different datasets (46 samples):
- ENCODE: 5 cell lines, PE (50bp x 2), GAII
- Illumina Body Map: 16 tissues, PE (50bp x 2), HiSeq 2000
Most protein coding genes express one dominant transcript
TopHat + MISO
The evaluation of different methods led to a consistent outcome
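A minimal sketch of what "one dominant transcript" can mean in practice, given per-transcript abundances for a single gene in a single sample. The function name, the transcript IDs and the values are hypothetical, and measuring dominance as the ratio between the two most abundant transcripts is an assumption of this sketch rather than a definition stated on the slides.

```python
def major_transcript(expression):
    """Return (transcript_id, dominance) for one gene in one sample.

    `expression` maps transcript IDs to abundances (e.g. FPKM).
    Dominance is taken here as the ratio of the most abundant transcript
    to the second most abundant one (an assumption of this sketch).
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return None, 0.0  # gene not expressed in this sample
    top_id, top_value = ranked[0]
    if len(ranked) == 1 or ranked[1][1] == 0:
        return top_id, float("inf")  # no expressed competitor
    return top_id, top_value / ranked[1][1]


# Hypothetical abundances for a gene with three annotated transcripts.
gene = {"tx1": 40.0, "tx2": 8.0, "tx3": 2.0}
print(major_transcript(gene))
```

Applying such a function to every expressed gene in every sample gives the kind of per-gene summary that the statement above is about.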
Outline
• The extent of transcriptome diversity
• Applications:
• Improving the existing annotation
• Detecting extreme changes in splicing with SwitchSeq
• Evaluating the impact of splicing at the protein level
Major transcripts do not always contain the longest CDS
Major transcripts do not always code for proteins
Outline
• The extent of transcriptome diversity
• Applications:
• Improving the existing annotation
• Detecting extreme changes in splicing with SwitchSeq
• Evaluating the impact of splicing at the protein level
On the concept of switch event
Switch event (SE)
Detecting switch events with SwitchSeq
https://github.com/mgonzalezporta/SwitchSeq
Goal: detect changes in major transcripts across conditions
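The goal above can be sketched as follows: find the major transcript of each gene in each condition and report the genes where it differs. This is an illustration only and not SwitchSeq's actual implementation. All names and values are hypothetical, and picking the major transcript by mean abundance across the samples of a condition is an assumption of this sketch.

```python
def mean(values):
    return sum(values) / len(values)


def major_by_condition(samples):
    """Major transcript of one gene within one condition.

    `samples` is a list of {transcript_id: abundance} dicts, one per sample.
    The transcript with the highest mean abundance is returned, or None
    if the gene is not expressed.
    """
    means = {t: mean([s[t] for s in samples]) for t in samples[0]}
    best = max(means, key=means.get)
    return best if means[best] > 0 else None


def find_switch_events(cond_a, cond_b):
    """Return {gene_id: (major_in_a, major_in_b)} for genes whose major
    transcript differs between the two conditions."""
    events = {}
    for gene in cond_a.keys() & cond_b.keys():
        a = major_by_condition(cond_a[gene])
        b = major_by_condition(cond_b[gene])
        if a is not None and b is not None and a != b:
            events[gene] = (a, b)
    return events


# Hypothetical abundances: two genes, two samples per condition.
normal = {
    "geneA": [{"tx1": 30.0, "tx2": 5.0}, {"tx1": 26.0, "tx2": 7.0}],
    "geneB": [{"tx3": 12.0, "tx4": 3.0}, {"tx3": 10.0, "tx4": 2.0}],
}
tumour = {
    "geneA": [{"tx1": 4.0, "tx2": 22.0}, {"tx1": 6.0, "tx2": 25.0}],
    "geneB": [{"tx3": 11.0, "tx4": 4.0}, {"tx3": 9.0, "tx4": 3.0}],
}
print(find_switch_events(normal, tumour))
```

In this toy input only geneA changes its major transcript, so it is the only gene reported.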
Detecting switch events with SwitchSeq
INPUT
• Annotation [switchseq -t get_data]
• Transcript expression levels:
- Focus on differentially spliced genes (e.g. MMDIFF, DEXSeq…) recommended
- Any matrix will do
Detecting switch events with SwitchSeq
OUTPUT
• Self-contained html (+txt, JSON)
• High resolution plots
Example use case (I): switch events in cancer
45 matched samples, PE (100bp x 2), HiSeq 2000
The transcriptome is broadly altered in ccRCC
~40% of the expressed genes are differentially spliced (n = 7,842)
Big and recurrent changes in splicing are rare
~25% of the differentially spliced genes undergo switch events (n = 3,943)
Context: the CAGEKID project (ICGC), for the genomic, transcriptomic and epigenetic characterisation of kidney cancer (ccRCC).
Scelo*, Riazalhosseini*, Greger* et al. Nature Communications (in press).
Example use case (I): switch events in cancer
PPP2R4
Ensembl: switch between two PC transcripts
APPRIS: principal transcript in N, but not in T
EMBOSS Needle + UniPDB: <35% protein overlap
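A rough sketch of how a protein overlap figure could be obtained from two isoform sequences, reading "overlap" as percent identity over a global (Needleman-Wunsch) alignment. That reading is an assumption. EMBOSS Needle itself uses a substitution matrix and affine gap penalties, whereas this sketch swaps in a simple match/mismatch/linear-gap scheme, so its numbers will not match the tool's output. The sequences are made up.

```python
def global_identity(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a Needleman-Wunsch global alignment
    (simple scoring, linear gap penalty)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j]: best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    # Trace back one optimal alignment, counting identical columns.
    i, j = n, m
    identical = 0
    length = 0
    while i > 0 or j > 0:
        length += 1
        if i > 0 and j > 0:
            same = seq_a[i - 1] == seq_b[j - 1]
            step = match if same else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                if same:
                    identical += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * identical / length if length else 0.0


# Made-up sequences: the second isoform lacks the last two residues.
print(global_identity("MKTAYIAK", "MKTAYI"))
```

Identity is computed over the full alignment length, gaps included, so a truncated isoform lowers the percentage even when every remaining residue matches.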
Example use case (I): switch events in cancer
SRSF6
Ensembl: switch from PC to NMD
APPRIS: principal transcript in N, but not in T
Example use case (II): switch events across human tissues
27 human tissues (171 samples), PE (100bp x 2), HiSeq 2000 + HiSeq 2500
Fagerberg et al. Molecular & Cellular Proteomics (2013)
Context: E-MTAB-1733
[Figure: % differentially spliced genes (y-axis) at expression thresholds of 1, 5 and 10 FPKM, for all SE, 2-fold dominant SE and 5-fold dominant SE]
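The 1, 5 and 10 FPKM thresholds and the 2-fold and 5-fold dominant categories can be read as two filters applied to candidate switch events. The sketch below assumes that the expression threshold applies to the major transcript in each condition and that "N-fold dominant" means the major transcript is at least N times more abundant than the runner-up. Both readings are assumptions, and all names and values are hypothetical.

```python
def dominant(expression, min_expr, min_fold):
    """Return the major transcript ID if it passes both filters, else None.

    `expression` maps transcript IDs to abundances (FPKM) for one gene
    in one condition.
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    top_id, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    if top < min_expr:           # major transcript too lowly expressed
        return None
    if top < min_fold * second:  # major transcript not dominant enough
        return None
    return top_id


def is_switch(expr_a, expr_b, min_expr=1.0, min_fold=1.0):
    """True if both conditions have a qualifying major transcript
    and the two major transcripts differ."""
    a = dominant(expr_a, min_expr, min_fold)
    b = dominant(expr_b, min_expr, min_fold)
    return a is not None and b is not None and a != b


# Hypothetical gene measured in two tissues.
tissue_a = {"tx1": 12.0, "tx2": 4.0}
tissue_b = {"tx1": 3.0, "tx2": 9.0}
for min_expr in (1.0, 5.0, 10.0):
    for min_fold in (1.0, 2.0, 5.0):
        print(min_expr, min_fold, is_switch(tissue_a, tissue_b, min_expr, min_fold))
```

Raising either threshold can only remove events, so stricter settings always yield fewer switch events than looser ones.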
Example use case (II): switch events across human tissues
CLTB
Ensembl: switch between two PC transcripts
APPRIS: principal transcript in both conditions
EMBOSS Needle: 92.1% protein overlap
Outline
• The extent of transcriptome diversity
• Applications:
• Improving the existing annotation
• Detecting extreme changes in splicing with SwitchSeq
• Evaluating the impact of splicing at the protein level
Can changes in splicing be recapitulated at the protein level?
Most efforts so far have aimed at detecting proteins from alternatively spliced transcripts, rather than at validating changes in splicing
[e.g. Blakeley et al. 2010; Ezkurdia et al. 2012; Leoni et al. 2011]
Context:
• Control vs PRPF8* KD Cal51 cells (human)
• RNA-seq + MS data
*core spliceosomal factor
Integration of RNA-seq + SWATH-MS data
11/17 switch events could be recapitulated
Outline
• Most protein coding genes express one dominant transcript
• Major transcript predictions can be used to improve the existing annotation
• SwitchSeq enables the study of extreme changes in splicing
• Splicing changes can be recapitulated at the protein level
Acknowledgements
Thesis committee: John Marioni, Jan Korbel, Simon Tavaré
Everyone who provided feedback: Wolfgang Huber, Nicholas Luscombe, Roderic Guigó, Sushma-Nagaraja Grellscheid and the Functional Genomics team
Collaborators: Yansheng Liu
Supervisor: Alvis Brazma