Regulation of Core Splicing Factors by Alternative Splicing and Nonsense-mediated mRNA Decay
by
Arneet L. Saltzman
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Molecular Genetics
University of Toronto
© Copyright by Arneet L. Saltzman 2011
Abstract
The majority of human genes are transcribed into a precursor messenger RNA (pre-mRNA) that
is processed to produce multiple mRNA variants through alternative splicing. Although
alternative splicing is known for its role in generating proteomic diversity, it can also regulate
gene expression by introducing premature termination codons that target the spliced transcript
for nonsense-mediated mRNA decay (AS-NMD). To understand the impact of AS-NMD
on gene expression, I performed quantitative AS microarray profiling of NMD-inhibited human
cells. Using this system, I addressed the prevalence of AS-NMD, its trans-acting factor
requirements and the range of cellular functions that it regulates. While this pathway had been implicated in
homeostatic feedback regulation of genes encoding splicing-regulatory proteins, my results
revealed highly conserved alternative exons regulated by AS-NMD in genes encoding basal or
‘core’ splicing factors. I further characterized one of these exons in the gene encoding SmB/B′,
and demonstrated that SmB/B′ autoregulates its expression through AS-NMD. Furthermore, AS
profiling revealed that knockdown of this core splicing factor affects the inclusion levels of
additional alternative exons enriched in genes with functions in RNA processing and RNA
binding. In summary, my results reveal a role for AS-NMD in regulating the expression of core
splicing factors, as well as a role for the core spliceosomal machinery in coordinating a network
of alternative exons in RNA processing factor genes.
Acknowledgments
I am grateful to many people for their support during my graduate work. My supervisor and
mentor Dr. Ben Blencowe has never wavered from the full support that he offered me right from
the time I joined the lab as a naïve and “generally keen” student. He fostered my scientific
development by providing me with the opportunity to do exciting research, present at
conferences and publish my work. Ben’s guidance and encouragement have been essential for
my progress as a graduate student and beyond. I would also like to thank my supervisory
committee members, Dr. Howard Lipshitz, Dr. Tim Hughes and Dr. Quaid Morris, whose
support and advice have helped me to develop as a scientist.
For their direct contributions to the work in this thesis, I would like to thank Matthew Fagnani
and my collaborators Dr. Yoon Ki Kim, Dr. Lynne Maquat, Dr. Ofer (“the data are the data”)
Shai and Dr. Brendan Frey. For their indirect contributions, I would like to thank the people
behind the UCSC genome browser and Galaxy, who have made the genome accessible to the
masses. For financial support, I am grateful to NSERC and the Jennifer Dorrington Graduate
Student Endowment Fund.
I’ve had the privilege to work with many great people during my time in the Blencowe lab. I
wish to sincerely thank all my current and former lab-mates, whose friendship, support, helpful
advice, and love of fine beverages have been invaluable. I am particularly indebted to our Lab
“Sages” Mr. Dave O’Hanlon and Dr. Susan McCracken. For supporting me along my path to
graduate school, I also thank my past teachers and mentors, Mr. Flemming Kress, Dr. Shelagh
Mirski, Ms. Kathy Sparks, Dr. Peter Davies and Dr. Igor Bendik.
Of course words cannot express my gratitude to my family and my ‘life partner’ for their
unflappable love and support.
Table of Contents
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
Abbreviations Used
Chapter 1
1 Introduction
1.1 Coordination of the gene expression machinery
1.1.1 Interdependence among transcription, mRNA processing and chromatin
1.1.2 mRNA processing remodels the messenger RNP
1.2 Pre-mRNA splicing
1.2.1 Core and auxiliary splicing signals
1.2.2 Spliceosome assembly
1.2.3 Exon definition
1.2.4 Spliceosomal snRNPs and Sm proteins
1.3 Regulation of alternative splicing
1.3.1 Roles of alternative splicing
1.3.2 Mechanisms of alternative splicing regulation
1.3.3 Families of alternative splicing regulatory factors
1.3.4 Regulation of splice site recognition
1.3.4.1 SR and SR-related proteins
1.3.4.2 hnRNPs
1.3.5 Regulation of splice site pairing and catalysis
1.3.6 Roles of basal splicing factors in alternative splicing regulation
1.3.7 Breaking the ‘code’ of cis-acting alternative splicing regulatory sequences
1.3.8 Large-scale analysis of alternative splicing regulation
1.3.9 Overview of large-scale alternative splicing detection methods used in this thesis
1.3.9.1 Alternative splicing microarray profiling
1.3.9.2 AS profiling by high throughput RNA sequencing (RNA-Seq)
1.4 Nonsense-mediated mRNA decay (NMD)
1.4.1 Features targeting transcripts for NMD
1.4.2 NMD trans-acting factors and mechanisms of decay
1.4.3 Discriminating between normal and premature nonsense codons: integrating the EJC-dependent and faux-3′UTR models
1.5 Feedback regulation of gene expression
1.5.1 Post-transcriptional autoregulation
1.5.1.1 Splicing regulatory factors
1.5.1.2 Ribosomal proteins, translation factors and other examples
1.5.2 Roles of post-transcriptional autoregulation
1.5.2.1 Developmentally-regulated AS programs
1.5.2.2 Plant circadian oscillations
1.5.2.3 Coordinating gene expression
1.5.3 Sequence and functional conservation
1.6 Rationale and outline
Chapter 2
2 Impact of nonsense-mediated mRNA decay (NMD) factors on alternative splicing (AS)
2.1 Introduction
2.1.1 Prevalence of AS-NMD
2.1.2 Differential requirements for UPF factors in NMD
2.1.3 Summary
2.2 Materials and Methods
2.2.1 Cell culture, siRNA and plasmid transfection
2.2.2 RT-PCR assays and Western blotting
2.2.3 Microarray design and hybridization
2.2.4 Microarray data analysis
2.2.5 Annotation of PTC-introducing AS events
2.2.6 Categorization of conserved and species-specific alternative exons
2.3 Results
2.3.1 Predicted PTC-containing splice variants represent minor isoforms across ten mouse tissues
2.3.2 Most predicted PTC-introducing AS events are not conserved between human and mouse
2.3.3 Alternative splicing microarray profiling following knockdown of the essential NMD factor UPF1 in HeLa cells
2.3.4 A subset of PTC-introducing AS events are regulated by NMD
2.3.5 Effect of UPF1 knockdown on the expression of genes containing PTC-introducing AS events
2.3.6 Alternative splicing microarray profiling following individual knockdowns of NMD factors UPF1, UPF2 or UPF3X
2.3.7 Overlapping but distinct effects of UPF1, UPF2 and UPF3X knockdowns on PTC-introducing AS events
2.4 Discussion
2.4.1 Function versus ‘noise’ in PTC-introducing AS events
2.4.2 Alternative branches of the mammalian NMD pathway
Chapter 3
3 Conserved AS-NMD in genes encoding core splicing factors
3.1 Introduction
3.1.1 Cellular functions regulated by AS-NMD
3.1.2 Summary
3.2 Materials and Methods
3.2.1 RT-PCR and Western blotting
3.2.2 Analysis of conservation of flanking intron sequence and conserved AS
3.2.3 Identification of AS events in spliceosomal and control gene sets
3.2.4 Statistical Analysis
3.3 Results
3.3.1 PTC-introducing AS events affected by UPF knockdowns are flanked by highly conserved sequences
3.3.2 Core spliceosomal proteins are new regulatory targets of AS-NMD
3.3.3 Conserved AS in genes encoding spliceosomal factors enriched in PTC-introducing events
3.3.4 Autoregulation of core splicing factors by AS-NMD
3.4 Discussion
3.4.1 AS-NMD and the regulation of core spliceosomal proteins
Chapter 4
4 Auto-regulation of the core splicing factor SmB/B′ via AS-NMD
4.1 Introduction
4.1.1 AS-NMD of SNRPB, encoding SmB/B′
4.1.2 Summary
4.2 Materials and Methods
4.2.1 Cell culture, siRNA and plasmid transfection
4.2.2 Estimation of mRNA half-lives
4.2.3 RNA and protein isolation, RT-PCR and Western blotting
4.2.4 Plasmid Construction
4.3 Results
4.3.1 Inclusion of a highly conserved premature termination codon (PTC)-introducing alternative exon in SNRPB pre-mRNA is affected by SmB/B′ protein levels
4.3.2 Knockdown of the core snRNP protein SmD1 affects the inclusion of the conserved SNRPB alternative exon
4.3.3 Knockdown of SmB/B′ or SmD1 affects the levels of Sm-class snRNAs
4.3.4 Cis-acting elements regulating inclusion of the SNRPB alternative exon
4.3.5 Mutations that strengthen the 5′ss reduce the effects of SmB/B′ knockdown
4.4 Discussion
4.4.1 Feedback and cross-regulation of splicing factors
Chapter 5
5 Regulation of alternative splicing by the core spliceosomal machinery
5.1 Introduction
5.1.1 Summary
5.2 Materials and Methods
5.2.1 Analysis of AS and transcript levels by RNA-Seq
5.2.2 Calculation of Splice Site Strength
5.2.3 Gene ontology (GO) analysis
5.2.4 Statistical Analysis
5.3 Results
5.3.1 A widespread role for core splicing factors in promoting the inclusion of alternative exons
5.3.2 Characteristics of SmB/B′ knockdown-dependent alternative exons
5.3.3 Changes in transcript levels associated with SmB/B′ knockdown-dependent PTC-introducing alternative exons
5.3.4 SmB/B′ knockdown affects AS events in RNA-processing factor genes
5.4 Discussion
5.4.1 Mechanisms of AS regulation by core splicing factors
5.4.2 Physiological roles of AS regulation by general splicing factors
Chapter 6
6 Conclusions
6.1 Future Directions
6.1.1 What features underlie the differential dependencies of NMD substrates on UPF2 and UPF3/UPF3X?
6.1.2 Mechanisms of core splicing factor-dependent AS regulation
6.1.3 Origins of ultra- and highly-conserved nonsense exons
6.1.4 Networks of auto- and cross-regulation among RNA processing factors
References
Appendices
List of Tables
Table 1-1. Post-transcriptional auto- and cross-regulation of proteins with roles in RNA biogenesis and metabolism.
Table 3-1. Selected microarray PTC-introducing AS events in genes with functions related to RNA processing.
Table 3-2. Conserved, PTC-introducing AS events identified in transcripts from spliceosome-associated proteins.
List of Figures
Figure 1-1. Coordination of transcription and pre-mRNA processing machineries.
Figure 1-2. Overview of core splicing signals and early stages of spliceosome assembly.
Figure 1-3. Outline of microarray and RNA-Seq AS profiling methods used in this work.
Figure 1-4. Alternative splicing of cassette-type exons can lead to introduction of a premature termination codon (PTC) in the included or skipped splice variant (AS-NMD).
Figure 1-5. An integrated model for discrimination between premature and normal stop codons.
Figure 1-6. Simplified model for autoregulation of a splicing-regulatory factor through AS-NMD.
Figure 2-1. Overview of Chapter 2.
Figure 2-2. Alternative splicing microarray data reveal that predicted PTC-introducing splice variants represent minor forms across ten mouse tissues.
Figure 2-3. Representative RT-PCRs of PTC upon inclusion and PTC upon skipping AS events in ten mouse tissues.
Figure 2-4. Predicted PTC-introducing AS events are more often species-specific than conserved between human and mouse.
Figure 2-5. Knockdown of the essential NMD factor UPF1 leads to an increase in a subset of PTC-containing splice variants.
Figure 2-6. Changes in % exon inclusion and transcript levels upon UPF1 knockdown predicted by the AS microarray are confirmed by RT-PCR.
Figure 2-7. Overlapping but distinct effects of UPF protein knockdowns on PTC-introducing AS events.
Figure 2-8. Representative RT-PCR assays showing effects of UPF protein knockdowns on levels of PTC-introducing alternative exons.
Figure 3-1. Overview of Chapter 3.
Figure 3-2. Conservation of intron sequences flanking PTC-introducing exons affected by UPF factor knockdowns.
Figure 3-3. PTC upon inclusion alternative exons that show UPF1- or UPF2-dependent changes in inclusion level are often flanked by highly conserved intronic sequences.
Figure 3-4. Conserved PTC-introducing AS events in genes encoding spliceosomal proteins.
Figure 3-5. SNRPB (also known as SmB/B′) or SMNDC1 (also known as SPF30) over-expression leads to increased levels of the respective PTC-containing (PTC+) alternative transcript.
Figure 4-1. Overview of Chapter 4.
Figure 4-2. The inclusion of a highly conserved PTC-introducing alternative exon in SNRPB is affected by SmB/B′ knockdown.
Figure 4-3. The half-life of the endogenous SNRPB PTC-containing included splice variant (A) but not that of the exon-included variant from the SNRPB reporter ‘miniSmB’ (B) is increased upon treatment with cycloheximide (CHX) to inhibit NMD.
Figure 4-4. Knockdown of SmD1 leads to more skipping of the SNRPB alternative exon in miniSmB (A), and knockdown of SmB/B′ (B) or SmD1 (C) affects snRNA levels.
Figure 4-5. Auxiliary cis-acting elements regulating inclusion of the SNRPB alternative exon in miniSmB are proximal to the splice sites.
Figure 4-6. Mutations that strengthen the 5′ss (splice site), but not mutations that strengthen the 3′ss, reduce the effects of SmB/B′ knockdown on miniSmB AS.
Figure 5-1. Overview of Chapter 5.
Figure 5-2. Quantitative analysis of alternative splicing by RNA-Seq reveals that knockdown of SmB/B′ leads to increased skipping of alternative exons.
Figure 5-3. Changes in alternative exon inclusion levels measured by RNA-Seq are confirmed by RT-PCR assays.
Figure 5-4. Confirmation of the effects of SmB/B′ knockdown on alternative exon inclusion in two independent knockdowns with different siRNAs.
Figure 5-5. Characteristics of alternative exons affected by knockdown of SmB/B′.
List of Appendices
Appendices to Chapter 2: Impact of nonsense-mediated mRNA decay (NMD) factors on alternative splicing (AS)
Appendix 1. Reprint: Pan Q, Saltzman AL, Kim YK, Misquitta C, Shai O, Maquat LE, Frey BJ, Blencowe BJ. 2006. Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expression. Genes Dev 20 (2): 153-158.
Appendix 2. Reprint: Saltzman AL, Kim YK, Pan Q, Fagnani MM, Maquat LE, Blencowe BJ. 2008. Regulation of multiple core spliceosomal proteins by alternative splicing-coupled nonsense-mediated mRNA decay. Mol Cell Biol 28 (13): 4320-4330.
Appendix 3. Correlation of probe intensities (A) or % exon inclusion (B) between Cy3 and Cy5 fluor reversals for six samples.
Appendix 4. Correlation of % inclusion between pairs of AS events with duplicate probes on the AS microarray.
Appendix 5. Correlation between % exon skipping (A) or knockdown-dependent difference in % exon skipping (B) measurements by AS microarray or RT-PCR.
Appendix 6. Microarray data for 1704 AS events that met our detection criteria.
Appendix 7. Annotation for 1704 microarray-monitored AS events that met our detection criteria.
Appendix 8. Significant overlaps in AS events with a consistent change in exon inclusion levels when comparing any two UPF KDs.
Appendix 9. Effects of each UPF factor knockdown on PTC-introducing AS events.
Appendix 10. Frequency of changes in exon inclusion level upon knockdown of UPF1, UPF2, or UPF3X for all detectable AS events (A) or for specific categories (B-D).
Appendices to Chapter 3: Conserved AS-NMD in genes encoding core splicing factors
Appendix 11. Cumulative distribution function (CDF) plots of flanking intron sequence overlap with phastCons elements for the ‘No PTC’ group.
Appendix 12. Annotation for microarray-monitored PTC-introducing AS events with conserved flanking intron sequences.
Appendix 13. Annotation of cassette AS events identified in spliceosome-associated genes.
Appendix 14. Annotation of cassette AS events identified in the control gene set.
Appendices to Chapter 4: Auto-regulation of the core splicing factor SmB/B′ via AS-NMD
Appendix 15. Reprint: Saltzman AL, Pan Q, Blencowe BJ. 2011. Regulation of alternative splicing by the core spliceosomal machinery. Genes Dev 25 (4): 373-384.
Appendix 16. Comparison of SmB/B′ and SmN amino acid sequences (A) and mRNA expression patterns across 84 tissue and cell types (B).
Appendix 17. Abrogation of NMD by treatment of HeLa cells with the translation inhibitor cycloheximide (CHX) leads to an increase in the steady-state level of the endogenous exon-included PTC-containing SNRPB variant (A), but not the exon-included variant from the SNRPB reporter ‘miniSmB’ (B).
Appendix 18. A deletion adjacent to the 5′ss that strengthens potential base-pairing to U1 snRNA abrogates SmB/B′ knockdown-dependent skipping.
Appendix 19. Mutations that strengthen the 3′ss do not abrogate SmB/B′ knockdown-dependent skipping.
Appendices to Chapter 5: Regulation of alternative splicing by the core spliceosomal machinery
Appendix 20. Data and annotations for 5752 AS events monitored by RNA-Seq that passed our filtering criteria.
Appendix 21. Data and annotations for 8626 triplets of consecutive ‘constitutive’ exons monitored by RNA-Seq that passed our filtering criteria.
Appendix 22. Gene Ontology (GO) and Pathway Commons enrichment analysis for 235 genes containing AS events with a ≥30% change in % exon inclusion upon SmB/B′ knockdown.
Appendix 23. Exon inclusion levels and knockdown-dependent changes for all 27 assayed alternative exons agree well with RNA-Seq predictions.
Abbreviations Used
AS alternative splicing
cDNA complementary DNA
CTD carboxy-terminal domain
EJC exon junction complex
EST expressed sequence tag
GO gene ontology
mRNP messenger ribonucleoprotein
NMD nonsense-mediated mRNA decay
NT/siNT non-targeting siRNA
pol II RNA polymerase II
PTC premature termination codon
RNA ribonucleic acid
RNA-Seq high throughput RNA sequencing
RT-PCR reverse transcription-polymerase chain reaction
snRNA small nuclear RNA
snRNP small nuclear ribonucleoprotein
siRNA short interfering RNA
UTR untranslated region
Chapter 1
1 Introduction
A major theme of my thesis research is how gene expression can be regulated by coordination
between different steps, particularly alternative splicing (AS) and nonsense-mediated mRNA
decay (NMD). I will therefore begin with an overview of the coordination among different steps
in mammalian gene expression (Section 1.1). Next, I will discuss AS and its regulation, focusing
on the relatively uncharacterized roles of the basal splicing machinery and on insights from high-
throughput analysis methods (Sections 1.2 and 1.3). This is followed by an introduction to the
NMD pathway, focusing on the recognition of premature stop codons and on AS-NMD (Section
1.4). Finally, I will discuss how genes with diverse roles in RNA biogenesis and metabolism take
advantage of their own cellular functions to autoregulate their expression (Section 1.5).
1.1 Coordination of the gene expression machinery
Almost all human protein-coding genes are transcribed into a precursor messenger RNA (pre-
mRNA) that must be extensively processed before it is exported to the cytoplasm and recognized
by the translation machinery. In the nucleus, pre-mRNA undergoes capping, splicing, cleavage
and polyadenylation. These processes are integrated with transcription by RNA polymerase II
(pol II) and also result in the association of proteins with the mRNA to form a messenger
ribonucleoprotein (mRNP). Extensive crosstalk among these processes plays important roles in
the fidelity, efficiency and regulation of gene expression (Figure 1-1) (reviewed in Maniatis and
Reed 2002; Komili and Silver 2008; Pandit et al. 2008; Moore and Proudfoot 2009).
1.1.1 Interdependence among transcription, mRNA processing and chromatin
The carboxy-terminal domain (CTD) of the largest subunit of pol II plays a central role in the
crosstalk between transcription and pre-mRNA processing (reviewed in Perales and Bentley
2009; Munoz et al. 2010). The mammalian CTD contains 52 heptamer repeats with the
consensus sequence YSPTSPS (Y1-S2-P3-T4-S5-P6-S7), and it is required for efficient mRNA processing (McCracken
et al. 1997b). During the transcription cycle, changes in the phosphorylation pattern of the CTD
serine residues allow the recruitment of pre-mRNA processing, elongation, and histone-
modifying factors (reviewed in Buratowski 2009). Early in transcription, the CTD is
phosphorylated on Ser-5 by TFIIH, and the Ser-5-phosphorylated CTD recruits and activates the
mRNA capping enzymes (Cho et al. 1997; McCracken et al. 1997a; Cho et al. 1998; Ho and
Shuman 1999) (Figure 1-1A). The nuclear cap-binding complex (CBC, CBP80/20 heterodimer)
recognizes the capped mRNA 5′-end and promotes efficient splicing, export and translation
initiation (Section 1.1.2). As elongation proceeds, the CTD becomes highly phosphorylated on
Ser-2 residues by the positive transcription elongation factor b (P-TEFb) and pol II enters into
the productive elongation phase (Figure 1-1B) (Marshall et al. 1996) (reviewed in Bres et al.
2008). The Ser-2-phosphorylated CTD recruits the cleavage and polyadenylation machinery,
which then stimulates 3′end formation once the poly(A) site is transcribed (Figure 1-1D)
(Licatalosi et al. 2002; Ahn et al. 2004; Meinhart and Cramer 2004; Ni et al. 2004; Rosonina and
Blencowe 2004).
Figure 1-1. Coordination of transcription and pre-mRNA processing machineries.
See text for details.
Crosstalk between transcription elongation and splicing is mediated by the CTD as well as by
factors that associate with the nascent transcript and the chromatin template (Figure 1-1B)
(reviewed in Perales and Bentley 2009). This ‘coupling’ is functionally important for the
efficiency of splicing (Das et al. 2006; Hicks et al. 2006) and for regulating the differential use of
splice sites through alternative splicing (AS) (Cramer et al. 1997; Auboeuf et al. 2002; Kadener
et al. 2002; Nogues et al. 2002; de la Mata et al. 2003; Pagani et al. 2003; Ip et al. 2011).
Transcription can influence AS through both ‘recruitment’ and ‘kinetic’ coupling (reviewed in
Munoz et al. 2010). In recruitment coupling, specific splicing regulatory proteins as well as
factors with dual roles in transcription and splicing regulation (see below for examples) are
recruited to the transcribing polymerase, often via association with the CTD. In kinetic coupling,
changes in the pol II elongation rate affect splice site choice by influencing the timing of
presentation of splicing signals in the pre-mRNA to the splicing machinery. The pol II
elongation rate may be influenced by promoter identity and associated transcriptional activators
or co-activators and by elongation factors associated with the CTD.
Studies of the Ser/Arg-rich (SR) proteins, a class of sequence-specific RNA-binding factors that
bind pre-mRNA to regulate AS (Section 1.3.4), illustrate the interdependence between
transcription and AS regulation. The activities of several SR proteins are modulated in a
promoter- and CTD-dependent manner (Cramer et al. 1999; de la Mata and Kornblihtt 2006). In
addition, several factors involved in splicing, including the SR protein SRSF2 (also known as
SC35), enhance transcription through the recruitment or stimulation of elongation factors such as
the CTD kinase P-TEFb (Figure 1-1B) (Fong and Zhou 2001; Bres et al. 2005; Lin et al. 2008).
Thus, transcription can affect splicing and, reciprocally, splicing can affect transcription.
During the transcription cycle, Ser-2 or Ser-5-phosphorylated pol II and associated elongation
factors recruit chromatin modifying complexes that establish or maintain characteristic patterns
of histone modifications on active genes (reviewed in Buratowski 2009). In human cells, the 5′-
ends of active genes are typically marked by histone H3 lysine 4 trimethylation (H3K4me3)
(Bernstein et al. 2005). This chromatin mark is recognized by CHD1 (chromodomain helicase
DNA binding protein 1), which can recruit the splicing machinery (U2 snRNP; see Section 1.2.2)
to facilitate efficient pre-mRNA splicing (Sims et al. 2007). In addition, the histone modification
H3K36me3 is enriched in gene regions encoding alternative exons regulated by the splicing
factor PTB (polypyrimidine tract binding protein; see Section 1.3.3). These modified histone
tails are recognized by MRG15 (MORF-related gene 15), which enhances the recruitment of
PTB to the pre-mRNA (Luco et al. 2010). Thus, physical crosstalk between chromatin and the
splicing machinery represents an additional layer of gene regulation (Figure 1-1C) (reviewed in
Allemand et al. 2008; Luco et al. 2011).
1.1.2 mRNA processing remodels the messenger RNP
Capping, splicing and polyadenylation in the nucleus result in the association of protein
complexes with the mRNA, which in turn influence mRNA export, localization, translation and
stability. The transcription and export (TREX) complex is recruited to the 5′ end of mRNAs in a
cap- and splicing-dependent manner in human cells (Figure 1-1B) (Masuda et al. 2005; Cheng et
al. 2006). The TREX subunit ALY (also known as REF, THOC4 or Yra1 in yeast) directly binds
the mRNA as well as the CBP80 subunit of the cap-binding complex. ALY functions as an
mRNA export adapter by transferring the mRNA to TAP (TIP-associated protein; also known as
NXF1, nuclear RNA export factor 1, or Mex67 in yeast). Together with its partner p15, TAP
interacts with the nuclear pore complex to mediate mRNA export (Hautbergue et al. 2008).
Additional RNA-binding proteins, including some SR proteins, can also act as TAP-dependent
export adapters (Huang et al. 2003).
Splicing results in the deposition of a multi-protein exon junction complex (EJC) approximately
20 nt upstream of exon-exon junctions (Figure 1-1B) (reviewed in Le Hir and Andersen 2008).
The four core factors of the EJC are eIF4A3, Y14, MAGOH (mago-nashi homologue), and
MLN51 (metastatic lymph node gene 51; also known as Barentz, Btz, CASC3). These four
proteins along with RNPS1 and UPF3 remain associated with the mRNA during export, until
they are removed during the first round of translation (Dostie and Dreyfuss 2002; Lejeune et al.
2002). Additional splicing-related proteins are peripherally associated with the EJC in the
nucleus but do not remain bound during export. While it was initially believed that all splice
junctions are marked by the EJC, recent evidence in fly cells suggests that EJC deposition may
be a regulated process (Sauliere et al. 2010). The EJC factors have multiple roles in RNA
metabolism, including in mRNA localization (Palacios et al. 2004) and translation (Wiegand et
al. 2003; Nott et al. 2004). The EJC also communicates the positions of splice junctions to
cytoplasmic factors involved in nonsense-mediated mRNA decay (NMD), a pathway that
degrades mRNAs containing premature termination codons (PTC). Specifically, the presence of
an EJC downstream of a PTC strongly stimulates mammalian NMD (see Section 1.4). In
addition to these post-splicing roles, new findings in Drosophila show that the EJC functions in
the splicing of exons flanked by long introns (Ashton-Beaucage et al. 2010; Roignant and
Treisman 2010).
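The positional logic described above can be illustrated with a short sketch. It is a deliberate simplification under stated assumptions: an EJC is placed exactly 20 nt upstream of every exon-exon junction, every junction is assumed to be marked, and a stop codon is flagged whenever at least one EJC lies downstream of it. The function names and the example exon lengths are hypothetical and are not taken from the thesis.

```python
from itertools import accumulate

# Assumption: the EJC sits exactly 20 nt upstream of each exon-exon junction.
EJC_OFFSET = 20


def ejc_positions(exon_lengths):
    """Return 0-based mRNA coordinates of EJCs on a spliced transcript."""
    # Junctions lie at the cumulative exon ends, excluding the end of the last exon.
    junctions = list(accumulate(exon_lengths))[:-1]
    return [j - EJC_OFFSET for j in junctions if j - EJC_OFFSET >= 0]


def has_downstream_ejc(exon_lengths, stop_codon_start):
    """True if at least one EJC lies downstream of the stop codon."""
    stop_end = stop_codon_start + 3  # first position after the stop codon
    return any(pos >= stop_end for pos in ejc_positions(exon_lengths))


# Hypothetical three-exon transcript (exon lengths in nt).
exons = [150, 120, 200]
print(ejc_positions(exons))
print(has_downstream_ejc(exons, stop_codon_start=100))  # stop codon in the first exon
print(has_downstream_ejc(exons, stop_codon_start=400))  # stop codon in the last exon
```

In this toy model, a stop codon in the last exon has no EJC downstream of it, whereas a stop codon in an upstream exon does. The distance requirements and exceptions discussed in Section 1.4 are not modelled here.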
1.2 Pre-mRNA splicing
Approximately 92% of human protein-coding genes are interrupted by introns, and on average
each gene contains 8-9 introns (Fedorova and Fedorov 2005). The excision of introns from pre-
mRNA, or splicing, is catalyzed by the spliceosome, a large ribonucleoprotein (RNP) complex
comprising the U1, U2, U4/6 and U5 small nuclear (sn)RNPs and a few hundred protein factors
(reviewed in Wahl et al. 2009). Both RNA and protein components of the spliceosome play
important roles in recognition of the core splicing signals and in catalysis. This section outlines
the recognition of the core splicing signals and subsequent assembly of the spliceosome on the
pre-mRNA. I will also focus on the core snRNP Sm proteins, which will be relevant in the later
chapters of my thesis.
1.2.1 Core and auxiliary splicing signals
The core splicing signals in the pre-mRNA are short motifs with considerable sequence
flexibility. The 5′ (donor) and 3′ (acceptor) splice sites (ss) are located at the 5′ and 3′ boundaries
of the intron, respectively, and the branch point is located upstream of the 3′ss. Consensus
sequences for the mammalian core splicing signals are shown in Figure 1-2A. The splicing
reaction involves two successive trans-esterifications. In the first step, the 2′ hydroxyl of the
branch point adenosine attacks the phosphodiester bond at the 5′ss, generating a free 3′ hydroxyl
on the 5′ exon and a branched intron-lariat-3′exon as intermediates. In the second step, the free 3′
hydroxyl of the 5′ exon attacks the phosphodiester bond at the 3′ss, resulting in ligation of the
exons and release of the intron lariat.
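The bookkeeping of the two steps can be traced on a sequence string, as in the minimal sketch below. It only tracks which nucleotides end up in which product; it does not model chemistry or splice site recognition. The function name, the index convention and the toy sequences are assumptions made for illustration.

```python
def splice(pre_mrna, intron_start, branch_point, intron_end):
    """Trace the two trans-esterification steps on a pre-mRNA string.

    intron_start: index of the first intron nucleotide (5'ss boundary)
    branch_point: index of the branch point adenosine
    intron_end:   index of the first nucleotide of the 3' exon (3'ss boundary)
    """
    if not intron_start < branch_point < intron_end <= len(pre_mrna):
        raise ValueError("expected intron_start < branch_point < intron_end")
    if pre_mrna[branch_point] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1: the branch point 2' hydroxyl attacks the 5'ss, releasing the
    # 5' exon and leaving the intron lariat attached to the 3' exon.
    five_exon = pre_mrna[:intron_start]
    lariat_intermediate = pre_mrna[intron_start:]

    # Step 2: the free 3' hydroxyl of the 5' exon attacks the 3'ss,
    # ligating the exons and releasing the intron lariat.
    three_exon = pre_mrna[intron_end:]
    return {
        "step1": (five_exon, lariat_intermediate),
        "mrna": five_exon + three_exon,
        "lariat": pre_mrna[intron_start:intron_end],
    }


# Toy sequences (hypothetical, far shorter than real exons and introns).
exon1, intron, exon2 = "AUGCAG", "GUAAGUCUAACUUUUCCCAG", "GUUUAA"
branch = len(exon1) + intron.index("CUAAC") + 3  # second A of the CUAAC motif
products = splice(exon1 + intron + exon2, len(exon1), branch,
                  len(exon1) + len(intron))
print(products["mrna"])
```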
The short, degenerate core splicing signals that mark the boundaries of introns do not contain
sufficient information to accurately define the exons in human transcripts (Lim and Burge 2001).
Introns often contain many ‘pseudoexons’ – intronic sequences flanked by ‘decoy’ consensus ss
sequences that are not normally recognized by the splicing machinery. Thus additional cis-acting
regulatory sequences are necessary to distinguish introns and exons (reviewed in Chasin 2007).
These auxiliary sequences are known as exonic or intronic splicing enhancers when they
promote splicing (ESE/ISEs), or as splicing silencers when they inhibit splicing (ESS/ISS).
Splicing enhancers and silencers are usually short, degenerate sequence motifs (5-10 nt) and they
play roles in the recognition of constitutive exons (Section 1.2.3) as well as in the regulation of
inclusion of alternative exons (Section 1.3.2).
Figure 1-2. Overview of core splicing signals and early stages of spliceosome assembly.
(A) Consensus sequences of the mammalian core splicing signals. PPT, polypyrimidine tract; ss,
splice site.
(B) Early stages of spliceosome assembly are shown. The U1, U2, and U4/6.U5 snRNPs contain
the indicated snRNA(s) and associated proteins. Sequences of U1 and U2 snRNAs that base-pair
with the 5′ss and branch site, respectively, are shown in white text. Ψ, pseudouridine; R, A/G; Y,
U/C.
1.2.2 Spliceosome assembly
The consensus model of spliceosome assembly has been mostly characterized using in vitro
approaches (reviewed in Matlin and Moore 2007; Wahl et al. 2009). Spliceosome assembly is a
step-wise process involving recruitment of snRNPs and proteins to the pre-mRNA and dynamic
rearrangements of RNA–RNA, RNA–protein and protein–protein interactions (Figure 1-2B). In
the early (E) complex, also known in yeast as the ‘commitment complex’, the 5′ss is recognized
by U1 snRNP, the branch point is recognized by SF1 (Splicing Factor 1; also known in yeast as
BBP, branchpoint binding protein), and the PPT and 3′ss are recognized by the subunits of the
U2 accessory factor (U2AF) heterodimer (U2AF65 and U2AF35, respectively). Recognition of
the 5′ss involves base-pairing between the 5′-end of U1 snRNA and the pre-mRNA, which is
stabilized by proteins in the U1 snRNP (Zhang and Rosbash 1999).
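As a rough illustration of the base-pairing component of 5′ss recognition, the sketch below counts Watson-Crick pairs between a 9-nt 5′ss (positions −3 to +6) and the 5′ end of U1 snRNA. Several things here are assumptions not stated in the text: the U1 5′-end sequence is entered from general knowledge with pseudouridines written as U, nucleotides 3 to 11 of U1 are taken as the pairing register, and G·U wobble pairs and protein contributions are ignored. The pair count is only a crude proxy for splice site strength.

```python
# Assumption: 5' end of human U1 snRNA, with pseudouridines written as U.
U1_5P_END = "AUACUUACCUG"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def u1_pairs(five_ss):
    """Count Watson-Crick pairs between a 9-nt 5'ss (-3..+6) and U1 snRNA."""
    five_ss = five_ss.upper().replace("T", "U")
    if len(five_ss) != 9:
        raise ValueError("expected 9 nt covering positions -3 to +6")
    # U1 nucleotides 3-11 pair with the 5'ss in antiparallel orientation,
    # so the U1 segment is reversed before position-wise comparison.
    partner = U1_5P_END[2:11][::-1]
    return sum((a, b) in WATSON_CRICK for a, b in zip(five_ss, partner))


print(u1_pairs("CAGGUAAGU"))  # fully complementary 5'ss
print(u1_pairs("AAGGUGUGA"))  # hypothetical weaker 5'ss
```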
The U2 snRNP then replaces SF1 at the branchpoint, forming the A complex (also referred to as
the pre-spliceosome) (Figure 1-2B). Formation of the A complex is ATP-dependent and involves
base-pairing of U2 snRNA at the branch site region, which is stabilized by components of the U1
and U2 snRNPs and by U2AF65 (Barabino et al. 1990; Valcarcel et al. 1996; Gozani et al.
1998). A bulged duplex formed between the U2 snRNA and the branch site region specifies the
protruding adenosine as the nucleophile for the first trans-esterification reaction of splicing
(Query et al. 1994). The bulged adenosine is also recognized by the U2 snRNP protein p14
(SF3B14) (MacMillan et al. 1994; Schellenberg et al. 2011). While the splice sites are
recognized in E complex, the pairing of splice sites for catalysis occurs at, or subsequent to, A
complex formation (Chiara and Reed 1995; Lim and Hertel 2004; Kotlajich et al. 2009).
The U4/6.U5 tri-snRNP then joins the spliceosome, forming the B complex. This complex
undergoes extensive remodeling to form the catalytically active spliceosome. The multiprotein
PRP19/CDC5L complex (also known in yeast as the NineTeen Complex or NTC) and additional
RNA helicases also associate with the spliceosome and function in spliceosome activation and
splicing fidelity (reviewed in Valadkhan 2007; Hogg et al. 2010). The remodelling of RNA–
RNA interactions during spliceosome activation includes disruption of U4–U6 snRNA base-
pairing to allow base-pairing of U6 snRNA with intronic nucleotides at the 5′ss, release or
destabilization of the U1 and U4 snRNPs, and rearrangement of interactions between U2 and U6
snRNA and within U6 snRNA. Following the two trans-esterification reactions of splicing, the
products are released and the components of the spliceosome are recycled.
In contrast to the step-wise model of spliceosome assembly characterized in vitro, the isolation
of a ‘penta-snRNP’ from yeast cells led to the hypothesis that the spliceosome may encounter the
pre-mRNA in a pre-assembled form in vivo (Stevens et al. 2002). However, it has been suggested
that the two models might be reconciled if the step-wise assembly characterized in vitro could be
viewed instead as step-wise rearrangement and activation of the penta-snRNP (reviewed in Brow
2002; Nilsen 2002). Recent work also supports the relevance of the step-wise assembly model in
vivo. Several groups used chromatin immunoprecipitation to monitor the co-transcriptional
recruitment of snRNP components and other splicing factors to nascent transcripts of yeast
intron-containing genes. These studies showed a sequential pattern of snRNP or splicing factor
recruitment that was consistent with step-wise spliceosome assembly (Gornemann et al. 2005;
Lacadie and Rosbash 2005; Tardiff and Rosbash 2006). In addition, live imaging of snRNP
components tagged with fluorescent proteins revealed distinct interaction dynamics of individual
snRNPs with pre-mRNA, in support of a step-wise recruitment model in human cells (Huranova
et al. 2010).
1.2.3 Exon definition
The splicing reaction takes place between 5′ and 3′ splice sites paired across an intron. However,
in metazoan genes, where introns are often longer than exons by an order of magnitude or more,
it is likely that splicing is facilitated by a process termed ‘exon definition’ (Berget 1995). In the
exon definition model, the factors bound to the splice sites on either side of internal exons
initially interact and are stabilized across the exon (Figure 1-2B). Early evidence for exon
definition included the finding that the presence and strength of a 5′ss downstream of an exon
affects the recognition and splicing of the upstream intron (Nasim et al. 1990; Robberson et al.
1990; Talerico and Berget 1990; Kuo et al. 1991). In addition, using a reporter containing an
isolated exon flanked by splice sites, it was found that the 5′ss sequence and U1 snRNP
promoted UV crosslinking of U2AF65 at the PPT/3′ss, thus providing further evidence for the
importance of cross-exon interactions (Hoffman and Grabowski 1992). Key mediators of this
cross-exon bridging activity include proteins in the SR family (Section 1.3.4). These proteins
interact with exonic splicing enhancers (ESEs) and promote binding of U1 snRNP and U2AF to
the pre-mRNA through direct interactions as well as through interactions with splicing co-
activator proteins (reviewed in Blencowe 2000). The RS domains of SR proteins also promote or
stabilize RNA–RNA contacts between the core splicing signals and the U-snRNAs (Shen and
Green 2006). Computational analysis of splicing signals in human and mouse also supports the
exon definition model. Compensatory changes in the strength of 5′ and 3′ splice sites are
observed across exons, but not across introns (Xiao et al. 2007). Furthermore, splice sites, ESEs,
and ESSs coevolve to preserve the overall exon strength (Xiao et al. 2007).
Our understanding of exon definition complexes is incomplete, since the majority of in vitro
spliceosome assembly assays have used reporter pre-mRNAs containing two exons separated by
a single short intron. However, several recent studies have shed light on exon-defined
complexes. Assembly of spliceosome complexes in vitro on a three-exon pre-mRNA reporter
revealed that an exon-defined E complex can be chased into an exon-defined A complex in the
presence of ATP (Sharma et al. 2008). In addition, proteomics analysis indicated that these exon-
defined complexes were similar in composition to previously characterized intron-defined
complexes (Sharma et al. 2008). The mechanism for conversion of cross-exon interactions into
cross-intron interactions is also an area under active investigation. Recently, using an in vitro
trans-splicing assay, it was shown that the U4/6.U5 tri-snRNP can associate with an exon-
defined A complex, without requiring prior establishment of cross-intron interactions between
U1 and U2 snRNP (Schneider et al. 2010). In addition, the establishment of cross-intron
interactions upstream of an exon did not require disruption of the interactions formed across that
exon (Schneider et al. 2010). In a related study, conversion of cross-exon to cross-intron
interactions was investigated using pre-mRNA reporters with multiple introns. Following
splicing of one intron, U1 snRNP previously engaged in cross-exon interactions on the 3′exon
remains associated with the mRNA and promotes efficient splicing of the neighbouring intron
(Crabb et al. 2010).
1.2.4 Spliceosomal snRNPs and Sm proteins
The snRNPs are major components of the spliceosome. Each snRNP contains a uridine-rich
snRNA (U1, U2, U4, U5 or U6) and associated proteins; however, U4 and U6 are base-paired in
a U4/6 di-snRNP (Bringmann et al. 1984; Hashimoto and Steitz 1984), which is also found
associated with U5 snRNP in a U4/6.U5 tri-snRNP complex (Konarska and Sharp 1987). The
purification of snRNP components from mammalian cells was fortuitously accomplished using
serum from a patient with the autoimmune disease systemic lupus erythematosus (SLE) (Lerner
and Steitz 1979). This SLE serum was known to contain antibodies that react with a nuclear
antigen present in many mammalian tissues (Tan and Kunkel 1966). The nuclear antigen was
designated ‘Sm’, for ‘Smith’, in honour of Stephanie Smith, the SLE patient from whom the
serum was isolated (Tan and Kunkel 1966) (reviewed in Reeves et al. 2003). Using the anti-Sm
serum, RNPs containing the U-snRNAs and 7 small (12-35 kDa) proteins designated A-G were
immunoprecipitated from mammalian cell extracts (Lerner and Steitz 1979). A subset of these
proteins that are common to the U1, U2, U4 and U5 snRNPs became known as the Sm proteins.
The snRNPs contain both common and unique proteins. The seven common Sm proteins (B/B′
(see Chapter 4), D1, D2, D3, E, F and G) are assembled onto the snRNAs by the SMN complex
(survival of motor neuron) (reviewed in Neuenkirchen et al. 2008). Formation of this snRNP
‘core’ is essential for subsequent steps in the biogenesis of mature snRNP particles. The Sm
proteins bind a conserved single-stranded ‘Sm site’ with consensus sequence PuA[U3-6]GPu,
located between two stem-loops near the 3′ end of the ‘Sm-class’ snRNAs (U1, U2, U4 and U5)
(Branlant et al. 1982; Liautard et al. 1982). Based on crystal structures of the B-D3 and D1-D2
Sm protein dimers along with previous biochemical data, a model was proposed in which the Sm
site RNA passes through the central cavity formed by a hetero-heptameric Sm protein ring
(Kambach et al. 1999). This model was recently confirmed by two crystal structures of the U1
snRNP assembled from recombinant components (Pomeranz Krummel et al. 2009) or generated
by limited proteolysis of native snRNPs isolated from HeLa cells (Weber et al. 2010).
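The Sm site consensus quoted above translates directly into a pattern search. The sketch below reads PuA[U3-6]GPu as a purine, an adenosine, three to six uridines, a guanosine and a purine; this reading, the function name and the test sequences are illustrative assumptions. A match indicates only that a sequence fits the consensus, not that it is a functional Sm site, and natural Sm sites may deviate from it.

```python
import re

# Purine (A/G), A, three to six U, G, purine (A/G).
SM_SITE = re.compile(r"[AG]AU{3,6}G[AG]")


def find_sm_sites(rna):
    """Return (start, sequence) for each non-overlapping consensus match."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group()) for m in SM_SITE.finditer(rna)]


print(find_sm_sites("GGCAAUUUUUGGAGC"))  # hypothetical sequence with a match
print(find_sm_sites("GGCAAUUGGAGC"))     # only two uridines, so no match
```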
The Sm proteins are essential for the assembly and stability of snRNPs. However, their role in
the splicing process is not well characterized. In yeast, Sm proteins B, D1 and D3 contact the
pre-mRNA near the 5′ss in the commitment/E complex (Zhang and Rosbash 1999). These three
Sm proteins have extensions or ‘tails’ located C-terminal to their conserved Sm domains.
Splicing assays in yeast strains harboring tail-truncated Sm proteins suggested that the tails of
Sm B, D1 and D3 contribute to the stability of the U1 snRNA–pre-mRNA interaction, perhaps
through basic arginine and lysine residues in the yeast Sm protein tails (Zhang et al. 2001). The
mammalian C-terminal tails are also rich in positively charged residues. The D1 and D3 tails
contain glycine-arginine (GR) repeats. In contrast, the SmB tail in mammals is quite divergent
from that of yeast and contains a striking stretch of repeats of 3-4 prolines interspersed with
glycine, methionine and arginine residues (e.g. GMPPPGMRPPPPGMR). These ‘PGM’ motifs
in the tail interact with the WW domain of FBP21 (formin-binding protein 21), a spliceosome-
associated protein implicated in cross-intron bridging interactions (Bedford et al. 1998).
However, the function of this interaction in splicing has not been studied. In addition, while the
U1 snRNP crystal structures mentioned above provided insight into recognition of the snRNA by
the Sm ring, they were less informative regarding the function of the C-terminal Sm tails, since
these repetitive regions were either omitted from recombinant proteins or found to be disordered
(Pomeranz Krummel et al. 2009; Weber et al. 2010). Overall, while the C-terminal tails of the
Sm B/B′, D1 and D3 proteins play a role in nuclear localization of the snRNPs (Bordonne 2000;
Girard et al. 2004), the roles of the mammalian tails in U1 snRNA–pre-mRNA interaction or
other steps in splicing remain to be studied.
Additional insights into the function of Sm proteins in splicing might be inferred by analogy to
the functions of these proteins in other RNA–protein complexes. In addition to the spliceosomal
snRNPs, Sm proteins form a related but distinct heptamer on U7 snRNA. The U7 heptamer
contains five Sm proteins (B/B′, D3, E, F, and G), along with two Sm-like (LSm) proteins
LSm10 and LSm11, which replace Sm proteins D1 and D2, respectively. The U7 snRNP
functions in histone 3′end processing (reviewed in Dominski and Marzluff 2007). A recent study
found that the U7 snRNP components SmB, SmD3 and LSm10 UV-crosslinked to the histone
mRNA (Yang et al. 2009). A model was proposed in which these proteins might function as a
‘molecular ruler’ to specify the histone mRNA cleavage site at a fixed distance upstream of an
RNA sequence (the ‘histone downstream element’) that is recognized by base-pairing to U7
snRNA (Yang et al. 2009). In this model, Sm proteins B and D3 function as part of the heptamer
to mediate RNA–RNA interactions between the U7 snRNA and the histone mRNA. This
function is reminiscent of the proposed role of the yeast Sm complex in U1 snRNA–pre-mRNA
interaction discussed above (Zhang et al. 2001). However, it is not known if the Sm protein–
RNA interaction occurs via the C-terminal tails, as suggested in the yeast model, or another
region of the Sm proteins.
1.3 Regulation of alternative splicing
Alternative splicing (AS) is the process of differential splice site usage to generate multiple
mRNA variants from a single pre-mRNA. Upon release of the draft human genome, it was
estimated that at least 59% of genes undergo AS, based on aligning expressed sequence tags
(ESTs) and cDNAs to coding genes on chromosome 22 (International Human Genome
Sequencing Consortium 2001). A higher frequency of AS, affecting 74% of multi-exon genes,
was then estimated based on data from tissue profiling on exon junction microarrays and
EST/cDNA evidence (Johnson et al. 2003). More recently, the use of high-throughput RNA
sequencing (RNA-Seq) has led to an estimate that transcripts from 95% of human multi-exon
genes undergo AS (Pan et al. 2008; Wang et al. 2008). Alternative splicing affects transcript
diversity in several ways, including cassette-type exons, mutually exclusive exons, alternative 5′
or 3′ss selection, alternative promoters, alternative polyadenylation, and intron retention. In my
work, I will focus on cassette-type exons, which are either included or skipped in the spliced
mRNA, and which represent the most common type of AS (Castle et al. 2008; Wang et al. 2008).
Although AS is widespread, the functional importance of most splice variants remains to be
investigated.
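Because a cassette exon is either included or skipped, its usage is often summarized as a percent inclusion value. The sketch below shows one common way to estimate this from junction-spanning read counts. The formula, the averaging of the two inclusion junctions and the parameter names are assumptions made for illustration; they are not necessarily the quantification used for the microarray or RNA-Seq data in this thesis.

```python
def percent_inclusion(c1_a_reads, a_c2_reads, c1_c2_reads):
    """Estimate % inclusion of cassette exon A flanked by exons C1 and C2.

    c1_a_reads, a_c2_reads: reads spanning the two inclusion junctions
    c1_c2_reads:            reads spanning the skipping junction
    """
    # Average the two inclusion junctions so that included transcripts are
    # not counted twice relative to the single skipping junction.
    inclusion = (c1_a_reads + a_c2_reads) / 2
    total = inclusion + c1_c2_reads
    if total == 0:
        return None  # no junction evidence; inclusion is undefined
    return 100.0 * inclusion / total


print(percent_inclusion(30, 30, 10))
```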
1.3.1 Roles of alternative splicing
Very soon after the discovery that genes are interrupted by introns, it was proposed that exons
might be joined in different combinations to generate multiple polypeptides from a single gene
(Gilbert 1978). This role of AS in expansion of the proteome has been particularly emphasized
following the sequencing of the human genome (International Human Genome Sequencing
Consortium 2001), which was found to encode fewer protein-coding genes than anticipated by
many (reviewed in Aparicio 2000; Pennisi 2003). A primary outcome of AS is the expansion of
transcriptome complexity. An important consequence is an increase in the diversity of the
encoded proteome (reviewed in Maniatis and Tasic 2002; Nilsen and Graveley 2010). However,
an additional outcome of transcriptome expansion by AS is an increase in post-transcriptional
regulatory potential. For example, differences in the coding region, 5′UTR or 3′UTR between
mRNA variants produced from the same pre-mRNA can affect translation (e.g. upstream ORFs),
stability (e.g. microRNA binding sites, AU-rich elements, premature stop codons), and mRNA
localization, and thus have important consequences for the regulation of gene expression
(Majoros and Ohler 2007; Tan et al. 2007; Mayr and Bartel 2009; Resch et al. 2009; Bell et al.
2010; Salomonis et al. 2010) (reviewed in Smith et al. 1989; Hughes 2006). The roles of AS in
regulating gene expression will be discussed further in Section 1.5 below.
1.3.2 Mechanisms of alternative splicing regulation
Alternative splicing can be controlled in a developmental stage- and cell type-specific manner, as
well as in response to signaling or environmental cues (reviewed in Chen and Manley 2009).
This AS regulation is achieved through multiple levels of control. For example, transcription
elongation rate, chromatin modification, EJC deposition (see Section 1.1) and pre-mRNA
secondary structure (reviewed in Warf and Berglund 2010) can influence splice site choice.
However, the best-characterized mechanism of AS regulation is through the recognition of short
cis-acting RNA sequence motifs (ESE/S, ISE/S) by splicing-regulatory proteins. Initial studies of
AS regulation focused on the enhancement or repression of splice site recognition at the early
stages of spliceosome assembly (Section 1.3.4). In contrast, some regulatory mechanisms affect
splice site pairing or recruitment of the U4/6.U5 tri-snRNP, rather than splice site recognition. These
diverse mechanisms allow regulation of splice site choice at later stages of spliceosome assembly
or even during splicing catalysis (Section 1.3.5).
1.3.3 Families of alternative splicing regulatory factors
The most extensively studied groups of splicing-regulatory factors are the SR (Ser/Arg-rich),
SR-related and hnRNP (heterogeneous nuclear ribonucleoprotein) families, which I will discuss in the
next section (1.3.4). Many of these proteins are widely expressed and thought to affect AS
regulation in a concentration-dependent manner (Mayeda et al. 1993; Caceres et al. 1994;
Hanamura et al. 1998) (reviewed in Chen and Manley 2009). However, some members of these
families have tissue-restricted expression patterns. For example, our lab recently identified and
characterized the first example of a nervous system-specific SR-related protein, nSR100 (also
known as SRRM4, serine/arginine repetitive matrix 4) (Calarco et al. 2009). In addition, the
hnRNP family member PTBP1 (polypyrimidine tract binding protein 1; also known as PTB,
hnRNPI) is widely expressed, while two PTBP1 paralogues, PTBP2 (also known as nPTB,
brPTB, neural/brain PTB) and ROD1 (regulator of differentiation 1) are expressed in specific
cell types. Interestingly, regulation of the AS of the genes encoding these proteins plays a role in
establishing their expression patterns (see Section 1.5) (Wollerton et al. 2004; Boutz et al. 2007b;
Makeyev et al. 2007; Spellman et al. 2007).
Several other AS factors with tissue-restricted expression have also been characterized. Members
of the NOVA (neuron-oncological ventral antigen) and ELAV-like (embryonic lethal, abnormal
vision-like; also known as paraneoplastic encephalomyelitis antigen Hu) families are expressed
in neurons; FOX (Feminizing gene On X homolog) and CELF (CUG-binding protein and ETR3-like
family, also known as Bruno-like) proteins are expressed in the brain, heart or muscle
(reviewed in Li et al. 2007); and ESRPs (epithelial splicing regulatory proteins) are expressed in
epithelial cells (Warzecha et al. 2009). Like many of the SR proteins and hnRNPs, these factors
bind short RNA motifs in a sequence-specific manner, through RNA recognition motifs (RRMs)
or hnRNP-K homology (KH) domains (Cook et al. 2011).
1.3.4 Regulation of splice site recognition
1.3.4.1 SR and SR-related proteins
The SR proteins contain 1-2 N-terminal RNA recognition motifs (RRMs) and a C-terminal RS
domain that is rich in alternating serine and arginine dipeptides (reviewed in Lin and Fu 2007;
Long and Caceres 2009). The prototypical SR proteins function in both constitutive and
alternative splicing. Based on in vitro splicing assays, these SR proteins appear to be functionally
redundant in their ability to complement splicing-deficient HeLa S100 extract (Fu et al. 1992;
Mayeda et al. 1992). However, additional studies indicate that SR proteins bind distinct RNA
sequences and that they have non-redundant AS functions in vivo (reviewed in Long and Caceres
2009). For example, depletion of the prototypical SR protein SRSF1 (also known as SF2, ASF)
in C. elegans by RNAi results in late embryonic lethality (Longman et al. 2000). Similarly, loss
of SRSF1 in chicken DT-40 cells or in mouse embryos is lethal (Wang et al. 1996; Xu et al.
2005). Moreover, tissue-specific ablation of SRSF1 in the mouse heart resulted in misregulation
of an SRSF1-dependent AS event in Ca2+/calmodulin-dependent kinase IIδ (CaMKIIδ) and a
defect in postnatal heart remodelling (Xu et al. 2005). Thus, SR proteins have specific, non-
redundant functions in the regulation of AS.
In addition to the prototypical SR proteins, many other ‘SR-related’ proteins also function as
regulators of splicing and AS. These proteins often contain RS and RRM domains, but in a
different configuration than the classical SR proteins. Examples of such SR-related proteins
include TRA2A and TRA2B, which are homologues of transformer-2, an AS regulator involved
in Drosophila sex determination. Other SR-related proteins contain RS domains alone or in
combination with other RNA-binding domains (reviewed in Blencowe et al. 1999).
Though best known as positive regulators of AS, SR proteins can both promote and inhibit the
inclusion of alternative exons (reviewed in Lin and Fu 2007; Long and Caceres 2009). SR
proteins function in ESE-dependent splicing in several ways (reviewed in Blencowe 2000;
Graveley 2000). SR proteins can bind specific ESE sequences and recruit the splicing machinery
via interactions of their RS domains with snRNP components (e.g. U2AF35 and U1-70K)
(Lavigueur et al. 1993; Wu and Maniatis 1993; Wang et al. 1995; Zuo and Maniatis 1996;
Graveley et al. 2001). Alternatively, some SR-related proteins can function in ESE-dependent
splicing by acting as splicing co-activators that bridge interactions between ESE-bound SR/SR-related
proteins and snRNPs (Blencowe et al. 1998; Eldridge et al. 1999; Blencowe et al. 2000).
Binding of SR proteins can also enhance exon inclusion by antagonizing the activity of negative
regulators bound at nearby silencer elements (Kan and Green 1999). Recent results also show
that inclusion of an alternative exon can be repressed by strong interactions of SR proteins with
the flanking constitutive exons (Han et al. 2011). In addition to roles in AS regulation, some SR
and SR-related proteins function in transcription, 3′-end formation, mRNA export and translation
(reviewed in Blencowe et al. 1999; Long and Caceres 2009).
1.3.4.2 hnRNPs
The heterogeneous nuclear ribonucleoproteins (hnRNPs) are a diverse group of proteins functionally
defined by their association with nascent hnRNA (pre-mRNA). The hnRNPs typically contain
one to four RNA-binding domains (RRMs, quasiRRMs or KH domains), as well as other
auxiliary domains such as RGG boxes (Arg-Gly-Gly) or Gly-rich domains (reviewed in
Martinez-Contreras et al. 2007). Many of the hnRNPs that have been implicated in AS regulation
can inhibit splice site recognition through binding to specific silencer sequences (Caputi et al.
1999; Chen et al. 1999; Del Gatto-Konczak et al. 1999). Some hnRNPs such as hnRNPA1 may
also cooperatively multimerize on the pre-mRNA to block the association of other factors at a
distance (Zhu et al. 2001). The recognition of silencers by hnRNPs can thus block or compete
with the recognition of either nearby or distal enhancer sequences by positive regulatory factors.
Alternatively, hnRNPs may block or compete with the binding of snRNP-associated factors such
as U2AF to the core splicing signals (Lin and Patton 1995; Singh et al. 1995). Some hnRNPs
also stimulate intron definition through interactions between multiple proteins recognizing sites
at the boundaries of long introns (Martinez-Contreras et al. 2006). In addition, when intronic
hnRNP binding sites flank an alternative exon, interaction between the hnRNPs can lead to exon
silencing by ‘looping out’ the alternative exon and bringing the splice sites of the flanking exons
into close proximity (Chabot et al. 1997; Blanchette and Chabot 1999). However, at least in one
case of such a looping mechanism, the binding of U1 snRNP to the 5′ss of the silenced exon was
not inhibited (Chabot et al. 1997; Blanchette and Chabot 1999). Therefore, this mechanism may
involve inhibition of splice site pairing rather than recognition, as described in the next section.
1.3.5 Regulation of splice site pairing and catalysis
In addition to the regulation of splice site recognition at the earliest stages of spliceosome
assembly, a number of recent studies have revealed that AS can be regulated at later stages,
including the subsequent steps involved in the pairing of splice sites or the recruitment of the tri-
snRNP (reviewed in House and Lynch 2008). Moreover, some trans-acting splicing factors can
regulate AS at both early and late stages of spliceosome assembly. For example, the hnRNP
PTBP1 can repress alternative exon inclusion by inhibiting early steps leading to exon definition
(Izquierdo et al. 2005; Sharma et al. 2005). However, in another mechanism, PTBP1 can act after
exon definition, by binding in an intron and blocking the functional cross-intron pairing of U1
and U2 snRNPs already associated with the splice sites (Sharma et al. 2008). Repression of
alternative exon inclusion by hnRNP-L and hnRNP-E2 can also occur through a post–exon
definition mechanism. In this case, the binding of the hnRNPs to an exon prevents the U1 and
U2 snRNPs bound at its splice sites from forming productive cross-intron interactions with
snRNPs at the flanking exons (House and Lynch 2006). Post–exon definition mechanisms are
also not limited to hnRNPs. The SR-related tumor suppressor RBM5 can repress exon inclusion
by a dual mechanism involving both blocking the transition to intron definition of the snRNP-
recognized splice sites flanking a repressed alternative exon, as well as facilitating the pairing of
the splice sites of the flanking constitutive exons (Bonnal et al. 2008). Splice site choice can also
be regulated during catalysis. In the Drosophila melanogaster sex determination gene Sex-lethal,
the Sex-lethal protein causes skipping of an alternative exon in its own transcript through an
interaction with the splicing factor SPF45 that blocks splicing at the second catalytic step
(Lallena et al. 2002). Together, these studies reveal the diversity of splicing regulatory
mechanisms.
1.3.6 Roles of basal splicing factors in alternative splicing regulation
Studies in yeast and metazoans have shown that the levels of some basal or ‘core’ components of
the splicing machinery can affect splice site choice. Microarray profiling revealed transcript-
specific splicing effects in yeast strains harboring mutations in or deletions of core splicing
components (Clark et al. 2002; Pleiss et al. 2007; Kawashima et al. 2009). In addition, an RNAi
screen in Drosophila cells identified transcript-specific effects on AS upon depletion of general
spliceosome factors, including U2AF and components of U1, U2 and U4/U6 snRNPs (Park et al.
2004). Studies in C. elegans and mammalian cells also suggested that the U2AF subunits and the
U2 snRNP component SAP155 can affect splice site choice (Massiello et al. 2006; Pacheco et al.
2006; Hastings et al. 2007; Ma and Horvitz 2009). Two very recent studies implicate additional
core splicing factors in AS regulation and identify associated target sequence features. The
branchpoint recognition factor SF1 may regulate AS of some transcripts by binding to branch
site-like sequences (Corioni et al. 2011). Also, transcriptome profiling in zebrafish embryos
deficient in the U1 snRNP-specific protein U1C revealed altered splice site choice in targets with
intronic U-rich sequences (Rosel et al. 2011). In a mouse model of spinal muscular atrophy
(SMA), deficiency of the snRNP assembly factor SMN (Survival of Motor Neuron) resulted in
tissue-specific perturbations in snRNP levels and splicing defects (Gabanella et al. 2007; Zhang
et al. 2008; Baumer et al. 2009). Tiling microarray profiling analysis of fission yeast RNA also
revealed transcript-specific splicing defects associated with a temperature-degron allele of SMN, and
showed that some of the defects could be alleviated by strengthening the pyrimidine tract upstream of the
branch-point (Campion et al. 2010). In addition to these studies, the work in my thesis will
provide new evidence for the role of core splicing factors in AS regulation (Saltzman et al.
2011).
In summary, the features that underlie the differential sensitivity of introns or alternative exons
to particular defects in the core splicing machinery are only beginning to be explored. Moreover,
in contrast to the AS regulatory factors described above, the mechanisms of these effects are
poorly understood. Some clues may be provided by analogy to the kinetic proofreading model of
splicing fidelity in yeast. This model broadly predicts that any changes that alter the kinetics of
transitions in the splicing pathway, including the availability or activity of core splicing factors,
can alter splice site choice (Yu et al. 2008) (reviewed in Smith et al. 2008).
1.3.7 Breaking the ‘code’ of cis-acting alternative splicing regulatory sequences
A goal of the study of AS is to build predictive models for AS regulation, or a splicing regulatory
‘code’ (reviewed in Matlin et al. 2005; Blencowe 2006; Wang and Burge 2008). Deciphering the
rules that control AS will be important for understanding gene expression on a genome-wide
scale, and for the ability to predict how mutations affect this regulation. However, the nature of
splicing regulation complicates the path from genomic sequence to AS predictions. For example,
a particular cis-regulatory sequence can have opposite effects on AS regulation depending on its
position within an intron or exon, even when the sequence is recognized by the same trans-acting
regulator (reviewed in Chen and Manley 2009). The activity of an AS regulator can also depend
on local sequence context (Xiao et al. 2009; Motta-Mena et al. 2010) or on its post-translational
modification state (Feng et al. 2008). Many regulated alternative exons and their flanking introns
also have binding sites for multiple factors, suggesting they are controlled in a combinatorial
manner. Nevertheless, significant advances have been made recently in identifying sequence
features that predict tissue-regulated AS as well as regulation by specific trans-acting factors
(Barash et al. 2010; Zhang et al. 2010). This progress has been accelerated by integrating
information from multiple sources, especially sequence conservation across species, splicing
regulatory motifs identified through bioinformatic and experimental screening approaches, RNA
target binding data for AS regulators, RNA structural features, and splice variant profiling data
from microarrays or high throughput RNA sequencing (RNA-Seq).
1.3.8 Large-scale analysis of alternative splicing regulation
Many insights into AS and its regulation have been made possible using high-throughput
methods to study the transcriptome. Technologies used to detect and quantify the levels of splice
variants in an mRNA sample include microarrays (tiling, exon, exon-junction and exon/exon-
junction combinations) (Shoemaker et al. 2001; Johnson et al. 2003; Pan et al. 2004) (reviewed
in Calarco et al. 2007; Hallegger et al. 2010), fibre-optic bead arrays (Yeakley et al. 2002), high-
throughput RT-PCR (Klinck et al. 2008), and RNA-Seq (Cloonan et al. 2008; Mortazavi et al.
2008; Pan et al. 2008; Sultan et al. 2008; Wang et al. 2008) (reviewed in Blencowe et al. 2009).
These methods have been used to profile differences in the mammalian splice variant repertoire
among tissues, individuals, developmental stages and cell culture models of developmental
transitions, as well as in cancer versus normal tissues (reviewed in Calarco et al. 2007; Hartmann
and Valcarcel 2009; Hallegger et al. 2010). High throughput methods have also been used to
identify functional targets of specific AS regulators by profiling AS following knockdown or
loss of a particular protein (Blanchette et al. 2005; Ule et al. 2005) (reviewed in Calarco et al.
2007; Hallegger et al. 2010). Combining this profiling data with factor binding site preferences
determined by methods such as SELEX (Tuerk and Gold 1990) or RNAcompete (Ray et al.
2009) can then provide insights into the biological function of an AS regulator. Furthermore, to
distinguish direct from indirect targets, methods such as UV Cross-linking and
Immunoprecipitation coupled with high throughput sequencing (CLIP-Seq; also known as high
throughput sequencing of RNA isolated by CLIP, HITS-CLIP) allow the isolation of RNA
targets directly bound by a protein of interest on a genome-wide scale (Ule et al. 2003) (reviewed
in Witten and Ule 2011).
In addition to cataloguing transcriptome complexity, the approaches mentioned above have
revealed sequence features associated with AS regulation and allowed construction of ‘RNA
splicing maps’ of the position-dependent effects of AS regulators (reviewed in Witten and Ule
2011). More generally, while mRNA expression profiling microarrays showed that functionally
related genes are often co-expressed in mammalian cells and tissues (Eisen et al. 1998; Su et al.
2004; Zhang et al. 2004), AS microarray profiling studies revealed that functionally related
genes are also coordinately regulated by AS. These ‘AS networks’ or ‘exon networks’ have
functional properties reflecting tissue identity, but the groups of genes are often distinct from
those co-regulated at the transcriptional level (Le et al. 2004; Pan et al. 2004; Fagnani et al.
2007; Castle et al. 2008). In addition, functionally related genes are often co-regulated by tissue-
restricted AS factors such as NOVA, nSR100, ESRP and CELF/MBNL (reviewed in Licatalosi
and Darnell 2010; Calarco et al. 2011). The coordination of gene expression through AS
networks extends previous models proposing that mRNPs represent “post-transcriptional
operons” in eukaryotes (Keene and Tenenbaum 2002).
1.3.9 Overview of large-scale alternative splicing detection methods used in this thesis
In my thesis work, I used both microarray- and RNA-Seq-based methods to quantify the relative
abundance of mRNA splice variants. An overview comparing and contrasting these approaches
is presented in Figure 1-3. In both cases, the experimental workflow begins with isolation of
polyadenylated (polyA+) RNA from cells or tissues which is then reverse-transcribed to cDNA
(Figure 1-3A). Fluor-labeled single-stranded cDNA is generated for hybridization to AS
microarrays (Hughes et al. 2006), whereas fragmented, double-stranded cDNA flanked by
adapters is generated for RNA-Seq following the Illumina mRNA-Seq protocol. In parallel to
these steps, a database of cassette-type AS events is generated, by identifying cassette-type AS
events in cDNA and EST sequences that have been aligned to the genome (Figure 1-3B)
(performed by Sandy Pan) (Pan et al. 2004; Pan et al. 2005). This AS database is used to design
oligonucleotide probes for the AS microarray, or as a set of exon-exon junction sequences onto
which RNA-Seq reads are bioinformatically aligned (Figure 1-3A). The % alternative exon
inclusion measurements (‘% inclusion’, i.e. the percentage of transcripts in which the alternative
exon is included) calculated using the AS microarray platform or the RNA-Seq method are then
quality-filtered using simple criteria. The resulting AS predictions correlate well with
measurements made by independent methods such as RT-PCR (Chapter 2, Chapter 5).
Figure 1-3. Outline of microarray and RNA-Seq AS profiling methods used in this work.
(A) Left: For AS microarray profiling, fluor-labeled cDNAs are hybridized to the AS microarray.
The GenASAP algorithm is then used to estimate the % exon inclusion levels and confidence
ranks from the signal intensities of the scanned microarray images.
Right: For RNA-Seq AS profiling, 50-nt high-throughput short read sequencing is performed on
cDNA libraries using the Illumina Genome Analyzer II. The % exon inclusion levels are
calculated by counting the number of sequence reads that align to the included or skipped
junctions in the AS database.
(B) Construction of a database of cassette-type AS events mined from ESTs/cDNAs. These AS
events are used to design exon and exon-exon junction microarray probes or to align RNA-Seq
reads to exon-exon junction sequences.
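The read-counting step described for RNA-Seq AS profiling in Figure 1-3A can be illustrated with a minimal sketch. This is not the pipeline used in this work. It assumes that the included variant is represented by the average of the reads aligning to the C1-A and A-C2 junctions, that the skipped variant is represented by the reads aligning to the C1-C2 junction, and that events with too few junction reads are discarded. The function name, the averaging step and the `min_reads` threshold are hypothetical and chosen only for illustration.

```python
from typing import Optional


def percent_inclusion(c1_a: int, a_c2: int, c1_c2: int,
                      min_reads: int = 20) -> Optional[float]:
    """Estimate % alternative exon inclusion from junction read counts.

    c1_a, a_c2 : reads aligning to the two junctions of the included variant
    c1_c2      : reads aligning to the junction of the skipped variant
    min_reads  : hypothetical quality filter on the total junction read count
    """
    if min(c1_a, a_c2, c1_c2) < 0:
        raise ValueError("read counts must be non-negative")

    # Simple quality filter (assumption): require enough junction reads overall.
    if c1_a + a_c2 + c1_c2 < min_reads:
        return None

    # The included variant has two junctions and the skipped variant has one,
    # so the included counts are averaged to keep the two variants comparable
    # (an assumption of this sketch).
    included = (c1_a + a_c2) / 2.0
    skipped = float(c1_c2)

    if included + skipped == 0:
        return None  # no informative reads, % inclusion is undefined

    return 100.0 * included / (included + skipped)


if __name__ == "__main__":
    # 30 reads on each inclusion junction and 10 reads on the skipping junction.
    print(percent_inclusion(30, 30, 10))
```

Averaging the two inclusion junctions is one of several possible normalizations. Events that fail the read-count filter return `None` rather than an unreliable estimate.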
1.3.9.1 Alternative splicing microarray profiling
The AS microarray platform developed by the Blencowe and Frey labs contains sets of six
probes for ~3000 AS events (three exon probes: C1, A, C2 and three junction probes C1-A, A-
C2, C1-C2) (Figure 1-3A) (Pan et al. 2004). Ideally, both splice variants should hybridize to the
C1 and C2 exon probes, whereas the included variant should hybridize specifically to the C1-A,
A, and A-C2 probes, and the skipped variant should hybridize specifically to the C1-C2 junction
probe. Although the probes are designed for optimal specificity, in practice the probe signals do
not correspond to this ‘ideal hybridization profile’, especially as a result of cross-hybridization of
the splice variants to the junction probes. In addition, accurate prediction of relative splice
variant levels for some AS events is complicated by outlier probes, whose signals are not
consistent with the other five probes for the AS event, as well as by other sources of noise.
Therefore, a Bayesian learning algorithm called the Generative model for the Alternative
Splicing Array Platform (GenASAP) is used to accurately predict the AS levels (% inclusion)
from the microarray data (Shai et al. 2006) (Figure 1-3A). GenASAP uses the microarray data to
model the hybridization of the included and skipped splice variants to the six probes. This
significantly improves the accuracy of the % inclusion predictions in comparison to using the
‘ideal’ hybridization profile described above. In addition, Ge