

  • Regulation of Core Splicing Factors by Alternative Splicing and Nonsense-mediated mRNA Decay

    by

    Arneet L. Saltzman

    A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

    Department of Molecular Genetics

    University of Toronto

    © Copyright by Arneet L. Saltzman 2011

  • ii

    Regulation of Core Splicing Factors by Alternative Splicing and Nonsense-mediated mRNA Decay

    Arneet L. Saltzman

    Doctor of Philosophy

    Department of Molecular Genetics

    University of Toronto

    2011

    Abstract

    The majority of human genes are transcribed into a precursor messenger RNA (pre-mRNA) that is processed to produce multiple mRNA variants through alternative splicing. Although alternative splicing is known for its role in generating proteomic diversity, it can also regulate gene expression by introducing premature termination codons that target the spliced transcript for nonsense-mediated mRNA decay (AS-NMD). In order to understand the impact of AS-NMD on gene expression, I performed quantitative AS microarray profiling of NMD-inhibited human cells. Using this system, I address the prevalence, trans-acting factor requirements and the range of cellular functions regulated by AS-NMD. While this pathway had been implicated in homeostatic feedback regulation of genes encoding splicing-regulatory proteins, my results revealed highly conserved alternative exons regulated by AS-NMD in genes encoding basal or ‘core’ splicing factors. I further characterized one of these exons in the gene encoding SmB/B′, and demonstrated that SmB/B′ autoregulates its expression through AS-NMD. Furthermore, AS profiling revealed that knockdown of this core splicing factor affects the inclusion levels of additional alternative exons enriched in genes with functions in RNA processing and RNA binding. In summary, my results reveal a role for AS-NMD in regulating the expression of core splicing factors, as well as a role for the core spliceosomal machinery in coordinating a network of alternative exons in RNA processing factor genes.

  • iii

    Acknowledgments

    I am grateful to many people for their support during my graduate work. My supervisor and mentor Dr. Ben Blencowe has never wavered from the full support that he offered me right from the time I joined the lab as a naïve and “generally keen” student. He fostered my scientific development by providing me with the opportunity to do exciting research, present at conferences and publish my work. Ben’s guidance and encouragement have been essential for my progress as a graduate student and beyond. I would also like to thank my supervisory committee members, Dr. Howard Lipshitz, Dr. Tim Hughes and Dr. Quaid Morris, whose support and advice have helped me to develop as a scientist.

    For their direct contributions to the work in this thesis, I would like to thank Matthew Fagnani and my collaborators Dr. Yoon Ki Kim, Dr. Lynne Maquat, Dr. Ofer (“the data are the data”) Shai and Dr. Brendan Frey. For their indirect contributions, I would like to thank the people behind the UCSC genome browser and Galaxy, who have made the genome accessible to the masses. For financial support, I am grateful to NSERC and the Jennifer Dorrington Graduate Student Endowment Fund.

    I’ve had the privilege to work with many great people during my time in the Blencowe lab. I wish to sincerely thank all my current and former lab-mates, whose friendship, support, helpful advice, and love of fine beverages have been invaluable. I am particularly indebted to our Lab “Sages” Mr. Dave O’Hanlon and Dr. Susan McCracken. For supporting me along my path to graduate school, I also thank my past teachers and mentors, Mr. Flemming Kress, Dr. Shelagh Mirski, Ms. Kathy Sparks, Dr. Peter Davies and Dr. Igor Bendik.

    Of course words cannot express my gratitude to my family and my ‘life partner’ for their unflappable love and support.

  • iv

    Table of Contents

    Acknowledgments ..... iii

    Table of Contents ..... iv

    List of Tables ..... viii

    List of Figures ..... ix

    List of Appendices ..... xi

    Abbreviations Used ..... xiii

    Chapter 1 ..... 1

    1 Introduction ..... 2

    1.1 Coordination of the gene expression machinery ..... 2

    1.1.1 Interdependence among transcription, mRNA processing and chromatin ..... 2

    1.1.2 mRNA processing remodels the messenger RNP ..... 5

    1.2 Pre-mRNA splicing ..... 6

    1.2.1 Core and auxiliary splicing signals ..... 6

    1.2.2 Spliceosome assembly ..... 7

    1.2.3 Exon definition ..... 9

    1.2.4 Spliceosomal snRNPs and Sm proteins ..... 10

    1.3 Regulation of alternative splicing ..... 12

    1.3.1 Roles of alternative splicing ..... 13

    1.3.2 Mechanisms of alternative splicing regulation ..... 13

    1.3.3 Families of alternative splicing regulatory factors ..... 14

    1.3.4 Regulation of splice site recognition ..... 15

    1.3.4.1 SR and SR-related proteins ..... 15

    1.3.4.2 hnRNPs ..... 16

    1.3.5 Regulation of splice site pairing and catalysis ..... 17

    1.3.6 Roles of basal splicing factors in alternative splicing regulation ..... 17

    1.3.7 Breaking the ‘code’ of cis-acting alternative splicing regulatory sequences ..... 18

    1.3.8 Large-scale analysis of alternative splicing regulation ..... 19

    1.3.9 Overview of large-scale alternative splicing detection methods used in this thesis ..... 20

    1.3.9.1 Alternative splicing microarray profiling ..... 22

    1.3.9.2 AS profiling by high throughput RNA sequencing (RNA-Seq) ..... 22

    1.4 Nonsense-mediated mRNA decay (NMD) ..... 23

    1.4.1 Features targeting transcripts for NMD ..... 23

    1.4.2 NMD trans-acting factors and mechanisms of decay ..... 24

    1.4.3 Discriminating between normal and premature nonsense codons: integrating the EJC-dependent and faux-3′UTR models ..... 26

    1.5 Feedback regulation of gene expression ..... 28

    1.5.1 Post-transcriptional autoregulation ..... 29

    1.5.1.1 Splicing regulatory factors ..... 29

    1.5.1.2 Ribosomal proteins, translation factors and other examples ..... 31

    1.5.2 Roles of post-transcriptional autoregulation ..... 32

  • v

    1.5.2.1 Developmentally-regulated AS programs ..... 32

    1.5.2.2 Plant circadian oscillations ..... 32

    1.5.2.3 Coordinating gene expression ..... 33

    1.5.3 Sequence and functional conservation ..... 33

    1.6 Rationale and outline ..... 34

    Chapter 2 ..... 36

    2 Impact of nonsense-mediated mRNA decay (NMD) factors on alternative splicing (AS) ..... 37

    2.1 Introduction ..... 37

    2.1.1 Prevalence of AS-NMD ..... 37

    2.1.2 Differential requirements for UPF factors in NMD ..... 37

    2.1.3 Summary ..... 38

    2.2 Materials and Methods ..... 39

    2.2.1 Cell culture, siRNA and plasmid transfection ..... 39

    2.2.2 RT-PCR assays and Western blotting ..... 39

    2.2.3 Microarray design and hybridization ..... 40

    2.2.4 Microarray data analysis ..... 40

    2.2.5 Annotation of PTC-introducing AS events ..... 40

    2.2.6 Categorization of conserved and species-specific alternative exons ..... 41

    2.3 Results ..... 41

    2.3.1 Predicted PTC-containing splice variants represent minor isoforms across ten mouse tissues ..... 41

    2.3.2 Most predicted PTC-introducing AS events are not conserved between human and mouse ..... 46

    2.3.3 Alternative splicing microarray profiling following knockdown of the essential NMD factor UPF1 in HeLa cells ..... 48

    2.3.4 A subset of PTC-introducing AS events are regulated by NMD ..... 48

    2.3.5 Effect of UPF1 knockdown on the expression of genes containing PTC-introducing AS events ..... 50

    2.3.6 Alternative splicing microarray profiling following individual knockdowns of NMD factors UPF1, UPF2 or UPF3X ..... 52

    2.3.7 Overlapping but distinct effects of UPF1, UPF2 and UPF3X knockdowns on PTC-introducing AS events ..... 54

    2.4 Discussion ..... 56

    2.4.1 Function versus ‘noise’ in PTC-introducing AS events ..... 56

    2.4.2 Alternative branches of the mammalian NMD pathway ..... 56

    Chapter 3 ..... 58

    3 Conserved AS-NMD in genes encoding core splicing factors ..... 59

    3.1 Introduction ..... 59

    3.1.1 Cellular functions regulated by AS-NMD ..... 59

    3.1.2 Summary ..... 59

    3.2 Materials and Methods ..... 60

    3.2.1 RT-PCR and Western blotting ..... 60

    3.2.2 Analysis of conservation of flanking intron sequence and conserved AS ..... 60

    3.2.3 Identification of AS events in spliceosomal and control gene sets ..... 60

  • vi

    3.2.4 Statistical Analysis ..... 61

    3.3 Results ..... 61

    3.3.1 PTC-introducing AS events affected by UPF knockdowns are flanked by highly conserved sequences ..... 61

    3.3.2 Core spliceosomal proteins are new regulatory targets of AS-NMD ..... 64

    3.3.3 Conserved AS in genes encoding spliceosomal factors enriched in PTC-introducing events ..... 65

    3.3.4 Autoregulation of core splicing factors by AS-NMD ..... 69

    3.4 Discussion ..... 70

    3.4.1 AS-NMD and the regulation of core spliceosomal proteins ..... 71

    Chapter 4 ..... 72

    4 Auto-regulation of the core splicing factor SmB/B′ via AS-NMD ..... 73

    4.1 Introduction ..... 73

    4.1.1 AS-NMD of SNRPB, encoding SmB/B′ ..... 73

    4.1.2 Summary ..... 73

    4.2 Materials and Methods ..... 74

    4.2.1 Cell culture, siRNA and plasmid transfection ..... 74

    4.2.2 Estimation of mRNA half-lives ..... 74

    4.2.3 RNA and protein isolation, RT-PCR and Western blotting ..... 74

    4.2.4 Plasmid Construction ..... 75

    4.3 Results ..... 75

    4.3.1 Inclusion of a highly conserved premature termination codon (PTC)-introducing alternative exon in SNRPB pre-mRNA is affected by SmB/B′ protein levels ..... 75

    4.3.2 Knockdown of the core snRNP protein SmD1 affects the inclusion of the conserved SNRPB alternative exon ..... 78

    4.3.3 Knockdown of SmB/B′ or SmD1 affects the levels of Sm-class snRNAs ..... 79

    4.3.4 Cis-acting elements regulating inclusion of the SNRPB alternative exon ..... 80

    4.3.5 Mutations that strengthen the 5′ss reduce the effects of SmB/B′ knockdown ..... 82

    4.4 Discussion ..... 83

    4.4.1 Feedback and cross-regulation of splicing factors ..... 84

    Chapter 5 ..... 86

    5 Regulation of alternative splicing by the core spliceosomal machinery ..... 87

    5.1 Introduction ..... 87

    5.1.1 Summary ..... 87

    5.2 Materials and Methods ..... 88

    5.2.1 Analysis of AS and transcript levels by RNA-Seq ..... 88

    5.2.2 Calculation of Splice Site Strength ..... 89

    5.2.3 Gene ontology (GO) analysis ..... 89

    5.2.4 Statistical Analysis ..... 89

    5.3 Results ..... 90

    5.3.1 A widespread role for core splicing factors in promoting the inclusion of alternative exons ..... 90

    5.3.2 Characteristics of SmB/B′ knockdown-dependent alternative exons ..... 95

  • vii

    5.3.3 Changes in transcript levels associated with SmB/B′ knockdown-dependent PTC-introducing alternative exons ..... 95

    5.3.4 SmB/B′ knockdown affects AS events in RNA-processing factor genes ..... 97

    5.4 Discussion ..... 97

    5.4.1 Mechanisms of AS regulation by core splicing factors ..... 97

    5.4.2 Physiological roles of AS regulation by general splicing factors ..... 98

    Chapter 6 ..... 100

    6 Conclusions ..... 101

    6.1 Future Directions ..... 102

    6.1.1 What features underlie the differential dependencies of NMD substrates on UPF2 and UPF3/UPF3X? ..... 102

    6.1.2 Mechanisms of core splicing factor-dependent AS regulation ..... 102

    6.1.3 Origins of ultra- and highly-conserved nonsense exons ..... 103

    6.1.4 Networks of auto- and cross-regulation among RNA processing factors ..... 104

    References ..... 106

    Appendices ..... 133

  • viii

    List of Tables

    Table 1-1. Post-transcriptional auto- and cross-regulation of proteins with roles in RNA biogenesis and metabolism. ..... 30

    Table 3-1. Selected microarray PTC-introducing AS events in genes with functions related to RNA processing. ..... 65

    Table 3-2. Conserved, PTC-introducing AS events identified in transcripts from spliceosome-associated proteins. ..... 68

  • ix

    List of Figures

    Figure 1-1. Coordination of transcription and pre-mRNA processing machineries. ..... 3

    Figure 1-2. Overview of core splicing signals and early stages of spliceosome assembly. ..... 7

    Figure 1-3. Outline of microarray and RNA-Seq AS profiling methods used in this work. ..... 21

    Figure 1-4. Alternative splicing of cassette-type exons can lead to introduction of a premature termination codon (PTC) in the included or skipped splice variant (AS-NMD). ..... 24

    Figure 1-5. An integrated model for discrimination between premature and normal stop codons. ..... 28

    Figure 1-6. Simplified model for autoregulation of a splicing-regulatory factor through AS-NMD. ..... 31

    Figure 2-1. Overview of Chapter 2. ..... 38

    Figure 2-2. Alternative splicing microarray data reveal that predicted PTC-introducing splice variants represent minor forms across ten mouse tissues. ..... 43

    Figure 2-3. Representative RT-PCRs of PTC upon inclusion and PTC upon skipping AS events in ten mouse tissues. ..... 45

    Figure 2-4. Predicted PTC-introducing AS events are more often species-specific than conserved between human and mouse. ..... 47

    Figure 2-5. Knockdown of the essential NMD factor UPF1 leads to an increase in a subset of PTC-containing splice variants. ..... 49

    Figure 2-6. Changes in % exon inclusion and transcript levels upon UPF1 knockdown predicted by the AS microarray are confirmed by RT-PCR. ..... 51

    Figure 2-7. Overlapping but distinct effects of UPF protein knockdowns on PTC-introducing AS events. ..... 53

    Figure 2-8. Representative RT-PCR assays showing effects of UPF protein knockdowns on levels of PTC-introducing alternative exons. ..... 55

    Figure 3-1. Overview of Chapter 3. ..... 59

    Figure 3-2. Conservation of intron sequences flanking PTC-introducing exons affected by UPF factor knockdowns. ..... 62

    Figure 3-3. PTC upon inclusion alternative exons that show UPF1- or UPF2-dependent changes in inclusion level are often flanked by highly conserved intronic sequences. ..... 63

    Figure 3-4. Conserved PTC-introducing AS events in genes encoding spliceosomal proteins. ..... 67

    Figure 3-5. SNRPB (also known as SmB/B′) or SMNDC1 (also known as SPF30) over-expression leads to increased levels of the respective PTC-containing (PTC+) alternative transcript. ..... 70

    Figure 4-1. Overview of Chapter 4. ..... 73

    Figure 4-2. The inclusion of a highly conserved PTC-introducing alternative exon in SNRPB is affected by SmB/B′ knockdown. ..... 77

  • x

    Figure 4-3. The half-life of the endogenous SNRPB PTC-containing included splice variant (A) but not that of the exon-included variant from the SNRPB reporter ‘miniSmB’ (B) is increased upon treatment with cycloheximide (CHX) to inhibit NMD. ..... 78

    Figure 4-4. Knockdown of SmD1 leads to more skipping of the SNRPB alternative exon in miniSmB (A), and knockdown of SmB/B′ (B) or SmD1 (C) affects snRNA levels. ..... 79

    Figure 4-5. Auxiliary cis-acting elements regulating inclusion of the SNRPB alternative exon in miniSmB are proximal to the splice sites. ..... 81

    Figure 4-6. Mutations that strengthen the 5′ss (splice site), but not mutations that strengthen the 3′ss, reduce the effects of SmB/B′ knockdown on miniSmB AS. ..... 83

    Figure 5-1. Overview of Chapter 5. ..... 87

    Figure 5-2. Quantitative analysis of alternative splicing by RNA-Seq reveals that knockdown of SmB/B′ leads to increased skipping of alternative exons. ..... 91

    Figure 5-3. Changes in alternative exon inclusion levels measured by RNA-Seq are confirmed by RT-PCR assays. ..... 93

    Figure 5-4. Confirmation of the effects of SmB/B′ knockdown on alternative exon inclusion in two independent knockdowns with different siRNAs. ..... 94

    Figure 5-5. Characteristics of alternative exons affected by knockdown of SmB/B′. ..... 96

  • xi

    List of Appendices

    Appendices to Chapter 2: Impact of nonsense-mediated mRNA decay (NMD) factors on alternative splicing (AS)

    Appendix 1. Reprint: Pan Q, Saltzman AL, Kim YK, Misquitta C, Shai O, Maquat LE, Frey BJ, Blencowe BJ. 2006. Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expression. Genes Dev 20 (2): 153-158 ..... 133

    Appendix 2. Reprint: Saltzman AL, Kim YK, Pan Q, Fagnani MM, Maquat LE, Blencowe BJ. 2008. Regulation of multiple core spliceosomal proteins by alternative splicing-coupled nonsense-mediated mRNA decay. Mol Cell Biol 28 (13): 4320-4330. ..... 133

    Appendix 3. Correlation of probe intensities (A) or % exon inclusion (B) between Cy3 and Cy5 fluor reversals for six samples. ..... 134

    Appendix 4. Correlation of % inclusion between pairs of AS events with duplicate probes on the AS microarray ..... 135

    Appendix 5. Correlation between % exon skipping (A) or knockdown-dependent difference in % exon skipping (B) measurements by AS microarray or RT-PCR ..... 136

    Appendix 6. Microarray data for 1704 AS events that met our detection criteria ..... 137

    Appendix 7. Annotation for 1704 microarray-monitored AS events that met our detection criteria. ..... 137

    Appendix 8. Significant overlaps in AS events with a consistent change in exon inclusion levels when comparing any two UPF KDs. ..... 138

    Appendix 9. Effects of each UPF factor knockdown on PTC-introducing AS events. ..... 139

    Appendix 10. Frequency of changes in exon inclusion level upon knockdown of UPF1, UPF2, or UPF3X for all detectable AS events (A) or for specific categories (B-D). ..... 140

    Appendices to Chapter 3: Conserved AS-NMD in genes encoding core splicing factors

    Appendix 11. Cumulative distribution function (CDF) plots of flanking intron sequence overlap with phastCons elements for the ‘No PTC’ group ..... 141

    Appendix 12. Annotation for microarray-monitored PTC-introducing AS events with conserved flanking intron sequences. ..... 142

    Appendix 13. Annotation of cassette AS events identified in spliceosome-associated genes ..... 142

    Appendix 14. Annotation of cassette AS events identified in the control gene set. ..... 142

    Appendices to Chapter 4: Auto-regulation of the core splicing factor SmB/B′ via AS-NMD

    Appendix 15. Reprint: Saltzman AL, Pan Q, Blencowe BJ. 2011. Regulation of alternative splicing by the core spliceosomal machinery. Genes Dev 25 (4): 373-384 ..... 142

    Appendix 16. Comparison of SmB/B′ and SmN amino acid sequences (A) and mRNA expression patterns across 84 tissue and cell types (B) ..... 143

    Appendix 17. Abrogation of NMD by treatment of HeLa cells with the translation inhibitor

    cycloheximide (CHX) leads to an increase in the steady-state level of the endogenous

    exon-included PTC-containing SNRPB variant (A), but not the exon-included variant

    from the SNRPB reporter ‘miniSmB’ (B).

    Appendix 18. A deletion adjacent to the 5′ss that strengthens potential base-pairing to U1 snRNA

    abrogates SmB/B′ knockdown-dependent skipping.

    Appendix 19. Mutations that strengthen the 3′ss do not abrogate SmB/B′ knockdown-dependent

    skipping.

    Appendices to Chapter 5: Regulation of alternative splicing by the core spliceosomal

    machinery

    Appendix 20. Data and annotations for 5752 AS events monitored by RNA-Seq that passed our

    filtering criteria.

    Appendix 21. Data and annotations for 8626 triplets of consecutive ‘constitutive’ exons

    monitored by RNA-Seq that passed our filtering criteria.

    Appendix 22. Gene Ontology (GO) and Pathway Commons enrichment analysis for 235 genes

    containing AS events with a ≥30% change in % exon inclusion upon SmB/B′

    knockdown.

    Appendix 23. Exon inclusion levels and knockdown-dependent changes for all 27 assayed

    alternative exons agree well with RNA-Seq predictions.

    Abbreviations Used

    AS alternative splicing

    cDNA complementary DNA

    CTD carboxy-terminal domain

    EJC exon junction complex

    EST expressed sequence tag

    GO gene ontology

    mRNP messenger ribonucleoprotein

    NMD nonsense-mediated mRNA decay

    NT/siNT non-targeting siRNA

    pol II RNA polymerase II

    PTC premature termination codon

    RNA ribonucleic acid

    RNA-Seq high throughput RNA sequencing

    RT-PCR reverse transcription-polymerase chain reaction

    snRNA small nuclear RNA

    snRNP small nuclear ribonucleoprotein

    siRNA short interfering RNA

    UTR untranslated region

    Chapter 1

    1 Introduction

    A major theme of my thesis research is how gene expression can be regulated by coordination

    between different steps, particularly alternative splicing (AS) and nonsense-mediated mRNA

    decay (NMD). I will therefore begin with an overview of the coordination among different steps

    in mammalian gene expression (Section 1.1). Next, I will discuss AS and its regulation, focusing

    on the relatively uncharacterized roles of the basal splicing machinery and on insights from high-

    throughput analysis methods (Sections 1.2 and 1.3). This is followed by an introduction to the

    NMD pathway, focusing on the recognition of premature stop codons and on AS-NMD (Section

    1.4). Finally, I will discuss how genes with diverse roles in RNA biogenesis and metabolism take

    advantage of their own cellular functions to autoregulate their expression (Section 1.5).

    1.1 Coordination of the gene expression machinery

    Almost all human protein-coding genes are transcribed into a precursor messenger RNA (pre-

    mRNA) that must be extensively processed before it is exported to the cytoplasm and recognized

    by the translation machinery. In the nucleus, pre-mRNA undergoes capping, splicing, cleavage

    and polyadenylation. These processes are integrated with transcription by RNA polymerase II

    (pol II) and also result in the association of proteins with the mRNA to form a messenger

    ribonucleoprotein (mRNP). Extensive crosstalk among these processes plays important roles in

    the fidelity, efficiency and regulation of gene expression (Figure 1-1) (reviewed in Maniatis and

    Reed 2002; Komili and Silver 2008; Pandit et al. 2008; Moore and Proudfoot 2009).

    1.1.1 Interdependence among transcription, mRNA processing and chromatin

    The carboxy-terminal domain (CTD) of the largest subunit of pol II plays a central role in the

    crosstalk between transcription and pre-mRNA processing (reviewed in Perales and Bentley

    2009; Munoz et al. 2010). The mammalian CTD contains 52 heptamer repeats with the

    consensus sequence YS₂PTS₅PS, and it is required for efficient mRNA processing (McCracken

    et al. 1997b). During the transcription cycle, changes in the phosphorylation pattern of the CTD

    serine residues allow the recruitment of pre-mRNA processing, elongation, and histone-

    modifying factors (reviewed in Buratowski 2009). Early in transcription, the CTD is

    phosphorylated on Ser-5 by TFIIH, and the Ser-5-phosphorylated CTD recruits and activates the

    mRNA capping enzymes (Cho et al. 1997; McCracken et al. 1997a; Cho et al. 1998; Ho and

    Shuman 1999) (Figure 1-1A). The nuclear cap-binding complex (CBC, CBP80/20 heterodimer)

    recognizes the capped mRNA 5′-end and promotes efficient splicing, export and translation

    initiation (Section 1.1.2). As elongation proceeds, the CTD becomes highly phosphorylated on

    Ser-2 residues by the positive transcription elongation factor b (P-TEFb) and pol II enters into

    the productive elongation phase (Figure 1-1B) (Marshall et al. 1996) (reviewed in Bres et al.

    2008). The Ser-2-phosphorylated CTD recruits the cleavage and polyadenylation machinery,

    which then stimulates 3′ end formation once the poly(A) site is transcribed (Figure 1-1D)

    (Licatalosi et al. 2002; Ahn et al. 2004; Meinhart and Cramer 2004; Ni et al. 2004; Rosonina and

    Blencowe 2004).

    Figure 1-1. Coordination of transcription and pre-mRNA processing machineries.

    See text for details.

    Crosstalk between transcription elongation and splicing is mediated by the CTD as well as by

    factors that associate with the nascent transcript and the chromatin template (Figure 1-1B)

    (reviewed in Perales and Bentley 2009). This ‘coupling’ is functionally important for the

    efficiency of splicing (Das et al. 2006; Hicks et al. 2006) and for regulating the differential use of

    splice sites through alternative splicing (AS) (Cramer et al. 1997; Auboeuf et al. 2002; Kadener

    et al. 2002; Nogues et al. 2002; de la Mata et al. 2003; Pagani et al. 2003; Ip et al. 2011).

    Transcription can influence AS through both ‘recruitment’ and ‘kinetic’ coupling (reviewed in

    Munoz et al. 2010). In recruitment coupling, specific splicing regulatory proteins as well as

    factors with dual roles in transcription and splicing regulation (see below for examples) are

    recruited to the transcribing polymerase, often via association with the CTD. In kinetic coupling,

    changes in the pol II elongation rate affect splice site choice by influencing the timing of

    presentation of splicing signals in the pre-mRNA to the splicing machinery. The pol II

    elongation rate may be influenced by promoter identity and associated transcriptional activators

    or co-activators and by elongation factors associated with the CTD.

    Studies of the Ser/Arg-rich (SR) proteins, a class of sequence-specific RNA-binding factors that

    bind pre-mRNA to regulate AS (Section 1.3.4), illustrate the interdependence between

    transcription and AS regulation. The activities of several SR proteins are modulated in a

    promoter- and CTD-dependent manner (Cramer et al. 1999; de la Mata and Kornblihtt 2006). In

    addition, several factors involved in splicing, including the SR protein SRSF2 (also known as

    SC35), enhance transcription through the recruitment or stimulation of elongation factors such as

    the CTD kinase P-TEFb (Figure 1-1B) (Fong and Zhou 2001; Bres et al. 2005; Lin et al. 2008).

    Thus, transcription can affect splicing and, reciprocally, splicing can affect transcription.

    During the transcription cycle, Ser-2- or Ser-5-phosphorylated pol II and associated elongation

    factors recruit chromatin modifying complexes that establish or maintain characteristic patterns

    of histone modifications on active genes (reviewed in Buratowski 2009). In human cells, the 5′-

    ends of active genes are typically marked by histone H3 lysine 4 trimethylation (H3K4me3)

    (Bernstein et al. 2005). This chromatin mark is recognized by CHD1 (chromodomain helicase

    DNA binding protein 1), which can recruit the splicing machinery (U2 snRNP; see Section 1.2.2)

    to facilitate efficient pre-mRNA splicing (Sims et al. 2007). In addition, the histone modification

    H3K36me3 is enriched in gene regions encoding alternative exons regulated by the splicing

    factor PTB (polypyrimidine tract binding protein; see Section 1.3.3). These modified histone

    tails are recognized by MRG15 (MORF-related gene 15), which enhances the recruitment of

    PTB to the pre-mRNA (Luco et al. 2010). Thus, physical crosstalk between chromatin and the

    splicing machinery represents an additional layer of gene regulation (Figure 1-1C) (reviewed in

    Allemand et al. 2008; Luco et al. 2011).

    1.1.2 mRNA processing remodels the messenger RNP

    Capping, splicing and polyadenylation in the nucleus result in the association of protein

    complexes with the mRNA, which in turn influence mRNA export, localization, translation and

    stability. The transcription and export (TREX) complex is recruited to the 5′ end of mRNAs in a

    cap- and splicing-dependent manner in human cells (Figure 1-1B) (Masuda et al. 2005; Cheng et

    al. 2006). The TREX subunit ALY (also known as REF, THOC4 or Yra1 in yeast) directly binds

    the mRNA as well as the CBP80 subunit of the cap-binding complex. ALY functions as an

    mRNA export adapter by transferring the mRNA to TAP (TIP-associated protein; also known as

    NXF1, nuclear RNA export factor 1, or Mex67 in yeast). Together with its partner p15, TAP

    interacts with the nuclear pore complex to mediate mRNA export (Hautbergue et al. 2008).

    Additional RNA-binding proteins, including some SR proteins, can also act as TAP-dependent

    export adapters (Huang et al. 2003).

    Splicing results in the deposition of a multi-protein exon junction complex (EJC) approximately

    20 nt upstream of exon-exon junctions (Figure 1-1B) (reviewed in Le Hir and Andersen 2008).

    The four core factors of the EJC are eIF4A3, Y14, MAGOH (mago-nashi homologue), and

    MLN51 (metastatic lymph node gene 51; also known as Barentz, Btz, CASC3). These four

    proteins along with RNPS1 and UPF3 remain associated with the mRNA during export, until

    they are removed during the first round of translation (Dostie and Dreyfuss 2002; Lejeune et al.

    2002). Additional splicing-related proteins are peripherally associated with the EJC in the

    nucleus but do not remain bound during export. While it was initially believed that all splice

    junctions are marked by the EJC, recent evidence in fly cells suggests that EJC deposition may

    be a regulated process (Sauliere et al. 2010). The EJC factors have multiple roles in RNA

    metabolism, including in mRNA localization (Palacios et al. 2004) and translation (Wiegand et

    al. 2003; Nott et al. 2004). The EJC also communicates the positions of splice junctions to

    cytoplasmic factors involved in nonsense-mediated mRNA decay (NMD), a pathway that

    degrades mRNAs containing premature termination codons (PTCs). Specifically, the presence of

    an EJC downstream of a PTC strongly stimulates mammalian NMD (see Section 1.4). In

    addition to these post-splicing roles, new findings in Drosophila show that the EJC functions in

    the splicing of exons flanked by long introns (Ashton-Beaucage et al. 2010; Roignant and

    Treisman 2010).
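
    The positional relationship between EJCs and stop codons described above can be sketched in
    code. The following Python example is illustrative only. It assumes that an EJC is deposited
    exactly 20 nt upstream of every exon-exon junction and that any EJC downstream of a stop codon
    counts as an NMD-stimulating signal. The function names, the exon lengths and this simplified
    rule are assumptions made for illustration, not a method used in this thesis.

```python
from itertools import accumulate

EJC_OFFSET = 20  # assumed distance (nt) of the EJC upstream of each exon-exon junction


def ejc_positions(exon_lengths):
    """Return mRNA coordinates of EJCs, one per exon-exon junction."""
    # Cumulative exon lengths give the junction coordinates; the last value
    # is the end of the mRNA and is not a junction.
    junctions = list(accumulate(exon_lengths))[:-1]
    return [j - EJC_OFFSET for j in junctions]


def has_downstream_ejc(exon_lengths, stop_codon_end):
    """True if at least one EJC lies downstream of the stop codon.

    stop_codon_end: mRNA coordinate just past the last nucleotide of the stop codon.
    """
    return any(pos >= stop_codon_end for pos in ejc_positions(exon_lengths))


# Hypothetical three-exon mRNA with exons of 150, 120 and 300 nt.
exons = [150, 120, 300]
print(ejc_positions(exons))            # [130, 250]
print(has_downstream_ejc(exons, 200))  # True: stop in exon 2, EJC at 250 is downstream
print(has_downstream_ejc(exons, 400))  # False: stop in the last exon
```

    In this simplified picture, a stop codon in the last exon has no EJC downstream of it, whereas
    a stop codon introduced into an upstream exon usually does.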

    1.2 Pre-mRNA splicing

    Approximately 92% of human protein-coding genes are interrupted by introns, and on average

    each gene contains 8-9 introns (Fedorova and Fedorov 2005). The excision of introns from pre-

    mRNA, or splicing, is catalyzed by the spliceosome, a large ribonucleoprotein (RNP) complex

    comprising the U1, U2, U4/6 and U5 small nuclear (sn)RNPs and a few hundred protein factors

    (reviewed in Wahl et al. 2009). Both RNA and protein components of the spliceosome play

    important roles in recognition of the core splicing signals and in catalysis. This section outlines

    the recognition of the core splicing signals and subsequent assembly of the spliceosome on the

    pre-mRNA. I will also focus on the core snRNP Sm proteins, which will be relevant in the later

    chapters of my thesis.

    1.2.1 Core and auxiliary splicing signals

    The core splicing signals in the pre-mRNA are short motifs with considerable sequence

    flexibility. The 5′ (donor) and 3′ (acceptor) splice sites (ss) are located at the 5′ and 3′ boundaries

    of the intron, respectively, and the branch point is located upstream of the 3′ss. Consensus

    sequences for the mammalian core splicing signals are shown in Figure 1-2A. The splicing

    reaction involves two successive trans-esterifications. In the first step, the 2′ hydroxyl of the

    branch point adenosine attacks the phosphodiester bond at the 5′ss, generating a free 3′ hydroxyl

    on the 5′ exon and a branched intron-lariat-3′exon as intermediates. In the second step, the free 3′

    hydroxyl of the 5′ exon attacks the phosphodiester bond at the 3′ss, resulting in ligation of the

    exons and release of the intron lariat.
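
    As an illustration of the two trans-esterification steps, the Python sketch below models a
    single-intron pre-mRNA as a string. The sequences, coordinates and function name are
    hypothetical, and the model tracks only which RNA fragments exist after each step, not the
    underlying chemistry or the branched structure of the lariat.

```python
def splice(pre_mrna, five_ss, three_ss):
    """Model the two steps of splicing on a single-intron pre-mRNA string.

    five_ss:  index of the first intron nucleotide (the 5' splice site).
    three_ss: index of the first nucleotide of the 3' exon (just past the 3' splice site).
    Returns (spliced_mrna, excised_intron).
    """
    # Step 1: attack at the 5'ss frees the 5' exon and leaves the intron
    # joined to the 3' exon (the lariat intermediate).
    five_exon = pre_mrna[:five_ss]
    lariat_intermediate = pre_mrna[five_ss:]

    # Step 2: attack at the 3'ss ligates the exons and releases the intron lariat.
    intron_length = three_ss - five_ss
    intron_lariat = lariat_intermediate[:intron_length]
    mrna = five_exon + lariat_intermediate[intron_length:]
    return mrna, intron_lariat


# Hypothetical pre-mRNA: exons in upper case, intron (gu...ag) in lower case.
exon1, intron, exon2 = "AUGGCAG", "guaaguuacuaacuuuuucag", "GUCCUAA"
pre = exon1 + intron + exon2
mrna, lariat = splice(pre, five_ss=len(exon1), three_ss=len(exon1) + len(intron))
print(mrna)    # AUGGCAGGUCCUAA
print(lariat)  # guaaguuacuaacuuuuucag
```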

    The short, degenerate core splicing signals that mark the boundaries of introns do not contain

    sufficient information to accurately define the exons in human transcripts (Lim and Burge 2001).

    Introns often contain many ‘pseudoexons’ – intronic sequences flanked by ‘decoy’ consensus ss

    sequences that are not normally recognized by the splicing machinery. Thus additional cis-acting

    regulatory sequences are necessary to distinguish introns and exons (reviewed in Chasin 2007).

    These auxiliary sequences are known as exonic or intronic splicing enhancers when they

    promote splicing (ESE/ISEs), or as splicing silencers when they inhibit splicing (ESS/ISS).

    Splicing enhancers and silencers are usually short, degenerate sequence motifs (5-10 nt) and they

    play roles in the recognition of constitutive exons (Section 1.2.3) as well as in the regulation of

    inclusion of alternative exons (Section 1.3.2).

    Figure 1-2. Overview of core splicing signals and early stages of spliceosome assembly.

    (A) Consensus sequences of the mammalian core splicing signals. PPT, polypyrimidine tract; ss,

    splice site.

    (B) Early stages of spliceosome assembly are shown. The U1, U2, and U4/6.U5 snRNPs contain

    the indicated snRNA(s) and associated proteins. Sequences of U1 and U2 snRNAs that base-pair

    with the 5′ss and branch site, respectively, are shown in white text. Ψ, pseudouridine; R, A/G; Y,

    U/C.

    1.2.2 Spliceosome assembly

    The consensus model of spliceosome assembly has been mostly characterized using in vitro

    approaches (reviewed in Matlin and Moore 2007; Wahl et al. 2009). Spliceosome assembly is a

    step-wise process involving recruitment of snRNPs and proteins to the pre-mRNA and dynamic

    rearrangements of RNA–RNA, RNA–protein and protein–protein interactions (Figure 1-2B). In

    the early (E) complex, also known in yeast as the ‘commitment complex’, the 5′ss is recognized

    by U1 snRNP, the branch point is recognized by SF1 (Splicing Factor 1; also known in yeast as

    BBP, branchpoint binding protein), and the PPT and 3′ss are recognized by the subunits of the

    U2 accessory factor (U2AF) heterodimer (U2AF65 and U2AF35, respectively). Recognition of

    the 5′ss involves base-pairing between the 5′-end of U1 snRNA and the pre-mRNA, which is

    stabilized by proteins in the U1 snRNP (Zhang and Rosbash 1999).

    The U2 snRNP then replaces SF1 at the branchpoint, forming the A complex (also referred to as

    the pre-spliceosome) (Figure 1-2B). Formation of the A complex is ATP-dependent and involves

    base-pairing of U2 snRNA at the branch site region, which is stabilized by components of the U1

    and U2 snRNPs and by U2AF65 (Barabino et al. 1990; Valcarcel et al. 1996; Gozani et al.

    1998). A bulged duplex formed between the U2 snRNA and the branch site region specifies the

    protruding adenosine as the nucleophile for the first trans-esterification reaction of splicing

    (Query et al. 1994). The bulged adenosine is also recognized by the U2 snRNP protein p14

    (SF3B14) (MacMillan et al. 1994; Schellenberg et al. 2011). While the splice sites are

    recognized in E complex, the pairing of splice sites for catalysis occurs at, or subsequent to, A

    complex formation (Chiara and Reed 1995; Lim and Hertel 2004; Kotlajich et al. 2009).

    The U4/6.U5 tri-snRNP then joins the spliceosome, forming the B complex. This complex

    undergoes extensive remodeling to form the catalytically active spliceosome. The multiprotein

    PRP19/CDC5L complex (also known in yeast as the NineTeen Complex or NTC) and additional

    RNA helicases also associate with the spliceosome and function in spliceosome activation and

    splicing fidelity (reviewed in Valadkhan 2007; Hogg et al. 2010). The remodeling of RNA–

    RNA interactions during spliceosome activation includes disruption of U4–U6 snRNA base-

    pairing to allow base-pairing of U6 snRNA with intronic nucleotides at the 5′ss, release or

    destabilization of the U1 and U4 snRNPs, and rearrangement of interactions between U2 and U6

    snRNA and within U6 snRNA. Following the two trans-esterification reactions of splicing, the

    products are released and the components of the spliceosome are recycled.

    In contrast to the step-wise model of spliceosome assembly characterized in vitro, the isolation

    of a ‘penta-snRNP’ from yeast cells led to the hypothesis that the spliceosome may encounter the

    pre-mRNA in a pre-assembled form in vivo (Stevens et al. 2002). However, it has been suggested

    that the two models might be reconciled if the step-wise assembly characterized in vitro could be

    viewed instead as step-wise rearrangement and activation of the penta-snRNP (reviewed in Brow

    2002; Nilsen 2002). Recent work also supports the relevance of the step-wise assembly model in

    vivo. Several groups used chromatin immunoprecipitation to monitor the co-transcriptional

    recruitment of snRNP components and other splicing factors to nascent transcripts of yeast

    intron-containing genes. These studies showed a sequential pattern of snRNP or splicing factor

    recruitment that was consistent with step-wise spliceosome assembly (Gornemann et al. 2005;

    Lacadie and Rosbash 2005; Tardiff and Rosbash 2006). In addition, live imaging of snRNP

    components tagged with fluorescent proteins revealed distinct interaction dynamics of individual

    snRNPs with pre-mRNA, in support of a step-wise recruitment model in human cells (Huranova

    et al. 2010).

    1.2.3 Exon definition

    The splicing reaction takes place between 5′ and 3′ splice sites paired across an intron. However,

    in metazoan genes, where introns are often longer than exons by an order of magnitude or more,

    it is likely that splicing is facilitated by a process termed ‘exon definition’ (Berget 1995). In the

    exon definition model, the factors bound to the splice sites on either side of internal exons

    initially interact and are stabilized across the exon (Figure 1-2B). Early evidence for exon

    definition included the finding that the presence and strength of a 5′ss downstream of an exon

    affects the recognition and splicing of the upstream intron (Nasim et al. 1990; Robberson et al.

    1990; Talerico and Berget 1990; Kuo et al. 1991). In addition, using a reporter containing an

    isolated exon flanked by splice sites, it was found that the 5′ss sequence and U1 snRNP

    promoted UV crosslinking of U2AF65 at the PPT/3′ss, thus providing further evidence for the

    importance of cross-exon interactions (Hoffman and Grabowski 1992). Key mediators of this

    cross-exon bridging activity include proteins in the SR family (Section 1.3.4). These proteins

    interact with exonic splicing enhancers (ESEs) and promote binding of U1 snRNP and U2AF to

    the pre-mRNA through direct interactions as well as through interactions with splicing co-

    activator proteins (reviewed in Blencowe 2000). The RS domains of SR proteins also promote or

    stabilize RNA–RNA contacts between the core splicing signals and the U-snRNAs (Shen and

    Green 2006). Computational analyses of splicing signals in human and mouse also support the

    exon definition model. Compensatory changes in the strength of 5′ and 3′ splice sites are

    observed across exons, but not across introns (Xiao et al. 2007). Furthermore, splice sites, ESEs,

    and ESSs coevolve to preserve the overall exon strength (Xiao et al. 2007).

    Our understanding of exon definition complexes is incomplete, since the majority of in vitro

    spliceosome assembly assays have used reporter pre-mRNAs containing two exons separated by

    a single short intron. However, several recent studies have shed light on exon-defined

    complexes. Assembly of spliceosome complexes in vitro on a three-exon pre-mRNA reporter

    revealed that an exon-defined E complex can be chased into an exon-defined A complex in the

    presence of ATP (Sharma et al. 2008). In addition, proteomics analysis indicated that these exon-

    defined complexes were similar in composition to previously characterized intron-defined

    complexes (Sharma et al. 2008). The mechanism for conversion of cross-exon interactions into

    cross-intron interactions is also an area under active investigation. Recently, using an in vitro

    trans-splicing assay, it was shown that the U4/6.U5 tri-snRNP can associate with an exon-

    defined A complex, without requiring prior establishment of cross-intron interactions between

    U1 and U2 snRNP (Schneider et al. 2010). In addition, the establishment of cross-intron

    interactions upstream of an exon did not require disruption of the interactions formed across that

    exon (Schneider et al. 2010). In a related study, conversion of cross-intron to cross-exon

    interactions was investigated using pre-mRNA reporters with multiple introns. Following

    splicing of one intron, U1 snRNP previously engaged in cross-exon interactions on the 3′ exon

    remains associated with the mRNA and promotes efficient splicing of the neighbouring intron

    (Crabb et al. 2010).

    1.2.4 Spliceosomal snRNPs and Sm proteins

    The snRNPs are major components of the spliceosome. Each snRNP contains a uridine-rich

    snRNA (U1, U2, U4, U5 or U6) and associated proteins; however, U4 and U6 are base-paired in

    a U4/6 di-snRNP (Bringmann et al. 1984; Hashimoto and Steitz 1984), which is also found

    associated with U5 snRNP in a U4/6.U5 tri-snRNP complex (Konarska and Sharp 1987). The

    purification of snRNP components from mammalian cells was fortuitously accomplished using

    serum from a patient with the autoimmune disease systemic lupus erythematosus (SLE) (Lerner

    and Steitz 1979). This SLE serum was known to contain antibodies that react with a nuclear

    antigen present in many mammalian tissues (Tan and Kunkel 1966). The nuclear antigen was

    designated ‘Sm’, for ‘Smith’, in honour of Stephanie Smith, the SLE patient from whom the

    serum was isolated (Tan and Kunkel 1966) (reviewed in Reeves et al. 2003). Using the anti-Sm

    serum, RNPs containing the U-snRNAs and 7 small (12-35 kDa) proteins designated A-G were

    immunoprecipitated from mammalian cell extracts (Lerner and Steitz 1979). A subset of these

    proteins that are common to the U1, U2, U4 and U5 snRNPs became known as the Sm proteins.

    The snRNPs contain both common and unique proteins. The seven common Sm proteins (B/B′

    (see Chapter 4), D1, D2, D3, E, F and G) are assembled onto the snRNAs by the SMN complex

    (survival of motor neuron) (reviewed in Neuenkirchen et al. 2008). Formation of this snRNP

    ‘core’ is essential for subsequent steps in the biogenesis of mature snRNP particles. The Sm

    proteins bind a conserved single-stranded ‘Sm site’ with consensus sequence PuA[U3-6]GPu,

    located between two stem-loops near the 3′ end of the ‘Sm-class’ snRNAs (U1, U2, U4 and U5)

    (Branlant et al. 1982; Liautard et al. 1982). Based on crystal structures of the B-D3 and D1-D2

    Sm protein dimers along with previous biochemical data, a model was proposed in which the Sm

    site RNA passes through the central cavity formed by a hetero-heptameric Sm protein ring

    (Kambach et al. 1999). This model was recently confirmed by two crystal structures of the U1

    snRNP assembled from recombinant components (Pomeranz Krummel et al. 2009) or generated

    by limited proteolysis of native snRNPs isolated from HeLa cells (Weber et al. 2010).
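
    The Sm site consensus given above can be written as a simple sequence pattern. The Python
    sketch below is a minimal illustration that reads PuA[U3-6]GPu as a purine, an A, three to
    six U residues, a G and a purine. The function name and the test sequence are hypothetical,
    and a match on primary sequence alone ignores the structural context (a single-stranded
    region between two stem-loops) that defines a functional Sm site.

```python
import re

# PuA[U3-6]GPu: purine, A, three to six U residues, G, purine.
# The lookahead lets overlapping candidate sites be reported.
SM_SITE = re.compile(r"(?=([AG]AU{3,6}G[AG]))")


def find_sm_sites(rna):
    """Return (start, matched_sequence) for each Sm-site-like motif in an RNA string."""
    return [(m.start(), m.group(1)) for m in SM_SITE.finditer(rna.upper())]


# Hypothetical sequence containing one consensus-like site.
print(find_sm_sites("GGCAAUUUUUGGACC"))  # [(3, 'AAUUUUUGG')]
```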

    The Sm proteins are essential for the assembly and stability of snRNPs. However, their role in

    the splicing process is not well characterized. In yeast, Sm proteins B, D1 and D3 contact the

    pre-mRNA near the 5′ss in the commitment/E complex (Zhang and Rosbash 1999). These three

    Sm proteins have extensions or ‘tails’ located C-terminal to their conserved Sm domains.

    Splicing assays in yeast strains harboring tail-truncated Sm proteins suggested that the tails of

    Sm B, D1 and D3 contribute to the stability of the U1 snRNA–pre-mRNA interaction, perhaps

    through basic arginine and lysine residues in the yeast Sm protein tails (Zhang et al. 2001). The

    mammalian C-terminal tails are also rich in positively charged residues. The D1 and D3 tails

    contain glycine-arginine (GR) repeats. In contrast, the SmB tail in mammals is quite divergent

    from that of yeast and contains a striking stretch of repeats of 3-4 prolines interspersed with

    glycine, methionine and arginine residues (e.g. GMPPPGMRPPPPGMR). These ‘PGM’ motifs

    in the tail interact with the WW domain of FBP21 (formin-binding protein 21), a spliceosome-

    associated protein implicated in cross-intron bridging interactions (Bedford et al. 1998).

    However, the function of this interaction in splicing has not been studied. In addition, while the

    U1 snRNP crystal structures mentioned above provided insight into recognition of the snRNA by

    the Sm ring, they were less informative regarding the function of the C-terminal Sm tails, since

    these repetitive regions were either omitted from recombinant proteins or found to be disordered

    (Pomeranz Krummel et al. 2009; Weber et al. 2010). Overall, while the C-terminal tails of the

    Sm B/B′, D1 and D3 proteins play a role in nuclear localization of the snRNPs (Bordonne 2000;

    Girard et al. 2004), the roles of the mammalian tails in U1 snRNA–pre-mRNA interaction or

    other steps in splicing remain to be studied.

    Additional insights into the function of Sm proteins in splicing might be inferred by analogy to

    the functions of these proteins in other RNA–protein complexes. In addition to the spliceosomal

    snRNPs, Sm proteins form a related but distinct heptamer on U7 snRNA. The U7 heptamer

    contains five Sm proteins (B/B′, D3, E, F, and G), along with two Sm-like (LSm) proteins

    LSm10 and LSm11, which replace Sm proteins D1 and D2, respectively. The U7 snRNP

    functions in histone 3′ end processing (reviewed in Dominski and Marzluff 2007). A recent study

    found that the U7 snRNP components SmB, SmD3 and LSm10 UV-crosslinked to the histone

    mRNA (Yang et al. 2009). A model was proposed in which these proteins might function as a

    ‘molecular ruler’ to specify the histone mRNA cleavage site at a fixed distance upstream of an

    RNA sequence (the ‘histone downstream element’) that is recognized by base-pairing to U7

    snRNA (Yang et al. 2009). In this model, Sm proteins B and D3 function as part of the heptamer

    to mediate RNA–RNA interactions between the U7 snRNA and the histone mRNA. This

    function is reminiscent of the proposed role of the yeast Sm complex in U1 snRNA–pre-mRNA

    interaction discussed above (Zhang et al. 2001). However, it is not known if the Sm protein–

    RNA interaction occurs via the C-terminal tails, as suggested in the yeast model, or another

    region of the Sm proteins.

    1.3 Regulation of alternative splicing

    Alternative splicing (AS) is the process of differential splice site usage to generate multiple

    mRNA variants from a single pre-mRNA. Upon release of the draft human genome, it was

    estimated that at least 59% of genes undergo AS, based on aligning expressed sequence tags

    (ESTs) and cDNAs to coding genes on chromosome 22 (International Human Genome

    Sequencing Consortium 2001). A higher frequency of AS, affecting 74% of multi-exon genes,

    was then estimated based on data from tissue profiling on exon junction microarrays and

    EST/cDNA evidence (Johnson et al. 2003). More recently, the use of high-throughput RNA

    sequencing (RNA-Seq) has led to an estimate that transcripts from 95% of human multi-exon

    genes undergo AS (Pan et al. 2008; Wang et al. 2008). Alternative splicing affects transcript

    diversity in several ways, including cassette-type exons, mutually exclusive exons, alternative 5′

    or 3′ss selection, alternative promoters, alternative polyadenylation, and intron retention. In my

    work, I will focus on cassette-type exons, which are either included or skipped in the spliced

    mRNA, and which represent the most common type of AS (Castle et al. 2008; Wang et al. 2008).

    Although AS is widespread, the functional importance of most splice variants remains to be

    investigated.
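
    Inclusion of a cassette exon is commonly summarized as a percent inclusion value. The Python
    sketch below shows one simple way to estimate it from splice junction read counts, assuming a
    cassette exon A flanked by exons C1 and C2. The function name, the averaging of the two
    inclusion junctions and the read counts are assumptions for illustration, and this is not
    necessarily the quantification procedure used for the microarray or RNA-Seq data in this
    thesis.

```python
def percent_inclusion(c1_a, a_c2, c1_c2):
    """Estimate percent inclusion of cassette exon A flanked by exons C1 and C2.

    c1_a, a_c2: read counts for the two junctions formed when A is included.
    c1_c2:      read count for the junction formed when A is skipped.
    Returns None when no junction reads are available.
    """
    inclusion = (c1_a + a_c2) / 2  # average the two inclusion junctions
    total = inclusion + c1_c2
    if total == 0:
        return None
    return 100 * inclusion / total


# Hypothetical junction counts in a control and a knockdown sample.
control = percent_inclusion(30, 34, 8)     # 80.0
knockdown = percent_inclusion(10, 14, 36)  # 25.0
print(control, knockdown, knockdown - control)  # 80.0 25.0 -55.0
```

    A change of this kind between two conditions corresponds to a knockdown-dependent shift
    towards exon skipping.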

    1.3.1 Roles of alternative splicing

    Very soon after the discovery that genes are interrupted by introns, it was proposed that exons

    might be joined in different combinations to generate multiple polypeptides from a single gene

    (Gilbert 1978). This role of AS in expansion of the proteome has been particularly emphasized

    following the sequencing of the human genome (International Human Genome Sequencing

    Consortium 2001), which was found to contain fewer protein-coding genes than many had

    anticipated (reviewed in Aparicio 2000; Pennisi 2003). A primary outcome of AS is the expansion of

    transcriptome complexity. An important consequence is an increase in the diversity of the

    encoded proteome (reviewed in Maniatis and Tasic 2002; Nilsen and Graveley 2010). However,

    an additional outcome of transcriptome expansion by AS is an increase in post-transcriptional

    regulatory potential. For example, differences in the coding region, 5′UTR or 3′UTR between

    mRNA variants produced from the same pre-mRNA can affect translation (e.g. upstream ORFs),

    stability (e.g. microRNA binding sites, AU-rich elements, premature stop codons), and mRNA

    localization, and thus have important consequences for the regulation of gene expression

    (Majoros and Ohler 2007; Tan et al. 2007; Mayr and Bartel 2009; Resch et al. 2009; Bell et al.

    2010; Salomonis et al. 2010) (reviewed in Smith et al. 1989; Hughes 2006). The roles of AS in

    regulating gene expression will be discussed further in Section 1.5 below.

    1.3.2 Mechanisms of alternative splicing regulation

    Alternative splicing can be controlled in a developmental stage- and cell type-specific manner, as

    well as in response to signaling or environmental cues (reviewed in Chen and Manley 2009).

    This AS regulation is achieved through multiple levels of control. For example, transcription

    elongation rate, chromatin modification, EJC deposition (see Section 1.1) and pre-mRNA

    secondary structure (reviewed in Warf and Berglund 2010) can influence splice site choice.

    However, the best-characterized mechanism of AS regulation is through the recognition of short

    cis-acting RNA sequence motifs (ESE/S, ISE/S) by splicing-regulatory proteins. Initial studies of

    AS regulation focused on the enhancement or repression of splice site recognition at the early

    stages of spliceosome assembly (Section 1.3.4). In contrast, some regulatory mechanisms affect

    splice site pairing rather than recognition, or the recruitment of the U4/U6.U5 tri-snRNP. These

    diverse mechanisms allow regulation of splice site choice at later stages of spliceosome assembly

    or even during splicing catalysis (Section 1.3.5).

    1.3.3 Families of alternative splicing regulatory factors

    The most extensively studied groups of splicing-regulatory factors are the SR (Ser/Arg-rich),

    SR-related and hnRNP (heterogeneous nuclear ribonucleoprotein) families, which I will discuss in the

    next section (1.3.4). Many of these proteins are widely expressed and thought to affect AS

    regulation in a concentration-dependent manner (Mayeda et al. 1993; Caceres et al. 1994;

    Hanamura et al. 1998) (reviewed in Chen and Manley 2009). However, some members of these

    families have tissue-restricted expression patterns. For example, our lab recently identified and

    characterized the first example of a nervous system-specific SR-related protein, nSR100 (also

    known as SRRM4, serine/arginine repetitive matrix 4) (Calarco et al. 2009). In addition, the

    hnRNP family member PTBP1 (polypyrimidine tract binding protein 1; also known as PTB,

    hnRNPI) is widely expressed, while two PTBP1 paralogues, PTBP2 (also known as nPTB,

    brPTB, neural/brain PTB) and ROD1 (regulator of differentiation 1) are expressed in specific

    cell types. Interestingly, regulation of the AS of the genes encoding these proteins plays a role in

    establishing their expression patterns (see Section 1.5) (Wollerton et al. 2004; Boutz et al. 2007b;

    Makeyev et al. 2007; Spellman et al. 2007).

    Several other AS factors with tissue-restricted expression have also been characterized. Members

    of the NOVA (neuron-oncological ventral antigen) and ELAV-like (embryonic lethal, abnormal

    vision-like; also known as paraneoplastic encephalomyelitis antigen Hu) families are expressed

    in neurons, FOX (Feminizing locus On X homolog) and CELF (CUG-binding protein and ETR3-

    like family, also known as Bruno-like) proteins are expressed in the brain, heart or muscle

    (reviewed in Li et al. 2007), and ESRPs (epithelial splicing regulatory proteins) are expressed in

    epithelial cells (Warzecha et al. 2009). Like many of the SR proteins and hnRNPs, these factors

    bind short RNA motifs in a sequence-specific manner, through RNA recognition motifs (RRMs)

    or hnRNP-K homology (KH) domains (Cook et al. 2011).

    1.3.4 Regulation of splice site recognition

    1.3.4.1 SR and SR-related proteins

    The SR proteins contain one or two N-terminal RNA recognition motifs (RRMs) and a C-terminal RS

    domain that is rich in alternating serine and arginine dipeptides (reviewed in Lin and Fu 2007;

    Long and Caceres 2009). The prototypical SR proteins function in both constitutive and

    alternative splicing. Based on in vitro splicing assays, these SR proteins appear to be functionally

    redundant in their ability to complement splicing-deficient HeLa S100 extract (Fu et al. 1992;

    Mayeda et al. 1992). However, additional studies indicate that SR proteins bind distinct RNA

    sequences and that they have non-redundant AS functions in vivo (reviewed in Long and Caceres

    2009). For example, depletion of the prototypical SR protein SRSF1 (also known as SF2, ASF)

    in C. elegans by RNAi results in late embryonic lethality (Longman et al. 2000). Similarly, loss

    of SRSF1 in chicken DT-40 cells or in mouse embryos is lethal (Wang et al. 1996; Xu et al.

    2005). Moreover, tissue-specific ablation of SRSF1 in the mouse heart resulted in misregulation

    of an SRSF1-dependent AS event in Ca2+/calmodulin-dependent kinase IIδ (CaMKIIδ) and a

    defect in postnatal heart remodelling (Xu et al. 2005). Thus, SR proteins have specific, non-

    redundant functions in the regulation of AS.

    In addition to the prototypical SR proteins, many other ‘SR-related’ proteins also function as

    regulators of splicing and AS. These proteins often contain RS and RRM domains, but in a

    different configuration than the classical SR proteins. Examples of such SR-related proteins

    include TRA2A and TRA2B, which are homologues of transformer-2, an AS regulator involved

    in Drosophila sex determination. Other SR-related proteins contain RS domains alone or in

    combination with other RNA-binding domains (reviewed in Blencowe et al. 1999).

    Though best known as positive regulators of AS, SR proteins can both promote and inhibit the

    inclusion of alternative exons (reviewed in Lin and Fu 2007; Long and Caceres 2009). SR

    proteins function in ESE-dependent splicing in several ways (reviewed in Blencowe 2000;

    Graveley 2000). SR proteins can bind specific ESE sequences and recruit the splicing machinery

    via interactions of their RS domains with snRNP components (e.g. U2AF35 and U1-70K)

    (Lavigueur et al. 1993; Wu and Maniatis 1993; Wang et al. 1995; Zuo and Maniatis 1996;

    Graveley et al. 2001). Alternatively, some SR-related proteins can function in ESE-dependent

    splicing by acting as splicing co-activators that bridge interactions between ESE-bound SR/SR-

    related proteins and snRNPs (Blencowe et al. 1998; Eldridge et al. 1999; Blencowe et al. 2000).

    Binding of SR proteins can also enhance exon inclusion by antagonizing the activity of negative

    regulators bound at nearby silencer elements (Kan and Green 1999). Recent results also show

    that inclusion of an alternative exon can be repressed by strong interactions of SR proteins with

    the flanking constitutive exons (Han et al. 2011). In addition to roles in AS regulation, some SR

    and SR-related proteins function in transcription, 3′-end formation, mRNA export and translation

    (reviewed in Blencowe et al. 1999; Long and Caceres 2009).

    1.3.4.2 hnRNPs

    The heterogeneous nuclear ribonucleoproteins (hnRNPs) are a diverse group of proteins functionally

    defined by their association with nascent hnRNA (pre-mRNA). The hnRNPs typically contain

    one to four RNA-binding domains (RRMs, quasiRRMs or KH domains), as well as other

    auxiliary domains such as RGG boxes (Arg-Gly-Gly) or Gly-rich domains (reviewed in

    Martinez-Contreras et al. 2007). Many of the hnRNPs that have been implicated in AS regulation

    can inhibit splice site recognition through binding to specific silencer sequences (Caputi et al.

    1999; Chen et al. 1999; Del Gatto-Konczak et al. 1999). Some hnRNPs such as hnRNPA1 may

    also cooperatively multimerize on the pre-mRNA to block the association of other factors at a

    distance (Zhu et al. 2001). The recognition of silencers by hnRNPs can thus block or compete

    with the recognition of either nearby or distal enhancer sequences by positive regulatory factors.

    Alternatively, hnRNPs may block or compete with the binding of snRNP-associated factors such

    as U2AF to the core splicing signals (Lin and Patton 1995; Singh et al. 1995). Some hnRNPs

    also stimulate intron definition through interactions between multiple proteins recognizing sites

    at the boundaries of long introns (Martinez-Contreras et al. 2006). In addition, when intronic

    hnRNP binding sites flank an alternative exon, interaction between the hnRNPs can lead to exon

    silencing by ‘looping out’ the alternative exon and bringing the splice sites of the flanking exons

    into close proximity (Chabot et al. 1997; Blanchette and Chabot 1999). However, at least in one

    case of such a looping mechanism, the binding of U1 snRNP to the 5′ss of the silenced exon was

    not inhibited (Chabot et al. 1997; Blanchette and Chabot 1999). Therefore, this mechanism may

    involve inhibition of splice site pairing rather than recognition, as described in the next section.

    1.3.5 Regulation of splice site pairing and catalysis

    In addition to the regulation of splice site recognition at the earliest stages of spliceosome

    assembly, a number of recent studies have revealed that AS can be regulated at later stages,

    including the subsequent steps involved in the pairing of splice sites or the recruitment of the tri-

    snRNP (reviewed in House and Lynch 2008). Moreover, some trans-acting splicing factors can

    regulate AS at both early and late stages of spliceosome assembly. For example, the hnRNP

    PTBP1 can repress alternative exon inclusion by inhibiting early steps leading to exon definition

    (Izquierdo et al. 2005; Sharma et al. 2005). However, in another mechanism, PTBP1 can act after

    exon definition, by binding in an intron and blocking the functional cross-intron pairing of U1

    and U2 snRNPs already associated with the splice sites (Sharma et al. 2008). Repression of

    alternative exon inclusion by hnRNP-L and hnRNP-E2 can also occur through a post–exon

    definition mechanism. In this case, the binding of the hnRNPs to an exon prevents the U1 and

    U2 snRNPs bound at its splice sites from forming productive cross-intron interactions with

    snRNPs at the flanking exons (House and Lynch 2006). Post–exon definition mechanisms are

    also not limited to hnRNPs. The SR-related tumor suppressor RBM5 can repress exon inclusion

    by a dual mechanism: it blocks the transition to intron definition of the snRNP-recognized

    splice sites flanking a repressed alternative exon, and it facilitates the pairing of

    the splice sites of the flanking constitutive exons (Bonnal et al. 2008). Splice site choice can also

    be regulated during catalysis. In the Drosophila melanogaster sex determination gene Sex-lethal,

    the Sex-lethal protein causes skipping of an alternative exon in its own transcript through an

    interaction with the splicing factor SPF45 that blocks splicing at the second catalytic step

    (Lallena et al. 2002). Together, these studies reveal the diversity of splicing regulatory

    mechanisms.

    1.3.6 Roles of basal splicing factors in alternative splicing regulation

    Studies in yeast and metazoans have shown that the levels of some basal or ‘core’ components of

    the splicing machinery can affect splice site choice. Microarray profiling revealed transcript-

    specific splicing effects in yeast strains harboring mutations in or deletions of core splicing

    components (Clark et al. 2002; Pleiss et al. 2007; Kawashima et al. 2009). In addition, an RNAi

    screen in Drosophila cells identified transcript-specific effects on AS upon depletion of general

    spliceosome factors, including U2AF and components of U1, U2 and U4/U6 snRNPs (Park et al.

    2004). Studies in C. elegans and mammalian cells also suggested that the U2AF subunits and the

    U2 snRNP component SAP155 can affect splice site choice (Massiello et al. 2006; Pacheco et al.

    2006; Hastings et al. 2007; Ma and Horvitz 2009). Two very recent studies implicate additional

    core splicing factors in AS regulation and identify associated target sequence features. The

    branchpoint recognition factor SF1 may regulate AS of some transcripts by binding to branch

    site-like sequences (Corioni et al. 2011). Also, transcriptome profiling in zebrafish embryos

    deficient in the U1 snRNP-specific protein U1C revealed altered splice site choice in targets with

    intronic U-rich sequences (Rosel et al. 2011). In a mouse model of spinal muscular atrophy

    (SMA), deficiency of the snRNP assembly factor SMN (Survival of Motor Neuron) resulted in

    tissue-specific perturbations in snRNP levels and splicing defects (Gabanella et al. 2007; Zhang

    et al. 2008; Baumer et al. 2009). Tiling microarray profiling analysis of fission yeast RNA also

    revealed transcript-specific splicing defects associated with a temperature-degron allele of SMN, and showed that

    some of the defects could be alleviated by strengthening the pyrimidine tract upstream of the

    branch-point (Campion et al. 2010). In addition to these studies, the work in my thesis will

    provide new evidence for the role of core splicing factors in AS regulation (Saltzman et al.

    2011).

    In summary, the features that underlie the differential sensitivity of introns or alternative exons

    to particular defects in the core splicing machinery are only beginning to be explored. Moreover,

    in contrast to the AS regulatory factors described above, the mechanisms of these effects are

    poorly understood. Some clues may be provided by analogy to the kinetic proofreading model of

    splicing fidelity in yeast. This model broadly predicts that any changes that alter the kinetics of

    transitions in the splicing pathway, including the availability or activity of core splicing factors,

    can alter splice site choice (Yu et al. 2008) (reviewed in Smith et al. 2008).

    1.3.7 Breaking the ‘code’ of cis-acting alternative splicing regulatory sequences

    A goal of the study of AS is to build predictive models for AS regulation, or a splicing regulatory

    ‘code’ (reviewed in Matlin et al. 2005; Blencowe 2006; Wang and Burge 2008). Deciphering the

    rules that control AS will be important for understanding gene expression on a genome-wide

    scale, and for the ability to predict how mutations affect this regulation. However, the nature of

    splicing regulation complicates the path from genomic sequence to AS predictions. For example,

    a particular cis-regulatory sequence can have opposite effects on AS regulation depending on its

    position within an intron or exon, even when the sequence is recognized by the same trans-acting

    regulator (reviewed in Chen and Manley 2009). The activity of an AS regulator can also depend

    on local sequence context (Xiao et al. 2009; Motta-Mena et al. 2010) or on its post-translational

    modification state (Feng et al. 2008). Many regulated alternative exons and their flanking introns

    also have binding sites for multiple factors, suggesting they are controlled in a combinatorial

    manner. Nevertheless, significant advances have been made recently in identifying sequence

    features that predict tissue-regulated AS as well as regulation by specific trans-acting factors

    (Barash et al. 2010; Zhang et al. 2010). This progress has been accelerated by integrating

    information from multiple sources, especially sequence conservation across species, splicing

    regulatory motifs identified through bioinformatic and experimental screening approaches, RNA

    target binding data for AS regulators, RNA structural features, and splice variant profiling data

    from microarrays or high-throughput RNA sequencing (RNA-Seq).

    1.3.8 Large-scale analysis of alternative splicing regulation

    Many insights into AS and its regulation have been made possible using high-throughput

    methods to study the transcriptome. Technologies used to detect and quantify the levels of splice

    variants in an mRNA sample include microarrays (tiling, exon, exon-junction and exon/exon-

    junction combinations) (Shoemaker et al. 2001; Johnson et al. 2003; Pan et al. 2004) (reviewed

    in Calarco et al. 2007; Hallegger et al. 2010), fibre-optic bead arrays (Yeakley et al. 2002), high-

    throughput RT-PCR (Klinck et al. 2008), and RNA-Seq (Cloonan et al. 2008; Mortazavi et al.

    2008; Pan et al. 2008; Sultan et al. 2008; Wang et al. 2008) (reviewed in Blencowe et al. 2009).

    These methods have been used to profile differences in the mammalian splice variant repertoire

    among tissues, individuals, developmental stages and cell culture models of developmental

    transitions, as well as in cancer versus normal tissues (reviewed in Calarco et al. 2007; Hartmann

    and Valcarcel 2009; Hallegger et al. 2010). High-throughput methods have also been used to

    identify functional targets of specific AS regulators by profiling AS following knockdown or

    loss of a particular protein (Blanchette et al. 2005; Ule et al. 2005) (reviewed in Calarco et al.

    2007; Hallegger et al. 2010). Combining this profiling data with factor binding site preferences

    determined by methods such as SELEX (Tuerk and Gold 1990) or RNAcompete (Ray et al.

    2009) can then provide insights into the biological function of an AS regulator. Furthermore, to

    distinguish direct from indirect targets, methods such as UV Cross-linking and

    Immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq; also known as high-

    throughput sequencing of RNA isolated by CLIP, HITS-CLIP) allow the isolation of RNA

    targets directly bound by a protein of interest on a genome-wide scale (Ule et al. 2003) (reviewed

    in Witten and Ule 2011).

    In addition to cataloguing transcriptome complexity, the approaches mentioned above have

    revealed sequence features associated with AS regulation and allowed construction of ‘RNA

    splicing maps’ of the position-dependent effects of AS regulators (reviewed in Witten and Ule

    2011). More generally, while mRNA expression profiling microarrays showed that functionally

    related genes are often co-expressed in mammalian cells and tissues (Eisen et al. 1998; Su et al.

    2004; Zhang et al. 2004), AS microarray profiling studies revealed that functionally related

    genes are also coordinately regulated by AS. These ‘AS networks’ or ‘exon networks’ have

    functional properties reflecting tissue identity, but the groups of genes are often distinct from

    those co-regulated at the transcriptional level (Le et al. 2004; Pan et al. 2004; Fagnani et al.

    2007; Castle et al. 2008). In addition, functionally related genes are often co-regulated by tissue-

    restricted AS factors such as NOVA, nSR100, ESRP and CELF/MBNL (reviewed in Licatalosi

    and Darnell 2010; Calarco et al. 2011). The coordination of gene expression through AS

    networks extends previous models proposing that mRNPs represent “post-transcriptional

    operons” in eukaryotes (Keene and Tenenbaum 2002).

    1.3.9 Overview of large-scale alternative splicing detection methods used in this thesis

    In my thesis work, I used both microarray- and RNA-Seq-based methods to quantify the relative

    abundance of mRNA splice variants. An overview comparing and contrasting these approaches

    is presented in Figure 1-3. In both cases, the experimental workflow begins with isolation of

    polyadenylated (polyA+) RNA from cells or tissues, which is then reverse-transcribed to cDNA

    (Figure 1-3A). Fluor-labeled single-stranded cDNA is generated for hybridization to AS

    microarrays (Hughes et al. 2006), whereas fragmented, double-stranded cDNA flanked by

    adapters is generated for RNA-Seq following the Illumina mRNA-Seq protocol. In parallel to

    these steps, a database of cassette-type AS events is generated by identifying cassette-type AS

    events in cDNA and EST sequences that have been aligned to the genome (Figure 1-3B)

    (performed by Sandy Pan) (Pan et al. 2004; Pan et al. 2005). This AS database is used to design

    oligonucleotide probes for the AS microarray, or as a set of exon-exon junction sequences onto

    which RNA-Seq reads are bioinformatically aligned (Figure 1-3A). The % alternative exon

    inclusion measurements (‘% inclusion’, i.e. the percentage of transcripts in which the alternative

    exon is included) calculated using the AS microarray platform or the RNA-Seq method are then

    quality-filtered using simple criteria. The resulting AS predictions correlate well with

    measurements made by independent methods such as RT-PCR (Chapter 2, Chapter 5).
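
    As a rough illustration of how a cassette-type AS event in the database can be turned into
    alignment targets, the sketch below builds the three exon-exon junction sequences for one
    event from its upstream constitutive exon (C1), alternative exon (A) and downstream
    constitutive exon (C2). The function name, the 25-nt flank length and the toy sequences are
    assumptions made for illustration only; they do not describe the pipeline used in this work.

```python
def junction_sequences(c1, a, c2, flank=25):
    """Return the three exon-exon junction sequences for one cassette event.

    c1, a, c2 are exon sequences (5' to 3'); `flank` is the number of
    nucleotides taken from each side of a junction (hypothetical default).
    """
    def join(upstream, downstream):
        # Last `flank` nt of the upstream exon + first `flank` nt of the downstream exon.
        return upstream[-flank:] + downstream[:flank]

    return {
        "C1-A": join(c1, a),    # supports the included variant
        "A-C2": join(a, c2),    # supports the included variant
        "C1-C2": join(c1, c2),  # supports the skipped variant
    }


# Toy example with short made-up exons and a 3-nt flank.
junctions = junction_sequences("AAAACCCC", "GGGG", "TTTTAAAA", flank=3)
```

    Reads that span a splice junction can then be assigned to the included or skipped variant
    according to which of these sequences they match.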

    Figure 1-3. Outline of microarray and RNA-Seq AS profiling methods used in this work.

    (A) Left: For AS microarray profiling, fluor-labeled cDNAs are hybridized to the AS microarray.

    The GenASAP algorithm is then used to estimate the % exon inclusion levels and confidence

    ranks from the signal intensities of the scanned microarray images.

    Right: For RNA-Seq AS profiling, 50-nt high-throughput short read sequencing is performed on

    cDNA libraries using the Illumina Genome Analyzer II. The % exon inclusion levels are

    calculated by counting the number of sequence reads that align to the included or skipped

    junctions in the AS database.

    (B) Construction of a database of cassette-type AS events mined from ESTs/cDNAs. These AS

    events are used to design exon and exon-exon junction microarray probes or to align RNA-Seq

    reads to exon-exon junction sequences.
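
    The read-counting step for RNA-Seq described in (A) can be sketched as follows. This minimal
    example assumes that % inclusion is estimated from the average of the two inclusion junction
    counts (C1-A and A-C2) relative to the skipped junction count (C1-C2). The exact formula, the
    handling of events with no junction reads and the function name are assumptions, not a
    description of the calculation used in this thesis.

```python
def percent_inclusion(c1_a, a_c2, c1_c2):
    """Estimate % exon inclusion from junction read counts.

    c1_a, a_c2: reads aligned to the two inclusion junctions.
    c1_c2: reads aligned to the skipping junction.
    Returns None when no junction reads are available.
    """
    # Average the two inclusion junctions so the included variant is not
    # counted twice relative to the single skipping junction (assumed scheme).
    included = (c1_a + a_c2) / 2.0
    total = included + c1_c2
    if total == 0:
        return None
    return 100.0 * included / total


# Toy counts: 30 and 50 inclusion-junction reads, 10 skipping-junction reads.
example = percent_inclusion(30, 50, 10)
```

    In practice, events supported by very few junction reads would be expected to give unreliable
    estimates, which is one reason a quality filter is applied to the % inclusion values.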

    1.3.9.1 Alternative splicing microarray profiling

    The AS microarray platform developed by the Blencowe and Frey labs contains sets of six

    probes for ~3000 AS events (three exon probes: C1, A, C2 and three junction probes C1-A, A-

    C2, C1-C2) (Figure 1-3A) (Pan et al. 2004). Ideally, both splice variants should hybridize to the

    C1 and C2 exon probes, whereas the included variant should hybridize specifically to the C1-A,

    A, and A-C2 probes, and the skipped variant should hybridize specifically to the C1-C2 junction

    probe. Although the probes are designed for optimal specificity, in practice the probe signals do

    not correspond to this ‘ideal hybridization profile’, especially as a result of cross-hybridization of

    the splice variants to the junction probes. In addition, accurate prediction of relative splice

    variant levels for some AS events is complicated by outlier probes, whose signals are not

    consistent with the other five probes for the AS event, as well as by other sources of noise.

    Therefore, a Bayesian learning algorithm called the Generative model for the Alternative

    Splicing Array Platform (GenASAP) is used to accurately predict the AS levels (% inclusion)

    from the microarray data (Shai et al. 2006) (Figure 1-3A). GenASAP uses the microarray data to

    model the hybridization of the included and skipped splice variants to the six probes. This

    significantly improves the accuracy of the % inclusion predictions in comparison to using the

    ‘ideal’ hybridization profile described above. In addition, Ge