Page 1: Transcription and the Regulation of Gene Expressionacademic.mu.edu/bisc/siebenlistk/3213transcription.pdf · Transcription and the Regulation of Gene Expression Objectives: I. Describe

Transcription and the Regulation of Gene Expression

Objectives:

I. Describe Ribonucleic Acid (RNA)
   A. Describe the structure of RNA, note similarities and differences to DNA, and explain why it is complementary to a strand of DNA.
   B. Review the six classes of RNA molecules.
      1. Describe the function(s) of each of these six classes.
   C. Given the sequence of bases of a gene’s informational DNA strand, be able to write, using all the accepted conventions:
      1. The sequence that would be found on the complementary DNA strand.
      2. The sequence that would be found on the complementary RNA strand.
II. Explain the overall process of transcription.
   A. Define the terms:
      1. DNA Directed RNA Polymerase (RNA Polymerase; RNA Pol).
      2. Promoter region
      3. Terminator Region
      4. Transcription Bubble
   B. Direction of DNA Reading versus the Direction of RNA Synthesis.
III. Describe bacterial (E. coli) transcription in detail.
   A. Initiation
      1. Describe the structure of E. coli DNA Directed RNA Polymerase (RNA Polymerase).
      2. Describe the structure of the RNA polymerase Core Enzyme.
      3. Describe the structure of the RNA polymerase Holoenzyme.
      4. Describe the promoter region of an E. coli gene.
         a) Pribnow Box.
         b) Consensus Sequence.
      5. Template Strand, Antisense Strand, or Minus (–) Strand.
      6. Nontemplate Strand, Sense Strand, Coding Strand or Plus (+) Strand.
   B. Elongation
      1. Nus A
   C. Termination
      1. Intrinsic Termination
      2. Rho Dependent
         a) Rho Protein
IV. Describe eukaryotic transcription in detail.
   A. Initiation
      1. Describe the structure of the eukaryotic DNA Directed RNA Polymerases (RNA Polymerases).
      2. Describe the structure of eukaryotic RNA Polymerase II (RNA Pol B).
         a) Describe the structure and function of the Carboxyl-Terminal Domain (CTD).
      3. Describe the role of Transcription Factors.
      4. Describe the eukaryotic promoter region.
         a) Core promoter or Basal promoter.
            (1) TATA Box.

©Kevin R. Siebenlist, 2017

            (2) Initiator element.
      5. Regulatory elements.
         a) Enhancers.
         b) Silencers.
         c) Response elements.
      6. General (Core) Transcription Factors
         a) TATA-Binding Protein (TBP)
         b) TATA-Binding Protein Associated Factors (TBP-Associated Factors; TAF’s)
            (1) Transcription Factor IID (TFIID)
         c) Transcription Factor IIA (TFIIA)
         d) Transcription Factor IIB (TFIIB)
         e) Transcription Factor IID (TFIID)
         f) Transcription Factor IIE (TFIIE)
         g) Transcription Factor IIF (TFIIF)
         h) Transcription Factor IIH (TFIIH)
      7. Describe the sequence of events leading to the formation of the closed initiation complex.
      8. Describe the sequence of events leading to the formation of the open initiation complex.
      9. Describe the sequence of events leading to promoter escape.
   B. Elongation.
      1. Elongation Factors.
   C. Termination
      1. Describe the termination process in eukaryotes.
V. RNA Processing.
   A. Describe the steps necessary to convert hnRNA to rRNA.
   B. Describe the steps necessary to convert hnRNA to tRNA.
   C. Describe the steps necessary to convert hnRNA to mRNA in eukaryotes.
      1. Describe Alternative Splicing and the advantages it gives to eukaryotic cells.
VI. Control of Gene Expression General Comments.
   A. Describe Constitutive or Housekeeping Genes.
   B. Describe Regulated or Inducible Genes.
      1. Describe Regulator Sequence.
      2. Describe Repressor.
      3. Describe Inducer.
   C. Describe the four general mechanisms for controlling the expression of genes.
VII. The E. coli Lac Operon; An Example of Control of Transcription in Prokaryotes.
   A. Define the term operon.
   B. Describe the “structure” of the Lac operon.
   C. Describe the function of the proteins coded for by this operon.
   D. Describe how the Lac operon functions in the presence of glucose and absence of lactose.
   E. Describe how the Lac operon functions in the absence of glucose and presence of lactose.
   F. Describe how the Lac operon functions in the presence of both glucose and lactose.
      1. Describe Catabolite Repression.
      2. What is cAMP Receptor Protein (CRP)?
         a) What is its function?
VIII. Control of Transcription in Eukaryotes

   A. Describe the control of transcription at the level of chromatin structure.
      1. What is Heterochromatin?
      2. What is Euchromatin?
         a) How do their structures differ?
      3. How does cytosine methylation and the probable formation of Z-DNA play a role?
      4. What is chromatin remodeling?
      5. Describe the action of the Histone acetyltransferases.
   B. Describe the control of transcription at Initiation of Transcription.
      1. Proteins involved:
         a) Describe the Basal Transcription Factors.
         b) Describe the DNA Binding Transactivators.
         c) Describe Coactivator Protein complexes.
      2. DNA Sequences involved:
         a) Describe the basal promoter elements.
            (1) TATA Box
            (2) CAAT Box
            (3) GC Box (?)
      3. Describe Upstream Transcription Elements.
         (1) Enhancers
         (2) Silencers
         (3) Response elements

Background

Double stranded DNA is the archive copy of the cellular information. The information stored in DNA is copied or TRANSCRIBED into single stranded RNA when the information is needed by the cell. Single stranded RNA is the working copy of the information. Information transcribed into mRNA molecules is subsequently TRANSLATED into the primary structure of a protein. Information transcribed into rRNA and tRNA molecules is used as part of the protein synthesis apparatus.

A GENE is a sequence of bases on DNA that is transcribed into RNA. The beginning of a gene is marked by a specific sequence of bases, the PROMOTER REGION. The end of the gene, the TERMINATOR REGION, is marked by a specific sequence of bases or a specific set of structures.

[Figure: DNA and RNA strands with the 5´ and 3´ ends labeled; Nucleotide Entry Port]


During transcription, DNA-Directed RNA Polymerase (RNA Polymerase) and accessory proteins bind to the Promoter Region of a gene within the DNA double helix. This binding causes a small region of 8 to 20 bases on the DNA molecule to denature/unwind. This region of unwound DNA is called the TRANSCRIPTION BUBBLE. Ribonucleoside triphosphates bind to the exposed bases of the DNA molecule by specific base pair interactions (RNA base → DNA base): U → A, A → T, G → C, and C → G. RNA contains the bases U, A, G, and C. RNA Polymerase catalyzes the formation of the phosphodiester bonds between the ribonucleotides, “zipping up” the backbone of the polymer. The pyrophosphate (P₂O₇⁴⁻) released is immediately hydrolyzed into two phosphate ions (PO₄³⁻) by Inorganic Pyrophosphatase, driving the polymerization reaction to completion. As the transcription bubble moves down the DNA strand, the leading edge of the polymerase unwinds the DNA and the trailing edge closes the DNA double helix. During transcription only 8 to 20 bases are exposed (unwound) at any given time. The transcription bubble moves down the DNA strand until it encounters the Termination Region. At this point the RNA is released, the RNA Polymerase falls off the DNA, and the transcription bubble closes.
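The base pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the example template sequence are made up for this sketch, and the template is assumed to be written in the 3´ → 5´ direction so that the RNA comes out 5´ → 3´.

```python
# Pairing rules from the text, written as: exposed DNA base -> incoming RNA base.
# (Hypothetical helper for illustration; not part of the original notes.)
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') paired against a DNA template given 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    # Made-up template strand, written 3'->5'.
    print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

A base other than A, T, G, or C raises a `KeyError`, which is a deliberate choice here so that malformed input is not silently transcribed.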

Bacterial Transcription - E. coli

E. coli contains a single DNA-Directed RNA Polymerase. This enzyme is a hexamer composed of two α (alpha) subunits, one β (beta), one β´ (beta prime), one ω (omega) and one σ (sigma) subunit.

• The two α subunits are the scaffold to which the other subunits bind.
• The β subunit contains the polymerase activity.
• The β´ subunit is the clamp that attaches the enzyme to the single stranded DNA.
• The function of the ω subunit is not precisely known; it may function as the clamp loader.
• The σ subunit recognizes and binds to the promoter region of the gene. There are several different σ subunits. σ70 (MW 70000) is the most common σ subunit. Other σ subunits are specific for different or unique promoter regions.

The α2ββ´ωσ complex is called the HOLOENZYME. The HOLOENZYME recognizes the promoter region and is necessary for the initiation of transcription. The enzyme complex composed of α2ββ´ω is called the CORE POLYMERASE. The CORE POLYMERASE catalyzes the synthesis of RNA once the promoter has been recognized and the transcription process has been initiated.

The PROMOTER REGION or PROMOTER SEQUENCE on bacterial (E. coli) DNA consists of two specific base sequences. Centered about ten bases upstream from the start of the gene (+1) is an A/T rich region (TATAAT) called the PRIBNOW BOX. About thirty-five bases upstream from the start of the gene is another specific sequence of bases, TTGACA. Forty to sixty bases upstream from the start of the gene is a third A/T rich region called the UPSTREAM PROMOTER ELEMENT (UP). UP is not present in all E. coli promoters; it is found primarily in genes that are highly expressed. The σ70 subunit initially binds to this site.

[Figure: consensus E. coli promoter, 5´ to 3´: UP element, TTGACA (–35), N17, TATAAT (–10), N6, +1]

The promoter sequence shown above is the CONSENSUS SEQUENCE for the E. coli promoter site. The CONSENSUS SEQUENCES were determined by comparing the promoter sequences from all of the E. coli genes and listing the bases most often found at a particular site within the promoter. These sequences define the best, the strongest, promoter. A single base change or several changes in these regions result in weaker promoters. The σ70 subunit binds to this consensus sequence, that is, to strong promoters, with high affinity. Base changes in the consensus sequence lower the affinity. The strength of a promoter determines how often a gene is transcribed. Strong promoters mark the beginning of genes that are transcribed often; genes with weaker promoters are transcribed less often.
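As a rough illustration of "distance from consensus", the sketch below counts how many positions of a promoter's –35 and –10 hexamers match TTGACA and TATAAT. The match count is only a crude stand-in for promoter strength, not a measured affinity, and the function names and the second example promoter are hypothetical.

```python
# Consensus hexamers from the text.
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_MINUS_10 = "TATAAT"  # Pribnow box


def matches(hexamer: str, consensus: str) -> int:
    """Count positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def consensus_score(minus_35: str, minus_10: str) -> int:
    """Total matches (0 to 12) across the -35 and -10 sequences."""
    return matches(minus_35, CONSENSUS_MINUS_35) + matches(minus_10, CONSENSUS_MINUS_10)


if __name__ == "__main__":
    print(consensus_score("TTGACA", "TATAAT"))  # 12: perfect consensus
    print(consensus_score("TTTACA", "TATGTT"))  # 9: a made-up, weaker promoter
```

Under this simple model a promoter scoring 12 would be expected to bind σ70 most tightly, and each mismatch would lower the expected affinity.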

RNA Polymerase Holoenzyme has significant nonspecific affinity for DNA. It binds to DNA and “walks” up or down the DNA molecule until the σ subunit recognizes a Promoter Sequence through the major groove. The σ subunit of the holoenzyme then binds tightly to the promoter sequence 35 base pairs upstream from the gene start, forming a “closed complex”. The DNA is “wrapped around” the β´ and σ subunits, the DNA begins to unwind at the Pribnow Box, and an “open complex” is formed. Between 8 and 20 base pairs of the DNA double helix are unwound. Once unwound, ribonucleotides bind to the single strand of DNA by specific base pairing and RNA Polymerase catalyzes the formation of the phosphodiester backbone. As the enzyme moves down the gene, it opens one base pair at a time in front of the transcription bubble and it closes one base pair behind it. There are always between 8 and 20 base pairs unwound.

[Figure: RNA polymerase subunits (α, α, β, β´, ω) bound to DNA; promoter sequences TTGACA and TATAAT labeled]

Transcription in E. coli does not require a helicase enzyme, but it does require the topoisomerases. A topoisomerase eases the positive supercoils in front of the transcription bubble and a second topoisomerase relaxes the negative supercoils caused by closing the bubble.

TRANSCRIPTION has a directionality. The DNA is read in the 3´ → 5´ direction and the RNA is synthesized in the 5´ → 3´ direction.

The DNA strand used as template for RNA synthesis is called the TEMPLATE STRAND, the ANTISENSE STRAND, or the MINUS (–) STRAND. The opposite strand of DNA is the NONTEMPLATE STRAND, the SENSE STRAND, the CODING STRAND, or the PLUS (+) STRAND. The RNA molecule synthesized during transcription has a sequence complementary to the TEMPLATE STRAND and identical to the CODING STRAND with the exception that in the RNA U replaces T.
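The relationship between the two DNA strands and the transcript can be checked with a short Python sketch. All sequences are assumed to be written 5´ → 3´, the usual convention; the function names and the example sequence are illustrative only.

```python
# Watson-Crick complements for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding_5_to_3: str) -> str:
    """Template (antisense, minus) strand, written 5'->3'.

    The strands are antiparallel, so the complement is reversed.
    """
    return coding_5_to_3.upper().translate(DNA_COMPLEMENT)[::-1]


def rna_transcript(coding_5_to_3: str) -> str:
    """RNA written 5'->3': identical to the coding strand with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCCTAA"  # made-up coding (sense, plus) strand
    print(template_strand(coding))  # TTAGGCCAT
    print(rna_transcript(coding))   # AUGGCCUAA
```

Reading the template output from its 3´ end and pairing each base gives back the RNA shown, which is why the transcript matches the coding strand.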

Once the RNA Polymerase is clear of the promoter region of the gene the σ subunit dissociates from the holoenzyme and the core enzyme continues the transcription process. After the σ subunit dissociates, a new and different protein binds to the polymerase. This new protein is called Nus A. Nus A is necessary for efficient elongation and it may play a role in the termination process.

Polymerization continues until the enzyme encounters a Termination Site. In bacteria there are two types of termination. One is called INTRINSIC TERMINATION and the other is RHO DEPENDENT TERMINATION.

[Figure: RNA polymerase subunits (α, α, β, β´, ω) bound to DNA; promoter sequences TTGACA and TATAAT labeled]


The INTRINSIC TERMINATION site is not a specific sequence of bases; rather, it is a set of three structural features in the transcription product, with base pairing possibilities, that result in termination. The termination site contains:

1. Inverted repeats, which are usually G/C rich. These repeats form a stable hairpin structure in the transcript 15 to 20 nucleotides before the end of the RNA strand.
2. A nonrepeating sequence of bases that punctuates the inverted repeats.
3. A run of six to eight A’s in the template strand coding for U’s in the RNA transcript.

As the RNA polymerase passes through the termination site, a G/C rich hairpin structure forms in the transcript. The hairpin causes RNA polymerase to pause over the sequence of A’s. The combination of the hairpin structure and the weaker interactions between the A/U base pairs in the DNA-RNA hybrid duplex results in the A/U base pairing between the transcript and the template being displaced and replaced by A/T base pairing between the two DNA strands. The result is the spontaneous dissociation of the transcript from the DNA, closing of the transcription bubble, and the release of the RNA polymerase core enzyme.
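The three features listed above can be turned into a very simple pattern search. The sketch below is a toy model, not a validated terminator predictor: it looks for a perfect inverted repeat (the stem), a short spacer (the loop), and a run of U’s immediately after the stem. The stem length, loop range, G/C cutoff, and U-run length are assumptions chosen for illustration, as are the function names.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Sequence that would base pair with `rna` in a hairpin stem."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def find_intrinsic_terminators(rna, stem_len=6, loop_range=(3, 8),
                               min_u_run=6, min_gc=0.5):
    """Return (stem_start, loop_length) for each hairpin followed by a U run.

    All thresholds are illustrative assumptions, not values from the text.
    """
    rna = rna.upper()
    hits = []
    last_start = len(rna) - 2 * stem_len - loop_range[0] - min_u_run
    for start in range(last_start + 1):
        left = rna[start:start + stem_len]
        gc_fraction = (left.count("G") + left.count("C")) / stem_len
        if gc_fraction < min_gc:
            continue  # stems of intrinsic terminators are usually G/C rich
        for loop_len in range(loop_range[0], loop_range[1] + 1):
            right_start = start + stem_len + loop_len
            right = rna[right_start:right_start + stem_len]
            tail = rna[right_start + stem_len:right_start + stem_len + min_u_run]
            if right == reverse_complement(left) and tail == "U" * min_u_run:
                hits.append((start, loop_len))
    return hits


if __name__ == "__main__":
    # Made-up transcript: leader + stem + loop + stem + U run.
    transcript = "AUC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUUU"
    print(find_intrinsic_terminators(transcript))  # [(3, 4)]
```

Real terminators tolerate mismatches and G/U pairs in the stem, so a practical search would need a scoring scheme rather than the exact matching used here.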

RHO DEPENDENT termination requires the Rho protein.

[Figure: RNA polymerase subunits (α, α, β, β´, ω) on the DNA, with the 5´ and 3´ ends labeled]


RNA transcription does not occur as rapidly as DNA replication. It is slower for two reasons:

1. The RNA polymerase is not as efficient an enzyme as the DNA polymerases; it incorporates bases into the growing RNA strand at a rate of 50 to 90 nucleotides per second.
2. A helicase specific for the DNA duplex molecule is not involved in unwinding the molecule in front of the transcription bubble.
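To get a feel for the rate quoted above, the following sketch estimates how long one pass of the polymerase would take at 50 and at 90 nucleotides per second. The 3,000 nucleotide gene length is a made-up example, and the calculation ignores pausing, so it is a lower bound at best.

```python
def transcription_time_seconds(length_nt: int, rate_nt_per_s: float) -> float:
    """Time to synthesize a transcript at a constant rate (no pausing)."""
    return length_nt / rate_nt_per_s


if __name__ == "__main__":
    gene_length = 3000  # hypothetical gene length in nucleotides
    for rate in (50, 90):  # rates quoted in the text
        seconds = transcription_time_seconds(gene_length, rate)
        print(f"{rate} nt/s: {seconds:.1f} s")
```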

The DNA helix is unwound by the RNA polymerase as the transcription bubble advances. The lack of an independent helicase causes transcription to occur in “Fits and Starts”. It occurs rapidly in A/T rich regions along the DNA because these regions are easy to unwind. G/C rich regions slow the process down since these regions on the DNA are more difficult to unwind. Areas that are G/C rich are called “Pause Sites” since transcription slows or pauses at these sites.

RHO DEPENDENT termination requires the Rho protein and a “Pause Site”. The pause site for termination is a CA rich region called the Rho Utilization (rut) Element. Rho factor is a hexamer of six Rho protein subunits. Rho factor is an ATP dependent helicase specific for DNA-RNA hybrid duplexes and RNA-RNA duplex molecules. Rho factor binds to a G/C rich region near the 5´ end of the transcript and advances in the 5´→3´ direction until it reaches the transcription bubble. When the RNA polymerase reaches the Rho Utilization Element the polymerase pauses, Rho factor overtakes the polymerase, catalyzes the unwinding of the DNA-RNA hybrid duplex, and the release of the RNA transcript. Once the transcript has been released the RNA polymerase core enzyme “falls off” the DNA molecule and the transcription bubble closes.

Eukaryotic Transcription

Eukaryotic cells contain at least five different RNA polymerases.

1. RNA Polymerase I (RNA Polymerase A) is specific for the transcription of the three large pieces of ribosomal RNA. It resides in the nucleolus.
2. RNA Polymerase II (RNA Polymerase B) is specific for messenger RNA.
3. RNA Polymerase III (RNA Polymerase C) is specific for transfer RNA, the smallest ribosomal RNA, and some of the small RNAs.

There is also a mitochondrial RNA polymerase, a chloroplastic RNA polymerase, and a polymerase for inhibitory RNA molecules. All of these enzymes are large complex enzymes composed of 10 or more subunits of various types. Although the three best characterized polymerases (I, II, & III) differ in overall subunit composition, they do contain some subunits in common. In addition these three polymerases contain subunits with functions similar/identical to the subunits of the E. coli polymerase.

RNA Polymerase II (RNA Pol II) / RNA Polymerase B (RNA Pol B) has attracted the most interest since it catalyzes the transcription of a great many different genes. Yeast RNA Pol II consists of twelve different subunits designated RNA Polymerase B1 (RPB1) through RPB12.

The RPB1 subunit in eukaryotes contains a unique structural feature, the Carboxyl-Terminal Domain (CTD). The CTD of the RPB1 subunit contains from 20 to 50 repeats of the amino acid sequence Pro-Thr-Ser-Pro-Ser-Tyr-Ser. The side chains of five of the seven amino acids in the CTD repeated sequence contain hydroxyl groups that can be and are phosphorylated by several different protein kinases. The CTD is essential for RNA Pol II (RNA Pol B) function. RNA Pol II with a non-phosphorylated CTD can initiate transcription. Once transcription has been initiated, the CTD must be phosphorylated to convert the initiation complex into an elongation complex. The CTD is dephosphorylated after termination occurs.
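The count of "five of the seven" hydroxyl-bearing residues, and the total number of potential phosphorylation sites that follows from 20 to 50 repeats, can be verified with a few lines of Python. The names used here are illustrative, and the totals are upper bounds on possible sites, not a claim about how many are phosphorylated at any one time.

```python
# Heptad repeat of the CTD as written in the text.
HEPTAD = ["Pro", "Thr", "Ser", "Pro", "Ser", "Tyr", "Ser"]

# Amino acids whose side chains carry a hydroxyl group.
HYDROXYL_RESIDUES = {"Ser", "Thr", "Tyr"}


def hydroxyl_sites_per_repeat(heptad=HEPTAD) -> int:
    """Number of residues per repeat that could, in principle, be phosphorylated."""
    return sum(residue in HYDROXYL_RESIDUES for residue in heptad)


def potential_sites(n_repeats: int) -> int:
    """Upper bound on phosphorylatable positions in a CTD with n_repeats."""
    return n_repeats * hydroxyl_sites_per_repeat()


if __name__ == "__main__":
    print(hydroxyl_sites_per_repeat())  # 5
    print(potential_sites(20), potential_sites(50))  # 100 250
```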

The existence of multiple different RNA polymerases acting on multiple different sets of genes (rRNA genes, mRNA genes, tRNA genes, etc.) implies that multiple different types of promoters exist in the eukaryotic genome. Different promoters are required in order to maintain specificity among the various RNA polymerases. Eukaryotic RNA polymerases do not recognize nor can they directly bind to their promoter sequences. Rather they recognize and bind to Transcription Factors (TF) that specifically bind to the promoter region.

Initiation of Transcription Using RNA Polymerase II

Four elements have been identified in the promoter region of genes transcribed by RNA Polymerase II.

1. The INITIATOR ELEMENT (Inr)
2. The CORE PROMOTER ELEMENTS
3. PROXIMAL PROMOTER ELEMENTS
4. DISTAL PROMOTER ELEMENTS

It should be noted here, and it will be noted elsewhere, that not all of the genes transcribed by RNA Polymerase II (mRNA genes) contain all of the promoter elements to be described. The promoter regions of 10,000 eukaryotic genes have been examined and the presence or absence of various promoter elements has been cataloged. A majority of eukaryotic gene promoter regions have yet to be examined.

The INITIATOR ELEMENT (Inr) is located immediately before the transcription start site (+1) or it encompasses the site. Its consensus sequence is PyPyAN(T/A)PyPy, where Py is a pyrimidine and N is any nucleotide. The Inr has been identified in 50% of the 10,000 genes examined.

Subunit                            Function                                   E. coli Homologue
RPB1                               polymerase activity                        β subunit
RPB2                               clamp                                      β´ subunit
RPB3                               scaffold, assembly                         α subunits
RPB4                               helps to recognize promoter                σ subunit
RPB5, RPB6, RPB8, RPB10, & RPB12   common to all eukaryotic RNA Polymerases
RPB3, RPB4, & RPB7                 unique to Pol II
RPB9 & RPB11                       ????

[Figure: promoter region of a gene transcribed by RNA Polymerase II, showing Distal Promoter Elements or Enhancers & Silencers, Response Elements, High Mobility Group Proteins, Proximal Initiator Elements (Enhancers) with the sequences GGGCGG and GGCCAATC (positions –70 and –110 marked), and the Core Promoter Elements or Basal Promoter Elements (DNA Sequences): BRE at –35 (15% of 10,000), TATA at –25 (12.5% of 10,000), Inr at +1 (50% of 10,000), and DPE at +30 (15% of 10,000)]

Three of the common CORE PROMOTER ELEMENTS are the TATA Box, the TRANSCRIPTION FACTOR IIB RECOGNITION ELEMENT (BRE), and the DOWNSTREAM PROMOTER ELEMENT (DPE). The TATA box was one of the first promoter elements identified in eukaryotic genes. Its consensus sequence is TATAAA, it is located at –25, and it has been identified in 12.5% of the genes examined. The consensus sequence of BRE is (G/C)(G/C)(G/A)CGCC, it is located at –35, and it has been found in 15% of eukaryotic genes. DPE is also found in 15% of the genes, it has a sequence of (A/G)G(A/T)(C/T)(G/A/C), and it is located downstream at +30.
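Consensus sequences with alternative bases at a position map naturally onto regular expressions. The sketch below rewrites the Inr, TATA, BRE, and DPE consensus sequences in IUPAC degenerate-base codes (for example Y for a pyrimidine, W for A or T) and scans a DNA string for them. The IUPAC spelling of each consensus and the function names are choices made for this sketch; the reported positions are 0-based indices into the input string, not coordinates relative to +1.

```python
import re

# IUPAC degenerate-base codes needed for the consensus sequences in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "V": "[ACG]", "N": "[ACGT]",
}

# Consensus sequences from the text, rewritten with IUPAC codes.
ELEMENTS = {
    "BRE": "SSRCGCC",   # (G/C)(G/C)(G/A)CGCC
    "TATA": "TATAAA",
    "Inr": "YYANWYY",   # PyPyAN(T/A)PyPy
    "DPE": "RGWYV",     # (A/G)G(A/T)(C/T)(G/A/C)
}


def to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[code] for code in consensus)


def find_elements(dna: str) -> dict:
    """Return 0-based start positions of every match for each element."""
    dna = dna.upper()
    hits = {}
    for name, consensus in ELEMENTS.items():
        # The lookahead lets overlapping matches be reported as well.
        pattern = re.compile("(?=(" + to_regex(consensus) + "))")
        hits[name] = [m.start() for m in pattern.finditer(dna)]
    return hits


if __name__ == "__main__":
    example = "GGGCGCCTATAAAGG"  # made-up sequence with a BRE followed by a TATA box
    print(find_elements(example))
```

Short degenerate patterns such as the DPE match by chance quite often, so a match on its own says little; the position relative to the start site matters as much as the sequence.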

The PROXIMAL and DISTAL PROMOTER ELEMENTS are also called ENHANCERS or SILENCERS. Enhancer or Silencer elements are sequences on the DNA that are recognized by specific DNA binding proteins. These proteins are classified as TRANSCRIPTION FACTORS. When an Enhancer (a DNA sequence) is bound by its Transcription Factor (a specific protein), transcription is activated above basal levels. Silencers are sequences on the DNA that bind proteins that inhibit transcription. Promoter Elements (Enhancers or Silencers) that occur within 200 bases of the CORE PROMOTER ELEMENTS are called PROXIMAL PROMOTER ELEMENTS. DISTAL PROMOTER ELEMENTS are Enhancers or Silencers located more than 200 bases from the CORE PROMOTER ELEMENTS, either upstream or downstream from the start site of the gene.

The BASAL APPARATUS binds to the core promoter and initiates transcription at basal levels. The basal apparatus is RNA Polymerase II and the GENERAL (BASAL) TRANSCRIPTION FACTORS. There are eight general transcription factors.

1. TATA Binding Protein (TBP)
2. TATA-Binding Protein Associated Factors (TBP-Associated Factors; TAF’s)
3. TRANSCRIPTION FACTOR IIA (TFIIA)
4. TRANSCRIPTION FACTOR IIB (TFIIB)
5. TRANSCRIPTION FACTOR IID (TFIID)
6. TRANSCRIPTION FACTOR IIE (TFIIE)
7. TRANSCRIPTION FACTOR IIF (TFIIF)
8. TRANSCRIPTION FACTOR IIH (TFIIH)

TFIIH is a complex of several different protein subunits. The TFIIH complex contains a helicase activity that facilitates DNA unwinding at the promoter and it contains a Protein Kinase that phosphorylates the Carboxyl-Terminal Domain (CTD) of the RPB1 subunit of RNA Polymerase II.

The sequence of events for initiation of eukaryotic transcription at a TATA Box is as follows:

1. The first protein to bind to/at the TATA Box of the promoter is the TATA Binding Protein (TBP). TBP-Associated Factors are recruited to the promoter and bind to / interact with the TBP. If there is no TATA Box in the promoter region of the gene the first factor to bind is TRANSCRIPTION FACTOR IID (TFIID). TFIID is a large complex of proteins that includes TBP and the TBP-Associated Factors, as well as other subunits.
2. These binding interactions are stimulated / stabilized by TFIIA.
3. This initial complex, TBP & TFIIA, or TFIID, recruits TFIIB, forming a TBP•TFIIA•TFIIB or a TFIID•TFIIB complex.
4. The dephosphorylated form of RNA Pol II is bound by TFIIF and the TFIIF-RNA Pol II complex binds to the TFIIB of the complex already assembled. This interaction between the two complexes localizes / positions RNA Pol II at the Inr site (if present) or at the (+1) site.
5. TFIIE and TFIIH with its associated proteins then interact with TFIIA-TFIIB-TFIIF-RNA Pol II to form the PREINITIATION COMPLEX or CLOSED COMPLEX.
6. Melting of the DNA duplex (strand separation) around the Inr or around the (+1) occurs to form the OPEN COMPLEX. The helicase activity of TFIIH facilitates DNA unwinding at the promoter and formation of the open complex.
7. The Protein Kinase activity contained on TFIIH and/or some other protein kinase (e.g., the protein kinases activated by Insulin) phosphorylates the Carboxyl-Terminal Domain (CTD) of the RPB1 subunit of RNA Polymerase II. Phosphorylation occurs within the initial 60 to 70 bases transcribed. If phosphorylation does not occur, transcription stalls. This stalled initiation complex will remain bound to the DNA until the CTD is phosphorylated.
8. TFIIA and TFIIB remain bound to the promoter site as RNA Polymerase II moves down the DNA.
9. TFIIE and TFIIH dissociate from the complex, and RNA Polymerase II enters the elongation phase.
10. TFIIF remains associated with RNA Pol II throughout elongation.

[Figure: stepwise assembly of the initiation complex at the TATAAA box and Inr: binding of TBP, TAF’s, and TFIIA; recruitment of TFIIB; binding of TFIIF and RNA Pol II; binding of TFIIE and TFIIH (Closed Complex); Open Complex; phosphorylation of the CTD of Pol II by TFIIH and/or other protein kinase; Initiation Complete; Promoter Escape]

Elongation is greatly enhanced by proteins called ELONGATION FACTORS. Some of the elongation factors that have been identified include ELL (Eleven-Nineteen Lysine-rich Leukemia), p-TEFb (this protein also can phosphorylate the CTD), SII (TFSII), and Elongin (TFSIII). These elongation factors suppress the pausing of the TFIIF-RNA Pol II complex during transcription. They also coordinate protein-protein interactions between the CTD of the TFIIF-RNA Pol II complex and the large protein complexes involved in co-transcriptional and/or post-transcriptional modification of the primary transcript.

DNA in eukaryotic cells is wrapped around histones forming nucleosomes. The nucleosomes may aid in forming the fold in the promoter region, in forming the open complex. For transcription to occur in eukaryotes the DNA must be unwound from the nucleosome and the helix must be separated. It appears that the core histone octamer separates into symmetrical halves during transcription. This separation unwinds the duplex DNA from the nucleosome without the complete dissociation of histone proteins from the DNA molecule.

TERMINATION in eukaryotes appears to require one or more protein cofactors similar to Rho and possibly the sequence AAUAAA near the 3´ end of the transcript. The number of proteins involved, whether additional signaling sequences for termination are required, and the precise mechanism of termination are not known.

RNA Processing

Only bacterial mRNA is functional as it leaves the transcription bubble all of the other RNA molecules are not fully functional when released from the transcription bubble. The PRIMARY TRANSCRIPTS (heteronuclear RNA or hnRNA) must be processed / modified before they adopt their mature structures and functions. In general three processing events occur:

1. Endonuclease and/or exonuclease removal of (extraneous) pieces of RNA from the primary RNA transcripts.

2. Addition of RNA sequences not encoded by the gene. 3. Covalent modifications of certain specific nucleotide bases.

These processing events convert the primary transcript into the final functional form of RNA. All RNAs, especially tRNA and rRNA, assume highly ordered secondary and tertiary structures during and/or after processing. These highly ordered final structural forms are more stable, i.e., they are more resistant to cellular endonucleases and exonucleases, prolonging their lifetime in the cell. The processing steps required to form the final product depend upon the cell type (bacterial or eukaryotic) and upon the gene transcribed (tRNA gene, rRNA gene, or mRNA gene). The primary transcripts of tRNA and rRNA require processing in all cell types. Only eukaryotic mRNA requires processing; bacterial mRNA is functional as it leaves the transcription bubble.

Processing of tRNA Primary Transcripts

The tRNA in bacteria is coded for by several operons. An operon is composed of several “structural” genes controlled by a single promoter sequence. These operons contain from 2 to 5 tRNA genes. The primary transcript for eukaryotic tRNA contains a single gene product, a single tRNA. In bacteria the primary transcript must be cut apart to release the individual tRNA molecules, and in both cases the mature (final) 5´ end of the molecule must be produced. This cleavage reaction and trimming to the mature 5´ end is performed by an endonuclease, RNaseP. The combined action of the endonuclease RNaseP and the exonuclease RNaseD forms the 3´ end of tRNAs in bacteria and eukaryotes. All tRNA molecules contain the sequence -CCA at the 3´ end. In bacteria some of the tRNA genes code for the -CCA at the 3´ end, and the combined actions of RNaseP and RNaseD cleave the primary transcript back to the final -CCA sequence. The other bacterial tRNA genes and all of the eukaryotic tRNA genes do not contain the -CCA sequence as part of the primary transcript. In these cases the -CCA sequence is added to the 3´ end by the action of tRNA Nucleotidyltransferase. tRNA Nucleotidyltransferase is an unusual enzyme. It binds the 3 ribonucleotides (2 CTP & 1 ATP) in three separate active sites and then catalyzes the formation of the phosphodiester bonds sequentially. tRNAs contain many modified bases, and once the molecule is cut to length the base modification reactions occur. Some of the modifications that occur are methylation (1-methylguanosine & ribothymidine), deamination (inosine), thiolation (4-thiouridine), or reduction (dihydrouridine). In the case of pseudouridine (Ψ), the uracil base is removed and reattached to the ribose through carbon 5 of the uracil ring.
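The -CCA rule described above can be sketched as a small string operation. This is only an illustration: the function name `add_cca_tail` and the example sequences are hypothetical, and the model assumes the nucleases have already trimmed the transcript so that it either ends in a gene-encoded -CCA or lacks the -CCA entirely.

```python
def add_cca_tail(trimmed_trna: str) -> str:
    """Return a tRNA sequence (written 5'->3') that ends in -CCA.

    Assumption: the nucleases have already trimmed the input, so it either
    ends in a gene-encoded CCA or has no CCA at all.
    """
    seq = trimmed_trna.upper()
    if seq.endswith("CCA"):
        return seq  # CCA was encoded by the gene; nothing is added
    for nucleotide in ("C", "C", "A"):  # donated by CTP, CTP, ATP
        seq += nucleotide  # one phosphodiester bond per addition
    return seq


print(add_cca_tail("GGGAUC"))   # GGGAUCCCA
print(add_cca_tail("GGGACCA"))  # GGGACCA
```

The loop adds one residue at a time to mirror the sequential formation of the three phosphodiester bonds; it says nothing about how the enzyme itself selects each nucleotide.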


Processing of rRNA Primary Transcripts

Ribosomal RNA molecules in bacteria are produced as a large primary transcript that requires processing. The processing reactions include (1) cleavage by both endonucleases and exonucleases and (2) nucleotide modification, primarily methylation.

In eukaryotes, the processing and assembly of rRNA into ribosomes occurs in the NUCLEOLUS of the nucleus. There are four pieces of rRNA in eukaryotic ribosomes. The smallest piece, the 5S piece, is transcribed independently as a single gene product. Its processing involves trimming the ends and methylation (modification) of specific bases. The remaining 3 pieces are transcribed from a single gene under the control of a single promoter. Processing of this large RNA molecule into its final forms occurs only after some of the ribosomal proteins have bound to the primary transcript. Before the individual rRNAs are cut out and trimmed to length, specific bases are methylated using SAM as the methyl donor. The individual rRNAs are removed from the primary transcript by the action of specific endonucleases then trimmed to length by specific exonucleases.

Processing of Eukaryotic mRNA

Bacterial mRNA requires no processing to be functional. Often the mRNA is being translated into protein as it is leaving the transcription bubble. Eukaryotic mRNA requires extensive processing. Both ends of the primary transcript of eukaryotic mRNA are modified. The 5´ end of the message is “capped” with a 7-methyl guanosine residue. A long string of A’s is added to the 3´ end. Modification of the ends increases the stability of the molecule. The primary transcript of eukaryotic mRNA is composed of regions that contain information necessary for the synthesis of the protein interspersed by regions that do not contain any useful information. These noncoding regions are called INTERVENING SEQUENCES or INTRONS. The regions of the primary transcript that contain the information necessary for protein synthesis are called EXONS. During the processing events, the introns are removed, “spliced out” of the primary transcript and the exons joined to form a continuous mRNA molecule with a continuous coding sequence.

Capping

Capping of eukaryotic mRNA occurs early (< 30 bases incorporated) in the transcription process. A 7-methyl guanosine CAP is attached to the 5´ end of the molecule. The enzymes necessary for capping assemble into a complex, the Cap Synthesizing Complex, that then associates with the phosphorylated CTD of Pol II.

[Figure: Capping of the 5´ end of the nascent transcript by the CAP Synthesizing Complex bound to the phosphorylated CTD of Pol II. Step (1) Phosphohydrolase; step (2) Guanylyltransferase, using GTP; step (3) Guanine-7-methyltransferase, using SAM and releasing adoHcy. The CAP Binding Complex (CBC) then binds the completed cap.]

Capping starts when a phosphatase activity that is part of the Cap Synthesizing Complex removes the 5´ terminal (γ) phosphate from the nascent RNA to form a 5´ diphosphate at the 5´ end. In the second step a GTP is made to react with the newly formed terminal 5´ diphosphate. This GTP is attached to the 5´ end of the transcript in a “backwards” fashion: the GMP portion of the GTP is joined to the 5´ diphosphate, and the β and γ phosphates of the GTP are released as pyrophosphate. The final product is a 5´→ 5´ triphosphate bond. This reaction is catalyzed by the Guanylyltransferase activity of the complex. In the third and final step of capping, a methyl group is transferred from SAM to N-7 of the guanine base by the Guanine-7-methyltransferase activity, forming 7-methyl guanosine. The Cap Synthesizing Complex then dissociates from the CTD of Pol II and is replaced by the Cap Binding Complex. This complex tethers the completed cap to the CTD of Pol II. The CAP increases the stability of the mRNA molecule, since it is not a substrate for cellular exonucleases. The CAP is also necessary for eukaryotic ribosomes to recognize and bind the mRNA.

Splicing of the mRNA

Eukaryotic mRNA primary transcripts contain anywhere from 1 to 20 Introns and the Introns vary in length from 65 to 20000 bases (see Factor XIIIA subunit and/or Factor XIIIB subunit gene {pdf}). Introns in mRNA all contain certain regular structural characteristics:

1. The 5´ end of the intron is marked with the sequence GU.

2. The 3´ end of the intron is marked with the sequence AG.

3. There is a “BRANCH POINT” that is part of the intron. The “Branch Point” is a sequence 10 to 40 bases upstream from the 3´ end of the intron. The consensus sequence is Py-N-Py-Pu-A-Py, where N is any nucleotide, Py is a pyrimidine, Pu is a purine and the A residue is invariant.
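The three structural features listed above can be checked on a sequence with a short script. This is a minimal sketch under stated assumptions: the function name `intron_features` and the example intron are hypothetical, the consensus Py-N-Py-Pu-A-Py is written as a regular expression over the RNA alphabet, and the “10 to 40 bases upstream” rule is interpreted as the distance from the invariant A to the last base of the intron.

```python
import re

# Py-N-Py-Pu-A-Py in the RNA alphabet; the lookahead lets overlapping sites be found.
BRANCH_POINT = re.compile(r"(?=[CU][ACGU][CU][AG]A[CU])")


def intron_features(intron: str) -> dict:
    """Check an intron (written 5'->3') for the three structural features."""
    intron = intron.upper()
    branch_a_positions = []
    for match in BRANCH_POINT.finditer(intron):
        a_index = match.start() + 4  # the invariant A is the 5th base of the motif
        bases_from_3_end = len(intron) - 1 - a_index
        if 10 <= bases_from_3_end <= 40:  # assumed reading of "10 to 40 bases upstream"
            branch_a_positions.append(a_index)
    return {
        "starts_with_GU": intron.startswith("GU"),
        "ends_with_AG": intron.endswith("AG"),
        "branch_a_positions": branch_a_positions,  # 0-based indices
    }


# Hypothetical 45-base intron with one branch point (CUCAAC) 20 bases from the 3' end.
example = "GUAAGA" + "G" * 14 + "CUCAAC" + "G" * 17 + "AG"
print(intron_features(example))
```

A real intron can contain several sequences that fit the short consensus, so the function returns every candidate A in the window instead of choosing one.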

Small RNAs (sRNA) are required for the splicing reaction. Each of the several different small RNAs (100 to 200 nucleotides long) is bound by a set of up to 10 different proteins to form a specific ribonucleoprotein complex. These complexes are called Small Nuclear Ribonucleoproteins (snRNPs), pronounced “snurps”. There are five different snRNPs - U1snRNP, U2snRNP, U4snRNP, U5snRNP, and U6snRNP. Removing the introns and rejoining the exons of the mRNA (mRNA splicing) requires the formation of the SPLICEOSOME. The SPLICEOSOME is composed of the five snRNPs along with several accessory protein molecules. This multisubunit SPLICEOSOME is assembled on the phosphorylated CTD of Pol II.

The mechanism of the splicing reaction is as follows:

1. The GU sequence at the 5´ end of the intron is recognized and bound by U1snRNP. The sRNA of U1snRNP contains a sequence complementary to the exon / intron junction at the 5´ end of the intron.

2. The BRANCH POINT consensus sequence is bound by Branch Point Binding Protein (BBP).

3. The BBP is recognized by U2snRNP. U2snRNP binds to the Branch Point displacing BBP. The sRNA of U2snRNP contains a sequence complementary to the Branch Point sequence except the invariant A. When U2snRNP binds it causes the invariant A of the Branch Point to stick out.

4. A complex of U6snRNP-U4snRNP associated with U5snRNP displaces U1snRNP at the 5´ splice site.


5. The sRNA of U6snRNP base pairs with the sRNA of U2snRNP. This interaction between U6snRNP and U2snRNP displaces U4snRNP from the complex, and brings the invariant A in the Branch Point close to the G residue at the 5´ end of the intron.

6. The 2´ hydroxyl group of the invariant A in the Branch Point attacks the phosphodiester bond between the 3´ end of the “leading” exon and the G residue that marks the 5´ end of the intron splice site.

7. The 2´ hydroxyl group of the invariant A is attached to the 5´ end of the intron by a 2´ → 5´ phosphodiester bond.

8. The newly created 3´ hydroxyl group on the exon attacks the phosphodiester bond between the G at the 3´ end of the intron and the first base of the “trailing” exon.

9. As a result, the ends of the exons are joined and the intron, in the form of a lariat-shaped molecule, is released along with the U6snRNP/U2snRNP/U5snRNP complex.
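The net result of the two cleavage and joining reactions in the mechanism above can be followed on a string. This is a rough sketch, not a model of the spliceosome: the function name `splice`, its index arguments, and the toy sequence are hypothetical, and the caller is assumed to already know where the intron and the invariant A lie.

```python
def splice(pre_mrna: str, intron_start: int, intron_end: int, branch_a: int):
    """Model the two cleavage-and-joining reactions of splicing on a string.

    pre_mrna is written 5'->3'. The intron is pre_mrna[intron_start:intron_end]
    and branch_a is the 0-based index of the invariant A (all hypothetical
    inputs; a real spliceosome finds these sites itself).
    """
    leading_exon = pre_mrna[:intron_start]
    intron = pre_mrna[intron_start:intron_end]
    trailing_exon = pre_mrna[intron_end:]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron must begin with GU and end with AG")
    if not (intron_start < branch_a < intron_end and pre_mrna[branch_a] == "A"):
        raise ValueError("branch_a must point at an A inside the intron")

    # Reaction 1: the 2'-OH of the branch A attacks the leading exon / intron
    # junction. The leading exon is freed with a 3'-OH and the 5' G of the
    # intron is joined 2'->5' to the branch A, closing the lariat loop.
    lariat = {
        "intron": intron,
        "loop": intron[: branch_a - intron_start + 1],  # 5' G through branch A
        "tail": intron[branch_a - intron_start + 1:],   # linear 3' tail
    }
    # Reaction 2: the 3'-OH of the leading exon attacks the intron / trailing
    # exon junction, joining the exons and releasing the lariat.
    mrna = leading_exon + trailing_exon
    return mrna, lariat


mrna, lariat = splice("AAAGUCCACAGCCC", intron_start=3, intron_end=11, branch_a=7)
print(mrna)            # AAACCC
print(lariat["loop"])  # GUCCA
print(lariat["tail"])  # CAG
```

Splitting the lariat into a loop and a tail is only a way of showing which part of the intron is closed by the 2´ → 5´ bond; the toy intron is far shorter than any real one.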

Poly A Tail Addition

The 3´ end of eukaryotic mRNA is also modified. The enzymes involved in synthesizing the POLY A TAIL on the 3´ end assemble into a multisubunit complex that then binds to the phosphorylated CTD of Pol II. This complex, the Cleavage and Polyadenylation Specificity Factor (CPSF), contains an Endonuclease activity, Polyadenylate Polymerase (Poly A Polymerase), and several additional factors. These additional factors aid in the recognition of a specific sequence within the 3´ untranslated end of the message, stimulate cleavage, and regulate the length of the poly A tail.

Near the end of the primary transcript is the recognition sequence AAUAAA. This sequence is bound by CPSF. The Endonuclease then cleaves the transcript 10 to 30 bases downstream (toward the 3´ end) from the AAUAAA recognition sequence. After cleavage by the Endonuclease, Polyadenylate Polymerase adds 80 to 250 A residues to the 3´ end of the mRNA molecule. ATP serves as the source of the A residues. These 80 to 250 A residues are called the POLY A TAIL of the mRNA. The addition of the poly A tail is necessary for the binding of eukaryotic mRNA to ribosomes. It also increases the stability of the mRNA molecule. Cellular exonucleases must cleave through the poly A tail before the mature mRNA molecule is affected.
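The cleavage and tailing steps just described can be sketched on a string as follows. The function name `cleave_and_polyadenylate`, its parameters, and the toy transcript are hypothetical. Two assumptions are built in: the AAUAAA nearest the 3´ end is taken as the recognition sequence, and the 10 to 30 base distance is counted from the last A of AAUAAA.

```python
SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(transcript: str, cut_offset: int = 20,
                             tail_length: int = 200) -> str:
    """Cleave downstream of AAUAAA and append a poly A tail (transcript is 5'->3')."""
    if not 10 <= cut_offset <= 30:
        raise ValueError("cleavage occurs 10 to 30 bases downstream of AAUAAA")
    if not 80 <= tail_length <= 250:
        raise ValueError("the poly A tail is 80 to 250 residues long")
    signal_start = transcript.rfind(SIGNAL)  # assumption: site nearest the 3' end
    if signal_start == -1:
        raise ValueError("no AAUAAA recognition sequence found")
    cut = signal_start + len(SIGNAL) + cut_offset
    if cut > len(transcript):
        raise ValueError("transcript ends before the cleavage site")
    # Everything 3' of the cut is discarded; the A residues come from ATP.
    return transcript[:cut] + "A" * tail_length


primary = "GGG" + "AAUAAA" + "C" * 25 + "UUUU"
mature = cleave_and_polyadenylate(primary, cut_offset=20, tail_length=80)
print(len(mature))  # 109
```

The toy transcript keeps 29 bases (3 + 6 + 20) in front of the cut, so the 80-residue tail gives a total length of 109.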

Alternative Splicing

The final mRNA product produced by a cell can be regulated/changed/modified by utilizing different transcription start sites and/or by selecting different splice sites and/or by using different polyadenylation sites. These processes can give rise to different forms of the final mRNA and therefore to different proteins, different isoenzymes, without significantly increasing the genetic load of the organism. Alternative splicing pathways are found for the same protein in different tissues of the body or in the same tissues during different periods of development.

Some genes have two different TATA boxes which point to two different Initiator Elements (Inr). These promoters are active in different tissues, at different times during development, or they are under the control of different ENHANCERS and/or SILENCERS. If the two promoters are associated with alternative exons and splice sites, alternative processing can occur. The promoter region of a gene of this type is depicted below.

[Figure: DNA with the first TATA box (TATA1) upstream of Exon A, followed by Intron 1, which contains the second TATA box (TATA2), then Exon B, Intron 2, and Exon C. If TATA1 is employed, the hnRNA contains Exons A, B, and C and splicing yields Exon A joined to Exon C. If TATA2 is employed, the hnRNA contains Exons B and C and splicing yields Exon B joined to Exon C.]

When the first TATA Box is active, exons A, B and C are transcribed and the splicing reaction will join exon A to C, removing exon B and the second TATA Box. When the second TATA Box is active, exons B and C are transcribed and the splicing reaction joins B to C.

The use of multiple polyadenylation sites is a second mechanism for alternative splicing. The 3´ end of a gene of this type is depicted below.

[Figure: DNA with Exon Y, an intron (GT ... AG) containing a first AATAAA site, Exon Z, and a second AATAAA site after Exon Z. If the first polyadenylation site is employed, the mature mRNA contains Exon Y and ends shortly after the first AAUAAA with a poly A tail. If the second polyadenylation site is employed, the mature mRNA contains Exon Y joined to Exon Z, followed by the second AAUAAA and a poly A tail.]

If the first polyadenylation site is used (the one between exon Y and Z) CPSF binds to the AAUAAA sequence and the Endonuclease cleaves off the Z exon. If the second polyadenylation site is used, the intron between Y and Z is spliced out and the final gene product contains the information from both exon Y and Z.

The Troponin T gene is a good example of alternative splicing. This gene contains 18 exons, 11 of which are found in all mature mRNAs. Exons 1, 2, 3, 9, 10, 11, 12, 13, 14, 15, and 18 are always present and are constitutive exons. Five exons (4, 5, 6, 7, and 8) are combinatorial in that they can be included or excluded in any combination in the mature mRNA. Two exons (16 & 17) are mutually exclusive; one or the other is always present, but never both. Sixty-four different mature mRNAs (2^5 combinations of the combinatorial exons × 2 choices of the mutually exclusive exon) can be generated from the primary transcript of this gene.
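The count of sixty-four can be confirmed by listing every allowed exon combination. The sketch below is only a counting exercise; the function name `troponin_t_isoforms` is hypothetical and the exon numbers are those given in the text.

```python
from itertools import product

CONSTITUTIVE = {1, 2, 3, 9, 10, 11, 12, 13, 14, 15, 18}
COMBINATORIAL = (4, 5, 6, 7, 8)
MUTUALLY_EXCLUSIVE = (16, 17)


def troponin_t_isoforms():
    """List every exon set allowed by the three classes of exons."""
    isoforms = []
    # Each combinatorial exon is independently kept or skipped: 2**5 choices.
    for included in product((False, True), repeat=len(COMBINATORIAL)):
        optional = {exon for exon, keep in zip(COMBINATORIAL, included) if keep}
        # Exactly one of the two mutually exclusive exons is present.
        for exclusive in MUTUALLY_EXCLUSIVE:
            isoforms.append(tuple(sorted(CONSTITUTIVE | optional | {exclusive})))
    return isoforms


isoforms = troponin_t_isoforms()
print(len(isoforms))       # 64
print(len(set(isoforms)))  # 64, so every combination is distinct
```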

Why Are There Introns?

There are two theories regarding the evolution of introns. They are the EARLY HYPOTHESIS OF INTRONS and the LATE HYPOTHESIS OF INTRONS.

The EARLY HYPOTHESIS is based upon the observation that introns often separate unique structural domains within a protein. It is postulated that within the genome of the primordial organism introns separated unique structural and functional domains. When this early organism required a new protein with a particular function to meet some changing need, the organism duplicated its genome and then by recombination events, it randomly shuffled the exons of its genome. Random portions of the “new” genome were transcribed and the exons joined with the aim of finding a new protein that would help the organism survive in the changing conditions. As evolution proceeded and cells diverged, bacteria opted for a compact genome and they lost or discarded their introns. Eukaryotes, on the other hand, retained theirs. With this theory, introns, or better yet genes with introns, became the stuff of evolution. New proteins can be developed, can evolve, by shuffling exons around within the genome. Exon shuffling is known to occur in eukaryotes. The production of antibodies within the human organism is an example. In the human genome there are about 500 exons that encode the antibody light chain and about 1000 exons for the heavy chain. These exons are arranged in a few hundred genes. By shuffling exons around during recombination events, transcribing the “new” genes, and then splicing the primary transcripts in different ways, millions of antibodies with unique specificities can be produced.

In the LATE HYPOTHESIS introns arose late during the course of evolution, after the divergence of eukaryotes from bacteria. In this hypothesis, introns arose as the result of DNA insertions into protein encoding genes. These introns could have arisen from viruses that had partially inserted themselves into the eukaryotic genome, from incomplete crossover events, or from gene duplication events. In any case introns are still the stuff of evolution.

An overall hypothesis regarding the presence and function of introns is a protective effect. By having introns between the coding sequences, the exons of a gene, the chances of damage, a mutation, in DNA causing a faulty nonfunctional protein are greatly reduced.

Control of Transcription

Within the cell there are a large number of genes that are always expressed; they are constantly being transcribed into RNA and the resulting RNA performs some function in the cell. Some of these genes contain the information for the synthesis of proteins that the cell always requires. The enzymes required for “mainstream” metabolism fall into this class of always needed proteins. Genes that are always expressed are called CONSTITUTIVE or HOUSEKEEPING GENES. The amount of protein synthesized from these genes is dependent upon the strength of their promoter. Genes with strong promoters yield high amounts of mRNA, which when translated produce large amounts of protein. Genes with weak promoters produce lower levels of RNA and therefore lower levels of protein.


Other genes are transcribed only when needed. These genes are called REGULATED or INDUCIBLE GENES. The transcription of these genes is stimulated (“turned on”) or inhibited (“turned off”) by the binding of specific proteins at or near the promoter sequence. The proteins bind to a REGULATOR SEQUENCE on the gene. Some genes are not expressed (turned off) when the protein is bound. In this case the protein is called a REPRESSOR PROTEIN; it represses the transcription of a gene. REPRESSED GENES are transcribed when the repressor protein is released from the promoter or regulator region. Other regulated genes are positively regulated. They are turned on when an ACTIVATOR or INDUCER PROTEIN binds at or near the promoter region of the gene. Repressors and Activators share some common features with allosteric proteins. A repressor is released from the promoter/regulator region of the gene when an INDUCER molecule binds to it. INDUCER binding decreases the affinity of the repressor for its regulatory site. Likewise, an activator binds an INDUCER, and this complex has an increased affinity for the promoter region. The complex binds to the promoter region of the gene, increasing the rate of transcription.


There are four general mechanisms for controlling the expression of genes:

1. The gene is usually “turned off”, it is usually REPRESSED. It is turned on only when an activator protein binds to a specific region on the gene. Before the activator protein can bind to the gene a specific signal molecule, an INDUCER, must be bound by the activator. The activator-inducer complex binds at or near the promoter and directs the binding of RNA polymerase.

2. The gene is “turned on” because an activator molecule is bound to the promoter. Due to a change in cellular conditions, a specific signaling molecule is synthesized and binds to the activator protein. This activator-ligand complex dissociates from the promoter region turning the gene off.

3. The gene is always “turned off” because a repressor protein is bound to the gene inhibiting its transcription. Due to a change in cellular conditions, a specific signaling ligand, an inducer is synthesized and binds to the repressor. The binding causes a conformational change in the repressor protein and the repressor-inducer complex falls off of the gene to turn it on.

4. The gene is always “turned on”. When a signal molecule binds to a repressor protein, the repressor undergoes a conformational change and the repressor-ligand complex binds at the promoter region of the gene to turn the gene off.
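The four mechanisms above differ in only two respects: whether the regulatory protein is an activator or a repressor, and whether it sits on the DNA with or without its ligand. The sketch below tabulates that logic. The names `MECHANISMS` and `gene_is_on` are hypothetical, and the all-or-nothing on/off states are a simplification of what is really a change in binding affinity.

```python
# mechanism number -> (kind of regulatory protein, does it bind DNA only with its ligand?)
MECHANISMS = {
    1: ("activator", True),   # activator binds DNA only when carrying its inducer
    2: ("activator", False),  # activator leaves the DNA when its ligand binds
    3: ("repressor", False),  # repressor leaves the DNA when its inducer binds
    4: ("repressor", True),   # repressor binds DNA only when carrying its ligand
}


def gene_is_on(mechanism: int, ligand_present: bool) -> bool:
    """Return True if the gene is transcribed under the given mechanism."""
    regulator, binds_dna_with_ligand = MECHANISMS[mechanism]
    regulator_on_dna = ligand_present == binds_dna_with_ligand
    if regulator == "activator":
        return regulator_on_dna   # a bound activator turns the gene on
    return not regulator_on_dna   # a bound repressor turns the gene off


for number in MECHANISMS:
    print(number, gene_is_on(number, False), gene_is_on(number, True))
```

Mechanisms 1 and 3 are switched on by the signal molecule, while mechanisms 2 and 4 are switched off by it.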

The E. coli Lac Operon - A Classic Example of Repressor Controlled Gene Expression

E. coli preferentially uses glucose for energy metabolism as well as a carbon source for biosynthetic reactions. In the absence of glucose E. coli can and will use β-galactosides such as lactose. When glucose is present in the medium, the genes necessary for the metabolism of the β-galactosides are repressed, turned off, in E. coli, regardless of whether β-galactosides are present or not. The control of the genes for β-galactoside metabolism is a classic example of REPRESSOR CONTROLLED gene expression.

In E. coli three genes are necessary for the entry and metabolism of the β-galactosides. These three genes are under the control of a single promoter sequence. This arrangement of several structural genes under the control of a single promoter is unique to bacteria and is called an OPERON. The OPERON that controls the genes for β-galactoside metabolism is called the Lac Operon.

Within the Lac Operon there is a promoter sequence, PLac. The PLac promoter is a weak promoter. Following the PLac sequence there are the Lac Z, Lac Y, and Lac A genes. These three genes are transcribed into a single piece of RNA (ter is the termination signal for this operon). The Lac Z gene codes for β-Galactosidase. β-Galactosidase catalyzes a two step reaction; it first moves the glycosidic bond in lactose from β1 → 4 to β1 → 6 forming allolactose, and it then hydrolyzes allolactose to galactose and glucose. The Lac Y gene codes for Lactose Permease, the transport protein that carries the β-galactosides into the cell. Lac A codes for Thiogalactoside Transacetylase, an enzyme that acetylates β-galactosides that cannot be metabolized by the cell. These acetylated β-galactosides are transported from the cell as waste products.

Near the promoter, at the 5´ end of the Lac Operon, there are three regions called OPERATOR REGIONS. These operator regions are designated as O1, O2, and O3.

Upstream from the Lac Promoter, PLac, there is a strong promoter (PI) and the Lac I gene. This strong promoter controls the transcription of the Lac I gene. The Lac I gene codes for a constitutive protein, the Lac Repressor Protein.

In the absence of β-galactosides, the Repressor Protein, which is a tetramer of the gene product, binds to the O1 and either O2 or O3 region of the Lac Operon inhibiting transcription. When the repressor protein binds the two operator regions the DNA between them is folded into a loop. This loop prevents RNA polymerase from transcribing the operon. RNA polymerase can bind to the promoter region but it can not transcribe the genes because the repressor protein is physically blocking its path.

Repression of the Lac Operon is not absolute. Binding of the repressor protein to the operator regions is an equilibrium event, and for very brief periods of time the repressor protein spontaneously dissociates from the operator region. This allows the operon to be transcribed and translated a few times, resulting in two to five copies of β-Galactosidase, Lactose Permease, and Thiogalactoside Transacetylase being present in the cell at all times. These few copies of the proteins are enough to signal the presence of galactosides in the external medium.

When glucose is absent and β-galactosides are present, a small amount of the β-galactosides enters the cell by way of the few copies of the membrane bound Lactose Permease. Once inside the cell β-Galactosidase converts lactose (Gal β1 → 4 Glc) to allolactose (Gal β1 → 6 Glc) followed by the cleavage of allolactose into galactose and glucose. Allolactose (Inducer) binds to the Repressor Protein causing a conformational change in the repressor. The allolactose/repressor complex falls off of the operator regions of the gene and the operon is transcribed. In a matter of minutes the number of copies of the gene products increases a thousandfold. This increase in enzyme concentration allows the E. coli to use lactose and other β-galactosides as a carbon source in metabolism.


When both glucose and β-galactosides are present in the external medium, glucose is the preferred carbon source of E. coli and the Lac Operon is repressed. How? The Lac Operon is also controlled by an activator mechanism. In E. coli the concentrations of glucose and cAMP are inversely related. When glucose concentrations are high, intracellular concentrations of cAMP in E. coli are low. Adenylate cyclase senses the intracellular concentration of an intermediate of glucose catabolism, and this glucose catabolite allosterically inhibits Adenylate cyclase. Hence the current name for the regulatory process, CATABOLITE ACTIVATION. As glucose concentrations decrease the cellular concentrations of cAMP increase. When the cAMP concentration is high, cAMP binds to a protein called cAMP Receptor Protein (CRP). cAMP Receptor Protein is also known as Catabolite Activator Protein (CAP). The cAMP-CRP complex binds to the promoter region of the Lac Operon and acts like a σ subunit for RNA polymerase. It aids RNA polymerase in recognizing the weak promoter of the Lac Operon and allows the polymerase to transcribe the operon effectively.

In summary, when glucose is high and lactose is low the Lac Operon is turned off because the repressor protein is bound to the operator region and because cAMP levels are low. When both glucose and lactose are high the Lac Operon is still repressed. The repressor protein has dissociated from the operator regions because a small amount of allolactose is present in the cell, but cAMP levels are low. With the low level of cAMP, CRP does not bind to the promoter and RNA polymerase does not effectively recognize the weak promoter of the Lac Operon. When glucose is low and lactose is high the Lac Operon is effectively transcribed. The repressor has dissociated because allolactose is high. With glucose low, cAMP is high. The cAMP binds to CRP and the cAMP-CRP complex binds to the promoter region of the operon. This complex aids RNA polymerase in recognizing the weak promoter and now the genes of the operon are effectively transcribed.
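The summary above amounts to a two-input truth table, which can be written out as a short sketch. The function name `lac_operon` and the boolean inputs are hypothetical simplifications: concentrations are reduced to high/low, and the few basal transcripts made while the repressor is briefly off the operator are ignored.

```python
def lac_operon(glucose_high: bool, lactose_present: bool) -> dict:
    """Return the state of the two control proteins and of the operon."""
    # Allolactose, made from lactose, releases the repressor from the operators.
    repressor_bound = not lactose_present
    # Glucose and cAMP are inversely related; cAMP-CRP binds only when cAMP is high.
    camp_crp_bound = not glucose_high
    return {
        "repressor_bound": repressor_bound,
        "camp_crp_bound": camp_crp_bound,
        "effectively_transcribed": (not repressor_bound) and camp_crp_bound,
    }


for glucose_high in (True, False):
    for lactose_present in (False, True):
        state = lac_operon(glucose_high, lactose_present)
        print(glucose_high, lactose_present, state["effectively_transcribed"])
```

Only the combination of low glucose and available lactose prints True, which matches the case in which the repressor has dissociated and the cAMP-CRP complex is bound.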

Control of Transcription in Eukaryotes

Eukaryotes control gene expression at the level of transcription as well as at the level of translation. This discussion is limited to the control of gene expression at the level of transcription. At the level of transcription, gene expression is controlled by CHROMATIN REMODELING and at the INITIATION OF TRANSCRIPTION.

Chromatin Remodeling

The first level of transcriptional control in eukaryotes is at the level of the chromatin structure. Transcription is strongly repressed when the DNA is highly condensed within chromatin. This highly condensed chromatin is termed HETEROCHROMATIN. Transcriptionally active regions of the genome are less tightly packaged. Transcriptionally active regions are often deficient in Histone H1 and these regions are more “loosely wound”. These “loosely wound” regions are called EUCHROMATIN. Regions of EUCHROMATIN are often found within 1000 bp of the start of the gene and usually encompass the promoter region and the sites where various regulatory proteins bind.

Methylation of cytosine residues at position 5 of the ring in alternating C-G sequences is common in eukaryotic chromatin and greatly predisposes these regions of DNA to assume the Z form. Transcriptionally active regions of the genome tend to be under methylated or unmethylated at these alternating C-G sites. These regions are in the B form. This suggests that Z form DNA plays a role in regulating gene transcription. Transcriptionally inactive regions are highly methylated and very likely in the Z form, whereas transcriptionally active regions are under methylated and in the B form.

Transcription associated structural changes in chromatin, CHROMATIN REMODELING, involve a variety of enzymes. Histone H3 is methylated in transcriptionally active regions of the genome. The core histones in transcriptionally active regions are often acetylated by the action of Histone Acetyltransferases (HATs). Addition of acetate to numerous lysine side chains on the core histones, especially histones H3 and H4, reduces the affinity of the histones for DNA, making transcription easier. Acetylation may also prevent or promote interactions with other proteins that regulate transcription. Histone Deacetylases remove the acetate from the histones to turn genes “off” by allowing tight heterochromatin to reform.

Chromatin remodeling also requires the participation of protein complexes that actively displace nucleosomes, hydrolyzing ATP in the process.

Control at Initiation of Transcription

First some terminology: PROMOTER ELEMENTS are regions of non-coding DNA that regulate the transcription of nearby genes (older nomenclature: CIS-REGULATORY ELEMENTS). TRANSCRIPTION FACTORS are proteins that bind to the promoter elements to bring about the initiation of transcription; the term is also applied to the genes that code for these proteins (older nomenclature: TRANS-REGULATORY ELEMENTS).


In eukaryotes, inducible genes are controlled by ACTIVATION. There are very few eukaryotic genes controlled by repression. Controlling genes by activation is a much more efficient system. In a repressor controlled system a unique repressor protein is required for every gene that needs to be controlled. The repressor protein must be coded for by another gene. Repressor controlled systems increase the amount of genetic information in the genome. In an activator controlled system the same activator can be used to control several genes. Different genes can be induced by binding various combinations of proteins to Proximal and/or Distal Promoter Elements, to Enhancer and/or Silencer elements of the gene. By “mixing” and “matching” the activator proteins, a few activator proteins can control hundreds of genes.
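The “mixing and matching” argument can be made concrete with a little counting. The sketch below enumerates every non-empty combination of a small set of activators; with n activators there are 2^n - 1 such combinations. The activator names are placeholders, and the assumption that every combination is biologically distinct and functional is an idealization used only to show the scale of the effect.

```python
from itertools import combinations

def activator_sets(activators):
    """Yield every non-empty combination of the given activator names."""
    for size in range(1, len(activators) + 1):
        yield from combinations(activators, size)

def count_activator_sets(n):
    """Number of non-empty combinations of n activators (2^n - 1)."""
    return 2 ** n - 1

# Placeholder names, not real proteins.
names = ["A1", "A2", "A3"]
for combo in activator_sets(names):
    print(" + ".join(combo))

# Ten activators already allow over a thousand distinct combinations.
print(count_activator_sets(10))
```

Under this idealization, three activators give 7 combinations and ten give 1023, which is why a few activator proteins can in principle control hundreds of genes.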

[Figure: Assembly of the eukaryotic transcription initiation complex on looped DNA. Labels on the DNA: Core Promoter Elements or Basal Promoter Elements (DNA Sequences): BRE, TATA, Inr (+1), DPE; Proximal Initiator Elements (Enhancers): GGGCG, GGGCCAATC; Distal Upstream Initiator Elements; Enhancers & Silencers; Response Element. Labels on the proteins: Basal Transcription Factors {Proteins that Bind to Core Promoter Elements}: TBP, TFIIA, TFIIB, TFIIE, TFIIF, TFIIH, with Pol II; DNA Binding Transactivators; High Mobility Group Proteins; Mediator; Signal.]


Four classes of proteins are involved in the activation of transcription.

1. The Core or BASAL TRANSCRIPTION FACTORS include TATA Binding Protein, Transcription Factor IIA, Transcription Factor IIB, Transcription Factor IIE, Transcription Factor IIF, and Transcription Factor IIH. These protein complexes bind at the TATA box to set the start point of transcription; they position RNA Pol II at the Initiator Element.

2. DNA BINDING TRANSACTIVATORS are the proteins that bind to the Proximal and/or Distal promoter elements, to the upstream or downstream Enhancer, Silencer, and/or RESPONSE ELEMENTS.

3. COACTIVATOR PROTEINS or PROTEIN COMPLEXES such as Mediator or CREB binding protein (CBP). These are large protein complexes (Mediator) or unique protein molecules (CBP) that interact strongly with DNA BINDING TRANSACTIVATORS once the Transactivators have bound to their specific DNA sequences. After the Coactivators bind to the DNA Binding Transactivators, the entire large complex binds to the Basal Transcription Factors and the DNA near the promoter region is folded into a loop. This folding may help in opening up the DNA at the Initiator Element.

4. HIGH MOBILITY GROUP PROTEINS (HMG PROTEINS) migrate rapidly in an electric field, bind nonspecifically to DNA, and play an important role in chromatin remodeling and transcription activation. The HMG PROTEINS facilitate protein-protein interactions during the initiation process. They facilitate the association of Coactivators with the Transactivators bound to the upstream Enhancers/Silencers and the proteins bound to RESPONSE ELEMENTS. The protein initiation factors bound to the various “promoter sequences” all ultimately bind to Coactivators, and the HMG Proteins facilitate this binding interaction. It appears that these interactions with Coactivators loop the DNA. The loop induces torsional stress in the molecule, making it easier to open the DNA up for the initiation of transcription.

Eukaryotic promoters are defined by DNA modules of short conserved sequences. When some of these short conserved sequences occur in close proximity, within about 100 bp upstream from the Initiator Element (Inr) or the (+1) site, they are often called the PROXIMAL PROMOTER ELEMENTS or PROXIMAL TRANSCRIPTION ELEMENTS.

1. The TATA box points to the Initiator Element (Inr). The TATA box functions in only the “proper” orientation.

2. The presence of a CAAT box (GGCCAATCT), usually around -80 signifies a strong promoter. The CAAT box functions in only the “proper” orientation.

3. One or more copies of the GC box (GGGCGG) is often found around the transcription start of “Housekeeping Genes”.

Core and Proximal Activation Sequences

Sequence Module   Consensus Sequence        Location   DNA Bound   Protein Factor
TATA Box          TATAAAA                   -25        ~10 bp      TBP
BRE               (G/C)(G/C)(G/A)CGCC       -35        ??          TFIIB
DPE               (A/G)G(A/T)(C/T)(G/A/C)   +30        ??          ??
CAAT Box          GGCCAATCT                 -70        ~22 bp      CTF/NF1
GC Box            GGGCGG                    -110       ~20 bp      SP1
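To show how the consensus sequences in the table above can be used, the following Python sketch scans one strand of a sequence for exact matches to the TATA box, BRE, CAAT box, and GC box and reports each hit relative to the (+1) site. Only the given strand is searched, reflecting the orientation dependence of the TATA and CAAT boxes. The test sequence is invented, and reporting the position of the 5' end of each match (rather than its center) is an assumption; real promoters rarely match a consensus exactly, so this is a teaching sketch rather than a promoter-prediction method.

```python
import re

# Consensus sequences from the table; bracketed positions allow alternatives.
PROXIMAL_MODULES = {
    "TATA Box": "TATAAAA",
    "BRE": "[GC][GC][GA]CGCC",
    "CAAT Box": "GGCCAATCT",
    "GC Box": "GGGCGG",
}

def find_modules(seq, tss_index):
    """Return (module, position, matched sequence) for every exact match.

    `tss_index` is the 0-based index of the +1 base. Positions follow the
    usual convention: upstream bases are negative and there is no position 0.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in PROXIMAL_MODULES.items():
        # The lookahead lets overlapping matches be reported as well.
        for m in re.finditer("(?=(%s))" % pattern, seq):
            offset = m.start() - tss_index
            position = offset if offset < 0 else offset + 1
            hits.append((name, position, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])

# Invented promoter: CAAT box at -70, TATA box at -25, +1 at index 70.
promoter = "GGCCAATCT" + "AC" * 18 + "TATAAAA" + "AC" * 9 + "AGTCAGT"
for module, position, matched in find_modules(promoter, tss_index=70):
    print(module, position, matched)
```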

In addition to the proximal transcription elements there are sets of various sequence modules that are located farther upstream or downstream from the Inr or the (+1) site of the gene. The proximal promoter and these distal activation sequences collectively define the promoter for a particular gene. These Distal Upstream Activation Sequences are the enhancers and silencers of the gene. Distal Activation Sequences differ from the proximal promoter elements in two ways. First, the location of these elements relative to the start site is not fixed. Second, they can function in either orientation; they can be removed and reinserted in reverse sequence order without any loss in activity. The GC box, Octamer sequence, κB sequence, and ATF sequence are considered Upstream Activation Sequences since they fit the above criteria.
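Because distal activation sequences work in either orientation, a search for them has to consider the reverse complement as well as the sequence as written. The sketch below, in Python, does this for a single module; the function names and the test sequence are hypothetical, and only exact matches are reported.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def find_either_orientation(seq, module):
    """Return sorted (0-based start, orientation) pairs for exact matches
    of `module` written forward or as its reverse complement."""
    seq = seq.upper()
    module = module.upper()
    orientations = [("forward", module)]
    flipped = reverse_complement(module)
    if flipped != module:  # a palindromic module reads the same both ways
        orientations.append(("reverse", flipped))
    hits = []
    for label, motif in orientations:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, label))
            start = seq.find(motif, start + 1)
    return sorted(hits)

# Invented sequence carrying the Octamer module once in each orientation.
enhancer = "CCATTTGCATCCCCATGCAAATCC"
print(find_either_orientation(enhancer, "ATTTGCAT"))
```

A search for the TATA or CAAT box, by contrast, would use only the forward pattern, since those modules function in only the “proper” orientation.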

Distal Upstream or Downstream Activation Sequences

Sequence Module   Consensus Sequence   DNA Bound   Protein Factor
GC Box            GGGCGGG              ~20 bp      SP1
Octamer           ATTTGCAT             ~20 bp      Oct-1
Octamer           ATTTGCAT             ~23 bp      Oct-2
κB                GGGACTTTCC           ~10 bp      NFκB
κB                GGGACTTTCC           ~10 bp      H2-TF1
ATF               GTGACGT              ~10 bp      ATF

A change in the internal or external environment can stimulate the transcription of a particular set of genes. Genes that respond to the same environmental change are usually under common regulation. Promoter modules in genes that are responsive to common regulation are termed RESPONSE ELEMENTS. Examples include the Heat Shock Element (HSE), the Glucocorticoid Response Element (GRE), the Metal Response Element (MRE), the Tumor Response Element (TRE), the Serum Response Element (SRE), Cyclic AMP Response Element (CRE), and/or Basal Level Elements (BLE).

The response to steroid hormones depends upon the presence of a Glucocorticoid Response Element about 250 bases upstream of the transcription start site. Steroid hormones pass through the cell membrane and once in the cell they bind to a STEROID RECEPTOR PROTEIN. The hormone/receptor complex migrates to the nucleus, binds to the GRE, and stimulates the transcription of a set of genes.

The Serum Response Element is the DNA sequence bound by the SRF/Phosphorylated-Elk-1 complex (remember insulin stimulation of transcription).

Cyclic-AMP-Response-Element Binding Protein (CREB), when phosphorylated by Protein Kinase A in the cytoplasm, migrates to the nucleus, binds to the Cyclic AMP Response Element (CRE), and acts as a DNA binding Transactivator. CREB binding protein (CBP) binds to the phosphorylated CREB-CRE complex and acts as a Coactivator. CBP binds to the basal initiation complex, stimulating transcription of those genes that contain a CRE as part of their promoter.

Basal Level Response Elements increase constitutive gene expression above basal levels.

Many genes are subject to a multiplicity of regulatory influences. These genes contain an array of different regulatory / response elements.

The Metallothionein gene is a good example. Metallothionein is a metal binding protein that protects the cell against heavy metals by binding heavy metals and removing them from the cell. The protein is always present at low levels and its concentration increases in response to heavy metals or glucocorticoid hormones. The Metallothionein gene promoter contains both MRE and GRE. These elements function independently of one another.

Response Elements to Particular Physiological Challenges

Physiological Challenge       Response Element   Consensus Sequence   DNA Bound   Protein Factor   Size (kD)
Heat shock                    HSE                CNNGAANNTCCNNG       27 bp       HSTF             93
Glucocorticoid                GRE                TGGTACAAATGTTCT      20 bp       Receptor         94
Cadmium                       MRE                CGNCCCGGNCNC         ?           ?                ?
Phorbol Esters                TRE                TGACTCA              22 bp       AP1              39
Serum                         SRE                CCATATTAGG           20 bp       SRF              52
Protein Kinase A Activation   CRE                TGACGTCA             ?           CREB             43
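The idea that one promoter can carry several independent response elements can be sketched in Python using the consensus sequences from the table above, with N treated as “any base”. The promoter string below is invented to mimic the Metallothionein case (an MRE plus a GRE); it is not the real Metallothionein promoter. Only one strand is searched and only exact consensus matches count, both of which are simplifying assumptions.

```python
import re

# Response element -> (physiological challenge, consensus sequence).
RESPONSE_ELEMENTS = {
    "HSE": ("Heat shock", "CNNGAANNTCCNNG"),
    "GRE": ("Glucocorticoid", "TGGTACAAATGTTCT"),
    "MRE": ("Cadmium", "CGNCCCGGNCNC"),
    "TRE": ("Phorbol esters", "TGACTCA"),
    "SRE": ("Serum", "CCATATTAGG"),
    "CRE": ("Protein Kinase A activation", "TGACGTCA"),
}

def consensus_to_regex(consensus):
    """Compile a consensus sequence, treating N as any of A, C, G, or T."""
    return re.compile(consensus.upper().replace("N", "[ACGT]"))

def responsive_to(promoter):
    """Map each response element found to (challenge, 0-based start)."""
    promoter = promoter.upper()
    found = {}
    for element, (challenge, consensus) in RESPONSE_ELEMENTS.items():
        match = consensus_to_regex(consensus).search(promoter)
        if match:
            found[element] = (challenge, match.start())
    return found

# Invented promoter fragment carrying one MRE and one GRE.
promoter = "AATT" + "CGACCCGGTCAC" + "AATTAATT" + "TGGTACAAATGTTCT" + "AATT"
for element, (challenge, start) in responsive_to(promoter).items():
    print(element, challenge, start)
```

Each element is detected independently of the others, which mirrors the statement that the MRE and GRE of the Metallothionein gene function independently.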
