Transcription: the first step in gene expression · 2019. 9. 10.


Molecular Genetics and Genomics Department of Biology/Faculty of Sciences/LU

Dr. Fahd Nasr-All rights reserved


Transcription: the first step in gene expression

I. Opening remark

According to Crick's view of gene expression, the so-called "central dogma of molecular biology", information flows directionally from genes to functions. This flow includes replication (DNA→DNA), transcription (DNA→RNA) and translation (RNA→protein). In addition, information can flow from RNA to DNA by reverse transcription. Although no pathway exists for the flow of information from proteins to RNA or DNA, it has been discovered that inheritance can be mediated by proteins. These unusual proteins, which are able to carry and transmit information, are called "prions" (from "proteinaceous infectious particle"). The protein-only model states that a prion can cause disease and spread it without involving any genetic material. Protein-based inheritance will be treated in a separate section.

In this chapter we concentrate on the transcription process and its regulatory mechanisms in both eukaryotes and prokaryotes. In each cell, the correct use of genomic information requires that genes be expressed in a coordinated and regulated fashion. The coordination of gene expression determines the set of RNAs, or transcriptome, which in turn specifies the set of proteins, or proteome, thus defining the activities that the cell is able to carry out.

The transcription of any gene can be divided into two phases. The first, termed initiation, involves the assembly of a complex of proteins, including the RNA polymerase enzyme, at the upstream region, or promoter, of the gene. This complex will copy the gene into its RNA transcript. In the second phase, termed elongation, the RNA polymerase moves along the gene and synthesizes the primary transcript, which must later undergo several processing events before it can be translated (see chapter 6).
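The DNA→RNA step described above can be illustrated with a minimal sketch. It assumes the template strand is supplied as a plain string written 5'→3'; the function name `transcribe` and the example sequence are hypothetical and chosen only for illustration. The sketch models base pairing alone and ignores promoters, initiation factors and RNA processing.

```python
# Minimal sketch of transcription as base pairing against a DNA template strand.
# Assumption: the template is given as a string written 5'->3'.
# The names used here are illustrative, not taken from the course text.

DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA transcript (written 5'->3') copied from a template strand."""
    # RNA polymerase reads the template 3'->5', so we walk the string backwards
    # and pair each template base with its complementary ribonucleotide.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


if __name__ == "__main__":
    # Template 5'-TTAGGCAT-3' pairs with coding strand 5'-ATGCCTAA-3'.
    print(transcribe("TTAGGCAT"))  # AUGCCUAA
```

The transcript has the same sequence as the coding (non-template) strand, with U in place of T.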
Gene expression can be defined as the series of biochemical reactions, involving the attachment of regulatory proteins at appropriate positions within the genome, that make the biological information carried by the genome available to the cell. In its simplest model, gene expression comprises two distinct processes: transcription and translation. The latter results in the production of a polypeptide whose amino acid sequence is determined by the nucleotide sequence of the RNA transcript. Although this picture is still accurate, the gene expression pathway in higher organisms involves many key points at which information flow may be regulated (Fig. 1).

II. How to access the genome

Within the living cell, DNA is always associated with a variety of proteins that are not directly involved in gene expression. Thus, genes are not readily accessible to the DNA-binding proteins that are responsible for their expression (a separate section will be dedicated to DNA-binding proteins). As previously described, DNA in eukaryotes is associated with chromosomal proteins, forming a complex called chromatin. These proteins must be


displaced to allow the RNA polymerase and different protein factors to gain access to the DNA.

Figure 1. Outline of gene expression. This schematic representation applies only to protein-coding genes, because some genes, such as ribosomal RNA and transfer RNA genes, are transcribed and processed but the resulting RNAs are not translated. Asterisks mark the key points at which gene expression may be regulated. First, the assembly of the transcription complex is a prerequisite for transcription to start; failure of this assembly turns the gene off. Second, the RNA transcript must be processed correctly for translation to initiate and proceed. Third, the protein produced by translation must be processed and folded accurately to acquire its activity. (a) Transcription factors plus the RNA polymerase enzyme. (b) Ribosome.

[Figure 1 labels: DNA with promoter, ORF and terminator; assembly of the transcription complex on the DNA promoter (a); synthesis of an RNA transcript; RNA processing; RNA degradation; initiation of translation (b); translation; protein processing and folding; protein turnover. Asterisks (*) mark the regulatory points.]


The lowest degree of chromatin packaging in the eukaryotic nucleus is the 30 nm fiber, in which the nucleosomes are organized in a helical fashion with six nucleosomes per turn. Crystal structures suggest that only parts of the DNA are accessible inside the nucleosomes and that, during transcription initiation, some protein factors must compete with histones to access DNA.

How can chromatin influence gene expression? First, the degree of chromatin packaging indicates whether or not the genes it contains are expressed. Second, if a gene is accessible, its transcription is then influenced by the precise positioning of the nucleosomes.

During interphase, the eukaryotic chromosomes adopt a less condensed state than metaphase chromosomes, so that they can no longer be distinguished as individual structures. However, some dark areas, termed heterochromatin, persist and tend to concentrate around the periphery of the nucleus. Within heterochromatin, DNA remains in a relatively compact organization. Presumably, it is this compact organization that prevents DNA-binding proteins and other expression proteins from accessing the DNA. We recognize two types of heterochromatin:

1. Constitutive heterochromatin is a permanent feature of all cells and contains densely packed fibers. It is usually inactive because it contains few, if any, genes. Centromeric and telomeric regions are examples of constitutive heterochromatin.
2. Facultative heterochromatin is not a permanent feature of all cells; it has only the potential to become condensed. Facultative heterochromatin contains genes that are expressed in some cells at some periods. These genes are inactivated when the corresponding DNA regions are converted to heterochromatin.

The remaining areas of chromatin, which stain lightly, are less compact than heterochromatin and contain active genes. These areas are called euchromatin.
It is assumed that the loose structure of euchromatin permits expression proteins to gain access to the DNA. Electron microscopy indicates that, within euchromatin, the 30 nm chromatin fiber is organized in loops, each containing 40 to 100 kb of DNA. These loops are attached to the nuclear matrix via AT-rich regions named MARs (for Matrix-Associated Regions) or SARs (for Scaffold Attachment Regions).

III. Chromatin structure and gene expression

III.1. Transcription initiation is a key control in gene expression

Whereas the genome is constant, the RNA make-up is context dependent, i.e. it varies according to growth conditions and developmental stage. What determines whether or not a gene is expressed is not a trivial question. DNA-binding proteins must first gain access to DNA; if they fail to do so because the chromatin is too compact, the gene in question cannot be expressed. Transcription, which refers to the DNA-dependent production of RNA molecules, can be divided into three main stages. First, the gene to be turned on must lie in a chromatin region with a low degree of condensation (euchromatin). The second stage, also called initiation of transcription, sees the assembly on the promoter region of several proteins, including the RNA polymerase, which will next produce a copy or transcript of the


gene. Third, the RNA polymerase must clear the promoter and proceed downstream to synthesize a primary transcript using one DNA strand as a template.

III.2. How chromatin condensation influences transcription

In eukaryotic and prokaryotic systems, DNA is packaged into nucleoprotein complexes with a variety of proteins, some of which seem to be involved in the physical organization of the genome. However, it is now widely accepted that these packaging proteins, such as histones, are not neutral structures around which DNA is wrapped; rather, they are active participants in the gene expression process. The degree of DNA packaging determines whether or not a gene can be expressed. Facultative heterochromatin is thus believed to be so compact that the genes it contains are inactive, because the regulatory proteins cannot access the DNA. Moreover, even when DNA is accessible, as when chromatin adopts an open configuration (euchromatin), there is still another problem related to the precise positioning of the nucleosomes. These must be remodeled to allow regulatory proteins to bind DNA at the appropriate sites.

Although euchromatin corresponds to active regions within the chromatin, its structure is far from uniform. Previous studies by electron microscopy suggested that euchromatin is organized, most likely in the form of the 30 nm fiber, within loops of 50 to 100 kb in length. These loops are attached to the nuclear matrix through AT-rich DNA sequences called Matrix-Associated Regions (MARs) or Scaffold Attachment Regions (SARs). Here, a distinction should be made between a chromatin structural domain, corresponding to one loop, and a chromatin functional domain, which refers to a DNA region where genes are expressed. Is there any correspondence between structural and functional domains? Although some MARs were found to occur at the boundary of a functional domain, there is no simple correlation, because some MARs are located within genes.

III.3. Subtle modifications with huge consequences

Histone acetylation. Nucleosome positioning is another obstacle to overcome for transcription to occur. Histone acetylation, a recently discovered system affecting the fine structure of chromatin, refers to the attachment of acetyl groups (CH3CO-) to lysine residues in the N-terminal regions (which form protruding tails) of the histones in the nucleosome core. Because histones are positively charged, they have an affinity for DNA. Their acetylation is known to reduce this affinity and may also interfere with the interaction between nucleosomes, which is required for the formation of the 30 nm fiber. Several lines of evidence indicate that nucleosomes are unacetylated in heterochromatin and acetylated in euchromatin. The discovery in 1996 of a new class of enzymes capable of adding acetyl groups to histones or removing them substantiated the role of histone acetylation in gene expression. These enzymes, termed histone acetyltransferases and histone deacetylases, cause subtle changes in chromatin structure, thus influencing gene expression. Several growth-controlling proteins, such as the Rb protein, were found to work in conjunction with histone deacetylase. It is worth mentioning that Rb is a tumor suppressor gene whose product inhibits cell division by repressing various genes. Mutation of the Rb gene may lead to oncogenesis.


Figure 2. A possible mechanism for CRMs. Chromatin remodeling factors (CRF) include several polypeptides endowed with various functions that confer regulation, efficiency and specificity on the complexes. CRF comprise ATPase subunits that enable them to use ATP hydrolysis to generate energy. This energy is used to mobilize nucleosomes, changing their positions relative to DNA. This role is achieved by inducing a conformational change in the nucleosome core so that DNA-histone contacts are altered in a specific way. The figure displays a DNA region covered with nucleosomes and comprising a gene. The expression of the gene requires activator 2 (Ac2), whose binding is prevented by bound nucleosomes. In this model, activator 1 (Ac1) recruits CRF that facilitate, in an ATP-driven manner, the binding of Ac2 to chromatin, thus turning on the transcription of the gene.

DNA methylation, like histone acetylation, has also been linked to gene expression. In eukaryotic organisms, methylation concerns the base cytosine (creating 5-methylcytosine) in CpG sequences and, in plants, also in CpNpG sequences. DNA methylation is usually associated with the repression of genes. The question of how this happens was answered in part when methyl-CpG-binding proteins (MeCPs) were isolated. These are believed to bind methylated DNA and block the attachment of transcription factors, thereby inhibiting gene activity.

Recent findings have revealed a further system that affects the fine structure of chromatin. This system, called chromatin remodeling, involves a set of protein complexes collectively termed chromatin remodeling machines (CRMs). Like histone acetylation, CRMs induce local changes in chromatin that lead to nucleosome repositioning. These changes are believed to enable regulatory proteins to gain access to DNA¹ (Fig. 2).
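The CpG and CpNpG sequence contexts mentioned above can be located in a DNA sequence with a short scan. The following is a minimal sketch, assuming an unambiguous A/C/G/T sequence supplied as a string; the function name `methylation_contexts` and the example sequence are hypothetical. It only reports candidate cytosines and says nothing about whether they are actually methylated.

```python
import re


def methylation_contexts(seq: str) -> dict:
    """Return 0-based positions of cytosines in CpG and CpNpG contexts.

    Assumption: seq is a DNA string read 5'->3' containing only A, C, G, T.
    """
    seq = seq.upper()
    # Lookahead patterns let overlapping sites be reported; each position
    # points at the cytosine that could carry the methyl group.
    cpg = [m.start() for m in re.finditer(r"(?=CG)", seq)]
    cpnpg = [m.start() for m in re.finditer(r"(?=C[ACGT]G)", seq)]
    return {"CpG": cpg, "CpNpG": cpnpg}


if __name__ == "__main__":
    print(methylation_contexts("ACGTCAGCCGG"))
    # {'CpG': [1, 8], 'CpNpG': [4, 7, 8]}
```

Note that a cytosine can fall into both contexts, as at position 8 in the example.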

1 Tyler, J.K. and Kadonaga, J.T. (1999) The "dark side" of chromatin remodeling: repressive effects on transcription. Cell, 99: 443-446.

[Figure 2 labels: activator 1 (Ac1); Ac1 bound to CRF; activator 2 (Ac2); ATP → ADP + Pi; gene state shown as Off, Off, Off, then On.]


IV. Different types of RNAs

RNAs can be divided into two categories: coding and noncoding RNAs. The coding RNA category contains just one class of molecules, called messenger RNAs or mRNAs. These are the products of transcription of protein-coding genes and hence are translated by ribosomes into proteins. The second category, noncoding² RNAs, includes various types of RNAs, all of which are involved in functions performed by the RNA molecules themselves. Prokaryotes have two types of noncoding RNAs: ribosomal RNA (rRNA) and transfer RNA (tRNA). In addition, bacteria were found to contain a new class of RNA, termed transfer-messenger RNA (tmRNA), which looks like a tRNA attached to an mRNA. The tmRNA adds a short peptide sequence onto incorrectly synthesized proteins, thus targeting them for degradation. In eukaryotes, in addition to rRNA and tRNA, other noncoding RNA classes can be found that seem to be specific to eukaryotes. These are small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), and small cytoplasmic RNA (scRNA).

IV.1. Messenger RNA

Messenger RNA, which results from the transcription of a protein-coding gene, is directly involved in the synthesis of a polypeptide chain by the protein-synthesizing machinery of the ribosomes. This class forms the most variable component of the transcriptome, being restructured by the changing pattern of gene expression. Indeed, the flow-of-information hypothesis enunciated by Crick in 1958 was extended in 1961 by François Jacob and Jacques Monod to predict that the RNA intermediate in gene expression, which they called mRNA, should have the following features:

1. Its base composition should reflect the base composition of one strand of DNA.
2. It should be very heterogeneous with respect to its molecular weight, a property consistent with the fact that proteins have different molecular masses.
3. It should be able to associate with ribosomes.
4. It should have a short lifetime, i.e. a high rate of turnover. In fact, the rapid degradation of mRNA allows the protein-coding component of the transcriptome to be restructured, thus controlling the rate of protein synthesis. In bacteria, mRNA molecules have half-lives of a few minutes, whereas in eukaryotes most mRNAs are degraded a few hours after their synthesis. However, in the yeast S. cerevisiae the average half-life of mRNAs is 15 minutes.
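To give a feel for what a half-life implies, the following minimal sketch computes the fraction of an mRNA population remaining after a given time. It assumes simple first-order (exponential) decay with no new synthesis, which is a simplifying assumption rather than a statement from the text; the function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(minutes: float, half_life: float) -> float:
    """Fraction of an mRNA pool left after `minutes`, given its half-life.

    Assumption: first-order decay and no new transcription during the interval.
    Both arguments must use the same time unit.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    # Each elapsed half-life halves the remaining pool.
    return 0.5 ** (minutes / half_life)


if __name__ == "__main__":
    # With the 15-minute average half-life quoted above for yeast mRNAs:
    for t in (15, 30, 60):
        print(t, fraction_remaining(t, 15))
```

Under these assumptions, about 6% of the initial pool would remain after one hour (four half-lives), which illustrates how quickly the protein-coding part of the transcriptome can be restructured.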

In prokaryotes, a single transcription unit may encode several polypeptides (see operon below). In contrast, in eukaryotes each mRNA generally encodes only one protein. Moreover, eukaryotic mRNAs are originally synthesized in the nucleus in the form of a larger and more complex precursor termed heterogeneous nuclear RNA or hnRNA. The hnRNA contains additional sequences that have no protein-coding capacity and which are called intervening

2 In addition to rRNA and tRNA molecules, hundreds of other noncoding RNAs were identified in the last two decades, with functions ranging from structural and regulatory to catalytic. For instance, the Xist and roX RNAs have regulatory functions, being involved in the chromosome dosage compensation mechanisms of mammals and Drosophila, respectively.


sequences or introns (see chapter 6). These alternate with protein-coding regions termed exons. Introns, which interrupt the continuity of coding information, must be removed before the message can be translated.

Figure 3. The organization and composition of eukaryotic and prokaryotic (Escherichia coli) ribosomes.

IV.2. Ribosomal RNA

Ribosomes³, the agents of protein synthesis, are ribonucleoprotein organelles composed of two subunits of different sizes, termed the large subunit and the small subunit. Because each subunit is a supramolecular assembly of proteins and RNA, ribosomal subunits are described in terms of their ultracentrifugal sedimentation coefficients, or S values. Bacterial ribosomes have a sedimentation coefficient⁴ of about 70S, and each consists of one 30S and one 50S subunit. In eukaryotes, ribosomes have a sedimentation coefficient of 80S, and each consists of one 40S and one 60S subunit. Ribosomes are about 65% RNA and 35% protein. The ribosomal RNA (rRNA) is currently believed to carry out the ribosome function. Properties of ribosomes and their RNA and protein components are summarized in figure 3. The large subunit contains three rRNAs in eukaryotes (the 28S, 5.8S and 5S rRNAs) but only two in bacteria (23S and 5S

3 Ribosomes in chloroplasts and mitochondria may have different sedimentation coefficients: 60S in mammalian mitochondria, 70S in plant chloroplasts, 73S in yeast mitochondria, and 78S in plant mitochondria.
4 The sedimentation coefficient is usually expressed in Svedberg units (S).

                    E. coli ribosomes                 Eukaryotic ribosomes
Intact ribosome     70S, 2520 kD                      80S, 4220 kD
Large subunit       50S, 1590 kD                      60S, 2820 kD
  rRNAs             23S (2904 nuc), 5S (120 nuc)      28S (4718 nuc), 5.8S (160 nuc), 5S (120 nuc)
  Proteins          31                                49
Small subunit       30S, 930 kD                       40S, 1400 kD
  rRNA              16S (1542 nuc)                    18S (1874 nuc)
  Proteins          21                                33
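A point that often causes confusion with the Figure 3 values is that subunit masses add up, whereas S values do not, because the sedimentation coefficient depends on shape as well as mass. The following minimal sketch checks this with the E. coli values given in Figure 3; the dictionary layout and the function name `compare_additivity` are hypothetical and used only for illustration.

```python
# E. coli values as given in Figure 3 (S value and mass in kD).
ECOLI = {
    "30S": {"S": 30, "mass_kD": 930},
    "50S": {"S": 50, "mass_kD": 1590},
    "70S": {"S": 70, "mass_kD": 2520},
}


def compare_additivity(small: str, large: str, whole: str, table: dict = ECOLI) -> dict:
    """Report whether masses and S values of two subunits sum to the whole particle."""
    mass_sum = table[small]["mass_kD"] + table[large]["mass_kD"]
    s_sum = table[small]["S"] + table[large]["S"]
    return {
        "mass_sum_kD": mass_sum,
        "mass_additive": mass_sum == table[whole]["mass_kD"],
        "S_sum": s_sum,
        "S_additive": s_sum == table[whole]["S"],
    }


if __name__ == "__main__":
    print(compare_additivity("30S", "50S", "70S"))
```

The masses sum to 2520 kD, matching the intact ribosome, while 30S + 50S gives 80 rather than 70.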


rRNAs). The small subunit contains only one RNA molecule: an 18S rRNA in eukaryotes and a 16S rRNA in bacteria. Both subunits are associated with proteins. Those of the small subunit are designated S1, S2, S3 etc., whereas those of the large subunit are called L1, L2, L3 etc. Ribosomal RNAs contain a number of specifically modified nucleotides, including pseudouridine and methylated bases. Moreover, rRNAs have a high degree of secondary structure, which serves as a scaffold for the ribosomal proteins during the assembly process and the formation of the complete ribosome structure. The various components of a ribosome are apparently held together by hydrogen bonding and by ionic and hydrophobic interactions, with magnesium playing an important role in maintaining ribosome integrity.

IV.3. Transfer RNA

Transfer RNA (tRNA) is an RNA molecule that acts as an adaptor, matching amino acids to their codons on mRNA during protein synthesis. Transfer RNAs are small polynucleotides containing 73 to 95 residues (about 4S), many of which are modified (see protein synthesis). The word "transfer" comes from the role of tRNA as a carrier of amino acids. Each of the 20 amino acids of proteins has at least one unique tRNA species dedicated to its insertion into growing polypeptide chains. Note that some amino acids are served by several tRNAs. For instance, there are six different tRNAs that act in the transfer of leucine to proteins (this number is usually reduced by wobbling, see Chapter VII). In eukaryotes, there are discrete sets of tRNAs for each protein synthesis compartment, e.g. the cytoplasm, the mitochondrion and, in plant cells, the chloroplast. All tRNA molecules possess a 3'-terminal nucleotide sequence that reads –CCA, and the amino acid is carried to the ribosome attached as an acyl ester to the free 3'-OH of the terminal A residue. These aminoacyl-tRNAs are the substrates of protein synthesis.

IV.4. Specific types of RNA

IV.4.a. Small nuclear RNAs

Small nuclear RNAs (snRNAs) form a class of RNA specific to eukaryotes and found in the nucleus. They are small RNA molecules containing fewer than 300 nucleotides (usually 100 to 200 nt), some of which are modified, as in tRNA and rRNA. In the eukaryotic nucleus (e.g. in metazoa), six different snRNAs are particularly abundant (approximately 5 × 10^5 copies of each per nucleus): U1, U2, U3, U4, U5, and U6⁵ (also called U-RNAs because of their content of uridylic acid). No snRNA exists as a naked molecule in the nucleus. Instead, snRNAs function in the nucleus in the form of "small nuclear ribonucleoprotein particles" (snRNPs or snurps), which are about 10S in size. Each snurp usually consists of one snRNA associated with one or more proteins. Some proteins are common to several snurps, and some snurps contain more than one snRNA, e.g. U4- and U6-RNAs occur together in the

5 All U-RNAs are strongly conserved during evolution. U1, U2, U3, U4, and U5 RNAs are synthesized by RNA polymerase II and each has a 2,2,7-trimethylguanosine cap at the 5' end. Only U6 is synthesized by RNA polymerase III; it is capped (the cap structure is unknown) and has a poly(U) sequence at the 3' end.


same snurp, the U4/U6-snRNP. SnRNPs are involved in a variety of functions⁶, including the processing of eukaryotic transcripts (hnRNA or pre-mRNA splicing) into mature mRNAs (U1-U6 snRNPs), histone mRNA 3' end processing (U7 snRNP), rRNA processing (U3, U8, U13-72 snRNPs), etc.

IV.4.b. Small nucleolar RNAs

Small nucleolar RNAs (snoRNAs) form a class of eukaryotic noncoding RNAs that is localized predominantly in the nucleolus. They act as guides that direct nucleotide modification (pseudouridylation and 2'-O-ribose methylation) in ribosomal RNAs. The snoRNAs fall into two major classes, each characterized by specific conserved sequence elements, or boxes, and a set of associated proteins. The C/D box snoRNAs, associated with fibrillarin, guide site-specific 2'-O-methylation, and the H/ACA box snoRNAs, associated with the protein GAR1, target specific conversions of uridine to pseudouridine (see processing of pre-rRNA, chapter 6).

IV.4.c. Small cytoplasmic RNAs

Small cytoplasmic RNAs or scRNAs are a diverse group of molecules with a range of functions, some understood and others still unknown. They occur in the cytoplasm of eukaryotic cells in the form of ribonucleoprotein particles (scRNPs or scyrps), including prosomes. The prosome, which is 19S in size, is believed to be involved in the post-transcriptional regulation of gene expression.

IV.4.d. Transfer-messenger RNA

This RNA type, which is present in most if not all bacteria, looks like a tRNA attached to an mRNA. In E. coli, transfer-messenger RNA (tmRNA), a 363-nucleotide RNA molecule (about 10S in size), was identified in a trans-translation mechanism during which tmRNA adds an 11-amino-acid tag to the carboxyl terminus of a number of proteins that are destined for degradation⁷.

6 Mattaj, I.W., Tollervey, D. and Séraphin, B. (1993). Small nuclear RNAs in messenger RNA and ribosomal RNA processing. FASEB J., 7, 47-53.
7 During translation, ribosomes translate mRNAs from a start codon to a stop codon, so that the resulting protein reflects a strict linear readout according to the rules of the genetic code. It was found that the ribosome can switch from one RNA to another, thus adding a short peptide (11 amino acids) to a number of proteins, usually incorrectly synthesized, that are targeted for degradation. The last 10 amino acids of the tag are decoded from the 363-nucleotide RNA (10S), whereas the first of the 11, the junction amino acid, has no apparent codon. The 10S RNA has a tRNA-like structure charged with the amino acid alanine. A model of trans translation was proposed in which ribosomes stalled at the 3' end of an mRNA lacking a stop codon bind the 10S RNA charged with alanine (at the A-site of the ribosome), and this amino acid is then added to the stalled polypeptide chain. The 10S RNA folds to fit its internal sequence into the mRNA track, facilitating cotranslational switching of the ribosome. Note that the last 10 amino acids are translated from the 10S RNA up to a normal terminator. Atkins, J.F. and Gesteland, R.F. (1996). A case for trans translation. Nature, 379, 769-771.


IV.4.e. Other non-coding small RNAs

Several types of small RNAs have recently been documented and found to be involved in many cellular mechanisms, such as RNA processing and gene silencing. These small RNA species, of approximately 22 nucleotides, are called microRNAs (miRNAs), small interfering RNAs (siRNAs), and small temporal RNAs (stRNAs). The function of miRNAs is still unknown, but studies have demonstrated that miRNAs assemble with proteins in a new type of RNP. Many lines of evidence suggest that miRNAs might be involved in regulating the expression of other RNAs⁸. stRNAs have been found to recognize complementary sequences in the 3'-untranslated regions (3'-UTRs) of target mRNAs and to prevent the accumulation of nascent polypeptides⁹. In contrast, siRNAs are involved in a process termed gene silencing or RNA silencing¹⁰. RNA silencing refers to related processes that operate to inhibit gene expression in many living systems. RNA silencing is called post-transcriptional gene silencing (PTGS) in plants, RNA interference¹¹ (RNAi) in vertebrates and invertebrates, and quelling in fungi. A unifying feature of RNA silencing is that siRNAs recognize homologous sequences and silence gene expression by degradation of the mRNA.
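The sequence recognition step described above can be sketched as a search for the stretch of mRNA that is complementary to the antisense strand of an siRNA. This is a minimal illustration that assumes perfect complementarity and plain-string sequences written 5'→3'; the function names and the short 8-nucleotide example are hypothetical (the small RNAs discussed here are roughly 22 nucleotides long).

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def find_target_sites(mrna: str, antisense: str) -> list:
    """Return 0-based start positions in `mrna` that pair perfectly with `antisense`.

    Assumption: only exact Watson-Crick matches count; real silencing
    machinery is more tolerant and involves protein factors not modeled here.
    """
    target = reverse_complement(antisense)
    mrna = mrna.upper()
    sites = []
    start = mrna.find(target)
    while start != -1:
        sites.append(start)
        start = mrna.find(target, start + 1)  # allow overlapping matches
    return sites


if __name__ == "__main__":
    mrna = "GGAUCCGAUUACGGCAUAGC"
    antisense = "CCGUAAUC"  # pairs with GAUUACGG in the mRNA
    print(find_target_sites(mrna, antisense))  # [6]
```

An mRNA with no complementary stretch returns an empty list, which in this simplified picture corresponds to a transcript that escapes silencing.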

Box 1- Small RNAs and DNA methylation in gene silencing¹²

Discovery and overview

Gene silencing refers to the use of double-stranded RNA (dsRNA) to shut down the expression of a specific target gene that shows homology with the dsRNA used. It was discovered in many organisms, including plants, vertebrates, invertebrates and fungi. Gene silencing was first reported in 1990, when experiments were designed to deepen the purple color of petunias using a pigment-producing gene under the control of a powerful promoter. Instead of the expected deep purple color, many of these flowers appeared variegated or even white. The observed phenomenon was named "co-suppression", since the expression of both the introduced gene and the

8 Mourelatos, Z. et al. (2002) miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes and Dev., 16: 720-728.
9 Reinhart, B.J. et al. (2000) The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature, 403: 901-906.
10 The RNA silencing system has not been invented by humans; rather, it refers to cellular processes that are used by plants, animals and fungi to control gene expression at the post-transcriptional level. As a possible mechanism of RNA silencing, many lines of evidence suggested the recruitment of an enzyme that cuts the dsRNAs into short fragments of 21-25 bp. The two strands of each fragment then separate to release the antisense strand, which in turn binds to its complementary region within the mRNA. This is believed to mediate sequence-specific degradation of RNA transcripts. The RNA silencing system is being elucidated; it seems to be highly conserved throughout evolution and to serve many functions. For instance, studies in worm and fruit fly suggested that RNA silencing protects the genome from the harmful mutagenic potential of the retrotransposons. The latter are mobile DNA elements that use RNA intermediates to move from one site to another within the genome, thus inactivating essential genes. In plants, RNA silencing acts as a plant immune system by degrading the genome of RNA viruses. It is clear that the molecular machinery of RNA silencing will target not only foreign RNA but also any RNAs in the cell that display similarity to the foreign RNA.
11 Fire, A. et al. (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature, 391: 806-811. Sharp, P.A. (2001) RNA interference-2001. Genes and Dev., 15: 485-490.
12 This section is based on a research project entitled "Gene silencing and RNA interference" prepared under my supervision by Nawal Abdelrazzak and Rasha Lakkis during the academic year 2002-2003. Their contribution was highly appreciated. I acknowledge here their commitment and motivation.


homologous endogenous genes were suppressed. Later, gene silencing was found in nematodes, particularly in Caenorhabditis elegans, where it was given the name RNA interference (RNAi). This system was found to work in other organisms such as Arabidopsis thaliana, Drosophila melanogaster, Neurospora crassa, mice and humans. The same process was also observed in fungi and was termed quelling.

Antisense RNA and dsRNA

Antisense RNA technology has been widely and successfully used in many living organisms to shut down the expression of specific genes. The principle is simple: a short DNA fragment is cloned in the opposite orientation (with respect to the open reading frame) under the control of an inducible promoter so that expression yields antisense RNAs; these form duplex structures with the normal mRNAs and thus block the expression of the corresponding genes. The mechanism of the antisense RNA effect is not well understood; however, it is generally accepted that the physical block of the transcripts would affect transcription, mRNA processing, RNA stability, transport into the cytoplasm or even translation. In such systems, sense RNAs unexpectedly turned out to be as effective as their antisense counterparts. Investigators found that preparations of sense RNA were contaminated with antisense RNA, so that sense and antisense molecules base-paired according to Watson-Crick rules to form double-stranded RNA (dsRNA). It thus became clear that dsRNA is an efficient inhibitor of gene expression. This RNA silencing system has been called post-transcriptional gene silencing (PTGS) in plants and RNA interference (RNAi) in vertebrates and invertebrates. The principle of RNAi is very simple: it relies on the presence of dsRNA in the cell in which the target gene is found. The dsRNA can be produced in one of several ways: transcription of an inverted repeat to generate long hairpin-containing RNAs, transcription of both sense and antisense strands by opposing promoters, or co-transcription of a sense transgene with an antisense transgene [13]. Once in the cell, this dsRNA is subjected to nuclease activity that produces small dsRNAs, each about 21-25 nucleotides in length. These dsRNA fragments are then denatured enough to allow recognition of the target mRNA. After that, they are degraded together with the target mRNA by the action of RNases, thereby blocking gene expression.

Gene silencing and DNA methylation

Another mechanism of gene silencing is based on DNA methylation and chromatin structure. DNA contains regions rich in CpG dinucleotides, called CpG islands (in animals), which are preferred sites for methylation. When methylated, they induce a compact chromatin structure that prevents gene expression. Histone deacetylation is strongly related to this process because it leads to gene silencing by converting an open chromatin configuration to a compact one. Some agents, such as trichostatin A (a histone deacetylase inhibitor), appear to act on closed chromatin and reactivate methylated DNA by allowing a more open chromatin structure to form [14].

PTGS as an antiviral defense mechanism

Plants can use gene silencing as an immune system against viruses, whose genomes are degraded in the host cytoplasm. The dsRNA of the virus is degraded, and infection does not proceed. It was discovered that once a cell is infected with a certain virus, it develops immunity against any virus having at least one gene showing homology with a gene of the previous virus [15]. Consistent with the widespread occurrence of PTGS in virus-infected cells, it has been shown that some viruses are able to overcome or prevent PTGS. The first example came in 1997 from the analysis of potyviral synergistic interactions and led to the identification of HC-protease (HC-pro) as a suppressor of PTGS.
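The dsRNA processing steps described earlier in this box (cutting of the dsRNA into short fragments, strand separation, and recognition of the target mRNA by base pairing) can be illustrated with a toy model. This is a minimal sketch, not a description of the real enzymology: the function names, the fixed fragment length of 21 nt (chosen from the 21-25 nt range given in the text), the requirement for perfect complementarity, and the example sequences are all assumptions made for illustration.

```python
# Toy model of dsRNA-triggered silencing. All names and sequences here are
# hypothetical; real dicing and target recognition are far more complex.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dice(sense_strand: str, fragment_length: int = 21) -> list:
    """Cut the sense strand of a dsRNA into consecutive fragments.

    fragment_length is an assumed value; a trailing piece shorter than
    fragment_length is discarded.
    """
    last_start = len(sense_strand) - fragment_length
    return [
        sense_strand[i:i + fragment_length]
        for i in range(0, last_start + 1, fragment_length)
    ]


def guide_strands(sense_fragments: list) -> list:
    """Antisense strands released after the two strands of each fragment separate."""
    return [reverse_complement(fragment) for fragment in sense_fragments]


def is_targeted(mrna: str, guides: list) -> bool:
    """True if at least one guide is perfectly complementary to a stretch of the mRNA."""
    return any(reverse_complement(guide) in mrna for guide in guides)


if __name__ == "__main__":
    trigger = "AUGGCUAGCUAGGAUCCGAUC" + "GUACGUAGCUAGCUAGGCUAA"  # made-up 42-nt sense strand
    guides = guide_strands(dice(trigger))
    homologous_mrna = "GGGAAA" + trigger + "CCCUUU"
    unrelated_mrna = "A" * 60
    print(is_targeted(homologous_mrna, guides))
    print(is_targeted(unrelated_mrna, guides))
```

In this sketch only the mRNA that shares sequence with the trigger is flagged, which mirrors the sequence specificity described in the text; transcripts without homology are left alone.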

13 Carthew, R. (2001) Gene silencing by double-stranded RNA. TRENDS in Genet., 17: 244-248.
14 Curradi, M. et al. (2002) Molecular Mechanisms of Gene Silencing by DNA Methylation. Mol. Cell. Biol., 22: 3157-3173.
15 Voinnet, O. (2001) RNA silencing as a plant immune system against viruses. TRENDS in Genet., 17: 449-458.


RNAi as a biological tool

The discovery of RNAi gives molecular biologists an additional promising tool. Introducing dsRNA corresponding to a particular gene will knock down the cell's own expression of that gene. This can be done in particular tissues at a chosen time, which is an advantage over a conventional gene "knock out", where the inactivated gene is carried in the germ line and its loss may kill the embryo before it can be studied. The disadvantage of introducing dsRNA fragments into a cell is that gene expression is only temporarily reduced. However, a research group has reported the successful introduction into mammalian cells of a DNA vector that can continuously synthesize an siRNA corresponding to the gene of interest.

RNAi/PTGS mechanism

RNAi is a phenomenon that provides organisms with a defense against mobile DNA elements, which cause mutations when they insert themselves within or close to a gene. A common feature of RNAi and its counterpart PTGS is the presence of short RNA fragments of about 21-25 nt in length and a decrease in the mRNA level. The latter is not due to any decrease in transcriptional rate, but rather to the degradation of mRNA of both endogenous and exogenous homologues. In transgene-induced PTGS, an RNA-dependent RNA polymerase (RdRP), a protein encoded by the SGS2/SDE1 gene, is required; one possible function is the synthesis of complementary RNA (cRNA) from aberrant RNA templates, leading to the formation of dsRNA. However, it appears to be dispensable for virus-induced local gene silencing, where dsRNA is produced by the viral RdRP [16]. ATPases and helicases are needed to amplify the RNAi signal. Consistent with the presence of 21-23 nt fragments (called siRNAs), a gene called Dicer, whose product (an RNase III-type enzyme) can produce small RNA molecules from dsRNA, was shown to participate in RNAi in Drosophila. Dicer is so called because it chops RNA into small pieces of uniform size. The first step ends once Dicer has cut the dsRNA. The resulting siRNA then base-pairs with homologous sequences of the single-stranded mRNA, with the help of RNAi-specific enzymes possibly still including Dicer. The short RNA molecules then guide the RNA-degrading enzymes. The RNAi nuclease activity showed a dependence on the size of the RNA substrate. Substrates that contained a substantial portion of the targeted region were degraded efficiently, whereas those that contained a shorter stretch of homologous sequence were recognized inefficiently. For both the sense and antisense strands, transcripts that had no homology with the transfected dsRNA were not degraded [17].

DNA methylation

Microinjection experiments in Xenopus oocytes, performed with in vitro methylated DNA, demonstrated that methylation inhibits gene expression. Conversely, methylated silent genes in cultured cell lines can be activated upon treatment with 5-azacytidine, a demethylating agent. DNA methylation can inhibit gene expression at different levels. Even though methylated CpGs can directly block the binding of transcription factors, this level of repression does not represent the main mechanism by which methylation-mediated gene silencing is exerted. In most cases, DNA methylation seems to repress gene expression by recruiting binding proteins specific for methylated DNA. The importance of the number of methylcytosines becomes apparent when the ability of the repressive effect to spread over a long distance is investigated. A few methylated sites cannot seed the formation of the repressive chromatin structure, nor can they guarantee its stability. Experiments show that histone deacetylation is of fundamental importance to the silencing mechanism only when the number of modified sites does not reach the threshold sufficient for an effect over a long distance. When only a limited number of modified dinucleotides are close to a promoter, they recruit Methyl Binding Domain (MBD) proteins and their associated histone deacetylase activity. Histone deacetylation occurs in a small number of nucleosomes, and transcriptional repression is observed. In this situation, trichostatin A treatment

16 Matzke, M.A. et al. (2001) RNA based silencing strategies in plants. Curr. Opin. Gen. Dev., 11: 221-227.
17 Hammond, S.M. et al. (2000) An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature, 404: 293-296.


allows the main mechanism by which methylated DNA silences gene expression to be bypassed, and inhibition is therefore relieved. The repression mechanism is significantly different when the number of methylated sites is increased and reaches the threshold that leads to spreading of gene silencing along the DNA fiber. The contribution of histone deacetylation to transcriptional inhibition is then of secondary importance; in fact, even in the presence of trichostatin A, transcriptional levels remain significantly lower than in the unmethylated controls. While methylated transfected genes can be reactivated by trichostatin A, naturally densely methylated endogenous genes cannot be reinduced with trichostatin A alone.

microRNAs

It has become increasingly apparent that non-coding RNAs are impressively diverse. As more data accumulate, an increasingly significant fraction of the genome appears not to encode proteins but rather to specify non-coding RNAs. For this reason, 'junk DNA' is no longer regarded as junk; it is now considered to contain a new class of non-coding RNA genes that are scattered throughout animal genomes. Several microRNAs similar to siRNAs have recently been discovered while preparing cDNA libraries of siRNAs from size-selected (~22 nt) RNAs expressed in worms, flies and human cells. miRNAs are non-coding RNAs that are quite numerous. The best-studied miRNA genes are the C. elegans lin-4 and let-7, which were identified by the genetic analysis of developmental timing in the nematode and found to encode 22 nt regulatory RNAs [18]. Forty different miRNAs ranging in size from 16 to 24 nt were identified in humans. The fly, worm and human miRNAs exhibit varied expression patterns, from uniform during development to relatively stage-specific and/or tissue-specific. This suggests a variety of roles for miRNA genes, including the regulation of developmental timing, spatial patterning of cell fates, or cellular and organismal physiology.

Medical applications of RNAi

In the first week of June 2002, researchers showed that siRNAs can target the receptors through which the HIV-1 virus gains entry into human cells, thus making the cells less susceptible to the virus. Moreover, siRNAs can even reduce the replication of the virus once infection is underway by silencing HIV genes themselves [19]. This was shown experimentally by a 25-fold reduction in virus production, relative to controls, 5 days after the siRNAs were introduced. RNAi is therefore not only of academic interest; it could also be useful as a drug discovery tool. "Vitravene", produced by Isis Pharmaceuticals Inc. for treating cytomegalovirus-induced retinitis in AIDS patients, is an antisense oligonucleotide drug rather than an RNAi drug, and more than 20 antisense-based drugs are in the clinic at biotech companies. RNAi is most likely to be used against infectious diseases or cancer, but it also has potential applications for some dominant genetic disorders. RNAi may have other medical applications. For example, it was recently reported that sequence-specific 22 nt dsRNAs could rescue mammalian cells from the toxicity caused by expression of an expanded polyglutamine transcript similar to that found in spinobulbar muscular atrophy. Given the rate of progress of studies on RNAi, it may not be long before this and other potential medical uses of RNAi improve human health.

V. Enzymology and mechanisms of transcription

V.1. RNA polymerases

The enzymes that are responsible for transcription of DNA into RNA are termed DNA-dependent RNA polymerases. This indicates that the enzyme catalyzes the polymerization of ribonucleotides into RNA in a DNA-dependent, complementary fashion, i.e.

18 Ambros, V. (2001) MicroRNAs: Tiny Regulators with Great Potential. Cell, 107: 823-826. Pasquinelli, A. (2002) MicroRNAs: deviants no longer. TRENDS in Genetics, 18: 171-173.
19 Pomerantz, R.J. (2002) RNA interference meets HIV-1: Will silence be golden? Nat Med., 8: 659-60. Cullen, B.R. (2002) RNA interference: antiviral defense and genetic tool. Nat Immunol., 3: 597-9.


the RNA polymerase uses the nucleotide sequence of a DNA template to dictate the sequence of nucleotides in the RNA. Like DNA polymerases, RNA polymerases link nucleoside 5'-triphosphates, in this case ribonucleoside 5'-triphosphates or NTPs (ATP, CTP, GTP, and UTP), in the order specified by base pairing with the DNA template. RNA polymerase moves along the DNA template strand in the 3'→5' direction, joining the 5'-phosphate of an incoming ribonucleotide to the 3'-OH of the previous residue. Thus, the RNA chain grows 5'→3' during transcription, like DNA polynucleotides during replication. Transcription of eukaryotic nuclear genes requires three different RNA polymerases, called RNA polymerase I, RNA polymerase II, and RNA polymerase III. Each is a multisubunit protein (8-12 subunits) with a molecular mass in excess of 500 kD. Structurally, these RNA polymerases are quite similar to one another: the three largest subunits are closely related, whereas some of the smaller subunits are quite distinct. Each RNA polymerase transcribes a different set of genes, as described in Table 1. In prokaryotes, a single species of DNA-dependent RNA polymerase synthesizes all types of RNA.
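The directionality described above (template read 3'→5', RNA built 5'→3' by base pairing) can be sketched in a few lines. This is a minimal illustration only; the function names and the example sequence are hypothetical and not part of the course material.

```python
# Base-pairing rules used during transcription (DNA template base -> RNA base).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Transcribe a DNA template strand written in the 3'->5' direction.

    The loop visits template bases in the order the polymerase reads them,
    and each complementary ribonucleotide is appended to the 3' end of the
    growing chain, so the returned RNA is written 5'->3'.
    """
    rna = []
    for base in template_3_to_5:
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)


def transcribe_from_5_to_3(template_5_to_3: str) -> str:
    """Same as transcribe(), for a template written in the usual 5'->3' order."""
    return transcribe(template_5_to_3[::-1])


if __name__ == "__main__":
    # Made-up six-base template, written 3'->5'.
    print(transcribe("TACGGT"))
```

Writing the template 3'→5' makes the antiparallel relationship explicit: the first template base read corresponds to the 5' end of the transcript.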

Table 1. Functions of the three DNA-dependent RNA polymerases [20] in eukaryotes. In addition to protein-coding genes, RNA polymerase II works on a set of genes specifying snRNAs. RNA polymerase III transcribes tRNA genes and other genes for small RNAs.

Polymerase            Genes transcribed
RNA polymerase I      28S, 5.8S and 18S ribosomal RNA genes
RNA polymerase II     Protein-coding genes, and most snRNA genes
RNA polymerase III    Genes for tRNA, 5S rRNA, U6-snRNA, snoRNAs and scRNA

V.2. Transcription in prokaryotes

V.2.1. Structure and function of the E. coli RNA polymerase

The RNA polymerase of E. coli, called RNA polymerase holoenzyme, is a complex multimeric protein of about 465 kD. The holoenzyme (the whole enzyme in its active form) consists of a core enzyme associated with another protein, a sigma factor (σ). Its subunit composition is α2ββ'σ, in which the largest subunit β' (160 kD) functions in DNA binding and subunit β (150 kD) binds NTPs and interacts with σ (82 kD). Together the β' and β subunits form the catalytic site. Sigma subunits (a family of related proteins) function in recognizing promoter sequences on DNA that identify the location of transcription initiation sites. Finally, the two α subunits (36.5 kD each) are essential for the assembly of the enzyme and for activation by some regulatory proteins. When the σ subunit dissociates from the holoenzyme, the remaining structure is called the core polymerase (α2ββ'), which is catalytically competent but unable to initiate transcription.
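As a quick consistency check, the subunit masses quoted above can be added up to see that they account for the stated mass of about 465 kD. The sketch below assumes the standard composition of two α subunits plus one each of β, β' and σ; the dictionary and function names are hypothetical.

```python
# Subunit masses in kD as quoted in the text.
SUBUNIT_MASS_KD = {
    "alpha": 36.5,
    "beta": 150.0,
    "beta_prime": 160.0,
    "sigma": 82.0,
}

# Number of copies of each subunit (assumed composition: two alpha, one of each other).
CORE_ENZYME = {"alpha": 2, "beta": 1, "beta_prime": 1}
HOLOENZYME = {**CORE_ENZYME, "sigma": 1}


def complex_mass(composition: dict) -> float:
    """Sum the masses of all subunit copies in a complex, in kD."""
    return sum(SUBUNIT_MASS_KD[name] * copies for name, copies in composition.items())


if __name__ == "__main__":
    print("core enzyme:", complex_mass(CORE_ENZYME), "kD")
    print("holoenzyme:", complex_mass(HOLOENZYME), "kD")
```

With these figures the core enzyme comes to 383 kD, and adding one σ subunit gives the 465 kD quoted for the holoenzyme.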

20 We usually use simply the term RNA polymerase to designate the DNA-template-dependent RNA polymerase. This should not be confused with the RNA-template-dependent RNA polymerase that is involved in the replication or transcription of some viral RNA genomes. Moreover, there are also template-independent RNA polymerases such as the poly(A) polymerase, which has an important function in gene expression in eukaryotes.


V.2.2. Binding of the RNA polymerase to promoter sequences

The sequence that is recognized and bound by the RNA polymerase is called the promoter. The name comes from the fact that this locus promotes the expression of the downstream gene; when it is inactivated by mutations, the gene is not expressed. RNA polymerase binds nonspecifically to DNA with low affinity and then migrates along it searching for a promoter. The process of transcription begins when the σ subunit of the holoenzyme recognizes a promoter sequence; the holoenzyme and the promoter then form a closed promoter complex. Once the closed promoter complex is established, the RNA polymerase holoenzyme unwinds about 12 bp (from -9 to +3), forming a very stable complex called the open promoter complex. Finally, the RNA polymerase achieves so-called promoter clearance by moving away from the promoter.

V.2.3. Properties of the prokaryotic promoters

Prokaryotic promoters vary in size from 20 to 200 bp, but typically consist of a 40 bp region located on the 5' side of the transcription start site. The transcription start site, or start point, is the first nucleotide to be transcribed, usually a purine (often the adenine residue in the sequence CAT). This nucleotide is designated +1; nucleotides downstream of this position (on the 3' side, i.e. reading in the 5'→3' direction from +1) are numbered +2, +3, +4, etc., and nucleotides upstream of this position (on the 5' side, i.e. reading in the 3'→5' direction from +1) are numbered -1, -2, -3, etc. Note that there is no zero in this numbering. In E. coli, most transcription is carried out by the holoenzyme containing the σ70 subunit (encoded by the gene rpoD). The holoenzyme containing σ70 binds to a region extending from about -50 to +20 [21]. Comparison of the sequences of many promoters recognized by σ70 has revealed two segments called consensus sequence elements, each of which is 6 nucleotides long. These two elements are the Pribnow box or -10 box, which is near -10 and has the hexameric consensus TATAAT, and the -35 box, which is centered approximately 35 bp upstream of +1 and has the hexameric consensus TTGACA. These are consensus [22] sequences and describe the average of all promoters. Individual promoters recognized by σ70 in E. coli differ to some extent from the theoretical consensus promoter and may also differ from one another in functional efficiency (strong versus weak promoters). Promoters of other RNA polymerase holoenzymes in bacteria also typically have two short conserved sequences upstream of the +1 position. However, the actual sequences vary from one class of promoter to another. For example, σ32 in E. coli (encoded by the gene rpoH), which is involved in the expression of heat shock genes, recognizes the consensus sequence CCCTTGAA in the -35 region and CCCGATNT (N is any nucleotide) in the -10 region.
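Two ideas from this section lend themselves to a short sketch: the promoter numbering scheme, which skips zero, and the notion of a consensus as the most frequent base at each aligned position. The code below is illustrative only; the function names are hypothetical, and the five aligned hexamers are invented examples rather than real promoter sequences.

```python
from collections import Counter


def promoter_coordinate(index: int, start_index: int) -> int:
    """Convert a 0-based string index into a promoter coordinate.

    start_index is the index of the transcription start site (+1).
    Downstream positions are +2, +3, ...; upstream positions are -1, -2, ...;
    there is no position 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def consensus_of(aligned_sequences: list) -> str:
    """Most frequent base at each position of equal-length aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_sequences)
    )


def mismatches(element: str, consensus: str) -> int:
    """Number of positions at which an element differs from the consensus."""
    return sum(a != b for a, b in zip(element, consensus))


if __name__ == "__main__":
    # Invented -10 elements, for illustration only.
    minus_10_elements = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATAAT"]
    consensus = consensus_of(minus_10_elements)
    print(consensus)
    for element in minus_10_elements:
        print(element, mismatches(element, consensus))
```

Counting mismatches against the consensus gives a crude way to compare elements; it is not a measure of promoter strength, which in practice also depends on factors not modeled here.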

21 One definition of promoter is based on DNaseI protection experiments. After binding of an RNA polymerase holoenzyme to a DNA duplex in vitro, the resulting DNA:protein complex is treated with DNaseI, which degrades any DNA not protected. Therefore, the DNA fragment left after exhaustive DNaseI digestion would define the holoenzyme binding site, which is by definition the promoter. The second definition of promoter is mutational, because any nucleotide change that inactivates the promoter will block gene expression.
22 A consensus sequence can also be defined as the bases that appear with the highest frequency at each position when a series of sequences believed to have a common function are compared.


V.2.4. The steps of transcription in prokaryotes

In the first step of transcription, the RNA polymerase holoenzyme (core enzyme associated with a σ factor) binds to the promoter of the gene to be expressed. The core enzyme itself has a high affinity for double-stranded DNA and can bind at any site to form a stable closed complex in which the DNA strands are not unwound. Interaction of a σ factor with the core enzyme confers specificity for a particular class of promoters while greatly reducing affinity for other DNA sequences. The holoenzyme binds strongly to its corresponding promoter to form a closed complex, which is then converted to an open promoter complex in which a short region bound by the holoenzyme becomes unwound. The subsequent three steps of the transcription mechanism are initiation of polymerization, elongation and termination.

V.2.4.a. Initiation of polymerization

The RNA polymerase has two binding sites for NTPs: the initiation site and the elongation site. The initiation site binds purine nucleotides preferentially (see above). The first nucleotide binds at the initiation site and base-pairs with the +1 base on the DNA template strand exposed within the open promoter complex. The second incoming nucleotide binds at the elongation site and base-pairs with the +2 base. A few nucleotides are then incorporated as the RNA polymerase translocates along the template strand, each nucleotide being added to the 3'-OH of the preceding nucleotide. Once an oligonucleotide of 6 to 10 nucleotides has been formed, the σ factor dissociates from the RNA polymerase, marking the completion of the initiation step. Transcription initiation is inhibited by Rifamycin B and its analog Rifampicin, two specific inhibitors of prokaryotic RNA polymerases. These two inhibitors have different modes of action. Rifamycin B binds to the β subunit of RNA polymerase and prevents binding of the incoming nucleotide at the initiation site. In contrast, Rifampicin blocks the translocation of RNA polymerase along the DNA template strand. Note that Rifamycin and Rifampicin inhibit initiation but have no influence on elongation.

V.2.4.b. Transcription elongation

Elongation is carried out by the RNA polymerase core. The RNA polymerase moves along the dsDNA, locally unwinding the strands to expose the ssDNA template, ensuring correct base pairing between incoming NTPs and the template (read 3'→5'), and linking the nucleotides in the 5'→3' direction (antiparallel to the template) with elimination of pyrophosphate (PPi). Chain elongation does not proceed at a constant rate; it varies from 20 to 50 nucleotides per second depending on the region being transcribed. GC-rich regions cause the RNA polymerase to slow down because G:C base pairs are harder to unwind than A:T base pairs. As the RNA polymerase moves along the template, the DNA duplex is unwound ahead of it and recloses after the polymerase has passed. About 12 nucleotides of the growing RNA remain base-paired to the DNA template at any time, with the RNA strand becoming displaced as the DNA duplex rewinds behind the advancing RNA polymerase. In this way, a short and transient RNA-DNA hybrid duplex is formed in the region of the RNA


polymerase-DNA complex. Cordycepin, the name given to 3'-deoxyadenosine, is an inhibitor of chain elongation in prokaryotes. This nucleoside can be phosphorylated in vivo to yield 3'-deoxyadenosine 5'-triphosphate, which can be added to the growing RNA chain but, since it lacks a 3'-OH, aborts further elongation.

V.2.4.c. Chain termination

Elongation continues until a specific terminator, or termination sequence, is reached. Termination in bacteria varies in efficiency and in mechanism; secondary structure within the transcript itself is important for transcription termination. In a rho-independent (simple) terminator, the termination region of the RNA transcript contains a GC-rich palindromic sequence, which can give rise to a stem-and-loop or hairpin structure, followed by a series of consecutive uridine residues at the 3' end. The hairpin structure causes the RNA polymerase to pause at the oligo-U region, and the stretch of A:U base pairs, which is relatively unstable, may facilitate the release of the transcript and/or dissociation of the transcription complex. In a rho-dependent (complex) terminator, the transcript may contain a hairpin structure, but there is generally no oligo-U sequence or any apparent consensus sequence. In this case termination requires the participation of a protein termed rho (or ρ) factor. The ρ factor, which is apparently active in hexameric form (a hexamer of 50 kD subunits), is an ATP-dependent helicase that catalyzes the unwinding of RNA:DNA hybrid duplexes. The precise mode of action of the ρ factor is still unknown. However, it has been proposed that the ρ factor recognizes and binds a C-rich site in the RNA transcript and that termination occurs at a relatively fixed distance downstream of the recognition site. Note that the recognition site must lack secondary structure and be unoccupied by translating ribosomes for the ρ factor to bind. Then, the ρ factor moves along the growing RNA until it reaches the transcription bubble, where it catalyzes the release of the nascent RNA transcript.

V.3. Transcription in eukaryotes

As mentioned above, eukaryotic cells contain three species of DNA-dependent RNA polymerases, each of which synthesizes a different class of RNA. RNA polymerase I is localized to the nucleolus and transcribes the major ribosomal RNA genes. RNA polymerases II and III are both localized to the nucleoplasm and transcribe mainly protein-encoding genes and tRNA genes, respectively. All three RNA polymerases are large, complex multimeric proteins (500 to 700 kD), consisting of 10 or more types of subunits. Despite their differences in overall subunit composition, all three RNA polymerases share several smaller subunits. Furthermore, all three polymerases have two large subunits that are similar to the large β and β' subunits of E. coli RNA polymerase, suggesting that the catalytic site of RNA polymerase is conserved throughout evolution. In addition, the three classes of RNA polymerases can be distinguished by their sensitivity to α-amanitin, a bicyclic octapeptide produced by the poisonous mushroom Amanita phalloides, which blocks RNA chain elongation. RNA polymerase II is very sensitive


to this compound, whereas RNA polymerase III is less sensitive and RNA polymerase I is resistant. The presence in eukaryotic nuclei of three different RNA polymerases that act on different sets of genes implies that at least three classes of promoters must exist to provide specificity. All three RNA polymerases recognize and bind to their promoters via DNA-binding proteins, collectively called transcription factors. These mediate recognition and accurate initiation of transcription at specific promoter sequences (see next section).

V.3.1. Properties of the eukaryotic promoters

In eukaryotes, the term promoter is used to describe all the sequences that are required for initiation of transcription of a gene. These sequences include the core promoter, also called the basal promoter, which is the site where the initiation complex is assembled, and one or more upstream promoter elements, which are located upstream of the core promoter. The initiation complex can assemble on the core promoter in the absence of upstream promoter elements, but this occurs inefficiently. This implies that DNA-binding proteins that bind at the upstream promoter elements (activators, in the case of stimulation) play an important role in promoting gene expression. Each of the three classes of eukaryotic RNA polymerase recognizes a different type of promoter. As a consequence, it is the type of promoter that defines which genes are transcribed by which RNA polymerase (Fig. 4).

Figure 4. Structures of the promoters of RNA polymerases I, II and III. DNA duplex is represented in brown. Promoter sequences are indicated by dark boxes. The RNA polymerase III promoter structure refers to the 5S rRNA genes. Other genes transcribed by RNA polymerase III display different promoter structures. UCE, upstream control element; -25, TATA box of the RNA polymerase II promoter; Inr, initiator sequence of the RNA polymerase II promoter.

UCE

-200 -150 -100 -50 +1 +50 +100 +150 +200

-45 +20

-25 InrUpstream elements

Core promoter

RNA Pol I

RNA Pol II

RNA Pol III


V.3.2. Transcription initiation by the RNA polymerase I

RNA polymerase I promoters consist of a core promoter spanning the transcription initiation site, between nucleotides -45 and +20, and an upstream control element (UCE) about 70 to 100 bp further upstream. These promoters show the least variability, each one being used to transcribe the genes for ribosomal RNA. The transcript of an rRNA gene contains the sequences for both large and small rRNAs, which are released post-transcriptionally by processing. The RNA polymerase I initiation complex involves four protein complexes in addition to the polymerase itself. One of these, UBF1, is a dimer of identical proteins that interacts with both the core promoter and the UCE. A second protein complex, called SL1 in humans and TIF-IB in mice, consists of four proteins, one of which, TBP (TATA-binding protein), is also needed for transcription initiation by RNA polymerases II and III. SL1, together with UBF1, directs RNA polymerase I and the last two complexes, TIF-IA and TIF-IC, to the promoter. It was thought that the initiation complex was built up in a stepwise way; however, recent data suggest that RNA polymerase I binds the four protein complexes before promoter recognition, the entire assembly attaching to the promoter in a single step²³.

V.3.3. Transcription initiation by the RNA polymerase II

V.3.3.a. RNA polymerase II promoter structure

RNA polymerase II promoters are variable and can extend for several kilobases upstream of the transcription initiation site. The core promoter consists of two elements that are common to many promoters of protein-encoding genes. The first promoter element is the TATA box or -25 box (consensus sequence 5'-TATAWAW-3', where W is A or T), usually located at about position -25; the TATA box has an important role in indicating the initiation site.
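The consensus notation used for the core promoter elements can be made concrete with a short Python sketch. It expands the degenerate letters of a consensus (W = A or T here; Y = pyrimidine and R = purine, as used for the initiator sequence) into a regular expression and scans a sequence for candidate sites. The function names and the toy sequence are assumptions made for illustration; a real promoter search would also have to consider position relative to +1 and tolerate imperfect matches.

```python
import re

# IUPAC nucleotide codes needed for the consensus sequences in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]",  # W = A or T
    "Y": "[CT]",  # Y = any pyrimidine
    "R": "[AG]",  # R = any purine
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus, sequence):
    """Return the 0-based start index of every match, overlapping ones included."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical sequence, invented for this example only.
toy_promoter = "GGCGC" + "TATAAAA" + "GGCGCGCGCGCGCGGCGC" + "CTCAGA" + "GGC"

print(find_sites("TATAWAW", toy_promoter))  # TATA-box candidates
print(find_sites("YYCARR", toy_promoter))   # Inr candidates
```

Exact matching is the simplest possible choice; since a consensus only summarizes the most frequent base at each position, real promoters often deviate from it at one or more positions.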
The second promoter element is the initiator sequence or Inr (consensus sequence YYCARR spanning positions -3 to +3, where Y represents any pyrimidine and R any purine), which is located around the +1 start point. Some promoters recognized by RNA polymerase II have only one of these two elements of the core promoter, and some have neither. The latter can still be transcribed, probably through interaction between RNA polymerase II and an internal sequence (located within the gene) called MED-1. In addition to the core promoter, genes that are transcribed by RNA polymerase II have various upstream promoter elements with different functions.

V.3.3.b. The structure and function of the RNA polymerase II

RNA polymerase II acts at different kinds of promoters to transcribe a great diversity of genes. Moreover, RNA polymerase II should transcribe only those genes whose products are appropriate to the needs of the cell in its ever-changing metabolism and growth. Because the yeast Saccharomyces cerevisiae has become an excellent model for eukaryotic molecular genetics, its RNA polymerase II has been extensively studied and characterized. The yeast

23 Seither, P. et al. (1998). Mammalian RNA polymerase I exists as a holoenzyme with associated basal transcription factors. J. Mol. Biol., 275, 43-53.


RNA polymerase II consists of at least ten different polypeptides, designated Rpb1p²⁴ through Rpb10p. The functions of Rpb1p and Rpb2p (220 kD and 150 kD) are homologous to those of the prokaryotic RNA polymerase β' and β subunits, respectively. Rpb1p has a DNA-binding site and Rpb2p binds nucleotide substrates, and both contribute to the catalytic site. Rpb3p (45 kD) is the functional homolog of the prokaryotic α subunit. Two Rpb3p subunits exist per enzyme, where they are essential for assembly of the polymerase. Rpb4p (32 kD), which resembles the prokaryotic σ factor in amino acid sequence, is involved in promoter recognition. Some of the remaining subunits are common to all three eukaryotic RNA polymerases. The yeast Rpb1p subunit has an unusual structural feature not found in prokaryotes: its C-terminal domain (CTD) contains 27 repeats of the amino acid sequence PTSPSYS²⁵, whereas in other eukaryotic RNA polymerases II this heptapeptide is repeated as many as 52 times. This domain may project 50 nm from the globular enzyme. The CTD is essential for the activity of RNA polymerase II. Indeed, only RNA polymerase II whose CTD is not phosphorylated can initiate transcription. However, transcription elongation proceeds only after phosphorylation within the CTD, suggesting that phosphorylation triggers the conversion of the initiation complex into an elongation complex.

V.3.3.c. Formation of the preinitiation complex

As shown above, initiation in eukaryotes is more complex than in prokaryotes. First, eukaryotic initiation involves more proteins, and second, eukaryotic polymerases do not directly recognize their core promoter. RNA polymerase II can initiate transcription only in combination with ancillary transcription factors; together they form the basal transcription apparatus. The initial contact with the RNA polymerase II promoter is made by the General Transcription Factor (GTF) TFIID, which is the TATA-box binding factor.
TFIID is made up of the TATA-Binding Protein (TBP) and at least 12 TBP-Associated Factors (TAFs). TFIID recognizes the TATA component of the RNA polymerase II core promoter; it may also recognize the Inr sequence, but this is considered unlikely. After binding of TFIID, the preinitiation complex is formed by binding of additional GTFs (see table 2 for the function of each GTF), in the order TFIIA, TFIIB, TFIIF/RNA polymerase II, TFIIE and TFIIH (Fig. 5). Some of the GTFs have DNA-binding properties and others interact purely by protein-protein contacts. There are two key steps in this sequence of events. The first is the binding of RNA polymerase II, in association with TFIIF, to the promoter (TFIID is thought to bind to the promoter and recruit the RNA polymerase to the complex). The second is the attachment of the last GTF, TFIIH, a multifunctional protein with roles in transcription, DNA repair and regulation of the cell cycle. In transcription, TFIIH has two functions: it acts as a helicase that

24 RPB stands for RNA polymerase II (RNA polymerase B); RNA polymerases I, II and III are sometimes called RNA polymerases A, B and C, respectively.

25 Note that this heptapeptide is highly hydrophilic because 5 of its 7 residues carry an -OH group in their side chain. This endows the CTD with many sites for phosphorylation.


unwinds the DNA duplex, thus converting the closed promoter complex into an open promoter complex, and as a kinase that phosphorylates the C-terminal domain of RNA polymerase II. Once phosphorylated, RNA polymerase II becomes capable of leaving the preinitiation complex and synthesizing RNA.
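The ordered assembly described in this section can be summarized as a small state model. The Python sketch below is only a teaching aid: the class and method names are hypothetical, and it encodes nothing beyond the order of binding given in the text (TFIID, TFIIA, TFIIB, TFIIF/RNA polymerase II, TFIIE, TFIIH) and the two TFIIH activities needed before the polymerase can leave the promoter.

```python
# Order of assembly as given in the text; TFIIF arrives together with RNA polymerase II.
ASSEMBLY_ORDER = [
    "TFIID", "TFIIA", "TFIIB", "TFIIF/RNA polymerase II", "TFIIE", "TFIIH",
]


class PreinitiationComplex:
    """Toy model (hypothetical class) of stepwise assembly at a TATA-box promoter."""

    def __init__(self):
        self.bound = []                  # factors bound so far, in order
        self.promoter_open = False       # closed versus open promoter complex
        self.ctd_phosphorylated = False  # the polymerase joins in the nonphosphorylated form

    def bind(self, factor):
        """Add the next factor; reject anything that arrives out of order."""
        if len(self.bound) == len(ASSEMBLY_ORDER):
            raise ValueError("complex is already complete")
        expected = ASSEMBLY_ORDER[len(self.bound)]
        if factor != expected:
            raise ValueError(f"expected {expected}, got {factor}")
        self.bound.append(factor)

    def is_complete(self):
        return self.bound == ASSEMBLY_ORDER

    def act_tfiih(self):
        """Apply the two TFIIH activities: helicase (opening) and CTD kinase."""
        if not self.is_complete():
            raise ValueError("TFIIH is not yet part of the complex")
        self.promoter_open = True
        self.ctd_phosphorylated = True

    def can_leave_promoter(self):
        return self.promoter_open and self.ctd_phosphorylated


pic = PreinitiationComplex()
for factor in ASSEMBLY_ORDER:
    pic.bind(factor)
pic.act_tfiih()
print(pic.can_leave_promoter())
```

A strictly linear order is a simplification chosen to mirror the text; the model does not attempt to represent which factors contact DNA and which interact only through protein-protein contacts.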

Figure 5. Transcription initiation. Assembly of the RNA polymerase II preinitiation complex at the TATA-box promoter element. Binding of TFIID, a multisubunit protein consisting of TBP and TAFs, to the TATA box is stimulated by TFIIA. TFIID bound to the TATA motif recruits TFIIB. In association with TFIIF, RNA polymerase II (in the nonphosphorylated form) joins the TFIID/A/B complex. Then TFIIE and TFIIH associate to establish a competent transcription preinitiation complex. Note that the drawing is only schematic with regard to the size and position of the different components within the preinitiation complex; however, protein-protein contacts that are known to occur are shown²⁶. The CTD of RNA polymerase II must be phosphorylated before the RNA polymerase can leave the promoter (the CTD is shown as a spike projecting from RNA polymerase II). When the preinitiation complex forms, melting of the DNA duplex around Inr generates the open promoter complex and transcription begins.

26 For further details about initiation by RNA polymerase II, please refer to Roeder and related papers. Roeder, R.G. (1996). The role of the general initiation factors in transcription by RNA polymerase II. Trends Biochem. Sci., 21, 327-335.

[Figure 5 diagram. Panels: recognition of TATA by TFIID; formation of the preinitiation complex. Labeled components: DNA with upstream elements, TATA (-25) box and Inr; TBP and TAFs (TFIID); TFIIA, TFIIB, TFIIF/RNA polymerase II, TFIIE and TFIIH; CTD.]


Table 2. Functions of the general transcription factors that interact with RNA polymerase II.

General transcription factor   Function
TFIID (TBP component)          Recognition of the TATA box; forms a platform for TFIIB binding
TFIID (TAFs)                   Recognition of the core promoter and regulation of TBP binding
TFIIA                          Stimulatory; stabilizes TFIID binding
TFIIB                          TFIID recognition; intermediate in recruitment of RNA polymerase II; influences the selection of the +1 site
TFIIF                          Recruitment of RNA polymerase II
TFIIE                          Intermediate in recruitment of TFIIH; modulates the various activities of TFIIH
TFIIH                          Helicase activity essential for formation of the open promoter complex; kinase activity required for promoter clearance by RNA polymerase II

V.3.4. Transcription initiation by the RNA polymerase III

RNA polymerase III promoters are variable in structure, reflecting the diversity of genes that are transcribed by this enzyme. Initiation of transcription at the different types of RNA polymerase III promoters requires different sets of general transcription factors or GTFs. However, each type of initiation process involves the TFIIIB protein, which includes TBP. RNA polymerase III interacts with transcription factors TFIIIA, TFIIIB and TFIIIC. TFIIIA and/or TFIIIC bind to specific recognition sequences (the core promoter, figure 4) that in some instances are located internally, i.e. within the coding regions of the genes. TFIIIB associates with TFIIIA or TFIIIC already bound to the DNA and in turn facilitates the association of RNA polymerase III to establish an initiation complex. In those RNA polymerase III promoters that contain a TATA box, TFIIIB probably binds directly to the DNA via the TBP subunit.

VI. Regulation of transcription in prokaryotes

In bacteria, genes encoding the enzymes of a particular metabolic pathway are often grouped adjacent to one another in a cluster. These clusters, together with the regulatory sequences that control their transcription, are called operons. Within operons, all the structural genes (so named because they encode a product) can be expressed in a coordinated fashion through the production of a single polycistronic mRNA specifying all the enzymes of the metabolic pathway. In addition, a regulatory sequence lying upstream of this transcription unit determines whether it is transcribed. This regulatory sequence, called the operator, is located next to the promoter. A regulatory protein interacts with the operator to control the expression of the operon by governing the accessibility of the promoter to RNA polymerase.
Note, however, that many prokaryotic genes do not contain operators and are therefore regulated in ways that do not involve protein-operator interaction.


VI.1. Operons are regulated by induction and repression

The bacterium E. coli contains all the genetic information it needs to metabolize, grow, and reproduce. It can synthesize every organic molecule it needs from glucose and a number of inorganic ions. Many of the genes in E. coli are expressed constitutively, i.e. they are always turned "on". Others, however, are active only when their products are needed by the cell, so their expression must be regulated. For instance, if the amino acid tryptophan (Trp or W) is added to the culture, the bacteria soon stop producing the five enzymes that were previously needed to synthesize Trp. In this case, the presence of the product of enzyme action represses enzyme synthesis. Conversely, adding a new substrate to the culture medium may induce the formation of new enzymes capable of metabolizing that substrate. If we take a culture of E. coli that is grown on glucose and transfer some of the cells to a medium containing lactose, a revealing sequence of events takes place. At first the cells are quiescent: they do not metabolize the lactose, their other metabolic activities decline, and cell division ceases. Soon, however, the culture begins growing rapidly again, with the lactose being rapidly consumed. During the quiescent interval, the cells begin to produce three enzymes that they had not been producing before. These are a permease that transports lactose across the plasma membrane from the culture medium into the interior of the cell, a beta-galactosidase that hydrolyzes lactose into glucose and galactose (once induced by the presence of lactose, the quantity of beta-galactosidase in the cells rises from virtually none to almost 2% of the weight of the cell), and a transacetylase whose function is still uncertain.

VI.2. The lac operon

VI.2.a. Structure and function of the lac operon

The capacity to respond to the presence of lactose depends on the growth conditions.
Only when lactose is added to the medium are the genes for the three induced enzymes expressed. In fact, the most direct way to control the expression of a gene is to regulate its rate of transcription, that is, the rate at which RNA polymerase transcribes the gene into molecules of mRNA. Each of the three enzymes synthesized in response to lactose is encoded by a separate gene. The three structural genes are arranged in tandem on the bacterial chromosome: the lac operon (Fig. 6). In the absence of lactose, the repressor protein encoded by the lacI gene binds to the operator O and prevents transcription. Transcription is not completely blocked because the repressor occasionally detaches, allowing a few transcripts to be made. Because of this basal level of transcription, the bacterium always possesses a few copies of each of the three enzymes encoded by the operon, probably fewer than 5 of each. This means that when the bacterium encounters a source of lactose it is able to transport a few molecules into the cell and split these into glucose and galactose. An intermediate in this reaction is allolactose, an isomer of lactose. Binding of allolactose (and its relatives) to the repressor causes it to change conformation and leave the operator. This enables RNA polymerase to bind to the Plac promoter (which extends from -45 to +18) and transcribe the three structural genes of the operon. The single mRNA molecule that results is then translated into the three proteins. The lac repressor binds to a specific


sequence of 35 bp (extending from -7 to +28) called the operator. Most of the operator sequence lies downstream of the promoter. When the repressor is bound to the operator, RNA polymerase can no longer proceed downstream to transcribe the pertinent gene(s).

Gene                        lacI        lacZ              lacY       lacA
mRNA (nucleotides)          1080        3069              1251       609
Polypeptide (amino acids)   360         1023              417        203
Polypeptide (kD)            38.6        116.4             46.5       22.7
Protein structure           Tetramer    Tetramer          Monomer    Dimer
Protein (kD)                154.4       465               46.5       45.4
Function                    Repressor   β-Galactosidase   Permease   Transacetylase

Figure 6. The lac operon. The operator sequence lies immediately upstream of the three structural genes and downstream of the promoter of the lactose operon. In the original model for lactose operon regulation, the lactose repressor binds to the operator and prevents the RNA polymerase from gaining access to the promoter. Thus, the three structural genes are switched off (see text for further details). The operon consists of two transcription units. In unit (1), there is a regulator gene, lacI, with its own constitutive promoter, PlacI. This gene encodes a 360-amino acid, 38.6 kD polypeptide that forms a tetrameric lac repressor protein. The second unit (2) comprises three structural genes, lacZ, lacY and lacA, under the control of Plac and the operator O. The first two genes, lacZ and lacY, are separated by 52 bp and the second two, lacY and lacA, by 64 bp. lacY encodes an integral membrane protein, a permease, which is responsible for transporting β-galactosides into the cell. The lacA structural gene encodes a 22.7 kD polypeptide that forms a dimer displaying thiogalactoside transacetylase activity in vitro, transferring an acetyl group from acetyl-CoA to the C-6 OH of thiogalactosides, but the role of this protein in vivo remains unknown. Moreover, mutations that inactivate lacA show no identifiable phenotype. It is likely that the transacetylase acts to detoxify toxic analogs of lactose through acetylation.
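The sizes listed for the lac genes and proteins are linked by simple arithmetic: three nucleotides per codon, and oligomer mass equal to subunit mass times the number of subunits. The Python sketch below checks those relationships using only the values given in the table for Figure 6. The dictionary layout and the function name are assumptions made for this example.

```python
# Values copied from the table in Figure 6 (nucleotides, amino acids, kD).
LAC_GENES = {
    "lacI": {"nt": 1080, "aa": 360, "subunit_kd": 38.6, "subunits": 4, "protein_kd": 154.4},
    "lacZ": {"nt": 3069, "aa": 1023, "subunit_kd": 116.4, "subunits": 4, "protein_kd": 465.0},
    "lacY": {"nt": 1251, "aa": 417, "subunit_kd": 46.5, "subunits": 1, "protein_kd": 46.5},
    "lacA": {"nt": 609, "aa": 203, "subunit_kd": 22.7, "subunits": 2, "protein_kd": 45.4},
}


def check_gene(name):
    """Return (coding length matches 3 nt per residue, computed oligomer mass in kD)."""
    gene = LAC_GENES[name]
    codons_match = gene["nt"] == 3 * gene["aa"]
    oligomer_kd = round(gene["subunit_kd"] * gene["subunits"], 1)
    return codons_match, oligomer_kd


for name, gene in LAC_GENES.items():
    codons_match, oligomer_kd = check_gene(name)
    print(name, codons_match, oligomer_kd, "listed:", gene["protein_kd"])
```

The nucleotide counts equal exactly three times the residue counts, so they appear to refer to the coding sequence without the stop codon. The computed mass of the β-galactosidase tetramer (4 x 116.4 kD) differs from the listed 465 kD only by rounding.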

The lac repressor represents only a tiny fraction of the proteins in the E. coli cell. The gene encoding the lac repressor is called the lacI gene. It happens to be located just upstream of the Plac promoter. However, its precise location is probably not important because it achieves its effect by means of its protein product, which is free to diffuse throughout the cell. In fact, the genes for some repressors are not located close to the operators they control. The lac repressor is made up of four identical polypeptides (thus a "homotetramer"). Part of the molecule has a site (or sites) that enables it to recognize and bind to the 35 bp of the lac operator. Another part of the repressor contains sites that bind lactose (in the form of its isomer allolactose). When the sugar binds to the repressor, it causes a change in the shape of the molecule, so that it can no longer remain attached to the DNA sequence of the operator. Thus, when lactose is added to the

[Figure 6 diagram: DNA map showing PlacI and lacI (unit 1) followed by Plac, O, lacZ, lacY and lacA (unit 2); transcription generates two mRNA units.]


culture medium, it causes the repressor to be released from the operator. The RNA polymerase can then begin transcribing the 3 structural genes of the operon into a single molecule of messenger RNA²⁷. When the operon is fully induced, approximately 5000 copies of each protein product are present in the cell. When the lactose supply is used up and allolactose is no longer present, the repressor reattaches to the operator and the operon is switched off.

VI.2.b. Positive Control of Transcription: CAP

The absence of the lac repressor is essential but not sufficient for effective transcription of the lac operon. The activity of RNA polymerase also depends on the presence of another DNA-binding protein called catabolite activator protein or CAP. This accessory protein acts as a positive regulator. The name CAP derives from the phenomenon of catabolite repression in E. coli. Catabolite repression is a global regulatory system that coordinates gene expression with the overall physiological state of the cell. As long as glucose is available, the bacterium catabolizes it in preference to any other carbon and energy source, e.g. lactose or galactose. Catabolite repression ensures that the operons concerned with the degradation of substrates other than glucose remain repressed until the supply of glucose is exhausted. Catabolite repression overrides the influence of any inducers that might be present. Like the lac repressor, CAP has two types of binding sites: one binds the nucleotide cyclic AMP (cAMP), and the other binds a sequence of 21 bp (extending from -72 to -52) upstream of the promoter. However, CAP can bind to DNA only when cAMP is bound to CAP, so that catabolite repression is mediated by cAMP levels, which in turn are regulated by glucose. Transport of glucose into the cell is concomitant with the deactivation of the E. coli adenylyl cyclase, leading to lower cAMP levels. As a consequence, CAP fails to bind DNA and RNA polymerase cannot begin its work, even in the absence of the repressor. Thus, the lac operon is under both negative (the repressor) and positive (CAP) control. Although the presence of lactose removes the repressor, the presence of glucose lowers the level of cAMP in the cell and thus removes CAP. Without CAP, binding of RNA polymerase is inhibited even though there is no repressor to interfere. The molecular basis for the interplay between the two control circuits is shown in figure 7.
CAP, also referred to as cAMP receptor protein or CRP, consists of two identical polypeptides (210 amino acids, 22.5 kD) forming a homodimer. Toward the C-terminus, each has two regions of alpha helix with a sharp bend between them. The longer of these is called the recognition helix because it is responsible for recognizing and binding to a particular sequence of bases in DNA (Fig. 8). The N-terminal domain binds cAMP. Two molecules of cAMP are bound per CAP dimer, generating the CAP-(cAMP)2 complex.

27 Transcription has hardly begun before ribosomes attach to the growing mRNA molecule and move down it to translate the message into the three proteins. This mechanism is characteristic of prokaryotes, but differs in three respects from that found in eukaryotes: (1) In general, related genes in eukaryotes are not linked in operons. (2) Messenger RNAs in eukaryotes contain the transcript of only a single gene (with a few exceptions). (3) Transcription and translation are not physically linked in eukaryotes as they are in prokaryotes; transcription occurs in the nucleus while translation occurs in the cytosol.


Figure 7. Mechanism of catabolite repression and CAP action. (A) Only glucose is available: cAMP levels are low, CAP cannot bind to its DNA site (CAP site), and the lac repressor is active and binds to the operator, preventing the RNA polymerase from gaining access to the promoter. Consequently the lac operon is switched OFF. (B) Only lactose is available: allolactose, an isomer of lactose, is produced and binds to the lac repressor, causing a conformational change in the protein. The repressor dissociates from the DNA, allowing the RNA polymerase to bind to the Plac promoter. As glucose is absent, cAMP levels are high and CAP is active. CAP assists closed promoter complex formation by the RNA polymerase holoenzyme and the lac operon is ON. (C) Neither glucose nor lactose is available: CAP is active and binds to its DNA site, but the lac repressor binds to the operator and prevents the attachment of RNA polymerase to the promoter. Thus, the lac operon is OFF. (D) Both lactose and glucose are present: the cAMP level is low and, without CAP, binding of RNA polymerase is inhibited; the lac operon is switched OFF. Note that the shapes of the repressor, CAP, and the RNA polymerase shown in this figure are purely schematic.
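The four situations of Figure 7 reduce to a two-input truth table. The Python sketch below expresses it as a function; it is a deliberately simplified on/off model that ignores the basal transcription mentioned earlier, and the function name is an assumption made for this example.

```python
def lac_operon_state(glucose, lactose):
    """Return "ON" or "OFF" for the lac operon in a simplified two-input model."""
    # Negative control: without lactose (allolactose) the repressor sits on the operator.
    repressor_on_operator = not lactose
    # Positive control: cAMP is high, and CAP binds its site, only when glucose is absent.
    cap_on_dna = not glucose
    return "ON" if cap_on_dna and not repressor_on_operator else "OFF"


for label, glucose, lactose in [
    ("A", True, False),   # only glucose
    ("B", False, True),   # only lactose
    ("C", False, False),  # neither sugar
    ("D", True, True),    # both sugars
]:
    print(label, lac_operon_state(glucose, lactose))
```

Only the combination "lactose present, glucose absent" satisfies both control circuits, which is why three of the four panels end with the operon switched OFF.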

VI.3. Positive versus negative control systems

Negative and positive control systems are fundamentally different. Genes under negative control are transcribed as long as the repressor is absent or inactive. In this case transcription activation is regarded as anti-inhibition, i.e. the reversal of negative control. In contrast, genes under positive control are expressed only if an active regulatory protein is present. Both control circuits regulate the transcription of the lac operon. The action of the lac repressor is negative: it inhibits transcription by binding to the operator sequence. Thus, transcription of the lac operon can be attained only by releasing the negative control of the lac repressor²⁸. This can be achieved by adding lactose to the growth medium. On the other hand,

28 Along the same lines, the synthesis of tryptophan from precursors available in the cell requires 5 enzymes. The genes encoding these are clustered together in a single operon with its own promoter and operator, the tryptophan operon (trp operon). In this case, however, the presence of tryptophan in the cell shuts down the operon. When Trp is present, it binds to a site on the Trp repressor and enables the Trp repressor to bind to the operator. When Trp is not present, the repressor leaves its operator, and transcription of the 5 structural genes begins. The

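[Figure 7 diagram: lac operon map (PlacI, lacI, CAP site, Plac, O, lacZ, lacY, lacA) shown under conditions (A) to (D), with the positions of the repressor (R), CAP (C) and RNA polymerase, and the resulting state: (A) OFF, (B) ON, (C) OFF, (D) OFF.]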


regulation of the lac operon by the activator CAP is positive. In the presence of high levels of cAMP (due to the absence of glucose), CAP becomes active and transcription of the lac operon by the RNA polymerase holoenzyme can thus be stimulated. Operons can also be classified as inducible or repressible, or both, depending on how they respond to the small molecules that mediate their expression. Inducible operons, e.g. the lac operon, are transcribed only in the presence of small-molecule co-inducers (lactose in this case). Repressible operons, e.g. the trp operon, are expressed only in the absence of their co-repressors (tryptophan in our example).
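The difference between inducible and repressible operons comes down to how the small molecule changes the repressor's ability to bind the operator. A minimal Python sketch of that logic follows. It models negative control only (CAP is deliberately left out), and the function names and the "inducible"/"repressible" labels are choices made for this example.

```python
def repressor_binds_operator(kind, small_molecule_present):
    """Negative control only: does the repressor occupy the operator?"""
    if kind == "inducible":
        # lac-type: the co-inducer releases the repressor from the operator.
        return not small_molecule_present
    if kind == "repressible":
        # trp-type: the co-repressor enables the repressor to bind the operator.
        return small_molecule_present
    raise ValueError("kind must be 'inducible' or 'repressible'")


def is_transcribed(kind, small_molecule_present):
    """The operon is transcribed whenever the operator is free."""
    return not repressor_binds_operator(kind, small_molecule_present)


print(is_transcribed("inducible", True))     # lac operon with lactose
print(is_transcribed("repressible", True))   # trp operon with tryptophan
```

In both cases the regulatory protein is a repressor; what differs is whether the free or the ligand-bound form of the repressor is the one that binds DNA.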

Figure 8. Model of CAP. The two monomers are identical. Each monomer recognizes a sequence of nucleotides in DNA by means of the region of alpha helix labeled F. Note that the two recognition helices are spaced 34Å apart, which is the distance that it takes the DNA molecule (on the left) to make precisely one complete turn. The recognition helices of each polypeptide of CAP are, of course, identical. But their orientation in the dimer is such that the sequence of bases they recognize must run in the opposite direction for each recognition helix to bind properly. This arrangement of two identical sequences of base pairs running in opposite directions is called an inverted repeat or palindromic sequence.
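The notion of an inverted repeat can be checked mechanically: a duplex sequence is an inverted repeat (palindromic in the molecular sense) when one strand reads the same as the reverse complement of itself. The Python sketch below illustrates the test. The example sequences are arbitrary illustrations and are not presented as the actual CAP binding site.

```python
# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(seq):
    """True when the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def half_sites_match(left, right):
    """True when two half-sites, possibly separated by a spacer, form an inverted repeat."""
    return right.upper() == reverse_complement(left)


print(is_inverted_repeat("GAATTC"))        # arbitrary palindromic example
print(is_inverted_repeat("GATTACA"))       # arbitrary non-palindromic example
print(half_sites_match("TGTGA", "TCACA"))  # illustrative half-sites
```

The half-site form is the relevant one for a dimeric protein: each monomer reads one half-site, and the spacer between the halves sets the distance between the two recognition helices.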

28 (continued) usefulness to the cell of this control mechanism is clear. The presence in the cell of an essential metabolite, in this case tryptophan, turns off its own manufacture and thus stops unneeded protein synthesis. As their name suggests, repressors act in negative control mechanisms, shutting down operons in the absence of a substrate (lactose in the lac operon) or in the presence of an essential metabolite (tryptophan in the trp operon).

VI.4. The ara operon

VI.4.1. Structural and regulatory genes

The arabinose operon, abbreviated ara, is concerned with the metabolism of the plant pentose L-arabinose. It enables E. coli to use this pentose as a sole source of energy and carbon via conversion to D-xylulose-5-P. When used without qualification, the term ara operon refers to the araBAD operon, which encodes the three enzymes responsible for the metabolic conversion of L-arabinose to D-xylulose-5-P: araA encodes L-arabinose isomerase, araB encodes L-ribulokinase and araD encodes ribulose-5-P epimerase (Fig. 9). In addition to araBAD, the E. coli genome contains two other loci involved in the uptake of L-arabinose: araFG and araE. The araFG operon specifies a high-affinity transport system (with araF encoding the binding protein and araG the membrane component), whereas the araE gene encodes a protein involved in an independent low-affinity transport system. All three operons are controlled by the same regulatory protein, AraC, the gene for which is located next to the araBAD operon (Fig. 9). As a consequence araBAD, araFG and araE form a regulon. Hereafter we will concentrate on the araBAD operon. The mode of action of AraC is quite unusual in that it acts as both a positive and a negative regulatory protein. In the absence of L-arabinose, AraC behaves as a repressor: it binds to specific sites in the promoter region and inhibits the transcription of the araBAD operon (negative promoter control). However, when arabinose is available it causes a conformational change in the AraC protein, converting it to an activator that promotes the expression of the operon (positive promoter control).

Figure 9. The araBAD operon structure. The araC gene is transcribed from promoter Pc away from the araBAD genes. AraC has four binding sites: araO2, araO1, araI1 and araI2. The araO1 site contributes minimally to the regulation of the araBAD operon, being involved in the autoregulation of the araC gene. The araB, araA and araD genes encode the three enzymes involved in the conversion of L-arabinose into D-xylulose-5-P (see text).

[Figure 9 diagram: DNA map with araC (left unit) and araB, araA, araD (right unit), and the sites araO2, Pc/araO1, CAP site, araI1, araI2 and Pol site; RNA products encode AraC, L-ribulokinase, L-arabinose isomerase and ribulose-5-P epimerase. Pathway: L-arabinose to L-ribulose to L-ribulose-5-P (ATP-dependent step) to D-xylulose-5-P.]


Figure 10. Positive and negative circuits control the expression of the araBAD operon. (A) In the absence of arabinose, the AraC dimer binds to sites araO2 and araI1, leading to the formation of a DNA loop between these sites and thus preventing RNA polymerase from binding to its promoter site. As a result, the araBAD genes are not expressed, regardless of the levels of cAMP (i.e. regardless of the presence or absence of glucose). (B) When arabinose becomes available it is transferred into the cell through two transport systems (see text). Arabinose binds AraC, inducing a conformational change that causes AraC to leave araO2 and bind the araI2 half-site. Binding of arabinose to AraC and the consequent association of AraC-(arabinose)2 with the araI sites are not sufficient to turn on araBAD expression. (C) In the absence of glucose and the presence of arabinose, the levels of cAMP are high so that CAP (a positive regulator) becomes active. Binding of CAP-(cAMP)2 to the CAP site is crucial. CAP-(cAMP)2 interacts with AraC-(arabinose)2 and together they assist the binding of RNA polymerase to the promoter, leading to an active transcription initiation complex, which transcribes the araBAD genes into one polycistronic mRNA.
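The three panels of Figure 10 can be condensed into a function of two inputs, arabinose and glucose. The Python sketch below is a simplified summary of the caption: it tracks which sites AraC occupies, whether the DNA loop is present, whether CAP is bound, and whether araBAD is transcribed. The function name and the dictionary keys are assumptions made for this example.

```python
def arabad_state(arabinose, glucose):
    """Summarize AraC and CAP occupancy and araBAD transcription (simplified model)."""
    # cAMP is high, and CAP-(cAMP)2 binds its site, only when glucose is absent.
    cap_bound = not glucose
    if not arabinose:
        # Panel A: AraC bridges araO2 and araI1, looping the DNA; no expression.
        return {
            "arac_sites": ("araO2", "araI1"),
            "dna_loop": True,
            "cap_bound": cap_bound,
            "transcription": False,
        }
    # Panels B and C: AraC-(arabinose)2 occupies both araI half-sites.
    return {
        "arac_sites": ("araI1", "araI2"),
        "dna_loop": False,
        "cap_bound": cap_bound,
        # AraC at araI is necessary but not sufficient; CAP is also required.
        "transcription": cap_bound,
    }


for arabinose, glucose in [(False, False), (False, True), (True, True), (True, False)]:
    print(arabinose, glucose, arabad_state(arabinose, glucose)["transcription"])
```

As with the lac operon, only one of the four sugar combinations (arabinose present, glucose absent) turns transcription on, but here the same protein, AraC, supplies both the negative and the positive control.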

VI.4.2. Regulation in detail

The araBAD operon has three binding sites for AraC: araO2, araO1 and araI, which is split into two half-sites, araI1 and araI2. As shown in figure 9, the araC gene is expressed from its promoter Pc when the levels of AraC are low (the araC gene is subject to autoregulation via the araO1 site). When arabinose is absent, AraC binds to two sites, araO2 and araI1, causing the formation of a DNA loop between them, and the araBAD operon is not expressed. When arabinose is present and the levels of cAMP are high (absence of glucose), the AraC protein bound to araO2 dissociates and binds to the unoccupied araI2 half-site. This change is favored by arabinose, which binds to AraC (one molecule per AraC monomer) and behaves as an allosteric effector, altering the conformation of AraC. The conformational change is necessary but not sufficient to trigger the expression of araBAD; the AraC dimer must also interact with CAP-(cAMP)2 bound to the CAP site in order to assist the binding of RNA polymerase to its promoter site and activate transcription of the araBAD operon. Once arabinose has been metabolized, AraC, now in the arabinose-free conformation, is released from the araI2 half-site and binds to araO2 again, thus re-establishing the loop and shutting down the expression of araBAD. The araBAD operon is therefore under both negative and positive control by the AraC protein: AraC acts as a repressor in the absence of arabinose and as an activator in its presence (Fig. 10).

VI.5. The trp operon

The tryptophan (trp) operon maps at 27.5 min on the E. coli chromosome. It contains five adjacent genes involved in the biosynthesis of tryptophan (Fig. 11). The operon is subject to both negative promoter control and attenuator control and has the structure:

promoter/operator-leader/attenuator-trpE-trpD-trpC-trpB-trpA-t-t'

where trpE to trpA are the structural genes encoding the enzymes for the conversion of chorismate to tryptophan, t is a rho-independent terminator and t' is a rho-dependent terminator. An unlinked gene, trpR, codes for an aporepressor. When activated by the co-repressor tryptophan, the TrpR repressor binds to the operator trpO and inhibits transcription of the trp operon (reducing it without abolishing it). The operator contains a palindromic sequence that lies within the promoter and occupies positions −23 to −3. TrpR also represses the expression of its own gene, trpR (autoregulation). Further reduction in the transcription of the trp genes is achieved by the mechanism of attenuation, which regulates transcription termination within the leader region and so determines whether the trp genes are expressed or not. In the presence of tryptophan, the attenuator in the leader region of the nascent mRNA adopts the stem-and-loop structure characteristic of a rho-independent terminator, so that transcription stops before the first structural gene, trpE.

Table 3. The amino acid sequences of the leader peptides in seven different operons regulated by attenuation in E. coli. The amino acid one-letter abbreviation code was used. Note that for each leader peptide the amino acid(s) synthesized by the pathway is highlighted. The ilv operon encodes the enzymes for the biosynthesis of three amino acids: leucine (L), isoleucine (I), and valine (V).

Operon   Amino acid sequence of the leader peptide
his      NH2-MTRVQFKHHHHHHHPD-COOH
ilv      NH2-MTALLRVISLVVISVVVIIIPPCGAALGRGKA-COOH
leu      NH2-MSHIVRFTGLLLLNAFIVRGRPVGGIQH-COOH
pheA     NH2-MKHIPFFFAFFFTFP-COOH
thr      NH2-MKRISTTITTTITITTQNGAG-COOH
trp      NH2-MKAIFVLKGWWRTS-COOH
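Because the highlighting of Table 3 does not survive in plain text, the enrichment of each leader peptide in the amino acid(s) made by its pathway can be made explicit by counting. The following is a minimal sketch: the sequences are those of Table 3, while the dictionary `PATHWAY_RESIDUES` and the two helper functions are hypothetical names introduced only for this illustration.

```python
from itertools import groupby

# Leader peptide sequences from Table 3 (one-letter code).
LEADERS = {
    "his": "MTRVQFKHHHHHHHPD",
    "ilv": "MTALLRVISLVVISVVVIIIPPCGAALGRGKA",
    "leu": "MSHIVRFTGLLLLNAFIVRGRPVGGIQH",
    "pheA": "MKHIPFFFAFFFTFP",
    "thr": "MKRISTTITTTITITTQNGAG",
    "trp": "MKAIFVLKGWWRTS",
}

# Amino acid(s) synthesized by each pathway, as stated in the table legend.
PATHWAY_RESIDUES = {
    "his": "H", "ilv": "LIV", "leu": "L", "pheA": "F", "thr": "T", "trp": "W",
}


def regulatory_count(peptide: str, residues: str) -> int:
    """Number of positions occupied by one of the pathway amino acids."""
    return sum(1 for aa in peptide if aa in residues)


def longest_run(peptide: str, residues: str) -> int:
    """Length of the longest uninterrupted stretch of pathway amino acids."""
    runs = [
        sum(1 for _ in group)
        for is_pathway_residue, group in groupby(peptide, key=lambda aa: aa in residues)
        if is_pathway_residue
    ]
    return max(runs, default=0)


if __name__ == "__main__":
    for operon, peptide in LEADERS.items():
        residues = PATHWAY_RESIDUES[operon]
        print(operon, residues,
              regulatory_count(peptide, residues),
              longest_run(peptide, residues),
              len(peptide))
```

The run length matters because a ribosome stalls where several codons for the scarce amino acid follow one another, as with the two adjacent tryptophan codons of the trp leader.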

On the other hand, when tryptophan is absent, the ribosome translating the leader region (remember that transcription and translation are coupled in E. coli) stalls over two contiguous tryptophan codons (table 3). This allows the leader RNA to fold into an alternative secondary structure, the antiterminator, which prevents the formation of the stem-and-loop of the terminator. As a result, transcription proceeds downstream into the trp genes.

Figure 11. Organization of the trp operon of E. coli. (A) The five structural genes E, D, C, B and A are transcribed as one polycistronic mRNA, which is then translated into five different polypeptides. The first two, anthranilate synthase component I and anthranilate synthase component II (encoded by genes E and D), form a tetramer containing two subunits of each (I2II2). The anthranilate synthase tetramer catalyzes the first and second steps of the tryptophan biosynthetic pathway (B). The third polypeptide, encoded by gene C, is a multifunctional protein endowed with two enzymatic activities: N-(5'-phosphoribosyl)-anthranilate isomerase and indole-3-glycerol phosphate synthase. This protein catalyzes the third and fourth steps of the pathway. The last two polypeptides, encoded by genes B and A, are the tryptophan synthase β subunit and the tryptophan synthase α subunit. Together they form a tetramer comprising two subunits of each (α2β2). The tryptophan synthase tetramer catalyzes the fifth and last step of the pathway, producing L-tryptophan.

[Figure 11 diagram. (A) DNA map: Ptrp, Otrp, trpL (leader containing the attenuator), trpE, trpD, trpC, trpB, trpA, t, t'. The mRNA is translated into anthranilate synthase components I and II (which form anthranilate synthase I2II2), N-(5'-phosphoribosyl)-anthranilate isomerase/indole-3-glycerol phosphate synthase, and the two tryptophan synthase subunits (which form tryptophan synthase). (B) Pathway, catalyzed by enzymes E1 to E3: chorismate → anthranilate → N-(5'-phosphoribosyl)-anthranilate → enol-1-o-carboxyphenylamino-1-deoxyribulose phosphate → indole-3-glycerol-P → L-tryptophan. Other labelled compounds: glutamine, glutamate + pyruvate, PRPP, PP, CO2, L-serine, glyceraldehyde-3-P.]


VI.5.1. Negative regulation by Trp repressor

The trp operon is repressible, i.e. it is expressed only when tryptophan is limiting and the repressor is therefore inactive. The Trp repressor is encoded by the trpR operon, which is not linked to the trp operon. When tryptophan is present, the Trp repressor, a dimer of 108-amino-acid polypeptide chains, binds two molecules of tryptophan and then binds to the operator, which is located within the trp promoter. Binding of the Trp repressor to the operator excludes RNA polymerase and prevents transcription of the trp operon. When tryptophan is exhausted, the Trp repressor can no longer bind to the operator and is released, allowing RNA polymerase to bind its promoter and transcribe the five genes of the trp operon as one polycistronic mRNA.

Figure 12. Regulation of the trp operon via attenuation. (a) The attenuator in the leader region of the mRNA can adopt two alternative secondary structures. (b) When tryptophan is available, the ribosome translates the leader region without stalling, so that the terminator structure 3:4 can form, leading to premature termination of transcription. (c) Conversely, when tryptophan is scarce, the ribosome stalls over repeat 1, allowing repeats 2 and 3 to base-pair and form the antiterminator hairpin 2:3. As repeat 3 is then engaged, formation of the terminator 3:4 is prevented and transcription by RNA polymerase continues past the leader region into the trp structural genes. Hence, attenuation is determined by the availability of tryptophan in the cell.
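The two layers of control of the trp operon, the Trp repressor and attenuation, can be summarized as a decision rule. The sketch below is a deliberately coarse, all-or-nothing model: tryptophan is either available or not, and the function names `attenuator_hairpin` and `trp_operon_state` are hypothetical, chosen here only for illustration. In the cell both mechanisms are graded rather than binary.

```python
def attenuator_hairpin(tryptophan_available: bool) -> str:
    """Return the hairpin formed in the leader RNA: '3:4' or '2:3'."""
    # When tryptophan is scarce the ribosome stalls at the Trp codons in
    # repeat 1, which leaves repeat 2 free to pair with repeat 3.
    ribosome_stalled_on_repeat_1 = not tryptophan_available
    return "2:3" if ribosome_stalled_on_repeat_1 else "3:4"


def trp_operon_state(tryptophan_available: bool) -> dict:
    """Qualitative state of the trp operon for a given tryptophan supply."""
    hairpin = attenuator_hairpin(tryptophan_available)
    # The repressor binds the operator only when loaded with its co-repressor.
    repressor_on_operator = tryptophan_available
    return {
        "repressor_on_operator": repressor_on_operator,
        "hairpin": hairpin,
        "terminates_in_leader": hairpin == "3:4",
        "structural_genes_transcribed": (not repressor_on_operator) and hairpin == "2:3",
    }


if __name__ == "__main__":
    for available in (True, False):
        print(available, trp_operon_state(available))
```

Both checks point the same way for a given tryptophan supply; attenuation acts as a second filter on the transcripts that escape repression.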

VI.5.2. Regulation of trp operon by attenuation

The Trp repressor reduces, but does not abolish, the transcription of the trp operon. Further inhibition of trp operon expression is achieved via a process termed attenuation, which regulates transcription after it has begun. The trp leader, located between the trp promoter and the structural genes, comprises a sequence called the attenuator. Analysis of the attenuator region has shown the presence of four inverted repeats termed 1, 2, 3 and 4 (Fig. 12). In the presence of tryptophan, the attenuator in the leader region of the mRNA adopts the stem-and-loop structure characteristic of a rho-independent terminator (hairpin 3:4), and transcription ceases before RNA polymerase reaches the trpE gene. On the other hand, when tryptophan is depleted, a ribosome bound to the nascent mRNA and translating the leader region stalls over two contiguous trp codons. This allows repeats 2 and 3 to form a hairpin, preventing the formation of the 3:4 terminator. As a consequence, RNA polymerase proceeds downstream to transcribe the trp structural genes.

VII. Regulation of transcription in eukaryotes

In bacteria, it is possible to make a clear distinction between constitutive and regulatory forms of transcription initiation. Constitutive expression depends on promoter structure and determines the basal rate of transcription initiation. The regulatory form depends on the activity of regulatory proteins and adjusts the rate of transcription initiation when the basal rate is inadequate to cope with the prevailing conditions. In eukaryotes, the situation is substantially more complicated: in addition to the regulation of metabolic activity and cell division, much more complex patterns, concerning embryonic development and cell differentiation, are coordinated through transcriptional regulation. The human cell, a eukaryotic cell, has 1400 times as much DNA as an E. coli cell. According to the latest estimates, a human cell contains approximately 30,000 genes. Some of these are expressed in all cells all the time; these so-called housekeeping genes29 are responsible for the routine metabolic functions (e.g. respiration) common to all cells. Some genes are expressed as a cell enters a particular pathway of differentiation. Some are expressed all the time, but only in those cells that have differentiated in a particular way. For instance, a plasma cell continuously expresses the gene for the antibody it synthesizes.
Some genes are expressed only as conditions around and in the cell change. For example, the arrival of a hormone may turn on (or off) certain genes in that cell.

There are several methods used by eukaryotes to regulate gene expression:
(1) Altering the rate of transcription of the gene;
(2) Altering the rate at which RNA transcripts are processed while still within the nucleus (see chapter 6);
(3) Altering the stability of mRNA molecules, that is, the rate at which they are degraded. mRNA stability plays a greater role in eukaryotic gene expression: unlike prokaryotic mRNAs, eukaryotic mRNAs show a wide range of half-lives. The longer-lived an mRNA is, the greater the potential of its genetic information to be persistently expressed;
(4) Altering the efficiency with which the ribosomes translate the mRNA into a polypeptide.

VII.1. Regulation of transcription initiation involves cis and trans components

Eukaryotic genes have promoters and other regulatory elements analogous to those found in prokaryotic genes, but the structural genes of eukaryotes are rarely organized in clusters similar to operons. Each eukaryotic gene typically possesses a discrete set of regulatory sequences appropriate to the requirements for its expression. Certain of these sequences provide sites of interaction for general transcription factors, whereas others endow the gene with great specificity of expression by providing targets for specific transcription factors.

29 Housekeeping genes encode proteins commonly present in all cells and essential to normal function. These genes are typically transcribed at more or less steady levels.

VII.1.a. General features of the protein-encoding genes

Protein-encoding genes have exons, whose sequence encodes the polypeptide; introns, which are removed from the primary transcript before the mRNA is translated (see chapter 6); a transcription start site or start point (+1); and a promoter. The latter comprises the basal or core promoter, located within about 40 bp of the start site, and an upstream promoter or upstream control elements, which may extend as much as 200 bp farther upstream. In addition, regulatory sequences called enhancers and silencers modulate transcription initiation. Moreover, adjacent genes (RNA-encoding as well as protein-encoding genes) are often separated by an insulator, which helps them avoid cross talk between each other's promoters and enhancers (and/or silencers).

VII.1.b. Core promoter versus upstream promoter or upstream control elements

The basal promoter contains the TATA box and the initiator sequence Inr. It is bound by TFIID, a complex of at least 10 different proteins that includes the TATA-binding protein (TBP), which recognizes and binds to the TATA box, and other protein factors, which bind to TBP (see above). The basal or core promoter is found in all protein-encoding genes. This is in sharp contrast to the upstream promoter, whose structure and associated binding factors differ from gene to gene. The upstream promoter includes consensus sequences that help define the various modules of RNA polymerase II promoters and the transcription factors that bind to them. For instance, in addition to the core promoter, one or more copies of the sequence GGGCGG (called the GC box and recognized by the transcription factor SP1) have been found upstream of the transcription initiation sites of so-called housekeeping genes.
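As a rough illustration of how a consensus element such as the GC box can be located in a sequence, the sketch below scans a DNA string for GGGCGG. Several things here are assumptions made for the example rather than statements from the text: the test sequence is invented, both strands are searched (the element is looked for as GGGCGG and as its reverse complement CCGCCC), only exact matches are reported, and the helper names `reverse_complement` and `find_motif` are hypothetical.

```python
import re

GC_BOX = "GGGCGG"  # consensus recognized by SP1, as given in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str):
    """Return sorted (0-based position, strand) pairs for exact matches.

    '+' means the motif itself was found; '-' means its reverse complement
    was found, i.e. the motif lies on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Invented sequence: one GC box on each strand, then a TATA-like stretch.
    promoter = "AT" + GC_BOX + "TTA" + reverse_complement(GC_BOX) + "TATAAAAGG"
    print(promoter)
    print(find_motif(promoter, GC_BOX))
```

Real binding-site searches usually tolerate mismatches, for example with position weight matrices; exact matching is used here only to keep the example short.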
Note that many different genes and many different types of cells share the same transcription factors, not only those that bind at the basal promoter but even some of those that bind upstream. What turns on a particular gene in a particular cell is probably the unique combination of promoter sites and of the transcription factors that bind them.

VII.1.c. Enhancers, silencers and insulators

Some transcription factors (enhancer-binding proteins) bind to regions of DNA, called enhancers, that are thousands of base pairs away from the gene they control. Binding increases the rate of transcription of the gene. Enhancers are also called upstream activation sequences or UASs, a term used mainly in yeast. Enhancers can be located upstream, downstream, or even within the gene they control. Furthermore, enhancers are bidirectional in that they function in either orientation. How does the binding of a protein to an enhancer regulate the transcription of a gene thousands of base pairs away? The answer is likely that enhancer-binding proteins, in addition to their DNA-binding site, have sites that bind to transcription factors (TF) assembled at the promoter of the gene. This occurs via a looping mechanism30. In contrast, silencers are control regions of DNA that, like enhancers, may be located thousands of base pairs away from the gene they control. However, when transcription factors bind to them, expression of the gene they control is repressed.

One problem is raised by the fact that enhancers can turn on promoters of genes located thousands of base pairs away. What is to prevent an enhancer from inappropriately activating the promoter of some other gene in the same region of the chromosome? One possibility is an insulator. Insulators are stretches of DNA (about 42 bp) that are located between the enhancer(s) and promoter (or silencer(s) and promoter) of adjacent genes. Their function is to prevent a gene from being influenced by the activation (or repression) of its neighbors.

VII.2. The activity of transcription activators is controlled

The activities of individual transcription factors must be tightly controlled to ensure that the appropriate set of genes is expressed by a given cell type. The activity of a transcription factor can be regulated by controlling either its synthesis or its ability to activate or repress transcription. If the only level of control were over the synthesis of the transcription factor, rapid changes in expression patterns would not be possible, because it takes some time to accumulate a transcription factor or to destroy it. This type of control is therefore likely to be associated with transcription factors responsible for maintaining stable patterns of gene expression, e.g. in differentiation and development. Conversely, if the activity of the transcription factor is directly controllable, e.g. by the availability of external signaling compounds, then rapid changes in the expression patterns of the targeted genes can occur. This type of control depends on the ability of the external signal to gain access to the genome and influence gene expression.
Two different ways are possible: in the first, direct activation occurs when the extracellular signal enters the cell. In the second, indirect activation occurs when the extracellular signal is unable to cross the membrane and instead binds to a cell surface receptor, which in turn transmits or transduces the signal to the cell.

*****

30 Since transcription initiation should respond to a variety of regulatory signals, different protein species are required for correct gene expression. These regulatory proteins sense the prevailing conditions and communicate this information to the genome by binding at specific nucleotide sequences. As DNA is a one-dimensional polymer, there is not enough room for many proteins to bind at the core promoter (near the transcription initiation site) or immediately upstream. DNA looping provides a new dimension in regulation, since it permits additional proteins to convene at the initiation site and to exert their influence on creating and activating an RNA polymerase II initiation complex. Further, DNA looping is greatly influenced by negative supercoiling.